Taylor approximation is a powerful and general strategy for modeling the behavior of a function within a local neighborhood of inputs. The utility of this strategy, however, can be limited when the output of the target function is constrained to some subset of values. In this case study we'll see how Taylor approximations can be combined with transformations from the constrained output space to an unconstrained output space and back to robustly model the local behavior of constrained functions.

We will begin by examining some of the limitations of directly Taylor approximating constrained functions before demonstrating how these limitations can largely be avoided by removing the constraint before constructing a Taylor approximation and then incorporating it back afterwards. Next we will discuss two common constraints in detail, implement those insights with explicit examples, and finish with a short discussion of multi-dimensional function approximation.

1 Approximating Functions With Constrained Outputs

1.1 Direct Taylor Approximations

I present Taylor approximation theory in great depth, perhaps even too much depth, in Section 1 of my Taylor modeling case study. To summarize: a Taylor approximation captures the behavior of a real-valued function \(f : X \rightarrow \mathbb{R}\) in a local neighborhood \(U\) around some baseline input \(x_{0}\), \[ f_{I}(x; x_{0}) \approx f(x) \] for \(x \in U\).




More precisely a Taylor approximation uses the differential structure of \(f\) at \(x_{0}\) to inform an \(I\)th-order polynomial function that approximates the exact functional behavior within this local neighborhood. A good approximation can be engineered by building a sufficiently high-order polynomial or restricting the local neighborhood to a narrow interval of inputs around \(x_{0}\).

Because they are local constructions, Taylor approximations can in theory be applied to any real-valued function, including those whose outputs are confined to some subset of the real line, \(f : X \rightarrow V \subset \mathbb{R}\).




While the polynomial functions that form a Taylor approximation have unconstrained outputs, within a small enough local neighborhood those outputs will satisfy the given constraint.




If the neighborhood is too large, however, then evaluations of the Taylor approximation at some inputs will return outputs that violate the constraint.




When the baseline \(x_{0}\) is close to the constraint boundary the constraint can be violated even when the absolute approximation error is small.




In other words enforcing compatibility between a specific Taylor approximation \(f_{I}(x; x_{0})\) and an output constraint restricts the geometry of the local neighborhoods. If we know the local differential structure of \(f\) then we may be able to explicitly work out what inputs will lead to constraint-violating Taylor approximation outputs and then craft appropriate local neighborhoods. When we are inferring that local differential structure, however, establishing local neighborhoods that avoid constraint violations becomes much more difficult.
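
To see this fragility explicitly, here is a minimal R sketch using the hypothetical positive function \(f(x) = \log(1 + \exp(x))\): a first-order Taylor approximation around a baseline near the constraint boundary quickly returns negative, constraint-violating outputs.

# A positive function, f : R -> (0, infinity), used here purely for illustration
f <- function(x) log(1 + exp(x))

# First derivative of f, needed for the first-order Taylor approximation
df <- function(x) 1 / (1 + exp(-x))

# First-order Taylor approximation around the baseline x0
f1 <- function(x, x0) f(x0) + df(x0) * (x - x0)

x0 <- -3   # Baseline close to the positivity boundary
f1(-6, x0) # -0.094; violates the positivity constraint
f(-6)      #  0.0025; the exact output remains positive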

Output constraints also have a strong influence on the local error of Taylor approximations. Constrained functions that are smooth need to be highly nonlinear near a constraint boundary in order to contort their outputs away from the boundary and avoid violating the constraint.




Because of this nonlinearity Taylor approximations will behave very differently when the baseline input \(x_{0}\) is close to a constraint boundary and when it is far away. Away from the boundary constrained functions will tend to be more linear, which makes Taylor approximations of a fixed order \(I\) more accurate. This then allows the Taylor approximation to be employed over a wider local neighborhood. Near a constraint boundary, however, constrained functions will tend to exhibit stronger nonlinearities which introduce large approximation errors that require smaller local neighborhoods.




Alternatively if we need to fix the local neighborhood then this varying curvature will require carefully tuning the order of the Taylor approximation for each baseline \(x_{0}\).

The strong sensitivity to the baseline input substantially complicates the implementation of Taylor approximations for constrained functions. Unless we need to evaluate the constrained function only far from the constraint boundaries, direct Taylor approximation will be at best a fragile approach to modeling the exact functional behavior.

1.2 General Taylor Approximations

If Taylor approximating constrained functions is so difficult then why don't we just eliminate the constraint before building a Taylor approximation in the first place?

Consider a one-to-one function that maps the constrained output space to the entire real line, \[ g : V \rightarrow \mathbb{R}. \] The function \(g\) is referred to as a link function because it "links" the nominal constrained output to an unconstrained output.

Composing the link function with the constrained function of interest defines a completely unconstrained function, \[ \begin{alignat*}{6} g \circ f :\; &X& &\rightarrow& \; &\mathbb{R}& \\ &x& &\mapsto& &g(f(x))&. \end{alignat*} \] Without any output constraints \(g \circ f\) should be much easier to approximate with a Taylor approximation, \[ (g \circ f)(x) \approx (g \circ f)_{I}(x; x_{0}). \]

That said our model depends on the function \(f\), not the composed function \(g \circ f\). In order to incorporate this Taylor approximation into our model we need to undo the action of the link function. Mathematically we achieve this by applying the inverse link function, \[ \begin{alignat*}{6} g^{-1} :\; &\mathbb{R}& &\rightarrow& \; &V& \\ &w& &\mapsto& &g^{-1}(w)&, \end{alignat*} \] to the unconstrained composition, \[ \begin{align*} f &= \mathbb{I} \circ f \\ &= (g^{-1} \circ g) \circ f \\ &= g^{-1} \circ (g \circ f). \end{align*} \] Because we required that the link function is one-to-one this inverse function will always be well-defined.

Substituting our Taylor approximation for the unconstrained composition \(g \circ f\) then gives a general Taylor approximation \[ \begin{align*} f &= g^{-1} \circ (g \circ f) \\ &\approx g^{-1} \circ (g \circ f)_{I}, \end{align*} \] or for a given input, \[ f(x) \approx g^{-1} \left( (g \circ f)_{I}(x; x_{0}) \right). \] This construction results in a local functional model that always respects the output constraint regardless of the chosen input neighborhood. Moreover, because the link function is one-to-one we don't lose any information going to the unconstrained space and back.
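
As a concrete demonstration, here is a minimal R sketch of this construction for the same hypothetical positive function as above, using the log link function and its exponential inverse; the resulting approximation respects the positivity constraint for every input.

# The hypothetical positive function and the log link pair
f     <- function(x) log(1 + exp(x))
g     <- function(v) log(v)
g_inv <- function(w) exp(w)

# Derivative of the composition g(f(x)) via the chain rule
dgf <- function(x) (1 / f(x)) / (1 + exp(-x))

# First-order general Taylor approximation around the baseline x0
general_f1 <- function(x, x0)
  g_inv( g(f(x0)) + dgf(x0) * (x - x0) )

x0 <- -3
general_f1(-6, x0) # 0.0026; positive, and close to the exact f(-6) = 0.0025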

A well-chosen link function can also have the added benefit of warping the constrained function, allowing us to better resolve all of the rapidly changing behavior near the output boundaries. When the composite function \(g \circ f\) exhibits more uniform curvature the Taylor approximation \((g \circ f)_{I}(x; x_{0})\) will be much less sensitive to the choice of input baseline and hence much easier to wield in practice.




If we need to model functional behavior in only a small neighborhood of inputs, with output values far away from the constraint boundaries, then a Taylor approximation model can be directly applicable. When we need to consider wider ranges of input values, or perhaps more realistically when we don't know what range of input values we might need to consider, then a general Taylor approximation becomes a more robust tool.

Mechanically the warping of any polynomial function, with or without the Taylor approximation interpretation, through an inverse link function is often referred to as general linear modeling or generalized linear modeling [1]. The same construction with piece-wise polynomial models is also sometimes referred to as general(ized) additive modeling. That said the use of "linear" and "additive" in this terminology can be confusing for the same reasons discussed in Section 2.3.3 of the Taylor modeling case study.

4 Conclusion

5 Acknowledgements

A very special thanks to everyone supporting me on Patreon: Aapeli Nevala, Abhinav Katoch, Adam Bartonicek, Adam Fleischhacker, Adan, Adriano Yoshino, Alan Chang, Alan O'Donnell, Alessandro Varacca, Alexander Bartik, Alexander Noll, Alexander Rosteck, Anders Valind, Andrea Serafino, Andrew Mascioli, Andrew Rouillard, Andrew Vigotsky, Angie_Hyunji Moon, Ara Winter, asif zubair, Austin Rochford, Austin Rochford, Avraham Adler, Ben Matthews, Ben Swallow, Benoit Essiambre, Bertrand Wilden, Bryan Yu, Brynjolfur Gauti Jónsson, Cameron Smith, Camron, Canaan Breiss, Cat Shark, Charles Naylor, Chase Dwelle, Chris Jones, Chris Zawora, Christopher Mehrvarzi, Christopher Peters, Chuck Carlson, Cole Monnahan, Colin Carroll, Colin McAuliffe, Cruz, Damien Mannion, Damon Bayer, dan mackinlay, Dan Muck, Dan W Joyce, Dan Waxman, Dan Weitzenfeld, Daniel, Daniel Edward Marthaler, Daniel Rowe, Darshan Pandit, Darthmaluus , David Burdelski, David Galley, David Humeau, David Wurtz, dilsher singh dhillon, Doug Rivers, Dr. Jobo, Dr. Omri Har Shemesh, Dylan Murphy, Ed Cashin, Ed Henry, Edgar Merkle, edith darin, Eric LaMotte, Erik Banek, Ero Carrera, Eugene O'Friel, Evan Cater, Fabio Pereira, Fabio Zottele, Felipe González, Fergus Chadwick, Finn Lindgren, Florian Wellmann, Francesco Corona, Geoff Rollins, George Ho, Granville Matheson, Greg Sutcliffe, Guido Biele, Hamed Bastan-Hagh, Haonan Zhu, Harrison Katz, Hector Munoz, Henri Wallen, Hugo Botha, Huib Meulenbelt, Håkan Johansson, Ian , idontgetoutmuch, Ignacio Vera, Ilaria Prosdocimi, Isaac S, Isaac Vock, J, J Michael Burgess, Jair Andrade, James McInerney, James Wade, Janek Berger, Jason Martin, Jason Pekos, Jeff Dotson, Jeff Helzner, Jeff Stulberg, Jeffrey Burnett, Jeffrey Erlich, Jesse Wolfhagen, Jessica Graves, Joe Wagner, John Flournoy, Jon , Jonathan H. 
Morgan, Jonathan St-onge, Jonathon Vallejo, Joran Jongerling, Joseph Despres, Josh Weinstock, Joshua Duncan, Joshua Griffith, Josué Mendoza, JU, Justin Bois, Karim Naguib, Karim Osman, Keith O'Rourke, Kejia Shi, Kevin Foley, Kristian Gårdhus Wichmann, Kádár András, lizzie , LOU ODETTE, Luiz Pessoa, Marc Dotson, Marc Trunjer Kusk Nielsen, Marcel Lüthi, Marek Kwiatkowski, Mario Becerra, Mark Donoghoe, Mark Worrall, Markus P., Martin Modrák, Matthew, Matthew Hawthorne, Matthew Kay, Matthew T Stern, Matthieu LEROY, Maurits van der Meer, Maxim Kesin, Merlin Noel Heidemanns, Michael Colaresi, Michael DeWitt, Michael Dillon, Michael Lerner, Mick Cooney, Mikael, Mike Lawrence, Márton Vaitkus, N Sanders, Name, Nathaniel Burbank, Nic Fishman, Nicholas Clark, Nicholas Cowie, Nicholas Erskine, Nicholas Ursa, Nick S, Nicolas Frisby, Octavio Medina, Ole Rogeberg, Oliver Crook, Olivier Ma, Pablo León Villagrá, Patrick Kelley, Patrick Boehnke, Pau Pereira Batlle, Peter Smits, Pieter van den Berg , ptr, Putra Manggala, Ramiro Barrantes Reynolds, Ravin Kumar, Raúl Peralta Lozada, Riccardo Fusaroli, Richard Nerland, Robert Frost, Robert Goldman, Robert kohn, Robert Mitchell V, Robin East, Robin Taylor, Rong Lu, Ross McCullough, Ryan Grossman, Ryan Wesslen, Rémi , S Hong, Scott Block, Scott Brown, Sean Pinkney, Sean Wilson, Serena, Seth Axen, shira, Simon Duane, Simon Lilburn, Srivatsa Srinath, sssz, Stan_user, Stefan, Stephanie Fitzgerald, Stephen Lienhard, Steve Bertolani, Stone Chen, Sus, Susan Holmes, Svilup, Sören Berg, Tagir Akhmetshin, Tao Ye, Tate Tunstall, Tatsuo Okubo, Teresa Ortiz, Thiago de Paula Oliveira, Thomas Kealy, Thomas Lees, Thomas Vladeck, Tiago Cabaço, Tim Radtke, Tobychev , Tom Knowles, Tom McEwen, Tomáš Frýda, Tony Wuersch, Virginia Fisher, Vitaly Druker, Vladimir Markov, Wil Yegelwel, Will Farr, Will Kurt, Will Tudor-Evans, woejozney, yolhaj, yureq, Zach A, Zad Rafi, and Zhengchen Cai.

Appendix A: Hyperbolic Geometry

A.1 Generalizing Addition

We cannot add and subtract arbitrary elements of a constrained space without violating the constraint. For example subtracting \(v_{2} = 3\) from \(v_{1} = 2\) gives a negative number which violates a positivity constraint, \[ v_{1} - v_{2} = 2 - 3 = -1 < 0 \] even though both inputs are positive. Similarly adding \(v_{1} = 3/4\) and \(v_{2} = 2/3\) gives a number larger than one which violates a unit interval constraint, \[ v_{1} + v_{2} = \frac{3}{4} + \frac{2}{3} = \frac{17}{12} > 1 \] even though both inputs are within the unit interval.

When the output of an operation is not contained in the same space as the inputs we say that the space is not closed under that operation. Consequently we can say that the positive real line is not closed under subtraction, nor the unit interval under addition.

We might then ask whether we can construct some generalized notion of addition that does respect given constraints. In other words can we define a binary operation \[ \begin{alignat*}{6} \star :\; &V \times V& &\rightarrow& \; &V& \\ &v_{1}, v_{2}& &\mapsto& &v_{1} \star v_{2}&? \end{alignat*} \]

One way to construct such an operation is to apply the link function to the two inputs, sum the resulting unconstrained values using the usual addition operation, and then map the result back with the inverse link function. In other words given the link function \[ g : V \rightarrow \mathbb{R} \] and the corresponding inverse link function \[ g^{-1} : \mathbb{R} \rightarrow V \] we can define the binary operation \[ \begin{alignat*}{6} \star :\; &V \times V& &\rightarrow& \; &V& \\ &v_{1}, v_{2}& &\mapsto& &g^{-1}( g(v_{1}) + g(v_{2}) )&. \end{alignat*} \]

For example in Section 2.1 we introduced the link function \(g(v) = \log(v)\) and the inverse link function \(g^{-1}(w) = \exp(w)\). This pair defines the binary operation \[ \begin{align*} v_{1} \star v_{2} &= \exp \left( \log(v_{1}) + \log(v_{2}) \right) \\ &= \exp \left( \log(v_{1} \cdot v_{2}) \right) \\ &= v_{1} \cdot v_{2}. \end{align*} \] In this case the generalized addition operator is just multiplication! This explains why proportions are so natural when using the log link function and its inverse.
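
In R this construction is a one-liner; a minimal sketch with a hypothetical helper star that assembles the generalized addition operator from any link function and its inverse.

# Generalized addition induced by a link function g and its inverse
star <- function(v1, v2, g, g_inv) g_inv( g(v1) + g(v2) )

# With the log link the operator reduces to ordinary multiplication
star(2, 3, log, exp) # 6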

A.2 Hyperbolic Addition

If \(V\) is a bounded interval, \(V = [a, b]\), then a natural link function can be constructed from the hyperbolic arctangent function, \[ \begin{alignat*}{6} g :\; &[a, b]& &\rightarrow& \; &\mathbb{R}& \\ &v& &\mapsto& &w = \alpha \, \mathrm{arctanh} \! \left( \frac{ 2 \, v }{b - a} - \frac{b + a}{b - a} \right)&, \end{alignat*} \] for any positive scaling coefficient \(\alpha \in \mathbb{R}^{+}\). The corresponding inverse link function is given by the hyperbolic tangent function \[ \begin{alignat*}{6} g^{-1} :\; &\mathbb{R}& &\rightarrow& \; &[a, b]& \\ &w& &\mapsto& &v = \frac{b - a}{2} \, \left( \tanh \left( \frac{w}{\alpha} \right) + \frac{b + a}{b - a} \right)&. \end{alignat*} \]
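
We might implement these transformations in R as follows; a minimal sketch, with the hypothetical names hyp_link and hyp_link_inv, the interval endpoints, and the scaling coefficient alpha passed as arguments, followed by a quick numerical check that the two functions really are inverses of each other.

# Link function: (a, b) -> R
hyp_link <- function(v, a, b, alpha = 1)
  alpha * atanh( (2 * v - (b + a)) / (b - a) )

# Inverse link function: R -> (a, b)
hyp_link_inv <- function(w, a, b, alpha = 1)
  0.5 * (b - a) * ( tanh(w / alpha) + (b + a) / (b - a) )

# Numerically verify that g(g^{-1}(w)) = w
a <- -1; b <- 3
ws <- seq(-5, 5, by = 0.5)
max(abs( hyp_link(hyp_link_inv(ws, a, b), a, b) - ws )) # Negligibly small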

For safety we can explicitly verify that these transformations are indeed inverses: \[ \begin{align*} g\circ g^{-1}(w) &= \alpha \, \mathrm{arctanh} \! \left( \frac{ 2 }{b - a} \, g^{-1}(w) - \frac{b + a}{b - a} \right) \\ &= \alpha \, \mathrm{arctanh} \! \left( \frac{ 2 }{b - a} \left( \frac{b - a}{2} \, \left( \tanh \left( \frac{w}{\alpha} \right) + \frac{b + a}{b - a} \right) \right) - \frac{b + a}{b - a} \right) \\ &= \alpha \, \mathrm{arctanh} \! \left( \tanh \left( \frac{w}{\alpha} \right) + \frac{b + a}{b - a} - \frac{b + a}{b - a} \right) \\ &= \alpha \, \mathrm{arctanh} \! \left( \tanh \left( \frac{w}{\alpha} \right) \right) \\ &= \alpha \, \frac{w}{\alpha} \\ &= w. \end{align*} \]

For this choice of link function the generalized addition operator becomes \[ \begin{align*} v_{1} \star v_{2} &= g^{-1}( g(v_{1}) + g(v_{2}) ) \\ &= \frac{b - a}{2} \, \left( \tanh \left( \frac{g(v_{1})}{\alpha} + \frac{g(v_{2})}{\alpha} \right) + \frac{b + a}{b - a} \right). \end{align*} \] Before substituting the link function we can use the addition formula for the hyperbolic tangent function to simplify, \[ \begin{align*} v_{1} \star v_{2} &= \frac{b - a}{2} \, \left( \tanh \left( \frac{g(v_{1})}{\alpha} + \frac{g(v_{2})}{\alpha} \right) + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ \tanh \left( \frac{g(v_{1})}{\alpha} \right) + \tanh \left( \frac{g(v_{2})}{\alpha} \right) } { 1 + \tanh \left( \frac{g(v_{1})}{\alpha} \right) \, \tanh \left( \frac{g(v_{2})}{\alpha} \right) } + \frac{b + a}{b - a} \right). \end{align*} \]

At this point we can substitute the link function into each term, \[ \begin{align*} \tanh \left( \frac{g(v)}{\alpha} \right) &= \tanh \left( \frac{1}{\alpha} \alpha \, \mathrm{arctanh} \! \left( \frac{ 2 \, v }{b - a} - \frac{b + a}{b - a} \right) \right) \\ &= \tanh \left( \mathrm{arctanh} \! \left( \frac{ 2 \, v }{b - a} - \frac{b + a}{b - a} \right) \right) \\ &= \frac{ 2 \, v }{b - a} - \frac{b + a}{b - a}, \end{align*} \] which gives \[ \begin{align*} v_{1} \star v_{2} &= \frac{b - a}{2} \, \left( \frac{ \tanh \left( \frac{g(v_{1})}{\alpha} \right) + \tanh \left( \frac{g(v_{2})}{\alpha} \right) } { 1 + \tanh \left( \frac{g(v_{1})}{\alpha} \right) \, \tanh \left( \frac{g(v_{2})}{\alpha} \right) } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ \frac{ 2 \, v_{1} }{b - a} - \frac{b + a}{b - a} + \frac{ 2 \, v_{2} }{b - a} - \frac{b + a}{b - a} } { 1 + \left( \frac{ 2 \, v_{1} }{b - a} - \frac{b + a}{b - a} \right) \left( \frac{ 2 \, v_{2} }{b - a} - \frac{b + a}{b - a} \right)} + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ \frac{ 2 \, v_{1} + 2 \, v_{2} - 2 (b + a) }{b - a} } { 1 + \frac{ 4 \, v_{1} \, v_{2} - 2 \, (b + a) \, (v_{1} + v_{2}) + (b + a)^{2} }{(b - a)^{2}} } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ \frac{ 2 \, v_{1} + 2 \, v_{2} - 2 (b + a) }{b - a} } { \frac{ 4 \, v_{1} \, v_{2} - 2 \, (b + a) \, (v_{1} + v_{2}) + (b - a)^{2} + (b + a)^{2} }{(b - a)^{2}} } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ \frac{ 2 \, v_{1} + 2 \, v_{2} - 2 (b + a) }{b - a} } { \frac{ 4 \, v_{1} \, v_{2} - 2 \, (b + a) \, (v_{1} + v_{2}) + 2 \, a^{2} + 2 \, b^{2} }{(b - a)^{2}} } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v_{1} + v_{2} - (b + a) ) } { 2 \, v_{1} \, v_{2} - (b + a) \, (v_{1} + v_{2}) + a^{2} + b^{2} } + \frac{b + a}{b - a} \right). \end{align*} \]
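
We can double check this somewhat tedious algebra numerically; a minimal sketch comparing the closed form, implemented as the hypothetical hyp_star, to the direct link-sum-inverse construction, reusing the hyp_link and hyp_link_inv sketches from above.

# Closed-form hyperbolic addition on the interval (a, b)
hyp_star <- function(v1, v2, a, b)
  0.5 * (b - a) *
    (   (b - a) * (v1 + v2 - (b + a)) /
        (2 * v1 * v2 - (b + a) * (v1 + v2) + a^2 + b^2)
      + (b + a) / (b - a) )

# Compare to g^{-1}( g(v1) + g(v2) ) at an interior point
a <- -1; b <- 3
v1 <- 0.5; v2 <- 2.25
hyp_star(v1, v2, a, b)                                        # 1.889
hyp_link_inv( hyp_link(v1, a, b) + hyp_link(v2, a, b), a, b ) # 1.889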

This operator has a few interesting properties. Firstly because this formula is symmetric in \(v_{1}\) and \(v_{2}\) the generalized addition operator will also be symmetric, \[ v_{1} \star v_{2} = v_{2} \star v_{1}. \]

Additionally one input at the lower boundary will result in a lower boundary output for any other interior input, \[ v \star a = a \star v = a \] for any \(v \ne b\). We can verify this by direct calculation, \[ \begin{align*} v \star a &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v + a - (b + a) ) } { 2 \, v \, a - (b + a) \, (v + a) + a^{2} + b^{2} } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v - b ) } { 2 \, v \, a - b \, v - a \, b - a \, v - a^{2} + a^{2} + b^{2} } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v - b ) } { v \, a - b \, v - a \, b + b^{2} } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v - b ) } { -v \, (b - a) + b \, (b - a) } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v - b ) } { (b - v) \, (b - a) } + \frac{b + a}{b - a} \right). \end{align*} \] So long as \(v - b \ne 0\) we can cancel the common terms to give \[ \begin{align*} v \star a &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v - b ) } { (b - v) \, (b - a) } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( -1 + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{-(b - a) + b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{2 \, a}{b - a} \right) \\ &= a. \end{align*} \]

Similarly an input at the upper boundary saturates the output at the upper boundary, \[ v \star b = b \star v = b \] for any \(v \ne a\). Once again we can verify by direct calculation, \[ \begin{align*} v \star b &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v + b - (b + a) ) } { 2 \, v \, b - (b + a) \, (v + b) + a^{2} + b^{2} } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v - a ) } { 2 \, v \, b - v \, b - b^{2} - a \, v - a \, b + a^{2} + b^{2} } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v - a ) } { v \, b - v \, a - a \, b + a^{2} } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v - a ) } { v \, (b - a) - a (b - a) } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v - a ) } { (v - a) \, (b - a) } + \frac{b + a}{b - a} \right). \end{align*} \] If \(v - a \ne 0\) then we can cancel the common terms to give \[ \begin{align*} v \star b &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v - a ) } { (v - a) \, (b - a) } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( 1 + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{b - a + b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{2 \, b}{b - a} \right) \\ &= b. \end{align*} \]

Finally the identity element for this operator is given not by \(0\) as we might expect but rather by the center of the interval, \[ v \star \frac{b + a}{2} = \frac{b + a}{2} \star v = v. \] To verify we grind, \[ \begin{align*} v \star \frac{b + a}{2} &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v + \frac{b + a}{2} - (b + a) ) } { 2 \, v \, \frac{b + a}{2} - (b + a) \, (v + \frac{b + a}{2}) + a^{2} + b^{2} } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v - \frac{b + a}{2} ) } { v \, (b + a) - v \, (b + a) - \frac{(b + a)^{2}}{2} + a^{2} + b^{2} } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v - \frac{b + a}{2} ) } { - \frac{1}{2} b^{2} - b \, a - \frac{1}{2} a^{2} + a^{2} + b^{2} } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v - \frac{b + a}{2} ) } { \frac{1}{2} b^{2} - b \, a + \frac{1}{2} a^{2} } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ (b - a) \, ( v - \frac{b + a}{2} ) } { \frac{1}{2} (b - a)^{2} } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ 2 ( v - \frac{b + a}{2} ) }{ b - a } + \frac{b + a}{b - a} \right) \\ &= \frac{b - a}{2} \, \left( \frac{ 2 \, v - (b + a) + (b + a) }{ b - a } \right) \\ &= \frac{b - a}{2} \, \left( \frac{ 2 \, v }{ b - a } \right) \\ &= v. \end{align*} \]
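
All three of these properties are straightforward to confirm numerically; a minimal sketch reusing the hyp_star function from above.

a <- -1; b <- 3
v <- 1.7

hyp_star(v, a, a, b)             # -1; the lower boundary absorbs
hyp_star(v, b, a, b)             #  3; the upper boundary absorbs
hyp_star(v, 0.5 * (b + a), a, b) #  1.7; the interval center acts as the identity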

This interval-constrained algebra also defines what is known as a hyperbolic geometry. To see why let's consider the hyperbola \[ -\frac{4 \, a \, b}{(b - a)^{2}} \, v^{2} + \frac{4 \, (b^{2} - a^{2})}{(b - a)^{3}} \, v \, y - \frac{4}{(b - a)^{2}} \, y^{2} = 1 \] with the positive-\(v\) solutions \[ y = \frac{b + a}{2} \, v \mp \frac{b - a}{2} \sqrt{v^{2} - 1}. \]




Notice that as \(v \rightarrow +\infty\) this hyperbola is bounded below by a line with slope \(a\) and above by a line with slope \(b\), \[ \begin{align*} \lim_{v \rightarrow +\infty} \frac{y}{v} &= \lim_{v \rightarrow +\infty} \frac{b + a}{2} \mp \frac{b - a}{2} \sqrt{1 - \frac{1}{v^{2}} } \\ &= \frac{b + a}{2} \mp \frac{b - a}{2} \\ &= \frac{(b \mp b) + (a \pm a)}{2} \end{align*} \] which equals \(a\) for the first branch and \(b\) for the second branch.




We can parameterize this hyperbola with a hyperbolic angle \(w \in \mathbb{R}\). The point along the hyperbola at angle \(w\) has the coordinates \[ \begin{align*} z_{1} &= \frac{b - a}{2} \sinh \left( \frac{w}{\alpha} \right) + \frac{b + a}{2} \cosh \left( \frac{w}{\alpha} \right) \\ z_{2} &= \cosh \left( \frac{w}{\alpha} \right) \end{align*} \] for any positive scaling coefficient \(\alpha \in \mathbb{R}^{+}\).




A ray connects the origin to the point \((z_{1}(w), z_{2}(w))\), and the ray slope is the slope of this line. For a given parameterization the ray slope becomes \[ \begin{align*} m(w) &= \frac{z_{1}(w)}{z_{2}(w)} \\ &= \frac{ \frac{b - a}{2} \sinh \left( \frac{w}{\alpha} \right) + \frac{b + a}{2} \cosh \left( \frac{w}{\alpha} \right) } { \cosh \left( \frac{w}{\alpha} \right) } \\ &= \frac{b - a}{2} \tanh \left( \frac{w}{\alpha} \right) + \frac{b + a}{2} \\ &= \frac{b - a}{2} \left( \tanh \left( \frac{w}{\alpha} \right) + \frac{b + a}{b - a} \right). \end{align*} \] This, however, is exactly the inverse link function that we introduced above! Indeed the geometry of this specific hyperbola confines the ray slopes to the interval \([a, b]\), which is the same constraint enforced by the inverse link function.

In other words we can interpret the unconstrained image of the link function \(g([a, b])\) as a parameterization of this particular hyperbola. Likewise the domain \([a, b]\) can be interpreted as the collection of ray slopes. From this perspective the generalized addition operator defines the subtle way in which translations along the hyperbola transform the ray slopes. Because of this \(\star\) is sometimes referred to as defining hyperbolic addition.




Compare this construction to that of circular geometry. An angle \(\theta\) parameterizes the circle and the slope of the line that connects the central origin to a point on the circle defines the tangent \(\tan \theta\). Adding angles together generates rotations along the circle with the tangents transforming as \[ \tan(\theta_{1} + \theta_{2}) = \frac{ \tan \theta_{1} + \tan \theta_{2} } { 1 - \tan \theta_{1} \, \tan \theta_{2} }. \] The key difference is that a circular angle is confined to a closed circle while a hyperbolic angle is completely unconstrained.

A.3 Relativistic Probabilities

When working with probabilities \(p \in [0, 1]\) we have \(a = 0\) and \(b = 1\). In this case the link function reduces to \[ \begin{align*} l &= g(p) \\ &= \alpha \, \mathrm{arctanh} \! \left(2 \, p - 1 \right) \\ &= \frac{\alpha}{2} \log \frac{ 1 + 2 \, p - 1 }{ 1 - 2 \, p + 1 } \\ &= \frac{\alpha}{2} \log \frac{ 2 \, p }{ 2 - 2 \, p } \\ &= \frac{\alpha}{2} \log \frac{ p }{ 1 - \, p }. \end{align*} \] Taking \(\alpha = 2\) recovers the logit function that we introduced in Section 2.2.

Similarly the inverse link function becomes \[ \begin{align*} g^{-1} (l) &= \frac{1}{2} \, \left( \tanh \left( \frac{l}{2} \right) + 1 \right) \\ &= \frac{1}{2} \, \left( \frac{ \exp \left( \frac{l}{2} \right) - \exp \left( -\frac{l}{2} \right)} {\exp \left( \frac{l}{2} \right) + \exp\left(- \frac{l}{2} \right)} + 1 \right) \\ &= \frac{1}{2} \, \left( \frac{2 \, \exp \left( \frac{l}{2} \right)} {\exp \left( \frac{l}{2} \right) + \exp \left( -\frac{l}{2} \right)} \right) \\ &= \frac{\exp \left( \frac{l}{2} \right)} {\exp \left( \frac{l}{2} \right) + \exp \left( -\frac{l}{2} \right)} \\ &= \frac{1} {1 + \exp \left( -2 \, \frac{l}{2} \right)} \\ &= \frac{1} {1 + \exp \left( -l \right)}, \end{align*} \] which is the logistic function, as expected.
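
Conveniently R's built-in qlogis and plogis functions implement exactly the logit and logistic functions, which gives us an immediate check of these reductions; a minimal sketch reusing the hyp_link and hyp_link_inv functions from above.

p <- seq(0.1, 0.9, by = 0.2)

# The hyperbolic link with a = 0, b = 1, alpha = 2 recovers the logit function
max(abs( hyp_link(p, 0, 1, alpha = 2) - qlogis(p) ))     # Negligibly small

# The corresponding inverse link recovers the logistic function
l <- qlogis(p)
max(abs( hyp_link_inv(l, 0, 1, alpha = 2) - plogis(l) )) # Negligibly small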

The action of the hyperbolic addition operation becomes \[ \begin{align*} p_{1} \star p_{2} &= \frac{1}{2} \, \left( \frac{ p_{1} + p_{2} - 1 } { 2 \, p_{1} \, p_{2} - (p_{1} + p_{2}) + 1 } + 1 \right) \\ &= \frac{1}{2} \, \left( \frac{ p_{1} + p_{2} - 1 + 2 \, p_{1} \, p_{2} - (p_{1} + p_{2}) + 1 } { 2 \, p_{1} \, p_{2} - (p_{1} + p_{2}) + 1 } \right) \\ &= \frac{1}{2} \, \left( \frac{ 2 \, p_{1} \, p_{2} } { 2 \, p_{1} \, p_{2} - (p_{1} + p_{2}) + 1 } \right) \\ &= \frac{ p_{1} \, p_{2} } { 2 \, p_{1} \, p_{2} - (p_{1} + p_{2}) + 1 }. \end{align*} \] Familiarity with this somewhat obscure hyperbolic addition operator gives us another way to translate domain expertise through the logit link function besides working with intermediate odds.

That said with some clever manipulations the denominator can be simplified to \[ \begin{align*} 2 \, p_{1} \, p_{2} - (p_{1} + p_{2}) + 1 &= p_{1} \, p_{2} - p_{1} + p_{1} \, p_{2} - p_{2} + 1 \\ &= p_{1} \, ( p_{2} - 1) - (p_{2} - 1) + p_{1} \, p_{2} \\ &= (p_{1} - 1) \, (p_{2} - 1) + p_{1} \, p_{2}. \end{align*} \] In this case if neither \(p_{1}\) nor \(p_{2}\) are at either boundary then we can write \[ \begin{align*} p_{1} \star p_{2} &= \frac{ p_{1} \, p_{2} } { (p_{1} - 1) \, (p_{2} - 1) + p_{1} \, p_{2} } \\ &= \frac{ \frac{p_{1}}{1 - p_{1}} \, \frac{p_{2}}{1 - p_{2}} } { 1 + \frac{p_{1}}{1 - p_{1}} \, \frac{p_{2}}{1 - p_{2}} } \\ &= \frac{ o(p_{1}) \, o(p_{2}) } { 1 + o(p_{1}) \, o(p_{2}) }, \end{align*} \] where \(o(p) = p / (1 - p)\) denotes the odds. The summation formula becomes quite natural when written in terms of the odds! Even if we work with the hyperbolic addition operator familiarity with the odds will still be helpful.
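
To confirm that nothing has gone awry in these manipulations we can compare the two forms directly; a minimal sketch with the hypothetical odds helper o defined explicitly.

# Odds corresponding to a given probability
o <- function(p) p / (1 - p)

# Hyperbolic addition of probabilities, direct form
star_prob <- function(p1, p2)
  p1 * p2 / (2 * p1 * p2 - (p1 + p2) + 1)

# Hyperbolic addition of probabilities, odds form
star_odds <- function(p1, p2)
  o(p1) * o(p2) / (1 + o(p1) * o(p2))

star_prob(0.3, 0.8) # 0.6316
star_odds(0.3, 0.8) # 0.6316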

Hyperbolic addition also shows up in the theory of special relativity where velocities are bounded by the speed of light, \(v \in [-c, c]\). Here \[ b = -a = c \] and the link function reduces to \[ \begin{align*} \eta &= g(v) \\ &= \alpha \, \mathrm{arctanh} \! \left( \frac{v}{c} \right) \\ &= \alpha \, \frac{1}{2} \log \frac{ 1 + \frac{v}{c} }{ 1 - \frac{v}{c} } \\ &= \alpha \, \log \sqrt{ \frac{ 1 + \frac{v}{c} }{ 1 - \frac{v}{c} } }. \end{align*} \] When \(\alpha = 1\) this unconstrained transformation of the velocity is known as a rapidity [3].

The corresponding inverse link function reduces to \[ \begin{align*} g^{-1} (\eta) &= c \, \tanh(\eta) \\ &= c \, \frac{e^{\eta} - e^{-\eta}}{e^{\eta} + e^{-\eta}}. \end{align*} \]

In this case the action of the hyperbolic addition operator becomes \[ \begin{align*} v_{1} \star v_{2} &= c \, \left( \frac{ 2 \, c \, ( v_{1} + v_{2} ) } { 2 \, v_{1} \, v_{2} + 2 \, c^{2} } \right) \\ &= \frac{ 2 \, c^{2} \, ( v_{1} + v_{2} ) } { 2 \, v_{1} \, v_{2} + 2 \, c^{2} } \\ &= \frac{v_{1} + v_{2} } { \frac{v_{1}}{c} \, \frac{v_{2}}{c} + 1 }, \end{align*} \] which is exactly the formula for adding relativistic velocities!
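
Once again a quick numerical check is reassuring; a minimal sketch in units where the speed of light is set to one, reusing the hyp_star function from above.

c_light <- 1 # Speed of light in natural units

# Textbook relativistic velocity addition
add_velocities <- function(v1, v2)
  (v1 + v2) / (1 + v1 * v2 / c_light^2)

add_velocities(0.6, 0.7)              # 0.9155
hyp_star(0.6, 0.7, -c_light, c_light) # 0.9155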

Because probabilities and relativistic velocities can both be described by hyperbolic geometries they can be related to each other in a variety of different ways. For example recall that by definition \[ \begin{align*} \frac{p}{1 - p} &= o \\ p &= o \, (1 - p), \end{align*} \] or, in expectation, \[ \begin{align*} p &= o \, (1 - p) \\ \frac{ \left< N_{\text{success}} \right> }{ N } &= o \, \frac{ \left< N_{\text{failures}} \right> }{ N } \\ \left< N_{\text{success}} \right> &= o \, \left< N_{\text{failures}} \right>. \end{align*} \] In other words the odds is the slope between the expected number of failures and the expected number of successes.

On the relativistic side consider a particle that starts at position \(0\) at time \(0\) and moves to position \(x\) by time \(t\) so that the velocity is given by \(v = x / t\). Light-cone coordinates [4] mix space and time, \[ \begin{align*} x^{+} &= \frac{c \, t + x}{\sqrt{2}} \\ x^{-} &= \frac{c \, t - x}{\sqrt{2}} \end{align*} \] so that \[ \begin{align*} \frac{ x^{+} }{ x^{-} } &= \frac{c \, t + x}{\sqrt{2}} \, \frac{\sqrt{2}}{c \, t - x} \\ &= \frac{c \, t + x}{c \, t - x} \\ &= \frac{1 + \frac{x}{c \, t} }{1 - \frac{x}{c \, t} } \\ &= \frac{1 + \frac{v}{c} }{1 - \frac{v}{c} } \\ &= \exp \left( \log \left( \frac{1 + \frac{v}{c} }{1 - \frac{v}{c} } \right) \right) \\ &= \exp \left( 2 \log \sqrt{ \frac{1 + \frac{v}{c} }{1 - \frac{v}{c} } } \right) \\ &= \exp \left( 2 \, \eta \right), \end{align*} \] or \[ x^{+} = \exp \left( 2 \, \eta \right) \, x^{-}. \] In other words the exponentiated rapidity is the slope between the first light-cone coordinate and the second light-cone coordinate.
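
This relationship is also easy to verify numerically; a minimal sketch assuming \(c = 1\) and an arbitrary trajectory with \(|x / t| < 1\).

c_light <- 1
t <- 2; x <- 1.2 # Velocity v = x / t = 0.6

# Light-cone coordinates
x_plus  <- (c_light * t + x) / sqrt(2)
x_minus <- (c_light * t - x) / sqrt(2)

# Rapidity corresponding to the velocity v
eta <- atanh( (x / t) / c_light )

x_plus / x_minus # 4
exp(2 * eta)     # 4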

Plotting lines in these two spaces allows us to directly compare odds to exponentiated rapidities, logit probabilities to rapidities, and probabilities to velocities.




All of this is to say that if one deeply understands the logit and logistic functions then they will already have a pretty strong grasp on the theory of special relativity, even if only implicitly. Similarly a thorough understanding of special relativity is strong preparation for understanding logit probabilities. Alternatively this connection makes it clear that understanding logit probabilities is as subtle and complex as understanding special relativity, which may help set the proper expectations when approaching this material!

References

[1] McCullagh, P. and Nelder, J. (1989). Generalized linear models. Chapman & Hall/CRC.

[2] Berkson, J. (1944). Application of the logistic function to bio-assay. Journal of the American Statistical Association 39 357–65.

[3] Taylor, E. F. and Wheeler, J. A. (1965). Spacetime physics. W. H. Freeman and Company, San Francisco.

[4] Zwiebach, B. (2009). A first course in string theory. Cambridge University Press.

License

The code in this case study is copyrighted by Michael Betancourt and licensed under the new BSD (3-clause) license:

https://opensource.org/licenses/BSD-3-Clause

The text and figures in this case study are copyrighted by Michael Betancourt and licensed under the CC BY-NC 4.0 license:

https://creativecommons.org/licenses/by-nc/4.0/

Original Computing Environment

writeLines(readLines(file.path(Sys.getenv("HOME"), ".R/Makevars")))
CC=clang

CXXFLAGS=-O3 -mtune=native -march=native -Wno-unused-variable -Wno-unused-function -Wno-macro-redefined -Wno-unneeded-internal-declaration
CXX=clang++ -arch x86_64 -ftemplate-depth-256

CXX14FLAGS=-O3 -mtune=native -march=native -Wno-unused-variable -Wno-unused-function -Wno-macro-redefined -Wno-unneeded-internal-declaration -Wno-unknown-pragmas
CXX14=clang++ -arch x86_64 -ftemplate-depth-256
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rstan_2.19.3         ggplot2_3.3.1        StanHeaders_2.21.0-3
[4] colormap_0.1.4      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6       compiler_4.0.2     pillar_1.4.4       prettyunits_1.1.1 
 [5] tools_4.0.2        pkgbuild_1.0.8     digest_0.6.25      jsonlite_1.6.1    
 [9] evaluate_0.14      lifecycle_0.2.0    tibble_3.0.1       gtable_0.3.0      
[13] pkgconfig_2.0.3    rlang_0.4.6        cli_2.0.2          parallel_4.0.2    
[17] curl_4.3           yaml_2.2.1         xfun_0.16          loo_2.2.0         
[21] gridExtra_2.3      withr_2.2.0        stringr_1.4.0      dplyr_1.0.0       
[25] knitr_1.29         generics_0.0.2     vctrs_0.3.0        stats4_4.0.2      
[29] grid_4.0.2         tidyselect_1.1.0   inline_0.3.15      glue_1.4.1        
[33] R6_2.4.1           processx_3.4.2     fansi_0.4.1        rmarkdown_2.2     
[37] callr_3.4.3        purrr_0.3.4        magrittr_1.5       codetools_0.2-16  
[41] matrixStats_0.56.0 ps_1.3.3           scales_1.1.1       htmltools_0.4.0   
[45] ellipsis_0.3.1     assertthat_0.2.1   colorspace_1.4-1   V8_3.2.0          
[49] stringi_1.4.6      munsell_0.5.0      crayon_1.3.4