Taylor approximation is a powerful and general strategy for modeling the behavior of a function within a local neighborhood of inputs. The utility of this strategy, however, can be limited when the output of the target function is constrained to some subset of values. In this case study we'll see how Taylor approximations can be combined with transformations from the constrained output space to an unconstrained output space and back to robustly model the local behavior of constrained functions.

We will begin by examining some of the limitations of directly Taylor approximating constrained functions before demonstrating how these limitations can largely be avoided by removing the constraint before constructing a Taylor approximation and then incorporating it back afterwards. Next we will discuss two common constraints in detail, implementing those insights with explicit examples, before finishing with a short discussion of multi-dimensional function approximation.

1 Approximating Functions With Constrained Outputs

1.1 Direct Taylor Approximations

I present Taylor approximation theory in great depth, perhaps even too much depth, in Section 1 of my Taylor modeling case study. To summarize: a Taylor approximation captures the behavior of a real-valued function \(f : X \rightarrow \mathbb{R}\) in a local neighborhood \(U\) around some baseline input \(x_{0}\), \[ f_{I}(x; x_{0}) \approx f(x) \] for \(x \in U\).

More precisely a Taylor approximation uses the differential structure of \(f\) at \(x_{0}\) to inform an \(I\)th-order polynomial function that approximates the exact functional behavior within this local neighborhood. A good approximation can be engineered by building a sufficiently high-order polynomial or restricting the local neighborhood to a narrow interval of inputs around \(x_{0}\).
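As a concrete illustration, here is a minimal sketch of a first-order Taylor approximation; the target function \(f(x) = \exp(-x)\) and the baseline \(x_{0} = 0\) are hypothetical choices made only for this example.

```python
import math

# Hypothetical target function and its first derivative.
def f(x):
    return math.exp(-x)

def df(x):
    return -math.exp(-x)

def taylor1(x, x0):
    # First-order Taylor approximation,
    # f_1(x; x0) = f(x0) + f'(x0) * (x - x0).
    return f(x0) + df(x0) * (x - x0)

# Close to the baseline the approximation error is small.
print(abs(f(0.1) - taylor1(0.1, 0.0)))   # ~0.005
```

Narrowing the neighborhood around \(x_{0}\), or raising the polynomial order, shrinks the approximation error, in line with the discussion above.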

Because of this local context Taylor approximations can theoretically be applied to any real-valued function, including those whose outputs are confined to some subset of the real line, \(f : X \rightarrow V \subset \mathbb{R}\).

While the polynomial functions that form a Taylor approximation have unconstrained outputs, the outputs within a small enough local neighborhood will still satisfy the given constraint.

If the neighborhood is too large, however, then evaluations of the Taylor approximation at some inputs will return outputs that violate the constraint.
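To make this concrete, consider the strictly positive function \(f(x) = \exp(-x)\), a hypothetical example: its first-order Taylor approximation around \(x_{0} = 0\) is \(1 - x\), which goes negative as soon as \(x > 1\).

```python
import math

def f(x):
    return math.exp(-x)          # strictly positive for every real x

def taylor1(x, x0):
    # First-order Taylor approximation of f around x0.
    return math.exp(-x0) - math.exp(-x0) * (x - x0)

# Within a narrow neighborhood the output respects positivity...
print(taylor1(0.5, 0.0))   # 0.5
# ...but far enough from the baseline it violates the constraint.
print(taylor1(1.5, 0.0))   # -0.5
```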

When the baseline \(x_{0}\) is close to the constraint boundary the constraint can be violated even when the absolute approximation error is small.

In other words, enforcing compatibility between a specific Taylor approximation \(f_{I}(x; x_{0})\) and an output constraint restricts the geometry of the local neighborhoods. If we know the local differential structure of \(f\) then we may be able to explicitly work out which inputs will lead to constraint-violating Taylor approximation outputs and then craft appropriate local neighborhoods. When we have to infer that local differential structure, however, establishing local neighborhoods that avoid constraint violations becomes much more difficult.

Output constraints also have a strong influence on the local error of Taylor approximations. Smooth constrained functions need to be highly nonlinear near a constraint boundary in order to contort their outputs away from the boundary and avoid violating the constraint.

Because of this nonlinearity Taylor approximations will behave very differently depending on whether the baseline input \(x_{0}\) is close to a constraint boundary or far away from it. Away from the boundary constrained functions tend to be more linear, which makes Taylor approximations of a fixed order \(I\) more accurate and allows them to be employed over wider local neighborhoods. Near a constraint boundary, however, constrained functions tend to exhibit stronger nonlinearities, which introduce large approximation errors that force smaller local neighborhoods.
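A hypothetical sketch of this effect, using the logistic function \(f(x) = 1 / (1 + e^{-x})\) with outputs constrained to \((0, 1)\): over input neighborhoods of the same width, a first-order approximation incurs a larger worst-case error when the baseline sits near a boundary than when it sits in the well-behaved center.

```python
import math

def f(x):
    # Logistic function: outputs constrained to (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def df(x):
    return f(x) * (1.0 - f(x))

def max_error(x0, half_width, n=101):
    # Worst first-order approximation error over [x0 - w, x0 + w],
    # scanned on a uniform grid.
    errs = []
    for i in range(n):
        x = x0 - half_width + 2.0 * half_width * i / (n - 1)
        approx = f(x0) + df(x0) * (x - x0)
        errs.append(abs(f(x) - approx))
    return max(errs)

err_center = max_error(0.0, 1.0)    # baseline far from the boundaries
err_boundary = max_error(3.0, 1.0)  # baseline near the upper boundary
print(err_boundary > err_center)    # True
```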

Alternatively, if we need to fix the local neighborhood then this varying curvature will require carefully tuning the order of the Taylor approximation for each baseline \(x_{0}\).

This strong sensitivity to the baseline input substantially complicates the implementation of Taylor approximations for constrained functions. Unless we need to evaluate the constrained function only far from the constraint boundaries, direct Taylor approximation will be at best a fragile approach to modeling the exact functional behavior.

1.2 General Taylor Approximations

If Taylor approximating constrained functions is so difficult then why don't we just eliminate the constraint before building a Taylor approximation in the first place?

Consider a one-to-one function that maps the constrained output space to the entire real line, \[ g : V \rightarrow \mathbb{R}. \] The function \(g\) is referred to as a link function because it "links" the nominal constrained output to an unconstrained output.
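For example, two standard link functions, sketched here under the assumed constraints \(V = (0, \infty)\) and \(V = (0, 1)\), are the logarithm and the logit.

```python
import math

# Positivity constraint V = (0, inf): g = log, with inverse g^{-1} = exp.
def log_link(v):
    return math.log(v)

def log_link_inv(w):
    return math.exp(w)

# Interval constraint V = (0, 1): g = logit, with inverse the logistic.
def logit_link(v):
    return math.log(v / (1.0 - v))

def logit_link_inv(w):
    return 1.0 / (1.0 + math.exp(-w))

# Because each link is one-to-one, the round trip recovers the input.
print(abs(log_link_inv(log_link(2.5)) - 2.5) < 1e-12)     # True
print(abs(logit_link_inv(logit_link(0.3)) - 0.3) < 1e-12)  # True
```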

Composing the link function with the constrained function of interest defines a completely unconstrained function, \[ \begin{alignat*}{6} g \circ f :\; &X& &\rightarrow& \; &\mathbb{R}& \\ &x& &\mapsto& &g(f(x))&. \end{alignat*} \] Without any output constraints \(g \circ f\) should be much easier to approximate with a Taylor approximation, \[ g \circ f \approx (g \circ f)_{I}(x; x_{0}). \]

That said, our model depends on the function \(f\), not the composed function \(g \circ f\). In order to incorporate this Taylor approximation into our model we need to undo the action of the link function. Mathematically we achieve this by applying the inverse link function, \[ \begin{alignat*}{6} g^{-1} :\; &\mathbb{R}& &\rightarrow& \; &V& \\ &w& &\mapsto& &g^{-1}(w)&, \end{alignat*} \] to the unconstrained composition, \[ \begin{align*} f &= \mathbb{I} \circ f \\ &= (g^{-1} \circ g) \circ f \\ &= g^{-1} \circ (g \circ f). \end{align*} \] Because we required the link function to be one-to-one and onto the real line, this inverse function will always be well-defined.

Substituting our Taylor approximation for the unconstrained composition \(g \circ f\) then gives a general Taylor approximation \[ \begin{align*} f &= g^{-1} \circ (g \circ f) \\ &\approx g^{-1} \circ (g \circ f)_{I}, \end{align*} \] or for a given input, \[ f(x) \approx g^{-1} \left( (g \circ f)_{I}(x; x_{0}) \right). \] This construction results in a local functional model that always respects the output constraint regardless of the chosen input neighborhood. Moreover, because the link function is one-to-one we don't lose any information going to the unconstrained space and back.
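A minimal sketch of this construction, again using the hypothetical positive target \(f(x) = \exp(-x)\) with the log link: the composition \(g \circ f\) is Taylor approximated and the result is pushed back through \(g^{-1} = \exp\), so every output is positive by construction.

```python
import math

def f(x):
    return math.exp(-x)           # hypothetical positive target

def g(v):
    return math.log(v)            # log link for V = (0, inf)

def g_inv(w):
    return math.exp(w)

def composed_taylor1(x, x0, eps=1e-6):
    # First-order Taylor approximation of g o f around x0, with the
    # derivative estimated by a central finite difference.
    gf0 = g(f(x0))
    dgf0 = (g(f(x0 + eps)) - g(f(x0 - eps))) / (2.0 * eps)
    return gf0 + dgf0 * (x - x0)

def general_taylor1(x, x0):
    # General Taylor approximation: f(x) ~ g^{-1}((g o f)_1(x; x0)).
    return g_inv(composed_taylor1(x, x0))

# Unlike the direct approximation, the output is positive for any input.
print(general_taylor1(1.5, 0.0) > 0)   # True
```

For this particular choice of \(f\) the composition \(\log \circ f\) is exactly linear, so the first-order general Taylor approximation reproduces \(f\) essentially exactly; for a generic positive \(f\) it would only be a local approximation.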

A well-chosen link function can also have the added benefit of warping the constrained function, allowing us to better resolve all of the rapidly changing behavior near the output boundaries. When the composite function \(g \circ f\) exhibits more uniform curvature the Taylor approximation \((g \circ f)_{I}(x; x_{0})\) will be much less sensitive to the choice of input baseline and hence much easier to wield in practice.

If we need to model functional behavior in only a small neighborhood of inputs, with output values far away from the constraint boundaries, then a Taylor approximation model can be directly applicable. When we need to consider wider ranges of input values, or perhaps more realistically when we don't know what range of input values we might need to consider, then a general Taylor approximation becomes a more robust tool.

Mechanically, the warping of any polynomial function, with or without the Taylor approximation interpretation, through an inverse link function is often referred to as general linear modeling or generalized linear modeling [1]. The same construction with piece-wise polynomial models is also sometimes referred to as general(ized) additive modeling. That said, the use of "linear" and "additive" in this terminology can be confusing for the same reasons discussed in Section 2.3.3 of the Taylor modeling case study.