Mixed Procedures For Stochastic Optimization - WODES 2016
This is part of ongoing work with Professor Felisa Vázquez-Abad of Hunter College. This is the preliminary draft submitted for the WODES 2016 conference. The particularly tricky bit this time around was our decision to use a less trivial example scenario: instead of just a deterministic function plus a normally distributed noise term, we chose to construct a more complicated stochastic function. Crucially, since we wanted to speak to the usefulness of gradient estimation, we wanted a function where IPA derivative estimation could be used.
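Put another way, let $X(\theta)$ be some random variable controlled by the parameter $\theta$ and let $h$ be some real-valued measurable mapping where $\mathbb{E}[h(X(\theta))]$ is defined for any $\theta$. How do we define $\frac{d}{d\theta} h(X(\theta))$?
One way to think about it is by a kind of stretchy analogy with weak derivatives: we are looking for a measurable mapping $\frac{d}{d\theta} h(X(\theta))$ such that $\frac{d}{d\theta} \mathbb{E}[h(X(\theta))] = \mathbb{E}\left[\frac{d}{d\theta} h(X(\theta))\right]$. Put this way, if we know that the random variable $X(\theta)$ is differentiable in $\theta$ and that swapping expectation and differentiation is permitted (e.g. through some bounded convergence theorem-like result), then we have an expression for $\frac{d}{d\theta} h(X(\theta))$; indeed:
$$\frac{d}{d\theta} h(X(\theta)) = h'(X(\theta)) \, X'(\theta)$$
where $h'$ is just the usual derivative of $h$. The quantity $X'(\theta)$ is called the sample path derivative of $X(\theta)$, and this approach for estimating the derivative of an expected value is called infinitesimal perturbation analysis (IPA).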
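As a minimal sketch of how an IPA estimator works in practice, assume $X(\theta) = \theta E$ with $E$ a unit-rate exponential random variable, and $h(x) = x^2$. Both choices, along with the function name `ipa_derivative_estimate`, are illustrative assumptions and not the example used in the paper. Here the sample path derivative is $X'(\theta) = E$, so each sample contributes $h'(X(\theta)) \, X'(\theta) = 2 X(\theta) E$, and the exact answer is available for comparison because $\mathbb{E}[h(X(\theta))] = 2\theta^2$.

```python
import random


def ipa_derivative_estimate(theta, n_samples=100_000, seed=0):
    """Estimate d/dtheta E[h(X(theta))] by averaging h'(X) * X'(theta).

    Assumed toy model (not from the paper): X(theta) = theta * E with
    E ~ Exp(1), and h(x) = x**2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        e = rng.expovariate(1.0)  # underlying noise, independent of theta
        x = theta * e             # X(theta) = theta * E
        dx_dtheta = e             # sample path derivative X'(theta)
        dh_dx = 2.0 * x           # h'(x) for h(x) = x**2
        total += dh_dx * dx_dtheta
    return total / n_samples


theta = 2.0
estimate = ipa_derivative_estimate(theta)
exact = 4.0 * theta  # derivative of E[h(X(theta))] = 2 * theta**2
print(f"IPA estimate: {estimate:.3f}, exact: {exact:.3f}")
```

The point of the sketch is that the derivative is estimated from the same samples used to simulate $X(\theta)$, with no second simulation at a perturbed parameter value.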
In the paper, we needed a function that not only supported IPA but also had relatively small curvature, so that the non-gradient method we were discussing would converge in reasonable time.