instituto de matemáticas universidad de sevilla
Antonio de Castro Brzezicki
Introduction to Bayesian optimization using batch acquisition functions
Some optimization problems do not follow the usual assumptions needed to apply classical optimization techniques. In this talk, we are going to assume the objective function is unknown (it does not have a closed-form) and that its evaluation is expensive (from a computational or cost point of view). In Bayesian optimization, the unknown function is assumed to be a realization of a stochastic process (in our case, a Gaussian process). This is the prior, which represents our belief about the space of possible functions. The optimization method consists on select points to be evaluated, combine the prior with these points and, obtain the posterior distribution updating the Gaussian process. An auxiliary function (called acquisition function) is design to efficiently select the points to be evaluated. Following this methodology, the original optimization problem is transformed into an iterative adaptive process where, the points to be observed are obtained by maximizing the acquisition function. This new problem can be solved easily applying standard optimization techniques. The final aim is to find as fast as possible the optimal value of the unknown function. Different acquisition functions appear in the literature considering a different trade-off between exploration (select points with high uncertainty) and exploitation (select points with high expected value). Applications, where the evaluation of several points can be done in parallel, are very common nowadays. In theses cases, a batch of points, instead of a single point, has to be selected at each step. We are going to focus on batch procedures based on the Upper Confidence Bound function.

For a brief bibliography on the topic see: [1, 2, 3, 4]

References [1] E. Brochu, V. M. Cora, and N. de Freitas. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning, 2010. [2] E. Contal. Méthodes d’apprentissage statistique pour l’optimisation globale. PhD thesis, Ecole doctorale de mathématiques Hadamard, 2016. [3] A. Krause, A. Singh, and C. Guestrin. Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 9:235–284, 2008. [4] C. E. Rasmussen and K. I. Williams. Gaussian Processes for Machine Learning. Massachusetts Institute of Technology, 2006.