Bayesian optimization is an effective methodology for the global optimization of functions with expensive evaluations. It relies on querying a distribution over functions defined by a relatively cheap surrogate model. An accurate model for this distribution over functions is critical to the effectiveness of the approach, and is typically fit using Gaussian processes (GPs). However, since GPs scale cubically with the number of observations, it has been challenging to handle objectives whose optimization requires many evaluations, and, as such, to massively parallelize the optimization.

In this work, we explore the use of neural networks as an alternative to GPs to model distributions over functions. We show that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically. This allows us to achieve a previously intractable degree of parallelism, which we apply to large-scale hyperparameter optimization, rapidly finding competitive models on benchmark object recognition tasks using convolutional networks, and on image caption generation using neural language models.
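The linear-in-N scaling of adaptive basis function regression can be illustrated with a minimal sketch: Bayesian linear regression over features produced by a neural-network hidden layer. In the paper's method the network is trained on the observations; here, purely for illustration, we use fixed random tanh features. All names and hyperparameter values below (`phi`, `alpha`, `beta`, the layer sizes) are our own assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(X, W, b):
    # Hidden-layer basis functions: tanh(X W + b).
    # (Illustrative stand-in for a trained network's last hidden layer.)
    return np.tanh(X @ W + b)

# Toy 1-D objective with observation noise.
N, D, H = 50, 1, 20            # observations, input dim, hidden units
X = rng.uniform(-3, 3, size=(N, D))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(N)

W = rng.standard_normal((D, H))
b = rng.standard_normal(H)
Phi = phi(X, W, b)             # N x H design matrix

alpha, beta = 1.0, 100.0       # assumed prior precision, noise precision
# Posterior over the output-layer weights: forming Phi.T @ Phi costs
# O(N H^2), solving costs O(H^3) -- linear in the number of
# observations N, unlike the O(N^3) cost of an exact GP.
A = alpha * np.eye(H) + beta * Phi.T @ Phi
m = beta * np.linalg.solve(A, Phi.T @ y)

# Predictive mean and variance at new candidate points, as an
# acquisition function would require.
Xs = np.linspace(-3, 3, 5).reshape(-1, 1)
Ps = phi(Xs, W, b)
mean = Ps @ m
var = 1.0 / beta + np.einsum('ij,ji->i', Ps, np.linalg.solve(A, Ps.T))
```

The predictive variance keeps the uncertainty estimates that Bayesian optimization needs, while the expensive nonlinearity lives only in the (amortized) feature map.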