Stochastic Gradient Methods for Principled Estimation with Large Datasets

doi:10.1201/b19567-23

ABSTRACT

Parameter estimation by optimization of an objective function, as in maximum likelihood and maximum a posteriori estimation, is a fundamental idea in statistics and machine learning (Fisher, 1922; Lehmann and Casella, 2003; Hastie et al., 2011). However, widely used optimization-based estimation algorithms, such as Fisher scoring, the Expectation-Maximization (EM) algorithm, and iteratively reweighted least squares (Fisher, 1925; Dempster et al., 1977; Green, 1984), do not scale to modern datasets with hundreds of millions of data points and hundreds or thousands of covariates (National Research Council, 2013).