PLS generalised linear regression

P. Bastien, V. Esposito Vinzi, M. Tenenhaus

Computational Statistics and Data Analysis

January 2005, vol. 48, no. 1, pp. 17-46

Departments: Economics and Decision Sciences

Keywords: Partial least squares, Stepwise regression, Variable selection, Modified PLS regression

Univariate PLS regression is a model linking a dependent variable y to a set X = {x1, …, xp} of numerical or categorical explanatory variables. It can be obtained as a series of simple and multiple regressions. By taking advantage of the statistical tests associated with linear regression, one can select the significant explanatory variables to include in the PLS regression and choose the number of PLS components to retain. The same principle can be used to extend PLS regression to PLS generalised linear regression. The modifications to classical PLS regression, the case of PLS logistic regression and the application of PLS generalised linear regression to survival data are studied in detail, and several examples show the proposed methods in practice.

Classical univariate PLS regression is the result of an iterated use of ordinary least squares (OLS), which is why PLS stands for partial least squares. PLS generalised linear regression retains the rationale of PLS, but the criterion optimised at each step is based on maximum likelihood. The acronym PLS is nevertheless kept as a reference to a general methodology for relating a response variable to a set of predictors. The proposed approach to PLS generalised linear regression is simple to implement and can easily be generalised to any model that is linear at the level of the explanatory variables.
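The central idea above is that each PLS weight comes from a one-predictor regression whose criterion can be switched from least squares to maximum likelihood. The following minimal pure-Python sketch illustrates this for the first component only. It assumes standardised predictors, a binary response that no predictor separates perfectly, and the normalisation w1 = a1 / ||a1||. The helper names (`standardize`, `ols_slope`, `logistic_fit`, `first_component`) and the toy data are hypothetical. The sketch omits the significance tests, the choice of the number of components and all later components described in the paper, so it should not be read as the authors' full algorithm.

```python
import math


def standardize(col):
    """Centre a column and scale it to unit sample standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]


def ols_slope(x, y):
    """Slope of the simple OLS regression of y on x (classical PLS1 criterion)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def logistic_fit(x, y, iters=100, tol=1e-12):
    """ML estimates (b0, b1) of P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))).

    Newton-Raphson on the log-likelihood; assumes x does not perfectly
    separate the two classes (otherwise the MLE does not exist).
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0  # score vector and information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r, v = yi - p, p * (1.0 - p)
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # Newton step: information^-1 times score
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def logistic_slope(x, y):
    """Slope of the univariate logistic regression (maximum likelihood criterion)."""
    return logistic_fit(x, y)[1]


def first_component(X_cols, y, slope_fn):
    """First PLS-style component built from p separate one-predictor fits.

    a_j = slope of y on standardised x_j under the chosen criterion,
    w = a / ||a||, t = sum_j w_j * z_j.
    """
    Z = [standardize(c) for c in X_cols]
    a = [slope_fn(z, y) for z in Z]
    norm = math.sqrt(sum(v * v for v in a))
    w = [v / norm for v in a]
    t = [sum(wj * z[i] for wj, z in zip(w, Z)) for i in range(len(y))]
    return w, t


# Hypothetical toy data: three predictors and a binary response (not separable).
X_cols = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [2, 1, 4, 3, 6, 5, 8, 7, 10, 9],
    [3, 7, 2, 8, 1, 9, 4, 6, 5, 0],
]
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

w_ols, t_ols = first_component(X_cols, y, ols_slope)       # classical PLS1 weights
w_log, t_log = first_component(X_cols, y, logistic_slope)  # ML (logistic) weights
b0, b1 = logistic_fit(t_log, y)  # final logistic regression of y on t_1
print("PLS1 weights:    ", [round(v, 3) for v in w_ols])
print("logistic weights:", [round(v, 3) for v in w_log])
print("y ~ t1 logistic: ", round(b0, 3), round(b1, 3))
```

With `ols_slope`, the slope of y on a standardised x_j equals the sample covariance cov(x_j, y). The normalised vector therefore coincides with the usual PLS1 first weight vector. Swapping in the logistic fit changes only the per-predictor criterion, which is the sense in which the generalised version "retains the rationale of PLS". Further components would also have to account for the components already extracted, and this sketch does not attempt that.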