A Statistical View of Some Chemometrics Regression Tools
Ildiko E. Frank and Jerome H. Friedman
Statistical methods have been applied to many types of chemical problems for some time. For example, experimental design techniques have had a strong impact on understanding and improving industrial chemical processes. Recently the field of chemometrics has emerged with a focus on analyzing observational data originating mostly from organic and analytical chemistry, food research, and environmental studies. These data tend to be characterized by many measured variables on each of a few observations. Often the number of such variables p greatly exceeds the observation count N. There is generally a high degree of collinearity among the variables, which are often (but not always) digitizations of analog signals.
Many of the tools employed by chemometricians are the same as those used in other fields that produce and analyze observational data and are more or less well known to statisticians. These tools include data exploration through principal components and cluster analysis, as well as modern computer graphics. Predictive modeling (regression and classification) is also an important goal in most applications. In this area, however, chemometricians have invented their own techniques based on heuristic reasoning and intuitive ideas, and there is a growing body of empirical evidence that they perform well in many situations. The most popular regression method in chemometrics is partial least squares (PLS) (H. Wold 1975) and, to a somewhat lesser extent, principal components regression (PCR) (Massy 1965). Although PLS is heavily promoted (and used) by chemometricians, it is largely unknown to statisticians. PCR is known to, but seldom recommended by, statisticians. [The Journal of Chemometrics (John Wiley) and Chemometrics and Intelligent Laboratory Systems (Elsevier) contain many articles on regression applications to chemical problems using PCR and PLS. See also Martens and Naes (1989).]
The original ideas motivating PLS and PCR were entirely heuristic, and their statistical properties remain largely a mystery. There has been some recent progress with respect to PLS (Helland 1988; Lorber, Wangen, and Kowalski 1987; Phatak, Reilly, and Penlidis 1991; Stone and Brooks 1990). The purpose of this article is to view these procedures from a statistical perspective, attempting to gain some insight as to when and why they can be expected to work well. In situations for which they do perform well, they are compared to standard statistical methodology intended for those situations. These include ordinary least squares (OLS) regression, variable subset selection (VSS) methods, and ridge regression (RR) (Hoerl and Kennard 1970). The goal is to bring all of these methods together into a common framework to attempt to shed some light on their similarities and differences. The characteristics of PLS in particular have so far eluded theoretical understanding. This has led to unsubstantiated claims concerning its performance relative to other regression procedures, such as that it makes fewer assumptions concerning the nature of the data. Simply not understanding the nature of the assumptions being made does not mean that they do not exist.
Space limitations force us to limit our discussion here to methods that so far have seen the most use in practice. There are many other suggested approaches [e.g., latent root regression (Hawkins 1973; Webster, Gunst, and Mason 1974), intermediate least squares (Frank 1987), James-Stein shrinkage (James and Stein 1961), and various Bayes and empirical Bayes methods] that, although potentially promising, have not yet seen wide application.
1.1 Summary Conclusions
RR, PCR, and PLS are seen in Section 3 to operate in a similar fashion. Their principal goal is to shrink the solution coefficient vector away from the OLS solution toward directions in the predictor-variable space of larger sample spread. Section 3.1 provides a Bayesian motivation for this under a prior distribution that provides no information concerning the direction of the true coefficient vector: all directions are equally likely to be encountered. Shrinkage away from low spread directions is seen to control the variance of the estimate. Section 3.2 examines the relative shrinkage structure of these three methods in detail. PCR and PLS are seen to shrink more heavily away from the low spread directions than RR, which provides the optimal shrinkage (among linear estimators) for an equidirection prior. Thus PCR and PLS make the assumption that the truth is likely to have particular preferential alignments with the high spread directions of the predictor-variable (sample) distribution. A somewhat surprising result is that PLS (in addition) places increased probability mass on the true coefficient vector aligning with the Kth principal component direction, where K is the number of PLS components used, in fact expanding the OLS solution in that direction. The solutions and hence the performance of RR, PCR, and PLS tend to be quite similar in most situations, largely because they are applied to problems involving high collinearity in which variance tends to dominate the bias, especially in the directions of small predictor spread, causing all three methods to shrink heavily along those directions. In the presence of more symmetric designs, larger differences between them might well emerge.
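The contrast between RR's smooth shrinkage and PCR's all-or-nothing truncation along the principal component directions can be sketched numerically. In this hypothetical illustration (the design, the ridge parameter lam, and the cutoff K are all arbitrary choices, not values from the article), the component of the OLS solution along eigendirection j is multiplied by e_j/(e_j + lam) under RR, and by 1 or 0 under PCR:

```python
import numpy as np

rng = np.random.default_rng(0)

# Collinear design: p = 5 predictors sharing a common latent factor.
n, p = 50, 5
base = rng.normal(size=(n, 1))
X = base + 0.3 * rng.normal(size=(n, p))
X -= X.mean(axis=0)

# Eigenvalues of the sample covariance give the squared spreads e_j,
# sorted here from largest to smallest.
e = np.linalg.eigvalsh(X.T @ X / n)[::-1]

lam = 0.5                                      # illustrative ridge parameter
ridge_shrink = e / (e + lam)                   # RR: smooth factors in (0, 1)
K = 2
pcr_shrink = (np.arange(p) < K).astype(float)  # PCR: keep top K, drop the rest

print("eigenvalues:      ", np.round(e, 3))
print("ridge factors:    ", np.round(ridge_shrink, 3))
print("PCR factors (K=2):", pcr_shrink)
```

Both rules shrink hardest along the low spread (small e_j) directions, which is the shared behavior the text describes; RR does so gradually, PCR by truncation.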
The most popular method of regression regularization used in statistics, VSS, is seen in Section 4 to make quite different assumptions. It is shown to correspond to a limiting case of a Bayesian procedure in which the prior probability distribution places all mass on the original predictor variable (coordinate) axes. This leads to the assumption that the response is likely to be influenced by a few of the predictor variables but leaves unspecified which ones. It will therefore tend to work best in situations characterized by true coefficient vectors with components consisting of a very few (relatively) large (absolute) values.
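A minimal sketch of VSS in its exhaustive (best-subset) form, under exactly the sparse-truth assumption the text describes; the simulated data, the subset size k, and the two active predictors are hypothetical choices for illustration only:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, p = 60, 6
X = rng.normal(size=(n, p))

# Sparse truth: only predictors 0 and 3 influence the response.
beta = np.zeros(p)
beta[[0, 3]] = [2.0, -1.5]
y = X @ beta + rng.normal(size=n)

def rss(cols):
    """Residual sum of squares of the OLS fit on a subset of columns."""
    Xs = X[:, cols]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    r = y - Xs @ coef
    return r @ r

# Exhaustive search over all subsets of size k = 2.
best = min(combinations(range(p), 2), key=rss)
print("selected subset:", best)  # recovers the two active predictors
```

With a strong sparse signal like this, the search recovers the active coordinate axes, matching the regime in which the text says VSS works best.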
Section 5 presents a simulation study comparing the performance of OLS, RR, PCR, PLS, and VSS in a variety of situations. In all of the situations studied, RR dominated the other methods, closely followed by PLS and PCR, in that order. VSS provided distinctly inferior performance to these but still considerably better than OLS, which usually performed quite badly.
Section 6 examines multiple-response regression, investigating the circumstances under which considering all of the responses together as a group might lead to better performance than a sequence of separate regressions of each response individually on the predictors. Two-block multiresponse PLS is analyzed. It is seen to bias the solution coefficient vectors away from low spread directions in the predictor-variable space (as would a sequence of separate PLS regressions) but also toward directions in the predictor space that preferentially predict the high spread directions in the response-variable space. An (empirical) Bayesian motivation for this behavior is developed by considering a joint prior on all of the (true) coefficient vectors that provides information on the degree of similarity of the dependence of the responses on the predictors (through the response correlation structure) but no information as to the particular nature of those dependences. This leads to a multiple-response analog of RR that exhibits similar behavior to that of two-block PLS. The two procedures are compared in a small simulation study in which multiresponse ridge slightly outperformed two-block PLS. Surprisingly, however, neither did dramatically better than the corresponding uniresponse procedures applied separately to the individual responses, even though the situations were designed to be most favorable to the multiresponse methods.
Section 7 discusses the invariance properties of these regression procedures. Only OLS is equivariant under all nonsingular affine (linear: rotation and/or scaling) transformations of the variable axes. RR, PCR, and PLS are equivariant under rotation but not scaling. VSS is equivariant under scaling but not rotation. These properties are seen to follow from the nature of the (informal) priors and loss structures associated with the respective procedures.
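The rotation-equivariance of RR, and its failure under rescaling, can be checked directly on simulated data. The design, the ridge parameter, and the scaling matrix below are arbitrary illustrations; the check is that the fitted values are unchanged when the predictor axes are rotated, but not when one axis is rescaled:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, lam = 40, 4, 1.0
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

def ridge_fitted(X, y, lam):
    """Fitted values of the ridge solution (X'X + lam I)^{-1} X'y."""
    b = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return X @ b

fit0 = ridge_fitted(X, y, lam)

# Orthogonal rotation of the predictor axes: the fit is unchanged.
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))
fit_rot = ridge_fitted(X @ Q, y, lam)
print("rotation-equivariant:", np.allclose(fit0, fit_rot))

# Rescaling one predictor: the fit changes, so RR is not scale-equivariant.
D = np.diag([10.0, 1.0, 1.0, 1.0])
fit_scl = ridge_fitted(X @ D, y, lam)
print("scale-equivariant:   ", np.allclose(fit0, fit_scl))
```

The rotation case follows algebraically: replacing X by XQ turns the coefficient vector into Q'b while leaving the fitted values XQQ'b = Xb intact, whereas a diagonal rescaling does not commute with the lam I penalty term.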
Finally, Section 8 provides a short discussion of
interpretability issues.