失效链接处理 |
An Introduction to Statistical Learning PDF 下载
本站整理下载:
提取码:ywum
相关截图:
主要内容:
An Overview of Statistical Learning
Statistical learning refers to a vast set of tools for understanding data. These
tools can be classified as supervised or unsupervised. Broadly speaking,
supervised statistical learning involves building a statistical model for predicting, or estimating, an output based on one or more inputs. Problems of
this nature occur in fields as diverse as business, medicine, astrophysics, and
public policy. With unsupervised statistical learning, there are inputs but
no supervising output; nevertheless we can learn relationships and structure from such data. To provide an illustration of some applications of
statistical learning, we briefly discuss three real-world data sets that are
considered in this book.
Wage Data
In this application (which we refer to as the Wage data set throughout this
book), we examine a number of factors that relate to wages for a group of
males from the Atlantic region of the United States. In particular, we wish
to understand the association between an employee’s age and education, as
well as the calendar year, on his wage. Consider, for example, the left-hand
panel of Figure 1.1, which displays wage versus age for each of the individuals in the data set. There is evidence that wage increases with age but then
decreases again after approximately age 60. The blue line, which provides
an estimate of the average wage for a given age, makes this trend clearer.
G. James et al., An Introduction to Statistical Learning: with Applications in R,
Springer Texts in Statistics, DOI 10.1007/978-1-4614-7138-7 1,
© Springer Science+Business Media New York 2013
1
2 1. Introduction
Age Year
20 40 60 80 2003 2006 2009 12345
Education Level
FIGURE 1.1. Wage data, which contains income survey information for males
from the central Atlantic region of the United States. Left: wage as a function of
age. On average, wage increases with age until about 60 years of age, at which
point it begins to decline. Center: wage as a function of year. There is a slow
but steady increase of approximately $10,000 in the average wage between 2003
and 2009. Right: Boxplots displaying wage as a function of education, with 1
indicating the lowest level (no high school diploma) and 5 the highest level (an
advanced graduate degree). On average, wage increases with the level of education.
Given an employee’s age, we can use this curve to predict his wage. However,
it is also clear from Figure 1.1 that there is a significant amount of variability associated with this average value, and so age alone is unlikely to
provide an accurate prediction of a particular man’s wage.
We also have information regarding each employee’s education level and
the year in which the wage was earned. The center and right-hand panels of
Figure 1.1, which display wage as a function of both year and education, indicate that both of these factors are associated with wage. Wages increase
by approximately $10,000, in a roughly linear (or straight-line) fashion,
between 2003 and 2009, though this rise is very slight relative to the variability in the data. Wages are also typically greater for individuals with
higher education levels: men with the lowest education level (1) tend to
have substantially lower wages than those with the highest education level
(5). Clearly, the most accurate prediction of a given man’s wage will be
obtained by combining his age, his education, and the year. In Chapter 3,
we discuss linear regression, which can be used to predict wage from this
data set. Ideally, we should predict wage in a way that accounts for the
non-linear relationship between wage and age. In Chapter 7, we discuss a
class of approaches for addressing this problem.
|