Machine Learning: A Probabilistic Perspective PDF Download


Date: 2020-09-10 10:33  Source: http://www.java1234.com  Author: 小锋

 
Main content:

1.2 Supervised learning
We begin our investigation of machine learning by discussing supervised learning, which is the
form of ML most widely used in practice.
1.2.1 Classification
In this section, we discuss classification. Here the goal is to learn a mapping from inputs x
to outputs y, where y ∈ {1,...,C}, with C being the number of classes. If C = 2, this is
called binary classification (in which case we often assume y ∈ {0, 1}); if C > 2, this is called
multiclass classification. If the class labels are not mutually exclusive (e.g., somebody may be
classified as tall and strong), we call it multi-label classification, but this is best viewed as
predicting multiple related binary class labels (a so-called multiple output model). When we
use the term “classification”, we will mean multiclass classification with a single output, unless
we state otherwise.
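As a minimal illustration of these label conventions (with made-up data, not taken from the book), the three settings can be encoded as follows:

```python
import numpy as np

# Binary classification: y in {0, 1}, one label per example.
y_binary = np.array([0, 1, 1, 0])

# Multiclass classification: y in {1, ..., C} with C = 3, still one label per example.
y_multiclass = np.array([1, 3, 2, 2])

# Multi-label classification: each example gets a vector of C related binary labels
# (a "multiple output" model), e.g. columns = (tall, strong).
Y_multilabel = np.array([[1, 0],
                         [1, 1],
                         [0, 0],
                         [0, 1]])
```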
One way to formalize the problem is as function approximation. We assume y = f(x) for
some unknown function f, and the goal of learning is to estimate the function f given a labeled
training set, and then to make predictions using ŷ = f̂(x). (We use the hat symbol to denote
an estimate.) Our main goal is to make predictions on novel inputs, meaning ones that we have
not seen before (this is called generalization), since predicting the response on the training set
is easy (we can just look up the answer).
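A minimal sketch of this fit-then-predict workflow, using scikit-learn's logistic regression purely as a stand-in for f̂ (the book has not committed to any particular model here, and the data values are invented):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Labeled training set: N = 6 examples, D = 2 features.
X_train = np.array([[0.1, 1.2], [0.8, 0.9], [1.5, 0.3],
                    [2.0, 2.1], [2.8, 1.7], [3.1, 2.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Estimate f-hat from the labeled training data.
f_hat = LogisticRegression().fit(X_train, y_train)

# Generalization: predict y-hat = f-hat(x) on a novel input not seen in training.
x_new = np.array([[2.9, 2.0]])
y_hat = f_hat.predict(x_new)
print(y_hat)  # e.g. array([1])
```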
1.2.1.1 Example
As a simple toy example of classification, consider the problem illustrated in Figure 1.1(a). We
have two classes of object which correspond to labels 0 and 1. The inputs are colored shapes.
These have been described by a set of D features or attributes, which are stored in an N × D
design matrix X, shown in Figure 1.1(b). The input features x can be discrete, continuous or a
combination of the two. In addition to the inputs, we have a vector of training labels y.
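A small sketch of how such an N × D design matrix and label vector might be laid out in code (the feature values are invented for illustration, not read off Figure 1.1):

```python
import numpy as np

# N = 4 training examples, D = 3 features per example.
# Columns may mix discrete and continuous attributes,
# e.g. (color code, shape code, size).
X = np.array([[0, 1, 2.3],
              [1, 0, 1.1],
              [0, 2, 0.7],
              [1, 1, 3.0]])   # design matrix, shape (N, D)

y = np.array([1, 0, 1, 0])    # vector of training labels, length N
print(X.shape)  # (4, 3)
```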
In Figure 1.1, the test cases are a blue crescent, a yellow circle and a blue arrow. None of
these have been seen before. Thus we are required to generalize beyond the training set. A
reasonable guess is that the blue crescent should be y = 1, since all blue shapes are labeled 1 in the
training set. The yellow circle is harder to classify, since some yellow things are labeled y = 1
and some are labeled y = 0, and some circles are labeled y = 1 and some y = 0. Consequently
it is not clear what the right label should be in the case of the yellow circle. Similarly, the correct
label for the blue arrow is unclear.
1.2.1.2 The need for probabilistic predictions
To handle ambiguous cases, such as the yellow circle above, it is desirable to return a probability.
The reader is assumed to already have some familiarity with basic concepts in probability. If
not, please consult Chapter 2 for a refresher.
We will denote the probability distribution over possible labels, given the input vector x and
training set D by p(y|x, D). In general, this represents a vector of length C. (If there are just two
classes, it is sufficient to return the single number p(y = 1|x, D), since p(y = 1|x, D) + p(y = 0|x, D) = 1.) In our notation, we make explicit that the probability is conditional on the test
input x, as well as the training set D, by putting these terms on the right hand side of the
conditioning bar |. We are also implicitly conditioning on the form of model that we use to make
predictions. When choosing between different models, we will make this assumption explicit by
writing p(y|x, D, M), where M denotes the model. However, if the model is clear from context,
we will drop M from our notation for brevity.
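As a sketch of what such a probabilistic output looks like in practice (again using scikit-learn's logistic regression only as a stand-in for some model M, with made-up data), `predict_proba` returns the length-C vector p(y|x, D):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 3-class training set (invented values).
X_train = np.array([[0.2, 0.1], [0.4, 0.3], [2.0, 1.9],
                    [2.2, 2.1], [4.0, 0.2], [4.1, 0.4]])
y_train = np.array([0, 0, 1, 1, 2, 2])

model = LogisticRegression().fit(X_train, y_train)

# p(y | x, D): a vector of length C = 3 that sums to 1.
x_test = np.array([[2.1, 1.0]])
probs = model.predict_proba(x_test)[0]
print(probs, probs.sum())  # e.g. [0.2 0.5 0.3] 1.0
```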
Given a probabilistic output, we can always compute our “best guess” as to the “true label”
using
ŷ = f̂(x) = argmax_{c ∈ {1,...,C}} p(y = c|x, D)    (1.1)
This corresponds to the most probable class label, and is called the mode of the distribution
p(y|x, D); it is also known as a MAP estimate (MAP stands for maximum a posteriori). Using
the most probable label makes intuitive sense, but we will give a more formal justification for
this procedure in Section 5.7.
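In code, Equation (1.1) is simply an argmax over the probability vector; a minimal sketch with a hypothetical distribution:

```python
import numpy as np

# Hypothetical p(y = c | x, D) over C = 3 classes for one test input.
probs = np.array([0.2, 0.5, 0.3])

# MAP estimate: the most probable class label (the mode of the distribution).
y_hat = int(np.argmax(probs))   # class index, here 1
confidence = probs[y_hat]       # p(y = y_hat | x, D), here 0.5
print(y_hat, confidence)
```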
Now consider a case such as the yellow circle, where p(ŷ|x, D) is far from 1.0. In such a
case we are not very confident of our answer, so it might be better to say “I don’t know” instead
of returning an answer that we don’t really trust. This is particularly important in domains
such as medicine and finance where we may be risk averse, as we explain in Section 5.7.
Another application where it is important to assess risk is when playing TV game shows, such
as Jeopardy. In this game, contestants have to solve various word puzzles and answer a variety
of trivia questions, but if they answer incorrectly, they lose money. In 2011, IBM unveiled a
computer system called Watson which beat the top human Jeopardy champion. Watson uses a
variety of interesting techniques (Ferrucci et al. 2010), but the most pertinent one for our present
purposes is that it contains a module that estimates how confident it is of its answer. The system
only chooses to “buzz in” its answer if sufficiently confident it is correct. Similarly, Google has a
system known as SmartASS (ad selection system) that predicts the probability you will click on
an ad based on your search history and other user and ad-specific features (Metz 2010). This
probability is known as the click-through rate or CTR, and can be used to maximize expected
profit. We will discuss some of the basic principles behind systems such as SmartASS later in
this book.
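A sketch of the "I don't know" idea from above (a simple reject option; the threshold value is arbitrary, and the book gives the formal treatment of such decisions in Section 5.7):

```python
import numpy as np

def predict_with_reject(probs, threshold=0.8):
    """Return the MAP label, or None ("I don't know") if the model is not
    sufficiently confident -- analogous to Watson only buzzing in when its
    estimated confidence is high enough."""
    y_hat = int(np.argmax(probs))
    return y_hat if probs[y_hat] >= threshold else None

print(predict_with_reject(np.array([0.95, 0.03, 0.02])))  # confident  -> 0
print(predict_with_reject(np.array([0.40, 0.35, 0.25])))  # ambiguous  -> None
```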

 
