Bayesian Methods for Hackers PDF Download
Download (compiled by this site):
Extraction code: sco9
Main content:
5.1 Introduction
Statisticians can be a sour bunch. Instead of considering their winnings, they only measure
how much they have lost. In fact, they consider their wins to be negative losses. But what’s
interesting is how they measure their losses.
For example, consider the following:
A meteorologist is predicting the probability of a hurricane striking his city. He estimates, with
95% confidence, that the probability of it not striking is between 99% and 100%. He is very
happy with his precision and advises the city that a major evacuation is unnecessary.
Unfortunately, the hurricane does strike and the city is flooded.
This stylized example shows the flaw in using a pure accuracy metric to measure
outcomes. Using a measure that emphasizes estimation accuracy, while an appealing and
objective thing to do, misses the point of why you are even performing the statistical
inference in the first place: results of inference. Furthermore, we’d like a method that
stresses the importance of payoffs of decisions, not the accuracy of the estimation alone.
Read puts this succinctly: “It is better to be roughly right than precisely wrong.”[1]
5.2 Loss Functions
We introduce what statisticians and decision theorists call loss functions. A loss function
is a function of the true parameter, and an estimate of that parameter
$$L(\theta, \hat{\theta}) = f(\theta, \hat{\theta})$$
The important point of loss functions is that they measure how bad our current estimate
is: the larger the loss, the worse the estimate is according to the loss function. A simple,
and very common, example of a loss function is the squared-error loss:

$$L(\theta, \hat{\theta}) = (\theta - \hat{\theta})^2$$
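To make the definition concrete, here is a minimal Python sketch of the squared-error loss; the helper name `squared_error_loss` is mine, not from the book.

```python
def squared_error_loss(theta, theta_hat):
    """Squared-error loss: (theta - theta_hat)**2, larger when the estimate is further off."""
    return (theta - theta_hat) ** 2

# The loss grows as the estimate moves away from the true parameter.
print(squared_error_loss(3.0, 2.5))  # 0.25
print(squared_error_loss(3.0, 1.0))  # 4.0
```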
The squared-error loss function is used in estimators like linear regression, calculation
of unbiased statistics, and many areas of machine learning. We can also consider an
asymmetric squared-error loss function, something like:
$$
L(\theta, \hat{\theta}) =
\begin{cases}
(\theta - \hat{\theta})^2 & \hat{\theta} < \theta \\
c\,(\theta - \hat{\theta})^2 & \hat{\theta} \ge \theta,\quad 0 < c < 1
\end{cases}
$$
which represents that estimating a value larger than the true value is preferable to
estimating a value that is smaller. A situation where this might be useful is in estimating
Web traffic for the next month, where an overestimated outlook is preferred so as to avoid
an underallocation of server resources.
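As a rough illustration of the asymmetric loss above, here is a small Python sketch; the function name and the choice c = 0.5 are mine, for illustration only.

```python
def asymmetric_squared_error_loss(theta, theta_hat, c=0.5):
    """Full quadratic penalty for underestimates (theta_hat < theta),
    reduced penalty c * (theta - theta_hat)**2 for overestimates, with 0 < c < 1."""
    if theta_hat < theta:
        return (theta - theta_hat) ** 2
    return c * (theta - theta_hat) ** 2

# Underestimating next month's traffic by 2 units costs more than
# overestimating it by the same amount:
print(asymmetric_squared_error_loss(10.0, 8.0))   # 4.0
print(asymmetric_squared_error_loss(10.0, 12.0))  # 2.0
```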
A negative property about the squared-error loss is that it puts a disproportionate
emphasis on large outliers. This is because the loss increases quadratically, and not linearly,
as the estimate moves away. That is, the penalty of being 3 units away is much less than
the penalty of being 5 units away, but not that much greater than the penalty of being
1 unit away, even though in both cases the change in distance is the same (2 units).
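A minimal numeric sketch of this disproportionate growth, using the distances mentioned above (the numbers are just the squared losses at those distances, not figures from the book):

```python
def squared_error_loss(theta, theta_hat):
    return (theta - theta_hat) ** 2

loss_1 = squared_error_loss(0.0, 1.0)   # 1.0
loss_3 = squared_error_loss(0.0, 3.0)   # 9.0
loss_5 = squared_error_loss(0.0, 5.0)   # 25.0

# Both comparisons involve a 2-unit change in distance, but the quadratic
# loss grows far more between 3 and 5 units than between 1 and 3 units:
print(loss_5 - loss_3)  # 16.0
print(loss_3 - loss_1)  # 8.0
```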