Human in the Loop Machine Learning PDF Download - Java Knowledge Sharing Site (Java知识分享网), free Java resource downloads

Download compiled by this site:

Link: https://pan.baidu.com/s/1dxAX3BVQR8KsUNpQTaZUdQ

Extraction code: x71s

Main content:

1.2 Introducing annotation

Annotation is the process of labeling raw data so that it becomes training data for machine learning. Most data scientists will tell you that they spend much more time curating and annotating datasets than they spend building the machine learning models. Quality control for human annotation relies on more complicated statistics than most machine learning models do, so it is important to take the necessary time to learn how to create quality training data.
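As one small illustration of the statistics involved in annotation quality control, the sketch below compares raw agreement between two annotators with Cohen's kappa, which discounts the agreement expected by chance. The choice of Cohen's kappa, the function name `cohens_kappa`, and the sample labels are assumptions made for this example; the excerpt does not say which statistics the book uses.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Fraction of items on which the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each annotator labeled at random
    # according to their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two annotators on the same six posts.
annotator_1 = ["positive", "positive", "negative", "neutral", "negative", "positive"]
annotator_2 = ["positive", "neutral", "negative", "neutral", "negative", "negative"]

raw_agreement = sum(a == b for a, b in zip(annotator_1, annotator_2)) / len(annotator_1)
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"raw agreement: {raw_agreement:.2f}, kappa: {kappa:.2f}")
```

On these made-up labels the annotators agree on four of six items, yet the chance-corrected score is noticeably lower, which is why raw agreement alone can overstate annotation quality.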

1.2.1 Simple and more complicated annotation strategies

[Figure 1.1 A mental model of the human-in-the-loop process for predicting labels on data. Diagram boxes: pool of unlabeled data; active learning (sample unlabeled items that are interesting for humans to review); annotation (humans label items); training data (labeled items become the training data); train model (use training data to create a machine learning model); deploy model (deploy the trained model over unlabeled data); deployed model (predict labels); data with predicted labels; transfer learning (use an existing model to start or augment training). Key: Label A, Label B, unlabeled, sample to be labeled.]

An annotation process can be simple. If you want to label social media posts about a product as positive, negative, or neutral to analyze broad trends in sentiment about that product, for example, you could build and deploy an HTML form in a few hours. A simple HTML form could allow someone to rate each social media post according to the sentiment option, and each rating would become the label on the social media post for your training data.
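The simple form described above might look something like the following sketch. It only generates the HTML and pairs submitted ratings with posts; serving the form and storing the results are left out. The function names, the `/annotate` URL, and the sample posts are hypothetical and not taken from the book.

```python
import html

SENTIMENT_OPTIONS = ("positive", "negative", "neutral")


def render_annotation_form(post_id, post_text, action_url="/annotate"):
    """Return an HTML form asking one annotator to rate one post."""
    radios = "\n".join(
        f'  <label><input type="radio" name="sentiment" value="{option}" required> {option}</label>'
        for option in SENTIMENT_OPTIONS
    )
    # Escape the post text so user-generated content cannot break the markup.
    return (
        f'<form method="post" action="{html.escape(action_url, quote=True)}">\n'
        f'  <input type="hidden" name="post_id" value="{html.escape(str(post_id), quote=True)}">\n'
        f"  <p>{html.escape(post_text)}</p>\n"
        f"{radios}\n"
        '  <button type="submit">Submit</button>\n'
        "</form>"
    )


def ratings_to_training_data(posts, ratings):
    """Pair each rated post with its label; unrated posts stay unlabeled."""
    training_data = []
    for post_id, label in ratings.items():
        if label not in SENTIMENT_OPTIONS:
            raise ValueError(f"unknown sentiment label: {label!r}")
        training_data.append((posts[post_id], label))
    return training_data


# Hypothetical posts and ratings, for illustration only.
posts = {
    1: "Love the new <model>!",
    2: "Battery died in a day.",
    3: "It arrived on Tuesday.",
}
form_html = render_annotation_form(1, posts[1])
training_data = ratings_to_training_data(posts, {1: "positive", 2: "negative"})
print(form_html)
print(training_data)
```

Each submitted rating becomes a `(text, label)` pair, which is the shape most text classifiers expect as training data; post 3 has no rating yet, so it remains in the unlabeled pool.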

An annotation process can also be complicated. If you want to label every object in a video with a bounding box, for example, a simple HTML form is not enough; you need a graphical interface that allows annotators to draw those boxes, and a good user experience might take months of engineering hours to build.
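To give a sense of why bounding-box annotation needs more than a form, here is a minimal sketch of the record such an interface would have to produce for every object in every frame. The `BoundingBox` fields, the pixel-coordinate convention, and the 1280x720 frame size are assumptions chosen for illustration, not a format the book prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """One annotator-drawn box around one object in one video frame."""
    frame_index: int
    label: str
    x: float       # left edge, in pixels from the left of the frame
    y: float       # top edge, in pixels from the top of the frame
    width: float
    height: float

    def is_inside(self, frame_width, frame_height):
        """Check that the box has positive size and lies within the frame."""
        return (
            self.width > 0
            and self.height > 0
            and self.x >= 0
            and self.y >= 0
            and self.x + self.width <= frame_width
            and self.y + self.height <= frame_height
        )


def boxes_by_frame(boxes):
    """Group annotations by frame so each frame's objects can be reviewed together."""
    grouped = {}
    for box in boxes:
        grouped.setdefault(box.frame_index, []).append(box)
    return grouped


# Hypothetical annotations for two consecutive frames of a 1280x720 video.
annotations = [
    BoundingBox(0, "car", 100, 200, 300, 150),
    BoundingBox(0, "pedestrian", 600, 180, 60, 160),
    BoundingBox(1, "car", 110, 200, 300, 150),
]
grouped = boxes_by_frame(annotations)
all_valid = all(box.is_inside(1280, 720) for box in annotations)
print(grouped)
print(all_valid)
```

Compared with a single sentiment label per post, each frame here carries several labeled regions with geometry that must be drawn and validated, which is where the interface engineering effort goes.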

1.2.2 Plugging the gap in data science knowledge

Your machine learning algorithm strategy and your data annotation strategy can be optimized at the same time. The two strategies are closely intertwined, and you often get better accuracy from your models faster if you have a combined approach. Algorithms and annotation are equally important components of good machine learning.

All computer science departments offer machine learning courses, but few offer courses on creating training data. At most, you might find one or two lectures about creating training data among hundreds of machine learning lectures across half a dozen courses. This situation is changing, but slowly. For historical reasons, academic machine learning researchers have tended to keep the datasets constant and evaluate their research only in terms of different algorithms.

By contrast with academic machine learning, it is more common in industry to improve model performance by annotating more training data. Especially when the nature of the data is changing over time (which is also common), using a handful of new annotations can be far more effective than trying to adapt an existing model to a new domain of data. But far more academic papers focus on how to adapt algorithms to new domains without new training data than on how to annotate the right new training data efficiently.

Because of this imbalance in academia, I’ve often seen people in industry make the same mistake. They hire a dozen smart PhDs who know how to build state-of-the-art algorithms but don’t have experience creating training data or thinking about the right interfaces for annotation. I saw exactly this situation recently at one of the world’s largest auto manufacturers. The company had hired a large number of recent machine learning graduates, but it couldn’t operationalize its autonomous vehicle technology because the new employees couldn’t scale their data annotation strategy. The company ended up letting that entire team go. During the aftermath, I advised the company on how to rebuild its strategy by using algorithms and annotation as equally important, intertwined components of good machine learning.
