Distributed Representations of Sentences and Documents
1. Introduction
Text classification and clustering play an important role in many applications, e.g., document retrieval, web search, spam filtering. At the heart of these applications are machine learning algorithms such as logistic regression or K-means. These algorithms typically require the text input to be represented as a fixed-length vector. Perhaps the most common fixed-length vector representation for texts is the bag-of-words or bag-of-n-grams (Harris, 1954), due to its simplicity, efficiency and often surprising accuracy.
However, the bag-of-words (BOW) has many disadvantages. The word order is lost, and thus different sentences can have exactly the same representation, as long as the same words are used. Even though bag-of-n-grams considers the word order in short context, it suffers from data sparsity and high dimensionality. Bag-of-words and bag-of-n-grams have very little sense of the semantics of the words or, more formally, the distances between the words. This means that the words “powerful,” “strong” and “Paris” are equally distant, despite the fact that semantically “powerful” should be closer to “strong” than to “Paris.”
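As a quick illustration of this order-insensitivity, the following sketch (plain Python, with a toy pair of sentences chosen purely for illustration) builds bag-of-words counts for two sentences that use the same words in a different order and confirms that the representations are identical:

```python
from collections import Counter

def bag_of_words(sentence):
    """Map a sentence to unordered word counts (a bag of words)."""
    return Counter(sentence.lower().split())

# Two sentences with different meanings but exactly the same words:
a = bag_of_words("the dog chased the cat")
b = bag_of_words("the cat chased the dog")

print(a == b)  # True: word order is lost, so the representations coincide
```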
In this paper, we propose Paragraph Vector, an unsupervised framework that learns continuous distributed vector representations for pieces of texts. The texts can be of variable length, ranging from sentences to documents. The name Paragraph Vector is to emphasize the fact that the method can be applied to variable-length pieces of texts, anything from a phrase or sentence to a large document.

In our model, the vector representation is trained to be useful for predicting words in a paragraph. More precisely, we concatenate the paragraph vector with several word vectors from a paragraph and predict the following word in the given context. Both word vectors and paragraph vectors are trained by stochastic gradient descent and backpropagation (Rumelhart et al., 1986). While paragraph vectors are unique among paragraphs, the word vectors are shared. At prediction time, the paragraph vectors are inferred by fixing the word vectors and training the new paragraph vector until convergence.
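As a concrete way to experiment with this training and inference scheme, gensim's Doc2Vec provides an implementation of Paragraph Vector; the sketch below is illustrative (toy corpus, arbitrary hyperparameters), with dm=1 selecting the distributed-memory variant in which context word vectors are combined with the paragraph vector (pass dm_concat=1 for true concatenation rather than the default averaging):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus: each paragraph gets a unique tag (its paragraph vector),
# while word vectors are shared across all paragraphs.
corpus = [
    TaggedDocument(words=["machine", "learning", "is", "fun"], tags=["doc0"]),
    TaggedDocument(words=["deep", "learning", "is", "powerful"], tags=["doc1"]),
]

model = Doc2Vec(corpus, vector_size=50, window=2, min_count=1,
                epochs=40, dm=1)  # dm=1: distributed-memory Paragraph Vector

# Inference on unseen text: word vectors stay fixed and a fresh
# paragraph vector is trained until convergence, as described above.
new_vec = model.infer_vector(["learning", "is", "strong"])
print(new_vec.shape)  # (50,)
```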
Our technique is inspired by the recent work in learning vector representations of words using neural networks (Bengio et al., 2006; Collobert & Weston, 2008; Mnih & Hinton, 2008; Turian et al., 2010; Mikolov et al., 2013a;c). In their formulation, each word is represented by a vector which is concatenated or averaged with other word vectors in a context, and the resulting vector is used to predict other words in the context. For example, the neural network language model proposed in (Bengio et al., 2006) uses the concatenation of several previous word vectors to form the input of a neural network, and tries to predict the next word. The outcome is that after the model is trained, the word vectors are mapped into a vector space such that semantically similar words have similar vector representations (e.g., “strong” is close to “powerful”).
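A minimal sketch of such a next-word model, written here in PyTorch with illustrative sizes (none of these dimensions come from the cited papers), makes the concatenate-then-predict structure explicit:

```python
import torch
import torch.nn as nn

class NNLM(nn.Module):
    """Bengio-style neural language model: embed the previous `context`
    words, concatenate their vectors, and score every vocabulary word
    as the possible next word."""

    def __init__(self, vocab_size=10000, embed_dim=100, context=3, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)      # shared word vectors
        self.hidden = nn.Linear(context * embed_dim, hidden)  # combine the context
        self.out = nn.Linear(hidden, vocab_size)              # logits for next word

    def forward(self, context_ids):                  # (batch, context) word indices
        vecs = self.embed(context_ids)               # (batch, context, embed_dim)
        x = vecs.flatten(start_dim=1)                # concatenation of word vectors
        return self.out(torch.tanh(self.hidden(x)))  # (batch, vocab_size)
```

Training such a model with a cross-entropy loss on the next word is what shapes the embedding table so that semantically similar words end up with similar vectors.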
Following these successful techniques, researchers have tried to extend the models to go beyond the word level and achieve phrase-level or sentence-level representations (Mitchell & Lapata, 2010; Zanzotto et al., 2010; Yessenalina & Cardie, 2011; Grefenstette et al., 2013; Mikolov et al., 2013c). For instance, a simple approach is using a weighted average of all the words in the document. A more sophisticated approach is combining the word vectors in an order given by a parse tree of a sentence, using matrix-vector operations (Socher et al., 2011b). Both approaches have weaknesses. The first approach, weighted averaging of word vectors, loses the word order in the same way as the standard bag-of-words models do. The second approach, using a parse tree to combine word vectors, has been shown to work only for sentences because it relies on parsing.
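For concreteness, the first (weighted-averaging) approach can be sketched in a few lines; `word_vecs` and `weights` are assumed inputs here, e.g. pretrained embeddings and tf-idf weights, neither of which is specified by the text:

```python
import numpy as np

def average_document_vector(tokens, word_vecs, weights):
    """Weighted mean of the word vectors of `tokens`.

    word_vecs: dict mapping word -> np.ndarray (pretrained embeddings)
    weights:   dict mapping word -> float (e.g. tf-idf); defaults to 1.0
    """
    known = [t for t in tokens if t in word_vecs]
    vecs = np.array([word_vecs[t] for t in known])
    w = np.array([weights.get(t, 1.0) for t in known])
    return (w[:, None] * vecs).sum(axis=0) / w.sum()
```

Note how any permutation of `tokens` yields the same vector, which is exactly the word-order weakness pointed out above.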