
Clustering by fast search and find of density peaks PDF Download


Date: 2021-05-12 09:56  Source: http://www.java1234.com  Author: reposted


Download (collected by this site):
Extraction code: lqic
 
 
Main content:

Cluster analysis is aimed at classifying elements into categories on the basis of their
similarity. Its applications range from astronomy to bioinformatics, bibliometrics, and pattern
recognition. We propose an approach based on the idea that cluster centers are characterized
by a higher density than their neighbors and by a relatively large distance from points with
higher densities. This idea forms the basis of a clustering procedure in which the number of
clusters arises intuitively, outliers are automatically spotted and excluded from the analysis, and
clusters are recognized regardless of their shape and of the dimensionality of the space in which
they are embedded. We demonstrate the power of the algorithm on several test cases.
Clustering algorithms attempt to classify elements into categories, or clusters, on the basis of their similarity. Several different clustering strategies have been proposed (1), but no consensus has been reached even on the definition of a cluster. In K-means (2) and K-medoids (3) methods, clusters are groups of data characterized by a small distance to the cluster center. An objective function, typically the sum of the distances to a set of putative cluster centers, is optimized (3–6) until the best cluster-center candidates are found. However, because a data point is always assigned to the nearest center, these approaches are not able to detect nonspherical clusters (7). In distribution-based algorithms, one attempts to reproduce the observed realization of data points as a mix of predefined probability distribution functions (8); the accuracy of such methods depends on the capability of the trial probability to represent the data.
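For concreteness, the objective minimized in K-means can be written (in our own notation, not taken from this excerpt) as

$$J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$

where $\mu_k$ is the mean of cluster $C_k$; K-medoids uses the same form but restricts each center to be an actual data point and allows a generic distance in place of the squared Euclidean norm.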
Clusters with an arbitrary shape are easily detected by approaches based on the local density of data points. In density-based spatial clustering of applications with noise (DBSCAN) (9), one chooses a density threshold, discards as noise the points in regions with densities lower than this threshold, and assigns to different clusters disconnected regions of high density. However, choosing an appropriate threshold can be nontrivial, a drawback not present in the mean-shift clustering method (10, 11). There, a cluster is defined as a set of points that converge to the same local maximum of the density distribution function. This method allows the finding of nonspherical clusters but works only for data defined by a set of coordinates and is computationally costly.
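As a rough sketch of the procedure this paragraph describes (a simplification, not full DBSCAN, which additionally distinguishes core from border points; the names eps, minNeighbors, and thresholdClusters are ours), one can discard low-density points and then label the disconnected dense regions by a flood fill:

import java.util.ArrayDeque;

public class DbscanLikeSketch {

    // Labels each point with a cluster id (>0); 0 marks points discarded as noise.
    // dist is a symmetric matrix of pairwise distances.
    static int[] thresholdClusters(double[][] dist, double eps, int minNeighbors) {
        int n = dist.length;
        boolean[] dense = new boolean[n];
        for (int i = 0; i < n; i++) {
            int count = 0;
            for (int j = 0; j < n; j++)
                if (j != i && dist[i][j] < eps) count++;
            dense[i] = count >= minNeighbors;   // below the density threshold -> noise
        }
        int[] label = new int[n];
        int cluster = 0;
        ArrayDeque<Integer> stack = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            if (!dense[i] || label[i] != 0) continue;
            label[i] = ++cluster;               // start a new high-density region
            stack.push(i);
            while (!stack.isEmpty()) {          // flood fill the connected region
                int p = stack.pop();
                for (int j = 0; j < n; j++)
                    if (dense[j] && label[j] == 0 && dist[p][j] < eps) {
                        label[j] = cluster;
                        stack.push(j);
                    }
            }
        }
        return label;
    }
}

The nontrivial part in practice is exactly the one the text points out: the outcome depends strongly on the chosen eps and minNeighbors threshold.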
Here, we propose an alternative approach. Similar to the K-medoids method, it has its basis only in the distance between data points. Like DBSCAN and the mean-shift method, it is able to detect nonspherical clusters and to automatically find the correct number of clusters. The cluster centers are defined, as in the mean-shift method, as local maxima in the density of data points. However, unlike the mean-shift method, our procedure does not require embedding the data in a vector space and maximizing explicitly the density field for each data point.
The algorithm has its basis in the assumptions that cluster centers are surrounded by neighbors with lower local density and that they are at a relatively large distance from any points with a higher local density. For each data point $i$, we compute two quantities: its local density $\rho_i$ and its distance $\delta_i$ from points of higher density. Both these quantities depend only on the distances $d_{ij}$ between data points, which are assumed to satisfy the triangular inequality. The local density $\rho_i$ of data point $i$ is defined as

$$\rho_i = \sum_j \chi(d_{ij} - d_c) \qquad (1)$$

where $\chi(x) = 1$ if $x < 0$ and $\chi(x) = 0$ otherwise, and $d_c$ is a cutoff distance. Basically, $\rho_i$ is equal to the number of points that are closer than $d_c$ to point $i$. The algorithm is sensitive only to the relative magnitude of $\rho_i$ in different points, implying that, for large data sets, the results of the analysis are robust with respect to the choice of $d_c$.
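As an illustrative sketch of these two quantities (ours, not the authors' code), both can be computed directly from a pairwise distance matrix. The $\rho_i$ computation follows Eq. 1; for $\delta_i$ we take the minimum distance to any point of higher density and, for the densest point, fall back on the largest pairwise distance, the convention used later in the paper:

public class DensityPeaksSketch {

    // rho[i] = number of points j != i with d_ij < dc (Eq. 1).
    static int[] localDensity(double[][] dist, double dc) {
        int n = dist.length;
        int[] rho = new int[n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (j != i && dist[i][j] < dc) rho[i]++;
        return rho;
    }

    // delta[i] = distance from i to the nearest point of higher density.
    // If no denser point exists (the densest point), we conventionally use
    // the largest distance from i to any other point.
    static double[] deltaToHigherDensity(double[][] dist, int[] rho) {
        int n = dist.length;
        double[] delta = new double[n];
        for (int i = 0; i < n; i++) {
            double nearestHigher = Double.POSITIVE_INFINITY;
            double farthest = 0.0;
            for (int j = 0; j < n; j++) {
                if (rho[j] > rho[i]) nearestHigher = Math.min(nearestHigher, dist[i][j]);
                farthest = Math.max(farthest, dist[i][j]);
            }
            delta[i] = Double.isInfinite(nearestHigher) ? farthest : nearestHigher;
        }
        return delta;
    }
}

Cluster centers are then the points for which both $\rho_i$ and $\delta_i$ are anomalously large, matching the characterization in the abstract; later in the paper, each remaining point is assigned to the same cluster as its nearest neighbor of higher density.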
SISSA (Scuola Internazionale Superiore di Studi Avanzati),
via Bonomea 265, I-34136 Trieste, Italy.
E-mail: laio@sissa.it (A.L.); alexrod@sissa.it (A.R.)

 
