
Clustering by fast search and find of density peaks PDF Download


Date: 2021-05-12 09:56  Source: http://www.java1234.com  Author: reposted


Download (collected by this site):
Extraction code: lqic
 
 
Main content:

Cluster analysis is aimed at classifying elements into categories on the basis of their
similarity. Its applications range from astronomy to bioinformatics, bibliometrics, and pattern
recognition. We propose an approach based on the idea that cluster centers are characterized
by a higher density than their neighbors and by a relatively large distance from points with
higher densities. This idea forms the basis of a clustering procedure in which the number of
clusters arises intuitively, outliers are automatically spotted and excluded from the analysis, and
clusters are recognized regardless of their shape and of the dimensionality of the space in which
they are embedded. We demonstrate the power of the algorithm on several test cases.
Clustering algorithms attempt to classify elements into categories, or clusters, on the basis of their similarity. Several different clustering strategies have been proposed (1), but no consensus has been reached even on the definition of a cluster. In K-means (2) and K-medoids (3) methods, clusters are groups of data characterized by a small distance to the cluster center. An objective function, typically the sum of the distances to a set of putative cluster centers, is optimized (3–6) until the best cluster-center candidates are found. However, because a data point is always assigned to the nearest center, these approaches are not able to detect nonspherical clusters (7). In distribution-based algorithms, one attempts to reproduce the observed realization of data points as a mix of predefined probability distribution functions (8); the accuracy of such methods depends on the capability of the trial probability to represent the data.
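For concreteness, the objective minimized in K-means can be written (in our own notation, not taken from this excerpt) as

$$J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$

where $\mu_k$ is the mean of cluster $C_k$; K-medoids uses the same form but restricts each center to be an actual data point and allows a generic distance in place of the squared Euclidean norm.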
Clusters with an arbitrary shape are easily detected by approaches based on the local density of data points. In density-based spatial clustering of applications with noise (DBSCAN) (9), one chooses a density threshold, discards as noise the points in regions with densities lower than this threshold, and assigns to different clusters disconnected regions of high density. However, choosing an appropriate threshold can be nontrivial, a drawback not present in the mean-shift clustering method (10, 11). There, a cluster is defined as a set of points that converge to the same local maximum of the density distribution function. This method allows the finding of nonspherical clusters but works only for data defined by a set of coordinates and is computationally costly.
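As a rough sketch of the procedure this paragraph describes (a simplification, not full DBSCAN, which additionally distinguishes core from border points; the names eps, minNeighbors, and thresholdClusters are ours), one can discard low-density points and then label the disconnected dense regions by a flood fill:

import java.util.ArrayDeque;

public class DbscanLikeSketch {

    // Labels each point with a cluster id (>0); 0 marks points discarded as noise.
    // dist is a symmetric matrix of pairwise distances.
    static int[] thresholdClusters(double[][] dist, double eps, int minNeighbors) {
        int n = dist.length;
        boolean[] dense = new boolean[n];
        for (int i = 0; i < n; i++) {
            int count = 0;
            for (int j = 0; j < n; j++)
                if (j != i && dist[i][j] < eps) count++;
            dense[i] = count >= minNeighbors;   // below the density threshold -> noise
        }
        int[] label = new int[n];
        int cluster = 0;
        ArrayDeque<Integer> stack = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            if (!dense[i] || label[i] != 0) continue;
            label[i] = ++cluster;               // start a new high-density region
            stack.push(i);
            while (!stack.isEmpty()) {          // flood fill the connected region
                int p = stack.pop();
                for (int j = 0; j < n; j++)
                    if (dense[j] && label[j] == 0 && dist[p][j] < eps) {
                        label[j] = cluster;
                        stack.push(j);
                    }
            }
        }
        return label;
    }
}

The nontrivial part in practice is exactly the one the text points out: the outcome depends strongly on the chosen eps and minNeighbors threshold.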
Here, we propose an alternative approach. Similar to the K-medoids method, it has its basis only in the distance between data points. Like DBSCAN and the mean-shift method, it is able to detect nonspherical clusters and to automatically find the correct number of clusters. The cluster centers are defined, as in the mean-shift method, as local maxima in the density of data points. However, unlike the mean-shift method, our procedure does not require embedding the data in a vector space and maximizing explicitly the density field for each data point.
The algorithm has its basis in the assumptions that cluster centers are surrounded by neighbors with lower local density and that they are at a relatively large distance from any points with a higher local density. For each data point $i$, we compute two quantities: its local density $\rho_i$ and its distance $\delta_i$ from points of higher density. Both these quantities depend only on the distances $d_{ij}$ between data points, which are assumed to satisfy the triangular inequality. The local density $\rho_i$ of data point $i$ is defined as

$$\rho_i = \sum_j \chi(d_{ij} - d_c) \qquad (1)$$

where $\chi(x) = 1$ if $x < 0$ and $\chi(x) = 0$ otherwise, and $d_c$ is a cutoff distance. Basically, $\rho_i$ is equal to the number of points that are closer than $d_c$ to point $i$. The algorithm is sensitive only to the relative magnitude of $\rho_i$ in different points, implying that, for large data sets, the results of the analysis are robust with respect to the choice of $d_c$.
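As an illustrative sketch of these two quantities (ours, not the authors' code), both can be computed directly from a pairwise distance matrix. The $\rho_i$ computation follows Eq. 1; for $\delta_i$ we take the minimum distance to any point of higher density and, for the densest point, fall back on the largest pairwise distance, the convention used later in the paper:

public class DensityPeaksSketch {

    // rho[i] = number of points j != i with d_ij < dc (Eq. 1).
    static int[] localDensity(double[][] dist, double dc) {
        int n = dist.length;
        int[] rho = new int[n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (j != i && dist[i][j] < dc) rho[i]++;
        return rho;
    }

    // delta[i] = distance from i to the nearest point of higher density.
    // If no denser point exists (the densest point), we conventionally use
    // the largest distance from i to any other point.
    static double[] deltaToHigherDensity(double[][] dist, int[] rho) {
        int n = dist.length;
        double[] delta = new double[n];
        for (int i = 0; i < n; i++) {
            double nearestHigher = Double.POSITIVE_INFINITY;
            double farthest = 0.0;
            for (int j = 0; j < n; j++) {
                if (rho[j] > rho[i]) nearestHigher = Math.min(nearestHigher, dist[i][j]);
                farthest = Math.max(farthest, dist[i][j]);
            }
            delta[i] = Double.isInfinite(nearestHigher) ? farthest : nearestHigher;
        }
        return delta;
    }
}

Cluster centers are then the points for which both $\rho_i$ and $\delta_i$ are anomalously large, matching the characterization in the abstract; later in the paper, each remaining point is assigned to the same cluster as its nearest neighbor of higher density.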
SISSA (Scuola Internazionale Superiore di Studi Avanzati),
via Bonomea 265, I-34136 Trieste, Italy.
E-mail: laio@sissa.it (A.L.); alexrod@sissa.it (A.R.)

 
