Anomaly Detection - A Survey (PDF download)

Main content:

tection, we provide a detailed discussion of the application domains where anomaly

detection techniques have been used. For each domain we discuss the notion of an

anomaly, the different aspects of the anomaly detection problem, and the challenges

faced by the anomaly detection techniques. We also provide a list of techniques

that have been applied in each application domain.

The existing surveys discuss anomaly detection techniques that detect the simplest form of anomalies. We distinguish the simple anomalies from complex anomalies. The discussion of applications of anomaly detection reveals that for most application domains, the interesting anomalies are complex in nature, while most of

the algorithmic research has focused on simple anomalies.

1.5 Organization

This survey is organized into three parts and its structure closely follows Figure

2. In Section 2 we identify the various aspects that determine the formulation

of the problem and highlight the richness and complexity associated with anomaly

detection. We distinguish simple anomalies from complex anomalies and define two

types of complex anomalies, viz., contextual and collective anomalies. In Section

3 we briefly describe the different application domains where anomaly detection

has been applied. In subsequent sections we provide a categorization of anomaly

detection techniques based on the research area to which they belong. The majority

of the techniques can be categorized into classification based (Section 4), nearest

neighbor based (Section 5), clustering based (Section 6), and statistical techniques

(Section 7). Some techniques belong to research areas such as information theory

(Section 8), and spectral theory (Section 9). For each category of techniques we also

discuss their computational complexity for training and testing phases. In Section

10 we discuss various contextual anomaly detection techniques. We discuss various

collective anomaly detection techniques in Section 11. We present some discussion

on the limitations and relative performance of various existing techniques in Section

12. Section 13 contains concluding remarks.

2. DIFFERENT ASPECTS OF AN ANOMALY DETECTION PROBLEM

This section identifies and discusses the different aspects of anomaly detection. As

mentioned earlier, a specific formulation of the problem is determined by several

different factors such as the nature of the input data, the availability (or unavailability) of labels as well as the constraints and requirements induced by the application

domain. This section brings forth the richness in the problem domain and justifies

the need for the broad spectrum of anomaly detection techniques.

2.1 Nature of Input Data

A key aspect of any anomaly detection technique is the nature of the input data.

Input is generally a collection of data instances (also referred to as object, record, point,

vector, pattern, event, case, sample, observation, entity) [Tan et al. 2005, Chapter

2]. Each data instance can be described using a set of attributes (also referred

to as variable, characteristic, feature, field, dimension). The attributes can be of

different types such as binary, categorical or continuous. Each data instance might

consist of only one attribute (univariate) or multiple attributes (multivariate). In

To Appear in ACM Computing Surveys, 09 2009.

the case of multivariate data instances, all attributes might be of same type or

might be a mixture of different data types.

The nature of attributes determines the applicability of anomaly detection techniques. For example, for statistical techniques different statistical models have to

be used for continuous and categorical data. Similarly, for nearest neighbor based

techniques, the nature of attributes would determine the distance measure to be

used. Often, instead of the actual data, the pairwise distance between instances

might be provided in the form of a distance (or similarity) matrix. In such cases,

techniques that require original data instances are not applicable, e.g., many statistical and classification based techniques.
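As a rough illustration of the two points above (the attribute types decide the distance measure, and some techniques need only a distance matrix), the following Python sketch builds a matrix for mixed continuous and categorical records and then scores instances from that matrix alone. The function names, the Gower-style averaging, the sample records, and the choice of k are all assumptions made for this example; they are not taken from the survey.

```python
def mixed_distance(a, b, ranges):
    """Gower-style dissimilarity: mean of per-attribute dissimilarities.

    Categorical attributes (strings) contribute 0 if equal, 1 otherwise.
    Continuous attributes contribute |x - y| scaled by the attribute range.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0
        else:
            total += abs(x - y) / r if r else 0.0
    return total / len(a)


def knn_scores(dist, k):
    """Anomaly score = distance to the k-th nearest neighbor.

    Only the pairwise distance matrix is used; the original
    data instances are never consulted.
    """
    scores = []
    for i, row in enumerate(dist):
        others = sorted(d for j, d in enumerate(row) if j != i)
        scores.append(others[k - 1])
    return scores


# Hypothetical records: (amount, channel).
records = [
    (20.0, "web"),
    (22.0, "web"),
    (21.0, "web"),
    (95.0, "atm"),
]
# Range of each continuous attribute; None for the categorical one.
ranges = [75.0, None]

dist = [[mixed_distance(a, b, ranges) for b in records] for a in records]
scores = knn_scores(dist, k=1)
most_anomalous = scores.index(max(scores))
print(most_anomalous, [round(s, 4) for s in scores])
```

Once `dist` has been computed, `knn_scores` could equally be handed a similarity-derived matrix from another source, which is the situation the paragraph describes.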

Input data can also be categorized based on the relationship present among data

instances [Tan et al. 2005]. Most of the existing anomaly detection techniques deal

with record data (or point data), in which no relationship is assumed among the

data instances.

In general, data instances can be related to each other. Some examples are

sequence data, spatial data, and graph data. In sequence data, the data instances

are linearly ordered, e.g., time-series data, genome sequences, protein sequences. In

spatial data, each data instance is related to its neighboring instances, e.g., vehicular

traffic data, ecological data. When the spatial data has a temporal (sequential)

component it is referred to as spatio-temporal data, e.g., climate data. In graph

data, data instances are represented as vertices in a graph and are connected to

other vertices with edges. Later in this section we will discuss situations where

such relationships among data instances become relevant for anomaly detection.

2.2 Type of Anomaly

An important aspect of an anomaly detection technique is the nature of the desired

anomaly. Anomalies can be classified into the following three categories:

2.2.1 Point Anomalies. If an individual data instance can be considered as

anomalous with respect to the rest of data, then the instance is termed as a point

anomaly. This is the simplest type of anomaly and is the focus of the majority of

research on anomaly detection.

For example, in Figure 1, points o1 and o2 as well as points in region O3 lie

outside the boundary of the normal regions, and hence are point anomalies since

they are different from normal data points.

As a real life example, consider credit card fraud detection. Let the data set

correspond to an individual’s credit card transactions. For the sake of simplicity,

let us assume that the data is defined using only one feature: amount spent. A

transaction for which the amount spent is very high compared to the normal range

of expenditure for that person will be a point anomaly.
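The credit card example can be sketched in a few lines of Python. This is only one possible way to make "very high compared to the normal range" concrete: it assumes a median and MAD based modified z-score with the commonly used 0.6745 scaling constant and a cutoff of 3.5. The function name, the cutoff, and the transaction amounts are illustrative assumptions, not something the survey prescribes.

```python
from statistics import median


def point_anomalies(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds threshold.

    The score is 0.6745 * |x - median| / MAD, where MAD is the median
    absolute deviation. Median and MAD are used instead of mean and
    standard deviation so that the anomaly itself does not distort
    the estimate of the "normal range".
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread to compare against; flag nothing
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > threshold
    ]


# Hypothetical transaction amounts for one cardholder.
amounts = [42.0, 18.5, 63.0, 25.0, 37.5, 51.0, 29.0, 2400.0, 33.0, 47.0]
flagged = point_anomalies(amounts)
print(flagged)  # [7]
```

Only the 2400.0 transaction is flagged; every other amount stays well under the cutoff.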

2.2.2 Contextual Anomalies. If a data instance is anomalous in a specific context (but not otherwise), then it is termed as a contextual anomaly (also referred

to as conditional anomaly [Song et al. 2007]).

The notion of a context is induced by the structure in the data set and has to be

specified as a part of the problem formulation. Each data instance is defined using

the following two sets of attributes:

[Figure: monthly temperature (y-axis: Monthly Temp) plotted against time (x-axis: Mar, Jun, Sept, Dec, repeated three times), with time points t1 and t2 marked.]

Chandola, Banerjee and Kumar

(1) Contextual attributes. The contextual attributes are used to determine the

context (or neighborhood) for that instance. For example, in spatial data sets,

the longitude and latitude of a location are the contextual attributes. In time-series data, time is a contextual attribute which determines the position of an

instance on the entire sequence.

(2) Behavioral attributes. The behavioral attributes define the non-contextual characteristics of an instance. For example, in a spatial data set describing the

average rainfall of the entire world, the amount of rainfall at any location is a

behavioral attribute.

The anomalous behavior is determined using the values for the behavioral attributes

within a specific context. A data instance might be a contextual anomaly in a given

context, but an identical data instance (in terms of behavioral attributes) could

be considered normal in a different context. This property is key in identifying

contextual and behavioral attributes for a contextual anomaly detection technique
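The split between contextual and behavioral attributes can be illustrated with a small Python sketch. It assumes that the month is the contextual attribute, the temperature is the behavioral attribute, and that an instance is anomalous when its temperature differs from the median for its month by more than a fixed tolerance. The function name, the tolerance, and the readings are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import median


def contextual_anomalies(records, tolerance):
    """Return indices of records that are anomalous within their context.

    Each record is (context, value): the context is the contextual
    attribute and the value is the behavioral attribute. A record is
    flagged when its value is more than `tolerance` away from the
    median value observed in the same context.
    """
    by_context = defaultdict(list)
    for context, value in records:
        by_context[context].append(value)
    typical = {c: median(values) for c, values in by_context.items()}
    return [
        i for i, (context, value) in enumerate(records)
        if abs(value - typical[context]) > tolerance
    ]


# Hypothetical monthly temperature readings: (month, degrees Celsius).
records = [
    ("Dec", 2.0), ("Dec", 1.0), ("Dec", 3.0),
    ("Jun", 24.0), ("Jun", 26.0), ("Jun", 2.0), ("Jun", 25.0),
]
flagged = contextual_anomalies(records, tolerance=10.0)
print(flagged)  # [5]
```

The reading of 2.0 appears twice: in December (index 0) it is normal, while in June (index 5) it is flagged. The behavioral attribute is identical in both cases, and only the context differs.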
