Graph Databases PDF 下载_Java知识分享网-免费Java资源下载

失效链接处理

Graph Databases PDF 下载

本站整理下载：

链接：https://pan.baidu.com/s/1gTqF0xBzsVoRPCuqPj76xg

提取码：fkr1

相关截图：

主要内容：

Graph Databases

Adrian Silvescu, Doina Caragea, Anna Atramentov

Artificial Intelligence Research Laboratory

Department of Computer Science

Iowa State University

Ames, Iowa 50011

Abstract

Gathering huge amounts of complex information (data and knowledge) is very common nowadays. This calls

for the necessity to represent, store and manipulate complex information (e.g. detect correlations and patterns,

discover explanations, construct predictive models etc.). Furthermore, being autonomously maintained, data can

change in time or even change its base structure, making it difficult for representation systems to accommodate

these changes.

Current representation and storage systems are not very flexible in dealing with big changes and also they are

not concerned with the ability of performing complex data manipulations of the sort mentioned above. On the

other hand, data manipulation systems cannot easily work with structural or relational data, but just with flat data

representations. We want to bridge the gap between the two, by introducing a new type of database structure,

called Graph Databases (GDB), based on a natural graph representation. Our Graph Databases are able to

represent as graphs any kind of information, naturally accommodate changes in data, and they also make easier for

Machine Learning methods to use the stored information.

We are mainly concerned with the design and implementation of GDB in this paper. Thus, we define the Data

Definition Language (DDL) that contains extensional definitions as well as intentional definitions, and Data

Manipulation Language (DML) that we use to pose queries. Then we show conceptually how to do updates, how

to accommodate changes, and how to define new concepts or ask higher order queries that are not easily answer

by SQL language. We also show how to transform a relational database into a GDB.

Although, we show how all these things can be done using our graph databases, we do not implement all of

them. This is a very laborious project that goes much beyond our class project goals. We do implement the graph

databases, show how we can ask queries on them, and also how to transform relational databases into graph

databases in order to be able to reuse relational data already existent.

1. Introduction

Huge amounts of information (data and knowledge) become increasingly common in nowadays

sciences. The sheer volume and diversity make it increasingly difficult to the unaided contemporary

scientist to be able to exploit this information at its full potential. This observation calls for automatic and

semi-automatic (human-driven, but not in every single detail) methods of acquiring, integrating,

representing and manipulating complex data (detecting correlations and patterns, discovering hidden

variables, explanations, a-priori unknown regularities, constructing predictive models etc.).

Usually data resides in multiple locations, is autonomously maintained, and may change in time, thus

posing even more challenges. Being autonomous, it is possible that the data sources don't have a unifying

schema or we cannot control their schemas. Furthermore, these schemas can be dynamically modified, and

it may be difficult to accommodate changes. In this setting, several major issues arise: how to structurally

match schemas of different data sources, how to obtain a global schema from multiple schemas, and how to

change the global schema according to possible changes in data sources schemas.

Sometimes the attempts to solve these problems depart from the traditional approach to databases,

namely RDBMS, leading into the realm of Object-Oriented and XML technologies. The new approaches

can deal with increasingly complex data, namely objects in Object-Oriented Databases and trees in XML.

However, neither the objects nor the trees can naturally represent graphs, which are the most general data

structure. Subsequently, they do not support natural queries on graphs. Moreover, when changes in schema

of the data sources occur, these approaches may lead to major restructuring (redesigning) of global schema.

This may imply very sophisticated and expensive operations, such as renormalization, reindexing etc.,

which may not be performed automatically. Thus, the necessity to represent, store and manipulate complex

data makes RDBMS somewhat obsolete.

We identify three major problems that can arise in the kind of dynamic, distributed, autonomous

environments described above. First, violations of the 1NF can appear when dealing with multi-valued

attributes, complex attributes, or combination of multi-valued and complex attributes. We denote this

problem by [P1] in what follows. Second, when acquiring data from autonomous dynamic sources (e.g.

gene expression data) or from Web (e.g. data about restaurants), a good data representation should be able

to accommodate changes not only in data, but also in the data schema. We call this problem [P2]. It should

be noted that RDBMS might require schema renormalization in such cases, which is not always easy to do.

The third problem, which we denote by [P3], refers to the ability of a database to represent data in a

unified way. Thus, it would be very good if data, knowledge (e.g. schemas), queries (or more generally,

goals) and models (e.g. concepts) would have a unified representation. This would give us a better

understanding of a data problem.

We should note that the three problems mentioned above are very relevant to the Scientific Discovery

(SD) process using Artificial Intelligence (AI), and especially Machine Learning tools. This is because,

much of the data available for SD is structural in nature, needs be often updated, can change easily in

structure etc. Although most of the traditional machines learning algorithms work with data found in flat

data representations, currently there is a considerable tendency to design intelligent learning algorithms that

can deal with structural [Cook and Holder, 2000] or relational data [Getoor et. al., 2001]. However, these

approaches, one based on graphs structures, the other on relational databases, do not take into consideration

the problems [P1], [P2], [P3] described above. They work with data that does not change in time, and they

are not dealing with any updates. Also, they do not address the unified representation problem. The

problem they solve is to extract useful knowledge from data that is already represented as graphs or as

relations. In [Getoor et. al., 2001], data and query do not have a unified representation at all. Although, in

[Cook and Holder, 2000] the authors regard both data and query as graphs, they are not able to put together

data and knowledge. Furthermore, they are not interested in these structures from a databases perspective at

all, but just from a learning perspective. However, we want to stress on the fact that more and more

graphical representations are used in ML lately [Jordan, 1998; Pearl, 2000; Jensen, 2001]. Although quite

distant from what we are doing here, this suggests the importance of being able to deal with the natural

graph representation of information.

We aim at designing database structures that can take into account problems [P1], [P2], [P3], facilitate

the application of learning and reasoning methods on these structures, and also be able to reuse data that is

already stored in relational databases. We call our newly designed structures graph databases, and we think

about them as being at the core of information storage, information integration, and information

exploration. The big picture showing how all these components come into place into the DS process is

shown in Figure 1.

We are not interested in the integration process in this paper. We can assume that the available

information is already integrated [Reinoso et al., 2001]. The problem that we are trying to solve is how to

represent this integrated available information in a way that makes it easier to accommodate changes,

updates, and also provides a unified representaion for data and knowledge, so that the learning process is

facilitated.

K N O W L E D G E

S O U R C E S

L IN K

S O U R C E S

D A T A

S O U R C E S

A V A IL A B L E

IN F O R M A T IO N

IN T H E W O R L D

... ... ...

In fo rm a tio n

In te g ra ti o n

D is c o ve r y & M o d e l li n g Q u e r y = G o a l S p e c i fic a ti o n

U S E R

In fo rm a ti o n

E xp lo ita ti o n In fo rm a tio n

IN F O R M A T IO N IN T E G R A T IO N

Figure 1: Big Picture of the Discovery Science Process

2. Related Work

We mentioned above that RDBMS systems are somewhat obsolete regarding the ability to represent,

store and manipulate complex information that can change in time. Thus, they might require very

sophisticated and expensive operations, such as renormalization, reindexing etc., which may not be

performed automatically. Schema renormalization in such cases is nor desirable neither easy to do.

Therefore, the attempts to solve these problems depart from the traditional approach to databases,

namely RDBMS, leading into the realm of Object-Oriented and XML technologies. The new approaches

can deal with increasingly complex data, namely objects in Object-Oriented Databases and trees in XML.

However, neither the objects nor the trees can naturally represent graphs, which are the most general data

structure. Subsequently, they do not support natural queries on graphs. Moreover, when changes in schema

of the data sources occur, these approaches may lead to major restructuring (redesigning) of global schema.

We briefly mention some of the existing older and newer databases systems, showing how they

address the problems [P1], [P2] and [P3] that we are interested in. We do not go into too many details,

because, as we mentioned, we are not only interested into database system appropriate for representing,

storing and querying data, but also in systems that facilitate the information extraction process. On the

other hand, we do not talk more about graphical learning methods [Jordan, 1998; Pearl, 2000; Jensen, 2001;

Cook and Holder, 2000; Getoor et. al., 2001] because they assume fixed representations, not appropriate for changes,

being concerned only with the learning part. They will be relevant to a discussion about how to use GDB for learning,

which is beyond the purpose of this current paper, but very probable in future work.

To be more precise, we are not aware of any system that tries to bridge the gap between representation and

learning.

最新Java全栈就业实战课程(免费)

AI人工智能学习大礼包

IDEA永久激活

66套java实战课程无套路领取

锋哥开始收Java学员啦！

Python学习路线图

Graph Databases PDF 下载

Java1234官方群25：
Java1234官方群25：	838462530