Java知识分享网 - 轻松学习从此开始!    

Java知识分享网

Java1234官方群25:java1234官方群17
Java1234官方群25:838462530
        
SpringBoot+SpringSecurity+Vue+ElementPlus权限系统实战课程 震撼发布        

最新Java全栈就业实战课程(免费)

springcloud分布式电商秒杀实战课程

IDEA永久激活

66套java实战课程无套路领取

锋哥开始收Java学员啦!

Python学习路线图

锋哥开始收Java学员啦!
当前位置: 主页 > Java文档 > Java基础相关 >

Graph Databases PDF 下载


分享到:
时间:2020-05-18 14:44来源:http://www.java1234.com 作者:小锋  侵权举报
Graph Databases PDF 下载
失效链接处理
Graph Databases PDF 下载

本站整理下载:
 
相关截图:
 
主要内容:

Graph Databases
Adrian Silvescu, Doina Caragea, Anna Atramentov 
Artificial Intelligence Research Laboratory
Department of Computer Science
Iowa State University
Ames, Iowa 50011
Abstract
Gathering huge amounts of complex information (data and knowledge) is very common nowadays. This calls 
for the necessity to represent, store and manipulate complex information (e.g. detect correlations and patterns, 
discover explanations, construct predictive models etc.). Furthermore, being autonomously maintained, data can 
change in time or even change its base structure, making it difficult for representation systems to accommodate 
these changes. 
Current representation and storage systems are not very flexible in dealing with big changes and also they are 
not concerned with the ability of performing complex data manipulations of the sort mentioned above. On the 
other hand, data manipulation systems cannot easily work with structural or relational data, but just with flat data 
representations. We want to bridge the gap between the two, by introducing a new type of database structure, 
called Graph Databases (GDB), based on a natural graph representation. Our Graph Databases are able to 
represent as graphs any kind of information, naturally accommodate changes in data, and they also make easier for 
Machine Learning methods to use the stored information.
We are mainly concerned with the design and implementation of GDB in this paper. Thus, we define the Data 
Definition Language (DDL) that contains extensional definitions as well as intentional definitions, and Data 
Manipulation Language (DML) that we use to pose queries. Then we show conceptually how to do updates, how 
to accommodate changes, and how to define new concepts or ask higher order queries that are not easily answer 
by SQL language. We also show how to transform a relational database into a GDB.
Although, we show how all these things can be done using our graph databases, we do not implement all of 
them. This is a very laborious project that goes much beyond our class project goals. We do implement the graph 
databases, show how we can ask queries on them, and also how to transform relational databases into graph 
databases in order to be able to reuse relational data already existent.
1. Introduction
Huge amounts of information (data and knowledge) become increasingly common in nowadays 
sciences. The sheer volume and diversity make it increasingly difficult to the unaided contemporary 
scientist to be able to exploit this information at its full potential. This observation calls for automatic and 
semi-automatic (human-driven, but not in every single detail) methods of acquiring, integrating, 
representing and manipulating complex data (detecting correlations and patterns, discovering hidden 
variables, explanations, a-priori unknown regularities, constructing predictive models etc.).
Usually data resides in multiple locations, is autonomously maintained, and may change in time, thus 
posing even more challenges. Being autonomous, it is possible that the data sources don't have a unifying 
schema or we cannot control their schemas. Furthermore, these schemas can be dynamically modified, and 
it may be difficult to accommodate changes. In this setting, several major issues arise: how to structurally 
match schemas of different data sources, how to obtain a global schema from multiple schemas, and how to 
change the global schema according to possible changes in data sources schemas.
Sometimes the attempts to solve these problems depart from the traditional approach to databases, 
namely RDBMS, leading into the realm of Object-Oriented and XML technologies. The new approaches 
can deal with increasingly complex data, namely objects in Object-Oriented Databases and trees in XML. 
However, neither the objects nor the trees can naturally represent graphs, which are the most general data 
structure. Subsequently, they do not support natural queries on graphs. Moreover, when changes in schema 
of the data sources occur, these approaches may lead to major restructuring (redesigning) of global schema. 
This may imply very sophisticated and expensive operations, such as renormalization, reindexing etc., 
which may not be performed automatically. Thus, the necessity to represent, store and manipulate complex 
data makes RDBMS somewhat obsolete.
We identify three major problems that can arise in the kind of dynamic, distributed, autonomous 
environments described above. First, violations of the 1NF can appear when dealing with multi-valued 
attributes, complex attributes, or combination of multi-valued and complex attributes. We denote this 
problem by [P1] in what follows. Second, when acquiring data from autonomous dynamic sources (e.g. 
gene expression data) or from Web (e.g. data about restaurants), a good data representation should be able 
to accommodate changes not only in data, but also in the data schema. We call this problem [P2]. It should 
be noted that RDBMS might require schema renormalization in such cases, which is not always easy to do. 
The third problem, which we denote by [P3], refers to the ability of a database to represent data in a 
unified way. Thus, it would be very good if data, knowledge (e.g. schemas), queries (or more generally, 
goals) and models (e.g. concepts) would have a unified representation. This would give us a better 
understanding of a data problem.
We should note that the three problems mentioned above are very relevant to the Scientific Discovery 
(SD) process using Artificial Intelligence (AI), and especially Machine Learning tools. This is because, 
much of the data available for SD is structural in nature, needs be often updated, can change easily in 
structure etc. Although most of the traditional machines learning algorithms work with data found in flat 
data representations, currently there is a considerable tendency to design intelligent learning algorithms that 
can deal with structural [Cook and Holder, 2000] or relational data [Getoor et. al., 2001]. However, these 
approaches, one based on graphs structures, the other on relational databases, do not take into consideration 
the problems [P1], [P2], [P3] described above. They work with data that does not change in time, and they 
are not dealing with any updates. Also, they do not address the unified representation problem. The 
problem they solve is to extract useful knowledge from data that is already represented as graphs or as 
relations. In [Getoor et. al., 2001], data and query do not have a unified representation at all. Although, in 
[Cook and Holder, 2000] the authors regard both data and query as graphs, they are not able to put together 
data and knowledge. Furthermore, they are not interested in these structures from a databases perspective at 
all, but just from a learning perspective. However, we want to stress on the fact that more and more 
graphical representations are used in ML lately [Jordan, 1998; Pearl, 2000; Jensen, 2001]. Although quite 
distant from what we are doing here, this suggests the importance of being able to deal with the natural 
graph representation of information.
We aim at designing database structures that can take into account problems [P1], [P2], [P3], facilitate 
the application of learning and reasoning methods on these structures, and also be able to reuse data that is 
already stored in relational databases. We call our newly designed structures graph databases, and we think 
about them as being at the core of information storage, information integration, and information 
exploration. The big picture showing how all these components come into place into the DS process is 
shown in Figure 1.
We are not interested in the integration process in this paper. We can assume that the available 
information is already integrated [Reinoso et al., 2001]. The problem that we are trying to solve is how to 
represent this integrated available information in a way that makes it easier to accommodate changes, 
updates, and also provides a unified representaion for data and knowledge, so that the learning process is 
facilitated. 
K N O W L E D G E 
S O U R C E S
L IN K 
S O U R C E S
D A T A 
S O U R C E S
A V A IL A B L E 
IN F O R M A T IO N
IN T H E W O R L D
... ... ...
In fo rm a tio n 
In te g ra ti o n
D is c o ve r y & M o d e l li n g Q u e r y = G o a l S p e c i fic a ti o n
U S E R
In fo rm a ti o n 
E xp lo ita ti o n In fo rm a tio n
IN F O R M A T IO N IN T E G R A T IO N
Figure 1: Big Picture of the Discovery Science Process
2. Related Work 
We mentioned above that RDBMS systems are somewhat obsolete regarding the ability to represent, 
store and manipulate complex information that can change in time. Thus, they might require very 
sophisticated and expensive operations, such as renormalization, reindexing etc., which may not be 
performed automatically. Schema renormalization in such cases is nor desirable neither easy to do.
Therefore, the attempts to solve these problems depart from the traditional approach to databases, 
namely RDBMS, leading into the realm of Object-Oriented and XML technologies. The new approaches 
can deal with increasingly complex data, namely objects in Object-Oriented Databases and trees in XML. 
However, neither the objects nor the trees can naturally represent graphs, which are the most general data 
structure. Subsequently, they do not support natural queries on graphs. Moreover, when changes in schema 
of the data sources occur, these approaches may lead to major restructuring (redesigning) of global schema. 
We briefly mention some of the existing older and newer databases systems, showing how they 
address the problems [P1], [P2] and [P3] that we are interested in. We do not go into too many details, 
because, as we mentioned, we are not only interested into database system appropriate for representing, 
storing and querying data, but also in systems that facilitate the information extraction process. On the 
other hand, we do not talk more about graphical learning methods [Jordan, 1998; Pearl, 2000; Jensen, 2001; 
Cook and Holder, 2000; Getoor et. al., 2001] because they assume fixed representations, not appropriate for changes, 
being concerned only with the learning part. They will be relevant to a discussion about how to use GDB for learning, 
which is beyond the purpose of this current paper, but very probable in future work.
To be more precise, we are not aware of any system that tries to bridge the gap between representation and 
learning.

 

------分隔线----------------------------

锋哥公众号


锋哥微信


关注公众号
【Java资料站】
回复 666
获取 
66套java
从菜鸡到大神
项目实战课程

锋哥推荐