The Log-Structured Merge-Tree (LSM-Tree)
1. Introduction
As long-lived transactions in activity flow management systems become commercially available
([10], [11], [12], [20], [24], [27]), there will be increased need to provide indexed access
to transactional log records. Traditionally, transactional logging has focused on aborts and recovery, and has required the system to refer back to a relatively short-term history in normal
processing with occasional transaction rollback, while recovery was performed using batched
sequential reads. However, as systems take on responsibility for more complex activities, the
duration and number of events that make up a single long-lived activity will increase to a point
where there is sometimes a need to review past transactional steps in real time to remind users
of what has been accomplished. At the same time, the total number of active events known to a
system will increase to the point where memory-resident data structures now used to keep
track of active logs are no longer feasible, notwithstanding the continuing decrease in memory
cost to be expected. The need to answer queries about a vast number of past activity logs implies
that indexed log access will become more and more important.
1 Dept. of Math & C.S., UMass/Boston, Boston, MA 02125-3393, {poneil | eoneil}@cs.umb.edu
2 Digital Equipment Corporation, Palo Alto, CA 94301, edwardc@pa.dec.com
3 Oracle Corporation, Redwood Shores, CA, dgawlick@us.oracle.com
Even with current transactional systems there is clear value in providing indexing to support
queries on history tables with high insert volume. Networking, electronic mail, and other
nearly-transactional systems produce huge logs often to the detriment of their host systems. To
start from a concrete and well-known example, we explore a modified TPC-A benchmark in the
following Examples 1.1 and 1.2. Note that examples presented in this paper deal with specific
numeric parametric values for ease of presentation; it is a simple task to generalize these
results. Note too that although both history tables and logs involve time-series data, the index
entries of the LSM-Tree are not assumed to have identical temporal key order. The only assumption for improved efficiency is high update rates compared to retrieval rates.
The Five Minute Rule
The following two examples both depend on the Five Minute Rule [13]. This basic result states
that we can reduce system costs by purchasing memory buffer space to keep pages in memory,
thus avoiding disk I/O, when page reference frequency exceeds about once every 60 seconds. The
time period of 60 seconds is approximate, a ratio between the amortized cost for a disk arm
providing one I/O per second and memory cost to buffer a disk page of 4 KBytes amortized over
one second. In terms of the notation of section 3, the ratio is COSTP/COSTm divided by the page
size in Mbytes. Here we are simply trading off disk accesses for memory buffers while the
tradeoff gives economic gain. Note that the 60 second time period is expected to grow over the
years as memory prices come down faster than disk arms. The reason the period is shorter now in 1995 than in 1987, when it was defined as five minutes, is partly technical (different buffering
assumptions) and partly due to the intervening introduction of extremely inexpensive mass-produced disks.
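The ratio described above can be sketched as a one-line calculation. The dollar figures below are purely illustrative assumptions (they are not taken from the paper); they are chosen only so the break-even interval lands near the 60-second figure discussed in the text.

```python
def break_even_interval_s(cost_per_io_per_s, cost_per_mb, page_mb):
    """Five Minute Rule break-even reference interval, in seconds.

    Buffering a page pays off when the page is referenced more often
    than once per this interval: (COSTP / COSTm) / page size in MB,
    where COSTP is the amortized disk-arm cost to provide one I/O per
    second and COSTm is the cost of a megabyte of memory.
    """
    return (cost_per_io_per_s / cost_per_mb) / page_mb

# Hypothetical mid-1990s prices (illustrative assumptions): $48 per
# (I/O per second) of disk-arm capacity, $200 per MB of memory, and
# 4 KB (0.004 MB) pages.
tau = break_even_interval_s(48.0, 200.0, 0.004)
print(f"break-even reference interval: {tau:.0f} s")
```

As memory prices fall faster than disk-arm prices, the numerator shrinks relative to the denominator and the interval grows, which is exactly the trend the text predicts.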
Example 1.1. Consider the multi-user application envisioned by the TPC-A benchmark [26]
running 1000 transactions per second (this rate can be scaled, but we will consider only 1000
TPS in what follows). Each transaction updates a column value, withdrawing an amount Delta
from a Balance column, in a randomly chosen row containing 100 bytes, from each of three
tables: the Branch table, with 1000 rows, the Teller table with 10,000 rows, and the Account
table, with 100,000,000 rows; the transaction then writes a 50 byte row to a History table
before committing, with columns: Account-ID, Branch-ID, Teller-ID, Delta, and Timestamp.
Accepted calculations projecting disk and memory costs show that Account table pages will not
be memory resident for a number of years to come (see reference [6]), while the Branch and
Teller tables should be entirely memory resident now. Under the assumptions given, repeated
references to the same disk page of the Accounts table will be about 2,500 seconds apart, well
below the frequency needed to justify buffer residence by the Five Minute rule. Now each
transaction requires about two disk I/Os, one to read in the desired Account record (we treat the
rare case where the page accessed is already in buffer as insignificant), and one to write out a
prior dirty Account page to make space in buffers for a read (necessary for steady-state behavior). Thus 1000 TPS will correspond to about 2000 I/Os per second. This requires 80 disk
arms (actuators) at the nominal rate of 25 I/Os per disk-arm-second assumed in [13]. In the
8 years since then (1987 to 1995) the rate has climbed by less than 10%/year so that the
nominal rate is now about 40 I/Os per second, or 50 disk arms for 2000 I/Os per second. The
cost of disk for the TPC application was calculated to be about half the total cost of the system in
[6], although it is somewhat less on IBM mainframe systems. However, the cost for supporting
I/O is clearly a growing component of the total system cost as the cost of both memory and CPU
drop faster than disk.
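The arithmetic of Example 1.1 can be reproduced in a few lines. This is a sketch of the calculation under the stated assumptions (4 KB pages, uniformly random row access, two I/Os per transaction), not part of the benchmark specification itself.

```python
# Back-of-envelope arithmetic for Example 1.1 (TPC-A at 1000 TPS).
TPS = 1000                      # transactions per second
ACCOUNT_ROWS = 100_000_000      # rows in the Account table
ROW_BYTES = 100                 # bytes per Account row
PAGE_BYTES = 4096               # 4 KB disk pages

rows_per_page = PAGE_BYTES // ROW_BYTES          # 40 rows per page
account_pages = ACCOUNT_ROWS / rows_per_page     # 2,500,000 pages

# Each transaction reads one random Account row, so a given page is
# re-referenced roughly once every (pages / TPS) seconds -- far longer
# than the ~60 s threshold of the Five Minute Rule.
reference_interval_s = account_pages / TPS       # 2500 s

# One read of the Account page plus one write of a prior dirty page:
ios_per_txn = 2
total_ios = TPS * ios_per_txn                    # 2000 I/Os per second
arms_1987 = total_ios / 25                       # at 25 I/Os per arm-second
arms_1995 = total_ios / 40                       # at 40 I/Os per arm-second

print(reference_interval_s, total_ios, arms_1987, arms_1995)
```

The three headline numbers of the example fall out directly: a 2,500-second page reference interval, 80 disk arms at 1987 rates, and 50 at 1995 rates.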