The Log-Structured Merge-Tree (LSM-Tree)
1. Introduction
As long-lived transactions in activity flow management systems become commercially available
([10], [11], [12], [20], [24], [27]), there will be increased need to provide indexed access
to transactional log records. Traditionally, transactional logging has focused on aborts and recovery, and has required the system to refer back to a relatively short-term history in normal
processing with occasional transaction rollback, while recovery was performed using batched
sequential reads. However, as systems take on responsibility for more complex activities, the
duration and number of events that make up a single long-lived activity will increase to a point
where there is sometimes a need to review past transactional steps in real time to remind users
of what has been accomplished. At the same time, the total number of active events known to a
system will increase to the point where memory-resident data structures now used to keep
track of active logs are no longer feasible, notwithstanding the continuing decrease in memory
cost to be expected. The need to answer queries about a vast number of past activity logs implies
that indexed log access will become more and more important.
1 Dept. of Math & C.S., UMass/Boston, Boston, MA 02125-3393, {poneil | eoneil}@cs.umb.edu
2 Digital Equipment Corporation, Palo Alto, CA 94301, edwardc@pa.dec.com
3 Oracle Corporation, Redwood Shores, CA, dgawlick@us.oracle.com
Even with current transactional systems there is clear value in providing indexing to support
queries on history tables with high insert volume. Networking, electronic mail, and other
nearly-transactional systems produce huge logs often to the detriment of their host systems. To
start from a concrete and well-known example, we explore a modified TPC-A benchmark in the
following Examples 1.1 and 1.2. Note that examples presented in this paper deal with specific
numeric parametric values for ease of presentation; it is a simple task to generalize these
results. Note too that although both history tables and logs involve time-series data, the index
entries of the LSM-Tree are not assumed to have identical temporal key order. The only assumption for improved efficiency is high update rates compared to retrieval rates.
The Five Minute Rule
The following two examples both depend on the Five Minute Rule [13]. This basic result states
that we can reduce system costs by purchasing memory buffer space to keep pages in memory,
thus avoiding disk I/O, when page reference frequency exceeds about once every 60 seconds. The
time period of 60 seconds is approximate, a ratio between the amortized cost for a disk arm
providing one I/O per second and memory cost to buffer a disk page of 4 KBytes amortized over
one second. In terms of the notation of section 3, the ratio is COSTP/COSTm divided by the page
size in Mbytes. Here we are simply trading off disk accesses for memory buffers while the
tradeoff gives economic gain. Note that the 60 second time period is expected to grow over the
years as memory prices come down faster than disk arms. The reason the period is shorter now in 1995 than in 1987, when it was defined as five minutes, is partly technical (different buffering
assumptions) and partly due to the intervening introduction of extremely inexpensive mass-produced disks.
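The ratio described above can be sketched as a one-line calculation. The dollar figures below are purely illustrative assumptions (they are not taken from the paper); they are chosen only so the break-even interval lands near the 60-second figure discussed in the text.

```python
def break_even_interval_s(cost_per_io_per_s, cost_per_mb, page_mb):
    """Five Minute Rule break-even reference interval, in seconds.

    Buffering a page pays off when the page is referenced more often
    than once per this interval: (COSTP / COSTm) / page size in MB,
    where COSTP is the amortized disk-arm cost to provide one I/O per
    second and COSTm is the cost of a megabyte of memory.
    """
    return (cost_per_io_per_s / cost_per_mb) / page_mb

# Hypothetical mid-1990s prices (illustrative assumptions): $48 per
# (I/O per second) of disk-arm capacity, $200 per MB of memory, and
# 4 KB (0.004 MB) pages.
tau = break_even_interval_s(48.0, 200.0, 0.004)
print(f"break-even reference interval: {tau:.0f} s")
```

As memory prices fall faster than disk-arm prices, the numerator shrinks relative to the denominator and the interval grows, which is exactly the trend the text predicts.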
Example 1.1. Consider the multi-user application envisioned by the TPC-A benchmark [26]
running 1000 transactions per second (this rate can be scaled, but we will consider only 1000
TPS in what follows). Each transaction updates a column value, withdrawing an amount Delta
from a Balance column, in a randomly chosen row containing 100 bytes, from each of three
tables: the Branch table, with 1000 rows, the Teller table with 10,000 rows, and the Account
table, with 100,000,000 rows; the transaction then writes a 50 byte row to a History table
before committing, with columns: Account-ID, Branch-ID, Teller-ID, Delta, and Timestamp.
Accepted calculations projecting disk and memory costs show that Account table pages will not
be memory resident for a number of years to come (see reference [6]), while the Branch and
Teller tables should be entirely memory resident now. Under the assumptions given, repeated
references to the same disk page of the Accounts table will be about 2,500 seconds apart, well
below the frequency needed to justify buffer residence by the Five Minute rule. Now each
transaction requires about two disk I/Os, one to read in the desired Account record (we treat the
rare case where the page accessed is already in buffer as insignificant), and one to write out a
prior dirty Account page to make space in buffers for a read (necessary for steady-state behavior). Thus 1000 TPS will correspond to about 2000 I/Os per second. This requires 80 disk
arms (actuators) at the nominal rate of 25 I/Os per disk-arm-second assumed in [13]. In the
8 years since then (1987 to 1995) the rate has climbed by less than 10%/year so that the
nominal rate is now about 40 I/Os per second, or 50 disk arms for 2000 I/Os per second. The
cost of disk for the TPC application was calculated to be about half the total cost of the system in
[6], although it is somewhat less on IBM mainframe systems. However, the cost for supporting
I/O is clearly a growing component of the total system cost as the cost of both memory and CPU
drop faster than disk.
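The arithmetic of Example 1.1 can be reproduced in a few lines. This is a sketch of the calculation under the stated assumptions (4 KB pages, uniformly random row access, two I/Os per transaction), not part of the benchmark specification itself.

```python
# Back-of-envelope arithmetic for Example 1.1 (TPC-A at 1000 TPS).
TPS = 1000                      # transactions per second
ACCOUNT_ROWS = 100_000_000      # rows in the Account table
ROW_BYTES = 100                 # bytes per Account row
PAGE_BYTES = 4096               # 4 KB disk pages

rows_per_page = PAGE_BYTES // ROW_BYTES          # 40 rows per page
account_pages = ACCOUNT_ROWS / rows_per_page     # 2,500,000 pages

# Each transaction reads one random Account row, so a given page is
# re-referenced roughly once every (pages / TPS) seconds -- far longer
# than the ~60 s threshold of the Five Minute Rule.
reference_interval_s = account_pages / TPS       # 2500 s

# One read of the Account page plus one write of a prior dirty page:
ios_per_txn = 2
total_ios = TPS * ios_per_txn                    # 2000 I/Os per second
arms_1987 = total_ios / 25                       # at 25 I/Os per arm-second
arms_1995 = total_ios / 40                       # at 40 I/Os per arm-second

print(reference_interval_s, total_ios, arms_1987, arms_1995)
```

The three headline numbers of the example fall out directly: a 2,500-second page reference interval, 80 disk arms at 1987 rates, and 50 at 1995 rates.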