Advanced Design and Implementation of Virtual Machines (PDF download)
Main content:
In this chapter, we introduce the concept of the virtual machine. Virtual machines have
been developed for decades in various forms. They became known to mainstream developers
in 1995, when Sun Microsystems published the Java programming language and the associated Java virtual machine (JVM).
1.1 TYPES OF VIRTUAL MACHINES
A virtual machine is a computing system. The ultimate goal of a computing system is to execute programmed logic. The logic can be expressed at a very low level, with all the details
of an actual computer, or at a very high level, with a scripting or markup language. From this
perspective, virtual machines can be broadly categorized into four types according to their
level of abstraction and scope of emulation.
Type 1. A full instruction set architecture (ISA) virtual machine provides emulation or
virtualization of a full computer system's ISA. Guest operating systems and applications
can run on top of the virtual machine as on an actual computer (e.g., VirtualBox,
QEMU, and Xen).
Type 2. An application binary interface (ABI) virtual machine provides emulation of a guest
process ABI. Applications built against that ABI can run in a process side by side
with other processes of native ABI applications (e.g., Intel's IA-32 Execution Layer on
Itanium, Transmeta's Code Morphing for x86 emulation, and Apple's Rosetta translation layer for PowerPC emulation).
Type 3. A virtual ISA virtual machine provides a runtime engine so that applications
coded in the virtual ISA can execute on it. A virtual ISA usually defines high-level
ISA semantics of limited scope, so it does not require the virtual machine to
emulate a full computer system (e.g., Sun Microsystems' JVM, Microsoft's Common
Language Runtime, and Parrot Foundation’s Parrot virtual machine).
Type 4. A language virtual machine provides a runtime engine that executes programs
expressed in a guest language. The programs are usually presented to the virtual
machine in the source form of the guest language, without being fully compiled into
machine code beforehand. The runtime engine needs to interpret or translate the program and also provide certain functionality that the language abstracts away, such
as memory management (e.g., the runtime engines for Basic, Lisp, Tcl, and Ruby).
The boundaries between virtual machine types are not clear-cut, and many virtual
machine designs cross them. For example, a language virtual machine can
also employ the technique of a virtual ISA virtual machine by compiling the program into
a kind of virtual ISA and then executing the code on a virtual machine of that virtual ISA.
Still, the categorization is useful because it gives the community a shared vocabulary.
The first two types of virtual machines perform ISA or ABI emulation. Their goal is to run
existing guest operating systems or guest applications that were developed for an ISA or ABI
other than the host's native one. Sometimes, they are also called emulators.
The other two types of virtual machines are language runtime engines, whose goal is
to execute logic programmed in the form of a virtual ISA or a guest language. In some
contexts, a virtual ISA is considered a special kind of language; apart from that, there is no
essential difference between the two types of language runtime engines.
The topic of this book is language runtime engines. The key phrase “virtual machine”
in the following chapters refers only to a language runtime engine unless otherwise stated, and
“runtime engine” is used interchangeably with “virtual machine.” “Runtime engine” is so
called because the services provided by the virtual machine are mostly available only at runtime. By comparison, in the traditional setting of “compiler + operating system,” applications are compiled statically by a compiler before their distribution. For the same reason, some
people use “runtime system” to refer to the services available at runtime that enable software to execute.
1.2 WHY VIRTUAL MACHINE?
Virtual machines are indispensable to modern programming. They improve (computer) security, (programming) productivity, and (application) portability.
Virtual machines are necessary for safe languages. “Safe language” is a very broad term
here and mainly refers to a language that has the properties of memory safety, operation
safety, and control safety. With a safe language, it is easier to catch program bugs or execution errors early and safely.
1. Memory safety ensures that data of a certain type in memory always follows the
restrictions of that type. For example, a variable of pointer type never holds an illegal
pointer; an array access never goes out of bounds.
2. Operation safety ensures that the operations on a certain type of data always follow
the restrictions of that type. For example, a variable of pointer type does not allow
arbitrary arithmetic operations on it.
3. Control safety ensures that the flow of code execution never reaches a point where it
either gets stuck or goes wild, for example, by jumping to a malicious code segment. Control
safety can be considered a special kind of operation safety.
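As a rough illustration of the first two properties, a runtime can check every array access and refuse operations that a type does not define. The sketch below is a hypothetical toy in Python, not the mechanism of any particular virtual machine; the names `ManagedArray`, `Ref`, and `try_call` are invented for this example.

```python
class ManagedArray:
    """Toy fixed-length array whose every access is bounds-checked (memory safety)."""

    def __init__(self, length):
        self._slots = [0] * length

    def _check(self, index):
        # Reject anything outside [0, length); a raw pointer would not.
        if not isinstance(index, int) or not 0 <= index < len(self._slots):
            raise IndexError(f"index {index!r} out of bounds")

    def load(self, index):
        self._check(index)
        return self._slots[index]

    def store(self, index, value):
        self._check(index)
        self._slots[index] = value


class Ref:
    """Toy reference that can be dereferenced but not used in arithmetic (operation safety)."""

    def __init__(self, target):
        self._target = target

    def deref(self):
        return self._target

    def __add__(self, other):
        # Pointer arithmetic is simply not an operation of this type.
        raise TypeError("arithmetic on a reference is not allowed")


def try_call(fn):
    """Run fn and return the name of the exception it raised, or None."""
    try:
        fn()
    except Exception as exc:
        return type(exc).__name__
    return None
```

With a four-element `ManagedArray`, `try_call(lambda: arr.load(4))` reports an `IndexError` rather than reading whatever lies past the array, and `Ref(arr) + 1` is rejected with a `TypeError`. The error is caught at the point of the faulty operation, which is the "early and safely" behavior described above.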
Almost all modern languages, such as Java, C#, Java bytecode, Microsoft Intermediate
Language, and JavaScript, are safe languages, although the degree of safety each provides
differs.
To support a safe language, a virtual machine is necessary because the safe language
itself cannot fulfill all the safety requirements. For example, the program should not
directly allocate a piece of memory that has no associated type; it needs the assistance of a
virtual machine to provide typed memory for it, such as an object of a certain type.
A virtual machine provides “management” of the code and data of the safe language.
Therefore, the code and data are sometimes called “managed code” and “managed data.” In
turn, the virtual machine is sometimes also called a “managed runtime,” “managed system,”
or “managed execution environment.”
Since it is harder for malicious code to attack a program written in a safe language,
a virtual machine is sometimes employed for security sandboxing. One example is
Google Chrome's Native Client (NaCl) technology.
Since a safe language can catch program bugs or execution errors early and safely at
compile time or runtime, it greatly improves developer productivity.
A virtual machine helps portability in the sense that the virtual ISA or guest language is
not tied to any specific native ISA or ABI definition. Applications in the virtual ISA or guest
language can run on any system that has the virtual machine deployed. Another aspect of portability is that many applications written in other programming languages
choose to compile to the virtual ISA or guest language rather than directly to native machine code,
because they can then benefit from the virtual machine's various properties, such
as portability, performance, and security.
A virtual machine can be designed to support unsafe languages too, but that is only an
extension rather than the original design purpose. An unsafe language is used to help
the safe language access low-level resources or to reuse legacy code written in the unsafe
language.
1.3 VIRTUAL MACHINE EXAMPLES
A virtual machine, as the runtime engine of the guest language, can be categorized according to the implementation of its execution engine. An execution engine is the component
that carries out the application's operational semantics. The two basic execution approaches are
interpretation and compilation.
With interpretation, there is usually no machine code generated from the application code. The application code is parsed by an interpreter into a certain form of internal
representation that can express the program’s semantics, based on the syntax specification
of the guest language, and then the execution engine manipulates the program’s states
(i.e., executes the code) by following the operational semantics of the internal representation.
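A minimal sketch of the interpretation path, assuming a hypothetical guest language that has only integer constants, variables, `+`, `-`, `*`, and single assignments. Python's standard `ast` module stands in for the guest-language parser, and the names `evaluate` and `interpret` are invented here.

```python
import ast
import operator

# Operational semantics of the binary operators the toy language supports.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


def evaluate(node, state):
    """Evaluate one expression node against the current program state."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return state[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        left = evaluate(node.left, state)
        right = evaluate(node.right, state)
        return _BINOPS[type(node.op)](left, right)
    raise SyntaxError(f"unsupported construct: {type(node).__name__}")


def interpret(source):
    """Parse the source into a tree, then walk it; no machine code is generated."""
    state = {}
    tree = ast.parse(source)  # internal representation of the program
    for stmt in tree.body:
        # Only single-target assignments such as `x = 1 + 2` are supported.
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            state[stmt.targets[0].id] = evaluate(stmt.value, state)
        else:
            raise SyntaxError(f"unsupported statement: {type(stmt).__name__}")
    return state
```

Calling `interpret("x = 2\ny = x * 3 + 1")` walks the tree statement by statement and updates the `state` dictionary, which plays the role of the program's state in the description above.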
With compilation, the application code is also parsed syntactically, but it is then translated into machine code according to the operational semantics. The machine
code is later executed by the host machine, and that execution manipulates the application's state.
There is no strict boundary between the two types of virtual machines. It is quite common for an interpreter-based virtual machine to compile the application code from one guest
language into the code of another guest language and then interpret it. The code in that second
language is usually called an “intermediate representation” (IR) in the compiler community. It is also common for a virtual machine to execute one piece of the application code
with interpretation and then the next piece with compilation.
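The "compile to an IR, then interpret the IR" arrangement can be sketched as follows. This is an illustrative toy under stated assumptions: the guest language is a tiny assignment-only subset parsed with Python's `ast` module, and the stack-machine instruction set (`PUSH`, `LOAD`, `STORE`, `ADD`, `SUB`, `MUL`) is invented for this example rather than taken from any real virtual machine.

```python
import ast

_OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def compile_expr(node, code):
    """Emit stack-machine instructions for one expression (post-order walk)."""
    if isinstance(node, ast.Constant):
        code.append(("PUSH", node.value))
    elif isinstance(node, ast.Name):
        code.append(("LOAD", node.id))
    elif isinstance(node, ast.BinOp) and type(node.op) in _OPCODES:
        compile_expr(node.left, code)
        compile_expr(node.right, code)
        code.append((_OPCODES[type(node.op)],))
    else:
        raise SyntaxError(f"unsupported construct: {type(node).__name__}")


def compile_source(source):
    """Translate guest source into a flat list of IR instructions."""
    code = []
    for stmt in ast.parse(source).body:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise SyntaxError(f"unsupported statement: {type(stmt).__name__}")
        compile_expr(stmt.value, code)
        code.append(("STORE", stmt.targets[0].id))
    return code


def run(code):
    """Interpret the IR with an operand stack and a variable table."""
    stack, variables = [], {}
    for instr in code:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])
        elif op == "LOAD":
            stack.append(variables[instr[1]])
        elif op == "STORE":
            variables[instr[1]] = stack.pop()
        else:
            right, left = stack.pop(), stack.pop()
            if op == "ADD":
                stack.append(left + right)
            elif op == "SUB":
                stack.append(left - right)
            elif op == "MUL":
                stack.append(left * right)
            else:
                raise ValueError(f"unknown opcode: {op}")
    return variables
```

Here `compile_source` is the translation step and `run` is the interpreter for the resulting IR. A flat instruction list is typically cheaper to dispatch over than a tree, which is one reason an interpreter-based design may still include a compilation stage.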
A virtual machine can be implemented in software, in hardware, or in a combination of both.
Some hardware is designed to execute the virtual ISA instructions directly. Strictly speaking, such
hardware is no longer a virtual machine, since the virtual ISA is no longer virtual. By convention, it is still
called a virtual machine, but one implemented in hardware.
Since almost all modern programming languages rely on a virtual machine, it is no surprise that a user probably cannot live without a virtual machine or two. The following
are some examples.
1.3.1 JavaScript Engine
The most commonly used virtual machine is probably the one for JavaScript in web browsers.
For example, Google Chrome has the V8 JavaScript engine; Mozilla Firefox has SpiderMonkey;
Apple Safari has JavaScriptCore; and Microsoft Internet Explorer has Chakra. Each of
them has been developed independently and has adopted different techniques to accelerate
JavaScript code execution.
SpiderMonkey is the name of the world's first JavaScript engine. Mozilla has evolved it
from a purely interpretation-based virtual machine into a compiler-based engine through
projects such as TraceMonkey, JägerMonkey, and IonMonkey. The version of
SpiderMonkey current as of 2015 translates the JavaScript code into its IR in the form of
bytecode and then invokes IonMonkey to compile the bytecode into machine code.
Internally, IonMonkey, like a traditional static compiler, builds a control flow graph
(CFG) with a static single assignment (SSA) representation so as to make advanced optimizations possible.
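To give a feel for what an SSA representation looks like, the sketch below renames the variables of a single basic block so that every name is assigned exactly once. It is a simplified illustration, not IonMonkey's implementation: the tuple format, the `to_ssa` name, and the `name_version` naming scheme are assumptions made for this example, and a full SSA construction over a CFG would also need phi nodes where control flow paths merge.

```python
def to_ssa(block):
    """Rename variables in one basic block so each name is assigned exactly once.

    `block` is a list of (dest, op, lhs, rhs) tuples; operands are variable
    names (str) or integer constants.
    """
    version = {}   # variable name -> latest version number
    renamed = []

    def use(operand):
        # Constants pass through; variables refer to their latest definition.
        if not isinstance(operand, str):
            return operand
        if operand not in version:
            version[operand] = 0  # value defined before the block (live-in)
        return f"{operand}_{version[operand]}"

    for dest, op, lhs, rhs in block:
        # Read the operands first, then create a fresh definition of dest.
        new_lhs, new_rhs = use(lhs), use(rhs)
        version[dest] = version.get(dest, 0) + 1
        renamed.append((f"{dest}_{version[dest]}", op, new_lhs, new_rhs))
    return renamed
```

For the block `x = a + 1; x = x * 2; y = x + a`, the two assignments to `x` become `x_1` and `x_2`, so each use refers unambiguously to one definition. That one-to-one link between uses and definitions is what makes many optimizations simpler to express.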
1.3.2 Perl Engine
Another widely used kind of virtual machine is the one for traditional scripting languages such
as Unix shell, Windows PowerShell, Perl, Python, and Ruby. They are called scripting languages because they are commonly used in an interactive “type and run” manner, with
a fast development turnaround. Interactive execution means that the program executes one line
of code and then waits for the programmer's input before executing the next line. Scripting
languages are also commonly used to batch or automate the execution of a sequence of
tasks.
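The "type and run" loop can be sketched in a few lines. This is a hypothetical toy, not how any of the engines named above is implemented: the `Session` class and `repl` function are invented names, and the guest language is simply Python itself, executed through the built-in `compile`, `eval`, and `exec` functions.

```python
class Session:
    """Toy 'type and run' session: each line runs immediately against shared state."""

    def __init__(self):
        self.namespace = {}

    def feed(self, line):
        # Try the line as an expression first so its value can be echoed back.
        try:
            compiled = compile(line, "<input>", "eval")
        except SyntaxError:
            # Not an expression (e.g. an assignment): run it as a statement.
            exec(compile(line, "<input>", "exec"), self.namespace)
            return None
        return eval(compiled, self.namespace)


def repl(session, read=input, write=print):
    """Read a line, execute it, show the result, and wait for the next line."""
    while True:
        try:
            line = read("> ")
        except EOFError:
            break
        result = session.feed(line)
        if result is not None:
            write(result)
```

Running `repl(Session())` in a terminal gives the interactive mode: state such as variables persists in `session.namespace` between lines. Passing a `read` function that yields lines from a file instead of the keyboard turns the same loop into batch execution.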