visualizing_data PDF download
Main content:
A Combination of Many Disciplines
Given the complexity of data, using it to provide a meaningful solution requires
insights from diverse fields: statistics, data mining, graphic design, and information
visualization. However, each field has evolved in isolation from the others.
Thus, visual design (the field of mapping data to a visual form) typically does not
address how to handle thousands or tens of thousands of items of data. Data mining
techniques have such capabilities, but they are disconnected from the means to interact with the data. Software-based information visualization adds building blocks for
interacting with and representing various kinds of abstract data, but typically these
methods undervalue the aesthetic principles of visual design rather than embrace their
strength as a necessary aid to effective communication. Someone approaching a data
representation problem (such as a scientist trying to visualize the results of a study
involving a few thousand pieces of genetic data) often finds it difficult to choose a representation and wouldn’t even know what tools to use or books to read to begin.
Process
We must reconcile these fields as parts of a single process. Graphic designers can learn
the computer science necessary for visualization, and statisticians can communicate
their data more effectively by understanding the visual design principles behind data
representation. The methods themselves are not new, but their isolation within individual fields has prevented them from being used together. In this book, we use a process that bridges the individual disciplines, placing the focus and consideration on how
data is understood rather than on the viewpoint and tools of each individual field.
The process of understanding data begins with a set of numbers and a question. The
following steps form a path to the answer:
Acquire
Obtain the data, whether from a file on a disk or a source over a network.
Parse
Provide some structure for the data’s meaning, and order it into categories.
Filter
Remove all but the data of interest.
Mine
Apply methods from statistics or data mining as a way to discern patterns or
place the data in mathematical context.
Represent
Choose a basic visual model, such as a bar graph, list, or tree.
Refine
Improve the basic representation to make it clearer and more visually engaging.
Interact
Add methods for manipulating the data or controlling what features are visible.
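The seven stages above can be sketched as a chain of small functions. This is only an illustration under stated assumptions, not the book's implementation: the function names, the tab-separated record layout, and the four sample records are all hypothetical, and a text bar graph stands in for a real visual display.

```python
from collections import Counter

# Hypothetical raw input: one "code<TAB>town" record per line (made-up layout).
RAW = "02108\tBoston\n94102\tSan Francisco\n48104\tAnn Arbor\n48201\tDetroit\n"

def acquire():
    """Acquire: obtain the data (an in-memory string stands in for a file)."""
    return RAW

def parse(text):
    """Parse: give each line structure by splitting it into named fields."""
    rows = []
    for line in text.splitlines():
        code, town = line.split("\t")
        rows.append({"code": code, "town": town})
    return rows

def filter_rows(rows, prefix):
    """Filter: keep only the records whose code starts with `prefix`."""
    return [r for r in rows if r["code"].startswith(prefix)]

def mine(rows):
    """Mine: a very simple statistic, counting records per leading digit."""
    return Counter(r["code"][0] for r in rows)

def represent(counts):
    """Represent: a basic visual model, here one text bar per digit."""
    return [f"{digit} {'#' * n}" for digit, n in sorted(counts.items())]

def refine(lines):
    """Refine: make the basic representation clearer by adding a title."""
    return ["codes per first digit"] + lines

def interact(rows, prefix=""):
    """Interact: let the caller control which subset is visible."""
    return refine(represent(mine(filter_rows(rows, prefix))))

rows = parse(acquire())
chart = interact(rows)          # everything
only_4 = interact(rows, "4")    # only codes starting with 4
```

Keeping each stage as its own function makes it easy to use only some of them in a given project, or to revisit an earlier stage when a later one exposes a problem.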
6 | Chapter 1: The Seven Stages of Visualizing Data
Of course, these steps can’t be followed slavishly. You can expect that they’ll be
involved at one time or another in projects you develop, but sometimes it will be four
of the seven, and at other times all of them.
Part of the problem with the individual approaches to dealing with data is that the
separation of fields leads to different people each solving an isolated part of the problem. When this occurs, something is lost at each transition—like a “telephone game”
in which each step of the process diminishes aspects of the initial question under
consideration. The initial format of the data (determined by how it is acquired and
parsed) will often drive how it is considered for filtering or mining. The statistical
method used to glean useful information from the data might drive the initial presentation. In other words, the final representation reflects the results of the statistical
method rather than a response to the initial question.
Similarly, a graphic designer brought in at the next stage will most often respond to
specific problems with the representation provided by the previous steps, rather than
focus on the initial question. The visualization step might add a compelling and
interactive means to look at the data filtered from the earlier steps, but the display is
inflexible because the earlier stages of the process are hidden. Furthermore,
practitioners of each of the fields that commonly deal with data problems are often
unclear about how to traverse the wider set of methods and arrive at an answer.
This book covers the whole path from data to understanding: the transformation of a
jumble of raw numbers into something coherent and useful. The data under consideration might be numbers, lists, or relationships between multiple entities.
It should be kept in mind that the term visualization is often used to describe the art
of conveying a physical relationship, such as the subway map mentioned near the
start of this chapter. That’s a different kind of analysis and skill from information
visualization, where the data is primarily numeric or symbolic (e.g., A, C, G, and T,
the letters of genetic code, and additional annotations about them). The primary
focus of this book is information visualization: for instance, a series of numbers that
describes temperatures in a weather forecast rather than the shape of the cloud cover
contributing to them.
An Example
To illustrate the seven steps listed in the previous section, and how they contribute
to effective information visualization, let’s look at how the process can be applied to
understanding a simple data set. In this case, we’ll take the zip code numbering system that the U.S. Postal Service uses. The application is not particularly advanced,
but it provides a skeleton for how the process works. (Chapter 6 contains a full
implementation of the project.)
What Is the Question?
All data problems begin with a question and end with a narrative construct that provides a clear answer. The Zipdecode project (described further in Chapter 6) was
developed out of a personal interest in the relationship of the zip code numbering
system to geographic areas. Living in Boston, I knew that numbers starting with a
zero denoted places on the East Coast. Having spent time in San Francisco, I knew
the initial numbers for the West Coast were all nines. I grew up in Michigan, where
all our codes were four-prefixed. But what sort of area does the second digit specify?
Or the third?
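One way to start exploring that question is to group codes by their leading digits and see which places fall together. The sketch below is an assumption-laden illustration rather than the Zipdecode code: `group_by_prefix` is a hypothetical helper, and the five sample codes are a hand-picked handful, not the real listing. Codes are kept as strings so that a leading zero is not lost.

```python
from collections import defaultdict

def group_by_prefix(codes, digits):
    """Map each `digits`-long prefix to the list of codes that start with it."""
    groups = defaultdict(list)
    for code in codes:
        groups[code[:digits]].append(code)
    return dict(groups)

# Illustrative sample only; codes are strings so "02108" keeps its leading zero.
sample = ["02108", "02139", "48104", "48201", "94102"]

by_first = group_by_prefix(sample, 1)   # one group per first digit
by_third = group_by_prefix(sample, 3)   # finer groups from three digits
```

Increasing `digits` splits each group further, which mirrors the question of what the second and third digits specify.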
The finished application was initially constructed in a few hours as a quick way to
take what might be considered a boring data set (a long list of zip codes, towns, and
their latitudes and longitudes) and create something engaging for a web audience
that explained how the codes related to their geography.
Acquire
The acquisition step involves obtaining the data. Like many of the other steps, this
can be either extremely complicated (e.g., trying to glean useful data from a large system) or very simple (reading a readily available text file).
A copy of the zip code listing can be found on the U.S. Census Bureau web site, as it
is frequently used for geographic coding of statistical data. The listing is a freely
available file with approximately 42,000 lines, one for each of the codes, a tiny portion of which is shown in Figure 1-1.
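For the simple case, acquisition can be as small as reading lines from a local file or from a network source. The following is a minimal sketch under assumptions: `acquire_lines` is a hypothetical helper, the data is assumed to be UTF-8 text, and the demonstration writes two placeholder lines to a temporary file rather than using the actual listing, whose location and layout are not given here.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen

def acquire_lines(source):
    """Return the text lines of `source`, a local path or an http(s) URL."""
    if source.startswith(("http://", "https://")):
        # Network source: download the whole body, then decode it (UTF-8 assumed).
        with urlopen(source) as response:
            text = response.read().decode("utf-8")
    else:
        # Local source: read the file from disk (UTF-8 assumed).
        text = Path(source).read_text(encoding="utf-8")
    return text.splitlines()

# Demonstration with a temporary file holding two placeholder lines.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "listing.txt"
    path.write_text("line one\nline two\n", encoding="utf-8")
    lines = acquire_lines(str(path))
```

Returning plain lines keeps acquisition separate from parsing, so the structure of each record can be decided in the next stage.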