LIU-ITN-TEK-A--15/014--SE
Visualization of maintenance
data to facilitate analysis and
promote lifetime management of
gas turbines
Jonas Petersson
2015-06-09
Department of Science and Technology
Linköping University
SE-601 74 Norrköping, Sweden
Master's thesis carried out in Media Technology
at the Institute of Technology at
Linköping University
Jonas Petersson
Supervisor: Katerina Vrotsou
Examiner: Aida Nordman
Norrköping, 2015-06-09
Copyright
The publishers will keep this document online on the Internet - or its possible
replacement - for a considerable time from the date of publication barring
exceptional circumstances.
The online availability of the document implies a permanent permission for
anyone to read, to download, to print out single copies for your own use and to
use it unchanged for any non-commercial research and educational purpose.
Subsequent transfers of copyright cannot revoke this permission. All other uses
of the document are conditional on the consent of the copyright owner. The
publisher has taken technical and administrative measures to assure authenticity,
security and accessibility.
According to intellectual property law the author has the right to be
mentioned when his/her work is accessed as described above and to be protected
against infringement.
For additional information about the Linköping University Electronic Press
and its procedures for publication and for assurance of document integrity,
please refer to its WWW home page: http://www.ep.liu.se/
© Jonas Petersson
Abstract
This report documents the work and result of a master's thesis in Media Technology and
Engineering at Linköping University conducted in collaboration with Siemens Industrial
Turbomachinery AB. The aim of the project was to develop an interactive visualization
application to be used for data exploration. The purpose of the application is to provide
Siemens with valuable insights about how different configurations of their gas turbines
affect the lifetime of the machines and their components. The result is a JavaScript-based
web application, ViSITelligence, that allows employees at Siemens to explore the data in
order to discover patterns and relationships between different settings in the configuration
of the turbines. ViSITelligence has been developed through an agile process with usability
and perception in mind, and facilitates the answering of questions and the emergence of
new ones.
Acknowledgements
I would like to thank Davood Naderi, Daniel Dagnelund and Pontus Slottner, all at
Siemens, for their support and feedback throughout the project. I would also like to
thank my supervisor Katerina Vrotsou and my examiner Aida Nordman for their guidance
throughout the work. A special thanks goes to Martina Norlin for her encouragement
and endless support.
Contents
1 Introduction
1.1 Siemens
1.2 Background and problem description
1.3 Objectives
1.4 Limitations
1.5 Thesis outline
2 Theoretical Background
2.1 Perception
2.1.1 Preattentive processing
2.1.2 Gestalt Laws
2.2 Usability
2.2.1 User-interface design guidelines
2.3 Information Visualization
2.3.1 Visualization Stages
2.3.2 Data Types
2.3.3 The Visual Information Seeking Mantra
2.3.4 Visualization techniques
2.3.5 Interaction techniques
2.4 Related work
3 Implementation
3.1 The Development Process
3.1.1 Agile
3.1.2 Prototypes
3.2 Data extraction
3.3 Client side technologies
3.4 Application design
4 ViSITelligence - Results
4.1 Data
4.2 Options panels
4.3 Visualization techniques
4.3.1 Scatter plot
4.3.2 Histogram
4.3.3 Parallel sets
4.4 Coordinated representations
4.5 Use case scenario
5 Discussion
5.1 Visualization techniques
5.2 Implementation
5.2.1 Development process
5.2.2 Application design
5.3 Interactivity
5.4 Siemens
6 Conclusions
6.1 Future work
Bibliography
1 Introduction
This report is the result of a master's thesis carried out in the Master of Science in
Media Technology and Engineering program at Linköping University. The thesis has been
conducted in collaboration with Siemens Industrial Turbomachinery AB [1] and describes
an interactive application for visualization and analysis of turbines’ maintenance data.
1.1 Siemens
Siemens [2] is a global powerhouse in electronics and electrical engineering, actively operating in more than 190 countries and offering a wide range of pioneering products for
energy efficiency, industrial productivity, affordable healthcare and intelligent infrastructure, with a quickly growing focus on sustainability.
Siemens Industrial Turbomachinery AB (SIT AB), based in Finspång, is one of the
main Siemens sites in Sweden. SIT AB produces gas and steam turbines and offers
a full service program for both.
1.2 Background and problem description
As part of the Siemens service program, maintenance facts are collected from the
inspections performed on operating gas turbines at different locations. The inspections can be
planned or unplanned with regard to scope or outage time. The maintenance facts
and details of the performed activities are reported regularly at the end of each maintenance event. These reports are based on pre-defined templates, but there is no
common vocabulary for the maintenance team to use. The lack of a common vocabulary for describing the inspection findings on the gas turbine components
and their severity results in multiple descriptions of the same type of damage.
Moreover, some inspectors document their findings in more detail than others,
which affects the data quality.
As a part of continuous improvement within the Siemens service organization, and to
provide a uniform description of the collected data, a taxonomy has been developed in
recent years. The aim is to implement this taxonomy to digitize the data collection.
One of the objectives is to develop an interactive visualization tool that can be used to
evaluate the adequacy of this taxonomy. This evaluation will be done by mapping the
collected historical findings against the developed taxonomy. These historical findings are
stored in a database that has been developed for this purpose over the
last three years. Investigating the data quality and finding missing information are
further aspects of this evaluation.
The other important use of the interactive visualization tool is to explore the collected data to find patterns and relationships among different attributes of the configuration, the location and the specific findings during inspections. In order to improve the
lifetime of the components, the engineering department is interested in finding out whether site
parameters (such as salt exposure, altitude and distance from sea), configuration parameters (such as fuel type and turbine model) and other relevant factors are correlated with
specific inspection findings that reflect the lifetime of the turbines and their components.
The lifetime is defined as the occurrence of remarks on the components, measured by
the number of planned and/or unplanned inspections and replacements in combination
with the extent of the remarks.
1.3 Objectives
The main objective of the thesis is to create an interactive visualization application
that represents the collected maintenance data. The purpose of the application is to allow
employees at Siemens to explore the data and gain valuable insights regarding
the lifetime of the turbines and their components. The application will be used to answer
existing questions regarding the lifetime and to raise new questions that may
arise when exploring the data.
The thesis will examine appropriate visualization techniques for representing
the provided multivariate data. The representations chosen should provide an overview
of the data as well as a more detailed view of it. Users should be able to interact with
the application to see relationships and patterns and to spot outliers in the data.
The focus of the thesis is primarily on showing relationships between categorical variables. However, the user should also, to a lesser extent, be able to discover possible
relationships and correlations between quantitative variables.
1.4 Limitations
With a given time frame of 20 full-time weeks for the thesis, the application’s functionality
has to be restricted. The following limitations therefore act as constraints on the
project.
• The different remarks found on the inspected components are classified into five
failure modes that do not take into account the severity of the damage. The use
of failure modes reduces the level of detail provided for the remarks.
• The data set is incomplete and consists of information from only two out of
five different types of gas turbines manufactured by SIT AB. Additionally, historical
inspection reports have only been partially reviewed, and the collected facts only cover the life
supervised components.
• Only a subset of all available attributes within the database has been selected for
use by the application.
• The application is web based and developed for modern web browsers supporting scalable vector graphics (SVG). Supported browsers include recent versions of
Chrome, Firefox, Safari and Opera. Internet Explorer 8 and earlier versions do not
support SVG and are therefore not considered.
• Smart phones and tablets will not be considered during the development process
and are out of the scope of this thesis.
• Because of time constraints, extensive usability tests are excluded. However, the
supervisor and the closest stakeholders at Siemens will be questioned about the
usability of the application.
1.5 Thesis outline
The thesis is structured as follows. Chapter 2 contains the theory used as a foundation for the thesis, including usability, perception and information visualization. A
description of the implementation and development process can be found in chapter 3
together with the design and layout of the application. The resulting application and its
functionality are presented in chapter 4, followed by a discussion in chapter 5. Chapter 6
consists of final thoughts and conclusions drawn from the thesis, and future work to extend
and improve the application.
2 Theoretical Background
During development of any application, several design choices need to be made. Human
perception and usability are two of the most important aspects to consider in order
to create a user-friendly and interactive visualization application. Understanding the basics
of human perception and usability is useful when choosing visualization techniques to
represent different types of data. It also helps in choosing the proper interaction technique
to maximize the user experience. This chapter presents the theoretical background used
as a foundation for the project and introduces the theory behind the choices made for
the application.
2.1 Perception
According to Johnson [3], human perception is influenced by at least three factors:
the past, the present and the future. These three factors can bias our perception in any
situation in multiple ways and can be explained as our experience, the current context
and our goals. Experience from similar situations affects the way the current situation is
perceived, according to what is expected to happen when, for example, objects or events
are encountered. Placing the same object in two different contexts might mislead
human perception into believing that it is two different objects. For example, the length of a
line may be perceived differently depending on the context in which it is used. The goals of
the task at hand also bias perception. If the focus is on finding a particular object,
it is often easier to perceive it. Common to all sources of perceptual influence is that
they have an impact on user-interface design [3].
2.1.1 Preattentive processing
Preattentive processing is a step in the visual selection process in which visual features are quickly detected. Preattentive processing precedes focused attention, meaning that a feature is noticed before
the viewer is aware of it, and occurs when a visual feature is detected in less than 200-250 milliseconds [4]. In figure 2.1, the preattentive visual features of hue, length and density
are presented. An object differing from all other objects, in terms of features, is easily
distinguished and pops out from the surrounding distracting objects. For the strongest
effect, the distracting objects should be identical or at least very similar [5].
Situations in which patterns do not pop out may occur when targeting an object based
on two features. This is referred to as a visual conjunctive search, and particular objects
are often hard to see because our primary visual cortex can only be tuned for one feature.
For example, trying to identify either square shapes or blue objects in figure 2.2a can be
hard. Targeting objects with the same feature set but different orientations, or
objects with colors similar to their surroundings, is also difficult [5], see figure 2.2b.
Figure 2.1: Preattentive processing is an automatic selection process performed by human
vision. Hue, length and density can be used to guide the user’s focus by taking advantage of the
preattentive process. (a) Hue: the red dot gets focus immediately. (b) Length: the longest bar
is recognized early. (c) Density: the high-density area receives focus first.
Figure 2.2: Visual conjunctive search is the difficulty of targeting objects by two features.
(a) Conjunctive search: the blue square is not preattentively perceived because it is surrounded by blue and red
circles and squares. Furthermore, an object with a color similar to that of the surrounding, distracting
objects is not seen preattentively. (b) Similar colors: the reddish circle is easy to distinguish, but the bluish
circle is hard to distinguish because the distracting objects have similar colors.
During the interaction with a representation, such as selecting or hovering with the
mouse over objects, preattentive processing should be considered. The selected objects
need to be quickly recognized. They can be detected preattentively by changing, for
example, their color, or other features [5].
2.1.2 Gestalt Laws
Gestalt laws are robust rules describing the way human vision perceives patterns and
groups of elements. One of the most useful Gestalt laws in design is the law of proximity.
The law of proximity states that objects that are close to each other are perceived as a
group. In figure 2.3a, three groups of circles are depicted. The principle is useful when
designing control panels to divide different usages into groups separated by lines or extra
spacing [3, 6].
The similarity principle also conveys grouping of objects, with the difference of using
similarity instead of proximity. In figure 2.3b the objects are perceived as grouped in
rows due to their similarity [3, 6].
Figure 2.3: The Gestalt laws of (a) proximity and (b) similarity are about grouping objects. In (a) the
dots are perceived in three groups because of their relative spacing to each other. In (b) the
objects are grouped in rows because of their similarity. The circles are perceived as one row and
the stars as another and so on.
A third principle is the law of continuity which declares that it is easier for the human
mind to follow smooth and continuous elements rather than elements with abrupt changes
in the direction. In other words, lines are seen as following the smoothest path. Slider
controls are an example of this principle where the handle is depicting a value on a single
range rather than a divider between two different ranges [3, 6].
2.2 Usability
According to Krug [7], the first and most important law of usability is:
“Don’t make me think” - Steve Krug.
When looking at a web page, the user should be able to understand and use it without
further thought or trial and error. It should be obvious and self-explanatory. Hence,
when developing a web application, a challenge is to get rid of all the questions a user might
ask. What is this? Can I click on that? Because users scan pages instead of
reading them, it is important to create a clear visual hierarchy on the page. Additionally,
the web page should be divided into clearly defined areas, allowing users scanning the
page to decide which areas to focus on [7].
2.2.1 User-interface design guidelines
There are several guidelines for designing user interfaces, all based on human psychology
[3]. Prevent errors is one of the eight golden rules defined by Shneiderman and Plaisant
[8]. A system should be designed in such a way that users cannot make serious errors. However,
as Sigmund Freud wrote in an inscription on his portrait, there are no rules that protect
against all errors.
“There is no medicine against death, and against error no rule has been
found.” - Sigmund Freud.
One way to prevent some errors is to disable the areas that the user should not be able
to interact with at a certain time. If errors do occur, one should make sure that the
user receives feedback about the error and instructions for recovery. Furthermore, developers
should strive for consistency in their design by using, for example, a consistent color
scheme and text font throughout the application. Informative feedback should be offered
to the user on interaction, as described for tooltips in section 2.3.5. Additionally,
one needs to make sure users feel they are in control of the application they are using
and minimize the short-term memory load by keeping displays simple and following the
rule of thumb for information processing that humans can remember “seven plus or minus
two chunks of information” [8].
2.3 Information Visualization
Visualization is derived from the word ”visualize” which means to form a mental model
or mental image of something. Thus, visualization is the cognitive activity of the human
brain when images or data are interpreted [9]. Recently, however, the meaning of the
term ”visualization” has mostly been described as the graphical representation of data
[6].
The goal of visualization is to facilitate the understanding of the data by utilizing
the human visual system’s ability to find patterns and trends, as well as identify outliers.
One challenge of visualization involves the creation of appropriate and well-designed visual
representations which can be used to improve understanding, memorizing and decision
making [10]. The use of visualization, or graphical representations of data, may aid the
formation of hypotheses and the understanding of features of the data [6].
2.3.1 Visualization Stages
According to Ware [6] there are four basic stages in the process of data visualization,
connected through a set of feedback loops as shown in figure 2.4. The four stages are:
• Data gathering - This stage consists of the collection and storage of data. The data
is gathered from one or several sources and is part of the longest feedback loop.
• Data transformation - The data is preprocessed and transformed to reduce the
amount of data. Filtering is usually part of these transformations to remove irrelevant data and possibly reveal otherwise hidden aspects of it. Other transformations
may include restructuring of data into suitable data structures to ease future manipulation. This process of selecting data prior to the visual mapping is called data
exploration.
• Visual mapping - Selected data is mapped to visual cues, such as position, length
and area, through the use of algorithms. An example of such mapping could be
to map a pair of data values into a position in a two-dimensional space. Users are
often allowed to interact with these graphical representations for getting a better
understanding of the data. Common user interactions are to select and highlight
a subset of the data or filter out data not fulfilling given conditions. This kind of
user interaction is often referred to as view manipulation.
• Perceptual and cognitive processing - The user’s interpretation of the information involves
perceptual and cognitive processing to gain insight and solve the task at hand.
Figure 2.4: The process of data visualization involves four basic stages, which can be combined
as a pipeline. The user can interact with these stages by choosing how the data is gathered,
exploring the data and manipulating the view through a set of feedback loops.
Visual representations often reveal problems with the gathered data and the gathering
process. Appropriate visual representations often highlight errors and artifacts, and are
therefore useful for examining the quality of the data [6].
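The first three stages can be sketched as a small pipeline. The following Python sketch is illustrative only: the record fields (`fuel`, `hours`, `remarks`), the sample values and the function names are assumptions made for this example and are not taken from the application described in this thesis.

```python
# Illustrative pipeline: gather -> transform -> visual mapping.
# All field names and values below are hypothetical.

def gather():
    """Data gathering: collect raw records from one or several sources."""
    return [
        {"fuel": "gas", "hours": 12000, "remarks": 3},
        {"fuel": "oil", "hours": 8000, "remarks": 5},
        {"fuel": "gas", "hours": None, "remarks": 1},  # incomplete record
    ]

def transform(records):
    """Data transformation: filter out records that cannot be plotted."""
    return [r for r in records if r["hours"] is not None]

def visual_mapping(records, width=400, height=200):
    """Visual mapping: map a pair of data values to a 2D position."""
    max_hours = max(r["hours"] for r in records)
    max_remarks = max(r["remarks"] for r in records)
    return [
        {
            "x": r["hours"] / max_hours * width,
            # Screen coordinates grow downwards, so flip the y axis.
            "y": height - r["remarks"] / max_remarks * height,
        }
        for r in records
    ]

points = visual_mapping(transform(gather()))
```

The incomplete record is dropped in the transformation stage, which is also where a data quality problem of this kind would first become visible.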
2.3.2 Data Types
In general, data used in information visualization can be divided into two categories:
quantitative and categorical data [11].
Quantitative Data
Quantitative data, or numerical data, is data in the form of numbers that measure things.
Numerical data is useless unless it is used together with its related categorical value [12].
For example, the value 449,964 is useless unless its categorical value is provided, which
is the area of Sweden in square kilometers.
Categorical Data
As opposed to quantitative data, categorical data is often non-numerical. According
to Few [11, 12], categorical data identifies what the quantitative data represents and
comes in three fundamental types when used in graphs: Nominal, Ordinal and Interval.
• Nominal - Items in a nominal scale are discrete values without an intrinsic order
and are only differing in their names (that is, nominally). The items in a nominal
scale do not relate to one another in any particular way although they belong to
a common category. Examples of nominal scales are fruits (e.g. apples, bananas,
oranges) or regions (e.g. Sweden, USA, Russia). As stated by Yau [13] numbers can
be used with nominal scales in some cases, like for the number on a bus representing
the route on which it travels.
• Ordinal - Ordinal scales consist of items with an intrinsic order but as for nominal
data the individual items do not represent quantitative data. Examples of ordinal
scales involve rankings such as “First, Second, Third” or “Small, Medium, Large”.
Listing items in an ordinal scale out of sequence does not make sense and would
create confusion.
• Interval - Items in an interval scale have an intrinsic order like the items in an ordinal
scale, but in interval scales they represent quantitative values. An interval scale
is a quantitative scale that has been converted into a categorical scale by grouping
the values into smaller ranges of equal size. Interval scales often represent units of
time, such as year and month, although years and months are not always of equal
size.
2.3.3 The Visual Information Seeking Mantra
A basic principle to follow when creating visual representations is the Visual Information
Seeking Mantra: “Overview first, zoom and filter, then details-on-demand”, by Shneiderman [14]. Craft and Cairns [15] discuss the importance of this mantra. An overview
provides a general display of the dataset, allowing users to get an understanding of the
data and spot relationships and patterns. Zoom and filter allow for a simplified view
by selecting or deselecting subsets of the data to be shown or removed from the view.
Zooming and filtering can reduce the complexity of the display, assisting in further investigation. Details-on-demand can be provided on mouse-over or selection of elements.
When the user hovers over elements of a representation with the mouse cursor, additional
or detailed information about the
selection can be shown in a tooltip, as described in section 2.3.5.
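The three steps of the mantra can be illustrated on a small table of records. The sketch below is a minimal Python illustration; the records, attribute names and helper functions are hypothetical and only serve to show an overview (counts per category), a filter and a details-on-demand lookup.

```python
from collections import Counter

# Hypothetical inspection records, used for illustration only.
records = [
    {"id": 1, "fuel": "gas", "finding": "crack"},
    {"id": 2, "fuel": "oil", "finding": "corrosion"},
    {"id": 3, "fuel": "gas", "finding": "corrosion"},
    {"id": 4, "fuel": "gas", "finding": "crack"},
]

def overview(rows, attribute):
    """Overview first: count the records in each category."""
    return Counter(row[attribute] for row in rows)

def zoom_and_filter(rows, **conditions):
    """Zoom and filter: keep only records matching all conditions."""
    return [
        row for row in rows
        if all(row[key] == value for key, value in conditions.items())
    ]

def details_on_demand(rows, record_id):
    """Details-on-demand: return the full record, e.g. for a tooltip."""
    return next((row for row in rows if row["id"] == record_id), None)

summary = overview(records, "finding")
gas_only = zoom_and_filter(records, fuel="gas")
tooltip = details_on_demand(gas_only, 3)
```

In an interactive representation, the same three operations would be triggered by the initial rendering, by filter controls and by mouse-over events respectively.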
2.3.4 Visualization techniques
Data can and should be represented in different ways, depending on the data itself and
the message to present. Common to all visualization techniques, though, is that the
data values are mapped to visual attributes such as position, size, shape and color. The
human brain is better at decoding some visual attributes than others. The visual attributes
of position and length are more accurate encodings than area and color [9, 10, 16].
A set of techniques used for multivariate data visualization will be presented in this
section. Scatter plots, scatter plot matrices and parallel coordinates are often used for
quantitative data while the parallel sets and mosaic plot representations are mainly used
for categorical data. Other techniques for representing multivariate categorical data exist,
such as treemaps [10, 17], sunburst [10] and icicle plots [10]. However, they are mostly
used for displaying hierarchies so they are not discussed in this thesis.
Bar chart
A bar chart, see figure 2.5a, uses length to encode quantitative data. Each rectangle
represents an item in the dataset, where its height represents the measured value for
the category. Bar charts are most often used to display discrete data for comparison of
multiple categories [17].
Variations of bar charts exist. By grouping bars with the same
categorical variable, see figure 2.5b, multiple measures can be displayed for each category. Bars can also
be stacked on top of each other, showing the relationship of each part to the whole. Stacked
bar charts, see figure 2.5c, should be considered when both the total and its parts are important
for the message presented by the graph.
A third variation of the bar chart is the histogram. A histogram, see figure 2.5d,
shows the distribution of a dataset by grouping measures into bins, or ranges, of equal
size and counting their occurrences. In a histogram, each bar represents a range and the
height of each bar represents the number of values that fall within that range.
Histograms can help in finding clusters and spotting outliers in a dataset [16].
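The binning step behind a histogram can be sketched as follows. This is a minimal Python illustration with made-up values and a hypothetical helper named `histogram`; it divides the value range into bins of equal size and counts the values in each bin.

```python
def histogram(values, bin_count):
    """Count how many values fall into each of `bin_count` equal-size bins."""
    low, high = min(values), max(values)
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for value in values:
        # The maximum value would land in bin `bin_count`; clamp it into
        # the last bin so that the final range is closed on both sides.
        index = min(int((value - low) / width), bin_count - 1) if width else 0
        counts[index] += 1
    return counts

# Made-up measurements, for illustration only.
values = [1, 2, 2, 3, 7, 8, 9, 20]
counts = histogram(values, 4)
```

With these values the counts are 4, 3, 0 and 1: the two populated bins on the left form clusters, while the empty third bin followed by a single value in the last bin is the kind of gap that makes an outlier visible.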
Location map
When the geographic location of a data point is of importance, a location map can
be used. In a location map, the data points are placed on a map according to their
corresponding latitude and longitude values. One additional dimension can be added by
encoding the size of the dots as a variable [16].
Scatter plot
A scatter plot, see figure 2.6, is a visualization technique used to represent data in a two-dimensional space. Scatter plots use the most accurate visual attribute, namely position,
to encode data values. Shapes representing the data points are positioned in a Cartesian
coordinate system according to their values for each axis, where
each axis represents a quantitative scale. By using different colors, sizes and shapes,
additional dimensions can be displayed in the representation. A scatter plot that represents data points as
bubbles of different sizes is called a bubble chart [17]. When size is used to represent a
dimension, the bubbles should be sized by their area and not by their radius, diameter
or circumference [13].
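Sizing by area means that the radius has to grow with the square root of the value. A minimal Python sketch, in which the function name and the maximum radius are assumptions chosen for illustration:

```python
import math

def bubble_radius(value, max_value, max_radius=20.0):
    """Return a radius such that the bubble's area is proportional to value."""
    return max_radius * math.sqrt(value / max_value)

r_full = bubble_radius(100, 100)
r_quarter = bubble_radius(25, 100)
area_ratio = (math.pi * r_quarter ** 2) / (math.pi * r_full ** 2)
```

A value that is a quarter of the maximum gets half the radius and therefore a quarter of the area. Scaling the radius linearly would instead shrink the area to one sixteenth and exaggerate the difference.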
Scatter plots can be used to determine whether there is a correlation between two dimensions
in the data, that is, whether the value of one dimension increases or decreases in a
corresponding manner when the value of the other dimension increases. They can also be used
to find outliers, data points that differ from all other points, in the data [17].
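The visual impression of a correlation can be complemented by a numeric measure. The thesis does not state that such a measure is computed; the following Python sketch of the Pearson correlation coefficient, with made-up values, is only meant to make the notion of correlation concrete.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values: ys_up rises with xs, ys_down falls as xs rises.
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 6, 8, 10]
ys_down = [10, 8, 6, 4, 2]
r_up = pearson(xs, ys_up)
r_down = pearson(xs, ys_down)
```

A coefficient close to 1 corresponds to points rising along a line in the scatter plot, a coefficient close to -1 to points falling along a line, and a coefficient near 0 to no linear pattern.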
12
(a) Bar chart.
(b) Grouped bar chart.
(c) Stacked bar chart.
(d) Histogram.
Figure 2.5: Different variations of bar charts exist. Common for all is that each object of a
dataset is represented by a rectangle, where rectangle’s height depicts its measure for the variable
on the y axis. In (a), a bar chart is showing three objects. In (b), rectangles are grouped in
pairs, representing two features of the same object. In (c), bars are stacked on top of each other
representing parts of a whole. The histogram in (d) is showing the distribution of a variable
with the number of occurrences as the height of each bar. The extent of possible values for the
variable has been divided into twenty equally sized intervals with one rectangle per interval.
Figure 2.6: The scatter plot is a multi-dimensional visualization technique displaying a circle
for each data item. The positions of the circles are determined by the values on the x and y
axes. Color is added to represent an additional dimension of the dataset.
Scatter plot matrix (SPLOM)
A scatter plot matrix, also called SPLOM, see figure 2.7, consists of multiple scatter
plots organized in a grid. As for scatter plots, SPLOMs are used to determine correlation
and relationships between data points, but they are not limited to only two dimensions
(provided that no additional dimensions are mapped to, for example, the size). In a
SPLOM, the correlation between any pair of variables can be inspected. Patterns in the
pairwise relationships are easily observed, but with higher dimensions some patterns may
go unrecognized [10, 18].
Figure 2.7: The SPLOM consists of multiple scatter plots organized as a matrix. Each scatter
plot represents the relationship between a distinct pair of variables. Each relationship is shown
twice, mirrored above and below the diagonal that runs from top left to bottom right.
One disadvantage of the SPLOM is that when the number of dimensions increases, the
number of different pairwise relations increases rapidly. Also, the number of points that
must be drawn is the number of data items multiplied by the number of unique pairwise relations. For example, 600
points are required to visualize the pairwise relationships for 100 data items in four
dimensions: 100 points for each of the six unique pairwise combinations
of the dimensions [9].
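The growth can be made concrete with a short calculation. The sketch below is written in Python purely for illustration, and the helper name `splom_cost` is an assumption. It counts the unique pairs of dimensions, which is n(n-1)/2 for n dimensions, and multiplies that count by the number of data items.

```python
from itertools import combinations

def splom_cost(num_dimensions, num_items):
    """Return (unique_pairs, plotted_points) for a SPLOM without mirrored panels."""
    pairs = list(combinations(range(num_dimensions), 2))
    return len(pairs), len(pairs) * num_items

pairs, points = splom_cost(4, 100)       # the example from the text: 6 pairs, 600 points
pairs10, points10 = splom_cost(10, 100)  # ten dimensions: 45 pairs, 4500 points
```

Going from four to ten dimensions raises the number of panels from 6 to 45, which shows how quickly the matrix outgrows the available screen space.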
The interaction techniques of brushing and linking, which will be described in section
2.3.5, can be applied to highlight interesting points in all views, and thus limit the number
of data points preattentively focused in the different views [18].
An alternative to using SPLOMs for representing multivariate data is to use a single
interactive scatter plot with the possibility to change what is represented by the axes and
filter the data [9]. However, using this approach, the relationships of the different pairs
cannot be compared directly.
Parallel coordinates
Parallel coordinates, see figure 2.8, is a technique used to represent multidimensional data.
The data are turned into sets of points with each point representing one dimension of the
dataset. The points for each set are placed on uniformly spaced parallel axes (instead
of orthogonal axes, like scatter plots), one for each dimension, and are connected by line
segments creating a polyline for each set [19, 20]. The parallel axes are independent
of each other, making it possible to display up to about 10-15 axes in the same view
[19]. Using 10-15 axes would, however, violate the design guideline of minimizing the
short-term memory load, described in section 2.2.1.
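The construction of the polylines can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the implementation of any of the cited tools: the function name `polylines`, the drawing size and the example variables are hypothetical, and each axis is scaled independently from its own minimum and maximum.

```python
def polylines(records, dimensions, width=600.0, height=300.0):
    """Map each record to a list of (x, y) vertices, one vertex per parallel axis."""
    if len(dimensions) < 2:
        raise ValueError("parallel coordinates need at least two dimensions")
    # Uniformly spaced vertical axes, one per dimension.
    step = width / (len(dimensions) - 1)
    xs = [i * step for i in range(len(dimensions))]
    # Each axis has its own scale, independent of the other axes.
    lo = {d: min(r[d] for r in records) for d in dimensions}
    hi = {d: max(r[d] for r in records) for d in dimensions}
    result = []
    for r in records:
        line = []
        for x, d in zip(xs, dimensions):
            span = hi[d] - lo[d]
            t = (r[d] - lo[d]) / span if span else 0.5
            line.append((x, height * (1.0 - t)))  # screen y grows downwards
        result.append(line)
    return result

# Hypothetical records with two quantitative dimensions.
lines = polylines(
    [{"hours": 1000, "inspections": 2}, {"hours": 5000, "inspections": 6}],
    ["hours", "inspections"],
)
```

Each returned list of vertices would be drawn as one polyline, so a data item with many dimensions still occupies a single connected shape.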
Figure 2.8: Parallel coordinates representation showing one polyline for each set of points connected to a particular data dimension. Colors are used to add an additional dimension to the
plot dividing the polylines into two different classes.
Parallel coordinates should not be thought of as a normal line graph where the slope
of the lines indicates change through time. Instead, the lines connect a series of data
points that measure multiple aspects of an entity, such as a fruit or region [21].
Limitations of the technique include the difficulty of analyzing correlation between
axes that are not adjacent, due to their parallel placement. Parallel coordinates representations
also often suffer from cluttering problems that appear already for medium-sized data sets,
resulting in an image that is hard to analyze for trends or structure [20, 22]. Instead, the
technique shows its strength when used interactively for analysis with techniques like brushing
[20], described in section 2.3.5. Another advantage of the parallel coordinates technique is
its capability to present many related dimensions in a limited space. Additionally, in the
absence of cluttering problems, relationships between results can easily be investigated
and trends in the data become visible [20].
Parallel sets
Parallel sets, see figure 2.9, is a technique and interaction framework for mapping multidimensional categorical variables to visual entities. Parallel sets are influenced by parallel
coordinates and thus share their layout, with the difference that the point intersections
are replaced with sets of lines, originally boxes, representing the categories. The length
of each line corresponds to the frequency of the category it represents. By displaying
the frequencies in a discrete design model and having independent axes, the parallel sets
implementation combines the advantages of both frequency-based techniques and parallel
coordinates [19, 23].
Figure 2.9: Parallel sets representation with a ribbon for each unique combination of categories.
The blue and orange colors represent each category of the uppermost dimension (Dimension1).
This increases the possibility to distinguish between the flows corresponding to the different
categories of the top dimension.
Using parallel sets in combination with a selection feature allows users to deselect
categories and dimensions to get a more detailed view of the interesting selection, which
follows the principle of the Visual Information Seeking Mantra [24] described in section
2.3.3. A parallel sets implementation provides an overview of the data and its flow or
distribution among the categories for multiple dimensions (see figure 2.9). By selecting
or deselecting categories or dimensions the data is filtered and the representation is only
showing the selected data. Details-on-demand are added by showing details of a selection
when users hover over a flow in the representation. The possibility to rearrange axes lets
users view the overview with different arrangements.
One drawback of the parallel sets implementation arises when there are several categories for a dimension or when they differ a lot in size. In those cases the many intersections can make it hard to interpret and compare the different categories [19].
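The frequencies behind a parallel sets representation can be sketched as below. The Python code is a simplified illustration and not the implementation used in [19, 23]: it only counts the flows between adjacent axes, whereas the representation in figure 2.9 draws a ribbon for each unique combination of categories. The function name and the example records are hypothetical.

```python
from collections import Counter

def parallel_sets_counts(records, dimensions):
    """Count category frequencies per axis and flows between adjacent axes."""
    # The length of a category on an axis is proportional to its frequency.
    frequencies = {d: Counter(r[d] for r in records) for d in dimensions}
    # A flow between two adjacent axes carries the records sharing both categories.
    ribbons = {}
    for upper, lower in zip(dimensions, dimensions[1:]):
        ribbons[(upper, lower)] = Counter((r[upper], r[lower]) for r in records)
    return frequencies, ribbons

# Made-up records with two categorical dimensions.
records = [
    {"planned": "yes", "fuel": "gas"},
    {"planned": "yes", "fuel": "liquid"},
    {"planned": "no", "fuel": "gas"},
    {"planned": "yes", "fuel": "gas"},
]
frequencies, ribbons = parallel_sets_counts(records, ["planned", "fuel"])
```

A dimension with many categories produces many small counters and therefore many thin, crossing flows, which is the drawback described above.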
Mosaic plot
Like parallel sets, mosaic plots use frequency-based techniques for representing multivariate categorical data. The mosaic plot, see figure 2.10, is a recursive space-subdivision
technique [25], which means that frequency measures of a category are divided into
subcategories. For example, a dataset containing information about an accident, with
the categories survived and gender can be divided into the subcategories yes/no and
male/female respectively. Each frequency value is mapped to the area of a rectangle, as
opposed to parallel sets where each value is mapped to the length of a line.
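The recursive subdivision can be sketched for two dimensions as follows. This Python sketch is an illustration only; the function name `mosaic_rectangles` is an assumption and the counts are invented for the example, not taken from any real dataset. The available area is first split into columns by the first dimension and each column is then split by the second dimension, so that every rectangle's area is proportional to its frequency.

```python
def mosaic_rectangles(counts, width=1.0, height=1.0):
    """Split a width x height area into rectangles whose areas match the counts.

    counts maps a first-level category to {second-level category: frequency}.
    Returns {(first, second): (x, y, rect_width, rect_height)}.
    """
    total = sum(sum(inner.values()) for inner in counts.values())
    rects = {}
    if total == 0:
        return rects
    x = 0.0
    for outer, inner in counts.items():
        outer_total = sum(inner.values())
        if outer_total == 0:
            continue
        col_width = width * outer_total / total      # first split: columns
        y = 0.0
        for cat, n in inner.items():
            cell_height = height * n / outer_total   # second split: within a column
            rects[(outer, cat)] = (x, y, col_width, cell_height)
            y += cell_height
        x += col_width
    return rects

# Invented frequencies for gender (columns) and survived (rows within a column).
counts = {"male": {"yes": 30, "no": 70}, "female": {"yes": 60, "no": 40}}
rects = mosaic_rectangles(counts)
```

Adding a third dimension would repeat the same split inside every rectangle, which is why the number of rectangles grows quickly with the number of dimensions.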
Multiple dimensions can be added to the representation increasing the number of
rectangles displayed. With increased number of dimensions, spacing is needed between
the rectangles to group the connected combinations of categories within dimensions. Displaying more variables makes the plot more difficult to interpret because the human brain
has difficulty distinguishing differences in area, especially when the areas are not aligned
to a common baseline and differ in aspect ratio [26].
Figure 2.10: Mosaic plot with each rectangle representing the frequency of a pair of categories from
two dimensions in a dataset. The categories of one of the dimensions are divided into columns,
while the categories of the other dimension are grouped by color.
2.3.5 Interaction techniques
Filtering
Filtering allows a subset of the data to be selected and focused in the view. Filtering
removes uninteresting data points, those that do not fulfill the criteria of the filter, from the view.
This reduces the cognitive effort the user would otherwise need to focus on a subset of the data
while all data points are visible [9].
Brushing and linking
Brushing is a technique for selecting data interactively with the mouse, for example in order to
filter it. Brushing is commonly combined with linking, which allows the selection to
be displayed in other views of the same dataset [20].
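The interplay of the two techniques can be sketched as below. This is a minimal Python illustration under stated assumptions: the function names `brush` and `link`, the rectangular brush and the shared `id` field are hypothetical and do not describe a specific library.

```python
def brush(points, x_range, y_range):
    """Return the ids of the points inside a rectangular brush drawn in one view."""
    (x0, x1), (y0, y1) = sorted(x_range), sorted(y_range)
    return {p["id"] for p in points if x0 <= p["x"] <= x1 and y0 <= p["y"] <= y1}

def link(selected_ids, other_view_items):
    """Mark the items of another view of the same dataset as highlighted or not."""
    return [dict(item, highlighted=item["id"] in selected_ids) for item in other_view_items]

# Hypothetical scatter plot points and the items of a second, linked view.
points = [
    {"id": 1, "x": 1.0, "y": 1.0},
    {"id": 2, "x": 5.0, "y": 5.0},
    {"id": 3, "x": 2.0, "y": 8.0},
]
bars = [{"id": 1, "count": 4}, {"id": 2, "count": 7}]
selected = brush(points, (0, 3), (0, 3))
linked = link(selected, bars)
```

The selection is stored as a set of identifiers rather than as screen coordinates, which is what lets every other view of the same dataset reuse it.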
Tooltip
A tooltip, see figure 2.11, is an information box showing additional information about a
selection when the user selects or hovers over an element in a representation. Tooltips
can be used to show details of the selection, such as the exact values it represents.
Figure 2.11: A tooltip shown when hovering with the mouse over an element in a scatter plot.
The tooltip provides detailed information about the element and its value on each axis.
Zooming and panning
Zooming gives different levels of detail to a visualization technique. By zooming, a subset
of the data set can be displayed more accurately revealing information that otherwise may
be hidden when displayed as an overview [20]. Panning is often used in combination with
zooming to allow users to change the area of the representation currently zoomed.
2.4 Related work
Applications for representing multidimensional data have been created before. The applications discussed here were chosen because they help users discover valuable information and
patterns in multidimensional data using visualization techniques. The visualization techniques used in these applications are either similar to the ones chosen for the application
presented in the thesis or applicable to the datasets used. Several studies involving the
representation of multidimensional categorical data have been conducted [24, 27]. These
studies describe applications representing categorical data using parallel sets. The applications include several features that can be useful for representing multidimensional
data. In one of the studies [27], aggregated cancer registry data is presented in a parallel
sets representation implemented for the web. The representation uses curved flows to
increase traceability and also allows users to highlight a flow of interest. However, the
application lacks the possibility of filtering among dimensions and categories to remove
uninteresting parts of the data for the current task. The other study [24] uses an application developed in Java for presenting the result of a service awareness campaign,
the steps of a record cleanup process and profiling data from a bank. This application
provides filtering functionality for exploring subsets of the dataset but, as opposed
to the web application in the former study, its ribbons are straight instead of curved.
Additionally, a Java application needs to be installed on all clients that will use it. The
applications mentioned in the studies above only use one type of representation (parallel
sets) to present the data which limits the possibilities to display multiple aspects of the
data set, which is the key requirement for this project.
Another application for multidimensional data visualization, also implemented in
Java, is PRISMA [28]. PRISMA incorporates the use of coordinated views to represent multiple aspects of a dataset. With coordinated views, interactions in one view will
be extended to the other views to display or highlight parts of the data. This is a useful
feature for data exploration and fulfills the Visual Information Seeking Mantra presented
in section 2.3.3. The visualization techniques implemented in the application are: scatter
plot, treemap and parallel coordinates. Treemap is a space-subdivision technique (like
mosaic plots described in section 2.3.4) that represents hierarchical data. Using a treemap
for categorical data without a hierarchy would need the use of multiple rectangles for the
same category, but as sub areas of a larger area. The parallel coordinates representation has its strength in showing relationships and correlation between quantitative data
attributes. With the primary focus of this thesis to represent categorical data there are
better alternatives.
Mondrian [29] is a Java application for data visualization. Mondrian offers mosaic
plots and parallel coordinates for high dimensional data visualization together with standard plots like histograms, bar charts, scatter plots and maps. Similar to PRISMA, it
uses coordinated views to allow for advanced data analysis. The disadvantages of a Java
application and of parallel coordinates have already been discussed. In addition
to a traditional parallel coordinates representation Mondrian offers a version that uses
boxplots as the axes to display how the values are distributed over each axis. A boxplot is
not very effective in use with categorical data when there are only a few possible values.
The mosaic plot is useful when displaying up to four or five dimensions. When more
dimensions need to be displayed, it gets harder to interpret the area of each rectangle as
well as keeping in mind what dimension and category it represents. Mosaic plots often use
tooltips instead of labels around the plot because as the number of dimensions increases
the space needed for the labels also increases. Moreover, if users want to know the
total value of a category as part of the whole, they need to sum the values of all
rectangles representing the category. Otherwise, the total has to be provided by the tooltip or by another
visualization technique, for example a bar chart. Histograms, bar charts and scatter plots
are useful for comparing values of different entities and for spotting outliers and correlation. With
limited space, a large number of entities is hard to display in a bar chart. Mondrian
uses scroll bars to allow users to scroll through all the rectangles. With different types of
sorting some interpretation is possible, but it still misses the possibility of an overview
and comparison of certain rectangles not closely positioned. An alternative provided by
the application is histograms, to show a distribution as described in section 2.3.4 about
bar charts. The maps provided are choropleth maps which use color to map a variable
to a region on the map. For the purpose of the thesis, choropleth maps are not relevant
because the location (latitude and longitude) of a gas turbine is not of major importance.
Its region is of even less importance. More important are the parameters of the location, such as distance
from the sea and salt exposure.
Multidimensional datasets for car specifications have been visualized in applications
using parallel coordinates or star coordinates [30, 31]. Star coordinates have been found
useful for gaining insight into clustered datasets. Due to the radial placement of the axes
the exact data values for points in the representation are hard to interpret. The axes of
a star coordinates representation are arranged on a circle with the origin at the center
of the circle. Initially they are separated with equal angles but through interaction both
the length and orientation of the axes can be changed to not be equally spaced. Due
to the circular arrangement of the star coordinates representation, the space needed is
independent of the number of dimensions. For parallel coordinates, the size increases
linearly with the number of dimensions if the distance between axes is kept the same. Star
coordinates have also been used for a dataset containing ratings of American cities for a
number of criteria [31].
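The placement of a data item in a star coordinates representation can be sketched as follows. The Python code shows a common formulation in which every axis contributes its direction vector scaled by the normalized value; it assumes the initial, equally spaced axes of unit length, and the function name `star_coordinates` is hypothetical. It is not taken from the applications in [30, 31].

```python
import math

def star_coordinates(values, ranges):
    """Project one multidimensional data item to a 2-D point.

    values: one number per dimension; ranges: one (min, max) pair per dimension.
    """
    n = len(values)
    x = y = 0.0
    for i, (v, (lo, hi)) in enumerate(zip(values, ranges)):
        angle = 2.0 * math.pi * i / n                  # axes start equally spaced
        t = (v - lo) / (hi - lo) if hi > lo else 0.0   # normalize to [0, 1]
        # Each axis contributes its unit vector scaled by the normalized value.
        x += t * math.cos(angle)
        y += t * math.sin(angle)
    return x, y

unit = [(0.0, 1.0)] * 4
on_first_axis = star_coordinates([1.0, 0.0, 0.0, 0.0], unit)
cancelled = star_coordinates([1.0, 0.0, 1.0, 0.0], unit)
```

In the second call the contributions of two opposite axes cancel, so the item lands approximately at the origin just like an item with all values at their minimum. This illustrates why exact data values are hard to read from the representation.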
3 Implementation
This chapter describes the development process and the technologies used to extract the
data and to create the visual representations of the web application. As mentioned in
section 1.3, the aim of the application is to allow users to explore the data to gain valuable
insights regarding the lifetime of the turbines and their components. The design of the
application is also described together with how the golden rules for design presented in
section 2.2.1 are considered.
3.1 The Development Process
3.1.1 Agile
The application has been developed using an agile development methodology [32]. Unlike
traditional development methodologies, such as the waterfall model, where the process
takes place in stages with the previous stage finalized before continuing to the next, agile
methodologies are more dynamic. Dynamic methodologies are less vulnerable to changes
late in the process, because the development takes place iteratively [32].
A product backlog was created from a set of general user stories based on the needs
of the application. The user stories were broken down into tasks to be implemented.
The tasks were rated according to their importance to the application and each week,
the highest prioritized tasks from the product backlog were selected. The selected tasks
were broken down into smaller tasks, which were time estimated and conducted during
a sprint. At the end of each sprint (iteration), a working prototype
was available, which was incrementally extended and improved during later sprints. At weekly meetings held at the end of each sprint, the prototype was reviewed and possible changes and improvements were discussed.
The changes and improvements appearing during the process were added to the product
backlog to be selected in a later sprint.
3.1.2 Prototypes
Lo-Fi prototypes were drawn in an early stage of the development process. In this project,
the prototypes of the layout (figure 3.1) are focusing on the concept of the layout and the
functionality of the option panels. The Lo-Fi prototypes were used in combination with
interactive prototypes that were shown to the closest stakeholders at Siemens in order
to present the different possibilities. The use of prototypes can facilitate planning
the structure of the application and selecting the visualization techniques to be used.
Figure 3.2 shows the final Lo-Fi prototype of the visualization application, which was
used as a template when implementing the application.
(a) Application layout.
(b) Application layout with option panels visible.
Figure 3.1: Lo-Fi prototypes of the application layout designed in the early stages of the development process. The prototypes represent the layout of the application with (a) the options
panels hidden and (b) shown.
3.2 Data extraction
The first and second stages in the process of data visualization, section 2.3.1, concern
the collection and transformation of data. The data used by the application is extracted
from a Microsoft SQL Server database using the web server Internet Information Services
(IIS) [33] and ASP.NET Web Pages [34]. The data extraction is written in ASP.NET
with Razor Syntax [35], which includes SQL query functionality. Razor syntax is a server-side
programming syntax for embedding server-based code into web pages. Figure 3.3
illustrates the process of the data extraction, starting with the client’s request. The
process continues on the server, which handles the request and sends a response back to
the client.
Figure 3.2: Lo-Fi prototype of the application constructed and refined during the development
process.
Figure 3.3: The process of extracting data from the server. The client requests data when
entering a page. The web server handles the request and lets ASP.NET process the page. A
code snippet from the markup code of ASP.NET with Razor Syntax is called to execute a SQL
query to select the data from the database. The database processes the query, and returns the
data which is formatted as JSON by the ASP.NET with Razor Syntax code before sending the
response to the client.
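The query-then-serialize step of this process can be sketched in a language-neutral way. The sketch below uses Python with an in-memory SQLite database as a stand-in; it is not the ASP.NET Razor code of the application, and the table name, column names and values are invented for the example.

```python
import json
import sqlite3

def query_as_json(connection, sql, params=()):
    """Run a SELECT statement and return the rows as a JSON array of objects."""
    cursor = connection.execute(sql, params)
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return json.dumps(rows)

# In-memory stand-in for the real database; table and columns are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE machines (id INTEGER, fuel_type TEXT, altitude REAL)")
conn.executemany(
    "INSERT INTO machines VALUES (?, ?, ?)",
    [(1, "gas", 12.5), (2, "liquid", 340.0)],
)
payload = query_as_json(conn, "SELECT id, fuel_type FROM machines ORDER BY id")
```

Returning an array of objects keyed by column name means the client can bind the response to visual elements directly, without a separate parsing step.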
3.3 Client side technologies
The application uses several technologies to create and structure the content and visual
representations. The structure of the web application is created in HTML5, which is the
most recent web standard of the markup language HTML (HyperText Markup Language)
[36].
The style and design of the web page is written in CSS (Cascading Style Sheets) [37]
that can access elements created by a markup language, such as HTML5, and change the
design (for example color, placement and size) of the elements.
To create the visual representations the JavaScript library D3.js (Data-Driven Documents) [38] was used. D3 allows for manipulation of documents through the Document
Object Model (DOM) [39] based on data.
The general approach in D3.js is to create SVG elements that, when combined, make up the
graphical representations. SVG, Scalable Vector Graphics, is a format for creating two-dimensional images with support for animation and interaction. D3 provides functionality
(such as mouse events for interaction) and components (such as axes and scales) ready to
be used in the application as well as the possibility of customizing the visualizations. The
functionality and components are combined to create different graphical representations.
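The idea behind a scale component can be illustrated with a minimal sketch. It is written in Python to show the concept only and does not reproduce the D3 API; the function name `linear_scale` and the chart size are assumptions. A linear scale maps a value from the data domain to a position in the drawing range.

```python
def linear_scale(domain, output_range):
    """Return a function mapping values in domain linearly onto output_range."""
    (d0, d1), (r0, r1) = domain, output_range
    if d0 == d1:
        raise ValueError("domain must have a non-zero extent")

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale

# Hypothetical axis: data values 0..200 drawn on a chart that is 400 pixels high.
# The output range is inverted because SVG y coordinates grow downwards.
y = linear_scale((0, 200), (400, 0))
```

With this scale, `y(0)` is 400 and `y(200)` is 0, so larger data values are drawn higher up in the chart.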
Furthermore, jQuery [40] and jQuery UI [41] were used to provide additional functionality and components to the web page. jQuery is a fast and feature-rich JavaScript
library that provides functionality for animation and event handling. jQuery UI is a set
of user interface interactions and widgets built on top of jQuery. One of many widgets
offered by jQuery UI is a range slider allowing users to filter data within a selected interval.
3.4 Application design
The general design and layout of the application has been constructed to support and
follow guidelines for usability (see section 2.2) and facilitate user perception and interaction, as discussed in section 2.1. The application is limited to the size and resolution of
the monitor, meaning that the representations and their option panels should fit on one
page with the scroll bars disabled.
The page is divided into three clearly defined areas allowing users to scan the page,
deciding what area to focus on. The different areas of the application are separated by
a light gray color, as seen in figure 3.4. Additionally, to create a clear visual hierarchy
of the page, the option panels are hidden at start, showing only buttons to toggle
their visibility. The buttons are positioned in corners of each area away from adjacent
areas, making clear what area each option panel belongs to. The one exception is
the lower right area, where the button is placed in the top right corner to be consistent
with the uppermost area to the right. With the option panels hidden, the visual noise is
reduced, guiding the users’ attention to the representations.
Furthermore, it should be obvious how to interact with each element. By changing
mouse cursor the user is informed about what actions are possible for the element in focus.
The different cursors used in the application, together with their associated interactions
are displayed in figure 3.5.
Figure 3.4: The page is divided into three clearly defined areas, each containing one representation.
Buttons to toggle the visibility of the option panels are each located in a corner of the area they
belong to.
Figure 3.5: Mouse cursors used by the application to guide user interaction and their related
actions.
Five of the golden rules defined by Shneiderman and Plaisant, section 2.2.1, have
been taken into account for designing the application:
• Prevent errors - To prevent errors, an intelligent filtering functionality is
available. When filtering on one parameter leaves no data points for certain values of other
parameters, the controls for those values are disabled so users cannot select or deselect them.
Furthermore, the parallel sets representation requires at least two dimensions with
at least one category each. By disabling controls for the filtering scenario that
violates this, errors occurring when no data is selected can be prevented. To prevent
errors during the initialization process of the application, a loading screen with
information about the application is visible until all data has been loaded and the
representations are created.
• Strive for consistency - By using a common color scheme for the representations
the design is more consistent. Moreover, the uniform use of mouse cursors
throughout the application and the uniform design of control panels and tooltips
also contribute to consistency.
• Informative feedback - Informative feedback can be connected to the third part of
the Visual Information Seeking Mantra: Details on demand, section 2.3.3. When
users interact with the application and hover over objects, they are provided with
tooltips containing detailed information. Additionally, the representations are coordinated and updated when the data is filtered, giving users immediate feedback on
their action. Filtering data for one representation affects all other representations
accordingly.
• Make sure users feel they are in control - The second part of the Visual Information
Seeking Mantra: Zoom and filter, section 2.3.3, is related to the feeling of control.
Initially users are given an overview of the data. By providing option panels with
filtering functionality and the ability to zoom in and out, users are able to simplify
the view, showing only data that is relevant for the task at hand. Furthermore,
by adding the opportunity to change the parameters of the axes and their arrangement,
users are given control over how the data is shown.
• Minimize the short-term memory load - In graphs providing a legend, the number
of items is limited according to the rule of thumb for information processing, section 2.2.1. The legends are sized dynamically according to the number of items.
However, to minimize the short-term memory load, users are limited to selecting
dimensions with at most seven categories, so that there are at most seven different colors
and their corresponding labels to keep in mind when investigating the representations.
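The checks behind the error prevention and short-term memory rules above can be sketched as follows. The Python code is a hypothetical illustration of the described behavior, not the application's JavaScript; all function names, field names and records are assumptions. Only the limit of seven categories and the requirement of at least two dimensions with at least one category each come from the text.

```python
MAX_LEGEND_CATEGORIES = 7  # limit stated in the text

def selectable_values(records, active_filters, parameter):
    """Return the values of parameter that still have data under the other filters."""
    remaining = [
        r for r in records
        if all(r[p] in allowed for p, allowed in active_filters.items() if p != parameter)
    ]
    return {r[parameter] for r in remaining}

def can_color_by(records, dimension):
    """A dimension may be used for coloring only if it has at most seven categories."""
    return len({r[dimension] for r in records}) <= MAX_LEGEND_CATEGORIES

def parallel_sets_selection_valid(selected):
    """selected maps each chosen dimension to its set of chosen categories."""
    return len(selected) >= 2 and all(len(cats) >= 1 for cats in selected.values())

# Made-up records: filtering on fuel "liquid" leaves no data for model "B".
records = [
    {"fuel": "gas", "model": "A"},
    {"fuel": "gas", "model": "B"},
    {"fuel": "liquid", "model": "A"},
]
enabled_models = selectable_values(records, {"fuel": {"liquid"}}, "model")
```

Controls for values that are not in the returned set would be disabled, and a deselection that makes `parallel_sets_selection_valid` return false would be blocked before it reaches the representation.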
4 ViSITelligence - Results
The result of the thesis is an interactive visualization application that fulfills the objectives presented in section 1.3. Images shown in this chapter are not created with the
maintenance data described in the next section due to confidentiality.
4.1 Data
The application uses two datasets where data objects of the datasets are connected by
an identification number for the gas turbine they are associated with. A gas turbine is
also referred to as a machine. A simplified view of the retrieved data attributes is found
in figure 4.1.
Figure 4.1: Five types of data attributes have been retrieved: climate, site and configuration
data for general information about a machine and its location, inspection data for the general
inspection information, and inspected components data for maintenance findings on components.
One dataset contains more than two hundred machines with specific information about
their location and configuration. The dataset consists of information about the altitude
and distance from sea of the machines’ location. It also contains the configuration settings
of the machine, such as the turbine model and fuel type used. For each machine, data collected during inspections is also provided, which includes the time the machine has been
in operation, among other things. Additionally, the number of inspections, both planned and
unplanned, and the number of removed components for each machine are provided. The
dataset contains primarily quantitative variables. However, some categorical variables
are provided to be able to group machines by, for example, coloring the visual entities
connected to the machine according to its turbine model or fuel type.
As opposed to the first dataset, the second dataset includes mostly categorical variables. The second dataset contains information from the inspections of the machines.
Both inspection and component specific information is provided, together with configuration settings for the machines and whether the machine is exposed to high levels of salinity.
The inspection specific information is, for example, whether the inspection was planned
or unplanned. Component specific data concerns the findings made when inspecting the components. This includes the remarks noted on the components and a judgment from the
inspector about the findings and whether the component can remain in operation or needs
to be replaced. Configuration settings are the same for this dataset as for the first. A
total of ten categorical variables are selected and grouped into unique combinations of the
values for the variables, together with an identifier for the machine they are connected
to. For each unique combination, the accumulated number of inspected components is
presented. Each of the ten variables, or dimensions, has between two and eight different
values.
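The grouping into unique combinations can be sketched as below. This Python sketch only illustrates the aggregation described above; the real data is confidential, so the field names and rows are invented, and the function name `aggregate` is an assumption.

```python
from collections import defaultdict

def aggregate(rows, variables, id_key="machine_id", count_key="components"):
    """Sum inspected components per machine and unique combination of values."""
    totals = defaultdict(int)
    for row in rows:
        key = (row[id_key],) + tuple(row[v] for v in variables)
        totals[key] += row[count_key]
    return dict(totals)

# Made-up rows with two of the categorical variables.
rows = [
    {"machine_id": 7, "planned": "yes", "judgment": "ok", "components": 3},
    {"machine_id": 7, "planned": "yes", "judgment": "ok", "components": 2},
    {"machine_id": 7, "planned": "no", "judgment": "replace", "components": 1},
    {"machine_id": 9, "planned": "yes", "judgment": "ok", "components": 4},
]
totals = aggregate(rows, ["planned", "judgment"])
```

Each key of the result corresponds to one unique combination for one machine, and the value is the accumulated number of inspected components for that combination.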
4.2 Options panels
The Gestalt laws of proximity, similarity and continuity, described in section 2.1.2, are
considered when designing and positioning the controls in the options panels, as shown
in figures 4.3, 4.4 and 4.5. Controls affecting the same parameter of the representation,
or with similar usage, are grouped together by separating them from other controls by
adding extra space. The law of continuity is utilized by the use of slider controls to depict
a value or interval within the range of the control. The content of each options panel is
described thoroughly in the next sections, dedicated to different visualization techniques.
4.3 Visualization techniques
Figure 4.2 shows an example of the application’s interface. This interface can be used to
gain a deeper understanding of the raw data described in section 4.1. Therefore, multidimensional visualization techniques are used to provide users with as much information
as possible. By using three different types of representations, multiple aspects of the
data can be displayed, both as an overview and in more detail. By connecting the three
representations, all views will display data for the same considered scenario.
To avoid problems with conjunctive search and similar colors (see section 2.1.1), only
one shape is used in the scatter plot to represent the different machines. The problem
of identifying objects with similar colors is solved by using a categorical color scale with
colors as distinct as possible from one another.
The two datasets, presented in section 4.1, are used for the different representations.
The first dataset, with primarily quantitative variables, is represented in a scatter plot and
a histogram (section 2.3.4) while the second dataset, with primarily categorical variables,
is demonstrated in a parallel sets representation (section 2.3.4). The scatter plot is
chosen to give the user the possibility to explore the relationship between quantitative
variables. As described in section 2.3.4, scatter plots can be used to examine whether
there is any correlation between two attributes (for example distance from sea and number
of unplanned inspections). The histogram is chosen to provide information about, for
example, how often a machine is inspected in general and whether there are any machines
that are inspected more or less often. The parallel sets representation is chosen to show
the general flow of the data and possible relationships between categories. Furthermore,
with its independent axes, as described in section 2.3.4, a large number of dimensions
can be shown.
Figure 4.2: The ViSITelligence application allows users to explore the data in order to find
patterns in the data and relationships between variables. The representations are coordinated
and share the same filters on the data.
4.3.1 Scatter plot
In the scatter plot, figure 4.3, each dot represents a machine and the colors represent the
possible categories for a selected dimension, which for example, can be the model of the
turbine or the fuel type. The position of each dot corresponds to the measures of the
selected variables for the axes using linear scales. Users have the opportunity to choose
the variables used for the scales on the axes as well as the categorical dimension used to
color the dots. An interactive legend is provided in the view to give the ability to filter
out data points represented by a certain value upon selection. All dots are initially of
the same size to limit the representation to three dimensions. Through the use of the
options panel, figure 4.3, the user can select a fourth dimension to be displayed as the
size of the dots. When using the fourth dimension, the user has the possibility to filter
out data outside of a selected interval for the variable.
Some machines have no value recorded for a given variable. Their
visibility can be toggled with a control in the options panel. If visible, they are drawn
with a value of zero. Machines with no value for one or both axes are
hidden by default to limit misinterpretation of their values.
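The handling of missing values described above can be sketched in a few lines of Python. This is an illustration only, not the application's code: the record layout (one dict per machine), the attribute names, and the names `visible_points` and `show_missing` are assumptions made for the example.

```python
def visible_points(machines, x_var, y_var, show_missing=False):
    """Return (name, x, y) tuples for the dots to draw.

    Machines lacking a value for either axis are skipped by default.
    With show_missing=True they are kept and the missing value is drawn as 0.
    """
    points = []
    for machine in machines:
        x = machine.get(x_var)
        y = machine.get(y_var)
        if (x is None or y is None) and not show_missing:
            continue  # hidden by default so a gap is not misread as zero
        points.append((machine["name"],
                       0 if x is None else x,
                       0 if y is None else y))
    return points


# Hypothetical sample data: M2 has no recorded distance from the sea.
machines = [
    {"name": "M1", "distance_from_sea": 12, "unplanned_inspections": 3},
    {"name": "M2", "distance_from_sea": None, "unplanned_inspections": 5},
]
print(visible_points(machines, "distance_from_sea", "unplanned_inspections"))
print(visible_points(machines, "distance_from_sea", "unplanned_inspections",
                     show_missing=True))
```

Hiding such machines by default is the safer choice, since a dot drawn at zero cannot be told apart from a machine whose recorded value really is zero.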
In addition to the interactive legend and the possibility to change the variables on
the axes, the chart provides tooltips: hovering over a dot with the
mouse pointer shows detailed information about that machine. While a dot is hovered,
guidelines are drawn from the dot to the axes to help the user read the
value on each axis. Zooming and panning are also supported, making it
possible to focus on a certain area of the graph.
Figure 4.3: The scatter plot is used to display the relationship and correlation between two
variables for the machines. The user can filter by items in the legend and change the dimension
for coloring the bubbles. The user may also change the variable used for the size of the bubbles,
which in this case is Variable10. For Variable10 only machines with a value between 6 and 17
are shown. The variables representing the axes can also be changed.
4.3.2 Histogram
The histogram, figure 4.4, is used to show the distribution of the current data for a
selected quantitative variable. This allows spotting outliers in the data and discovering
which machines are close to the average. Initially the histogram groups the data points
into bins of equal size. By default, the number of bins is half the number of possible values
of the variable used for the bins. Due to limited space, the maximum number of
bins that can be displayed is thirty. If more than thirty bins are used, the tick labels on
the x axis cannot easily be read. This issue is part of the future work discussed in section
6.1. The number of bins can be changed using a slider in the options panel.
This may result in differently sized intervals for the bins if the number of possible values
of the variable is not evenly divisible by the number of bins. Because the dataset used by
the representation is discrete, some intervals may include one value more or
less than others. However, the bins are drawn with equal width regardless of the
interval size. The variable used for the bins of the histogram can be changed in the
options panel, figure 4.4. In the current version, the vertical axis always shows
the number of machines.
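The binning rules above can be illustrated with a short Python sketch. It is not the application's implementation. It assumes integer-valued variables, rounds the default bin count down, and gives the extra value to the first intervals when the range does not divide evenly; the thesis does not specify these details, and all function names are hypothetical.

```python
def default_bin_count(num_values, max_bins=30):
    """Half the number of possible values, at least 1 and at most max_bins."""
    return max(1, min(max_bins, num_values // 2))


def bin_intervals(low, high, num_bins):
    """Split the integers low..high (inclusive) into consecutive intervals
    whose sizes differ by at most one value."""
    num_values = high - low + 1
    num_bins = max(1, min(num_bins, num_values))  # never more bins than values
    base, extra = divmod(num_values, num_bins)
    intervals = []
    start = low
    for i in range(num_bins):
        size = base + (1 if i < extra else 0)  # first `extra` bins get one more
        intervals.append((start, start + size - 1))
        start += size
    return intervals


def count_per_bin(values, intervals):
    """Number of machines whose value falls inside each interval."""
    counts = [0] * len(intervals)
    for value in values:
        for i, (first, last) in enumerate(intervals):
            if first <= value <= last:
                counts[i] += 1
                break
    return counts


intervals = bin_intervals(0, 9, 4)
print(intervals)
print(count_per_bin([0, 1, 1, 4, 7, 9, 9], intervals))
```

Ten possible values split into four bins cannot give equal intervals, so two bins cover three values and two cover two, which is the "one value more or less" effect described in the text.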
When the user hovers over a bin, it changes color to stand out from the other bins. A tooltip with detailed
information about the bin is also displayed. The tooltip includes
the actual interval of values covered by the bin, the number of machines, and the names
of at most twenty machines within the interval. If the bin holds more than twenty
machines, the remaining names are hidden and represented by three dots (...).
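A minimal Python sketch of how such a tooltip text could be assembled is given here. The layout of the text and the name `bin_tooltip` are assumptions made for illustration; only the limit of twenty names and the trailing three dots come from the description.

```python
def bin_tooltip(interval, names, max_names=20):
    """Build the tooltip text for a bin: interval, machine count, and at
    most max_names machine names, followed by "..." if names were cut."""
    low, high = interval
    lines = [f"Interval: {low}-{high}", f"Machines: {len(names)}"]
    lines.extend(names[:max_names])
    if len(names) > max_names:
        lines.append("...")
    return "\n".join(lines)


# Hypothetical bin holding 25 machines: only the first 20 names are listed.
print(bin_tooltip((6, 17), [f"Machine {i}" for i in range(1, 26)]))
```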
Figure 4.4: The histogram shows the distribution of number of machines for a variable. In the
options panel users have the opportunity to select the variable to be used for the distribution as
well as the number of bins.
4.3.3 Parallel sets
The parallel sets representation, figure 4.5, allows users to analyze flows and patterns in
the data. A wide flow further down in the chart implies that many observations
share the same combination of categories for the different dimensions. The ribbons in the
parallel sets representation are curved by default to improve the traceability of each flow.
Because curved ribbons change direction more smoothly than straight ones, they follow the Gestalt law of
continuity (section 2.1.2). However, straight ribbons make it
easier to spot correlation between variables, which is why it is possible to toggle between straight
and curved ribbons in the options panel.
Like the scatter plot and histogram, the representation provides a tooltip showing
detailed information when the user hovers with the mouse cursor over a ribbon. The
tooltip lists the categories along the flow, from the top dimension down to the dimension
of the hovered ribbon. It also gives the aggregated number of
maintenance findings in the flow, both as a count and as a percentage of the total currently
shown in the representation. Dimensions
and categories within dimensions can be rearranged to display the flows from different
perspectives.
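The aggregation behind the ribbons and their tooltips can be sketched as follows. This is an illustrative Python example under stated assumptions, not the code used in ViSITelligence: findings are assumed to be dicts mapping a dimension name to a category, and the names `ribbon_counts` and `ribbon_tooltip` as well as the sample dimensions are hypothetical.

```python
from collections import Counter


def ribbon_counts(findings, dimensions):
    """Count findings for every category path, from the top dimension
    down to each dimension in the given order."""
    counts = Counter()
    for finding in findings:
        path = ()
        for dimension in dimensions:
            path += (finding[dimension],)
            counts[path] += 1  # one ribbon per distinct path prefix
    return counts


def ribbon_tooltip(path, counts, total):
    """Categories along the flow plus count and share of all shown findings."""
    n = counts[path]
    return f"{' > '.join(path)}: {n} findings ({100 * n / total:.1f}%)"


findings = [
    {"fuel": "gas", "model": "A"},
    {"fuel": "gas", "model": "B"},
    {"fuel": "oil", "model": "A"},
    {"fuel": "gas", "model": "A"},
]
counts = ribbon_counts(findings, ["fuel", "model"])
print(ribbon_tooltip(("gas", "A"), counts, len(findings)))
```

Reordering the dimensions changes the path prefixes and therefore the ribbons, which is why rearranging the axes shows the same data from a different perspective.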
The options panel, figure 4.5, uses a three-column layout to make the most of the
limited space. The first column has a checkbox for toggling the curvature of the ribbons
and a button for resetting all applied filters. The other two columns contain
a checkbox for each dimension and an expandable menu that toggles the visibility of the
controls for the categories of that dimension. By deselecting checkboxes, the user can
remove uninteresting variables from all views.
Figure 4.5: The parallel sets representation shows the flow of the maintenance findings divided
into the different categories of each dimension. In addition to filtering the data by the different
dimensions and categories, users can also rearrange the axes for the dimensions and the categories to get a different perspective. A button for resetting the filters and a checkbox for toggling
the curvature of the ribbons are also provided.
4.4 Coordinated representations
The three representations used by the application are coordinated, meaning that filtering
or selecting objects in one graph affects the view of all other representations. When
a dot is selected in the scatter plot, it is highlighted by increasing its opacity while decreasing the
opacity of all other dots. The bin in the histogram connected to the selected dot (the bin whose
interval includes the machine's value) is highlighted by giving it a color that
differs from all other bins. In this way it is preattentively processed by the
user when changing focus to the histogram representation. The connection between the
representations also works in the opposite direction. When a bin is selected in the histogram,
all machines connected to that bin are highlighted in the scatter plot. In both cases, the
parallel sets representation is updated to display only data connected to the machine or
machines included in the selection. Figure 4.6 shows the connection when a bin in the
histogram has been selected.
Figure 4.6: The representations in the application are coordinated. By selecting a bin in the
histogram, the machines related to that bin are highlighted in the scatter plot to be easily distinguished and preattentively processed (section 2.1.1). The parallel sets representation is updated
to only show the flow for the machines connected to the selected bin.
Filtering data in the parallel sets representation, by selecting or deselecting checkboxes
in the options panel, updates the data used by the scatter plot and histogram. The data
shown includes only machines that have a connection to the combination of categories
currently shown. In this way uninteresting data points are removed from all views, allowing
for detailed analysis of subsets of the data.
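The coordination between the views can be sketched with a shared set of selected machine names. The Python example is a simplified illustration under assumptions: the data layout, the opacity value for dimmed dots, and the function names are hypothetical and are not taken from the application.

```python
def machines_in_bin(machines, variable, interval):
    """Names of the machines whose value for `variable` lies in the bin."""
    low, high = interval
    return {m["name"] for m in machines
            if m.get(variable) is not None and low <= m[variable] <= high}


def dot_opacities(machines, selected, dimmed=0.2):
    """Full opacity for selected dots, reduced opacity for the rest.
    With no selection, every dot keeps full opacity."""
    return {m["name"]: 1.0 if not selected or m["name"] in selected else dimmed
            for m in machines}


def findings_for(findings, selected):
    """Findings shown in the parallel sets view for the current selection."""
    if not selected:
        return list(findings)
    return [f for f in findings if f["machine"] in selected]


machines = [
    {"name": "M1", "hours": 4000},
    {"name": "M2", "hours": 9000},
    {"name": "M3", "hours": 4500},
]
findings = [
    {"machine": "M1", "remark": "crack"},
    {"machine": "M2", "remark": "wear"},
    {"machine": "M3", "remark": "crack"},
]

# Selecting the histogram bin 4000-5000 drives the other two views.
selected = machines_in_bin(machines, "hours", (4000, 5000))
print(dot_opacities(machines, selected))
print(findings_for(findings, selected))
```

Keeping the selection in one place and deriving each view from it means that a selection made in any view reaches the others through the same path.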
4.5 Use case scenario
Although the representations are connected at all times, they can be used individually
to answer questions. Suppose an employee wonders which remarks
most commonly cause an engine failure or exceed the inspection criteria. Using
ViSITelligence, the employee can focus on the parallel sets representation and filter out
uninteresting dimensions to see only the relationships between the attributes relevant to the
task at hand. The data is explored and the user gains insight into the relationship between
different remarks and an engine failure. The original question expands, and additional
dimensions are added to explore whether a certain configuration setting or exposure to
salt has any impact on the failure. The focus of attention changes to the scatter plot
and the histogram, where the machines connected to the flows shown in the parallel sets
are presented. Are they all close to the sea or located at a similar altitude? Are they on
average inspected within the same interval of hours in operation? These questions can
be answered by the scatter plot or the histogram.
A new question appears and the user resets all applied filters. Do any sites deviate
strongly from the average number of hours in operation between inspections? The user finds
some sites that differ from the norm. Do those machines have something in common?
The user selects the bin in the histogram and gains insight into their relationship according
to the attributes selected in the scatter plot and the flow of the data in the parallel sets.
The knowledge the user gains about the lifetime of the machines is communicated
to other parts of the company in order to improve the components and the configuration of
machines and thereby increase their lifetime.
5 Discussion
Many visualization techniques can be used for multivariate data exploration; three representations were chosen for this application. The number was limited to three
because of the limited screen space. Moreover, three representations made it possible to display multiple aspects of the data and compare both
categorical and quantitative attributes. The representations chosen show the relationship
between both categorical and quantitative variables as well as the correlation between
different attributes. Using the application can make it easier to find patterns in the
data, spot outliers, and identify correlation between attributes.
The choice of using one options panel for each representation can be questioned
in hindsight. With the representations coordinated and most of the controls acting on
all representations, it may have been better to create one options panel with all controls.
5.1 Visualization techniques
To display an overview of the data and the relationship between multiple dimensions, a
choice between techniques had to be made. One important question in this decision was whether it mattered most to explore relationships between categorical or between quantitative variables. Parallel
sets and mosaic plots are designed for categorical data, while SPLOMs and
parallel coordinates have their strengths in displaying quantitative and continuous data.
Due to the categorical nature of the configuration options for gas turbines, parallel sets
and mosaic plots were preferred in order to explore how different combinations of the categories affect the lifetime. The parallel sets representation was chosen over the mosaic
plot because length is a more accurate visual encoding than area. Additionally,
with the independent axes of a parallel sets representation, a large number of dimensions
can be shown. The curved ribbons make it easier to trace a certain flow, while the
straight ribbons make it easier to spot relationships between variables. With as many
dimensions in a mosaic plot, the areas would have been small and
the combinations of categories more difficult to interpret.
The scatter plot and histogram were chosen to make it possible, in a limited space, to
identify correlation between variables, spot outliers in the data, and distinguish machines
with similar values for different attributes. The possibility to change the attributes of the
axes makes it possible to identify correlation between multiple attributes in the scatter
plot, although only a single pair of attributes can be shown at a time, whereas a SPLOM
shows multiple pairs at once. With a SPLOM, however, it would be harder to
display the data in detail. The advantage of a histogram over a traditional
bar chart or its variations is the possibility to spot outliers and identify the most common
values for an attribute easily. Moreover, a bar chart would need some kind of
filtering of its own, because displaying and interpreting a large number of bars can be hard.
5.2 Implementation
The choice of writing the code for the scatter plot and histogram instead of using a framework with ready-made representations led to more work. However, only the functionality
needed for the representations was implemented, instead of using representations that
lack some of the needed functionality or include functionality that is not used. At the
same time, using a framework would probably have avoided the need to rewrite and
restructure the code for the scatter plot halfway through the project. The
code had to be rewritten to make it possible to add new functionality and connect the different
representations. After the refactoring, the scatter plot code became more
general, which made it easier to add new functionality.
As only a portion of the variables available in the database is used by the application, reusability has been an important factor when creating the representations. With
reusable representations almost any dataset can be used, provided it is structured like
the datasets used by the current version of the application. When the database grows
and more variables are to be selected, the representations are flexible enough to use the
extended (or new) dataset without further development.
5.2.1 Development process
The work was carried out following an agile methodology. Working alone
made the agile workflow harder to follow, since the responsibility for following
all principles rested on a single person. The weekly status meetings with Siemens have been a large part of the
agile workflow. The meetings facilitated discussions about the choices made, and valuable
feedback was given. Working in a pair would also have been beneficial for the
project’s organization and code quality. More concretely, with two developers it would
have been possible to develop the different representations in parallel with regular peer
reviews of the code.
5.2.2 Application design
When designing the application, only five of the eight golden rules for user-interface design
were considered. The five considered are presented in sections 2.2.1 and 3.4. The
remaining three rules were excluded from the thesis because of time constraints. They
were also regarded as extra functionality not required to fulfill tasks. The excluded
rules are:
• Cater to universal usability is about improving the user experience for a wide range
of users (from experts to new users). For example, keyboard shortcuts can be
considered for experienced users helping them speed up their use of the application.
Providing descriptions of the different parts of the application can help new users
to get familiar with the application. Shortcuts and extensive descriptions were not
considered because of time constraints and the fact that they are not required to
complete tasks.
• Design dialogs to yield closure is about showing the different steps of a predefined
process. For example, when buying something from a shop on the Internet, different
steps are followed. Firstly, the customer places the items in the shopping cart.
Secondly, the customer has to pay for the items and then receives a confirmation
of the purchase. For the current version of the application there is no predefined
process to follow, which is why it is not considered.
• Permit easy reversal of actions is about allowing users to undo their actions. It
can, for example, be the use of a button or keyboard shortcut that can be used to
undo the most recent action. This was considered as a functionality not necessary
for the current version of the application, because the user can undo their actions
by performing the latest action in reverse. That is, for example, to select the most
recent deselected option in the options panel.
These three rules can be considered in the future when the application is extended
with additional functionality to improve user experience.
5.3 Interactivity
The main performance issue in the current implementation arises when
the number of dimensions shown in the parallel sets representation is large in combination
with several categories for each dimension. When the user interacts with the representation,
for example by reordering the dimensions, a slight delay is introduced by the heavy
calculations needed to redraw the flows. This delay is not noticeable when fewer
dimensions are displayed, or when the number of categories within each dimension is low.
Changing parameters for the other representations is not affected by this performance
issue, except when filtering by a legend item in the scatter plot.
Interactivity is an important aspect of an application that will be used for analysis.
Interaction with the representations is instantaneous, except in the case of a large
number of dimensions and categories described above. All interactions use transitions
instead of immediate updates, so that the user can trace the movement or resizing of an element.
By allowing users to filter among data attributes and change attributes for the
axes, the application gives them more control over the exploration of data. The selection functionality
makes it possible to, for example, select a customer's machine in the scatter plot and
interpret the flow and correlation between attributes in the parallel sets representation
during discussions with the customer.
With the interactivity of the application and the possibility to show attributes and/or
machines of interest for the task at hand, users can gain insights regarding the lifetime
of the turbines and their components. Questions may also be answered by interacting with
the application and interpreting the visual representations. The interpretation can also
raise awareness and prompt new questions that have not been thought of before.
5.4 Siemens
The stakeholders at Siemens are satisfied with the result and want to continue the work
with the application. Work is in progress to improve the quality of the data in the
database, which has been one of the limitations presented in section 1.4. The application
is to be presented to employees in several Siemens departments.
6 Conclusions
The end result is a visualization application for the web, which will provide Siemens with
valuable insights into their turbines’ maintenance data. The application uses visualization
techniques appropriate to the collected data and to the purpose of data exploration.
When an employee at Siemens has questions about different configurations of the
turbines and how they affect the lifetime, the application can be used to explore the data
and find answers to these questions. It can also prompt new
questions that arise from answering others, when more insight has been gained
or upon discussion with colleagues. The application has been developed with a focus on
usability and perception. This master's thesis is a step towards further understanding and
improvement of the lifetime of Siemens gas turbines.
Overall, the application fulfills the objectives presented in section 1.3. The application
allows employees at Siemens to explore and analyze the maintenance data in order to gain
insights regarding the lifetime of the gas turbines.
6.1 Future work
The current version of ViSITelligence is limited to the selection of only one dot in the
scatter plot or one bar in the histogram. A valuable improvement would be to allow the
user to select multiple dots or bars in order to increase the ability to customize selections.
With multiple selections the user can, for example, select machines of a certain customer
to explore data for only selected customers.
The process of extracting data from the database and making it accessible to the application is time consuming. In a future version of ViSITelligence, another method for
formatting the data as JSON should be used to speed up the initial loading of the
application. Further performance improvements are also needed
to reduce delays on user interaction when showing multiple dimensions
with several categories in the parallel sets representation.
Additionally, improvements to the connection between the representations can be
made. In the current version, selection is required to highlight or filter elements in the
representations. In a future version, hovering over an element can highlight elements in
the connected representations.
The tick labels for the histogram are not adjusted according to the available space.
This means that they can overlap and become more difficult to read. The current
solution of limiting the maximum number of bins to thirty keeps the tick labels
readable but prevents viewing the distribution in more detail for variables
with a wide range of possible values. Research is needed to find a better way to
display the labels when they would overlap. One possible solution is to hide every
other tick label, but in that case the user needs to either hover with the mouse over a
bin to read the actual interval of values connected to the bin or infer the
values from the surrounding tick labels.
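The idea of hiding tick labels can be generalized from "every other" to "every n-th" label, with n chosen from the available space. The following Python sketch illustrates this under assumptions: every label is taken to need the same width in pixels, and the function names and the pixel values in the example are hypothetical.

```python
import math


def label_step(num_labels, label_width, axis_width):
    """Smallest step n such that showing every n-th label fits the axis."""
    labels_that_fit = max(1, axis_width // label_width)
    return max(1, math.ceil(num_labels / labels_that_fit))


def thin_tick_labels(labels, step=2):
    """Keep every step-th label and blank out the others.
    The default step of 2 hides every other label."""
    return [label if i % step == 0 else "" for i, label in enumerate(labels)]


labels = ["0-1", "2-3", "4-5", "6-7", "8-9"]
print(thin_tick_labels(labels))

# 40 bins with 30 px wide labels on a 600 px wide axis.
print(label_step(40, 30, 600))
```

Blank strings keep the tick positions unchanged, so the bins stay aligned with the labels that remain.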
Another addition would be a time parameter for filtering and showing data from inspections within
an interval in time, such as all inspections before 2010 or inspections between 2008 and
2012. Filtering by year would make it possible to identify trends or changes in the
relationships over the years of inspections.
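Such a time filter could look like the following Python sketch. It is a hypothetical illustration of the proposed feature, not existing functionality: the `date` field, the function name, and the choice to treat both year bounds as inclusive are assumptions.

```python
from datetime import date


def inspections_in_years(inspections, first_year=None, last_year=None):
    """Keep inspections whose year lies in [first_year, last_year].
    A bound of None leaves that side of the interval open."""
    kept = []
    for inspection in inspections:
        year = inspection["date"].year
        if first_year is not None and year < first_year:
            continue
        if last_year is not None and year > last_year:
            continue
        kept.append(inspection)
    return kept


inspections = [
    {"machine": "M1", "date": date(2007, 5, 2)},
    {"machine": "M2", "date": date(2009, 11, 20)},
    {"machine": "M1", "date": date(2012, 3, 14)},
    {"machine": "M3", "date": date(2014, 8, 1)},
]

# All inspections before 2010, then inspections between 2008 and 2012.
print(inspections_in_years(inspections, last_year=2009))
print(inspections_in_years(inspections, first_year=2008, last_year=2012))
```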
When interacting with the application, a user may sometimes want to store
the current selection and graphs, either for comparison with another selection or to
restore them later. As the database grows and the number of attributes increases, a selection
made on the current dataset should be stored so that the same selection can be restored on the
updated dataset. Functionality for creating a text file with images can be implemented.
A dashboard where users can select the data source to use and which attributes to
retrieve and load into the visual representations is another interesting future addition to the
application. Additionally, a login system can be built in order to provide
different amounts of data to different users. For example, a customer given access to the application
should only see the machines owned by that customer. With a login system,
sessions can be stored per user, which makes it possible to store data selections for each
user.
Bibliography
[1] Siemens Industrial Turbomachinery AB. Siemens Industrial Turbomachinery AB Hem; 2009. [Retrieved: 2015-02-10]. Available from: http://www.sit-ab.se/.
[2] Siemens AG. Siemens Global Website; 2015. [Retrieved: 2015-02-10]. Available
from: http://www.siemens.com.
[3] Johnson J. Designing with the Mind in Mind: Simple Guide to Understanding User
Interface Design Guidelines. 2nd ed. Morgan Kaufmann Publishers Inc.; 2014.
[4] Healey CG, Enns JT. Attention and visual memory in visualization and computer graphics. IEEE Transactions on Visualization and Computer Graphics.
2012;18(7):1170–1188.
[5] Ware C. Visual Thinking for Design. Morgan Kaufmann Publishers Inc.; 2008.
[6] Ware C. Information Visualization: Perception for Design. 3rd ed. Morgan Kaufmann Publishers Inc.; 2013.
[7] Krug S. Don’t Make Me Think: A Common Sense Approach to the Web. 2nd ed.
Thousand Oaks, CA, USA: New Riders Publishing; 2005.
[8] Shneiderman B, Plaisant C. Designing the user interface. 4th ed. Pearson Addison
Wesley, USA; 2005.
[9] Spence R. Information Visualization: Design for Interaction. 2nd ed. Upper Saddle
River, NJ, USA: Prentice-Hall, Inc.; 2007.
[10] Heer J, Bostock M, Ogievetsky V. A tour through the visualization zoo. Communications of the ACM. 2010;53(6):59–67.
[11] Few S. Eenie, Meenie, Minie, Moe: Selecting the Right Graph for Your Message.
Intelligent Enterprise. 2004. [Retrieved: 2015-02-10]. Available from: http://www.perceptualedge.com/articles/ie/the_right_graph.pdf.
[12] Few S. Quantitative vs. Categorical Data: A Difference Worth Knowing. DM Review.
2005. [Retrieved: 2015-02-10]. Available from: http://www.perceptualedge.com/articles/dmreview/quant_vs_cat_data.pdf.
[13] Yau N. Visualize This: The FlowingData Guide to Design, Visualization and Statistics. Wiley Publishing, Inc.; 2011.
[14] Shneiderman B. The eyes have it: A task by data type taxonomy for information
visualizations. In: IEEE Symposium on Visual Languages; 1996. p. 336–343.
[15] Craft B, Cairns P. Beyond guidelines: what can we learn from the visual information seeking mantra? In: Proceedings of the Ninth International Conference on
Information Visualisation. IEEE; 2005. p. 110–118.
[16] Yau N. Data Points: Visualization That Means Something. Wiley Publishing, Inc.;
2013.
[17] Khan M, Khan SS. Data and information visualization methods, and interactive mechanisms: A survey. International Journal of Computer Applications.
2011;34(1):1–14.
[18] Chan WWY. A survey on multivariate data visualization. Department of Computer Science and Engineering Hong Kong University of Science and Technology.
2006;8(6):1–29.
[19] Kosara R, Bendix F, Hauser H. Parallel sets: Interactive exploration and visual
analysis of categorical data. Transactions on Visualization and Computer Graphics.
2006;12(4):558–568.
[20] Cuzzocrea A, Zall D. Parallel Coordinates Technique in Visual Data Mining: Advantages, Disadvantages and Combinations. In: IEEE 17th International Conference
on Information Visualisation (IV); 2013. p. 278–284.
[21] Few S. Multivariate analysis using parallel coordinates. Perceptual Edge. 2006.
[22] Johansson J, Ljung P, Jern M, Cooper M. Revealing structure within clustered
parallel coordinates displays. In: IEEE Symposium on Information Visualization;
2005. p. 125–132.
[23] Kosara R. Turning a table into a tree: Growing parallel sets into a purposeful
project. In: Steele J, Iliinsky N, editors. Beautiful Visualization: Looking at Data through the Eyes of Experts.
O’Reilly; 2010. p. 193–204.
[24] Kosara R, Ziemkiewicz C, Mako III FJ, Miles J, Seong KT. Parallel Sets in the Real
World: Three Case Studies; 2009.
[25] Bendix F, Kosara R, Hauser H. Parallel sets: Visual analysis of categorical data. In:
IEEE Symposium on Information Visualization; 2005. p. 133–140.
[26] Few S. Are Mosaic Plots Worthwhile?; 2014. [Retrieved: 2015-03-03].
Available from: http://www.perceptualedge.com/articles/visual_business_intelligence/are_mosaic_plots_worthwhile.pdf.
[27] Bieh-Zimmert O, Koschtial C, Felden C. Representing Multidimensional Cancer
Registry Data. In: Proceedings of the 13th International Conference on Knowledge
Management and Knowledge Technologies. ACM; 2013. p. 35.
[28] Godinho PIA, Meiguins BS, Goncalves Meiguins A, Casseb do Carmo RM,
de Brito Garcia M, Almeida LH, et al. PRISMA-A multidimensional information visualization tool using multiple coordinated views. In: 11th International Conference
on Information Visualization. IEEE; 2007. p. 23–32.
[29] Theus M. Interactive data visualization using Mondrian. Journal of Statistical
Software. 2003;7(11):1–9.
[30] Hauser H, Ledermann F, Doleisch H. Angular brushing of extended parallel coordinates. In: IEEE Symposium on Information Visualization; 2002. p. 127–130.
[31] Kandogan E. Star coordinates: A multi-dimensional visualization technique with
uniform treatment of dimensions. In: Proceedings of the IEEE Information Visualization Symposium. vol. 650; 2000. p. 22.
[32] Pfleeger SL, Atlee JM. Software Engineering: Theory and Practice. 4th ed. Pearson
Education Inc.; 2010.
[33] Microsoft. The Official Microsoft IIS Site; 2015. [Retrieved: 2015-03-15]. Available
from: http://www.iis.net/.
[34] Microsoft. ASP.NET Web Pages; 2015. [Retrieved: 2015-03-15]. Available from:
http://www.asp.net/web-pages.
[35] Microsoft. Introduction to ASP.NET Web Programming Using the Razor Syntax
(C#); 2014. [Retrieved: 2015-03-15]. Available from: http://www.asp.net/web-pages/overview/getting-started/introducing-razor-syntax-(c).
[36] W3C. HTML5; 2014. [Retrieved: 2015-02-10]. Available from: http://www.w3.org/TR/html5/.
[37] W3C. Cascading Style Sheets; 2015. [Retrieved: 2015-02-10]. Available from: http://www.w3.org/Style/CSS/.
[38] Bostock M. D3.js - Data-Driven Documents; 2015. [Retrieved: 2015-02-10]. Available
from: http://d3js.org/.
[39] W3C. W3C Document Object Model; 2005. [Retrieved: 2015-02-10]. Available from:
http://www.w3.org/DOM/.
[40] The jQuery Foundation. jQuery; 2015. [Retrieved: 2015-02-10]. Available from:
https://jquery.com/.
[41] The jQuery Foundation. jQuery UI; 2015. [Retrieved: 2015-03-17]. Available from:
https://jqueryui.com/.