Institutionen för datavetenskap
Department of Computer and Information Science

Final Thesis

Measuring and Analysing Execution Time
in an Automotive Real-Time Application

by

Henrik Liljeroth

LIU-IDA/LITH-EX-A--09/050--SE
2009-11-10

Linköping University
Department of Computer and Information Science
SE-581 83 Linköping, Sweden
Supervisor: Simin Nadjm-Tehrani, IDA, Linköping University
Bengt Arne Nyman, Autoliv Electronics
Andrzej Bednarski, Autoliv Electronics
Markus Isaksson, Autoliv Electronics
Examiner: Simin Nadjm-Tehrani, IDA, Linköping University
Abstract

Autoliv has developed the Night Vision system, a safety system for use in cars to improve the driver's situational awareness during night conditions. It is a real-time system that is able to detect pedestrians in the traffic environment and issue warnings when there is a risk of collision. The timing behaviour of programs running on real-time systems is vital information when developing and optimising both hardware and software. As a part of further developing their Night Vision system, Autoliv wanted to examine the detailed timing behaviour of a specific part of the Night Vision algorithm, namely the Tracking module, which tracks detected pedestrians. In parallel, they also wanted a reliable method for obtaining timing data that would work for other parts of that system as well, or even for other applications.

A preliminary study was conducted in order to determine the most suitable method of obtaining the desired timing data. This resulted in a measurement-based approach using software profiling, in which the Tracking module was measured using various input data. The measurements were performed on simulated hardware, using both a cycle-accurate simulator and measurement tools from the manufacturer of the system CPU, as well as tools implemented specifically to handle input and output data.

The measurements resulted in large amounts of data used to compile performance statistics. Using different scenarios in the input data, we were able to obtain timing characteristics for several typical situations the system may encounter during operation. By manipulating the input data we were also able to observe general behaviour and achieve artificially high execution times, which serve as indications of how the system responds to irregular and unexpected input data.

The method used for collecting timing information was well suited to this particular project. It provided the possibility to analyse behaviour in a better way than other, more theoretical, approaches would have. The method is also easily adaptable to other parts of the Night Vision system, or other systems, with only minor adjustments to the measurement environment and tools.

Keywords: WCET, Dynamic timing analysis, CPU load
Acknowledgments

This project was carried out entirely at Autoliv Electronics AB in Linköping, from November 2008 to May 2009.

I would like to thank all the employees and consultants who provided valuable information on the system and associated tools. I would especially like to thank my supervisors at Autoliv, Andrzej Bednarski, Markus Isaksson and Bengt Arne Nyman, for providing support and assistance during the course of the project.
Contents

1 Introduction
  1.1 Background
    1.1.1 Real-time Systems and Timing in General
  1.2 Objectives
  1.3 Problem Description
  1.4 Limitations
  1.5 Definitions and Abbreviations
    1.5.1 Definitions
    1.5.2 Abbreviations
  1.6 Methods
    1.6.1 Test Set
    1.6.2 Test Environment
    1.6.3 Measurement Procedure
  1.7 Thesis Structure

2 System Description
  2.1 General Overview
  2.2 Hardware
    2.2.1 Camera
    2.2.2 ECU
    2.2.3 Video Display
  2.3 Software Architecture

3 Method Study
  3.1 Worst-Case Execution Time
    3.1.1 Static WCET
    3.1.2 Analysis tools for static WCET
    3.1.3 Measurement-based (dynamic) WCET
    3.1.4 Comparison of Static and Dynamic Methods
  3.2 Context Dependencies
  3.3 Code Coverage
  3.4 Cycle Profiling
  3.5 Built-in Timing Functions
  3.6 Adopted Approach

4 Environment Setup and Preparations
  4.1 Test Environment
    4.1.1 Module Test Function
    4.1.2 Hardware Test Bench
    4.1.3 Cycle Accurate Simulator
    4.1.4 Hardware Profiling Limitations
    4.1.5 Simulator vs. Hardware
  4.2 Test Data
    4.2.1 Tools for Generating and Manipulating Test Data
    4.2.2 Recorded Test Data
    4.2.3 Modified Recorded Test Data
    4.2.4 Generated Test Data
    4.2.5 Generating Test Data and Processing Results
  4.3 Tools
    4.3.1 Automatic Profiling Script
    4.3.2 CSV-file Parser
    4.3.3 Data Extractor

5 Performance Measurements
  5.1 Test Cases
    5.1.1 Function and Loop Clock Cycle Comparison
    5.1.2 Measurements with Recorded Data
    5.1.3 Measurements with Modified Recorded Data
    5.1.4 Measurements with Artificially Generated Data
  5.2 Results of Measurements
    5.2.1 Code Coverage Results
    5.2.2 Cycle Profiling Results

6 Data Analysis
  6.1 Result Accuracy Concerns
  6.2 Input Data Analysis
    6.2.1 Full Sequences
    6.2.2 Frame-by-frame
  6.3 Code Coverage Data
    6.3.1 Full Sequence Measurements
    6.3.2 Frame-by-frame
  6.4 Cycle Profiling Data
    6.4.1 Full Sequence Measurements
    6.4.2 Frame-by-frame
  6.5 Improvements
    6.5.1 Optimisation Possibilities

7 Final Discussion
  7.1 Overall Conclusions
  7.2 Detailed Conclusions
    7.2.1 Main Method: Measurements vs. Static Analysis
    7.2.2 Environment: Simulator vs. Hardware
    7.2.3 Performance Study: Preparations
    7.2.4 Performance Study: Measurements
    7.2.5 Performance Study: Results
  7.3 Future Work

Bibliography

A Script Code
  A.1 testingscript.py
  A.2 fileparser.py
  A.3 DataGrabber.m
  A.4 DataCutter.m

B Measurement Results Definitions
  B.1 Code Coverage Summary Worksheet
  B.2 Code Coverage Function Worksheet
  B.3 Profiler Results
Chapter 1
Introduction
This report summarises the results of a Master's thesis project (30 credit points) carried out as partial fulfilment of a degree in computer science and engineering at Linköping University.
Chapter 1 describes the background, the main issues present and the purpose of the investigation. Also included are sections describing limitations to the project scope, terminology, and the methods used to obtain the results. Finally, there is a section describing the overall structure of this report.
1.1 Background
Anyone who has ever driven a car at night has probably noticed the increased difficulties involved compared to driving in daylight. The darkness makes it more difficult to distinguish features on and at the side of the road, and makes the effective distance you can see in detail much shorter. The difficulties can be further magnified by other factors, such as bad weather or approaching vehicles. As a result, accidents are more likely to happen in these situations. The greatest concern here is not the traffic environment itself, since that is most often static. Nor is it other vehicles, since they have their own lights to make them visible, and appear at predictable locations. The biggest risk is accidents involving pedestrians and animals [10]. These are often hard to detect for a number of reasons: they are less predictable in both location and behaviour, and they are also often very hard to distinguish from the background due to a lack of lights and reflectors, and/or because they wear dark clothes.
As a way to reduce the risk of accidents due to poor light conditions, Autoliv
has developed the Night Vision system to improve the driver’s vision at night.
The company Autoliv Inc. is a worldwide leader in automotive safety. They develop and manufacture safety products for all the major car manufacturers in the
world today. Among the products are airbags, seatbelts, steering wheels and safety
electronics [3]. One example of such a safety system is the Night Vision system.
The system consists of three main parts: an infrared emission detection camera, an electronic control unit (ECU) and a video display. It enables the driver to see in complete darkness without any help from infrared lamps or other illumination. When used together with the normal headlights of the car¹, the system can greatly increase the driver's situational awareness.
Figure 1.1 shows the difference between an IR camera image and a colour camera
image of the same scene. Pedestrians are clearly visible in the IR camera image
(upper part), but much harder to distinguish in the colour camera image (lower
part).
Figure 1.1. Comparison between infrared and normal vision. Source: Autoliv
The first generation of this system is comparatively simple. The images taken by the camera are filtered and adjusted to give an accurate representation of the environment in front of the car on the video screen for the driver. The system is also calibrated to be most sensitive in the spectrum of heat emissions from humans and animals, to make sure such objects have good visibility. This system is the foundation of the next generation of Night Vision.
The second generation of the system expands on the original functionality by introducing automatic detections and warnings for pedestrians that can be displayed to the driver. This is done by running detection algorithms on each frame of video data, at a rate of 30 frames/second.
¹ Although the Night Vision system itself functions without the headlights, it is not intended to replace them. For safety reasons, headlights should always be used in conjunction with the Night Vision system.
1.1.1 Real-time Systems and Timing in General
A real-time system is a system where not only the accuracy of the result is important, but also the time in which it is generated [5]. The Night Vision system
is a real-time system, which means it has to finish processing and deliver results
within certain time restrictions, in this case determined by the frame rate. It is
also what is known as a soft real-time system.
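To make the deadline implied by the frame rate concrete: at 30 frames/second, each frame must be fully processed within

    1/30 s ≈ 33.3 ms.

If, purely for illustration, the DSP ran at a clock rate of 600 MHz (the actual clock rate is not stated in this report), that deadline would correspond to a budget of roughly 0.0333 × 600 × 10⁶ ≈ 20 million clock cycles per frame, shared by all modules in the algorithm.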
In soft real-time systems the deadlines are important, but they are not absolutely critical. Missed deadlines often mean degraded performance, but the system is still functioning. Hard real-time systems, on the other hand, have much higher demands on delivering results within the deadlines. A missed deadline here would mean a complete system failure with disastrous consequences. Systems that are generally classified as hard real-time systems include, for example, brake systems for cars, flight control systems in aircraft, and other safety-related systems [5].
In real-time systems, timing is very important. Information about how long different processes take to complete their execution is often critical to designing a correctly functioning system. Tasks in real-time systems usually have varying execution times depending on input data and environment behaviour. Of these execution times, the shortest is called the Best-Case Execution Time (BCET), and the longest is called the Worst-Case Execution Time (WCET).
Figure 1.2 shows an example task and its different execution times, along with
the most important parameters for timing analysis. The columns visualise different executions of the task, which vary in execution time due to e.g. input data.
To be safe, a WCET estimate must be higher than the actual WCET. Similarly,
a BCET estimate must be lower than the actual BCET to be safe.
Figure 1.2. Execution times and estimates. The figure is a simplified version of Fig. 1
in [16].
The Worst-Case Execution Time is usually the most important aspect, since if a guarantee of WCET can be given, it is safe to use this time as a base for further design decisions. Unfortunately, determining the exact WCET is not a trivial task. There are, however, several methods that can give more or less accurate estimates. An intuitive approach is to simply measure the execution time for a number of test cases. But there are also methods to estimate WCET by analysing the code itself, with no need to run it [16]. More detailed descriptions of these methods can be found in Section 3.1.
1.2 Objectives
Autoliv's motivation behind this study is ultimately product development. More efficient algorithm execution, more functionality and lower hardware costs are all examples of areas where future development may be focused. In order to make informed decisions regarding where to focus efforts and what changes need to be made, a detailed study of the current system is important. This project aims to be a step in this process. It should provide statistics on the current system and evaluate methods to use for this and future studies.
The main focus of this thesis is to investigate and analyse processor utilization on the ECU by a specific part of the system algorithm. We mainly consider CPU-time usage by measuring clock cycle consumption and try to establish the worst-case performance. The work is mostly focused on studying the changes in clock cycle consumption based on different input data (situation behaviour). The worst-case performance is secondary to this, but interesting nevertheless. Some general measurements have previously been performed to provide an overview of the hardware usage in some situations. The goal for this thesis work is to investigate what is happening in the ECU on a more detailed level by gathering data for analysis and compiling statistics on the execution. Drawing detailed conclusions from the collected data, and determining how the results may affect future development of the system, is outside the scope of this project. It is a matter for Autoliv's system designers.
There are several different modules in the system algorithm. They all share the available CPU time, distributed among the modules according to what they need. There is also a time limit within which the entire algorithm has to complete in order to maintain the functionality at the rate of 30 frames/second. The problem is not the time limit itself, since the current system performs within the limits. The goal is rather to analyse the execution of one particular module (the Tracking module) and obtain data regarding how the module performs in different situations. This analysis will provide critical information about the system that can be used in future development.
As mentioned, the focus of this thesis is to help provide answers to questions related to CPU utilization. The following questions illustrate a more long-term purpose of the investigations. Some of them can be answered by this study, but some may require more work and could be the subject of future studies. Therefore, not all of them will necessarily be answered directly in this report.
• What are the characteristics of the current CPU utilization?
• Is it possible to implement more features?
• Is the current hardware more efficient than necessary? Can it be replaced?
• Where should optimization efforts be concentrated?
• What kind of optimizations are possible?
Most important are the current characteristics. The study focuses on providing detailed information on CPU cycle consumption in different situations using different input data. This data helps with possible answers to the last two questions in the list. The questions about whether or not it is possible to implement more features, or make modifications to the hardware, are not possible to cover given the project scope and time available (see Section 1.4).
Another objective of this work is to develop tools and/or methods that can be
used in future analysis of other parts of the system, or evaluation measurements
on different target hardware than the one currently used.
In short, there are two main goals for this project that need to be fulfilled:
• The first goal is to gather data and compile statistics for analysis. Detailed
analysis of the data is neither possible given the time available, nor desired
at this early stage of investigations of the module. There should however be
some superficial analyses made where necessary to decide how to continue
the study and to understand the statistics.
• The second goal is to find and/or develop a method for this study that will be possible to use in future similar studies. This includes both the method itself and the tools that may be needed to use it. The method is needed to complete the first goal.
1.3 Problem Description
To reach the goals described in Section 1.2, a method had to be identified that suits the area of application. This method should be used to gather data, which is then analysed to form performance statistics. The work can be divided into five main areas, listed below.
• System study
• Method study
• Environment setup and preparations
• Simulations and measurements
• Data analysis
First, a study had to be performed on the Night Vision system. This was necessary in order to know how the system operates, what the important aspects are,
etc. The information gathered here helped to make informed decisions on what
methods to use for the performance study, what parameters to focus on, and so on.
As there are different approaches available to study the performance of real-time systems, a study of these had to be performed in order to choose the most suitable method for this particular project. The method should be easy to use, but still allow detailed results to be extracted. It should cover as many of the interesting areas as possible. It should also be possible to adapt it to work outside this project (e.g. measurements of other modules besides the Tracking module). The method study also provided knowledge of what different tools are available, and how to use them.
To be able to perform the study, a suitable environment had to be constructed. This includes separating the module that is to be studied from the rest of the system, and putting it in its own test environment. This test environment did not exist, and had to be implemented. Other preparations included building a set of input data, and implementing tools to handle things outside the actual performance test, such as pre-processing of input data.
When the environment was properly set up and prepared, the actual study could
be performed. In this project it consisted of several measurements of the module
while it was executing on simulated hardware, using the previously designed set
of input data. This is called profiling, and uses software tools developed by the
CPU manufacturer.
The data acquired from the measurements was then compiled and analysed. This made it possible to visualise the results and draw conclusions regarding the performance characteristics.
1.4 Limitations
This project has been limited to closely studying only the software part of the Night Vision system. Although interesting, the hardware part is fixed, since this is a commercial product at a late stage of development. Therefore, modifying the hardware itself is not possible in the current system. However, the information provided by this analysis could indicate which areas of the hardware (if any) can be optimized in future generations.
Another important limitation lies in the algorithm modules. There are many different software modules in the system, but not all of them were analysed. There are two main reasons for this. One is that there is simply not enough time allocated to this project to perform a total analysis of all the modules. The other reason is that many modules have a more easily predictable behaviour or use only a small fraction of the available resources. Also, their time consumption often depends only on each individual frame. Modules that have more complex dependencies, such as previous frames and internal states, are far more interesting. Although every effort has been made to cover as much as possible in the analysis, an exhaustive analysis would be overly time consuming, so the measurements have been prioritized according to relevance to the main focus.
1.5 Definitions and Abbreviations
There are several abbreviations and other terminology used in this report that
may not be common knowledge to readers. This section provides brief general
explanations of these in alphabetical order.
An important note is on the use of the word algorithm. Many parts of the system have their own algorithms that are separate from each other, and there is
an encompassing algorithm for the complete system. When the word algorithm is
used throughout this report it normally refers to the complete Night Vision system
algorithm, unless stated otherwise in the text.
1.5.1 Definitions

Branch delay slot: An instruction slot in a processor where it is possible to schedule instructions while waiting for the branch destination to be known. These instructions are always executed whether the branch is taken or not.

Code Composer Studio: An integrated development environment (IDE) developed by Texas Instruments that is available for use with their processors [6]. It is used for writing, debugging and analysing code for processors, but also includes numerous tools for analysing how the code performs on specific processors.

Control Flow Graph: Visual representation of all paths in a program that might be traversed during execution.

Cross-path: A single data bus connected to all registers, allowing the two sets of functional units of the CPU to use each other's registers.

Execution packet: A group of instructions that are to be executed simultaneously by the different functional units in the CPU. More information in [1].

Frame: One of the many single images that together make up a video sequence.

Greyscale image: An image where each pixel only carries intensity information, and no colours. The result is an image composed exclusively of shades of grey, with black and white as the extremes.

HEX-file: Hexadecimal object file format. It is used for storing binary data in ASCII representation [11].

Infeasible path: Path that is executable according to the control-flow graph structure, but not feasible when considering possible input data and program behaviour given this data [14].

Infrared radiation: Radiation with frequencies just below the colour red in the visible spectrum. Also known as heat radiation.

Profiling: Investigation of a program's behaviour using information gathered as the program executes. A tool for doing this is generally called a profiler.

Profile range: Two points in source or assembler code between which the profiler is set to collect measurement data.

Stall cycle: A CPU cycle in which the CPU does not execute instructions as intended. There may be several reasons for this, including waiting for previous operations to finish or waiting for memory access.

Test port: A port on the ECU used to extract binary data from a running system. The data can come from a number of different stages in the algorithm.

1.5.2 Abbreviations

CAN: Controller Area Network
DSP: Digital Signal Processor
UUT: Unit Under Test
ECU: Electronic Control Unit
FIR: Far InfraRed
IDE: Integrated Development Environment
LVDS: Low Voltage Differential Signaling
VP: Vision Processor
WCET: Worst Case Execution Time

1.6 Methods
To get WCET estimates for programs, there are two main categories of methods available [16]. One is static WCET analysis, where the code itself is analysed (often using a specialised tool), but not necessarily executed at all. The other is a practical measurement-based approach, where the software is run with a suitable set of input data designed to give a good estimate. Both approaches have their own benefits and drawbacks, which are explained in more detail in Section 3.1. Given the characteristics of these approaches and the overall goal of the study, we decided to use a measurement-based approach.
1.6.1 Test Set
The input data for the measurements consists of a number of video files recorded with a test car under different conditions. To get as complete data as possible, it is important to have a test set that represents every possible situation and environment that the system may operate in during live conditions. This includes different environments, such as countryside roads, highways and cities, as well as different situations, for example single pedestrians, multiple pedestrians, pedestrians entering the vehicle path, etc.
Autoliv has a database with a substantial amount of video files that was used to build a suitable test set. It mostly consists of files recorded in real traffic situations at several different locations around the world, but some of it consists of arranged scenes with actors in more controlled conditions.
1.6.2 Test Environment
The test setup reflects the car setup of the Night Vision system as closely as practically possible. There are two alternative ways to set up the test environment.
One is hardware-based using components used in the actual system. The other
is entirely software-based using a simulator from the DSP manufacturer. In this
project the latter was used. The simulator, along with tools for measuring code
coverage and performing application profiling, comes with the Code Composer
Studio IDE. Details on hardware, software and the decisions regarding them are
provided in Section 4.1.
The test setup was designed to focus on the Tracking module. In order to perform measurements on the module, we need to isolate it from the rest of the code and put it into a special module test environment. This makes it easier to control input and output data, and it excludes parts of the algorithm that are not necessary and would take up extra time during testing. To do this, a special wrapping function (referred to as the module test function) was written that the module under test could be embedded into.
The module test is designed to model the execution of the module as realistically as possible. Around the module functions there is a test framework consisting of supporting code necessary to e.g. initialize and set up a correct environment and perform file I/O. This framework does not, however, include facilities to simulate e.g. interrupts or context switches, which could occur in the real system. Further details on the module test environment are found in Sections 4.1.1 and 5.1.1.
1.6.3 Measurement Procedure
The measurement procedure needs to be reliable and easily usable for this and other system studies. This is one of the goals of the work in this thesis. The procedure includes all the steps necessary to perform a performance study, from start to finish.
The choice of method and tools led to the general procedure described by the list below. The modular composition of the Night Vision system makes it relatively easy to isolate parts of it and control input and output data. In practice, this meant writing a wrapping function that contains routines to handle input and output data, as well as to control the execution of the core, i.e. the functions of the measured module.
1. Study restrictions and requirements of the module, input and output data.
2. Write a wrapping function that caters to the specific needs of the code under
test.
3. Generate suitable input data using simulated algorithm or other means.
4. Set up measurement environment in Code Composer Studio, which includes
breakpoints and profiler settings.
5. Run measurements, possibly automated and controlled by an external script.
6. Collect results, compose tables, draw graphs, etc.
This procedure is what was used in this study, and it works well for the Tracking module. Although not tried during this project, it should work on many other modules as well by modifying only the parts that depend on the measured code.
In reality, most of the items in the list will require some work when adapting the procedure to different modules (or systems). Most obvious are perhaps the wrapping function (the module test function) and the input data generation, which are strongly connected to the measured module. But the measurement setup in Code Composer Studio also depends on how the measured module should be executed and what to measure. Linked to that is, of course, the sequence of instructions in any automated measurement scripts.
A measurement-based procedure such as this one often takes a lot of time to
perform. But once the proper preparations are made, it is easy to use and can
produce a great amount of output data. The nature and volume of the output
data can be controlled in detail by configuring the measurement tools, in this case
Code Composer Studio.
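To illustrate what an automated measurement run (step 5 of the procedure above) could look like, the sketch below shows the general shape such a driver script could take; the project's actual script, testingscript.py, is listed in Appendix A.1. The simulator command line, the file layout and the CSV format used here are hypothetical placeholders, not the real Code Composer Studio interface.

    # Hypothetical batch driver for repeated profiling runs. The command
    # "run_profiler", the file layout and the CSV format are placeholders;
    # the real interface is whatever the profiling tool actually exposes.
    import csv
    import subprocess
    from pathlib import Path

    SIMULATOR_CMD = "run_profiler"       # placeholder for the real profiler invocation
    INPUT_DIR = Path("input_sequences")  # pre-generated input data files
    RESULT_FILE = Path("results.csv")

    def run_one(sequence: Path) -> dict:
        """Profile the module test build on one input sequence."""
        out_file = sequence.with_suffix(".prof.csv")
        subprocess.run([SIMULATOR_CMD, "--input", str(sequence),
                        "--output", str(out_file)], check=True)
        # Assume the tool writes one CSV row per measured function:
        # function name, cycle count.
        with out_file.open(newline="") as f:
            return {row[0]: int(row[1]) for row in csv.reader(f)}

    with RESULT_FILE.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sequence", "function", "cycles"])
        for sequence in sorted(INPUT_DIR.glob("*.bin")):
            for function, cycles in run_one(sequence).items():
                writer.writerow([sequence.name, function, cycles])

The point of such a script is only to make runs repeatable and to collect all results in one place; the actual profiler settings still have to be prepared inside the measurement tool itself.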
1.7 Thesis Structure
This thesis is divided into seven chapters. Chapter 2 gives an overview of the current Night Vision system, as relevant to this work. Chapter 3 describes available methods for performing the performance study and motivates the choice of method. Chapter 4 explains the test environment used for the study, along with other preparations made prior to the actual measurements. Chapter 5 describes the measurements by presenting the test cases used and information on the measurement result data. Chapter 6 compiles and analyses the different results obtained from the measurements, and Chapter 7 contains a discussion of the results and conclusions.
Chapter 2
System Description
This chapter provides an overview of the current system, reasonably detailed but without going too deep. The system is currently undergoing development and being sold on a competitive market, which means that certain information is considered confidential.
The chapter focuses more on the software side of the system, since that is where the main part of the work in this project lies. But in order to understand how the system works in general, the reader needs to be provided with some details on the hardware as well.
2.1 General Overview
The Night Vision system is a system for improving driver and pedestrian safety when driving during night conditions. It is designed to provide a video image showing what is in front of the car, and to warn the driver if a pedestrian¹ is in, about to enter, or dangerously close to, the vehicle's path. The purpose is to increase the driver's awareness and help to prevent accidents.
The system (Figure 2.1) consists of an infrared emission detection camera, an
ECU and a video display. The camera sends a video stream of 30 frames/second
to the ECU and the car provides other sensory data. The ECU then enhances the
quality of the received image and detects pedestrians in the scene. The enhanced
image, along with detection and warning information, is then passed to the video
screen on the dashboard of the car.
¹ The current version of the system is configured to automatically detect only pedestrians. Detecting other things, such as animals, may be implemented in future versions. They are, of course, still visible on the display.
Figure 2.1. General overview of the system. The ECU is communicating with the car
using the car’s CAN bus. It is also communicating with the camera in order to control
its operation. It receives video data from the camera, and outputs a processed image
along with any warning information the system is set to deliver to the display.
The result is a greyscale video image of the world in front of the car. In this image, objects that emit more infrared radiation appear brighter than objects that emit less. The system is also calibrated to be most sensitive in the temperature range of emissions from humans and animals. This gives an image where humans and animals are usually very bright compared to the background, and thereby easy to distinguish. There are, however, several other objects in a normal traffic environment that reach almost the same brightness intensity. To complement the video image, pedestrian detection and warning information may also be shown on the display.
2.2 Hardware
The system hardware consists of three main parts apart from the car itself: the camera, the ECU and the video screen (display), as shown in Figure 2.1.
The ECU communicates with the car using its built-in CAN (Controller Area Network) bus. The CAN bus passes data about the car's systems and surrounding environment, taken from the different sensors the car is equipped with. Some of the data provided is used in the different algorithms in the system. Examples of such data are vehicle speed, ambient temperature, steering wheel angle, load sensor data, etc.
2.2.1 Camera
The camera used in the system is a FIR (far infrared) camera. According to Planck's law, every object with a temperature above 0 kelvin emits heat radiation. This heat is registered by the camera sensor, which only registers these passive IR emissions and does not use any type of active illumination. This gives the system a very long range, since attenuation in air is low.
2.2.2 ECU
The ECU contains the necessary electronics to provide system functionality. Communication with the car and the camera is handled by the ECU, as well as image
processing and output to the display. A central part of the ECU is the DSP, on
which the algorithm runs.
DSP Characteristics
The type of processing unit used in this system is a DSP from Texas Instruments [1]. Some of the characteristics of this DSP are important for understanding parts of the results presented later. This is a very brief overview of these characteristics, and of the DSP in general.
The DSP is a Very Long Instruction Word (VLIW) processor. It has 64 general-purpose registers of 32-bit word length, and eight separate and independent functional units (two multipliers and six arithmetic logic units, ALUs). It uses a two-level cache-based architecture with a level 1 data cache and a level 1 program cache. The level 2 cache memory is shared between program and data space.
The functional units in the CPU are divided into two sets, each with four units and a register file, making up sides A and B of the CPU. The register files each have 32 32-bit registers, making a total of 64. In addition to using the 32 registers on its own side, each set of functional units is also connected to the registers on the other side via a single bus known as the data cross-path.
2.2.3 Video Display
The video display is normally situated somewhere on the dashboard of the car
where it is clearly visible to the driver. It receives the enhanced image from the
ECU along with the detection and warning information. The display technology
used in the car and decisions on how to display any additional information are
specific to each car manufacturer, and not part of the actual Night Vision system.
The system just makes sure it is possible to present the information requested by
the customer.
2.3 Software Architecture
Most of the software architecture is confidential and may not be revealed here. Therefore, this section only describes the Tracking module, and only very briefly.
The system algorithm is designed as a number of separate modules, each implementing a part of the functionality. The Tracking module is one of the modules in the pedestrian detection and positioning part of the system. It is, as the name suggests, responsible for tracking detected objects. To be able to track objects between frames, this module has to consider data on tracked objects from previous frames as well as new data from the current frame, unlike many other modules, which only work on one frame at a time. This causes a timing behaviour that is not only dependent on the current frame, but on a whole scene (several frames in sequence), which is why this module is so interesting to perform timing measurements on.
Input data for the Tracking module consists of a list of detected objects and car data. The list of detected objects is derived from the earlier stages in the algorithm (implemented in other modules). The output is a list of tracked objects represented by their coordinates in the image, along with some other attributes describing them to the algorithm. Figure 2.2 shows the Tracking module and its input and output signals.
Figure 2.2. Tracking module with input and output signals.
The car data comes from the vehicle CAN bus and contains information from the
car’s own sensors, and other sensors that may have been installed in the car. This
information is used by several modules in the algorithm.
The Tracking module works by maintaining a list of tracked objects, which is updated as the scene progresses. Newly detected objects are considered continuously, as well as data from the vehicle. A number of deciding factors in the input data determine when objects are added to the list to be tracked, and when objects are removed from the list. The module is also able to predict the movement of tracked objects based on the available information for a number of frames. This is done so that pedestrians that are temporarily occluded, by for example passing cars, will not be lost. Then, if the object is found again within the following frames, it remains a tracked object. Finally, the list of tracked objects is output for use by other modules in the system, as well as by the Tracking module itself in its next iteration.
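Since the actual implementation is confidential, the sketch below only illustrates the general bookkeeping described above: predict, match, coast through occlusions, and add or remove tracks. It is a minimal generic track-list update in Python, not Autoliv's algorithm; the matching rule, the track limit and the occlusion timeout are all invented for illustration.

    # Illustrative track-list update, one call per video frame.
    # The matching rule, limits and timeouts are invented placeholders;
    # the real Tracking module is confidential and works differently.
    from dataclasses import dataclass

    MAX_TRACKS = 16          # hypothetical limit on simultaneously tracked objects
    MAX_MISSED_FRAMES = 10   # hypothetical limit on frames a lost object is predicted

    @dataclass
    class Track:
        x: float
        y: float
        vx: float = 0.0
        vy: float = 0.0
        missed: int = 0      # consecutive frames without a matching detection

    def update_tracks(tracks, detections):
        unmatched = list(detections)  # detections are (x, y) image coordinates
        for t in tracks:
            px, py = t.x + t.vx, t.y + t.vy   # predicted position in this frame
            best = min(unmatched, default=None,
                       key=lambda d: (d[0] - px) ** 2 + (d[1] - py) ** 2)
            if best is not None and (best[0] - px) ** 2 + (best[1] - py) ** 2 < 25.0:
                t.vx, t.vy = best[0] - t.x, best[1] - t.y  # motion since last match
                t.x, t.y = best
                t.missed = 0
                unmatched.remove(best)
            else:
                t.x, t.y = px, py   # coast on the prediction while occluded
                t.missed += 1
        tracks = [t for t in tracks if t.missed <= MAX_MISSED_FRAMES]
        for x, y in unmatched[:MAX_TRACKS - len(tracks)]:
            tracks.append(Track(x, y))  # start tracking newly detected objects
        return tracks

Note how the per-frame cost of even this toy version depends on the number of live tracks and detections, and on how long occluded objects are kept alive; this is exactly the kind of state-dependent timing behaviour that motivates measuring the module over whole scenes rather than single frames.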
Chapter 3
Method Study
Real-time systems can be studied in different ways depending on what kind of data is desired. There are different general approaches to such studies, each focusing on different aspects of the target system and each having its own benefits and drawbacks. Several different system characteristics are of relevance when investigating performance and the possibility of optimizations. This chapter describes the different methods available and discusses the choice of method given the most relevant characteristics for this particular project. The most important characteristics are the Worst-Case Execution Time, code coverage, cycle profiling data, and behavioural dependencies.
3.1 Worst-Case Execution Time
WCET estimation is an important aspect of timing analysis. The basic concept is that the WCET of a program task is the longest time that task will take during program execution. This is important to know because it makes it possible to better calculate system performance in extreme cases and to design additional system features, as well as to recognize needs for improvements, optimizations and so on. A good estimate of the WCET is needed in order to guarantee that a system will meet its deadlines. This is particularly important in real-time systems.
Determining the WCET of a program and its sub-tasks may seem simple in theory, but in reality it is not. The time for a particular execution of a program task
depends on two principal things. One is the path through the code taken in the
specific case, and the other is the hardware on which it executes.
There are two main approaches to attaining WCET estimates for program tasks [16]. One is an analytical approach known as static WCET analysis. The other is a more practical measurement-based approach, sometimes referred to as dynamic WCET analysis. Static methods tend to emphasise safe but pessimistic estimates, while dynamic methods usually provide realistic estimates, but with no guarantee of safety unless the worst possible test case is guaranteed to be included.
3.1.1 Static WCET
Static WCET analysis methods do not rely on executing the software on real hardware or on a simulator. Instead they analyse the code itself to derive a WCET
estimate. The process is generally divided into three different segments: flow analysis, low-level analysis and calculation [8, 14].
Flow analysis focuses on obtaining the different execution paths a program may take during execution. This is information about which functions get called, the number of loop iterations, branch decisions made, etc., which in turn depends on the input data. Part of the effort here is to eliminate infeasible paths and to determine loop bounds and recursion depth. This has the potential to reduce overestimation of the WCET and give a tighter estimate. Static WCET estimation tools usually have the ability to detect some of these things automatically. Others have to be provided by the user.
Low-level analysis is done to model the hardware characteristics. An accurate model is needed to be able to retrieve accurate results from the analysis. This has become more and more important as target hardware makes ever more use of advanced performance-enhancing features such as caches, pipelines and branch predictors. On advanced processor architectures there is a great number of hardware aspects that could be taken into consideration and modelled. This of course makes the low-level analysis quite complex.
The calculation step uses the information acquired in the flow and low-level analyses to calculate an upper bound of all the execution times of the whole task [16].
There are three main methods of doing this: structure-based, path-based and using implicit-path enumeration (IPET). According to [7], the most commonly used
method is IPET due to its ability to handle complex flow constraints. For more
information on these methods, see [16] or [8].
3.1.2 Analysis tools for static WCET
To assist in static WCET analyses there are several different tools available, both commercial and research prototypes. The main differences between them lie in what calculation methods they use and which CPUs they support. Wilhelm et al. [16] provide a survey of several tools used for static WCET analysis. We briefly looked into a few of them to see if they would be of any use to this project.
One of the commercial tools available is the aiT Worst-Case Execution Time analyser developed by AbsInt Angewandte Informatik in Germany [2]. It is a static WCET analyser that performs control flow analysis, value analysis, cache analysis, pipeline analysis, path analysis and analysis of loops and recursive procedures. aiT supports a number of processor models, although unfortunately not the one used in the Night Vision system.
Another commercial tool is Bound-T developed by the company Tidorum Ltd.
in Finland [4]. It does "static program analysis on the machine-code level" to compute the WCET of the target program, and the worst-case consumption of stack
memory. Bound-T supports a smaller number of target processors than aiT, and
none of them is the one used in the system.
Rapita Systems Ltd. has developed a tool called RapiTime [12]. It combines elements of static WCET analysis with measurements. In addition to the WCET, it also gives the distribution of execution times for a given code segment. RapiTime is not specific to any target configuration, in contrast to the other tools. Rather, it is "compatible with virtually every 8, 16, and 32-bit microprocessor on the market", as stated on their website. Although it is a potentially useful tool for this study, we chose not to use it at this point. The main reasons for this are the extra work it would take to set the tool up, and that we did not have easy access to it since it is a commercial tool. Autoliv preferred to use existing tools available to them before purchasing new ones just for this project. For further mentions of RapiTime related to this project, see Sections 3.6 and 7.2.1.
Another tool is the Heptane Static WCET Analyzer [9]. It is a research prototype that is distributed under GPL license. Heptane tries to find upper bounds
for execution times of programs written in C by analysing the source code and/or
binaries [16]. It supports a few different target processors, none of which is the
one used in the Night Vision system.
3.1.3 Measurement-based (dynamic) WCET
This approach centers around performing measurements on the program code while
it is running. There are many different ways to do this, and many different aspects
of the execution (besides WCET) that can be studied with measurements.
In many cases, this is the approach used in industry [7]. It is a more intuitive way to study WCET compared to static methods, and it is easier to apply to specific projects since it works for any software and hardware configuration. Since the measured times include the hardware behaviour, no specific hardware analysis or customization needs to be applied; it is replaced by the measurements. Furthermore, any execution paths that are measured are feasible, so the problem of infeasible paths that exists in static analysis is circumvented.
However, finding the path that leads to the WCET may present a problem. There is no way to force execution of different paths without modifying the code, so in order to measure e.g. the longest path, the input data needs to trigger it. This means that it is necessary to have a reliable set of input data that is guaranteed to generate the WCET. Finding this data may be difficult, or even impossible. Exhaustive measurements, where all possible input is used, are often not an option since they can be very time consuming. Generally, a compromise is used, with a test set that is as large and precise as practically possible. This produces an estimate, but no guarantee of an upper bound. To improve the accuracy of the estimate, the set of input data needs to be improved.
There are many different ways to perform measurements. A simple way is to insert some instrumentation code into the program that can provide time stamps or CPU cycle counts. Another way is to use external hardware to collect timing data, for example a logic analyser. There is also the possibility of leaving the hardware out entirely and performing the measurements on a cycle-accurate simulator with the help of a software profiler.
In many cases, setting up the test environment can be a challenge in itself. Depending on the code that is subject to measurements, the surrounding system
components and/or hardware that is needed can be quite hard to obtain and set
up properly.
The results of measurements can give a good overview of the variations in execution time of the program due to changes in input data or different contexts (e.g. processor state). This makes it easier to study the overall behaviour of the software in different situations. They are also useful for validating results from static WCET analysis.
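As a minimal sketch of how measured samples turn into estimates, the snippet below computes the observed BCET/WCET over a set of cycle counts and pads the maximum with a safety margin, as discussed above. The 20% margin is an arbitrary example value, not one used in this project.

    # Minimal sketch: observed BCET/WCET from measured cycle counts.
    # The 20% margin is an arbitrary example value.
    def summarise(cycle_counts, margin=0.20):
        observed_bcet = min(cycle_counts)
        observed_wcet = max(cycle_counts)
        return {
            "observed_bcet": observed_bcet,
            "observed_wcet": observed_wcet,
            # Measurements are unsafe: the true WCET may exceed the largest
            # observation, so a margin is added before using it as a bound.
            "wcet_estimate": int(observed_wcet * (1 + margin)),
        }

    print(summarise([11200, 15400, 13050, 19980]))
    # {'observed_bcet': 11200, 'observed_wcet': 19980, 'wcet_estimate': 23976}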
3.1.4 Comparison of Static and Dynamic Methods
Table 3.1 illustrates the most important differences between static and dynamic WCET methods.

            Static                     Dynamic
Estimate    Pessimistic                Realistic
Safety      Safe results               Unsafe results
Analysis    Offline code analysis      Online measurements on running system
Input       No input data required     Input data set required
Equipment   Software tools             Hardware system or software simulator, measurement tools

Table 3.1. General characteristics of static and dynamic analysis methods.

Static methods compute the upper bounds on execution time
by using Control Flow Graphs (CFG) of the code, and hardware models to model
the behaviour of the target processor. The result is safe, i.e. the execution time can never exceed the result value. However, it is also a pessimistic overestimate, since the WCET is calculated for each code section and the results are then combined. It is not uncommon that there are combinations that would not be possible to execute
during normal operation, so the accuracy of the result can sometimes be increased
by eliminating these infeasible paths through the code.
Static methods can be used without having to run the program code. This is useful for systems that are complex to set up or require a large amount of peripheral hardware or software to run. On the other hand, a fairly precise model of the target processor is needed to account for hardware characteristics.
When using a measurement-based method there is no need for any hardware models. The processor behaviour is automatically included in the results. The result
is often more accurate than for static methods, since all results are realistic and do
not include pessimistic overestimations like static analysis does. But the results
are also unsafe, because there is the possibility that the actual worst-case is not
covered by the tests performed. This often means that a safety margin needs to
be added to the execution time. Measurements usually provide a better picture
of the distribution of execution times for different sets of input data, compared to
static analysis, and the accuracy of the results can be increased by adding more
test cases to include more possible execution paths.
Since measurements require the program to be running, this method may need a more or less complex hardware and/or software environment. When this is not a big obstacle, the method benefits from being simpler to use and easier to apply to new hardware configurations.
3.2 Context Dependencies
While interesting, the WCET of the system is not the only concern. Another important aspect is the behaviour of the execution times depending on input data.
When the Night Vision system is in operation it is subjected to a wide range of
different situations. These are interesting to study for all parts of the system, but
in particular for the Tracking module since it has internal states depending on
several consecutive frames instead of just one at a time.
Acquiring the behaviour of the module in these different situations means subjecting it to different sets of input data and measuring the result. A large part of this data would be real situations as they appear in the real world. But it is also interesting to see what happens if sudden irregularities occur. The module has multiple internal states, and sudden unexpected changes could potentially disrupt the execution in ways not currently known. One way to generate such irregularities is to cut apart and paste together different parts of different real sequences into one single set of input data.
Designing artificial input leads to even more interesting behavioural cases. Data
could be designed to exploit known limits and weaknesses in the code and see how
this impacts the execution time. An example of this would be to artificially introduce detected pedestrians in a scene close to the limit of the number the module is
set to handle, and let them behave in ways that are computation intensive. This
would create artificial cases that are not possible to create using real data, and is
an attempt to increase accuracy by increasing the test set, as described in the
dynamic WCET part in Section 3.1. It also introduces irregular and
unexpected input data that can be used to study unexpected behaviour and to
force new estimates of the WCET.
Even though module behaviour can be studied by providing different input, the module's behaviour when it is integrated in the complete system cannot. As mentioned before, the simulator cannot simulate events that are generated by external
parts of the system, which may cause changes in behaviour, e.g. different types of
interrupts. To do this there would have to be a way to generate such events in the
module test code, and the current module test environment has no facilities for
this. This is to be considered when detailed analysis is performed in the future.
3.3 Code Coverage
One aspect of code execution in this project is Code Coverage. This makes it
possible to see exactly which parts of the code are executed under different
circumstances, and how many CPU resources are consumed. In addition to
clock cycle consumption there is also information about, for example, cache hits
and misses.
Code coverage analysis can be done in Code Composer Studio (CCS), where it is one of
the built-in profiling methods. It is, however, only available on the simulator,
and not on the hardware emulator. It also requires that the code is compiled in
debug mode to be able to gather data. (The debug mode does not negate other
optimisations set for the compilation, so the efficiency of the compiled code is
not affected.)
The coverage data is collected by specifying between which points of the code
data collection is to be performed. Then the code is executed without collecting
data until it reaches the point where collection should start. At this point the
collection is activated, and the program runs until it halts at the end point. The
data collected and stored internally in CCS can then be processed to form an
Excel-sheet containing the code coverage information. More information on this
can be found in Section 5.2.1.
3.4
Cycle Profiling
Cycle profiling is a detailed analysis of the program execution. It can be used to
study how many clock cycles different functions consume, how many CPU cycles
are spent on stalls, cache hit/miss ratios and various other characteristics.
Some of the information attainable through profiling overlaps with what can be
given by the code coverage analysis. There is, however, much more information
available through the profiler.
Cycle profiling can be done using the profiling tool in Code Composer Studio.
It can be used on both hardware directly, and on simulated targets. But there
are certain factors that limit the usability of profiling on hardware, as described
in Section 4.1.4.
Code profiling is performed by specifying the different parts of the code that are to
be measured. These are called ranges, and can be, for example, specific functions,
specific loops or simply the span between two lines in the source code (or assembly code).
When the code is executed, the profiler analyses the code behaviour according to
what parameters it has been set to watch. More information on the results of the
profiler can be found in Section 5.2.2.
3.5 Built-in Timing Functions
One of the simplest types of measurements that can be performed on software
is to use instrumentation code that collects time stamps or cycle counts. These
functions can be inserted in the program at points of interest and gather data
as the program runs. This data can then be used to get execution times, follow
execution paths, etc.
The Night Vision algorithm code already has a number of these functions included
at various locations. These can be used to extract general timing information on
module level. More of them could also be included and put at other locations
to be able to measure the most interesting parts. However, care must be taken
when adding instrumentation code, since it may distort the measured performance
slightly. The approach is more suited for measurements on higher levels (e.g.
module level), and less suited for lower levels (e.g. function level).
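The general idea can be illustrated with a small host-side Python sketch; the function names are hypothetical and do not correspond to the actual instrumentation functions in the Night Vision code.

    import time

    trace = []  # (label, timestamp in ns) pairs collected during the run

    def timestamp(label):
        # Keep the instrumented path cheap: record a label and a counter only.
        trace.append((label, time.perf_counter_ns()))

    def process_frame(frame):
        timestamp("frame_start")
        # ... the code under measurement would run here ...
        timestamp("frame_end")

    for frame in range(3):
        process_frame(frame)

    # Post-processing: pair start/end stamps to get per-frame durations.
    starts = [t for name, t in trace if name == "frame_start"]
    ends = [t for name, t in trace if name == "frame_end"]
    for i, (s, e) in enumerate(zip(starts, ends)):
        print("frame %d: %d ns" % (i, e - s))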
3.6 Adopted Approach
This section contains a discussion about what choices we have made regarding the use
of methods and tools. The selection criteria are the practical use in this project and
the types of data that can be obtained.
The objective of this study is to get a clearer picture of how the Tracking module behaves in different situations. This primarily means execution times for the
tracking process and its sub-functions. Getting execution times and their distributions is therefore a primary goal. Other data, like for example cache usage, is only
investigated in order to give a simple overview. One reason is that cache usage
investigations are unnecessarily detailed for module-level analysis. Another
reason this information can be ignored is that it is not accurate unless all parts of the system are measured at the same time, which is not the case
here. This is because the cache usage of other parts of the system may interfere
with the cache usage of the Tracking module.
The absolute value of the WCET is of course interesting, but it is not critical at
this stage. As already mentioned the current system performs within limits for
real-time execution. A more interesting question is when the worst case
arises: what the input data is when it happens, and what the module state is.
The WCET does however give a good indication on how close to the limits the
system performs during worst possible conditions. Even if exact values are not
required, it is useful to know approximate values.
The different tools for static WCET all have very useful features and would be
quite interesting to use. However, none of them readily supports the target processor that the Night Vision system uses, so any attempt to use any of them would
require time and effort spent on building a hardware model. Even if none of the
tools already support the required processor, most of them can be modified to do
so, either by requesting support from the tool provider or by adding it yourself. The
RapiTime tool is different from this in that it does not require as much work to be
usable. A drawback is that it, like many of the tools, is a commercial product
which has to be bought to be fully functional. The decision was made not to
use any of these tools initially, much due to the extra work this would cause when
customizing and/or evaluating, but with the option to use them later if there was
time.
The static WCET analysis approach focuses on providing safe estimates of boundaries for the execution times based on code analysis and calculations. It does,
however, not provide the level of detail desired when it comes to behavioural
relationships between input data and resulting execution times. For such data,
measurements using different input data are a much more suitable approach. Measurements provide a relatively easy and quick way of getting data for different
situations, while still retaining a fairly good chance of producing a satisfying estimate of the WCET (if a good test set is used). The measurement based method
also eliminates the need to spend valuable time on customizing a WCET tool to
build a model of the processor. Another factor contributing to the choice of using
measurements for this project, was that we already had access to measurement
tools through CCS that are designed for this type of study.
Code Coverage data is collected and used to some extent, mainly for measurements
of entire sequences at once. The reason it is not used at all times is that there
were difficulties setting up collection of this data using the automatic testing script
(described in Section 4.3.1). The interface used for the script does not allow the
code coverage Excel files to be created and stored automatically. Instead, it
is possible to store all measurement data in binary files and take care of it
afterwards. The problem is that this uses a large amount of disk space and requires
a large amount of work, and is therefore not practical when performing a large
number of measurements. Because of this we had to limit the use of code coverage.
Cycle Profiling is the main source of measurement data in this study. It is very
detailed, and the results are easy to extract and store using an automatic testing
script. The only problem is choosing what to measure, and the work necessary to
extract the interesting information from the results. The option to simply measure everything and then extract the interesting parts is not viable, since the
profiler uses a large amount of system memory. If too many system properties
are selected for measurement, the host computer quickly runs out of memory and
crashes. This practical problem makes it necessary to limit both the number of
properties measured, and the length of the measurement runs.
Although there are some built-in Timing Functions already in the code, no measurements were performed using them explicitly. The information obtainable using
these functions is easy to acquire using the profiler as well, and the profiler
gives better accuracy.
Choosing the tools to use in the study was not a major difficulty. We already
had access to Code Composer Studio, with all of its accompanying analysis tools,
and that provided enough functionality for this study, so going for alternative
measurement tools was hard to motivate.
We also expected that there would be some need for minor additional programs to
help facilitate the measurement procedure and/or process results. But these were
expected to be simple enough to write ourselves, so no efforts were made to find
external programs for this.
Chapter 4
Environment Setup and Preparations
There are certain preparations that need to be made before any measurements can
begin. The measurements must be performed in an environment where execution,
as well as input and output data, can be controlled. This chapter describes the
test environment and other preparations that needed to be made before actual
data collection was performed.
4.1 Test Environment
The purpose of the test environment is to isolate the module that is going to be
measured. This is done to narrow the scope to only the parts of interest. If this
is not done, there is a risk that other parts of the system affect the measured
parts in ways that can be hard to trace. Even if this risk is low in this particular
case, it is a logical approach to testing work, and should always be followed in all
measurements and tests.
There are two different environments available to perform tests in. One is the
actual system hardware, with supporting hardware to make it possible to provide
input data and extract output data. The other is entirely in software using a cycle
accurate simulator provided by Texas Instruments. The simulator is integrated in
their development environment called Code Composer Studio (CCS) [6].
In the rest of this section we describe the benefits and drawbacks of the hardware-based measurements as opposed to the simulator-based measurements, and motivate the choice of test environment used in the rest of the thesis.
Figure 4.1. Simplified principle for performing tests and measurements.
Figure 4.1 shows the principle for how the testing works. The test environment
is responsible for giving the unit under test (UUT) the correct input, taking care
of its output and controlling its execution. In our case, the input consists of recorded
video sequences along with car data, and the output is text data, which is easy to
analyse. However, the true procedure differs somewhat from this. It also differs
between the two types of environments, as explained below.
4.1.1 Module Test Function
The module test function is a function that wraps around any module that is to
be measured. It is a part of the test environment block surrounding the UUT in
Figure 4.1. Depending on which module this is, the function may look a little
different in each case, but its general function is to control the execution of the
measured module and supply it with input data.
In the case of the Tracking module, the basic structure of the module test function
is fairly simple. The module is surrounded by a loop that executes once for every
frame in the input data, just as the module would in the real system. Outside of
that loop there are some variable and module initialisation routines necessary for
correct operation. These are only run once for each separate measurement run.
Figure 4.2 shows the structure of the module test environment written for the
Tracking module.
To perform measurements, the module test function with the embedded module, and other routines that handle memory operations etc. (same as for the
normal system), are loaded into the target system, and then executed. The target
system can be either real or simulated hardware.
Figure 4.2. Basic structure of the module test environment. setTrackingData,
TRA_process and TRA_createOutput are the Tracking module functions.
4.1.2 Hardware Test Bench
The hardware test bench setup consists of the UUT (the ECU), a hardware emulator, a CAN bus interface (CANcaseXL), a Low-Voltage Differential Signaling
(LVDS) unit, a camera, a video screen and the host computer, which is a normal
desktop computer. It is connected as shown in Figure 4.3.
Figure 4.3. Hardware test bench setup.
The CANcaseXL device is a data interface for CAN communication used for supplying the ECU with CAN information from video files running on the PC. The
LVDS unit handles the image data being provided to the ECU. It can come from
either a camera directly, or from video files running on the PC. The hardware
emulator is used for connecting Code Composer Studio and its different analysis
tools directly to the hardware on the ECU. It is necessary in order to analyse code
executing on the hardware.
4.1.3 Cycle Accurate Simulator
A cycle accurate simulator can simulate the microarchitecture of a processor. This
means it functions exactly like the corresponding hardware would. In addition to
simulating the functionality, it also provides performance metrics such as cycle
counts, cache hit ratios and other resource utilization data [13]. Unfortunately,
cycle accurate simulators are very slow for advanced systems due to the high degree of complexity found in modern architectures.
The Code Composer Studio IDE includes a cycle accurate simulator for the processor used in the Night Vision system. The code can be run on it to
get accurate measurements, using the built-in profiling tool. Although slower than
using real hardware, it provides more accurate results. This is because there are
certain limitations when profiling execution on hardware, as described in Section
4.1.4.
Figure 4.4. Measurements using the simulator.
One drawback of the simulator is that it does not simulate any external peripherals that might be connected in the full system. This includes e.g. communication,
image capture and other interrupts. In the real system external peripherals affect
the performance of the system, including the Tracking module, in ways that simulator measurements do not model.
Measurements using the simulator require some amount of pre-processing of the
input data to make it suitable for the module test. There is also some post-processing of the output needed to take care of the results properly. These steps
are explained in Section 4.2.5.
As can be seen in Figure 4.4, the results come from external measurement tools,
rather than from the test itself. This is because the measurement data is what
is most important. The output generated by the module is actually discarded in
most cases. The correctness of this data is relatively unimportant in this case
since it does not affect time consumption in any significant way. The accuracy of
this data is subject to other types of testing. (The correctness of the output has
previously been verified by running the same version of the algorithm on a system
test bench at Autoliv).
4.1.4 Hardware Profiling Limitations
There are a few factors that limit the usefulness of the hardware for measurements. These are important to consider when choosing which method to use.
According to [15], there are several hardware-specific limitations in the CCS Profiler. These include:
• Program performance limitations
• Profile accuracy limitations due to:
– Cache invalidation
– Execution packets and conditional branches in optimized code
– Branch delay slots
Program Performance Limitations
The CCS help documentation states the following:
"To gather profile data for your application, the profiler must query
the target for information on your desired event(s). Unfortunately,
this target query also invalidates the target’s cache. Previously available cached data and instructions must be reloaded before they can be
effectively utilized by your application. This results in a degradation
of application performance."
Furthermore, the performance degradation is proportional to the number of ranges
being profiled, as it increases the number of times the target has to be queried.
The cache is invalidated more often, and the execution times increase.
Profile Accuracy Limitations
The accuracy of the profile data is affected much in the same way as the program
performance is. The measured values are inflated by target queries invalidating
the cache, and the inflation is larger the more ranges that are being measured.
4.1.5 Simulator vs. Hardware
Due to the hardware limitations described in Section 4.1.4, Texas Instruments
recommends that the simulator is used for all extensive profiling. The simulator
has minimal impact on performance, and is not affected by queries that invalidate
cache contents [15].
The hardware setup is faster to run measurements on since they are run in real time. It is also more representative of the real system because it is not simulated
in any part. The problem with this approach lies primarily in the limitations described in Section 4.1.4. If measurements are performed on hardware, the number
and size of ranges should be kept as small as possible, which would imply a higher
number of separate measurement runs.
The cycle accurate simulator is a lot slower. Measuring execution times with it
takes considerably more time than performing them on hardware. The simulator
does, however, provide more accurate data than the hardware due to the problems
described in Section 4.1.4.
These factors make the choice of measurement environment a question of
speed versus accuracy. Because we do not know exactly how much the results
from a hardware measurement would differ from the simulated, and because there
are a lot of ranges that need to be measured, we have chosen the cycle accurate
simulator. This unfortunately means that the measurements take a long time to
perform.
4.2 Test Data
Testing the performance of the Tracking module requires input data. This data
consists of all parameters that the module requires for normal operation.
It does not, however, include the actual video stream. The image data is only used
in earlier stages of the algorithm, which generate the appropriate input data for
the Tracking module.
This makes it possible to exclude the video stream, and only consider relevant
input data such as detected objects, vehicle CAN data, and other data from previous steps in the algorithm. This also allows for generating artificial test data,
with the purpose of testing certain differences in a controlled manner.
4.2.1 Tools for Generating and Manipulating Test Data
There is one major software tool involved in the process of creating test data to
use for measurements, namely MATLAB. MATLAB can be used to simulate the
entire system and generate the required input data for the Tracking module, based
on recorded video sequences. Autoliv has developed a simulator program for this
purpose that has been used to extract data for this project. The output of the
simulation is a MAT-file, containing all relevant data. Some of the input data may
need to be modified in order to serve different types of measurements. For this,
two scripts were written.
Data modifier script. This script is written in MATLAB and is designed to
be able to change the data fields in the MAT-files used. It can for example add
and remove fields in the structure array. It can also change the data in specified
fields to any values. In its current implementation, which is the only one used for
this project, it simply adds one new parameter to the input data that is needed
for all measurements. The script can easily be rewritten to change other parts of
the data if necessary.
34
Environment Setup and Preparations
The source code for this script is in large parts copied from an existing script used
for testing at Autoliv, with only minor modifications. Therefore the source code
is not included among the appendices of this report.
Data Cutting Script. This script was created for the purpose of being able to
easily modify input data. Because the input data exists as MAT-files, the obvious
choice was to write the script in MATLAB.
The script loads two MAT-files specified freely by the user, and then takes frames
from these and puts them together as a single sequence of frames in a separate
output file. The order in which the frames are placed, and which frames are
taken from the source files, are completely up to the user.
The user must specify the frames used by modifying the variables framesfromvictim1 and framesfromvictim2, which represent the two input files. The frames
are specified by designating a number of ’chunks’, along with the specific frames
they should contain. Based on this, the script computes the length of the target
sequence and a list of points where it should change source file when it builds the
target sequence. It then constructs the target sequence by copying frames one by
one from the correct source file.
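The chunk-stitching logic can be sketched as follows. The actual script is written in MATLAB and operates on MAT-file structure arrays; this Python sketch only mirrors the logic, and the chunk specification shown is a hypothetical example.

    # Sketch of the chunk-stitching logic (the real script is MATLAB and
    # works on MAT-file structure arrays; frames here are stand-in strings).
    seq_a = ["A%d" % i for i in range(1, 201)]  # hypothetical source sequence A
    seq_b = ["B%d" % i for i in range(1, 201)]  # hypothetical source sequence B
    sources = {"A": seq_a, "B": seq_b}

    # Chunks given as (source, first_frame, last_frame), 1-indexed.
    chunks = [("A", 1, 30), ("B", 31, 60), ("A", 61, 90)]

    target = []
    for src, first, last in chunks:
        target.extend(sources[src][first - 1:last])  # copy frames one by one

    print(len(target), target[:2], target[28:32])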
4.2.2 Recorded Test Data
Test data is based on video sequences recorded when driving the car. The video
file is passed through the MATLAB simulation to generate the required input data
for the tracking module. The input data consists of a list of detected objects and
their properties along with CAN data.
After the simulation is finished, the output is stored in a MATLAB structure array. At this point it is possible to apply any modifications that are to be done
to the data. The data file is then passed through another MATLAB program to
convert it into a format suited for input to the algorithm running on the hardware
simulator. The data is stored in HEX-files which are used directly as input to the
module test code.
Since the focus lies on testing the tracking module, a number of test cases needed
to be identified that are especially difficult for this module to handle. These
test cases should induce intensive calculations which consume a lot of CPU time.
Scenes that are particularly demanding of the tracking module include:
• Scenes with many pedestrians
• Detections exceeding a certain age level (the age of a detection is the number of consecutive frames it has been present; there is a threshold value below which detections are discarded as false, and not tracked)
In addition to these calculation intensive scenes, there are a number of other
situations that are interesting to investigate. There are a few variables that cause
the code to execute differently depending on their values. By analysing the effects
it is possible to get useful information about which events cause which
changes in execution time. Examples of such variables include:
• Vehicle speed
• Size of detected objects
• Number of detected objects
• Objects whose representations are overlapping each other
• Detected objects being temporarily occluded (tracked objects getting occluded means that the Tracking module will try to predict their movement)
Test Set
The sequences chosen to make up the test set have different characteristics. Some
of them were chosen because they were believed to cause heavy CPU load. Others
were chosen because they contained situations mentioned in the list above. Some
were also chosen outside of these guidelines, simply as situations that can
serve as reference sequences.
The test set contains 8 sequences chosen among 25 sequences originally cut from
files in Autoliv's database. Each sequence is 200 frames long, which is about 6.7
seconds. Since the Tracking module only needs a fraction of this number of frames
for correct operation, this is enough time to observe behaviour in the Tracking
module in different situations. Table 4.1 lists the sequences that were used, and a
short description of their characteristics.
Test_sequence_01: Inner city street. Speed: 0-20 km/h. Vehicle stopping at
crossing. Large group of people in front. People occluded by passing mopeds
and a car.

Test_sequence_08: Countryside road. Speed: 86.5 km/h. Road turns to the right,
slightly uphill. No pedestrians.

Test_sequence_09: Highway. Speed: 95 km/h. Straight road, slightly uphill. No
pedestrians.

Test_sequence_19: Inner city street. Speed: 21-22 km/h. People standing by
their cars on both sides of the street. One overlapped by another. Several
pedestrians walking in the distance, some occluded by passing car.

Test_sequence_20: Suburban residential area street. Speed: 40-44 km/h. Houses
and trees on both sides. No pedestrians.

Test_sequence_21: Open area in city center. Speed: 14-28 km/h. Large number of
pedestrians. A couple walking close in front, many farther away. Pedestrians
on the side occluded by turning car.

Test_sequence_22: City street. Speed: 38-48 km/h. Passing under railway
bridge. Single and groups of pedestrians walking on both sides, some are
partially occluded.

Test_sequence_23: City street. Speed: 16-25 km/h. Vehicle slowing down to stop
at traffic light. Several pedestrians crossing street in front, both single
and grouped. Many pedestrians overlapped by others.

Table 4.1. The different test sequences used, and a few of their characteristics.
4.2.3 Modified Recorded Test Data
Modified test data is composed of data from one or more of the sequences from
the previous section. It may for instance be the first 30 frames from sequence 01,
pasted together with frames 20-60 from sequence 08. The MAT-files created when
running the test sequences through the MATLAB simulation are cut apart and
pasted together using the script written specifically for this purpose. The resulting
files are used as input to see what happens when there are such anomalies in the
input data. It gives hints as to the behaviour when the tracking module encounters
irregular input. Subjecting the Tracking module to this kind of interrupted and
irregular data should produce visible effects in the measurement data (e.g. cycle
counts), but the exact nature of these effects is hard to predict and depends on
the exact form of the input data.
This data does of course not model the real world or real situations in any way,
but it is a way of studying behaviour in some situations not normally producible
during real conditions. Some situations similar to these may however arise when
performing system tests for algorithm verification, where many different sequences
of varying length are used consecutively. It is relevant to know that tests using
this data are experimental, rather than focused on specific results. These kinds of
tests often lead to discoveries of unexpected behaviours or relations that would
otherwise pass unnoticed. Hopefully this data will reveal module behaviour details
that are not currently known.
4.2.4 Generated Test Data
Artificially created test data is used with the intention of controlling which
areas of code will be executed by manipulating the input data manually. In
practice this means altering certain variables to force certain conditional branches
of code to execute. This allows for a detailed view of which sections of code
are consuming lots of time. But since this kind of testing does not necessarily
model the real world, it can never fully replace real test cases. It is however useful
for examining extreme cases, even though these cases may never appear in a real
scenario.
4.2.5 Generating Test Data and Processing Results
As described earlier, the tracking module is only a part of the complete system. It
depends on information given to it by previous modules in the algorithm. But
the Tracking module is isolated for the purpose of measurements, which means the
other modules are not connected. Therefore, a pre-processing step is needed to
provide all input that the Tracking module needs that would normally be provided
by the other modules.
Similarly, there is a need for post-processing to take care of measured data, and
extract the relevant parts.
Pre-processing Steps
Figure 4.5 is an illustration of the steps involved in transforming the video files to
input usable by the tracking module.
Figure 4.5. Tracking module input data generation using MATLAB simulation.
The first step is to generate MAT-files from the recorded video sequences that form
the test set. The MAT-files will contain all information relevant to the tracking
module, and it is a format that is easily viewable using MATLAB. This way it is
possible to study the data just before it enters the Tracking module, and it is easy
to modify it if needed.
Conversion to MAT-files is done by simulating the algorithm steps in MATLAB.
The simulation uses the recorded video files as input, and saves output data in the
MAT-file. The simulation is set up to save detected object data, CAN data etc.
just before the tracking module. It is also configured to save this data directly
after the Tracking module.
The second step is optional. Here it is possible to make modifications to the
MAT-file in order to modify the input data the tracking module receives. It can
involve, for instance, cutting the file apart and pasting it together in a different order,
or modifying various attributes and variables. Any modifications here are normally
done using MATLAB.
The third step is the conversion to the HEX-files that are used as direct input
to the module. There is a previously written MATLAB program available that
does this. It creates a single HEX-file per frame of input data that contains the
data the Tracking module requires.
After the third step the HEX-files are ready to be used by the module test code
as input to the Tracking module.
Post-processing Steps
The profiler used to measure the module execution is set to save the measurement
results as CSV-files. These files normally contain tables with large amounts of
data. A Python script was written that parses the data specified by the user and
stores it in a TXT-file for easier use. The contents of the TXT-file can then be
transferred to other programs, for example Microsoft Excel, for further analysis.
4.3 Tools
There are five tools that have been developed specifically for this project. They
are all focused on different parts of the measurement procedure. One of them is
a way to automate measurements, two are used for manipulation and/or creation
of input data (as described in Section 4.2.1), and two of them are designed to
extract data from files generated during the measurement process. In
this section the automatic measurement script and the data extraction scripts are
described. The other two scripts, that are used for the creation of modified data,
were described in Section 4.2.1.
All the tools are rather short scripts, which in their current form rely on the
user to provide them with critical input directly as variables. This means specifying things such as the desired properties of particular functions, in the case of extracting
data from profiler output, or the frame order for modified sequences. Source
files and their paths are also currently given to the scripts in this way.
4.3.1 Automatic Profiling Script
The Automatic profiling script is a small script designed to automate measurements and data collection using Code Composer Studio. It is written in Python
and uses an interface that contains functions that allow control of several features
in CCS, including starting the program, loading environment, compiling code, profiling execution, etc.
It is basically a straightforward sequence of commands that mimics the actions the
user would take when performing a profiling measurement. The commands are also placed
in a loop to enable multiple consecutive measurements.
The profiler in CCS needs to be told exactly which parameters are to be
included in the measurement. They need to be provided as a list of strings that
match the field names the profiler uses. There is a large number of these parameters available, but only a few are selected here and are kept as a constant in the
script.
Below is the basic algorithm of the script. The command sequence depends on the
workspace setup in CCS. If, for example, the breakpoints are placed in a different
manner the script needs to be modified to compensate.
1. Initialize:
1 Start Code Composer Studio.
2 Load workspace and associated settings (in this case provided by a
previously saved workspace file).
3 Load the program to be measured (in this case the Tracking module and the surrounding module test function) into the simulator (requires that the program has been previously compiled).
4 Run target until first breakpoint.
5 Load profiler settings (in this case provided by a previously saved profiler settings file).
2. Loop x times, where x is the number of frames in the target sequence:
1 Check frame number.
2 Create destination folder for data.
3 Enable profiling.
4 Run target until stop breakpoint.
5 Disable profiling.
6 Save measured profiling and summary data to destination folder.
7 Run target until start of next iteration.
3. Close Code Composer Studio.
The first item in the loop (point 2.1) checks the frame number. If it has certain
values, the program jumps to a subroutine that closes down CCS. After this it
restarts CCS and continues measurements at the point where it was closed down.
This may seem like a strange thing to do, but it is in fact a work-around for a
problem that occurs when running long measurements in CCS.
The problem is that CCS consumes a lot of system memory as it collects data,
and for some reason it does not free this memory until it is shut down. After
a while, the host computer therefore runs out of memory and crashes. In order to
prevent this from happening, the script shuts CCS down at specified frames in
the sequence. It is then restarted, and the sequence is looped forwards without
measuring until it reaches the point where it was, and then the measurements are
resumed.
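The control flow of the script, including the restart work-around, can be sketched as below. Every ccs(...) call stands in for a call through the Code Composer Scripting Tool interface; the stub, the folder naming and the restart frame numbers are hypothetical placeholders, not the real API or values.

    # Control-flow sketch of the automatic profiling loop. The ccs() stub
    # stands in for the Code Composer Scripting Tool interface; restart
    # frames and folder names are hypothetical placeholders.
    def ccs(action, **info):
        print("[CCS]", action, info if info else "")

    def run_measurement(num_frames, restart_frames=(80, 160)):
        ccs("start"); ccs("load workspace"); ccs("load program")
        ccs("run to first breakpoint"); ccs("load profiler settings")
        for frame in range(num_frames):
            if frame in restart_frames:
                # Work-around for CCS memory growth during long runs: shut
                # CCS down, restart, and loop forward unprofiled to here.
                ccs("close"); ccs("start"); ccs("load workspace")
                ccs("load program"); ccs("run without profiling", until=frame)
                ccs("load profiler settings")
            ccs("enable profiling")
            ccs("run to stop breakpoint", frame=frame)  # execute one frame
            ccs("disable profiling")
            ccs("save profile data", folder="results/frame_%03d" % frame)
            ccs("run to start of next iteration")
        ccs("close")

    run_measurement(num_frames=5)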
4.3.2 CSV-file Parser
The output from the Profiler is saved as two different CSV-files for each frame of
input data. Each file consists of a large table containing the data collected for that
frame. When the data is later used, only one value in such a table is normally
needed for each frame, which means going through up to 200 files picking a single
value from each one. It would be extremely time consuming to manually do this
every time a new parameter value is needed, so instead the CSV-file parser was
created.
The script is simple in principle. The user provides information about what data
is to be collected, in what type of files this data exists and the parent directory
where the files are located. The script then builds a list with all such files found
in that directory, and then searches each file in that list for the specified value.
When all files have been searched it writes the values to a single line in a TXT-file.
What would take several hours to do manually is now done in 10 seconds.
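A minimal sketch of the parser's approach is shown below; the column names ("Range", "cycle.Total") and the file pattern are hypothetical placeholders for whatever the profiler actually writes, not the real field names.

    import csv
    from pathlib import Path

    def collect_values(parent_dir, file_pattern, range_name, column):
        # Pick one value per profiler CSV file. The column and range names
        # used here are hypothetical placeholders.
        values = []
        for path in sorted(Path(parent_dir).rglob(file_pattern)):
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    if row.get("Range") == range_name:
                        values.append(row[column])
                        break
        return values

    # Usage: one line of comma-separated values, one entry per frame.
    values = collect_values("results", "*.csv", "TRA_process", "cycle.Total")
    Path("tra_process_cycles.txt").write_text(",".join(values) + "\n")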
4.3.3 Data Extractor
This MATLAB-script is used for the same thing as the CSV-file parser, but instead
of working on the output files, this script extracts data from the input. The files
used are the MAT-files before the data is converted into HEX-files. The structure
of these files makes it relatively easy to extract specific parts of it. The script is
currently written to traverse the array in the file and save the number of detections
and vehicle speed for each frame to a TXT-file. It can however easily be extended
to collect other information as well.
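A rough Python equivalent of this extraction (the real script is MATLAB) could look as follows; the field names "frames", "detections" and "speed" are hypothetical placeholders for the actual structure-array fields.

    from scipy.io import loadmat

    # Rough Python equivalent of the MATLAB extractor. The field names are
    # hypothetical placeholders for the actual structure-array fields.
    data = loadmat("test_sequence_01.mat", squeeze_me=True,
                   struct_as_record=False)

    with open("sequence_info.txt", "w") as out:
        for frame in data["frames"]:
            out.write("%d\t%s\n" % (len(frame.detections), frame.speed))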
Chapter 5
Performance Measurements
This chapter details the measurements performed in this study. It describes the
different test cases with both recorded and modified input data.
The results of the measurements for the different test cases are explained in Section
5.2. The section only contains examples and explanations necessary to understand
the measurement results. The actual measured results are instead analysed and
discussed in Chapter 6.
5.1 Test Cases
To accumulate data for statistics and behavioural studies, a number of different
measurements were performed. These measurements are categorised into several
different types of tests, called test cases, which are described in this section.
When the Tracking module is invoked by the real system, three different functions are called (see Figure 5.1). One sets up the input data for the tracking,
one handles the output of the tracking and one is the actual tracking process.
These three functions are all measured at the top level. But the most interesting
function is of course the tracking process, which does all the actual computing.
This function is measured in more detail than the other two. There are two levels
of sub-functions in the tracking process, and these are all measured separately
(exceptions to this are functions related to error handling).
Most of the measurements are performed in the same way: first a look at the
top level, then two more runs to cover the different sub-functions. Tests are differentiated by varying input data, instead of any other modifications to the procedure
itself. This is done to ensure method consistency throughout the measurements.
But there are a couple of exceptions to this in the form of special test cases performed to verify e.g. some special relationship. One example of this is the test
in Section 5.1.1, which is a preliminary test to establish differences in measurement results for different measurement ranges. This is done in order to choose
measurement granularity for the rest of the measurements.
5.1.1 Function and Loop Clock Cycle Comparison
When measuring cycle consumption of a piece of software (and other types of
measurements), it is usually critical to only measure the relevant parts of interest.
This is however not always easy. A program may be written in a way that makes it
complicated and/or time consuming (or even impossible) to perform precise measurements. If this is the case, much time and work can be saved by making less
accurate measurements. But this may only be done if the measurement inaccuracies are negligible or tolerable.
The code examined in this thesis is an example of such software. The structure of the test environment (see Figure 5.1) makes it possible to decrease the time
consumption of measurements by including some code outside the actual Tracking module (supporting code belonging to the test environment). But at the same
time this would decrease the accuracy of the measurements. In order to get an
indication on how important it is to limit the measurement range for these particular measurements, some preliminary tests were performed.
The goal of doing this is only to get an estimate of how much cycle time
is consumed by the supporting code relative to the critical code section. This
would indicate which code is most important to exclude from the measurements to maintain accurate results. The code that is the real target for
the later measurements may of course never be excluded, but surrounding it is the
module test code, which can be excluded if needed. In this case, the module test
code should not vary much in cycle consumption depending on input data, since
the operations it performs do not depend on the contents of the data. The only
thing that may affect it is the number of times it is run, and the relation
to the number of frames in the sequence is linear. This makes it possible to get the
desired information by doing only one test per range of code that is considered for
exclusion.
The module test code, which is used to read input data from files and use it
in the tracking module, consists of more code than simply the loop used for
passing each frame to the tracking process. This code consists mostly of parameter initialization, memory allocation, and creation and deletion of the tracking
instance.
We measured how many clock cycles these extra lines of code consume. The
purpose was to see if the number of cycles was negligible when compared to the
cycles consumed by the loop. If it was found to be negligible, it would make no
real difference to take measurements on the whole function instead of just the
loop. That would simplify the testing process and speed it up significantly. In the
following text we describe how these measurements were performed.
There were two tests of this type performed. Both of them used the same single
video file as base for the input (test_sequence_01). First, the cycle consumption
of the entire module test function was compared to the cycle consumption of the
loop containing the tracking module functions. Secondly, the cycle consumption
of the loop was compared to the consumption of only the actual tracking module
functions. One measurement for each of these cases is enough to provide the desired data, and to make it as clear as possible, the measurement was for the entire
length of the input sequence.
The actual measured relation between test code and module code cycle consumption will of course be different depending on the choice of video file used for input.
But the change is all in the module code, which is what is to be measured anyway.
For this reason, test_sequence_01 was chosen since it contains an average traffic
situation without any extreme conditions, which makes it a good reference sequence for these tests.
Figure 5.1. Basic structure of the module test environment. setTrackingData,
TRA_process and TRA_createOutput make up the unit under test. Their names imply
what their function is, but the actual functionality of the tracking module is entirely
located in TRA_process.
The result of the first test showed that the overhead of initialization and deletion
represents around 0.03% of the total cycles. This is considered to be negligible.
The execution of these lines is not dependent on the input data, but the execution
of the loop is, so the percentage could be slightly affected by the complexity of
the input data. This was however not considered to be an issue given the small
percentage the test resulted in.
Additionally, the loop itself also contains code which is not part of the actual
Tracking module. This includes reads from the input data files on an external
hard drive. This has a much higher potential to inflate the number of clock cycles
measured because these operations include memory accesses, which typically take
a long time to perform.
The second test resulted in an overhead of about 13.17%. This is too much to
be neglected. Because of this, all measurements for cycle profiling were performed
selecting only the functions which are called when the system is operating under
normal conditions, i.e. setTrackingData, TRA_process and TRA_createOutput.
The code lines that cause the extra clock cycles are thereby excluded from all
further tests on the module.
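The overhead figures above follow from a simple ratio of measured cycle counts, as in the sketch below; the counts shown are hypothetical, chosen only to reproduce a 13.17% ratio, and are not the thesis measurements.

    # How an overhead percentage is computed from two cycle counts.
    # The counts are hypothetical, chosen only to reproduce a 13.17% ratio.
    loop_cycles = 1000000000     # whole loop, incl. file reads from disk
    module_cycles = 868300000    # the three tracking functions only

    overhead = (loop_cycles - module_cycles) / loop_cycles
    print("overhead: %.2f%%" % (overhead * 100))   # -> 13.17%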
5.1.2 Measurements with Recorded Data
These are the measurements performed using real sequences recorded with the
data collection vehicle as input (described in Table 4.1). They are not altered in
any way.
Full Sequence Measurements
These measurements are done to gather statistics and values on different parameters when running entire sequences. Although not as detailed as the frame-by-frame measurements described below, they are useful to get a good overview of
average resource consumption in different situations. They also require much less
time compared to measuring each single frame. Because of this, full sequence
measurements are used as the first level of measurements to provide quick results
and some indication of interesting areas. More details can then be provided by
frame-by-frame measurements later.
The input data for each run is 200 HEX-files (one file per frame) generated when
a full sequence is passed through the pre-processing steps. The measured ranges
are the three functions setTrackingData, TRA_process and TRA_createOutput
with breakpoints outside the loop boundaries, as indicated in Figure 5.2. The
filled outer markers in the figure are the breakpoints between which the code
coverage and profile summary data is collected. The clock symbols mark measurement ranges for the profiler, in this case one for each of the three functions.
Measurement ranges state between which points the Profiler should collect data.
Defining a function as a range means the Profiler will start collecting data when
the function is called, and stop when it exits.
Figure 5.2. Setup for full sequence measurements of top-level functions.
Measurement data is collected using both the Profiler and the Code coverage tool.
The Profiler gives total values of measured parameters, as well as maximum, minimum and average values for single accesses to the measured range. For example,
in these measurements, the TRA_process function is called 200 times, as there are
200 frames in a sequence. Each access is measured separately by the profiler, and
when the measurement is complete, it presents the information as the maximum,
minimum and average value for a single access of TRA_process.
The Profiler also provides a convenient summary of the whole measurement run
with totals and percentages describing resource consumption statistics. Unfortunately, the data in this summary is collected from the start of the measurement to
the stop of the measurement, normally specified by breakpoints in the source code.
It disregards any measurement ranges that may have been specified in between.
The ranges are included, but so is other surrounding code. More details on the
summary window and what it contains are described in Section 5.2.2.
Running an entire sequence means that the test code has to loop through the
tracking functions 200 times. To achieve this in a practical way, the start and
stop breakpoints are located just outside the loop. However, this loop also has
to load and set up new input data to send to the module. This means that this
supporting code is also measured by the profiler and included in the summary. In
order to get accurate values for the summary one would have to match the source
code breakpoints with the measured ranges, and this is very impractical in a full
sequence measurement.
The Code Coverage tool helps visualize how the code is executed. In the case
of full sequence runs, the acquired data is of course totals. Even if some of the
data is overlapped by results from the profiler, code coverage makes it easier to
observe which execution paths have been taken, how many times they were taken
and total code coverage as a percentage. It also provides information about, for
example, how many CPU cycles or how many cache hits each line of source code
has caused. In this way it is more detailed than the profiler, but the profiler can
watch many more properties.
Frame-by-frame Measurements
In these measurements each single frame is measured separately, and results are
stored, before continuing with the next frame. Entire sequences of 200 frames are
measured in this way using the unaltered data from the original test sequences.
This type of measurement gives far more details than the complete sequence measurements described above. Here it is possible to get values for all the required
parameters, but for each frame. This can be used to study the behaviour of the
module over time as the input data changes.
The TRA_process function calls a number of functions during its operations.
These functions in turn call other functions. When looking at the call graph of
the TRA_process function, there are two apparent ’levels’ of these calls. So for
reasons of simplicity and structuring measurements, they are called the level 1 and
2 sub-functions. When measuring these sub-functions explicitly, the breakpoints
can be moved inside the TRA_process function to exclude as much surrounding
code as possible.
The input data is the exact same 200 HEX-files that are used for the sequence
totals measurements. But the measurement start and stop points are positioned a
bit differently (shown in Figure 5.3). Also, three different runs are performed for
each sequence, using different measurement ranges each time:
1. Top level functions
2. Level 1 sub-functions
3. Level 2 sub-functions
Figure 5.3. Setup for frame-by-frame measurements of top-level functions. Observe the
difference from Figure 5.2 in the placement of the breakpoints.
The first attempt at these measurements was done manually by starting profiling, running the application for one frame, stopping profiling, saving results and
continuing to the next frame. This process proved to be extremely time consuming
(approximately 6 hours per run per sequence, while requiring constant attention
by the person performing the measurements). This was simply not acceptable
considering the number of tests we wanted to perform. To solve this, a script was
written that could run CCS and control the measurements automatically using an
interface package for CCS, called Code Composer Scripting Tool (script details in
Section 4.3.1). Unfortunately, the interface only contained functions to handle the
profiler, and not the Code Coverage tool. Because of this, code coverage information is not collected for these measurements.
The positioning of the breakpoints allows for the exclusion of the surrounding
code from the measurement results. This in turn gives more accurate results in
the profiler summary view. However, when measuring sub-functions the code between functions is still included, which again reduces the accuracy. As mentioned
earlier, the only case in which the summary is accurate is when the breakpoints
and the range boundaries are at the exact same place.
5.1.3 Measurements with Modified Recorded Data
The data used in these measurements is taken from the original sequences described in Table 4.1. But the MAT-files generated after they have passed through
the MATLAB simulation are modified using the MATLAB script called DataCutter. Using this script, the frames can be cut and pasted together in any sequence,
allowing for creating irregular input data using repetitions or combinations of data
from different sequences.
Using this type of measurement is an attempt to investigate behaviour using other
input data than real recorded sequences directly. It is also simpler to create than
completely artificial input data. This is one way of increasing the test case coverage, which is one of the main issues with using a measurement-based approach
to WCET estimations. Using this input data is likely to cause peaks in execution
time that are higher than when using normal sequences.
All these measurements are performed using the setup and method for frame-by-frame measurements described in Section 5.1.2.
Repeated Sequence Tests
These are measurements on sequences that repeat a range of frames. The purpose
is to see if there are any differences in consecutive executions of the same frames.
Since the system is normally restarted for each new measurement run, measuring a
sequence several times always produces the same results. These tests are a way to
repeat a sequence of frames without restarting the system by using the DataCutter
to simply create new sequences containing the same frames repeated several times.
The measurements are performed using sequences of 200 frames. Each sequence
is made up of either 20 or 50 frames from one source sequence repeated several
times. For example it may be frames number 21-40 of test_sequence_01 repeated
10 times to form a new sequence of 200 frames.
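A minimal sketch of how such a repeated sequence can be assembled is shown below (Python, with plain strings standing in for the MAT-file frame entries).

    # Build a 200-frame sequence from frames 21-40 of a source sequence,
    # repeated 10 times (frame objects are stand-ins for MAT-file entries).
    source = ["frame_%d" % i for i in range(1, 201)]
    repeated = source[20:40] * 10   # frames 21-40, repeated 10 times
    assert len(repeated) == 200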
Single Frame Measurements
As the most extreme form of repeated sequences, these measurements consist of
a repetition of the same frame over and over again. The goal of doing this is to get
values based on only that frame, and not influenced by any other frames that may
have come before it in the real sequence. The values should typically converge
towards the true value over time. These measurements are shorter than the ones
previously mentioned. They have a length of 50 frames.
The single frames used for this data are chosen based on what they contain visually, as well as how many clock cycles they consumed when in their correct context
(the original place in the unmodified input data, which has been measured in previous measurements). A few different frames are selected where the CPU load is
high, low and average.
Combined Sequence Tests
Another interesting test is to combine frames from different sequences. This can
be done in many different ways. The measurements performed here use input data
that is taken from two different sequences and put together in three different ways.
• Frames 1-30 from sequence A, followed by frames 31-60 from sequence B,
and then frames 61-90 from sequence A again.
• Frames 1-30 from sequence A, followed by frames 31-60 from sequence B,
and then a repetition of frames 1-30 from sequence A.
• Alternating the source between sequence A and B for each frame.
Sequence A and B may be chosen freely from the original sequences available.
Interesting cases include alternating between scenes with lots of pedestrians and
scenes without any or with only a few. It also includes combining two sequences
which both have lots of pedestrians, but at different locations. All input data used
in the measurements has a length of 90 frames.
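The three combination patterns can be written down as (source, frame) index lists, as in the Python sketch below; "A" and "B" are placeholders for any two of the recorded test sequences.

    # The three combination patterns as (source, frame) index lists.
    pattern_1 = ([("A", i) for i in range(1, 31)]
                 + [("B", i) for i in range(31, 61)]
                 + [("A", i) for i in range(61, 91)])

    pattern_2 = ([("A", i) for i in range(1, 31)]
                 + [("B", i) for i in range(31, 61)]
                 + [("A", i) for i in range(1, 31)])

    pattern_3 = [("A", i) if i % 2 else ("B", i) for i in range(1, 91)]

    for name, p in (("1", pattern_1), ("2", pattern_2), ("3", pattern_3)):
        assert len(p) == 90
        print("pattern", name, p[:2], "...", p[-2:])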
5.1.4 Measurements with Artificially Generated Data
Artificially generated data is data created by the user. The purpose of such data
is to be able to test situations not available in the original data, and to test limits
by exploiting known limitations of the program code. This is often called a white
box test.
This type of data is quite hard to design. There are large numbers of variables
that need to be set to at least somewhat realistic values. One way is of course
to just randomise all or some of the variables, but that would be of limited use
since that would just provide unnatural input data, which is not the main purpose.
However, it has the potential to generate the WCET, and the probability for this
increases with a larger test set.
In more useful scenarios, where the data is realistic but modified to stress the
module, the creator has to have detailed knowledge about both the program and
how the data is handled. Because of time limitations for this project, tests using
artificially generated data have not been performed, and artificial data has not
been created.
5.2 Results of Measurements
This section describes the nature of the results. It includes examples of results
to show the output in its raw form before any post-processing is applied. Results
from the different measurements are extracted using several different tools and
methods, as described earlier. The output of these is explained here without
going into details about the individual test cases. More detailed analyses and
discussions are given in Chapter 6.
5.2.1 Code Coverage Results
Code Composer Studio uses Microsoft Excel to present the code coverage data as
several worksheets. The first worksheet that is presented is the summary worksheet. This sheet contains code coverage and exclusive profile data of all functions
in the program. It also contains two additional rows, named Others and Total.
The Total row shows the total of all event counts in the entire application. The
Others row shows counts that occurred in parts of code with no debug information.
It is important to remember that the exclusive counts of the various events only
include code directly in the respective function. They do not include any counts
that occur in other functions called by that function.
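A toy example of the distinction, with made-up numbers purely for illustration:

# Exclusive cycles count only the function body itself; inclusive
# cycles add the cycles of every called function (invented numbers).
exclusive = {'parent': 1000, 'child_a': 4000, 'child_b': 2500}

# If parent calls child_a and child_b once each, and the children call
# nothing further, then:
inclusive_parent = exclusive['parent'] + exclusive['child_a'] + exclusive['child_b']
print(inclusive_parent)  # 7500 inclusive, versus 1000 exclusive for parent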
Figure 5.4. Example of a summary page in the Excel worksheet created by the Code
Coverage tool. The actual numbers are not important. The figure is purely illustrative.
Details on what the different columns in this worksheet mean can be found in
[15]. They are also listed in Appendix B.1.
The other worksheets each correspond to an individual source code file. Each sheet
displays all the code in the file, along with the information listed in Appendix B.2
for each line of code.
Figure 5.5. Example of a function page in the Excel worksheet created by the Code
Coverage tool. The actual numbers are not important. The figure is purely illustrative.
Many of the columns in the function worksheets are the same as in the summary
worksheet; only the columns that differ are listed in Appendix B.2.
The tool makes use of colour coding to show which functions and lines of code
have been executed. A green colour means the line or function has been executed
at least once, a red colour means it has not been executed, and lines or functions
that have not generated any code when compiled are shown as black.
5.2.2 Cycle Profiling Results
The data collected during profiling is displayed in the profile viewer window in
CCS. Figure 5.6 is a screenshot of what this window may look like.
Figure 5.6. Example of the profile viewer after run and measurement of one frame
in one of the test cases. The actual numbers are not important. The figure is purely
illustrative.
There is one row for each of the ranges the profiler is set to measure. Each row
in turn contains a large number of columns, one for each of the various properties
measured. Which properties are measured is selected in the profiler settings. Two
tabs in the profiler setup affect this: first the activity tab, where selecting profiler
activities enables a number of default properties, and then the Custom tab, where
any other properties of interest can be selected (see Figure 5.7).
Figure 5.7. Example of the activity and custom tabs in the profiler setup.
In addition to all the different properties measured, the profiler will always measure and display certain types of data for each property in the viewer window.
These are displayed for every property selected in the setup, apart from Address
range, SLR, Symbol name, Symbol type, Loop identifier and Access count, which
are shared for all properties on one row. Details on these types of data, and shared
columns, are available in [15], as well as in Appendix B.3.
For all measurements performed, the option "Collect Application Level Profile for
Total Cycles and Code Size" in the profiler setup activity tab was selected. This
option generates the Summary window in the profiler results. This window displays
a summary of the totals of the available events, along with percentages where
values relate to other values (e.g. the amount of cache hits out of total cache
accesses). The data in the summary view can be obtained from the standard
profiler output as well, but that requires much more work afterwards.
The summary information collects data from the time when the profiling is enabled, until it is disabled. It disregards any measurement ranges that may be
defined between the start and stop points. For many of the measurements in this
project this means that the summary data also includes code that is outside the
parts of code that are the targets for the measurement. This causes an inaccuracy
that can only be reduced by narrowing the start and stop points to match the
defined measurement ranges, which is highly impractical when measuring many
different ranges.
Chapter 6
Data Analysis
This chapter contains the results obtained from the different test cases. Due
to time and space limitations, only a selection of the results is presented in this
chapter.
Results from the Profiler and Code Coverage tools are always in the form of large
tables. It is difficult to get a good overview of results using only such a table, so
the most interesting parts were extracted using scripts to parse the CSV-files, and
the data was then imported to Microsoft Excel to facilitate generation of graphs,
summary tables and so on to ease the analysis.
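A minimal sketch of the kind of extraction script used follows; the column names here are assumptions for illustration (the actual exports use the columns listed in Appendix B).

import csv

# Pull a few columns of interest out of a profiler CSV export.
def extract_columns(csv_path, wanted=('Symbol Name', 'cycle.Total: Incl. Total')):
    rows = []
    with open(csv_path, newline='') as f:
        for row in csv.DictReader(f):
            rows.append({col: row[col] for col in wanted if col in row})
    return rows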
Two major approaches to collecting measurement data have been used. One
approach is to simply run the entire test sequence from start to end in one single
measurement. This is called full sequence measurements, and generates total counts
and average values for entire sequences. The other approach is to perform one
separate measurement for each single frame of the test sequences. This is called
frame-by-frame measurements, and provides more detailed data and the possibility
to study behaviour more closely.
The two approaches represent two 'layers' of the measurement work. To first
get an overview and indications of what to focus further measurements on, full
sequence measurements are performed. They are faster, easier and provide some
useful results, such as guidelines on what further measurements should follow.
They make it possible to exclude detailed measurements that would be very time
consuming but not interesting. The results are however not very detailed, and
therefore have limited use. They can often be replaced with more detailed results
from more elaborate measurements.
The next step is to collect more detailed information through the frame-by-frame
measurements. They provide all the information that the full sequence measurements
do, but for each single frame. This is also where more care is taken with regard
to what functions and sub-functions to measure and study.
The three main sections of this chapter (input data analysis, code coverage and
cycle profiling) are all divided into subsections for the different layers of
measurements: first the full sequence measurements, and then the frame-by-frame
measurements. The full sequence measurements do not provide results that are
part of the main focus of this project. They are however relevant as preliminary
results on which to build further measurements, and help decide where to focus
the efforts.
Much of the actual data obtained in the measurements (especially full sequence
measurements) have been omitted from this report due to confidentiality reasons.
6.1 Result Accuracy Concerns
In this study, all the results come from the profiler tool (and the code coverage
tool) in Code Composer Studio. There has been no real possibility to verify these
results using another measurement tool. Although there is no reason to believe
that these results are incorrect, some kind of verification would have been preferable. However, considering that Texas Instruments has designed the simulator and
profiler to accurately model and measure the type of processor used, the results
should be fairly reliable.
Another cause for concern may be errors in the measurement procedure, for
example choosing the wrong measurement range, or using the wrong settings in the
simulator or measurement tools. A great deal of care was taken to make sure that
the correct things were measured, that everything was set up as it should be, and
that the measurements were performed in the same way every time. As a way to
get an indication of any variations as the measurements progress, earlier tests can
be re-run at later times to make sure the results are still the same. A few such
control tests were performed, and none of them produced any differences in the
results.
A known source of inaccuracy in this study is the Profiler’s summary tab. It
is not actually inaccurate as such, but its results may often differ from the desired
results. This is because the summary collects its data between the points where
the profiler is set to start and stop. In other words, the summary also includes
code that surrounds the functions really targeted for measurement, which means
it will give a safe, but not accurate, estimation of the measured code. Figure 6.1
illustrates this. Assume that the targets for the measurement are functions A, C,
D and E, so the measurement ranges are chosen to cover these. But for practical
reasons (due to code structures such as e.g. loops) the breakpoints must be placed
at code line 1 and code line 4. This means that everything between them (code
lines 1-3 and functions A-E) will be included in the profile summary. But the profile data will include 2 lines. One for function A, and one for functions C-E (the
real targets for the measurement). No other code will be included, so the results
6.1 Result Accuracy Concerns
55
from the profiler will be accurate (as accurate as the Profiler is at measuring code
of course).
Figure 6.1. Illustration of breakpoints and profile ranges for the profiler measurement
data.
It is entirely possible to match the profiler's start and stop points with the target
functions, and by doing so acquire accurate summary results as well. But when
measuring the large number of functions and subfunctions the Tracking module
includes, doing this would considerably increase the measurement work. In this
simple example it would mean placing breakpoints at the first and last lines inside
function A, at function C and at function E, and then changing how the measurement
is carried out, since it is now split up into multiple parts. It becomes even more
complicated if a range is part of a loop, in which case it would mean a separate
measurement for each iteration of the loop. In small examples such as this one,
this may not require much extra work. But in complex systems with a large number
of functions and subfunctions the increase in work is considerable.
Because of this we chose to sacrifice the accuracy of the summary in order to
be able to perform more measurements more easily. The profiler summary data is
not vital anyway. It is a fast and easy way to get indications on many different
aspects of the measurement results, but since more detailed and more precise data
is available through the main profiling data, that is used for analysis instead. The
summary is used to get a clearer picture on what to focus detailed measurements
on.
Depending on the nature of the inaccuracies present when measuring, the degree
to which they impact the useful results can vary. If the inaccuracies themselves
vary over time they pose a bigger problem than if they are relatively constant. For
these measurements the known inaccuracies were determined to be fairly constant
throughout time without any major relations to the input data. This is because
they are mostly caused by code that does not vary much in CPU usage. As long
as the inaccuracies are always the same, they can easily be compensated for when
analysing. It is however worth noting that this was not investigated in great detail,
and it could become an issue if more detailed measurements are performed in the
future.
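As a minimal illustration of such compensation, a constant per-frame overhead can simply be subtracted from the measured counts. The overhead value here is hypothetical; in practice it would be estimated from a measurement of the surrounding code alone.

# Hypothetical constant measurement overhead, in cycles per frame.
MEASUREMENT_OVERHEAD_CYCLES = 1200

def compensate(cycle_counts):
    # Subtract the assumed constant overhead from each per-frame count.
    return [c - MEASUREMENT_OVERHEAD_CYCLES for c in cycle_counts]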
Another fact that reduces the significance of the known inaccuracies is that the
study focuses more on relative measurements. It is more important to compare the
behaviour and values between measurements than to get absolute values. Provided
that all measurements are performed in the same way, the inaccuracies should also
be the same in all cases, and can thereby be ignored. Exceptions to this are any
inaccuracies that depend on the actual input data, since that varies between the
test cases.
6.2 Input Data Analysis
To help understand the results, there is some benefit in analysing the input
data. Since the input data contains all the information on which the performance
of the Tracking module is based, studying this data is likely to reveal connections
between the input and the results. For example, the number of detections in the
input data will affect the cycle counts. Other possible factors could be detection
position, movement and size, vehicle speed, etc. The easiest way to get this data is
to extract it from the MAT-files generated by the MATLAB algorithm simulator
program.
There is also a way to compare this type of data before and after the Tracking
module, and in that way get indications on what the module is doing functionally.
This is done by using the MATLAB simulator program and extracting data from
both just before the Tracking module (the input data), and just after the module
(module output data). Doing this may help to understand behaviour that is visible
in the measured module performance data, and correlate it to module operation.
Unfortunately, Autoliv considers much of this information (comparisons, conclusions,
etc.) confidential. Because of that, this section only mentions that the data
exists and what it can be used for in general terms, without any specific details.
6.2.1 Full Sequences
The input data can be studied using summarised data from full sequences. This
provides information about e.g. total number of detections in the sequence, average
vehicle speed and where in the image most of the detections appear. This can be
used to some degree to compare execution times with number of detections etc., but
is really of limited use since the level of detail is very low. There is no information
about specific detections, and no information on input data changes over time.
6.2.2 Frame-by-frame
The input data can also be studied for each single frame. This provides a much
better picture of how the sequence develops over time, and much more detail about
the input data. It is now possible to see e.g. when the detections appear, how
they move and how the vehicle speed changes. This should of course be compared
to module performance data measured in the same way, to make it clear which
input data is causing which effects in the measured data.
Watching the difference in e.g. number of detections before and after the Tracking
module for each frame provides information about what the module is doing in
those frames. With knowledge of the code structure, and using this data together
with cycle counts and the source video file itself, conclusions and connections can
be made regarding e.g. internal module states and behaviour. Specific results and
conclusions regarding this data were deemed confidential by Autoliv, so they are
omitted from this report.
6.3 Code Coverage Data
The code coverage results are somewhat limited due to the fact that this type
of data was not collected for most of the measurements. The data that exists is
however useful for getting an overview of how often different functions are called,
and how many clock cycles they consume.
6.3.1 Full Sequence Measurements
Code coverage information was collected for the runs of complete sequences for all
files listed in Table 4.1. The data obtained provides values on many different characteristics of the execution besides the actual coverage. Examples include CPU
cycle totals, executed instruction totals and number of calls to different functions.
These values can be further studied and combined to produce values on, for instance, CPU stall percentage, number of instructions executed per CPU cycle and
overall CPU usage.
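A sketch of how such derived statistics can be computed from the raw totals is shown below. The argument names are illustrative rather than the tool's actual column names, and the CPU-usage definition (consumed cycles relative to an assumed per-frame cycle budget) is an assumption.

def derived_stats(total_cycles, stall_cycles, instructions_executed,
                  cycles_available):
    return {
        'stall_percentage': 100.0 * stall_cycles / total_cycles,
        'instructions_per_cycle': instructions_executed / total_cycles,
        # Assumed definition: consumed cycles relative to the available budget.
        'cpu_usage_percentage': 100.0 * total_cycles / cycles_available,
    }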
Most of the different characteristics the Code Coverage tool provides are also
available through the Profiler. However, we decided to also use the data from the
Code Coverage tool in the cases where it was gathered (i.e. full sequence
measurements), partly to verify that there were no unexpected differences between the
Code Coverage and Profiler tools, but also to show the usefulness of the Code
Coverage data collection (as part of the method evaluation). When conducting
future measurements, there may in some situations not be a need for collecting
other data than what is given by the Code Coverage tool, depending on the required
detail level.
When looking at the numbers, it is evident that the cycle counts, instruction
counts and calls to various functions are higher for more complex input data. This
is of course entirely expected since there is simply more work for the Tracking
module to handle.
The stall percentage (for a brief description of stalls, see Section 1.5.1), on the other hand, is lower for more complex data. The
total number of stalls is higher, but the relative number is lower. This indicates
a fairly efficient algorithm. The number of instructions per CPU cycle shows the
same behaviour. The number goes up with increasing complexity of the input
data. However, the number of instructions per cycle is still quite low for this type
of processor. The algorithm does not appear to be using the parallel functional
units of the CPU to their full potential.
When comparing total cycle counts of different measured sequences to stall percentage and detected objects, some sequences deviate from the normal pattern.
Most notable is test_sequence_20, which has a much higher stall percentage and
total cycle count than it should have considering its contents. This deviation is
more visible in later frame-by-frame measurements (see Figure 6.4). The difference
is explained by an algorithm-specific behaviour when tracking detected objects.
Code coverage data is also useful for studying the number of calls and exclusive
cycle consumption of individual functions in the modules. This data reveals that
one group of functions consumes far more cycles than any other: the calculation
functions (i.e. functions for addition, multiplication, division, etc.). They are
optimised functions designed for real-time applications.
Another piece of information obtained by the Code Coverage tool is the actual
coverage percentage, along with information on which code lines have been executed
and which have not. The coverage percentage shows how much of each function
was executed when it was called. In the case of full sequence measurements, it shows
how much of the code in a function has been executed at least once during
the full sequence, which is of limited use since there is no information about how
many times or when those lines were executed. To get more useful information,
more detailed measurements are needed (e.g. measuring each frame separately).
The information that is visible in this data shows that most functions that are
called have a coverage of 95–100%. This is thanks to the functions often being
short and sequential. There are a few other functions containing conditional
branches (this includes error handling), and these naturally have lower percentages.
The average coverage percentage for the whole module using sequences containing
pedestrians is 95%. If using sequences with no pedestrians, most functions are
never called; the average coverage percentage of the functions that are called in
this case is 73%.
6.3.2 Frame-by-frame
In one instance code coverage information was also collected for frame-by-frame
measurements. This was the first such measurement, which was performed manually,
without using the testing script, on test_sequence_01. Collecting code coverage
for each individual frame gives an opportunity to study the actual coverage in
greater detail than is possible for measurements on full sequences. The only data
exclusive to the Code Coverage tool is the coverage information. The rest of the
data the Code Coverage tool provides is obtainable through use of the Profiler as
well.
Figure 6.2 shows total cycle consumption, number of calls to the AddNewObject
function, stall percentage and CPU usage per frame of test_sequence_01. The
frame-by-frame data provides a possibility to examine the behaviour of different
functions over time as the test sequence progresses.
Unsurprisingly, the CPU usage follows the same pattern as the cycle count. It
is also once again clear that stall percentage increases as the processor load decreases. The graph displaying calls to AddNewObject shows that new objects are
added throughout the entire sequence. Yet there is probably a significant difference in the number of objects that are actually being tracked, which is indicated
by the variations in cycle counts. Unfortunately there was no way in this data to
see how many objects are in the list of tracked objects, or how many objects were
removed from it.
Figure 6.2. Characteristics of test_sequence_01 regarding total cycles, calls to
AddNewObject, stall percentage and CPU usage for each single frame.
Based on information gathered in the full sequence measurements, the number of
calls and cycle consumption of four different functions are shown in Figure 6.3.
These four functions use far more cycles than any other function measured, and
they dominate the behaviour of the entire module. They follow each other closely
in regard to cycle consumption.
Code coverage data collection was unfortunately not carried out on any other
test sequences; otherwise it would have provided ways to compare different
sequences in greater detail. The reason for this is, as mentioned earlier, that it was
not easy to automate collection and storage of the data, and doing everything
manually was not a feasible solution. Instead, much of the same data is collected
by the Profiler tool.
Figure 6.3. Calls to AddNewObject and total cycle counts for a few of the functions in
the tracking process function.
6.4 Cycle Profiling Data
The Profiler in CCS has been used extensively to gather data on a wide range of
different aspects of the Tracking module while it is executing. One part is the
collection of data for complete runs of full sequences in the form of totals, much like
the data gathered using the Code Coverage tool. But the most extensive part is the
data gathered for each single frame.
As input, both original and modified data was used. The data has been modified
in several different ways in attempts to study certain situations or to achieve
artificial worst-case results. With these modifications the input data is often not very
realistic, but it serves the purpose of studying certain behavioural characteristics,
and provides some statistics about them.
6.4.1 Full Sequence Measurements
The profiling tool allows for more accurate range specifications than the Code
Coverage tool does when choosing which parts of the code to measure. Since each
piece of code that is of interest can be targeted for measurement by specifying a
measurement range, it is easier to exclude irrelevant parts. This means profiling
data has the potential to be more accurate than data gathered by the Code
Coverage tool.
Using measurement ranges for the Profiler to cover only the functions of the
Tracking module, cycle counts were gathered and then compared to the tightest
possible measurement using the Code Coverage tool. The difference is in the range
of 10–20%, which corresponds to cycles consumed by code surrounding the module.
When measuring a series of executions of the same range, the Profiler provides
values on the maximum and minimum number of cycles measured in one such
execution, along with the average number of cycles consumed per execution. The
maximum number is the worst execution time for any frame in that particular
measurement, while the minimum is the best. This is a quick and easy way to get
this result, but it carries no information about the true behaviour during the
sequence. It is however possible to see exactly which one of all the executions
caused the maximum and minimum.
Just like the Code Coverage tool, the profiler provides information on the number
and percentage of stall cycles. Additionally, the profiler is able to divide the stalls
into four different categories (which the Code Coverage tool is not). These are
explained in Table 6.1. The number of stalls follows the same pattern as observed
in the code coverage data. However, when studying the stall types, it is clear that
the amount of both L1P stall cycles and Cross Path stall cycles rises significantly
for complex input data (video sequences with many pedestrians, a lot of movement,
etc.).
Cross path stall: Stall due to use of the data cross-path.
L1P stall: Stall due to program cache access.
L1D stall: Stall due to data cache access.
Memory bank conflict stall: Data banks do not allow more than one (1) read/write at any time. When different requests need to access the same bank, this results in a bank conflict and one of the requests stalls.

Table 6.1. Stall types. Source: [15]
6.4.2 Frame-by-frame
The main part of the measurement work has been concentrated on measuring each
frame separately. This provides the same information that is gathered by running
full sequence measurements, but at a higher level of detail. Measurements have been
performed not only on the three top-level functions of the tracking module, but
also on each sub-function called in the entire tracking process. Detailed data is
available for all these functions separately, and in all the different test cases. This
section contains a reduced set, which represents parts of the data collected.
Real Unaltered Data
The different CPU load characteristics of the different test sequences are shown
in Figure 6.4. The graphs show total added cycle counts for the three functions
setTrackingData, TRA_process and TRA_createOutput, for six sequences.
Figure 6.4. Cycle totals for different test sequences.
To see which functions are contributing most to the execution time, they need to
be separately measured and analysed. The use of profiling ranges in the Profiler
makes this easy to do, and the frame-by-frame approach clearly visualizes any
variations throughout the sequence. Figure 6.5 shows the cycle consumption of
the three top-level functions for test sequences 01, 08 and 20.
Figure 6.5. Total number of cycles for top-level functions.
The cycle consumption of the functions setTrackingData and TRA_createOutput
is insignificant compared to the magnitude of TRA_process for any sequence
containing pedestrians. It is also clear that they do not vary much in cycle consumption
across the sequence. Therefore these functions can largely be ignored when looking
at behaviour, and to some degree even when looking at absolute cycle counts.
Further dividing TRA_process into its level 1 sub-functions reveals even more
details. Most of the cycles are consumed by two individual functions among the
level 1 sub-functions, with a cycle consumption far above the rest. On the second
level, four functions stand out from the rest. If further optimisation is possible,
these are the functions to focus on first. The curves of these functions are visible
in Figure 6.6, which shows the cycle counts for all functions. The function names
are masked for confidentiality reasons.
Figure 6.6. Total number of cycles for different functions within TRA_process for
test_sequence_01. The function names are masked due to confidentiality.
Repeated Sequences
Figure 6.7(a) shows the cycle consumption for test_sequence_01 with unmodified
input data. Figure 6.7(b) is the result of repeating frames 1–20 of this sequence
over and over. It is easy to see that the repetition causes spikes in cycle consumption that are in the range of an extra 50–60 percent compared to the normal values
for these frames.
(a) Total cycle consumption for unmodified input data.
(b) Total cycle consumption for modified input data.
Figure 6.7. Measurement of test_sequence_01, original and modified by repetition.
To make it easier to compare the frames of the repeated sequence to the ones
of the original, Figure 6.8(a) shows both the measurement result of the modified
sequence and the results from the exact same frames when it was measured in its
original form.
(a) Total cycles of repetition measurement combined with same frames from original measurement.
(b) Difference in total cycles between original and repeated sequences.
Figure 6.8. Measurement of test_sequence_01, original and modified by repetition.
It is obvious that the two series follow each other closely at first, but as soon as
the first repetition starts the cycle consumption rises sharply. It then takes about
8 frames until the cycle count is back down to where it was. This behaviour is
expected, as it is caused by certain characteristics of the algorithm itself. Figure
6.8(b) visualizes the absolute difference between the cycle counts of the repeated
sequence and the same frames from the original measurement. It is also clear that
each consecutive repetition behaves in the same way with no major differences.
Single Frame Sequences
The data obtained from these measurements shows that already in the second
frame the cycle consumption is at the approximate level that it will keep for the
duration of the sequence. There are however some fluctuations appearing over
time. Some show repeating patterns and some do not. Figure 6.9 shows examples
of three different cases.
Figure 6.9. Single repeated frames compared to the same frame from the original
measurements. (a) shows frame 50 from test_sequence_01, (b) shows frame 50 from
test_sequence_19 and (c) shows frame 184 from test_sequence_21.
The data also shows that the cycle consumption for a single repeated frame can
be both lower and higher than the same frame was in the original measurement.
This likely depends on the frames surrounding the selected frame in the original
case.
Combined Sequences
The first of the three main categories of combined sequences is the 'natural' frame
order. This means sequences that consist of frames 1–30 from sequence A, followed
by frames 31–60 from sequence B, ending with frames 61–90 from sequence A
again. Figure 6.10 shows the total cycle count for a selection of four different
measurement runs with this type of input data. The figure also shows graphs
for the cycle consumption of corresponding frames taken directly from the results
when the sequences were measured in their unaltered form, as a comparison to
this measurement.
(a) Sequences 01 and 08 combined (01-08-01).
(b) Sequences 08 and 01 combined (08-01-08).
(c) Sequences 01 and 21 combined (01-21-01).
(d) Sequences 21 and 01 combined (21-01-21).
Figure 6.10. Combined sequences. Sources are frames 1-30 from first sequence, 31-60
from second sequence and 61-90 from first sequence.
There are differences in cycle consumption between this measurement and the
original ones, and in a few cases the differences are rather big. One example is Figure
6.10(a), where the middle part of the sequence comes from test_sequence_08, and
thereby does not contain any detections. Instead of reaching the low cycle counts
that this sequence ordinarily generates, the count stays high and fairly constant
until frames from test_sequence_01 reappear. A similar observation can be made
in Figure 6.10(b), where the two data series match each other quite well until the
frames switch from having detections to lacking them.
Figures 6.10(c) and 6.10(d) show the two combinations of test sequences 01 and 21.
The differences are less noticeable here, but still clearly visible. Like before, the
cycle counts return to their normal values about 8 frames after the switch in
source files for the frames. Naturally, the severity of any peaks caused by switches
like these depends on exactly when the switches occur. Predicting when to switch
to achieve the worst-case cycle consumption can be difficult. Figure 6.10(d) actually
contains the highest absolute cycle count for any frame measured during this
project.
The next category of combined sequences is where the first 30 frames are repeated
after the frames from the second sequence. This makes it possible to see if these
frames generate the same output before and after the interruption of frames from
the second sequence. Figure 6.11 shows a selection of four such measurements,
along with the corresponding frames’ original cycle consumption.
(a) Sequences 01 and 08 combined (01-08-01).
(b) Sequences 08 and 01 combined (08-01-08).
(c) Sequences 01 and 21 combined (01-21-01).
(d) Sequences 21 and 01 combined (21-01-21).
Figure 6.11. Combined sequences. Sources are frames 1-30 from first sequence, 31-60
from second sequence and 1-30 from first sequence.
The first 60 frames are exactly the same as in the previous case (Figure 6.10). The
interesting part is what happens after frame 60. All these cases exhibit the same
general behaviour as seen earlier, which is a peak just after the switch point, and
then back to normal about 8 frames later. After this point the execution times
are just as they were before, with only minor differences. An exception is the
combination of test sequences 08 and 01 in Figure 6.11(b), which has almost the
exact same appearance as the case in Figure 6.10(b). This is expected since all
frames of sequence 08 are empty, and thereby exactly the same.
Frame number 61 is a special case. It is a copy of the very first frame in the original
data, where the module is not yet running as normal, and therefore it does not
have a cycle count value that matches what it would be if the module had been
initialised. Values from frame 61 are therefore not very precise.
The final type of combination between sequences is where the source file alternates
for each consecutive frame in the combined sequence, so that frames with odd
numbers are from one sequence and frames with even numbers are from another.
This type of input data affects the tracking possibilities of different objects, since
any detected objects are only visible for a single frame each time, and are then
replaced by detected objects from a different scene (which are usually at different
locations). Figure 6.12 shows two examples of this. 6.12(a) is a combination of test
sequences 01 and 08, which have a high number of detections and no detections
respectively. 6.12(b) is a combination of test sequences 01 and 21, which have
approximately the same number of detections in the selected frames.
(a) Sequences 01 and 08 combined.
(b) Sequences 01 and 21 combined.
Figure 6.12. Combined sequences. Odd numbered frames come from the first sequence,
even numbered frames come from the second.
Figure 6.12(a) clearly shows the alternation between the frames in the data from
the original measurements (unmodified versions of sequences 01 and 08, represented
by the dark blue line). The combined sequence curve (represented by the pink
line), however, is much smoother and mostly keeps at a level at, or slightly above,
the values that test_sequence_01 originally generated. So the cycle counts are
only slightly affected by the disappearance of the objects every other frame.
Figure 6.12(b), on the other hand, shows that the cycle consumption of the combined
sequence (pink line) is significantly lower than it was for the unmodified input
data (blue line), probably because as the detections switch between frames, the
Tracking module has little chance to actually track any objects between frames.
6.5 Improvements
It is difficult to speculate on which improvements could be made to the system
without detailed knowledge of the code itself. But there are a few areas identified
by this study that may have potential for improvement. This section mentions
the areas that could possibly be improved for better performance of the system.
6.5.1 Optimisation Possibilities
The number of instructions executed per cycle might be of interest to optimise.
The measured results show that the system currently executes approximately one
instruction per cycle, which is not very good considering the CPU has eight parallel
functional units, and thereby a theoretical throughput of eight instructions per
cycle. Some instructions take more than one cycle, however, and it is very unlikely
that all functional units can be utilised at their maximum capacity, especially since
some tasks are not possible to parallelise, and full utilisation requires code that
can use the CPU pipelines fully. So the number of instructions per cycle may
never reach eight, but some improvement may be possible.
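A back-of-envelope illustration of this headroom, using invented numbers rather than the confidential measured values:

# Invented example counts, purely to illustrate the IPC calculation.
instructions_executed = 4200000
total_cycles = 4000000

ipc = instructions_executed / total_cycles  # about 1.05 instructions per cycle
theoretical_ipc = 8                         # eight parallel functional units
utilisation = ipc / theoretical_ipc         # about 13% of the theoretical peak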
The amount of stall cycles might also be optimised. A certain amount of stalls is
inevitable in most applications, but they should be kept to a minimum to decrease
their impact on performance.
When dividing the total stall cycles into their respective categories, further
information is available. Most of the stall cycles are due to program cache memory
accesses and use of the data cross-path. The stalls related to memory access could
possibly be addressed by optimising the cache memory usage. Some efforts could
also be made to reduce the attempted simultaneous use of the cross-path by several
different functional units, which is what is causing the cross-path stalls. It is
possible that this is caused by the compiler used. Exactly how to optimise these
things is outside the scope of this project, so it has not been investigated, and it
is not certain that it is even possible.
Another visible fact is that most of the cycles used during execution are consumed
by the calculation functions (multiplications, subtractions and so on). These
functions are part of the FastRTS library, which contains optimised calculation
functions designed for floating-point operations. The functions themselves may be
difficult to optimise further, but one possibility may be to change the data
representation to allow for faster calculations. The downside of this would be a
potential decrease in accuracy.
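As a generic illustration of that trade-off (not Autoliv code), a Q15 fixed-point multiply can replace a floating-point one at the cost of a small rounding error:

Q = 15  # number of fractional bits in the Q15 representation

def to_q15(x):
    return int(round(x * (1 << Q)))

def q15_mul(a, b):
    # The right shift truncates fractional bits; this is the accuracy loss.
    return (a * b) >> Q

x, y = to_q15(0.7), to_q15(0.3)
product = q15_mul(x, y) / (1 << Q)  # about 0.20999, slightly below the exact 0.21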
Chapter 7
Final Discussion
This chapter contains discussions about the experiences from this project. First
a short summary of the conclusion is presented, followed by more detailed conclusions drawn from the work itself, and the results obtained. The last part is a
discussion on possible future work that could be performed to further the understanding of how the Tracking module operates.
7.1
Overall Conclusions
The first objective was to collect data and compile statistics on the execution of
the Tracking module. During the course of this project a large amount of data has
been collected by performing various measurements on the module using both real
input data, and data modified using specifically designed data modifier tools. The
data has also been compiled into tables, lists and graphs to further ease analysis.
The collected data provides a much more detailed view of the execution of the
Tracking module than was previously available. The true value of this data may
not yet be clear, but can prove even more valuable as it is analysed in greater
detail in the future. This collection of statistical data fulfills the main goal of the
study.
The second objective was to find a method for conducting this study that would
also work for other similar studies in the future. This was just as important as
the first objective, and necessary in order to complete it. The method used in this
study works well for collecting the desired type of data. It is also possible to use
it for studying other parts of the system, or similar studies on different systems.
There may be some minor modifications required, but that depends on the exact
conditions and requirements surrounding those studies. It may, however, be difficult to study some parts of the Night Vision system this way due to the full system
architecture. The extra tools that were designed to automate measurements, modify input data and extract output data will also work for other studies, with some
minor modifications to suit any new characteristics of the system and/or data that
may be present. Such modifications should not be difficult however. This satisfies
the second objective for this study.
Just as expected, not all of the questions listed in the initial project objectives can
be answered completely by the results of this study. Some of them require more
work and a more complete knowledge of the entire system architecture.
7.2 Detailed Conclusions
One objective of this project was to find a suitable method for performing a
detailed analysis of modules of the Night Vision system. It was not decided
beforehand whether to analyse the system theoretically, or simply to measure it while
executing. Nor was it decided which programs, equipment or approaches were
preferable to use.
Early on in the study it became obvious that there were several different methods
and programs available. However, it was equally obvious that there would not be
enough time to pursue and evaluate all alternatives fully, especially considering
the other primary goal, which was to collect data and compile statistics of the
Tracking module performance.
Therefore a compromise was made, where several methods were studied only superficially, and based on that, the actual performance study was performed using
the method that was believed to be most useful given the desired results.
Time limitation was a major factor during the entire study, impacting all parts
of it. Due to the limited time available, both the method studies and the
performance studies had to be restricted in scope. There was not enough time to
evaluate all methods and tools thoroughly. Likewise, the detail level of the
performance study suffers from the time shortage, since more detailed studies simply
take more time to perform.
Because it was known beforehand that time would be limited, the goal was set
to focus studies on only the tracking module, and not other parts of the system.
7.2.1 Main Method: Measurements vs. Static Analysis
Measurements were considered much better suited for this particular study than
static analysis would have been. Static analysis focuses more on the exact WCET,
and not as much on behaviour, and since the behaviour and the dependencies on
the input are more important than the exact WCET in this study, there was little
reason to choose the static approach.
However, given that static analysis was not performed, detailed specifics regarding
its benefits and drawbacks for this particular project remain somewhat uncertain.
The conclusions regarding this method are based only on the preliminary study,
which included some specifications of the tools used for it, and the experiences of
other people who have written about their work in the field. The preliminary study
did however provide enough indication of possible benefits and drawbacks for us
to choose a dynamic approach instead.
The short list below summarises the main points considered when choosing between the two approaches.
• Static WCET methods provide guarantees of the WCET, although the bounds are often overestimated.
• Static WCET tools often require that the user has a very good understanding of both the tool and the program code. This is because the user needs to provide the tool with information to reduce the overestimations. Measurements do not require the same level of knowledge, and always give values that actually occurred during execution.
• Measurements are more accurate, but may miss the true WCET.
A good way to analyse a system such as this would probably be to combine both
static analysis and measurement so that the two methods can compensate for each
other’s weaknesses. The description of RapiTime suggests that this tool is able to
do so, but this was not tried. On the other hand, measurements cover just about
all the important aspects in the goals for this project, so spending too much time
on other methods may not be well justified.
7.2.2 Environment: Simulator vs. Hardware
Choosing between simulated and real hardware for the measurements is a choice
between accuracy and time consumption. Real hardware is much faster, but less
accurate than the simulator due to the problems described in Section 4.1.4. Even
though it would have been good to achieve fast measurements, the hardware
approach was not adopted due to the possibly large inaccuracy. The primary concern
was to get reliable results, and real hardware measurements could always be done
later.
Another contributing factor was that the toolkit for extracting code coverage data
required that the measurements were performed on a simulator. Collection of this
data was not possible if using the hardware directly. For these reasons we decided to primarily use the simulator until enough results were gathered, and then
postpone measurements on hardware to the future.
7.2.3 Performance Study: Preparations
There was considerable work done in setting up the environment and making sure
all tools and programs would work as we wanted, especially the MATLAB
simulation and Code Composer Studio, which took some time getting used to. Also, a
test data set had to be built by selecting previously recorded sequences and cutting
them into suitable lengths. The test data set was then run through the MATLAB
simulation to generate the input data files for the Tracking module. This was a
very time consuming process.
As preliminary trial measurements were performed, it became clear what kind of
extra programs would be needed to parse data and automate measurements. Part
of the preparation work was therefore to write these programs.
7.2.4 Performance Study: Measurements
Once the environment was set up properly, the measurements themselves were easy
to perform. They were however very time consuming. Doing one frame-by-frame
measurement took over 6 hours of constant monotonous work. The automatic
measurement script reduced this time to about 2-3 hours, and only required work
when starting new measurements. A total of 189 such measurements were performed. The automatic measurement script was vital to the collection of data in
this form.
The measurements on entire sequences at once were faster. They took about
70% of the time for a frame-by-frame measurement of the same sequence. They
also ran by themselves without any need for intervention by the user or any special
measurement scripting.
7.2.5 Performance Study: Results
The measurements performed have generated a tremendous amount of data. Much
of this data has not been analysed during the course of this project. The reason
for this is once again time limitations.
Those results that have been analysed come from both full sequence measurements
and frame-by-frame measurements. The limit lies instead in the detail level. In
the frame-by-frame measurements, separate data was taken from all functions in
the entire module, for all test cases. But the parts that were analysed in detail
were limited to only the top level functions.
Some useful information was nevertheless revealed by the analysed data. The
behaviour of the Tracking module when detections cease is one such example (see
the combined sequences in Section 6.4.2). Other interesting parts include the actual
values of the execution times in different situations.
7.3 Future Work
There are many possibilities to continue the work presented in this thesis. The
most apparent areas of future studies are described here.
To study the module code using a static analysis approach, instead of only with
measurements, may provide a different view on the actual WCET. Based on the
preliminary study of static methods, we concluded that it may be interesting to
perform this type of analysis in the future. Along with this method there would of
course be the tools associated with it. The most interesting of these is perhaps the
RapiTime tool, which seems to demand less in terms of configuration compared
to others, and yet is very powerful.
It would also be interesting to perform the measurements on the hardware directly,
to complement the data from the simulator measurements. It is not certain
that this would produce any major revelations, but it would give the possibility
to compare the two sets of data. Such a comparison would indicate how big the
problem with cache invalidation (Section 4.1.4) actually is, and that would provide
a better basis for decisions on how to measure module characteristics in the future.
If the cache invalidation problem only causes insignificant differences, future
measurements should perhaps be performed on hardware since it is much faster.
But if there are significant differences, the best option is probably to keep using the
simulator. It should also give hints on how large and how detailed measurements
the hardware should be used for, since increasing measurement complexity also
increases the cache problem. The possible benefit of reducing the measurement
times to only a fraction of those with the simulator is what motivates an
investigation of hardware measurements.
With regard to collecting the measurement data, there are several possible
improvements to be considered. There may be a way to also automate the collection
of code coverage data using a script similar to the testing script used for this study.
If this is possible, collecting code coverage would not be such a demanding task
for the user, and since code coverage is a valuable piece of information, this would
be a good thing to do. It is also possible to expand the amount of data collected by
the profiler. This is very easy to do, but causes problems with computer memory
and disk usage, and should only be done if there is a real need for the extra details.
Another improvement would be to design a way to include or simulate external
interrupts, communication etc. in the measurements, so that a more accurate view
of the module behaviour in real conditions can be obtained.
Even if restrictions were made in the amount of data collected, there was still
insufficient time available for this project to fully analyse it all. Deeper analysis
of this would undoubtedly provide more knowledge. The data is already collected
and available. And to continue on this line, even more detailed measurements can
easily be performed providing even more data for analysis.
One thing that was deferred for now is cache analysis. This analysis is a separate
tool in CCS which displays a graphical representation of the cache memories, and
shows exactly where and when different types of accesses occur. It could possibly
be used if any attempts at optimising the cache usage are made. But unless the
full system is measured, this information is not very accurate (since other parts
of the system will need the cache as well). This, and the fact that
Autoliv did not plan to work on the cache usage, led us to consider cache analysis
as unnecessary and too advanced at this stage.
The scripts that were used in this project also have room for improvements. They
could be refined and made easier to use without having to understand the source
code and their exact purpose, which is something that should be done if they are
to be used for future studies. Their functionality can also be extended, and they
could even be integrated into a single measurement support program suited for
this specific type of study. There is quite a lot of freedom in deciding these support
functions, regarding file formats etc.
Finally it would be interesting to try the method and tools on another part or
several parts of the system, or on parts of another system. It is however unlikely
that the entire Night Vision system can be studied this way because of the system
architecture.
Bibliography
[1] DaVinci digital media processors. http://focus.ti.com/paramsearch/docs/parametricsearch.tsp?family=dsp&sectionId=2&tabId=1852&familyId=1300, June 2009. Collection of datasheets for the TMS320 family.
[2] AbsInt Angewandte Informatik GmbH company website, March 2009. URL: http://www.absint.com.
[3] Autoliv company website, March 2009. URL: http://www.autoliv.com.
[4] Tidorum Ltd. company website, March 2009. URL: http://www.tidorum.fi.
[5] Alan Burns and Andy Wellings. Real-Time Systems and Programming Languages. Pearson Education Limited, 2001.
[6] Code Composer Studio IDE website, March 2009. URL: http://focus.ti.com/docs/toolsw/folders/print/ccstudio.html.
[7] Ola Eriksson. Evaluation of static time analysis for CC systems. Technical Report ISSN 1404-3041 ISRN MDH-MRTC-183/2005-1-SE, Mälardalen University, August 2005.
[8] Andreas Ermedahl. A Modular Tool Architecture for Worst-Case Execution Time Analysis. PhD thesis, Uppsala University, Department of Information Technology, 2003.
[9] Heptane WCET tool website, March 2009. URL: http://www.irisa.fr/aces/work/heptane-demo/heptane.html.
[10] Statens institut för kommunikationsanalys (SIKA). Vägtrafikskador 2007, 2008. Statistics report.
[11] Intel Corporation. Hexadecimal Object File Format, 1998. Technical specifications, URL: http://www.microsym.com/content/index.php?pid=4&id=25.
[12] Rapita Systems Ltd. company website, March 2009. URL: http://www.rapitasystems.com/rapitime.
[13] Mehrdad Reshadi and Nikil Dutt. Generic pipelined processor modeling and high performance cycle-accurate simulator generation. In DATE '05: Proceedings of the Conference on Design, Automation and Test in Europe, pages 786–791, Washington, DC, USA, 2005. IEEE Computer Society.
[14] Christer Sandberg, Andreas Ermedahl, Jan Gustafsson, and Björn Lisper. Faster WCET flow analysis by program slicing. In LCTES '06: Proceedings of the 2006 ACM SIGPLAN/SIGBED Conference on Language, Compilers, and Tool Support for Embedded Systems, pages 103–112, New York, NY, USA, 2006. ACM.
[15] Code Composer Studio IDE C64x help documentation. For version 3.3 of Code Composer Studio.
[16] Reinhard Wilhelm, Jakob Engblom, Andreas Ermedahl, Niklas Holsti, Stephan Thesing, David Whalley, Guillem Bernat, Christian Ferdinand, Reinhold Heckmann, Tulika Mitra, Frank Mueller, Isabelle Puaut, Peter Puschner, Jan Staschulat, and Per Stenström. The worst-case execution time problem - overview of methods and survey of tools. ACM Transactions on Embedded Computing Systems, 7(3), April 2008.
Appendix A
Script Code
A.1 testingscript.py
from PyCCS import *
import os

a = CCS_Server_Interface()

# Initializer subroutine:
# Starts up CCS with all settings and runs until the tracking loop begins.
def Initializer():
    a.CCSOpenNamed('*', '*', True)
    a.CCSWorkspaceOpen(r'C:\CCStudio_v3.3\MyProjects\WorkSpace\unittest.wks')
    a.ProgramLoad(r'C:\mks\VP\DSP\source\Tracking\Test\CCS\Debug\Tracking_UT.out')
    a.TargetRun()
    a.LoadProfileConfiguration(r'C:\mks\VP\DSP\source\Tracking\Test\CCS\Debug\single_frame_profile_meas.ini')
    a.TargetRun()

# LoopRunner subroutine:
# Used to loop through the tracking loop a number of times without
# measuring anything.
def LoopRunner(iterations):
    for i in range((iterations - 1) * 2):
        a.TargetRun()

cols = ['Address Range,', 'Symbol Name,', 'SLR,', 'Symbol Type,', 'Label Range,',
        'Loop ID,', 'Access Count,',
        'cycle.CPU: Incl. Total,', 'cycle.CPU: Excl. Total,',
        'cycle.Total: Incl. Total,', 'cycle.Total: Incl. Min.,',
        'cycle.Total: Incl. Max.,', 'cycle.Total: Incl. Avg.,',
        'cycle.Total: Incl. Min. Access,', 'cycle.Total: Incl. Max. Access,',
        'cycle.Total: Excl. Total,', 'cycle.Total: Excl. Min.,',
        'cycle.Total: Excl. Max.,', 'cycle.Total: Excl. Avg.,',
        'cycle.Total: Excl. Min. Access,', 'cycle.Total: Excl. Max. Access,',
        'CPU.NOP: Incl. Total,', 'CPU.NOP: Excl. Total,',
        'CPU.execute_packet: Incl. Total,', 'CPU.execute_packet: Excl. Total,',
        'CPU.idle: Incl. Total,', 'CPU.idle: Excl. Total,',
        'CPU.access.summary: Incl. Total,', 'CPU.access.summary: Excl. Total,',
        'CPU.access.data.read: Incl. Total,', 'CPU.access.data.read: Excl. Total,',
        'CPU.access.data.write: Incl. Total,', 'CPU.access.data.write: Excl. Total,',
        'CPU.access.external.data: Incl. Total,', 'CPU.access.external.data: Excl. Total,',
        'CPU.access.external.prog: Incl. Total,', 'CPU.access.external.prog: Excl. Total,',
        'CPU.discontinuity.branch: Incl. Total,', 'CPU.discontinuity.branch: Excl. Total,',
        'CPU.discontinuity.summary: Incl. Total,', 'CPU.discontinuity.summary: Excl. Total,',
        'CPU.discontinuity.interrupt.summary: Incl. Total,',
        'CPU.discontinuity.interrupt.summary: Excl. Total,',
        'CPU.instruction.condition_false: Incl. Total,',
        'CPU.instruction.condition_false: Excl. Total,',
        'CPU.instruction.decoded: Incl. Total,', 'CPU.instruction.decoded: Excl. Total,',
        'CPU.instruction.executed: Incl. Total,', 'CPU.instruction.executed: Excl. Total,',
        'CPU.stall.crosspath: Incl. Total,', 'CPU.stall.crosspath: Excl. Total,',
        'CPU.stall.summary: Incl. Total,', 'CPU.stall.summary: Excl. Total,',
        'CPU.stall.mem.L1D: Incl. Total,', 'CPU.stall.mem.L1D: Excl. Total,',
        'CPU.stall.mem.L1P: Incl. Total,', 'CPU.stall.mem.L1P: Excl. Total,',
        'CPU.stall.mem.bank_conflict: Incl. Total,', 'CPU.stall.mem.bank_conflict: Excl. Total,',
        'CPU.stall.mem.summary: Incl. Total,', 'CPU.stall.mem.summary: Excl. Total,',
        'CPU.stall.mem.l2.cache.miss.read: Incl. Total,',
        'CPU.stall.mem.l2.cache.miss.read: Excl. Total']

columnlist = ''
for k in range(0, len(cols)):
    columnlist = columnlist + cols[k]

Initializer()
# Loop range is the number of frames in the target sequence.
# A sequence with 50 frames means range(1, 51).
for i in range(1, 201):
    # The if statement is used to handle issues with CCS locking up too much
    # system memory, and thereby crashing the system. The simple solution is
    # to restart CCS after a set amount of frames, and then restart the
    # process and loop forwards to the last measured frame.
    if i == 70 or i == 140:  # or i == 180:
        a.CCSClose(1)
        Initializer()
        LoopRunner(i)
    os.mkdir(r'N:\misc\henli\exjobb\test\frame' + repr(i))
    a.EnableProfiling()
    a.TargetRun()
    a.DisableProfiling()
    a.ExportProfileData(r'N:\misc\henli\exjobb\test\frame' + repr(i),
                        '', 'rbe', '\t', columnlist, '')
    a.ExportProfileData(r'N:\misc\henli\exjobb\test\frame' + repr(i),
                        '', 'summary', '\t', '', '')
    a.TargetRun()
a.CCSClose(1)
A.2 fileparser.py
# The purpose of the fileparser is to provide an easy way to extract the
# desired information from the csv-files generated by the testingscript.
import csv
import os

# Searcher subroutine:
# Searches for the desired value by looking for the correct column and row,
# which yields a single value from the .csv-file. Used for the more
# column-oriented rbe-files.
def Searcher(columns, whichFunction):
    for i in range(0, len(columns)):
        if columns[i] == rbeDataColumn:
            for row in reader:
                if row[1] == whichFunction:
                    return row[i]

rootdir = r'N:\Misc\henli\exjobb\meas\tracking_unit_test_sim\code_coverage_and_profiling\meas_on_mod_data\meas22\sequence_08_20_frame_51to100_interleave'
sourceFile = 'rbe.csv'
summaryKeyword = 'Total Cycles'
rbeDataColumn = 'cycle.Total: Incl. Total'
rbeFunction = 'setTrackingData'

if sourceFile == 'summary.csv':
    destinationFile = os.path.join(rootdir, 'data_from_summary.txt')
elif sourceFile == 'rbe.csv':
    destinationFile = os.path.join(rootdir, 'data_from_profiler.txt')
else:
    print('Unknown input file!')

tree = os.walk(rootdir)
pathList = []
cycleCountList = []

# Create a list of paths to files that will be part of the data extraction.
for root, dirs, files in tree:
    for name in files:
        if name == sourceFile:
            pathList.append(os.path.join(root, name))

outputFile = open(destinationFile, 'w')
writer = csv.writer(outputFile, delimiter='\t', lineterminator='\n')

# Create a list of extracted values from the files, and write them to a
# different file.
for file in pathList:
    inputFile = open(file, 'rt')
    reader = csv.reader(inputFile, delimiter='\t')
    if sourceFile == 'summary.csv':
        for row in reader:
            if len(row) > 3:
                if row[0] == summaryKeyword:
                    cycleCountList.append(row[1])
                    break
    elif sourceFile == 'rbe.csv':
        for row in reader:
            if len(row) > 3:
                cycleCountList.append(Searcher(row, rbeFunction))
                break
    inputFile.close()

outputFile.write('Source is: ' + sourceFile + '\n')
if sourceFile == 'summary.csv':
    outputFile.write('Datafield is: ' + summaryKeyword + '\n\n')
elif sourceFile == 'rbe.csv':
    outputFile.write('Datafield is: ' + rbeDataColumn + ' for ' + rbeFunction + '\n\n')

writer.writerow(range(1, len(pathList) + 1))
writer.writerow(cycleCountList)
outputFile.close()
print('Done! Results written to ' + destinationFile)
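The parser thus writes a small tab-separated report: two header lines, a blank line, then one row of frame numbers and one row of the corresponding cycle counts. As a minimal sketch of how such a report can be read back for further processing (the function below is not part of the thesis toolchain; it merely assumes the layout produced by the script above):

def read_cycle_counts(path):
    # Assumes the layout written by fileparser.py: two header lines,
    # one blank line, then a row of frame numbers and a row of counts.
    with open(path, 'rt') as f:
        lines = [line.rstrip('\n') for line in f]
    frames = [int(x) for x in lines[3].split('\t') if x]
    counts = [int(x) for x in lines[4].split('\t') if x]
    return dict(zip(frames, counts))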
A.3 DataGrabber.m
function DataGrabber()
%----------------------------------------------------------------
% Data grabber script:
% Collects different parts of the data available in an
% algorithmOutput-file. It places the data in a text file in a tab
% delimited form, with one number for each frame in the sequence.
% The purpose is to make it easy to collect data and display it in
% graphs outside of MATLAB, for example in Excel.
%
% The data that is currently extracted is number of detections per
% frame and vehicle speed in each frame.
%----------------------------------------------------------------
clear all;

[filename, pathname] = uigetfile('*.mat', 'Select file to extract data from');
sourceFile = load([pathname filename]);
destinationFile = fullfile(pathname, 'extracted_matlab_data.txt');
sequenceLength = length(sourceFile.algorithmOutput.frameOutput);

fileId = fopen(destinationFile, 'wt');

fprintf(fileId, 'Number of detections per frame:\n');
fprintf(fileId, '%d\t', 1:sequenceLength);
fprintf(fileId, '\n');
for k = 1:sequenceLength
    if length(sourceFile.algorithmOutput.frameOutput(k).object) == 1
        if sourceFile.algorithmOutput.frameOutput(k).object.id == -1
            fprintf(fileId, '%d\t', 0);
        else
            fprintf(fileId, '%d\t', 1);
        end
    else
        fprintf(fileId, '%d\t', length(sourceFile.algorithmOutput.frameOutput(k).object));
    end
end
fprintf(fileId, '\n\n');

fprintf(fileId, 'Vehicle speed:\n');
fprintf(fileId, '%d\t', 1:sequenceLength);
fprintf(fileId, '\n');
for i = 1:sequenceLength
    fprintf(fileId, '%g\t', sourceFile.algorithmOutput.frameOutput(i).can.speed);
end

fprintf(fileId, '\n\nAmbient temperature: %g\n', sourceFile.algorithmOutput.frameOutput(i).can.temperature);

fclose(fileId);
disp(['Output data written to ', destinationFile])
clear;
A.4 DataCutter.m
%===========================================================
% Script name: DataCutter
%
% Description: Copies frames from different input AlgorithmOutput-
% files and pastes them together in a new file in the order
% specified by the user.
%
% Usage: Specify two files to copy frames from. Then specify the
% desired chunks of frames from each file and the sequence in which
% you want to paste them together.
% (The script in its current form always starts with 'chunk1' from
% the first file (victim1), and then alternates between the chunks.)
%
% Input: Two algorithmOutput-files
%
% Output: One algorithmOutput-file consisting of frames from the two
% input files.
%
% Current version: v0.10
%============================================================

[filename1, pathname1] = uigetfile('*.mat', 'Select first file. The script will start with this one.');
cuttingVictim1 = load([pathname1 filename1]);
[filename2, pathname2] = uigetfile('*.mat', 'Select second file.');
cuttingVictim2 = load([pathname2 filename2]);

sadId = cuttingVictim1.algorithmOutput.sadImageIndex;
framesFromVictim1 = struct('chunk1', 1:50, 'chunk2', 101:150);
framesFromVictim2 = struct('chunk1', 51:100, 'chunk2', 151:200);
destinationFile = fullfile('N:\Misc\henli\exjobb\test', [filename1(6:16), filename2(14:16), '_combination.mat']);

targetLength = 0;
for i = 1:length(fieldnames(framesFromVictim1))
    targetLength = targetLength + length(framesFromVictim1.(strcat('chunk', num2str(i))));
end
for j = 1:length(fieldnames(framesFromVictim2))
    targetLength = targetLength + length(framesFromVictim2.(strcat('chunk', num2str(j))));
end
disp(['Target file will contain ', num2str(targetLength), ' frames.'])

% This vector keeps all points where the cutter should change source file.
switchPointVector = [];
for k = 1:length(fieldnames(framesFromVictim1)) + length(fieldnames(framesFromVictim2))
    if mod(k, 2) ~= 0
        if k == 1
            switchPointVector(k) = length(framesFromVictim1.(strcat('chunk', num2str(ceil(k/2)))));
        else
            switchPointVector(k) = length(framesFromVictim1.(strcat('chunk', num2str(ceil(k/2))))) + switchPointVector(k-1);
        end
    else
        switchPointVector(k) = length(framesFromVictim2.(strcat('chunk', num2str(ceil(k/2))))) + switchPointVector(k-1);
    end
end

if sadId ~= cuttingVictim2.algorithmOutput.sadImageIndex
    disp('User mistake: Sad indexes differ in input files.')
end

algorithmOutput.sadImageIndex = sadId;
source = cuttingVictim1;
chunkNumber = 1;
index = framesFromVictim1.(strcat('chunk', num2str(1)))(1);

for l = 1:targetLength
    if intersect(l-1, switchPointVector)
%         if intersect(l-1, switchPointVector) == targetLength
%             break
%         end
        chunkNumber = chunkNumber + 1;
        if isequalwithequalnans(source, cuttingVictim1)
            source = cuttingVictim2;
            index = framesFromVictim2.(strcat('chunk', num2str(ceil(chunkNumber/2))))(1);
        else
            source = cuttingVictim1;
            index = framesFromVictim1.(strcat('chunk', num2str(ceil(chunkNumber/2))))(1);
        end
    end
    algorithmOutput.frameOutput(l) = source.algorithmOutput.frameOutput(index);
    index = index + 1;
end

save(destinationFile, 'algorithmOutput');
disp('Done!')
clear all
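The switch-point vector above is simply the running sum of the chunk lengths in paste order. A short sketch (shown in Python form for clarity; the values follow directly from the chunk definitions hard-coded in the script) makes the intent explicit:

# Chunk lengths in paste order: victim1.chunk1, victim2.chunk1,
# victim1.chunk2, victim2.chunk2 - each 50 frames in the script above.
chunk_lengths = [50, 50, 50, 50]

switch_points = []
total = 0
for n in chunk_lengths:
    total += n
    switch_points.append(total)

# After frames 50, 100 and 150 the cutter switches source file;
# 200 marks the end of the combined sequence.
print(switch_points)  # [50, 100, 150, 200]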
Appendix B
Measurement Results Definitions
B.1 Code Coverage Summary Worksheet
Functions
    Lists all the function symbols in the program.
File
    Names the source file that contains the function.
Line No.
    Provides the line number in the file where the function begins.
Size (bytes)
    Provides the size of the function in bytes.
Start Address (hex)
    Provides the function start address in hexadecimal.
# times called
    Lists the number of times the function was called during the execution of the program.
% coverage
    Shows the number of lines executed/number of lines of code as % coverage.
Total Instructions
    Shows the cumulative count of the number of instructions decoded for that function.
cycle.Total
    Total number of cycles consumed, including stalls.
cycle.CPU
    Total number of cycles consumed, excluding stalls.
L1P.miss.summary
    Total number of program cache misses.
L1P.miss.summary_rate
    L1P cache miss rate for reads, expressed as a percentage of total L1P accesses.
L1D.miss.read
    Number of data cache read misses.
L1D.miss.read_rate
    L1D cache miss rate for reads, expressed as a percentage of total L1D reads.
L1D.miss.write
    Number of data cache write misses.
L1D.miss.write_rate
    L1D cache miss rate for writes, expressed as a percentage of total L1D writes.
L1P.hit
    Total number of program cache hits.
L1D.hit.read
    Number of data cache read hits.
L1D.hit.write
    Number of data cache write hits.
L1P.miss.conflict
    Number of program cache conflict misses.
L1D.miss.conflict
    Number of data cache conflict misses.
CPU.instruction.condition_false
    Number of conditional instructions that were not executed because their condition evaluated to false.

Table B.1: Summary worksheet columns.
B.2 Code Coverage Function Worksheet
Line Count: Min
    Shows the minimum of the execution counts of any of the assembly instructions corresponding to this source line. A zero here indicates that there is at least one assembly instruction corresponding to this source line that has not been executed.
Line Count: Max
    Shows the maximum of the execution counts of any of the assembly instructions corresponding to this source line. A zero here indicates that none of the assembly instructions corresponding to this source line have been executed. When a source file is compiled, each line in the source file is converted into zero or more instructions. Consider a particular source line for which, say, three instructions are generated. When the program is executed, suppose that the first instruction corresponding to this line executed twice, the second executed once, and the third was never executed. The Line Count: Min for this source line would then be the minimum of (2, 1, 0), which is 0, and the Line Count: Max would be the maximum of (2, 1, 0), which is 2.
Line Number
    Provides the line number in the file where the source line is found.
Source Line
    Lists the content of the source line in the file.
Total Instructions
    The cumulative count of the total number of target instructions (corresponding to the source line) that were decoded during the run of the program.

Table B.2: Function worksheet columns.
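To make the Line Count: Min/Max semantics concrete, the worked example in the table can be restated in a few lines of code (the per-instruction counts are the hypothetical ones from the description above, not profiler output):

# Execution counts of the three assembly instructions generated
# for one hypothetical source line (see the example in Table B.2).
counts = [2, 1, 0]

line_count_min = min(counts)  # 0: at least one instruction never executed
line_count_max = max(counts)  # 2: a 0 here would mean the line never executed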
B.3 Profiler Results
Address Range
    Displays the hexadecimal address range of the code profiled in the current row.
Source Line Reference (SLR)
    Displays the integer line number range of the code profiled in the current row.
Symbol Name
    The name of the symbol for which the information in the row is displayed.
Symbol Type
    Describes whether the data in the row is for a function, loop, or range.
Label Range
    An address range marked by a symbolic label for the start address and a symbolic label for the end address. A label range is commonly used to mark the start and end addresses of a range you are interested in profiling. This column is blank if the addresses have not been specified in Profile Setup, and it is normally hidden.
Loop Identifier
    A compile-time unique symbolic name generated by the compiler for each loop in the program.
Access Count
    The number of times the profile area is entered to gather inclusive and exclusive statistics during a profiling session.
Exclusive.min_access / Exclusive.max_access
    Relate the range access count to the corresponding minimum and maximum cycle counts. For a function profiled for cycles that gets called 10 times, you have an access count of 10 along with the inclusive and exclusive total cycle counts, and the respective minimum and maximum cycle counts per call. The min_access value tells you which invocation (access or call) of the function had the minimum cycle count per call, while max_access tells you which invocation caused the maximum cycle count per call. In other words, if the function foo() is profiled with access_count=10, inclusive.min=50, inclusive.total=500, inclusive.max=125, inclusive.max_access=2 and inclusive.min_access=9, then the 9th call to foo() had the minimum cycle count per call of 50 cycles, and the 2nd call had the maximum cycle count per call of 125 cycles.
Exclusive.avg
    The average number of cycles for one execution of the profile area, excluding the execution time (cycle count) of any subroutines called from within the profile area.
Exclusive.max
    The maximum number of cycles for one execution of the profile area, excluding the execution time (cycle count) of any subroutines called from within the profile area.
Exclusive.min
    The minimum number of cycles for one execution of the profile area, excluding the execution time (cycle count) of any subroutines called from within the profile area.
Exclusive.total
    The total number of cycles spent executing the profile area, excluding the execution time (cycle count) of any subroutines called from within the profile area.
Inclusive.avg
    The average number of cycles for one execution of the profile area, including the execution time (cycle count) of any subroutines called from within the profile area.
Inclusive.max
    The maximum number of cycles for one execution of the profile area, including the execution time (cycle count) of any subroutines called from within the profile area.
Inclusive.min
    The minimum number of cycles for one execution of the profile area, including the execution time (cycle count) of any subroutines called from within the profile area.
Inclusive.total
    The total number of cycles spent executing the profile area, including the execution time (cycle count) of any subroutines called from within the profile area.

Table B.3: Profiler columns.
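Assuming a plain list of per-call inclusive cycle counts, the relationship between the access statistics in Table B.3 can be sketched as follows (the counts are invented for illustration and merely chosen to mimic the foo() example, with the 2nd call the most expensive and the 9th the cheapest):

# Hypothetical inclusive cycle counts for ten calls to foo().
per_call_cycles = [60, 125, 55, 70, 52, 58, 65, 57, 50, 58]

access_count  = len(per_call_cycles)                   # 10
inclusive_min = min(per_call_cycles)                   # 50
inclusive_max = max(per_call_cycles)                   # 125
# min_access/max_access are the 1-based indices of the extreme calls.
min_access = per_call_cycles.index(inclusive_min) + 1  # 9
max_access = per_call_cycles.index(inclusive_max) + 1  # 2
inclusive_total = sum(per_call_cycles)
inclusive_avg = inclusive_total / access_count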
Copyright
Swedish
This document is kept available on the Internet - or its future replacement - for a period of 25 years from the date of publication, provided that no extraordinary circumstances arise.
Access to the document implies permission for anyone to read, download, and print single copies for individual use, and to use it unchanged for non-commercial research and for teaching. Subsequent transfer of the copyright cannot revoke this permission. All other use of the document requires the consent of the copyright holder. Solutions of a technical and administrative nature are in place to guarantee authenticity, security, and accessibility.
The author's moral rights include the right to be mentioned as the author, to the extent required by good practice, when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or character.
For additional information about Linköping University Electronic Press, see the publisher's website: http://www.ep.liu.se/
English
The publishers will keep this document online on the Internet - or its possible replacement - for a
period of 25 years from the date of publication barring exceptional circumstances.
The online availability of the document implies a permanent permission for anyone to read, to
download, to print out single copies for your own use and to use it unchanged for any non-commercial
research and educational purpose. Subsequent transfers of copyright cannot revoke this permission.
All other uses of the document are conditional on the consent of the copyright owner. The publisher
has taken technical and administrative measures to assure authenticity, security and accessibility.
According to intellectual property law the author has the right to be mentioned when his/her work
is accessed as described above and to be protected against infringement.
For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/
© Henrik Liljeroth
Linköping, 10 November 2009