Linköping studies in science and technology. Thesis.
No. 1500
Visual Inertial Navigation and
Calibration
Martin A. Skoglund
Division of Automatic Control
Department of Electrical Engineering
Linköping University, SE-581 83 Linköping, Sweden
http://www.control.isy.liu.se
[email protected]
Linköping 2011
This is a Swedish Licentiate’s Thesis.
Swedish postgraduate education leads to a Doctor’s degree and/or a Licentiate’s degree.
A Doctor’s Degree comprises 240 ECTS credits (4 years of full-time studies).
A Licentiate’s degree comprises 120 ECTS credits,
of which at least 60 ECTS credits constitute a Licentiate’s thesis.
Linköping studies in science and technology. Thesis.
No. 1500
Visual Inertial Navigation and Calibration
Martin A. Skoglund
[email protected]
www.control.isy.liu.se
Department of Electrical Engineering
Linköping University
SE-581 83 Linköping
Sweden
ISBN 978-91-7393-126-7
ISSN 0280-7971
LiU-TEK-LIC-2011:39
Copyright © 2011 Martin A. Skoglund
Printed by LiU-Tryck, Linköping, Sweden 2011
To the memory of my mother
Abstract
Processing and interpretation of visual content is essential to many systems and
applications. This requires knowledge of how the content is sensed and also
what is sensed. Such knowledge is captured in models which, depending on the
application, can be very advanced or simple. An application example is scene reconstruction using a camera; if a suitable model of the camera is known, then a model of the scene can be estimated from images acquired at different, unknown locations. Yet the quality of the scene model depends on the quality of the camera model. The opposite is to estimate the camera model and the unknown locations
using a known scene model. In this work, two such problems are treated in two
rather different applications.
There is an increasing need for navigation solutions less dependent on external
navigation systems such as the Global Positioning System (GPS). Simultaneous
Localisation and Mapping (slam) provides a solution to this by estimating both
navigation states and some properties of the environment without considering
any external navigation systems.
The first problem considers visual inertial navigation and mapping using a monocular camera and inertial measurements, which is a slam problem. Our aim is to
provide improved estimates of the navigation states and a landmark map, given
a slam solution. To do this, the measurements are fused in an Extended Kalman
Filter (ekf) and then the filtered estimates are used as a starting solution in a nonlinear least-squares problem which is solved using the Gauss-Newton method.
This approach is evaluated on experimental data with accurate ground truth for
reference.
In Augmented Reality (ar), additional information is superimposed onto the surrounding environment in real time to reinforce our impressions. For this to be a pleasant experience it is necessary to have good models of the ar system and the environment.
The second problem considers calibration of an Optical See-Through Head Mounted Display system (osthmd), which is a wearable ar system. We show and motivate how the pinhole camera model can be used to represent the osthmd and the
user’s eye position. The pinhole camera model is estimated using the Direct Linear Transformation algorithm. Results are evaluated in experiments which also
compare different data acquisition methods.
Populärvetenskaplig sammanfattning (Popular Scientific Summary)

Processing and interpretation of visual content is essential to many systems and applications. This requires knowledge of how the content is measured and also what is measured. Such knowledge can be captured in models which, depending on the application, can be very advanced or simple. An application example is scene reconstruction with a camera; if a suitable model of the camera is known, then a model of the scene can be estimated from images taken at different, unknown locations. However, the quality of the scene model depends on the quality of the camera model. The opposite is to estimate the camera model and the unknown locations using a known scene model. In this work, two such problems are treated in two different applications.

There is an increasing need for navigation solutions that are less dependent on external navigation systems such as the Global Positioning System (GPS). Simultaneous Localisation and Mapping (slam) offers a solution by estimating both the navigation states and certain properties of the surrounding environment without any external navigation system.

The first problem concerns visual inertial navigation and mapping with a monocular camera and inertial sensors. Our aim is to provide improved estimates of the navigation states and a landmark map, given a solution from slam. To do this, the measurements are fused in an Extended Kalman Filter (ekf), and the filtered estimates are then used as an initial solution to a nonlinear least-squares problem, which in turn is solved with the Gauss-Newton method. This strategy is evaluated in experiments where a reliable reference is known.

In Augmented Reality (AR) applications, information is superimposed onto the surrounding environment in real time to reinforce our impressions. For this to be a pleasant experience it is necessary to have good models of the AR system and the environment.

The second problem concerns calibration of an Optical See-Through Head Mounted Display system (OSTHMD), which is a wearable AR system. We show and motivate why the pinhole camera model can be used to represent the OSTHMD and the user's eye position. The pinhole camera model is estimated using the Direct Linear Transformation algorithm. The results are evaluated in experiments which also compare different data acquisition methods.
Acknowledgments
First of all I would like to thank my supervisor Prof. Fredrik Gustafsson for your 24/7 inspiring expert guidance; how do you manage? My co-supervisor Dr Thomas Schön,
I admire your sound approach to research and your never ceasing positiveness.
Dr David Törnqvist, thanks for sharing so much of your knowledge, time and for
finding the bargains that no one else can.
Thanks Prof. Lennart Ljung and Fredrik for inviting me to join the Automatic
Control group, it’s been a great ride and I hope it will continue this way. Prof.
Svante Gunnarsson heads the group, and Ninna Stensgår makes this (and many
other things) a lot easier and for this I am truly grateful, thanks! The former head
Lennart, well supported by Ulla Salaneck and later Åsa Karmelind, did a terrific
job and created a pleasant environment!
There are many people at Automatic Control I would like to thank but I have
only time to mention a few. In no particular order; Prof. Mikael Nörrlöf, sorry
for killing Marvin, thanks for not killing me! Lic. Zoran Sjanic (The Zoran), it’s
always interesting and fun to collaborate with you and I really like your weird
sense of ”humour”. Magnus Axholt, thanks for showing me the strange world
of Augmented Reality, also thanks for introducing me to all the nice people at
ITN. Lic. Jeroen Hol, thanks for sharing your expertise, I like the interesting
discussions we have when you visit Linköping. Dr Ragnar Wallin, this is for you:
P ? . Special thanks goes to Lic. Jonas Callmer, Lic. Karl Granström, Lic. Christan
Lundquist for the TAFWJP we spent together. I also want to thank the super
hacker geniuses Dr Henrik Tidefelt and Dr Gustaf Hendeby for all the excellent
LATEX advice and Shapes support. To all the people I did not mention; remember,
to me you are all equally unimporta..., eeeeh sorry, equally important!
I would also like to thank LINK-SIC Industry Excellence Centre for the financial
support, thanks!
There are many people outside the Automatic Control group who mean a lot to me, but I will only mention a few since LiU-Tryck wants me to finish this now.
My second family in Falkenberg; Lena, Kurt, Pär, Lotta, Anna, Bertil, Amanda
and all the others, your hospitality is incredible and I truly enjoy when we meet!
My brother Johan; I’m amazed how much fun we have together, let it stay this
way! My father Bernt; our family has had some really tough years and you have
always taken the biggest load, I can not find words to thank you for it!
Finally; Maria, time flies when we are together and I enjoy every second of it!
Thanks for being supportive, loving, caring and being you!
Solebo, WGS 84 (decimal): 58.329808, 15.734846, May 2011
Martin Skoglund
Contents

Notation

I Background

1 Introduction
  1.1 Problem Formulation
      1.1.1 Visual Inertial Navigation and Mapping
      1.1.2 Optical See-Through Head Mounted Display Calibration
  1.2 Publications
  1.3 Contributions
  1.4 Thesis outline

2 Models
  2.1 Kinematics
      2.1.1 Notation
      2.1.2 Translational Kinematics
      2.1.3 Rotational Kinematics
      2.1.4 Quaternions
  2.2 Dynamics
      2.2.1 Translational Dynamics
      2.2.2 Rotational Dynamics
      2.2.3 Dynamic Models
  2.3 Inertial Measurements
      2.3.1 Gyroscopes
      2.3.2 Accelerometers
      2.3.3 Discrete-time Dynamic Models
  2.4 The Pinhole Camera Model
      2.4.1 Camera Coordinates
      2.4.2 World to Pixel Coordinates

3 Estimation
  3.1 Sensor Fusion
  3.2 Calibration
  3.3 Nonlinear Estimation
      3.3.1 Extended Kalman Filter
      3.3.2 Nonlinear Least-Squares

4 Visual Inertial Navigation and Mapping
  4.1 Visual and Inertial ekf-slam
      4.1.1 Prediction
      4.1.2 Measurement Update
  4.2 Nonlinear Least-Squares and slam
      4.2.1 Remarks

5 Optical See-Through Head Mounted Display Calibration
  5.1 System Overview
      5.1.1 Head Mounted Display
      5.1.2 Tracker
  5.2 Calibration
      5.2.1 Tracker Measurements
      5.2.2 Direct Linear Transformation
      5.2.3 Decomposition
      5.2.4 Remarks

6 Concluding Remarks
  6.1 Conclusions
  6.2 Future Work

Bibliography

II Publications

A A Nonlinear Least-Squares Approach to the SLAM Problem
  1 Introduction
  2 Problem Formulation
  3 Models
      3.1 Dynamics
      3.2 Landmark State Parametrisation
      3.3 Camera Measurements
  4 Solution
      4.1 Initialisation
      4.2 Nonlinear Least-Squares Smoothing
  5 Experiments
      5.1 Experimental Setup
      5.2 Results
  6 Conclusions and Future Work
  Bibliography

B Parameter Estimation Variance of the Single Point Active Alignment Method in Optical See-Through Head Mounted Display Calibration
  1 Introduction
  2 Background and Related Work
      2.1 Structure of Transformations
      2.2 Camera Resectioning Method
      2.3 Sensitivity of the DLT Algorithm
      2.4 Previous SPAAM Evaluations
      2.5 Other OSTHMD Calibration Methods
  3 Experiment
      3.1 Hardware Setup
      3.2 Software Setup
      3.3 Participants
      3.4 Procedure
      3.5 Task
      3.6 Independent Variables
      3.7 Dependent Variables
  4 Experiment Results
      4.1 Eye Point Estimation
      4.2 Parameter Variance
      4.3 Condition Number
  5 Simulation
      5.1 Independent Variables
      5.2 Dependent Variables
  6 Simulation Results
      6.1 Condition Number
      6.2 Parameter Variance
  7 Conclusions
  8 Discussion and Future Work
  Bibliography
Notation

Abbreviations

Abbreviation   Meaning
ar             Augmented Reality
dof            Degrees of Freedom
ekf            Extended Kalman Filter
hmd            Head Mounted Display
imu            Inertial Measurement Unit
kf             Kalman Filter
mems           Microelectromechanical Systems
ost            Optical See-Through
osthmd         Optical See-Through Head Mounted Display
pose           Position and Orientation
sift           Scale-Invariant Feature Transform
slam           Simultaneous Localisation and Mapping
vr             Virtual Reality
Part I
Background
1 Introduction
Processing and interpretation of visual content is essential to many systems and
applications. This requires knowledge of how the content is sensed and also what
is sensed. Such knowledge is captured in models which, depending on the application, can be very advanced or simple. An application example is scene reconstruction using a camera; if a suitable model of the camera is known, then a
model of the scene can be estimated from images acquired at different, unknown locations. Yet the quality of the scene model depends on the quality of the camera
model. The opposite is to estimate the camera model and the unknown locations
using a known scene model. The aim of this work is to approach two such problems in two rather different applications.
1.1 Problem Formulation
This thesis considers two problems which are seemingly unrelated but actually
have many things in common. The first problem is visual inertial navigation
and mapping using a camera and inertial measurements. The second problem is
calibration of an Optical See-Through Head Mounted Display system (osthmd)
modelled as a pinhole camera.
1.1.1 Visual Inertial Navigation and Mapping
Navigation is the problem of estimating the position, attitude and motion of an object with respect to an inertial frame. The problem is fundamental to many applications and reliable solutions are necessary. In aircraft applications the navigation is often provided using inertial sensors and a compass in combination with a
Global Navigation Satellite System (gnss), such as the Global Positioning System
(gps). In some environments, gnss is unavailable, e.g., underwater, or unreliable due to multipath effects, (un)intentional jamming, among others. The mapping problem, on the other hand, seeks to describe properties of the environment,
given data. The purpose of the Simultaneous Localisation and Mapping (slam)
problem is to estimate both properties of the environment and the navigation. In
this thesis, we consider the slam problem using inertial sensors and a monocular
camera. To solve this problem, mathematical models which capture the most essential parts of the sensors' measurements are needed for statistical inference.
The slam problem can be solved using an Extended Kalman Filter (ekf) or similar
methods, yet without any guarantee of convergence. We treat the slam estimates
given by ekf as initial parameters for a nonlinear least-squares problem which
we solve using the Gauss-Newton method.
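The Gauss-Newton iteration used in this refinement step repeatedly linearises the residual and solves the resulting normal equations. A minimal sketch on a toy curve-fitting problem (the residual and Jacobian here are hypothetical stand-ins for illustration, not the slam cost):

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=20):
    """Minimise ||residual(x)||^2 by repeated linearisation (Gauss-Newton)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x)
        J = jacobian(x)
        # Gauss-Newton step: solve the normal equations J^T J dx = -J^T r.
        dx = np.linalg.solve(J.T @ J, -J.T @ r)
        x = x + dx
    return x

# Hypothetical toy problem: fit y = a*exp(b*t) to noise-free data,
# starting away from the true parameters (a, b) = (2, 0.5).
t = np.linspace(0.0, 1.0, 10)
y = 2.0 * np.exp(0.5 * t)
residual = lambda p: p[0] * np.exp(p[1] * t) - y
jacobian = lambda p: np.column_stack([np.exp(p[1] * t),
                                      p[0] * t * np.exp(p[1] * t)])
p_hat = gauss_newton(residual, jacobian, [1.0, 0.0])
```

In the slam setting the filter estimates play the role of the starting point `x0`, which is why a reasonable ekf solution matters: Gauss-Newton only converges locally.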
1.1.2 Optical See-Through Head Mounted Display Calibration
We use Augmented Reality (ar) to denote visual information overlaid, augmented,
on human vision. Optical See-Through Head Mounted Displays (osthmd) are
wearable ar systems, which means that the osthmd displayed graphics moves
with the user. To augment the real world with graphics in such a system three
main problems need to be solved. First, the position and rotation of the osthmd with respect to the inertial frame need to be known, i.e., the navigation. Second,
the calibration problem which constitutes the static transformations of the relative position and rotation of the semitransparent graphics screen with respect to
the user’s eye and an inertial frame. Third, a model of the environment where the
graphics should be superimposed must be found. In this thesis, the first two of
the three problems are addressed. The first problem, navigation, is solved using
a visual tracking system. The second problem, calibration, is solved using the
navigation and user acquired measurements.
1.2 Publications
Published work of relevance to this thesis is listed below in chronological order.
M. Axholt, M. A. Skoglund, S. D. O’Connell, M. D. Cooper, S. R. Ellis,
and A. Ynnerman. Parameter Estimation Variance of the Single Point
Active Alignment Method in Optical See-Through Head Mounted Display Calibration. In Proceedings of the IEEE Virtual Reality Conference, pages 27–34, Singapore, Republic of Singapore, March 2011.
Background: The author was invited by Magnus Axholt to join his research in
2010. Magnus has great experience of various display systems and had already
been working on this project for more than a year. This work reflects some labour-intensive engineering where several months were spent on the ar system, which
consists of several hardware components interacting through several software
layers. We adapt the theoretical framework for camera calibration, founded in the computer vision and photogrammetry domains, to osthmd calibration. The
calibration problem is rather ill-posed since the measurements are few, and fairly
noisy, compared to the parameter space.
Z. Sjanic, M. A. Skoglund, T. B. Schön, and F. Gustafsson. A Nonlinear
Least-Squares Approach to the SLAM Problem. In Proceedings of the
IFAC World Congress, Milan, August - September 2011. To appear.
Background: This work started as a mini project by the author and Zoran Sjanic
in the graduate course Dynamic Vision, which was given by Thomas Schön in
2008, and some parts were also done within the graduate course Robot Modeling and Control, given by Mikael Norrlöf. The initial ideas actually emerged during the author's Master's Thesis project in 2008. Parts of this work have also been used in Nilsson (2010); Nilsson et al. (2011), where the sensor is a forward looking infrared (IR) camera attached to a car, and in Karlsson and Bjärkefur (2010), where the authors use a stereo camera and a laser camera in an indoor environment.
Published work not included in this thesis
J. Callmer, M. Skoglund, and F. Gustafsson. Silent Localization of
Underwater Sensors Using Magnetometers. EURASIP Journal on Advances in Signal Processing, 2010, 2010. doi: 10.1155/2010/709318.
URL http://dx.doi.org/10.1155/2010/709318. Article ID 709318.
M. Axholt, M. Skoglund, S. D. Peterson, M. D. Cooper, T. B. Schön,
F. Gustafsson, A. Ynnerman, and S. R. Ellis. Optical See-Through
Head Mounted Display Direct Linear Transformation Calibration Robustness in the Presence of User Alignment Noise. In Proceedings of
the 54th Annual Meeting of the Human Factors and Ergonomics Society, volume 54, pages 2427–2431, San Francisco, CA, USA, September
- October 2010. Human Factors and Ergonomics Society.
1.3 Contributions
The main contributions in this thesis are listed below.
• In Sjanic et al. (2011) we provide a solution to the slam problem using a
camera and inertial sensors. The individual contributions of the author
and Zoran Sjanic are equal and consist of deriving the proposed solution,
carrying out the experiments on which the algorithms were evaluated
and documenting the work.
• In Axholt et al. (2011) the Single Point Active Alignment Method (spaam)
used for osthmd calibration is evaluated in both experiments and simulations. Previously, much effort has been spent on analysing the effects of
the user's interaction, which is known as human factors, in ar systems. The individual contributions of the author to this work are parts of the experiment
design, derivation and verification of mathematical models and implementation in both software and hardware. Interdisciplinary research collaboration can be challenging but also very rewarding. The author believes an
important contribution is due to this clash. It has led to a change of focus
from the analysis of errors possibly introduced by humans to focus on the
analysis of different experiment designs and the tracker data. Most of the
documentation was carried out by Magnus Axholt, to which the author only made minor contributions.
1.4 Thesis outline

The thesis is divided into two parts, with edited versions of published papers in the second part. The first part starts with Chapter 2, which introduces kinematics, dynamics, sensors and model structures. Some estimation techniques relevant to this thesis are presented in Chapter 3. Chapter 4 describes some details not covered in Paper A, and Chapter 5 describes the ar system and some details of the calibration procedure used in Paper B. The first part ends with Chapter 6, which presents conclusions and directions for future work. The second part contains edited versions of Sjanic et al. (2011) in Paper A and Axholt et al. (2011) in Paper B. Paper A is about the slam problem using a camera and inertial sensors; a solution is provided using nonlinear least-squares and the results are evaluated on experimental data. Paper B is about osthmd calibration, where the data is collected using spaam and the calibration is solved using the Direct Linear Transformation algorithm. The parameter variance for different acquisition methods using spaam is evaluated in both experiments and simulations.
2 Models
In this chapter we outline the most important model structures which are used
in the publications in Part II. The models describe rigid body kinematics and
dynamics in three dimensions, the pinhole camera and inertial measurements.
2.1 Kinematics

Kinematics describe the motion of bodies without considering the forces causing the motion. Kinematic transformations between coordinate frames consist of length preserving translations and rotations, which is illustrated in Figure 2.1.
[Figure 2.1: Rigid body transformations. (a) A rotation is a displacement of the axes while the origin is kept fixed. (b) A translation is a displacement of the origin while the axes are kept aligned.]
2.1.1 Notation
Throughout the thesis, superscript lower case letters are used to denote the associated coordinate frame. Subscript lower case letters denote either a coordinate and/or a time instant. The common convention is to use bold font lower case letters to denote vectors and upper case letters to denote matrices. An example is

$$v_t^c = \begin{bmatrix} v_{x,t}^c \\ v_{y,t}^c \\ v_{z,t}^c \end{bmatrix} = R^{cw} v_t^w, \qquad (2.1)$$

where the time dependent vector $v_t^w$ in coordinate frame $w$ is expressed in the coordinate frame $c$ by a matrix multiplication with $R^{cw}$.
2.1.2 Translational Kinematics

A translation is a displacement of the body origin c while the axes are kept aligned. The translation is a vector, $c^w$, from the origin of the frame w to the coordinate c expressed in the w frame, and it is also called the position.
2.1.3 Rotational Kinematics

Proper rotations are maps that preserve length and orientation, i.e., the handedness of the basis. These maps are actually linear; hence, rotations can be done using matrices. There are many other parametrisation alternatives to rotation matrices, and perhaps the most common ones are Euler angles and the unit quaternion, see Section 2.1.4.
2.1 Definition (Special Orthogonal Group SO(3)). A matrix $R \in \mathbb{R}^{3\times 3}$ belongs to the special orthogonal group SO(3) if and only if it holds that

$$R^T R = I_3, \qquad (2.2a)$$
$$\det R = 1. \qquad (2.2b)$$

Figure 2.1a describes a rotation from the w frame to the c frame. This can be parametrised by the rotation matrix $R^{cw}$. The inverse rotation, using (2.2a), is $(R^{cw})^{-1} = (R^{cw})^T$, hence $(R^{cw})^T = R^{wc}$.
2.2 Example: Rotation
A vector in the a-frame, $x^a \in \mathbb{R}^3$, is rotated to the b-frame by $R^{ba} \in SO(3)$ according to

$$x^b = R^{ba} x^a, \qquad (2.3)$$

and using (2.2a)

$$\|x^b\|^2 = \|R^{ba} x^a\|^2 \qquad (2.4a)$$
$$\iff (x^b)^T x^b = (x^a)^T (R^{ba})^T R^{ba} x^a \qquad (2.4b)$$
$$\iff \|x^b\|^2 = \|x^a\|^2, \qquad (2.4c)$$

hence, the length is preserved. Note that using (2.2b) we have

$$\det (R^{ba})^T = \det R^{ba} \implies \left(\det R^{ba}\right)^2 = 1, \qquad (2.5)$$

and the positive root of (2.5) assures $R^{ba} \in SO(3)$.
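The length-preservation property (2.4) is easy to check numerically. A small sketch, building a rotation with Rodrigues' formula (the axis, angle and test vector are arbitrary illustrative choices):

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rotation about a unit axis by `angle` via Rodrigues' formula."""
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    K = np.array([[0.0, -n[2], n[1]],
                  [n[2], 0.0, -n[0]],
                  [-n[1], n[0], 0.0]])        # cross-product matrix of n
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

R = rotation_matrix([1.0, 2.0, 2.0], 0.7)
x_a = np.array([1.0, -2.0, 0.5])
x_b = R @ x_a

# Properties (2.2a), (2.2b) and length preservation (2.4c):
print(np.allclose(R.T @ R, np.eye(3)))                         # True
print(np.isclose(np.linalg.det(R), 1.0))                       # True
print(np.isclose(np.linalg.norm(x_b), np.linalg.norm(x_a)))    # True
```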
2.1.4 Quaternions

Quaternions were invented by Sir William Rowan Hamilton (Hamilton, 1844) as a tool to extend the imaginary numbers. The unit quaternion is widely used in the aerospace industry as well as in computer graphics since it allows a computationally efficient and singularity free rotation representation.

2.3 Definition (Quaternion). The quaternion is a four-tuple of real numbers $q \in \mathbb{R}^4$. An alternative definition, as in Kuipers (2002, page 104), is to define a scalar part, $q_0$, and a real valued vector, $\mathbf{q} = e_1 q_1 + e_2 q_2 + e_3 q_3$, where $e_1, e_2, e_3$ is the standard orthonormal basis in $\mathbb{R}^3$. With Kuipers' definition we have

$$q = q_0 + \mathbf{q}, \qquad (2.6)$$

which is a sum of a vector and a scalar, an operation not defined in ordinary linear algebra. The full vector representation is then

$$q = \begin{bmatrix} q_0 \\ q_1 \\ q_2 \\ q_3 \end{bmatrix} = \begin{bmatrix} q_0 \\ \mathbf{q} \end{bmatrix}. \qquad (2.7)$$
Rotation using Quaternions

A quaternion can be used to represent a rotation in $\mathbb{R}^3$, but in order to do so it must be constrained to the unit sphere, i.e., $q^T q = 1$, hence the name unit quaternion. Let

$$q^{ba} = \begin{bmatrix} \cos\delta \\ \sin\delta\, n \end{bmatrix}, \qquad (2.8)$$

which is a unit quaternion describing a rotation of an angle $2\delta$ around the unit vector $n \in \mathbb{R}^3$. Then a rotation using this unit quaternion is

$$\tilde{x}^b = q^{ba} \odot \tilde{x}^a \odot q^{ab}, \qquad (2.9)$$

where $\tilde{x}^\star = [0,\ x^\star]$ are the vectors' quaternion equivalents and $\odot$ denotes the quaternion multiplication, see e.g., Törnqvist (2008). The unit quaternion can also be used to construct a rotation matrix, see for instance Shuster (1993, page 462), where the superscript $ab$ is omitted for the sake of readability,

$$R(q^{ab}) = \begin{bmatrix} q_0^2 + q_1^2 - q_2^2 - q_3^2 & 2(q_1 q_2 + q_0 q_3) & 2(q_1 q_3 - q_0 q_2) \\ 2(q_1 q_2 - q_0 q_3) & q_0^2 - q_1^2 + q_2^2 - q_3^2 & 2(q_2 q_3 + q_0 q_1) \\ 2(q_1 q_3 + q_0 q_2) & 2(q_2 q_3 - q_0 q_1) & q_0^2 - q_1^2 - q_2^2 + q_3^2 \end{bmatrix} \qquad (2.10)$$

$$= \begin{bmatrix} 2q_0^2 + 2q_1^2 - 1 & 2(q_1 q_2 + q_0 q_3) & 2(q_1 q_3 - q_0 q_2) \\ 2(q_1 q_2 - q_0 q_3) & 2q_0^2 + 2q_2^2 - 1 & 2(q_2 q_3 + q_0 q_1) \\ 2(q_1 q_3 + q_0 q_2) & 2(q_2 q_3 - q_0 q_1) & 2q_0^2 + 2q_3^2 - 1 \end{bmatrix}, \qquad (2.11)$$

where the last equality is established using $q_0^2 + q_1^2 + q_2^2 + q_3^2 = 1$. Throughout the thesis $R(q^{ab})$ and $R^{ba}$ will be used interchangeably.
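A direct implementation of (2.10) is straightforward; the sketch below (with an arbitrary example quaternion) also checks that the resulting matrix satisfies the SO(3) conditions (2.2):

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix R(q) from a unit quaternion q = [q0, q1, q2, q3], cf. (2.10)."""
    q0, q1, q2, q3 = q
    return np.array([
        [q0**2 + q1**2 - q2**2 - q3**2, 2*(q1*q2 + q0*q3),             2*(q1*q3 - q0*q2)],
        [2*(q1*q2 - q0*q3),             q0**2 - q1**2 + q2**2 - q3**2, 2*(q2*q3 + q0*q1)],
        [2*(q1*q3 + q0*q2),             2*(q2*q3 - q0*q1),             q0**2 - q1**2 - q2**2 + q3**2],
    ])

q = np.array([0.8, 0.2, -0.4, 0.4])
q = q / np.linalg.norm(q)                  # enforce the unit constraint q^T q = 1
R = quat_to_rot(q)
print(np.allclose(R.T @ R, np.eye(3)))     # True: (2.2a) holds
print(np.isclose(np.linalg.det(R), 1.0))   # True: (2.2b) holds
```

If `q` is not normalised first, the resulting matrix is no longer orthogonal, which is one reason implementations renormalise estimated quaternions.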
2.2 Dynamics

As rigid bodies are studied, a fixed coordinate frame can be assigned to the local body frame to reveal the time-evolution of the body with respect to another coordinate frame.

2.2.1 Translational Dynamics

A body with mass, in the b frame, will be influenced by the Earth's gravitational field. Assuming that the gravity component, $g^e = [0\ 0\ {-9.81}]^T\,\mathrm{m/s^2}$, is constant, we have

$$\dot{v}_t^e = a_t^e - g^e = R^{eb} a_t^b - g^e, \qquad (2.12)$$

meaning that the e-referenced acceleration depends on the orientation of the body. If there are errors in the rotation, the gravity subtraction will cause $\dot{v}_t^e$ to deviate from the truth, and this will result in velocity and position errors if (2.12) is used to resolve these quantities. This is commonly known as gravity leakage and it contributes to the inherent drift in inertial navigation systems.
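Gravity leakage can be illustrated with a small simulation: a stationary body whose attitude estimate has a small error integrates a spurious acceleration into velocity. The error magnitude, sample rate and duration below are arbitrary illustrative choices:

```python
import numpy as np

# A stationary body (true acceleration zero) with a small roll error in the
# estimated attitude: the gravity subtraction in (2.12) then leaks into velocity.
g_e = np.array([0.0, 0.0, -9.81])
err = np.deg2rad(0.5)                       # assumed 0.5 degree attitude error
R_err = np.array([[1.0, 0.0, 0.0],          # erroneous estimate of R^{eb}
                  [0.0, np.cos(err), -np.sin(err)],
                  [0.0, np.sin(err),  np.cos(err)]])

dt = 0.01
v = np.zeros(3)
f_b = -g_e                                  # specific force measured at rest
for _ in range(6000):                       # integrate for 60 seconds
    a_e = R_err @ f_b + g_e                 # wrong attitude => residual acceleration
    v = v + dt * a_e
print(np.linalg.norm(v))                    # roughly 5 m/s of velocity drift
```

A half-degree attitude error thus produces a velocity error of several metres per second within a minute, which is why standalone inertial navigation drifts without aiding sensors.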
2.2.2 Rotational Dynamics

The differential form of the unit quaternion is bilinear in the angular velocity and is given by

$$\dot{q}_t^{be} = \frac{1}{2}\underbrace{\begin{bmatrix} 0 & -\omega_x^b & -\omega_y^b & -\omega_z^b \\ \omega_x^b & 0 & \omega_z^b & -\omega_y^b \\ \omega_y^b & -\omega_z^b & 0 & \omega_x^b \\ \omega_z^b & \omega_y^b & -\omega_x^b & 0 \end{bmatrix}}_{S(\omega_t^b)} \begin{bmatrix} q_{0,t}^{be} \\ q_{1,t}^{be} \\ q_{2,t}^{be} \\ q_{3,t}^{be} \end{bmatrix} = \frac{1}{2}\underbrace{\begin{bmatrix} -q_{1,t}^{be} & -q_{2,t}^{be} & -q_{3,t}^{be} \\ q_{0,t}^{be} & -q_{3,t}^{be} & q_{2,t}^{be} \\ q_{3,t}^{be} & q_{0,t}^{be} & -q_{1,t}^{be} \\ -q_{2,t}^{be} & q_{1,t}^{be} & q_{0,t}^{be} \end{bmatrix}}_{\tilde{S}(q_t^{be})} \begin{bmatrix} \omega_x^b \\ \omega_y^b \\ \omega_z^b \end{bmatrix}, \qquad (2.13)$$

here, $\omega_t^b$ is the instantaneous angular velocity of the moving body frame b relative to the fixed e frame, expressed in the b frame. For a complete derivation see Törnqvist (2008).
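The two factorisations of (2.13) can be verified numerically; the sketch below checks that $S(\omega)q = \tilde{S}(q)\omega$ for arbitrary illustrative inputs:

```python
import numpy as np

def S_omega(w):
    """4x4 matrix S(omega) in (2.13), so that q_dot = 0.5 * S(w) @ q."""
    wx, wy, wz = w
    return np.array([[0.0, -wx, -wy, -wz],
                     [wx,  0.0,  wz, -wy],
                     [wy, -wz,  0.0,  wx],
                     [wz,  wy, -wx,  0.0]])

def S_tilde(q):
    """4x3 matrix S~(q) in (2.13), so that q_dot = 0.5 * S~(q) @ w."""
    q0, q1, q2, q3 = q
    return np.array([[-q1, -q2, -q3],
                     [ q0, -q3,  q2],
                     [ q3,  q0, -q1],
                     [-q2,  q1,  q0]])

q = np.array([0.7, 0.1, -0.5, 0.5])
q = q / np.linalg.norm(q)
w = np.array([0.3, -0.2, 0.1])
print(np.allclose(S_omega(w) @ q, S_tilde(q) @ w))   # True: both forms of (2.13) agree
```

The bilinearity is what makes both forms useful: $S(\omega)q$ is convenient for propagating the quaternion, while $\tilde{S}(q)\omega$ exposes how gyroscope noise enters linearly, as used in (2.18) below.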
2.2.3 Dynamic Models

Assume that the acceleration, $a_t^b$, the gravity, $g^e$, and the angular velocity, $\omega_t^b$, are known. This leaves us a state vector resolved in the e frame

$$x_t^e = \begin{bmatrix} p_t^e \\ v_t^e \\ q_t^{be} \end{bmatrix}, \qquad (2.14)$$

where $p_t^e$, $v_t^e$ and $q_t^{be}$ represent the position, velocity and the rotation (attitude) of the b frame with respect to the e frame, respectively. Obviously

$$\ddot{p}_t^e = \dot{v}_t^e = a_t^e, \qquad (2.15)$$

and by using (2.13) and (2.12) the dynamic model becomes

$$\dot{x}_t^e = \begin{bmatrix} \dot{p}_t^e \\ \dot{v}_t^e \\ \dot{q}_t^{be} \end{bmatrix} = \begin{bmatrix} 0 & I & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \frac{1}{2}S(\omega_t^b) \end{bmatrix} \begin{bmatrix} p_t^e \\ v_t^e \\ q_t^{be} \end{bmatrix} + \begin{bmatrix} 0 \\ R(q_t^{be})^T a_t^b - g^e \\ 0 \end{bmatrix}. \qquad (2.16)$$

2.3 Inertial Measurements

Inertial sensors measure specific force and angular velocity, and these quantities should be modelled as states which are measured. However, to reduce the dimension of the state space needed, the inertial measurements are considered to be inputs.
2.3.1
Gyroscopes
The output of the gyroscopes is

ω_t^b = u_ω,t^b + w_ω,t,   (2.17)
where ubω,t is used to stress that the angular velocity is considered to be an input.
Furthermore, wω,t is noise. Using (2.13) the quaternion dynamics becomes
q̇_t^be = ½ S(u_ω,t^b + w_ω,t) q_t^be = ½ S(u_ω,t^b) q_t^be + ½ S̄(q_t^be) w_ω,t.   (2.18)
Due to unmodeled effects in the sensors, a bias, δ_ω,t, could be added to (2.17), giving

ω_t^b = u_ω,t^b + δ_ω,t + w_ω,t,   (2.19)

and δ_ω,t could also be included in the state vector.
2.3.2 Accelerometers
Contrary to what the name implies, an accelerometer does not only measure acceleration when used on earth, since it is subject to the gravitational field. Hence, accelerometers measure the specific force, which is the sum of the free acceleration and the gravity field. The accelerometer signal is also treated as a known input and the output becomes

u_a,t^b = a_t^b − g^b + w_a,t^b = R^be (a_t^e − g^e + w_a,t^e),   (2.20)

where w_a,t^b is measurement noise. It should be clear that the measured specific force depends on the attitude R^be. If the noise is assumed to be zero mean Gaussian with the same variance in all components, (2.20) has the same statistical properties as

u_a,t^b = R^be (a_t^e − g^e) + w_a,t^b,   (2.21)

i.e., the noise does not depend on the coordinate frame.
2.3.3
Discrete-time Dynamic Models
The sensors deliver sampled data and the dynamic model (2.16) therefore needs
to be expressed in discrete time. This can be done in many ways. Assume that
the angular velocity is constant over the sampling interval T and free from noise,
then (2.18) integrates to
q_{t+1}^be = e^{(T/2) S(u_ω,t^b)} q_t^be,   (2.22)

approximating the exponential results in

q_{t+1}^be ≈ (I + (T/2) S(u_ω,t^b)) q_t^be,   (2.23)
and assuming that the noise from (2.16) accounts for the exponential approximation as well as the true sensor noise, we get

q_{t+1}^be = q_t^be + (T/2) S(u_ω,t^b) q_t^be + (T/2) S̄(q_t^be) w_ω,t.   (2.24)
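As a numerical illustration of (2.22)-(2.24): since S(ω)² = −‖ω‖² I₄ for the matrix in (2.13), the matrix exponential in (2.22) has a closed form. The sketch below (with made-up angular rates; not code from the thesis) compares it against the first-order approximation (2.23):

```python
import numpy as np

def S(w):
    """4x4 quaternion rate matrix from (2.13), so that q-dot = 0.5*S(w)*q."""
    wx, wy, wz = w
    return np.array([[0., -wx, -wy, -wz],
                     [wx, 0.,  wz, -wy],
                     [wy, -wz, 0.,  wx],
                     [wz, wy, -wx, 0.]])

def propagate_exact(q, w, T):
    # For constant w, S(w)^2 = -|w|^2 I, so the exponential in (2.22) has the
    # closed form cos(.)*I + sin(.)/|w| * S(w); this preserves the unit norm.
    n = np.linalg.norm(w)
    if n < 1e-12:
        return q
    a = 0.5 * T * n
    return (np.cos(a) * np.eye(4) + np.sin(a) / n * S(w)) @ q

def propagate_first_order(q, w, T):
    # First-order approximation (2.23); renormalise to stay on the unit sphere.
    q = (np.eye(4) + 0.5 * T * S(w)) @ q
    return q / np.linalg.norm(q)
```

For small ‖ω‖T the two updates agree to third order, which is why the approximation (2.23) is adequate at imu sampling rates.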
The discrete-time dynamic models for the position and velocity can be computed in a similar fashion. Assuming that ȧ_t^e = 0, from (2.15) we then have

[ ṗ_t^e ]   [ 0  I  0 ] [ p_t^e ]
[ v̇_t^e ] = [ 0  0  I ] [ v_t^e ].   (2.25)
[ ȧ_t^e ]   [ 0  0  0 ] [ a_t^e ]
Assuming that the noise input is constant during the sampling interval, the discrete-time translational dynamics is

[ p_{t+1}^e ]   [ I  T I  (T²/2) I ] [ p_t^e ]
[ v_{t+1}^e ] = [ 0   I    T I     ] [ v_t^e ].   (2.26)
[ a_{t+1}^e ]   [ 0   0    I       ] [ a_t^e ]
Figure 2.2: Pinhole projection with the image plane placed in front of the camera centre.
Using (2.21), (2.24) and (2.26) we get the full discrete-time dynamic model

[ p_t^e  ]   [ I3  T I3  0  ] [ p_{t−1}^e  ]   [ (T²/2) I3    0       ]
[ v_t^e  ] = [ 0    I3   0  ] [ v_{t−1}^e  ] + [  T I3        0       ] [ R(q_{t−1}^be)^T u_a,t^b + g^e ]
[ q_t^be ]   [ 0    0    I4 ] [ q_{t−1}^be ]   [  0        (T/2) I4   ] [ S(u_ω,t^b) q_{t−1}^be        ]

               [ (T²/2) I3    0                 ]
             + [  T I3        0                 ] [ w_a,t^b ]   (2.27)
               [  0        (T/2) S̄(q_{t−1}^be)  ] [ w_ω,t   ]
which is also to be found in Section 3.1 in Paper A as (5) but is included here
for completeness. It is also worth noting that this model integrates the errors in
the sensor measurements directly, whereas if the sensor readings are modeled as
measurements we would get low-pass filtering, as pointed out in Callmer (2011).
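A single noise-free step of (2.27) can be sketched as follows; the quaternion-to-rotation formula and the gravity vector g^e = (0, 0, −9.81) are assumptions of this sketch (one common convention), not values fixed by the thesis.

```python
import numpy as np

def S(w):
    """Quaternion rate matrix from (2.13)."""
    wx, wy, wz = w
    return np.array([[0., -wx, -wy, -wz],
                     [wx, 0.,  wz, -wy],
                     [wy, -wz, 0.,  wx],
                     [wz, wy, -wx, 0.]])

def R(q):
    """Rotation matrix R(q^be); here R(q)^T resolves b-frame vectors in e."""
    q0, q1, q2, q3 = q
    return np.array([
        [q0*q0+q1*q1-q2*q2-q3*q3, 2*(q1*q2+q0*q3),         2*(q1*q3-q0*q2)],
        [2*(q1*q2-q0*q3),         q0*q0-q1*q1+q2*q2-q3*q3, 2*(q2*q3+q0*q1)],
        [2*(q1*q3+q0*q2),         2*(q2*q3-q0*q1),         q0*q0-q1*q1-q2*q2+q3*q3]])

def imu_step(p, v, q, u_a, u_w, T, g_e=np.array([0., 0., -9.81])):
    """One noise-free step of (2.27): u_a is specific force, u_w angular rate."""
    a_e = R(q).T @ u_a + g_e               # acceleration resolved in e
    p_new = p + T * v + 0.5 * T**2 * a_e
    v_new = v + T * a_e
    q_new = (np.eye(4) + 0.5 * T * S(u_w)) @ q
    return p_new, v_new, q_new / np.linalg.norm(q_new)
```

A stationary, level sensor measuring the specific force (0, 0, 9.81) then keeps position and velocity at zero, i.e., the gravity term cancels exactly; an attitude error would break this cancellation, which is the gravity leakage discussed in Section 2.2.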
2.4 The Pinhole Camera Model
The pinhole camera is a mathematical model describing how points in R3 relate to points in R2 on the image plane through a central projection. The model only contains five intrinsic parameters and for most systems it is an approximation since it does not model illumination, contrast, lenses and lens distortion, among other effects. Additionally, the position and orientation (pose) of the camera centre with respect to some reference frame is also needed if the camera is only a part of a system in which we are interested. This adds another 6 degrees of freedom (dof) to the pinhole camera model.
2.4.1 Camera Coordinates
Figure 2.2 illustrates a frontal pinhole projection. In the pinhole camera model
the coordinate frames are:
• c - Camera frame with the optical centre as the origin where its z-axis coincides with the optical axis, also known as the principal axis.
• w - World frame.
• i - Image frame. The image plane is orthogonal to the optical axis and has
the principal point h as its origin. The distance between the c and the i
frame is called the focal length f .
• n - Normalised image frame.
• p - Pixel frame with origin in the upper left corner as viewed from the optical centre c.
A point mc = [x c , y c , z c ]T in the c-frame relates to a point mn = [x n , y n ]T in the
n-frame as

[ x^n ; y^n ] = (f / z^c) [ x^c ; y^c ].   (2.28)
It is convenient to express (2.28) as a linear system, which can be done by appending unit elements to the vectors, giving the homogeneous coordinate representation

[ x^n ]        [ x^n ]   [ 1 0 0 0 ] [ x^c ]
[ y^n ] ∝ z^c  [ y^n ] = [ 0 1 0 0 ] [ y^c ],   (2.29)
[ 1   ]        [ 1   ]   [ 0 0 1 0 ] [ z^c ]
                                     [ 1   ]

where the 3 × 4 projection matrix is denoted Π_0.
With focal length f = 1 the normalised image frame n and the image frame i coincide. The projection (2.29) can be expressed as

λ m^n = Π_0 m^c,   (2.30)
where λ ∈ R+ is an arbitrary scale factor.
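To make the central projection concrete, here is a minimal sketch of (2.28)-(2.30) with a made-up point (an illustration, not code from the thesis):

```python
import numpy as np

def project_normalised(m_c):
    """Central projection (2.29)-(2.30) with focal length f = 1."""
    Pi0 = np.hstack([np.eye(3), np.zeros((3, 1))])  # Pi_0 from (2.29)
    m_h = np.append(m_c, 1.0)                       # homogeneous c-frame point
    x = Pi0 @ m_h                                   # lambda * [x^n, y^n, 1]^T
    return x[:2] / x[2]                             # divide out the scale
```

For instance, the c-frame point (2, 4, 2) projects to (1, 2) in the normalised image frame.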
Intrinsic Parameters
The digitalisation of the image plane i is the pixel plane p,

[ x^p ]   [ fx  sα ] [ x^i ]   [ hx ]
[ y^p ] = [ 0   fy ] [ y^i ] + [ hy ],   (2.31)

or in homogeneous coordinates

[ x^p ; y^p ; 1 ] = K [ x^i ; y^i ; 1 ],   K = [ fx  sα  hx ; 0  fy  hy ; 0  0  1 ],   (2.32)
where K is referred to as the intrinsic parameter matrix or simply the calibration
matrix. The parameters of the calibration matrix are the focal lengths in pixel
units fx = sx/f and fy = sy/f, where f is the focal length expressed in meters per pixel and sx, sy are the pixel sizes in metric units. The centre of the image plane, h = [hx hy]^T, is the principal point coordinate and sα = fx tan α is a skew parameter. In most cameras it is reasonable to assume sα ≈ 0 (Hartley and Zisserman, 2004).

Figure 2.3: Camera to world transformation.
Distortion
Most cameras suffer from non-negligible distortion due to the optical lenses involved. This can be compensated for by tangential and radial distortion models (Ma et al., 2004; Zhang, 2000), which are provided in camera calibration software, see e.g., Bouguet (2010).
Extrinsic Parameters
Points in another frame, say w, can be expressed in the camera frame c. In the pinhole camera model, such a transformation is called extrinsic since it does not depend on the intrinsic camera calibration matrix K. Figure 2.3 describes the relation between a point m^w in world coordinates w expressed in camera coordinates c,

m^c = R^cw (m^w − c^w) = R^cw m^w − R^cw c^w,   (2.33)

which can be written in homogeneous coordinates as

[ x^c ]   [ R^cw   −R^cw c^w ] [ x^w ]
[ y^c ] = [ 0^T        1     ] [ y^w ],   (2.34)
[ z^c ]                       [ z^w ]
[ 1   ]                       [ 1   ]

where the 4 × 4 block matrix is denoted T^cw.
2.4.2 World to Pixel Coordinates
Combining the extrinsic and intrinsic parameters, coordinates in the world frame can be expressed in pixel coordinates as

[ x^p ]        [ x^p ]   [ fx  sα  hx ] [ 1 0 0 0 ] [ R^cw   −R^cw c^w ] [ x^w ]
[ y^p ] ∝ z^c  [ y^p ] = [ 0   fy  hy ] [ 0 1 0 0 ] [ 0^T        1     ] [ y^w ],   (2.35)
[ 1   ]        [ 1   ]   [ 0   0   1  ] [ 0 0 1 0 ]                     [ z^w ]
                                                                        [ 1   ]

where the product of the three matrices, P, is called the camera matrix or projection matrix. In compact notation (2.35) is

x^p = K [ R^cw   −R^cw c^w ] x^w.   (2.36)
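As an illustration of (2.35)-(2.36), the sketch below assembles a camera matrix and projects a world point; the calibration values and the camera pose are made-up numbers, not parameters from the thesis.

```python
import numpy as np

def camera_matrix(K, R_cw, c_w):
    """Assemble P = K [R^cw  -R^cw c^w] from (2.36)."""
    return K @ np.hstack([R_cw, -(R_cw @ c_w).reshape(3, 1)])

def world_to_pixel(P, m_w):
    x = P @ np.append(m_w, 1.0)     # homogeneous pixel coordinates
    return x[:2] / x[2]             # divide out the scale lambda

K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
R_cw = np.eye(3)                    # camera aligned with the world frame
c_w = np.array([0., 0., -2.])       # camera placed 2 m behind the world origin
P = camera_matrix(K, R_cw, c_w)
```

With these values the world origin lands exactly on the principal point (320, 240), since it lies on the optical axis.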
3 Estimation
Estimation is the problem of taking measured data, y_t, to reveal statistical properties, x_t and θ, of systems which are of interest. Basic ingredients are model structures representing the assumptions made on the interaction and time evolution of x_t, θ and y_t. As tools for solving estimation problems, state space models are commonly used model structures, and a fairly general description of such is

ẋ_t = f(x_t, u_t, θ, w_t),   (3.1a)
y_t = h(x_t, θ, e_t),   (3.1b)
where f(·) and h(·) are general nonlinear functions, x_t are states, u_t are known inputs, θ are unknown or partially known parameters, and w_t and e_t are noise representing everything that cannot be predicted yet still affects the system. However, in this work, an important special case is considered where the noise is assumed additive and Gaussian. Then the discrete time version of (3.1) is

x_t = f(x_{t−1}, u_t, θ) + w_t,   (3.2a)
y_t = h(x_t, θ) + e_t.   (3.2b)
The functions f(·) and h(·) are assumed well behaved, meaning they are at least once continuously differentiable and smooth in the domain.
3.1 Sensor Fusion
Sensor fusion is the process of combining multi-sensory data in a clever way to obtain a state estimate x̂_t or a state sequence estimate {x̂_t}_{t=0}^N. Model parameters, θ, are usually not of interest. The sensor fusion concept is illustrated in Figure 3.1, where the input sensors could as well be sensor fusion nodes.
Figure 3.1: Sensor data is processed in a sensor fusion node to obtain state estimates.
An example of sensor fusion: an imu in combination with a vector magnetometer in a single unit makes it possible to estimate the absolute attitude of the unit with respect to the local magnetic field and the local gravity field, assuming the fields are constant and the sensor is still.
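This imu-magnetometer example can be sketched with the classical TRIAD construction, which recovers the attitude from two non-parallel vector observations. The local field vectors below are made-up assumptions, and this is an illustration rather than a method used in the thesis.

```python
import numpy as np

def triad(v1_b, v2_b, v1_r, v2_r):
    """TRIAD: rotation R (body -> reference) from two vector observations,
    here gravity (v1) and the magnetic field (v2)."""
    def frame(a, b):
        # Orthonormal triad built from two non-parallel vectors.
        t1 = a / np.linalg.norm(a)
        t2 = np.cross(a, b)
        t2 = t2 / np.linalg.norm(t2)
        return np.column_stack([t1, t2, np.cross(t1, t2)])
    return frame(v1_r, v2_r) @ frame(v1_b, v2_b).T

# Made-up local fields: gravity down, magnetic field north with a dip angle.
g_r = np.array([0., 0., -9.81])
m_r = np.array([0.2, 0., -0.5])
```

When the body frame is aligned with the reference frame the measured vectors equal the reference vectors and the method returns the identity rotation; rotated measurements return the corresponding rotation.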
In Chapter 4 sensor fusion of imu and camera data is considered. Cameras and IMUs are somewhat complementary sensors, meaning that together they provide information which is not readily available from either sensor alone. As imu measurements are noisy they only allow short-term attitude estimation; in fact, the absolute orientation cannot be obtained from an imu alone. On the other hand, cameras allow long-term attitude estimation (Montiel and Davison, 2006) in case of moderate movement of the camera (not causing too much motion blur), assuming robust detection and matching of interest points.
3.2
Calibration
We refer to calibration as the problem of estimating the parameter vector θ associated with the model structures (3.2a) and (3.2b). Some calibration problems
also require solving sensor fusion and parameter estimation jointly. In System
Identification literature calibration is usually called Grey-box Identification, see
e.g., Ljung (1999).
In Chapter 5 calibration of an osthmd is considered, where the states, x_t, are given by an electro-optical tracking device and the measurements, y_t, are gathered by humans in what is called a boresighting exercise. The calibration is solved using the Direct Linear Transformation (DLT) algorithm.
3.3 Nonlinear Estimation
Methods of handling nonlinearities in estimation problems can broadly be divided into two classes. The first is nonlinear methods not requiring explicit linearisation, examples being the Unscented Kalman Filter (ukf) (Julier and Uhlmann, 1997) and particle filters (pf) (Gordon et al., 1993; Schön, 2006); the second is to linearise the nonlinearities (at the current estimate) and use linear methods, e.g., Extended Kalman Filters (ekf).
3.3.1 Extended Kalman Filter
The celebrated Kalman Filter (kf) (Kalman, 1960) provides unbiased estimates with minimum variance if applied to linear systems which are subject to Gaussian noise. For nonlinear systems, a natural extension is to linearise the nonlinear functions by a Taylor expansion at the current estimate and then propagate the mean and the covariance approximations. This can be done in an Extended Kalman Filter (ekf), which is described in many books, see e.g., Kailath et al. (2000). The ekf works in a two-step procedure summarised in Algorithm 1 below, where the parameters, θ, are omitted and the sampling time is T = 1.
Algorithm 1 Extended Kalman Filter
Require an initial state, x̂_{0|0}, and an initial state covariance, P_{0|0}, and use the models (3.2).

1. Time Update

x̂_{t|t−1} = f(x̂_{t−1|t−1}, u_t),   (3.3a)
P_{t|t−1} = F_t P_{t−1|t−1} F_t^T + Q_t.   (3.3b)

2. Measurement Update

K_t = P_{t|t−1} H_t^T (H_t P_{t|t−1} H_t^T + R_t)^{−1},   (3.4a)
x̂_{t|t} = x̂_{t|t−1} + K_t (y_t − h(x̂_{t|t−1})),   (3.4b)
P_{t|t} = P_{t|t−1} − K_t H_t P_{t|t−1}.   (3.4c)

In Algorithm 1 we have defined

F_t ≜ ∂f(x_t, u_t)/∂x_t |_{(x_t,u_t)=(x̂_{t−1|t−1},u_t)},   H_t ≜ ∂h(x_t)/∂x_t |_{x_t=x̂_{t|t−1}},   (3.5)
furthermore, Qt and Rt are the covariance matrices of wt and et , respectively.
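Algorithm 1 translates almost line by line into code. The following sketch is generic; the model functions, Jacobian callables and the 1-D random-walk example at the end are made-up placeholders, not the thesis's models.

```python
import numpy as np

def ekf_step(x, P, u, y, f, h, F, H, Q, R):
    """One EKF iteration (Algorithm 1); F(x, u) and H(x) return Jacobians."""
    # Time update (3.3)
    Ft = F(x, u)
    x = f(x, u)
    P = Ft @ P @ Ft.T + Q
    # Measurement update (3.4)
    Ht = H(x)
    K = P @ Ht.T @ np.linalg.inv(Ht @ P @ Ht.T + R)
    x = x + K @ (y - h(x))
    P = P - K @ Ht @ P
    return x, P

# 1-D random walk with a direct state measurement (made-up toy model):
f = lambda x, u: x
h = lambda x: x
F = lambda x, u: np.eye(1)
H = lambda x: np.eye(1)
x, P = ekf_step(np.zeros(1), np.eye(1), None, np.array([1.0]),
                f, h, F, H, Q=0.1 * np.eye(1), R=np.eye(1))
```

In the toy example the predicted covariance is 1.1, so the Kalman gain is 1.1/2.1 and the updated state moves about halfway towards the measurement.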
3.3.2 Nonlinear Least-Squares
The objective in nonlinear least-squares is to estimate parameters, θ, by minimising the residuals, εt (θ) = yt − ht (θ), see for instance Nocedal and Wright (2006).
The residuals are minimised in a least-squares sense
V(θ) = ½ Σ_{t=1}^N ‖ε_t(θ)‖₂² = ½ Σ_{t=1}^N ε_t(θ)^T ε_t(θ).   (3.6)
Define ε(θ) = [ε_1(θ) ε_2(θ) ... ε_N(θ)]^T and the Jacobian J(θ) = dε(θ)^T/dθ, then the gradient and the Hessian of (3.6) with respect to the parameters are given by

dV(θ)/dθ = Σ_{t=1}^N (dε_t(θ)^T/dθ) ε_t(θ) = J(θ) ε(θ),   (3.7)

and

d²V(θ)/dθ² = J(θ) J(θ)^T + Σ_{t=1}^N (d²ε_t(θ)^T/dθ²) ε_t(θ),   (3.8)
respectively. The extension to multivariable residuals is easily obtained by stacking the vectorisation of the individual residuals which again gives a scalar cost
function, see Gustafsson (2010, page 49).
The Gauss-Newton method is a popular second-order method used for solving nonlinear least-squares problems. It is computationally simple since the Hessian of the objective function is approximated as

d²V(θ)/dθ² ≈ J(θ) J(θ)^T,   (3.9)

significantly reducing computations. The Gauss-Newton method as an algorithm is summarised in Algorithm 2.
Algorithm 2 Gauss-Newton
1. Require an initial estimate θ̂.
2. Compute the Jacobian J(θ̂).
3. Compute a search direction p by solving

J(θ̂) J(θ̂)^T p = −J(θ̂) ε(θ̂).   (3.10)

4. Compute a step length, α, such that the cost (3.6) is decreased. This can be done using line search, see e.g., Nocedal and Wright (2006, page 297) or Boyd and Vandenberghe (2004, page 464).
5. Update the parameters

θ̂ := θ̂ + αp.   (3.11)

6. Terminate if a stopping criterion is met. Such criteria can be that the change in cost is small, or that the number of iterations has exceeded some threshold, among others.
7. Otherwise, return to step 2.
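A minimal sketch of Algorithm 2 with a fixed step length α (i.e., without the line search in step 4). Note that the Jacobian is stored here as an (N × d) array, so the normal equations (3.10) take the transposed form J^T J p = −J^T ε; the exponential model being fitted is a made-up example.

```python
import numpy as np

def gauss_newton(theta, residual, jacobian, iters=20, alpha=1.0):
    """Gauss-Newton iterations for V(theta) = 0.5 * ||eps(theta)||^2."""
    for _ in range(iters):
        eps = residual(theta)
        J = jacobian(theta)                       # d eps / d theta, shape (N, d)
        p = np.linalg.solve(J.T @ J, -J.T @ eps)  # search direction, cf. (3.10)
        theta = theta + alpha * p                 # fixed step instead of line search
    return theta

# Fit y = exp(theta * x) to noise-free data generated with theta = 0.5.
x = np.array([0., 1., 2.])
y = np.exp(0.5 * x)
res = lambda th: np.exp(th[0] * x) - y
jac = lambda th: (x * np.exp(th[0] * x)).reshape(-1, 1)
theta = gauss_newton(np.array([0.3]), res, jac)
```

On this zero-residual problem the iterates converge rapidly to θ = 0.5, illustrating the near-quadratic local convergence of Gauss-Newton when the residuals vanish at the solution.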
4 Visual Inertial Navigation and Mapping
Simultaneous Localisation and Mapping (slam) emerged in the field of mobile robotics as a tool to enhance navigation and to infer properties of the surrounding environment by means of the onboard sensors. A lot of attention was spent on indoor platforms equipped with wheel encoders for odometry and line-sweeping laser range scanners measuring the environment. Today, all sorts of platforms and sensors are used in slam applications, e.g., submarines (Eustice et al., 2006), monocular cameras (Davison et al., 2007; Davison, 2003; Eade, 2008; Klein and Murray, 2007), aerial platforms (Karlsson et al., 2008; Caballero et al., 2009; Bryson and Sukkarieh, 2009; Lupton and Sukkarieh, 2009, 2008; Skoglund, 2008) and even domestic robots, Neato Robotics (2011). In this thesis, the sensors are an imu and a camera contained in a single unit.
4.1 Visual and Inertial EKF-SLAM
EKF-SLAM is probably the most common slam method and it is often straightforward to implement as described in e.g., Durrant-Whyte and Bailey (2006); Smith
et al. (1990). For a thorough treatment, the book by Thrun et al. (2005) serves
as a standard reference. In feature based slam, coordinates in the global frame
are explicitly represented as landmarks, l, which are part of the state vector. The
standard assumption is that the landmarks are stationary, but alternatives exist; in Bibby and Reid (2007) a model structure allowing dynamic objects is considered.
Some feature characteristics of the landmarks have to be detected and associated
to the measurements. This is known as the data association problem in slam, and
it depends on the sensors and the environment, among other things. For a camera sensor, the data association is the 2D/3D correspondence between detected features
in images and landmarks in the landmark map. We solve this using the Scale Invariant Feature Transform (SIFT), Lowe (1999), see Section 4.1 in Paper A.
In our application the camera and imu deliver data at different rates. This should be distinguished in the equations as well as in the implementation. Denote the time when camera measurements arrive by t_k, then the dynamic, the landmark and the measurement models are given by

x_t = f(x_{t−1}, u_t) + B_w w_t,   (4.1a)
l_t = l_{t−1},   (4.1b)
y_{t_k} = h(x_{t_k}, l_{t_k}) + e_{t_k},   (4.1c)
which is also found as (1) in Paper A. The ekf given in Algorithm 1 applies to (4.1)
with just a few modifications.
4.1.1 Prediction
The prediction step in EKF-SLAM is given by

x̂_{t|t−1} = f(x̂_{t−1|t−1}, u_t),   (4.2a)
l̂_{t|t−1} = l̂_{t−1|t−1},   (4.2b)
P^{xx}_{t|t−1} = F_t P^{xx}_{t−1|t−1} F_t^T + B_w(x̂_{t−1|t−1}) Q B_w^T(x̂_{t−1|t−1}),   (4.2c)

where

F_t ≜ ∂f(x_{t−1}, u_t)/∂x_{t−1} |_{(x_{t−1},u_t)=(x̂_{t−1|t−1},u_t)},   (4.3)

and

Q = [ Q_a  0
      0    Q_ω ].   (4.4a)

Note that only the vehicle state covariance is updated. The full covariance matrix is

P_t = [ P^{xx}_t  P^{xl}_t
        P^{lx}_t  P^{ll}_t ].   (4.5)
Also, when new landmarks are initialised they are appended to the state vector.
4.1.2 Measurement Update
Assuming the data association is given, then the measurement update for ekf-slam is given by

K_t = P_{t|t−1} H_t^T (H_t P_{t|t−1} H_t^T + R_t)^{−1},   (4.6a)
[ x̂_{t|t} ; l̂_{t|t} ] = [ x̂_{t|t−1} ; l̂_{t|t−1} ] + K_t (y_t − h(x̂_{t|t−1}, l̂_{t|t−1})),   (4.6b)
P_{t|t} = P_{t|t−1} − K_t H_t P_{t|t−1},   (4.6c)
where

H_t ≜ [ ∂h(x_t, l_t)/∂x_t   ∂h(x_t, l_t)/∂l_t ] |_{(x_t,l_t)=(x̂_{t|t−1}, l̂_{t|t−1})}.   (4.7)
When the unit quaternion is used for rotation parametrisation in an ekf it has to be normalised,

q_{t|t} := q_{t|t} / ‖q_{t|t}‖,   (4.8)

preferably after each measurement update. An alternative is to introduce the quaternion's deviation from unity as a fictive measurement.
4.2 Nonlinear Least-Squares and SLAM
The basic concept of nonlinear least-squares for slam is to acquire an initial estimate, linearise the models around the estimate and minimise all the measurement errors and the state trajectory errors in a least-squares sense.
ekf-slam (Section 4.1) provides an initial trajectory, x⁰_{0:N}, and an initial landmark estimate, l⁰_N. The linearised dynamics around x⁰_t at time t is then

x⁰_t + δx_t = f(x⁰_{t−1}, u_t) + F_t δx_{t−1} + B_w w_t,   (4.9)

where the last term is denoted w̃_t, δx_t is a small deviation from x⁰_t and F_t is the derivative of f(x_{t−1}, u_t) with respect to x_{t−1} evaluated in x⁰_{t−1}. The linearised measurements around l⁰_N and x⁰_{t_k} for the measurements at time t_k are

y_{t_k} = h(x⁰_{t_k}, l⁰_N) + H_{t_k} δx_{t_k} + J_{t_k} δl_N + e_{t_k},   (4.10)

where δl_N is a small deviation from l⁰_N, H_{t_k} is the derivative of h(x_{t_k}, l_N) with respect to x_{t_k} evaluated at x⁰_{t_k} and l⁰_N, and J_{t_k} is the derivative of h(x_{t_k}, l_N) with respect to l_N evaluated at x⁰_{t_k} and l⁰_N.
By rearranging the terms in (4.9) and (4.10) we can formulate the problem of finding the trajectory and map deviations that minimise the noise variances as

min_{δx_t, δl_N}  Σ_{t=1}^N ‖w̃_t(δx_t)‖²_{Q̃_t^{−1}} + Σ_{k=1}^K ‖e_{t_k}(δx_{t_k}, δl_N)‖²_{R_{t_k}^{−1}},   (4.11)

where Q̃_t = B_w Q_t B_w^T and ‖x‖²_{P^{−1}} = x^T P^{−1} x. This is in principle a (weighted) nonlinear least-squares problem which gives an increment for the parameters, θ = [x_{0:N}^T  l_N^T]^T, and the Gauss-Newton algorithm (Algorithm 2) applies.
4.2.1 Remarks
There are, at least, two problems associated with (4.11) which are briefly discussed below.
Process Noise

The process noise, Q̃_t, is singular, see (2.27) on page 13. This is because the velocity is affected by the same noise as the position, and the quaternion only has 3 dof but 4 states. It turns out that a solution does not have to be consistent with the dynamic model. A solution to this is to add the linearised dynamic model as a constraint to the problem (4.11), which results in a constrained quadratic program.
Quaternion

The unit quaternion is constrained to the unit sphere, q^T q = 1, which is a non-convex constraint that also should be handled. A suggestion is to use a local 3-parameter approximation of the unit quaternion, Triggs et al. (2000); Kümmerle et al. (2011).
5 Optical See-Through Head Mounted Display Calibration
The Optical See-Through Head Mounted Display (osthmd) calibration process is introduced by giving an overview of the ar system and its geometry, explaining the measurement procedure and the Direct Linear Transformation used for calibration. This chapter also explains some parts not covered in Paper B.

Figure 5.1 shows the Kaiser ProView 50ST osthmd which was used in the experiments, Axholt et al. (2011). The red lights are tracker markers which are measured by the tracker system.
5.1 System Overview
The ar system consists of an osthmd and a motion tracking system briefly described below.
5.1.1 Head Mounted Display
In a pinhole camera model the light passes through a lens system and then hits the camera sensor. In an osthmd there is no camera sensor, yet an optical combiner serves the same purpose when used in a pinhole camera model. A simplified illustration of how the graphics is created in the osthmd can be seen in Figure 5.2. The plane-to-plane homography of the pixels from the beamer screen, through a collimator and onto the ost semitransparent screen (optical combiner), is one of the paths light travels before reaching the eye, and this is how the virtual overlay is created onto the scene. The other path of light comes from the scene, and does not pass through the beamer or the collimator. Note that the pixels are distorted by the relative rotation of the planes involved.
Figure 5.1: osthmd, the red lights are used by the optical tracker system
consisting of several cameras.
5.1.2 Tracker
The optical tracker system, a PhaseSpace IMPULSE motion capture system, measures the position of tracker marks (red diodes) and provides 6 dof pose at a maximum of 200 Hz, but due to software limitations the calibration application only runs at 60 Hz. The tracker is used in two ways. Firstly, 11 tracker marks are attached to the hmd, defining a rigid body for which the tracker provides the 6 dof pose. Secondly, a single tracker mark is mounted on a tripod and its position is also reported by the tracker. This single tracker mark is called the background marker.
5.2 Calibration
Traditional camera calibration considering electro-optical cameras is a well-studied problem with many effective solutions and freely available software providing very accurate calibration, almost automatically, see e.g., Bouguet (2010) or OpenCV. In osthmd calibration good results are difficult to obtain for various reasons: for example, the equipment is fairly non-standard and requires special solutions, data acquisition is time consuming and errors are easily introduced (the screen shifts about 30-40 pixels by just frowning!).
The osthmd in Figure 5.1 is quite a complex device, and with few calibration data we are forced to approximations reducing the parameter space.

Figure 5.2: Projection from an image on the beamer, b, to the screen s onto the image plane, i. (a) 3D illustration of how the graphics from the beamer screen is displayed on the virtual image i as the reflected light falling on the semitransparent screen s; the black dot represents the beamer light source, and the c coordinate frame represents the user's eye, which corresponds to the camera centre in the pinhole camera model. (b) 2D view of the beamer screen (left), ost display (middle) and the image plane (right). The beamer is rotated around the x^b axis while the screen is rotated around its x^s and y^s, resulting in a distorted image; this is because projective transformations do not necessarily preserve angles. The distance from the c frame to the s frame is the focal length, f.

With Figure 5.2 in mind, some assumptions motivating the pinhole camera model as an approximate model structure for the osthmd are:
5.1 Assumption. The rotation between the frames b and s is small, i.e., R^bs ≈ I; hence, the pixels projected onto the screen are rectangular.

5.2 Assumption. The pixel plane is perpendicular to the optical axis. Without this assumption another rotation has to be estimated, Robinett and Rolland (1991).

5.3 Assumption. The pixels, as perceived by the user, are points. This is true for most users with uncorrected vision, because the VGA resolution (640 × 480 pixels) of the ost display and the distance of the optical combiner to the user's eye make the pixel shape almost indistinguishable.
5.2.1 Tracker Measurements

The aim of the measurement procedure is to gather N correspondences between the background marker x^w and screen pixel coordinates x^p; we denote this by x_i^p ↔ x_i^w, i = 1, ..., N. This is done manually in a so-called boresighting exercise where the user aligns the background marker with a crosshair displayed on the ost. This is done N times as described in Section 3.5 in Paper B.
The tracker provides the position of the tracker markers referenced in the tracker frame t. The markers attached to the hmd are defined as a rigid body with its origin defined in one of the markers, for which the tracker reports the full 6 dof pose. To simplify calculations the hmd is used as the base frame w and it is considered to be inertial. Since the background marker is also measured by the tracker it can easily be referenced in the hmd frame. Let T^wt denote the homogeneous representation of the Euclidean transformation from the t frame to the w frame, then the background marker x^t is referenced in the w frame as

x^w = T^wt x^t,   (5.1)

where the vectors are homogeneous.
This is a great advantage since we would otherwise need to know the exact location of the background marker in a global frame with respect to the tracker frame.
Furthermore, this is basically the same idea as in the Single Point Active Alignment Method (SPAAM) of Tuceryan and Navab (2000) where their background
marker is the approximate tracker origin.
Using (5.1) and the pinhole camera model (2.35) on page 16, the screen coordinates are related to the background marker as
λ x^p = P x^w,   (5.2)

where λ ∈ R accounts for the arbitrary scale.
5.2.2 Direct Linear Transformation
The Direct Linear Transformation (DLT) (Abdel-Aziz and Karara, 1971) is a parameter estimation method which takes N point correspondences x_i^p ↔ x_i^w, i = 1, ..., N, given by

λ_i x_i^p = P x_i^w,   (5.3)

where all λ_i are unknown scale factors. The sought parameters are the entries in the 3 × 4 camera matrix P. In our setup, P defines the pinhole camera model as in (2.35), where x_i^p are pixel coordinates and x_i^w coordinates in the w-frame. The unknown scale factors λ_i are removed by applying the cross product

λ_i x_i^p × x_i^p = x_i^p × P x_i^w,   (5.4)

where the left-hand side equals zero. This can be rearranged into

[ 0^T              −x_i^w^T         y_i^p x_i^w^T ] [ p^1 ]
[ x_i^w^T           0^T            −x_i^p x_i^w^T ] [ p^2 ] = 0,   (5.5)
[ −y_i^p x_i^w^T    x_i^p x_i^w^T   0^T           ] [ p^3 ]

where p^i, i = 1, 2, 3 are the transposed row vectors of P, i.e., the vectorisation of P. There are only 2 linearly independent equations in (5.5), so it is possible to
leave one out. Since there will be measurement errors, in either x p or x w , or both,
the right hand side of (5.5) will not be exactly zero. It is therefore preferable to
collect more than the six correspondence points which is the minimum required
to solve (5.5) for P . Taking the N correspondences and stacking the entries in a
tall 3N × 12 matrix A we obtain
A p = ε,   (5.6)

where p = [p^1 p^2 p^3]^T and ε are the residual errors associated with the correspondences. The residual errors can be minimised by the following optimisation problem
min_p ‖A p‖   subject to   ‖p‖ = 1,   (5.7)
which has a non-degenerate solution if the correspondence point configuration is general, see Section 2.3 in Paper B. The solution corresponds to the smallest nonzero singular value of A. The camera matrix has only 11 dof since the scale is unknown; hence, A has only 11 nonzero singular values. We must therefore add a constraint to p since otherwise the solution would be p = 0. The unit singular vector associated with the smallest nonzero singular value gives the minimising argument p̂, and hence the camera matrix P̂ is obtained. For a more complete discussion on DLT, see Hartley and Zisserman (2004); Sutherland (1974).
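In practice (5.5)-(5.7) are solved with an SVD: stack two independent rows per correspondence and take the right singular vector of the smallest singular value. A minimal sketch (the camera and points used to exercise it are supplied by the caller; nothing here is the thesis's data):

```python
import numpy as np

def dlt(x_p, x_w):
    """DLT estimate of the 3x4 camera matrix P.
    x_p: (N, 2) pixel coordinates, x_w: (N, 3) world coordinates, N >= 6."""
    rows = []
    for (u, v), X in zip(x_p, x_w):
        Xh = np.append(X, 1.0)
        Z = np.zeros(4)
        # Two linearly independent rows of (5.5) per correspondence (up to sign).
        rows.append(np.concatenate([Xh, Z, -u * Xh]))
        rows.append(np.concatenate([Z, Xh, -v * Xh]))
    A = np.array(rows)
    # The right singular vector of the smallest singular value minimises
    # ||Ap|| subject to ||p|| = 1, as in (5.7).
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)
```

With noise-free correspondences from a known camera in general position, the recovered P̂ reproduces the projections exactly (P̂ is only determined up to scale, which the projection divides out).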
5.2.3 Decomposition
The 3 × 4 camera matrix P can be decomposed into extrinsic parameters, R^cw and c^w, and the upper triangular intrinsic parameter matrix K. These parameters are needed for analysis but are also required to assemble the OpenGL matrices for display configuration of the ar graphics in VR Juggler. The decomposition is obtained from (2.36) as

P = K [ R^cw   −R^cw c^w ] = K R^cw [ I   −c^w ].   (5.8)
Since R^cw is orthogonal (R^cw R^cw^T = I) and K is upper triangular, the matrices can be found from the RQ-decomposition of the left 3 × 3 sub-matrix of P. The ambiguity in the decomposition can be removed by requiring that K has positive diagonal entries (Hartley and Zisserman, 2004). Also, K is normalised by its last entry,

K := K / K_33.   (5.9)

The camera centre (eye point location) is in the right null space of P,

P [ c^w ; 1 ] = 0,   (5.10)

and c^w can be recovered as

c^w = −M† p_4,   (5.11)

where M is the left 3 × 3 sub-matrix of P, p_4 is the last column of P and † denotes the pseudo inverse.
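The decomposition (5.8)-(5.11) can be sketched with scipy.linalg.rq as below. The sign fix follows the positive-diagonal convention mentioned above; an overall sign of P may additionally need fixing so that det R^cw = +1, which this sketch ignores, and the camera centre is recovered by solving with the left 3 × 3 block directly.

```python
import numpy as np
from scipy.linalg import rq

def decompose(P):
    """Split P into K (upper triangular), R^cw and c^w as in (5.8)-(5.11)."""
    K, R = rq(P[:, :3])
    # Fix the RQ sign ambiguity so that K has a positive diagonal
    # (assumes no diagonal entry of K is exactly zero).
    D = np.diag(np.sign(np.diag(K)))
    K, R = K @ D, D @ R                       # D is its own inverse
    K = K / K[2, 2]                           # normalise by the last entry (5.9)
    c = -np.linalg.solve(P[:, :3], P[:, 3])   # camera centre from (5.10)
    return K, R, c
```

Composing a camera matrix from known K, R^cw and c^w and decomposing it again recovers the original parameters, which is a useful sanity check before trusting the decomposition of an estimated P̂.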
5.2.4 Remarks
The DLT minimises the error, ‖A p‖, which is called the algebraic error, and it is usually difficult to analyse geometrically. Actually, we would like to maximise the probability of the parameters given the correspondences, p(P; x_i^p, x_i^w). This is easier done using Maximum Likelihood, p(x_i^p, x_i^w; P), where error measures have to be derived. These methods are called Gold Standard in Hartley and Zisserman (2004) and usually result in nonlinear least-squares which is initialised with an estimate from e.g., DLT. Under the assumption of zero mean independent and identically distributed Gaussian noise in the pixels,

min_P Σ_{i=1}^N ‖x_i^p − P x_i^w‖²,   (5.12)

is the Maximum Likelihood estimator for P. With the experimental data in Paper B and minimisation of (5.12) using Levenberg-Marquardt, no further improvement is obtained. We believe this is because of a poor initial estimate. A more reasonable assumption, given our setup, is that there are errors in the background marker coordinates, x_i^w, as well. This is due to the fact that the background marker is measured, hence containing errors, and these measurements are referenced by T^wt (5.1), which is also uncertain.
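The Gold Standard refinement of (5.12) can be sketched with scipy's Levenberg-Marquardt implementation. The parametrisation by all 12 entries of P leaves the overall scale free (a fixed entry or a norm constraint could be added); everything below is an illustration with made-up data, not the experiment from Paper B.

```python
import numpy as np
from scipy.optimize import least_squares

def refine(P0, x_p, x_w):
    """Refine an initial camera matrix (e.g. from DLT) by minimising the
    reprojection error (5.12) with Levenberg-Marquardt."""
    def residuals(p):
        P = p.reshape(3, 4)
        Xh = np.hstack([x_w, np.ones((len(x_w), 1))])
        proj = Xh @ P.T                          # homogeneous projections
        return (proj[:, :2] / proj[:, 2:3] - x_p).ravel()
    sol = least_squares(residuals, P0.ravel(), method='lm')
    return sol.x.reshape(3, 4)
```

Starting from a slightly perturbed camera matrix with noise-free correspondences, the refinement drives the reprojection error back to (numerically) zero; with a poor initial estimate, as suspected in the remark above, it can instead stall in a local minimum.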
Care should be taken when interpreting the camera parameters as Euclidean from the decomposition (Section 5.2.3). Even if the lens system in the osthmd is ideal, there is no guarantee that the ost display is perpendicular to the optical axis.
6 Concluding Remarks

6.1 Conclusions
The work in this thesis has dealt with two estimation problems that are related
by the pinhole camera model and inertial sensing.
We have shown how measurements from inertial and camera sensors can be used
in a compact model. The aim was to provide improved estimates of the navigation states and a landmark map. This was done by treating the filtered estimates
from ekf-slam as an initial solution to a nonlinear least-squares problem which
was solved using the Gauss-Newton method. The approach was evaluated on
experimental data and the results are encouraging.
We have presented a method for calibration of an Optical See-Through Head Mounted Display (osthmd) system, which is modelled as a pinhole camera. We have shown and motivated why the pinhole camera model can be used to represent the osthmd and the user's eye position. The camera model was estimated using the Direct Linear Transformation algorithm. Results are presented in Paper B, which also compares different data acquisition methods.
6.2 Future Work
Some suggestions for future research related to this thesis are:
• Vision Inertial Navigation and Mapping
– Utilise the inherent sparseness in the Jacobian of the full slam problem.
– How well does the nonlinear least-squares approach perform compared
to other optimisation-based methods such as iSAM (Kaess et al., 2008)
and TORO (Grisetti et al., 2007)?
– How can inertial sensors and dynamic models be used in Bundle Adjustment (Triggs et al., 2000; Hartley and Zisserman, 2004)?
– Extend the relative pose calibration framework of Hol et al. (2010);
Hol (2008) by adding a slam solution.
• osthmd Calibration
– Examine Cramér-Rao bounds for the parameters in the pinhole camera
model.
– In the longer run, display systems should be supported by eye tracking
devices which would then reduce the need for manual calibration.
Bibliography
Y. Abdel-Aziz and H. Karara. Direct linear transformation from comparator to
object space coordinates in close-range photogrammetry. In Proceedings of the
American Society of Photogrammetry Symposium on Close-Range Photogrammetry, pages 1–18, Falls Church, VA, USA, 1971.
M. Axholt, M. Skoglund, S. D. Peterson, M. D. Cooper, T. B. Schön, F. Gustafsson,
A. Ynnerman, and S. R. Ellis. Optical See-Through Head Mounted Display
Direct Linear Transformation Calibration Robustness in the Presence of User
Alignment Noise. In Proceedings of the 54th Annual Meeting of the Human
Factors and Ergonomics Society, volume 54, pages 2427–2431, San Francisco,
CA, USA, September - October 2010. Human Factors and Ergonomics Society.
M. Axholt, M. A. Skoglund, S. D. O’Connell, M. D. Cooper, S. R. Ellis, and A. Ynnerman. Parameter Estimation Variance of the Single Point Active Alignment
Method in Optical See-Through Head Mounted Display Calibration. In Proceedings of the IEEE Virtual Reality Conference, pages 27–34, Singapore, Republic of Singapore, March 2011.
C. Bibby and I. Reid. Simultaneous Localisation and Mapping in Dynamic Environments (SLAMIDE) with Reversible Data Association. In Proceedings of
Robotics: Science and Systems, Atlanta, GA, USA, June 2007.
J.-Y. Bouguet. www.vision.caltech.edu/bouguetj/calib_doc/, 2010.
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press,
2004. URL http://www.stanford.edu/~boyd/bv_cvxbook.pdf.
M. Bryson and S. Sukkarieh. Architectures for Cooperative Airborne Simultaneous Localisation and Mapping. Journal of Intelligent and Robotic Systems, 55
(4-5):267–297, 2009. doi: 10.1007/s10846-008-9303-9.
F. Caballero, L. Merino, J. Ferruz, and A. Ollero. Vision-based odometry and
SLAM for medium and high altitude flying UAVs. Journal of Intelligent and
Robotic Systems, 54(1-3):137–161, 2009. ISSN 0921-0296. doi: 10.1007/s10846-008-9257-y.
J. Callmer. Topics in Localization and Mapping. Licentiate thesis no. 1489,
Department of Electrical Engineering, Linköping University, SE-581 83
Linköping, Sweden, May 2011.
J. Callmer, M. Skoglund, and F. Gustafsson. Silent Localization of Underwater
Sensors Using Magnetometers. EURASIP Journal on Advances in Signal Processing, 2010, 2010. doi: 10.1155/2010/709318. URL http://dx.doi.org/
10.1155/2010/709318. Article ID 709318.
A. J. Davison. Real-time simultaneous localisation and mapping with a single
camera. In Proceedings of the 9th IEEE International Conference on computer
vision, pages 1403–1410, 2003.
A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse. MonoSLAM: Real-time
single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):1052–1067, 2007.
H. Durrant-Whyte and T. Bailey. Simultaneous Localization and Mapping: Part I.
IEEE Robotics & Automation Magazine, 13(2):99–110, June 2006.
E. Eade. Monocular Simultaneous Localisation and Mapping. PhD thesis, Cambridge University, 2008.
R.M. Eustice, H. Singh, J.J. Leonard, and M.R. Walter. Visually Mapping the RMS
Titanic: Conservative Covariance Estimates for SLAM Information Filters. International Journal of Robotics Research, 25(12):1223–1242, December 2006.
N.J. Gordon, D.J. Salmond, and A.F.M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings Radar and Signal Processing, 140(2):107–113, 1993. doi: 10.1049/ip-f-2.1993.0015.
G. Grisetti, C. Stachniss, S. Grzonka, and W. Burgard. A tree parameterization
for efficiently computing maximum likelihood maps using gradient descent. In
Proceedings of Robotics: Science and Systems (RSS), Atlanta, GA, USA, 2007.
F. Gustafsson. Statistical Sensor Fusion. Studentlitteratur, 2010. ISBN 978-91-44-05489-6.
Sir W.R. Hamilton. On quaternions; or on a new system of imaginaries in algebra.
Philosophical Magazine, xxv:10–13, July 1844.
R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision.
Cambridge University Press, second edition, 2004. ISBN 0-521-54051-8.
J. Hol. Pose Estimation and Calibration Algorithms for Vision and Inertial
Sensors. Licentiate thesis no. 1370, Department of Electrical Engineering,
Linköping University, SE-581 83 Linköping, Sweden, May 2008.
J. Hol, T. B. Schön, and F. Gustafsson. Modeling and Calibration of Inertial and
Vision Sensors. The International Journal of Robotics Research, 29(2):231–244,
February 2010.
S. J. Julier and J. K. Uhlmann. A New Extension of the Kalman Filter to Nonlinear
Systems. In Proceedings of AeroSense: The 11th International Symposium on
Aerospace/Defense Sensing, Simulation and Controls, SPIE proceedings series,
pages 182–193, Orlando, FL, USA, April 1997.
M. Kaess, A. Ranganathan, and F. Dellaert. iSAM: Incremental Smoothing and
Mapping. IEEE Transactions on Robotics (TRO), 24(6):1365–1378, Dec 2008.
T. Kailath, A. H. Sayed, and B. Hassibi. Linear Estimation. Prentice-Hall, Upper
Saddle River, New Jersey, 2000.
R.E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME–Journal of Basic Engineering, 82(Series D):35–45, 1960.
A. Karlsson and J. Bjärkefur. Simultaneous Localisation and Mapping of Indoor
Environments Using a Stereo Camera and a Laser Camera. Master’s thesis, Department of Electrical Engineering, Linköping University, Linköping, Sweden,
2010.
R. Karlsson, T.B. Schön, D. Törnqvist, G. Conte, and F. Gustafsson. Utilizing
Model Structure for Efficient Simultaneous Localization and Mapping for a
UAV Application. In Proceedings of the IEEE Aerospace Conference, Big Sky,
MT, USA, March 2008.
G. Klein and D. Murray. Parallel tracking and mapping for small AR workspaces.
In Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR), pages 225–234, Nara, Japan, 2007.
J.B. Kuipers. Quaternions and Rotation Sequences. Princeton University Press,
41 William Street, Princeton, New Jersey 08540, 2002. ISBN 0-691-10298-8.
R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard. g2o: A
General Framework for Graph Optimization. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China,
May 2011.
L. Ljung. System Identification: Theory for the User. Prentice-Hall, Inc., Upper
Saddle River, NJ, USA, 2 edition, 1999. ISBN 0-13-656695-2.
D. Lowe. Object Recognition from Local Scale-Invariant Features. In Proceedings
of the Seventh International Conference on Computer Vision (ICCV’99), pages
1150–1157, Corfu, Greece, 1999.
T. Lupton and S. Sukkarieh. Removing scale biases and ambiguity from 6DoF
monocular SLAM using inertial. In Proceedings of the International Conference on Robotics and Automation (ICRA), pages 3698–3703, Pasadena, California, USA, 2008. IEEE. doi: 10.1109/ROBOT.2008.4543778. URL http:
//dx.doi.org/10.1109/ROBOT.2008.4543778.
T. Lupton and S. Sukkarieh. Efficient integration of inertial observations into visual SLAM without initialization. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), pages 1547–1552. IEEE, 2009.
doi: 10.1109/IROS.2009.5354267. URL http://dx.doi.org/10.1109/
IROS.2009.5354267.
Y. Ma, S. Soatto, J. Kosecka, and S. S. Sastry. An Invitation to 3-D Vision, From
Images to Geometric Models. Springer Science, 2004. ISBN 978-0-387-00893-6.
J.M.M. Montiel and A.J. Davison. A visual compass based on SLAM. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA),
pages 1917–1922, Orlando, Florida, USA, May 2006. doi: 10.1109/ROBOT.2006.1641986.
Neato Robotics, 2011. [online] http://www.neatorobotics.com/, Visited
May 2011.
E. Nilsson. An optimization based approach to visual odometry using infrared
images. Master’s thesis, Department of Electrical Engineering, Linköping University, Linköping, Sweden, 2010.
E. Nilsson, C. Lundquist, T. B. Schön, D. Forslund, and J. Roll. Vehicle Motion Estimation Using an Infrared Camera. In Proceedings of the IFAC World Congress,
Milan, Italy, August - September 2011. To appear.
J. Nocedal and S. Wright. Numerical Optimization (Springer Series in Operations
Research and Financial Engineering). Springer, 2nd edition, July 2006.
W. Robinett and J. Rolland. A Computational Model for the Stereoscopic Optics
of a Head-Mounted Display. SPIE, 1457:140–160, 1991.
T. B. Schön. Estimation of Nonlinear Dynamic Systems - Theory and Applications. Linköping Studies in Science and Technology. Dissertations. No. 998, Department of Electrical Engineering, Linköping University, Linköping, Sweden,
February 2006.
M. D. Shuster. Survey of attitude representations. Journal of the Astronautical
Sciences, 41(4):439–517, 1993.
Z. Sjanic, M. A. Skoglund, T. B. Schön, and F. Gustafsson. A Nonlinear LeastSquares Approach to the SLAM Problem. In Proceedings of the IFAC World
Congress, Milan, August - September 2011. To appear.
M. Skoglund. Evaluating SLAM algorithms for Autonomous Helicopters. Master’s thesis, Department of Electrical Engineering, Linköping University,
Linköping, Sweden, 2008.
R. Smith, M. Self, and P. Cheeseman. Estimating uncertain spatial relationships
in robotics. In Autonomous robot vehicles, pages 167–193. Springer-Verlag
New York, Inc., New York, NY, USA, 1990. ISBN 0-387-97240-4.
I. E. Sutherland. Three-Dimensional Data Input by Tablet. Proceedings of the
IEEE, 62(4):453–461, 1974.
S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT Press, September
2005. ISBN 0-262-20162-3.
D. Törnqvist. Estimation and Detection with Applications to Navigation.
Linköping Studies in Science and Technology. Dissertations. No. 1216,
Linköping University, November 2008.
B. Triggs, P. Mclauchlan, R. Hartley, and A. Fitzgibbon. Bundle adjustment - a
modern synthesis. In B. Triggs, A. Zisserman, and R. Szeliski, editors, Vision
Algorithms: Theory and Practice, volume 1883 of Lecture Notes in Computer
Science, pages 298–372. Springer-Verlag, 2000.
M. Tuceryan and N. Navab. Single Point Active Alignment Method (SPAAM)
for Optical See-through HMD Calibration for AR. In Proceedings of the
IEEE/ACM International Symposium on Augmented Reality, pages 149–158,
TUM, Munich, Germany, October 2000.
Z. Zhang. A flexible new technique for camera calibration. Pattern Analysis
and Machine Intelligence, IEEE Transactions on, 22(11):1330–1334, November
2000. ISSN 0162-8828. doi: 10.1109/34.888718.