Multispectral Imaging for Surveillance Applications
LiU-ITN-TEK-A-15/004-SE
Emil Riseby
Alexander Svensson
Master's thesis in Media Technology, carried out at the Institute of Technology, Linköping University
Supervisor: Reiner Lenz
Examiner: Daniel Nyström
Department of Science and Technology, Linköping University, SE-601 74 Norrköping, Sweden
Norrköping, 2015-02-20
Copyright
The publishers will keep this document online on the Internet - or its possible
replacement - for a considerable time from the date of publication barring
exceptional circumstances.
The online availability of the document implies a permanent permission for
anyone to read, to download, to print out single copies for your own use and to
use it unchanged for any non-commercial research and educational purpose.
Subsequent transfers of copyright cannot revoke this permission. All other uses
of the document are conditional on the consent of the copyright owner. The
publisher has taken technical and administrative measures to assure authenticity,
security and accessibility.
According to intellectual property law the author has the right to be
mentioned when his/her work is accessed as described above and to be protected
against infringement.
For additional information about the Linköping University Electronic Press
and its procedures for publication and for assurance of document integrity,
please refer to its WWW home page: http://www.ep.liu.se/
© Emil Riseby, Alexander Svensson
Multispectral imaging for surveillance applications
Emil Riseby & Alexander Svensson, March 5, 2015
Figure 1: A conceptual image of the light spectrum of a region in an image. The last eight colors are shown in grayscale since they are outside the visible spectrum.
Abstract
Silicon-based sensors are a commonly used technology in digital cameras today, which has made such cameras relatively cheap and widespread. Unfortunately they are constructed to capture and represent images as humans perceive them. Several image applications work better without the restrictions of the visible spectrum. Human visual restrictions are often indirectly put on technology by using images showing only visible light. Thinking outside the box in this case means seeing beyond the visible spectrum.
Camera
By removing the color filtering of a CMOS sensor and capturing images through a series of 50 nm wide bandpass filters, a simple multispectral imaging camera was created. The spectrum was thus limited to the light absorbed by silicon rather than restricted to the wavelength range of human vision.
Built-in automatic exposure control on the camera compensates for differences in light transmittance between the filters. The bandpass filters were cycled through using a motorized wheel that allowed a sweep of 15 images to be captured in 1-2 minutes.
Multispectral image processing
Having access to a wider light spectrum potentially gives more input to image processing and analysis. One example is vegetation, which has a very high reflectance in the near-infrared spectrum.
The captured images were used to explore different types of scenes and to apply both skin segmentation and dehazing.
Preface
This master's thesis was carried out in cooperation with the Department of Science and Technology at Linköping University and Axis (Axis Communications AB). The work took place at Axis's premises in Lund in the spring of 2014.
Axis Communications AB is a Swedish company that develops and manufactures network surveillance cameras and adjoining software.
Acknowledgements
We appreciate all the meaningful interactions with the employees at Axis, and their help and friendliness. We would also like to direct some special thanks:
Thanks to Reiner Lenz, our mentor at Linköping University, for support and brainstorming.
The help and freedom given to us by Björn Benderius, our mentor at Axis, has been outstanding.
Thanks to Jonas Hjelmström for designing the filter wheel and always being helpful when needed.
The skin samples provided by Adam Nilsson and Hoa Truong were deeply appreciated.
Thanks to Can Xu for ideas.
A special thanks to Niclas Svensson for being a great surrogate mentor while Björn was on parental leave.
Contents
1 Introduction
  1.1 Background
    1.1.1 Near-infrared (NIR)
  1.2 Objective
  1.3 Outline
  1.4 Previous work
2 Camera setup
  2.1 Capturing image data
3 Image acquisition
  3.1 Reflectance
    3.1.1 White point reference
    3.1.2 Gray world
  3.2 Color filtering
    3.2.1 Color filter model
    3.2.2 RGB
4 Spectral information
  4.1 Skin
  4.2 Haze
5 Method
  5.1 Spectral valley detection
  5.2 Skin segmentation
  5.3 Dehazing
6 Results and evaluation
  6.1 Skin
    6.1.1 Data
    6.1.2 Results
    6.1.3 Evaluation
  6.2 Dehazing
    6.2.1 Results
    6.2.2 Evaluation
7 Discussion
  7.1 Skin segmentation
  7.2 Dehazing
8 Further work
  8.1 Enhancements
    8.1.1 Skin segmentation
  8.2 Additional work
  8.3 Increasing spectral resolution
  8.4 Noise
  8.5 Camera calibration and correction
  8.6 Compression of multispectral images
A Equipment
  A.1 Camera
    A.1.1 Sensor limitations
    A.1.2 Characterization
  A.2 Chromatic aberration
  A.3 Filter wheel
    A.3.1 Characterization
  A.4 Calibration
B Filter data
C Data exploration
Chapter 1
Introduction
1.1 Background
Human vision is receptive to electromagnetic waves in the range 390 - 700 nm. Regular cameras usually have CMOS or CCD sensors made of silicon, a material that absorbs light from about 300 - 1100 nm [5]. This covers and extends beyond the visible spectrum, and the light that is invisible to humans could be used for image analysis. Unfortunately, the construction of conventional image sensors discards anything but visible light.
Color filter arrays on regular sensors are designed to capture colors in the visible
spectrum simultaneously on one sensor. These color filters are usually characterized as in figure 1.1. An infrared (IR) cutoff filter is added to restrict the
captured light to the visible spectrum.
Figure 1.1: Typical setup of color filters on an image sensor, plotted as transmittance [%] against wavelength [nm] over 400 - 1000 nm. The dashed line indicates the IR cutoff filter.
B G B G
G R G R
B G B G
G R G R
Figure 1.2: The layout of a Bayer filter.
In the commonly used Bayer filter (figure 1.2) the green pixels are twice as
frequent as red or blue since the green cones in the eye are strongly correlated
to intensity perception [1]. The Bayer pattern data from the sensor is processed
to obtain an image with the full resolution of the sensor. This is done by
interpolating neighboring color values in a process called demosaicing.
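To make the demosaicing step concrete, the following is a minimal sketch (hypothetical example code, not from the thesis) of bilinear demosaicing for the BGGR layout in figure 1.2, assuming a floating-point raw frame:

```python
# Minimal bilinear demosaicing sketch for the BGGR Bayer layout of figure 1.2.
import numpy as np
from scipy.ndimage import convolve

def demosaic_bggr(raw):
    h, w = raw.shape
    rows, cols = np.mgrid[0:h, 0:w]
    masks = {
        "r": (rows % 2 == 1) & (cols % 2 == 1),   # red on odd rows, odd columns
        "g": (rows % 2) != (cols % 2),            # green on the mixed sites
        "b": (rows % 2 == 0) & (cols % 2 == 0),   # blue on even rows, even columns
    }
    # Normalized convolution: average the known neighbors of each missing pixel.
    kernel = np.array([[0.25, 0.5, 0.25],
                       [0.5,  1.0, 0.5 ],
                       [0.25, 0.5, 0.25]])
    out = np.zeros((h, w, 3))
    for i, ch in enumerate("rgb"):
        mask = masks[ch].astype(float)
        interp = convolve(raw * mask, kernel, mode="mirror") \
               / convolve(mask, kernel, mode="mirror")
        out[..., i] = np.where(masks[ch], raw, interp)   # keep measured values as-is
    return out
```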
1.1.1 Near-infrared (NIR)
The upper part of the extended spectrum that silicon sensors are able to capture (750 - 1100 nm) is called the NIR spectrum. Information in this band often reveals interesting material properties not seen in the visible spectrum.
For example, a well-made fake plant might be hard to tell apart from a real one, but the fact that chlorophyll has a high reflectance in the NIR band might not have been taken into consideration. Looking at the NIR band, the real plant would appear much brighter than the fake one.
Today the NIR band is used for surveillance in low-light scenes. By removing the IR cutoff filter, the whole spectrum of the color filters passes through and more light is captured. Additional IR illumination, invisible to the human eye, can be used to illuminate the scene. These images are often shown in grayscale because the color filtering is not intended for incoming NIR light.
Capturing data by just removing the IR cutoff filter is inaccurate for multispectral imaging, because the RGB filters on the sensor still filter the light.
1.2 Objective
This project investigates surveillance applications with a multispectral approach. The unique capabilities of the custom camera are compared to those of an ordinary RGB camera.
The filter wheel is examined by evaluating the use of 15 bandpass (BP) filters instead of the standard RGB filters. The sensor is investigated by measuring how much light it captures, including non-visible light.
Image analysis is often restricted to the RGB color space even though the extended spectrum contains information that could simplify or even improve the analysis. After the characterization of the custom camera, some multispectral image processing and analysis methods are therefore examined, with a focus on surveillance applications.
1.3 Outline
This report begins with an explanation of a regular image sensor and the near-infrared light spectrum in section 1.1. We then describe a custom camera in chapter 2 and how it was used to acquire images in chapter 3. Additional information on the camera setup and the exploration of the acquired images is given in appendices A and C.
Chapter 4 shows some of the interesting differences of objects in visible light versus near-infrared light. Chapter 5 follows with an explanation of how these differences can be used for both skin segmentation and dehazing.
Results and discussion are presented in chapters 6 and 7. Chapter 8 lists examples of further work.
1.4 Previous work
A lot of work has been done in the field of multispectral imaging; below is a list of a few reports that were especially relevant to this project. Some deal with cameras similar to the one used here, others cover interesting theories and methods with a focus on multispectral imaging.
Multispectral imaging and image processing
by Klein [9]
A multispectral camera similar to the one used in this project was developed at
RWTH Aachen University. The report is an overview of multispectral imaging
in general, and their camera in particular. Several typical pitfalls regarding
multispectral cameras, for example optical aberrations from the filters, are described, analyzed and discussed.
Geometric calibration of lens and filter distortions for multispectral filter wheel cameras
by Brauers, J. and Aach, T. [2]
Distortions for the lens and the filter wheel are modeled. Physical models of the BP filters show that aberrations caused by the filters can be modeled as displaced image planes. The lens distortion is modeled using an extended pinhole camera model. A method is presented to calibrate and compensate for the distortions.
Colorimetric and multispectral image acquisition
by Nyström [12]
The report deals with high-quality image acquisition in colorimetric and multispectral formats. A multi-channel camera consisting of 7 BP filters and a monochrome CCD sensor is described. The calibration and characterization of the camera are described thoroughly.
Human skin detection by visible and near-infrared imaging
by Kanzawa et al. [8]
Skin detection using multispectral imaging is a promising technique and is described for use in driver assistance systems in this report. Human skin is characterized and differentiated from the background in typical traffic environments.
Affine illumination compensation for multispectral images
by Carmona et al. [3]
This report describes a method to compensate for uneven lighting when capturing a set of multispectral images. The camera used is, apart from a higher
resolution and a smaller light spectrum, quite similar to the multi-channel camera provided by Axis.
A collection of hyperspectral images for imaging systems research
by Skauli and Farrell [15]
A large database of hyperspectral images has been made available by Skauli and
Farrell [15] on SCIEN (The Stanford Center for Image Systems Engineering).
The images are captured with two HySpex imaging spectrometers, where each
image actually consists of two sub-images:
1. RGB+NIR (415 nm - 950 nm)
2. Short-wavelength infrared (SWIR) (1000 nm - 2500 nm)
This database is heavily used in the report and will hereafter be referred to as
the SCIEN database.
Chapter 2
Camera setup
The camera used in the system has a special CMOS sensor (1/3", 1080p) without any color filters. A bandpass (BP) filter in a custom filter wheel restricts the light in 50 nm increments. The full resolution of the sensor can be used directly without any processing. The filter wheel is shown in fig. 2.1 and fig. 2.2 shows the conceptual setup of the camera. A multi-channel image (calling it multispectral would be a bit of a stretch) is obtained by combining the monochromatic images from each band.
For camera-specific measurements and calibration, see appendix A.
2.1 Capturing image data
The filter wheels were motorized and controlled through a USB interface. This made it possible to capture an image set automatically. An image set was captured with the desired BP filters in sequence, normally using automatic exposure control. A complete image set consists of 15 raw images from 350 nm to 1100 nm, captured with consecutive 50 nm wide BP filters, along with exposure data for each image.
A complete image set is captured in about 1-2 minutes, which means that static scenes are optimal. The long acquisition time is due to the mechanical rotation of both a longpass (LP) and a shortpass (SP) filter wheel needed to obtain a BP-filtered image, as well as a few seconds for the automatic exposure control to stabilize.
Every LP and SP filter is made up of several filter layers in order to obtain the desired filter function. Each filter layer refracts the light slightly, so LP and SP filters consisting of many layers will affect the focal point more drastically than those consisting of fewer layers.
Figure 2.1: The custom-made filter wheel.
Figure 2.2: A simple model of the camera: filter, lens and sensor.
During a capture sequence the focus was adjusted for any BP filter that was out of focus due to these differing optical properties.
Images captured from buildings, for example those evaluated in section 5.3, were captured through open windows. The reason is that some windows are energy-efficient and designed to block IR light as thermal insulation. A quick and rough spectrometer analysis of the incoming light with and without windows showed that the windows at Axis were in fact blocking IR light.
Chapter 3
Image acquisition
3.1 Reflectance
3.1.1 White point reference
Information about the illumination was acquired by placing a white reference point in the scene. The radiation spectrum of the white reference point characterizes the illumination in the scene.
The illumination was used to calculate a reflectance spectrum for each band using

    R_reflectance = R_radiance / I_illumination    (3.1)

where the white point radiation value is denoted I_illumination and the radiation value from the image is denoted R_radiance.
This method is, however, only correct when the whole image is illuminated by the same light source. Other parts of the image will have a different I_illumination caused by, for example, shadows, reflections and other light sources. The information from the white reference point provides a rough estimate of the main illumination.
3.1.2 Gray world
The gray world assumption was used to obtain reflectance in images without a white reference point, mainly because it is impractical to place a white reference point in all types of scenes.
The gray world assumption states that the average color of an image is a neutral gray, and can be used to estimate the illuminant, given that the image is sufficiently heterogeneous [4]. In this case the illumination was estimated by calculating the average radiation value for each channel in an image set. The reflectance could then be obtained with (3.1).
Figure 3.1: Light source compensation: (a) original, (b) gray world, (c) white point reference.
This provided a simple and automatic way of compensating for the illumination in each image in the spectrum, with acceptable results for this project. Applying the method to VIS+NIR images might behave badly in some cases, as that is not the input data it was designed for.
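As an illustration, a minimal sketch of the gray world step (hypothetical code, not from the thesis), assuming the image set is stored as a (bands, height, width) NumPy array:

```python
# Gray world reflectance estimation per (3.1): the per-band mean radiation is
# taken as the illuminant and divided out.
import numpy as np

def grayworld_reflectance(cube):
    illum = cube.mean(axis=(1, 2))        # I_illumination, one value per band
    return cube / illum[:, None, None]    # R_reflectance = R_radiance / I_illumination
```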
3.2 Color filtering
For easier visualization of multi-channel images, their layers were combined into three channels, each of which was then displayed as one of the RGB channels.
3.2.1 Color filter model
The BP filters were measured with a spectrometer, calibrated according to appendix A.4, and stored in vectors b_i, one for each filter. Each vector b_i contains n samples from the spectrometer. A color filter F, with the same resolution as the spectrometer, could then be approximated by a linear combination

    F = c_1 b_1 + c_2 b_2 + ... + c_i b_i    (3.2)

where c_1 ... c_i are fitting constants and i is the number of BP filters.
Equation (3.2) can also be written in matrix form:

    F = BC    (3.3)
    C = (c_1 c_2 ... c_i)^T    (3.4)
    B = (b_1 b_2 ... b_i)    (3.5)
In this case system (3.3) is overdetermined since n > i, and the matrix B cannot be inverted. The matrix C was approximated with

    C ≈ B⁺F    (3.6)

where B⁺ is the Moore-Penrose pseudoinverse, a least squares method. The estimated filter F̃ is then

    F̃ = BC    (3.7)
With (3.3), an arbitrary color filter can be approximated according to (3.6). Many different approximation methods could be applied; least squares was used because it minimizes the overall error, i.e. the sum of the squared errors is minimized, and because the method is widely used and implemented. The error of the approximated filter can be estimated as the distance between F and F̃ in the L2 norm:
    Error = ||F − F̃|| = sqrt( Σ_{j=1..n} (F_j − F̃_j)² )    (3.8)
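In NumPy the fit (3.6)-(3.8) reduces to a few lines. A sketch (hypothetical code, not from the thesis), assuming B holds one measured BP filter spectrum per column and F is the target filter sampled at the same n wavelengths:

```python
# Least squares filter approximation per (3.6)-(3.8).
import numpy as np

def approximate_filter(B, F):
    C = np.linalg.pinv(B) @ F           # C ≈ B⁺F (Moore-Penrose pseudoinverse)
    F_hat = B @ C                       # estimated filter F̃ = BC
    error = np.linalg.norm(F - F_hat)   # L2 error, equation (3.8)
    return F_hat, C, error
```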
Color   ColorChecker (x, y)   White point (x, y, L2 dist.)   Gray world (x, y, L2 dist.)
Red     0.539, 0.313          0.357, 0.316, 0.187            0.356, 0.326, 0.188
Green   0.305, 0.478          0.435, 0.355, 0.137            0.433, 0.363, 0.136
Blue    0.187, 0.129          0.301, 0.232, 0.206            0.301, 0.240, 0.207

Table 3.1: Error according to the colors in the ColorChecker from fig. 3.1.
3.2.2 RGB
Conventional RGB filtering was chosen as a proof of concept for the model in section 3.2.1. Example renders are shown in fig. 3.2.
3.2.2.1 Approach
Using the model described in section 3.2.1, the captured multi-channel data could be transformed to CIE's tristimulus values XYZ. By constructing filters according to the CIE standard observers (fig. 3.3 and fig. 3.4), an XYZ image was created. The XYZ image was then transformed to a desirable RGB color space; in this project sRGB was used. If no white point reference was present, either gray world or a specular reflection of a light source was used.
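The XYZ-to-sRGB step is a standard transform. A sketch of it (hypothetical code, not from the thesis), assuming an (H, W, 3) XYZ image normalized so that the white point maps to Y = 1:

```python
# XYZ (D65) -> sRGB: the standard linear matrix followed by sRGB gamma encoding.
import numpy as np

M_XYZ_TO_SRGB = np.array([[ 3.2406, -1.5372, -0.4986],
                          [-0.9689,  1.8758,  0.0415],
                          [ 0.0557, -0.2040,  1.0570]])

def xyz_to_srgb(xyz):
    rgb = np.clip(xyz @ M_XYZ_TO_SRGB.T, 0.0, 1.0)   # linear sRGB, clipped to gamut
    return np.where(rgb <= 0.0031308,                # piecewise sRGB gamma curve
                    12.92 * rgb,
                    1.055 * rgb ** (1.0 / 2.4) - 0.055)
```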
A numerical error estimate of the color filter model was made according to (3.8). The calculated error for each color matching function:

    ||x(λ) − F̃|| ≈ 5.09
    ||y(λ) − F̃|| ≈ 7.56
    ||z(λ) − F̃|| ≈ 10.87

The color matching function for the Z channel is the narrowest, which can explain its significantly larger error.
The ColorChecker colors in fig. 3.1 are specified in the CIE xyY color space. An error was estimated as the distance between the specified color and the obtained color in the xy-plane of the CIE xyY color space, measured in the L2 norm. The color errors can be seen in table 3.1.
Figure 3.2: Multi-channel images rendered to RGB space: (a) test scene with a hand; (b) landscape photo, captured through an open window.
Figure 3.3: The CIE 1931 standard observer color matching functions x(λ), y(λ) and z(λ), plotted over 400 - 750 nm.
Figure 3.4: BP filters in a linear combination approximating one CIE color matching function (x(λ)), plotted over 400 - 1000 nm.
Chapter 4
Spectral information
When using an extended spectrum, the additional spectral information can be used to further distinguish objects from each other. Fig. 4.1 by Laquerre et al. [10] shows an RGB image and a grayscale image of the NIR range. Note the improved contrast in the distance.
The camera setup obtains images with a higher spectral resolution than a normal camera: six channels in VIS and eight channels in NIR. This made it possible to investigate applications from a multispectral view. Different objects and combinations of materials have relatively different spectral signatures, i.e. materials reflect and absorb the incident light differently. Humans observe the spectral signatures in VIS light as colors, but the signatures are discernible outside of VIS as well.
Human skin contains water that is known to have an absorption peak at 975 nm [11].
Longer wavelength light is less scattered by small particles than short wavelength
light [14]. Skin segmentation and dehazing were therefore chosen as proofs of
concept.
4.1 Skin
Human skin has a characteristic spectral reflectance, shown by Nunez [11] in
fig. 4.2. The plot shows both light and dark skin with melanosome levels of
2.4% and 24% respectively.
There are two distinct valleys for both skin types at 975 and 1200 nm. The
valley at 975 nm is a result of the water content and its absorption [11]. The
characteristics from 450 nm to 600 nm are a result of hemoglobin [11] and are more visible at low melanosome levels. As described in section A.1.1, the light absorption in a silicon sensor ranges from 300 to 1100 nm. This means that the valley at 1200 nm is not usable with silicon sensors and is therefore not evaluated in this report.
Figure 4.1: An RGB image of the visible (VIS) range compared to a grayscale intensity map of the NIR range 700 - 1100 nm [10].
Figure 4.2: Human skin reflectance for Type I/II skin and Type V/VI skin (melanosome levels of 2.4% and 24% respectively), plotted over 450 - 1800 nm. Data from Nunez [11].
4.2 Haze
Haze is caused by light scattering on particles. The scattered light depends on the photon wavelength λ and the particle size. For haze caused by particles smaller than λ/10, the scattered light follows Rayleigh's law (4.1).

    I_s ∝ I_0 / λ⁴    (4.1)
According to (4.1) the intensity of the scattered light I_s is proportional to the ratio between the incident light intensity I_0 and the fourth power of λ. Images capturing longer wavelengths such as the NIR band should therefore contain less haze caused by small particles than images in the visible light (VIS) band. For larger particles the light scatters according to Mie's law, which is independent of λ; haze caused by larger particles will then appear in both the VIS and the NIR band [14].
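As a rough worked example of (4.1), added here for illustration: moving from blue light at 450 nm to NIR at 950 nm reduces the Rayleigh-scattered intensity by a factor of (950/450)⁴ ≈ 20, which is why small-particle haze largely disappears in NIR images.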
Chapter 5
Method
5.1 Spectral valley detection
Three BP filters were used to detect a local minimum in the spectral intensity. BP_min was placed at the expected minimum, with BP_x1 and BP_x2 on each side. A ratio between the spectral bands was used as an indicator of a spectral valley; high values of d in (5.1) indicate a local minimum.

    d = (BP_x1 / BP_min) · (BP_x2 / BP_min)    (5.1)

To ensure only valleys were included, d was set to 0 if BP_min was higher than BP_x1 or BP_x2.
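A minimal sketch of the indicator (5.1) on per-band images (a hypothetical helper, not the thesis code); the optional second side also covers the one-sided trend test used later in (5.3):

```python
# Spectral valley indicator d per (5.1), with d = 0 where BP_min is not a minimum.
import numpy as np

def valley_ratio(bp_min, bp_x1, bp_x2=None):
    d = bp_x1 / bp_min
    is_valley = bp_min < bp_x1
    if bp_x2 is not None:                # full two-sided valley test
        d = d * (bp_x2 / bp_min)
        is_valley &= bp_min < bp_x2
    return np.where(is_valley, d, 0.0)
```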
5.2 Skin segmentation
Using the skin characteristics described in section 4.1, a method for skin segmentation was developed. The method described in section 5.1 was used to detect water and hemoglobin absorption.
Figure 5.1: An example of BP filters for valley detection (BP_x1, BP_min and BP_x2).
Figure 5.2: The spectral valley detection used for skin segmentation: five BP filters (510 - 550, 630 - 670, 850 - 900, 965 - 985 and 1010 - 1050 nm) feed ratio tests for the hemoglobin trend and the water valley, and the results are multiplied into the output.
The reflectance valley of water at 975 nm was detected with (5.2).

    water: BP_min = BP_965-985, BP_x1 = BP_850-900, BP_x2 = BP_1010-1050    (5.2)
No local minimum caused by hemoglobin exists in skin with a high melanosome content. Therefore only a rising trend after 600 nm was detected, with (5.3).

    hemoglobin: BP_min = BP_510-550, BP_x1 = BP_630-670    (5.3)
The results for both water and hemoglobin were combined by multiplying the two resulting images. An overview of the method can be seen in fig. 5.2.
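Combining (5.2) and (5.3) then amounts to two calls and a multiplication. A sketch reusing the valley_ratio helper from section 5.1, with hypothetical band variables named after their wavelength ranges:

```python
# Skin indicator: water valley (5.2) times hemoglobin trend (5.3).
d_water = valley_ratio(bp_965_985, bp_850_900, bp_1010_1050)
d_hemo = valley_ratio(bp_510_550, bp_630_670)   # one-sided rising trend after 600 nm
skin_map = d_water * d_hemo
```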
5.3 Dehazing
Haze usually appears at a far distance due to aerosol particles. A simple dehazing method was used to demonstrate the haze properties described in section 4.2. Since the NIR channel is less affected by haze, it was used as the luminance channel in an HSV color space. The color information from the VIS channels was then applied to the hue and saturation channels (5.4).

    Im: H = VIS_H, S = VIS_S, V = NIR    (5.4)
This method was chosen as a proof of the theory described in section 4.2. There are more sophisticated dehazing methods using NIR information, as explained by Schaul et al. [14] and Feng et al. [6]; the dehazed images from those methods contain more realistic color information.
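A sketch of the HSV swap in (5.4), assuming vis is an (H, W, 3) RGB render and nir a registered (H, W) NIR image, both scaled to [0, 1] (hypothetical code, using scikit-image for the color space conversion):

```python
# Simple dehazing per (5.4): keep hue and saturation from VIS, take value from NIR.
from skimage import color

def dehaze_hsnir(vis, nir):
    hsv = color.rgb2hsv(vis)
    hsv[..., 2] = nir            # NIR becomes the luminance (value) channel
    return color.hsv2rgb(hsv)
```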
Chapter 6
Results and evaluation
6.1 Skin
6.1.1 Data
The filter wheel was limited to a spectral resolution of 50 nm, which is relatively low compared to the SCIEN database with its spectral resolution of ∼4 nm [15].
As the skin segmentation method needs a high resolution to use the water and hemoglobin absorption peaks, it was initially tested on images from the SCIEN database. But while the spectral resolution was satisfactory, the spectral range of 415 - 945 nm was not enough to enclose the absorption peak of water at 975 nm.
In order to test the method, the water reflectance dip at 975 nm could only be predicted through a downward trend towards the end of the spectrum. This was calculated using only two BP filters in the method described in section 5.2, with properties as in (6.1). The wider BP filters in (6.1) were constructed in a simpler and less accurate way, by calculating the mean of calibrated images (instead of calibrating and constructing them from narrower BP filters).

    water: BP_min = BP_930-945, BP_x1 = BP_850-880    (6.1)

6.1.2 Results
The skin segmentation algorithm was tested on both the SCIEN database and images from the Axis camera. The results are shown in fig. 6.1 and 6.2. The color-based skin detection from [13] in fig. 6.2b was added for comparison.
Figure 6.1: RGB render (top row) and skin segmentation (bottom row) of some images from the SCIEN database.
The test images in fig. 6.1 are not very heterogeneous, since they basically consist of a face and a background. The image in fig. 6.2 is less homogeneous, but there the algorithm produced some false positives, mostly due to the low spectral resolution.
As a sort of verification of the method, it was tested on images from the SCIEN database that did not contain any humans. Fig. 6.3 shows the method applied to images of a fruit basket and a landscape.
6.1.3 Evaluation
The method worked very well on images with a high spectral resolution. It would probably have worked even better on the SCIEN database if its spectrum had included the 975 nm absorption peak of water. Images with a low spectral resolution give more false positives, as the characteristic reflectance spectrum is not as discernible.
The fact that the result images of landscapes are almost completely black is encouraging, since vegetation contains water and therefore has a water absorption similar to that of skin.
Red fruits are falsely detected as skin because of their water absorption and red color, which lead to a spectral reflectance similar to skin's. This false detection could probably be removed by an additional valley detection or some other classification, since skin and red fruit differ even in the VIS range.
Figure 6.2: Skin segmentation of multi-channel images: (a) RGB, (b) color-based skin detection, (c) results.
Figure 6.3: The skin method applied to images from the SCIEN database with no human skin: (a) originals, (b) results.
6.2 Dehazing
6.2.1 Results
The dehazing method was applied to two different scenes that were clearly hazy in the visible light spectrum:
• Fig. 6.4: A distant view towards the horizon.
• Fig. 6.5: A view of the ground outside a nearby building on a foggy day.
While the view towards the horizon was dehazed with satisfactory results, the dehazing of the near-field view with fog did not work very well.
6.2.2 Evaluation
Fig. 6.5c does have higher contrast than fig. 6.5a, but the fog is still clearly visible and the view behind the truck is about as hazy as before the dehazing was applied.
Visible in both scenes is that vegetation tends to become very bright after dehazing (the trees in 6.4 and the grass in 6.5). This is due to chlorophyll's high reflection in the NIR band.
Figure 6.4: Input images (a) RGB and (b) NIR for the dehazing method, and (c) the dehazed HSNIR image. To the right of each image is an enlarged view of the crane.
Figure 6.5: Input images (a) RGB and (b) NIR for the dehazing method, and (c) the dehazed HSNIR image. To the right of each image is an enlarged view around the truck.
Chapter 7
Discussion
The use of multispectral imaging for surveillance applications is very promising. Both skin segmentation and dehazing showed satisfactory results and are discussed further in sections 7.1 and 7.2.
The extended light spectrum is interesting for image analysis applications: if an object's spectral reflectance is known and the camera's spectral resolution is high enough, the object is easily identified.
The custom camera's long acquisition time only allows for static scenes. If a specialised set of filters were put directly on top of the sensor, the long acquisition time could be remedied and the usability would increase a lot. The downside is that the gain in speed would be met by a loss in either spectral or spatial resolution.
7.1 Skin segmentation
Skin segmentation is simple and quite robust given a wide light spectrum and enough samples. For surveillance applications it would be preferable if the image acquisition were faster and the spectral resolution higher (at least around the interesting peaks defined in section 5.2).
One solution might be to use two sensors and a semi-transparent mirror: one sensor with the conventional Bayer filter and the other with a filter that can detect the water and/or hemoglobin absorption peaks.
Another solution is to scan one line at a time and distribute the spectrum over the sensor using a prism. This approach is used in the HySpex camera used by Skauli and Farrell [15].
7.2 Dehazing
The dehazing method works best for haze where visibility decreases slowly with distance. Surveillance cameras are often placed to capture objects nearby, so dehazing would possibly help reduce the effects of the occasional, not-too-thick fog.
The problem is that fog is made up of water droplets that are quite big, meaning that they obscure light even in the NIR range, as discussed in section 4.2. The atmospheric haze seen towards the horizon is made up of many different types of particles, some of which are small enough not to obscure light in the NIR range. This is possibly the reason why the clouds in 6.4c are mostly intact while the haze is decreased.
Dehazing does not require a high spectral resolution and a feasible method would
be to replace one of the two green channels in a Bayer filter with a NIR channel.
The potential gain in contrast from using NIR as an additional luminosity source
might alone make it worthwhile.
Chapter 8
Further work
8.1 Enhancements
8.1.1 Skin segmentation
8.1.1.1 Resolution
A higher spectral resolution would lead to more precise hemoglobin and water
detection. The union of detected hemoglobin and water would probably result
in fewer false positives than the current method.
8.1.1.2 Method
The skin segmentation could be improved by weighting the contributions from the hemoglobin and water absorption valley detections described in section 5.2. The hemoglobin absorption coefficient is larger than the water absorption coefficients, so d from the valley detection (5.1) will be larger for the hemoglobin valley. By weighting the contributions correctly, the threshold that sets non-valley pixels to zero could be skipped.
This could be done using a linear combination of the d values from the spectral valley detection (5.1),

    Output = x_1 d_hemoglobin + x_2 d_water    (8.1)

or using an exponential combination,

    Output = d_hemoglobin^x_1 · d_water^x_2    (8.2)
Figure 8.1: The construction of a narrow BP filter (c) using non-ideal LP (a) and SP (b) filters with the same cutoff frequency.
8.2 Additional work
Further scenarios where multispectral imaging for surveillance cameras might be useful:
• Face detection
• Noise reduction
• Shadow detection and removal
• Night scenes
8.3 Increasing spectral resolution
A big problem with the multi-channel images was their low spectral resolution; this lack of detail led to many false positives.
The filters' cutoffs are not instant but slightly gradual. A combination of an SP and an LP filter with the same cutoff frequency will therefore result in a narrow BP filter, as shown in fig. 8.1. The resulting BP filter will be quite dark, so a longer exposure time is needed.
Using both the usual 50 nm wide and the narrow BP filters, a higher spectral resolution could be achieved and used to better detect thin peaks and valleys.
8.4 Noise
Noise is a factor that decreases the accuracy of the images. Noise reduction could be applied by averaging images captured with the same BP filter. The filter switching and automatic exposure control account for far more time than the actual image capture, so capturing a few additional images would not have much impact on the total acquisition time.
8.5 Camera calibration and correction
The focus shift that occurs for different filters was taken care of by a slight refocusing for the shorter wavelengths, where the problem was apparent. Chromatic
aberration was not considered a problem in this project since it was small in
relation to the objects being examined.
Much time could be spent on calibrating the camera for the filters and correcting
for the chromatic aberration.
8.6 Compression of multispectral images
Even though an image can differ a lot over the spectrum, there can also be many similarities. These similarities could be used to compress a multispectral image efficiently.
The SCIEN database was compressed using singular value decomposition: the file size was reduced by a factor of 30 for an image set of faces and by a factor of 10 for an image of a color checker [15].
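In the spirit of that compression (a hypothetical sketch, not the exact method from [15]): flatten the cube to a bands × pixels matrix and keep only the top k singular components.

```python
# Truncated-SVD compression of a (bands, H, W) multispectral cube.
import numpy as np

def compress_svd(cube, k):
    bands, h, w = cube.shape
    X = cube.reshape(bands, h * w)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_k = (U[:, :k] * s[:k]) @ Vt[:k]   # rank-k approximation of the cube
    return X_k.reshape(bands, h, w)
```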
Bibliography
[1] B.E. Bayer. Color imaging array, 1976. US Patent 3,971,065.
[2] J. Brauers and T. Aach. Geometric calibration of lens and filter distortions for multispectral filter-wheel cameras. IEEE Transactions on Image Processing, pages 496–505, 2011.
[3] Pedro Latorre Carmona, Reiner Lenz, Filiberto Pla, and Jose M. Sotoca. Affine illumination compensation for multispectral images, 2007.
[4] Jonathan Cepeda-Negrete and Raul E. Sanchez-Yanez. Gray-world assumption on perceptual color spaces. In Image and Video Technology, pages 493–504. Springer Berlin Heidelberg, 2014. ISBN 978-3-642-53841-4.
[5] Arnaud Darmont. Spectral response of silicon image sensors. http://www.aphesa.com/downloads/download2.php?id=1.
[6] Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, and Sabine Süsstrunk. Near-infrared guided color image dehazing. In Proc. IEEE 20th International Conference on Image Processing (ICIP), 2013.
[7] Peter D. Hiscocks. Integrating sphere for luminance calibration. http://www.ee.ryerson.ca/~phiscock/astronomy/light-pollution/integrating-sphere.pdf.
[8] Y. Kanzawa, Y. Kimura, and T. Naito. Human skin detection by visible and near-infrared imaging. In Proceedings of the 12th IAPR Conference on Machine Vision Applications, MVA 2011, pages 503–507, 2011.
[9] Julie Klein. Multispectral imaging and image processing. In IS&T/SPIE Electronic Imaging: Image Processing: Algorithms and Systems XII, 2014.
[10] Pierre-Francois Laquerre, Nicolas Etienne, Noemie Vetterli, Caroline Duplain, Albrecht Linder, and Sabine Süsstrunk. RGB-NIR scene dataset. http://ivrg.epfl.ch/supplementary_material/cvpr11/.
[11] Abel S. Nunez. A physical model of human skin and its application for search and rescue. Technical report, DTIC Document.
[12] Daniel Nyström. Colorimetric and multispectral image acquisition, 2006.
[13] Jiang Qiang-rong and Li Hua-lan. Robust human face detection in complicated color images. In The 2nd IEEE International Conference on Information Management and Engineering (ICIME), pages 218–221, 2010.
[14] Lex Schaul, Clément Fredembach, and Sabine Süsstrunk. Color image dehazing using the near-infrared. In Proceedings of the 16th IEEE International Conference on Image Processing, pages 1609–1612, 2009.
[15] Torbjørn Skauli and Joyce Farrell. A collection of hyperspectral images for imaging systems research. http://scien.stanford.edu/jfsite/Papers/ImageCapture/2013_HyperspectralImagingDatabase.pdf, 2013.
Appendix A
Equipment
A.1 Camera
A.1.1 Sensor limitations
The hardware restrictions were defined by the wavelengths that are absorbed
by silicon: 300 - 1100 nm. Both ends of this span were prone to noise due to
low absorption, so the most accurate measurements were acquired within the
400 - 1000 nm span.
A.1.2 Characterization
The sensor's sensitivity to different wavelengths was measured with an integrating sphere [7]. The light source produced the highest intensities in the visible spectrum, decreasing gradually towards both longer and shorter wavelengths.
A.2 Chromatic aberration
A lens typically has different refractive indices for different wavelengths, which leads to a focus shift over the captured spectrum. This effect is especially noticeable when the spectrum is extended beyond VIS, and it can be reduced as shown by Klein [9].
Figure A.1: An LP filter (a) together with an SP filter (b) results in a BP filter (c).
A.3 Filter wheel
To create BP filters for certain wavelengths an LP filter and an SP filter were
placed in front of the camera lens. A filter wheel was used to quickly change
BP filters. The wheel consisted of two rotating disks, one with SP filters and
one with LP filters. Each combination of an LP filter with an SP filter with a
higher cutoff wavelength corresponded to a BP filter (fig. A.1).
A.3.1 Characterization
The transmittance of each LP and SP filter was measured with a spectrometer (OceanOptics USB4000 VIS-NIR). Fig. A.2 shows the narrowest BP filters, i.e. an LP filter with cutoff λ nm combined with an SP filter with cutoff (λ + 50) nm. For measurement details see appendix B.
A.4 Calibration
A model of the camera setup was created, see fig. A.3. The measured transmittance T for each BP filter was multiplied by the camera response C according to

    B_i = C T_i    (A.1)

where i is the current BP filter, resulting in B, which represents the filter characteristics of the sensor. The result is shown in fig. A.4.
To obtain radiance from a captured multi-channel image, each layer was modified according to

    I_calibrated = k I    (A.2)
Figure A.2: The measured transmittance for BP filters with theoretical 50 nm width, plotted as transmittance [%] against wavelength [nm] over 400 - 1000 nm.
Figure A.3: A block diagram of the multi-channel camera: radiance passes through filter, lens, sensor and electronics to a raw image. System 1 represents the filter wheel and system 2 represents the camera.
    k = c_i / (g t)    (A.3)

where I is the obtained image with exposure time t and gain g, and c is a constant. The constant c was used to normalize each BP filter and is defined as

    c_i = 1 / area(B_i)    (A.4)

where the area was calculated using the trapezoidal rule. This calibration results in a relative spectrum of the radiance in fig. A.3.
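A sketch of the whole calibration step (A.2)-(A.4), with hypothetical names: image is one raw band captured with exposure time t and gain g, and B_i samples the combined sensor-filter response at 1 nm steps.

```python
# Radiance calibration per (A.2)-(A.4).
import numpy as np

def calibrate_band(image, B_i, t, g):
    c_i = 1.0 / np.trapz(B_i)   # filter area via the trapezoidal rule, (A.4)
    k = c_i / (g * t)           # (A.3)
    return k * image            # I_calibrated = k * I, (A.2)
```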
Figure A.4: The sensor response and the relative BP filters, plotted over 400 - 1000 nm.
Appendix B
Filter data
Each LP and SP filter's transmittance was measured using a spectrometer (OceanOptics USB4000 VIS-NIR). The transmittance for a wavelength λ was calculated by

    T_λ = (S_λ − D_λ) / (R_λ − D_λ)    (B.1)
where R_λ is the reference intensity (i.e. only the input light), D_λ is the intensity without light or filter, and S_λ is the sample of a measurement.
All of the used filters can be seen in table B.1. The spectrometer registered intensity in the range 345 - 1039 nm at a resolution of 1 sample per nm. Each measurement was obtained as the mean of 100 samples from the spectrometer, and D_λ was measured twice, once before each LP and SP sequence.
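As a sketch, (B.1) applied to whole spectra at once, assuming S, R and D are NumPy arrays of the averaged spectrometer intensities:

```python
# Filter transmittance per (B.1), computed for all wavelength samples at once.
import numpy as np

def transmittance(S, R, D):
    return (S - D) / (R - D)
```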
Filter   Cutoff [nm]
SP       400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100
LP       400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100

Table B.1: Filters in the filter wheel with specified cutoff wavelengths, starting at 400 nm with 50 nm increments up to 1100 nm.
Appendix C
Data exploration
With lots of produced multi-channel images, and even more hyperspectral images from SCIEN [15], there was a need for a time-efficient way to view such images. One or more layers of an image are easily combined and viewed using MATLAB or Octave, but it is quite time consuming to keep combining layers in different ways.
Few multispectral image viewers are available, so a decision was made to create one for our purposes. The program was written in Python using SciPy+NumPy for mathematics and PyQtGraph for GUI elements.
The interface was divided into four parts. In the bottom left there is one histogram for each RGB channel, which determines how much of each layer should be present in that channel. Next to that are a few buttons connected to built-in functions such as:
Interlock: Interlock the movement of the RGB sliders.
Show gradient: Show a grayscale image of each pixel's multispectral gradient (derivative) compared to the clicked one.
Gray world: Calculate and apply the gray world white balance method.
sRGB: Produce an sRGB image (needs a white point or gray world applied).
In the bottom right there are two graphs: the top one shows the clicked pixel's intensity values over the spectrum, and the other shows the derivative of the graph above. At the top is the final image (with a histogram to the right, automatically provided by PyQtGraph).
A screenshot of the program in action is shown in fig. C.1.
Figure C.1: The GUI used to explore hyperspectral and multi-channel images.