Nonlinear programming
Spring 2003
Lecture 2
Instructor: Farid Alizadeh
Scribe: Min Shi
01/29/03

1 Overview
In this lecture we continue reviewing basic mathematics required for mathematical programming. We first review some basic linear algebra and matrix theory
and then we review basic convex analysis.
2 Background on Matrix Theory (continued)
We saw at the end of the last lecture that if $A \in S^n$ (that is, $A$ is an $n \times n$ symmetric matrix), then $A$ can be written as $A = Q\Lambda Q^T$, where $\Lambda$ is a diagonal matrix containing the eigenvalues of $A$ on its diagonal, and $Q = [q_1, q_2, \ldots, q_n]$ is an orthogonal matrix whose column $q_i$ is a normalized (i.e., unit-length) eigenvector corresponding to the eigenvalue $\lambda_i$. We now define a fundamental concept for symmetric matrices.
Definition 1 (Positive Semidefinite) $A \in S^n$ is positive semidefinite if for every $x \in \mathbb{R}^n$, $x^T A x \ge 0$; we write $A \succeq 0$. $A$ is positive definite if for every $x \in \mathbb{R}^n$ with $x \ne 0$, $x^T A x > 0$; we write $A \succ 0$.
There are at least two other very useful characterizations of positive semidefinite
matrices that we will use repeatedly. They are presented in the following two
lemmas.
Lemma 1 $A$ is positive semidefinite (definite) iff all of its eigenvalues are nonnegative (positive).
Proof: If $A \succeq 0$ and $(\lambda_i, q_i)$ is an eigenvalue-eigenvector pair of $A$, then, using $A = \sum_j \lambda_j q_j q_j^T$,
\[
0 \le q_i^T A q_i = q_i^T \Big( \sum_j \lambda_j q_j q_j^T \Big) q_i = q_i^T (\lambda_i q_i q_i^T) q_i = \lambda_i,
\]
since $q_i^T q_j = 0$ for $j \ne i$ and $q_i^T q_i = 1$.
When $A \succ 0$, the inequality above becomes strict.
Conversely, since any vector $x \in \mathbb{R}^n$ is a linear combination of the $q_i$, we can write $x = \sum_i \alpha_i q_i$ for some real numbers $\alpha_i$. If all $\lambda_i \ge 0$, then
\[
x^T A x = \Big[\sum_i \alpha_i q_i^T\Big] \Big[\sum_j \lambda_j q_j q_j^T\Big] \Big[\sum_k \alpha_k q_k\Big] = \sum_i \alpha_i^2 \lambda_i \ge 0.
\]
In the positive definite case the last inequality is strict, because all $\lambda_i > 0$ and, if $x \ne 0$, at least one $\alpha_i \ne 0$.
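As a quick numerical illustration of Lemma 1 (a NumPy sketch, not part of the original notes): a symmetric matrix with a negative eigenvalue admits a vector with negative quadratic form, while shifting the spectrum so that all eigenvalues are nonnegative makes the quadratic form nonnegative for every $x$.

```python
import numpy as np

rng = np.random.default_rng(0)

# A symmetric matrix with a negative eigenvalue fails the quadratic-form test:
A = np.diag([2.0, 1.0, -0.5])
q = np.array([0.0, 0.0, 1.0])          # eigenvector for the eigenvalue -0.5
assert q @ A @ q < 0                   # so A is not positive semidefinite

# Shifting the spectrum up makes every eigenvalue nonnegative...
B = A + 0.5 * np.eye(3)
assert np.linalg.eigvalsh(B).min() >= -1e-12
# ...and then x^T B x >= 0 for every x (checked on random samples):
for _ in range(200):
    x = rng.standard_normal(3)
    assert x @ B @ x >= -1e-12
```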
Lemma 2 $A \succeq 0$ iff there exists a matrix $B \in \mathbb{R}^{m \times n}$ such that $A = B^T B$.
Proof: First assume that $A = B^T B$ for some matrix $B$. Let $y = Bx$; then
\[
x^T A x = x^T B^T B x = y^T y = \sum_i y_i^2 \ge 0,
\]
therefore $A \succeq 0$. Conversely, if $A$ is positive semidefinite then by the previous lemma its eigenvalues are all nonnegative. Let its spectral decomposition be
\[
A = Q \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix} Q^T
  = Q \begin{pmatrix} \sqrt{\lambda_1} & & \\ & \ddots & \\ & & \sqrt{\lambda_n} \end{pmatrix}
      \begin{pmatrix} \sqrt{\lambda_1} & & \\ & \ddots & \\ & & \sqrt{\lambda_n} \end{pmatrix} Q^T.
\]
Setting
\[
B = \begin{pmatrix} \sqrt{\lambda_1} & & \\ & \ddots & \\ & & \sqrt{\lambda_n} \end{pmatrix} Q^T,
\]
we see that $A = B^T B$.
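The construction in the proof can be carried out directly (a sketch using NumPy, not from the notes): compute the spectral decomposition with `numpy.linalg.eigh`, form $B = \sqrt{\Lambda}\, Q^T$, and verify $B^T B = A$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Start from a PSD matrix A (here S S^T for a random S, PSD by Lemma 2).
S = rng.standard_normal((4, 4))
A = S @ S.T

# Spectral decomposition A = Q Lambda Q^T (eigh handles symmetric matrices).
lam, Q = np.linalg.eigh(A)
lam = np.clip(lam, 0.0, None)        # guard against tiny negative round-off

# The factor from the proof: B = sqrt(Lambda) Q^T, so that B^T B = A.
B = np.diag(np.sqrt(lam)) @ Q.T
assert np.allclose(B.T @ B, A)
```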
Since all eigenvalues of a symmetric matrix are real, one can order them in, say, non-increasing order: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. The largest and smallest eigenvalues of these matrices will play an important role in our future development. It turns out that both the largest and the smallest eigenvalue of a symmetric matrix can be characterized as the maximum (respectively, minimum) of a constrained quadratic function.
Lemma 3 If $A$ is a symmetric matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, then:
1. $\lambda_1 = \max_{\|x\|=1} x^T A x$,
2. $\lambda_n = \min_{\|x\|=1} x^T A x$.
Proof: Since the eigenvectors $q_i$ form an orthonormal basis, we can write $x = \sum_i \alpha_i q_i$. Furthermore, since $\|x\| = 1$ it follows that $\sum_i \alpha_i^2 = 1$. Now,
\[
x^T A x = \Big(\sum_i \alpha_i q_i^T\Big)\Big(\sum_j \lambda_j q_j q_j^T\Big)\Big(\sum_k \alpha_k q_k\Big) = \sum_i \alpha_i^2 \lambda_i,
\]
because $q_i^T q_j = 0$ unless
$i = j$. Writing $y_i = \alpha_i^2$, observe that as $x$ ranges over the set of unit-length vectors, the $y_i$ take arbitrary values so long as they are nonnegative and add up to 1. Therefore the vector $y$ must reside in the set $\{y \mid \sum_i y_i = 1,\ y_i \ge 0\}$, and maximizing the quantity $x^T A x$ reduces to the linear programming problem
\[
\max \sum_i \lambda_i y_i \quad \text{s.t.} \quad \sum_i y_i = 1,\ y_i \ge 0. \tag{1}
\]
This linear program is especially easy and can be solved by the "greedy" method: it is maximized by putting all the weight on the largest coefficient and ignoring all others, that is, $y_1 = 1$ and $y_i = 0$ for $i = 2, \ldots, n$. The optimal value of this LP is $\lambda_1$. Similarly, minimizing $x^T A x$ can be expressed as the same linear program with max replaced by min. In that case the optimal solution is obtained by putting all the weight on the smallest coefficient: $y_i = 0$ for $i = 1, \ldots, n-1$ and $y_n = 1$, which yields the optimal value $\lambda_n$.
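Lemma 3 is easy to check numerically (my NumPy sketch, not part of the notes): the quadratic form over random unit vectors stays between the extreme eigenvalues, and the bounds are attained at the corresponding eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(2)

S = rng.standard_normal((5, 5))
A = (S + S.T) / 2                      # a random symmetric matrix
lam = np.linalg.eigvalsh(A)            # ascending: lam[0] = lambda_min, lam[-1] = lambda_max

# Sample many unit vectors; x^T A x never exceeds lambda_max nor drops below lambda_min.
best, worst = -np.inf, np.inf
for _ in range(2000):
    x = rng.standard_normal(5)
    x /= np.linalg.norm(x)
    val = x @ A @ x
    best, worst = max(best, val), min(worst, val)

assert worst >= lam[0] - 1e-9 and best <= lam[-1] + 1e-9

# Plugging in the extreme eigenvectors attains the bounds exactly.
_, Q = np.linalg.eigh(A)
assert np.isclose(Q[:, -1] @ A @ Q[:, -1], lam[-1])
assert np.isclose(Q[:, 0] @ A @ Q[:, 0], lam[0])
```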
3 Convex Sets, Convex and Concave Functions
We now review basic properties of convex functions and convex sets. They are fundamental in that optimization of convex functions over convex sets has a "clean" theory. When the functions to be minimized are not convex, or the sets over which they are minimized are not convex, then in general no satisfactory algorithm or analysis exists. Understanding the fundamental properties of convex functions and sets will give us an extremely useful toolbox to develop and analyze optimization algorithms. This toolbox may then be extended to some degree to non-convex problems, at least where local properties are concerned.
Some definitions
• Convex Set: $C \subseteq \mathbb{R}^n$ is convex if for each $x_1, x_2 \in C$, $\lambda x_1 + (1-\lambda)x_2 \in C$ for all $0 \le \lambda \le 1$.
• Convex Function: A function $f : C \to \mathbb{R}$, where $C \subseteq \mathbb{R}^n$ is a convex set, is called a convex function if for every $0 \le \lambda \le 1$ and every $x_1, x_2 \in C$ we have $f[\lambda x_1 + (1-\lambda)x_2] \le \lambda f(x_1) + (1-\lambda)f(x_2)$.
• Concave Function: $f$ is called concave if $-f$ is convex.
Geometrically, a convex set contains the entire line segment between any two of its points. In the case of a convex function, if $P_1$ and $P_2$ are two points on the graph of the function, then the graph between them falls below the chord connecting $P_1$ and $P_2$. For concave functions the opposite is true. (See Figure 1.)
[Figure: graphs of a convex function and of a concave function, each showing the points $(x_1, f(x_1))$, $(x_2, f(x_2))$, the chord between them, and the intermediate point $y$.]
Figure 1: $y = \lambda x_1 + (1-\lambda)x_2$ and $f = \lambda f(x_1) + (1-\lambda)f(x_2)$, for some $0 \le \lambda \le 1$. For convex functions the graph falls below the chord connecting any two points on it; for concave functions the opposite is true.
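The defining inequality can be verified directly on simple examples (a sketch, not from the notes; the functions $x^2$ and $-|x|$ are my choices of a convex and a concave function on $\mathbb{R}$):

```python
import numpy as np

rng = np.random.default_rng(3)

f = lambda x: x * x          # convex on R
g = lambda x: -np.abs(x)     # concave (negative of the convex |x|)

for _ in range(500):
    x1, x2 = rng.standard_normal(2)
    t = rng.uniform()
    mid = t * x1 + (1 - t) * x2
    assert f(mid) <= t * f(x1) + (1 - t) * f(x2) + 1e-12   # convex inequality
    assert g(mid) >= t * g(x1) + (1 - t) * g(x2) - 1e-12   # reversed for concave
```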
Lemma 4 If $f_1 : C \to \mathbb{R}$ and $f_2 : C \to \mathbb{R}$ are convex functions on a convex set $C \subseteq \mathbb{R}^n$, then the function $g : C \to \mathbb{R}$ defined by $g(x) = \alpha_1 f_1(x) + \alpha_2 f_2(x)$ is also convex for any $\alpha_1, \alpha_2 \ge 0$.
Proof:
\[
\begin{aligned}
g[\lambda x_1 + (1-\lambda)x_2] &= \alpha_1 f_1[\lambda x_1 + (1-\lambda)x_2] + \alpha_2 f_2[\lambda x_1 + (1-\lambda)x_2] \\
&\le \alpha_1[\lambda f_1(x_1) + (1-\lambda)f_1(x_2)] + \alpha_2[\lambda f_2(x_1) + (1-\lambda)f_2(x_2)] \\
&= \lambda[\alpha_1 f_1(x_1) + \alpha_2 f_2(x_1)] + (1-\lambda)[\alpha_1 f_1(x_2) + \alpha_2 f_2(x_2)] \\
&= \lambda g(x_1) + (1-\lambda)g(x_2).
\end{aligned}
\]
Therefore, $g$ is a convex function.
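A small numerical check of Lemma 4 (my sketch, with $f_1 = x^2$, $f_2 = |x|$ and weights $2$ and $0.5$ chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

f1 = lambda x: x ** 2
f2 = lambda x: np.abs(x)
a1, a2 = 2.0, 0.5                      # nonnegative weights
g = lambda x: a1 * f1(x) + a2 * f2(x)  # the combination from Lemma 4

# The combination satisfies the convexity inequality on random points.
for _ in range(500):
    x1, x2 = rng.standard_normal(2)
    t = rng.uniform()
    assert g(t * x1 + (1 - t) * x2) <= t * g(x1) + (1 - t) * g(x2) + 1e-12
```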
Epigraph of a function
Definition 2 (Epigraph of a function) For any function $f : C \to \mathbb{R}$, $C \subseteq \mathbb{R}^n$, the epigraph of $f$ is the set $G_f = \{(x, y) \mid x \in C,\ y \in \mathbb{R},\ y \ge f(x)\}$.
[Figure: the epigraph of a function $f$, the region on and above its graph.]
Epigraphs in a sense connect convex functions and convex sets. The following lemma clarifies this.
Lemma 5 Let $f : C \to \mathbb{R}$, with $C \subseteq \mathbb{R}^n$ a convex set. Then $f$ is a convex function iff its epigraph is a convex set.
Proof: If $f$ is a convex function and $(x_1, y_1), (x_2, y_2) \in G_f$, then $f(x_1) \le y_1$ and $f(x_2) \le y_2$. Then
\[
f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda)f(x_2) \le \lambda y_1 + (1-\lambda)y_2.
\]
Therefore $\big(\lambda x_1 + (1-\lambda)x_2,\ \lambda y_1 + (1-\lambda)y_2\big) \in G_f$. Conversely, let $G_f$ be convex and take the points $(x_1, f(x_1)), (x_2, f(x_2)) \in G_f$; then $\lambda (x_1, f(x_1)) + (1-\lambda)(x_2, f(x_2)) \in G_f$ for $0 \le \lambda \le 1$, that is, $\big(\lambda x_1 + (1-\lambda)x_2,\ \lambda f(x_1) + (1-\lambda)f(x_2)\big) \in G_f$. This implies that $f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda)f(x_2)$. Thus $f$ is a convex function.
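The forward direction of Lemma 5 can be checked numerically (my sketch, taking $f(x) = x^2$ as the convex function): convex combinations of random epigraph points stay in the epigraph.

```python
import numpy as np

rng = np.random.default_rng(5)

f = lambda x: x ** 2
in_epi = lambda x, y: y >= f(x) - 1e-9   # membership test for G_f, with tolerance

for _ in range(500):
    # two random points of the epigraph (y_i sits above f(x_i))
    x1, x2 = rng.standard_normal(2)
    y1 = f(x1) + rng.uniform(0, 3)
    y2 = f(x2) + rng.uniform(0, 3)
    t = rng.uniform()
    # their convex combination stays in the epigraph
    assert in_epi(t * x1 + (1 - t) * x2, t * y1 + (1 - t) * y2)
```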
Level Sets
Definition 3 (Level Sets) If $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ and $c \in \mathbb{R}$, then $L_c = \{x \in C : f(x) \le c\}$.
[Figure: a nonconvex function $f$ and a level $c$; here the level set $L_c$ is disjoint, hence not convex.]
Lemma 6 If $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is a convex function over a convex set $C$, then for every real number $c$, $L_c$ is also convex.
Proof: If $f$ is convex and $x_1, x_2 \in L_c$, then $f(x_1) \le c$ and $f(x_2) \le c$. Therefore $f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda)f(x_2) \le c$. Thus $\lambda x_1 + (1-\lambda)x_2 \in L_c$, and therefore $L_c$ is convex.
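For a concrete instance of Lemma 6 (my sketch, not from the notes): with $f(x) = \|x\|^2$ on $\mathbb{R}^2$ and $c = 1$, the level set $L_c$ is the unit disk, and convex combinations of its points stay inside it.

```python
import numpy as np

rng = np.random.default_rng(6)

f = lambda x: x @ x                    # convex: squared Euclidean norm on R^2
c = 1.0                                # level set L_c is the unit disk

for _ in range(500):
    # two random points of L_c (rescaled to have norm < 1)
    x1 = rng.standard_normal(2)
    x1 *= rng.uniform() / np.linalg.norm(x1)
    x2 = rng.standard_normal(2)
    x2 *= rng.uniform() / np.linalg.norm(x2)
    t = rng.uniform()
    assert f(t * x1 + (1 - t) * x2) <= c + 1e-12   # stays inside L_c
```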
The definition of convex functions involves only two points. In fact, the definition can be extended to an arbitrary number of points; Jensen's inequality is the formal statement of this fact.
Theorem 1 (Jensen's Inequality) Let $\lambda_1, \lambda_2, \ldots, \lambda_n \ge 0$ with $\sum_i \lambda_i = 1$. If $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is convex over the convex set $C$, then for any $x_1, x_2, \ldots, x_n \in C$ we have
\[
f(\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n) \le \lambda_1 f(x_1) + \lambda_2 f(x_2) + \cdots + \lambda_n f(x_n).
\]
Proof: The proof is by induction. Clearly the statement is true for $n = 2$ by the definition of convex functions. Now suppose it is true for $n - 1$.
Take a set of $n$ points $x_1, \ldots, x_{n-1}, x_n \in C$. Let $\lambda_1, \ldots, \lambda_{n-1}, \lambda_n \ge 0$ with $\sum_i \lambda_i = 1$, and consider the point $y = \lambda_1 x_1 + \cdots + \lambda_{n-1} x_{n-1} + \lambda_n x_n$. If $\lambda_n = 0$ then we are back in the case of $n-1$ points, for which the theorem holds by the induction hypothesis. If $\lambda_n = 1$ then the other $\lambda_i = 0$ and again the theorem is trivially true. So assume $0 < \lambda_n < 1$. Then
\[
\lambda_1 x_1 + \cdots + \lambda_{n-1} x_{n-1} + \lambda_n x_n = \lambda_n x_n + (1 - \lambda_n) \left( \frac{\lambda_1}{1 - \lambda_n} x_1 + \cdots + \frac{\lambda_{n-1}}{1 - \lambda_n} x_{n-1} \right).
\]
Now, defining $z = \frac{\lambda_1}{1-\lambda_n} x_1 + \cdots + \frac{\lambda_{n-1}}{1-\lambda_n} x_{n-1}$, we see that $z$ is a convex combination of $x_1, \ldots, x_{n-1}$, since the coefficients are nonnegative and sum to 1. Thus, by the induction hypothesis,
\[
f(z) \le \sum_{i=1}^{n-1} \frac{\lambda_i}{1 - \lambda_n} f(x_i).
\]
Also, $y = \lambda_n x_n + (1 - \lambda_n) z$ is a convex combination of two points, and so
\[
f(y) \le (1 - \lambda_n) f(z) + \lambda_n f(x_n).
\]
Putting these two inequalities together we get
\[
f(y) \le (1 - \lambda_n) f(z) + \lambda_n f(x_n) \le (1 - \lambda_n) \left[ \frac{\lambda_1}{1 - \lambda_n} f(x_1) + \cdots + \frac{\lambda_{n-1}}{1 - \lambda_n} f(x_{n-1}) \right] + \lambda_n f(x_n) = \lambda_1 f(x_1) + \cdots + \lambda_n f(x_n).
\]
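Jensen's inequality is easy to test numerically (my sketch; $f = \exp$ is a standard choice of a convex function, not one from the notes):

```python
import numpy as np

rng = np.random.default_rng(7)

f = np.exp                              # a convex function on R
k = 6                                   # number of points in the combination

for _ in range(200):
    lam = rng.uniform(size=k)
    lam /= lam.sum()                    # nonnegative weights summing to 1
    x = rng.standard_normal(k)
    # Jensen: f(sum_i lam_i x_i) <= sum_i lam_i f(x_i)
    assert f(lam @ x) <= lam @ f(x) + 1e-12
```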
One operation that arises quite often in applications is optimizing a function that is the pointwise maximum of several functions. It turns out that if these functions are convex, then taking their maximum yields another convex function.
Lemma 7 Let $C \subseteq \mathbb{R}^n$ be a convex set and $f_1 : C \to \mathbb{R}$, $f_2 : C \to \mathbb{R}$ be convex functions. Then the function $g(x) = \max\{f_1(x), f_2(x)\}$ is also convex.
Proof: Given $x_1, x_2 \in C$ and any $0 \le \lambda \le 1$, we have $g(\lambda x_1 + (1-\lambda)x_2) = f_i(\lambda x_1 + (1-\lambda)x_2) \le \lambda f_i(x_1) + (1-\lambda)f_i(x_2) \le \lambda g(x_1) + (1-\lambda)g(x_2)$, where $i = 1$ or $2$ is the index attaining the maximum at $\lambda x_1 + (1-\lambda)x_2$. Therefore $g(x)$ is convex.
This lemma can be easily extended to n functions by induction. In fact, it can
be extended to infinitely many functions as well.
Lemma 8 Let f(x; y) be convex in x ∈ C ⊆ Rn for each parameter y ∈ I, where
I is some index set. Then the function g(x) = supy∈I f(x; y) is also convex.
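Lemma 8 can be illustrated numerically (my sketch, not from the notes; the family of affine functions $f(x; y) = yx - y^2$ over a finite grid of $y$ values is my choice of index set): each member is linear, hence convex, in $x$, and the pointwise supremum again satisfies the convexity inequality.

```python
import numpy as np

rng = np.random.default_rng(8)

# A family of affine functions f(x; y) = y*x - y^2, each convex (linear) in x.
ys = np.linspace(-2, 2, 41)
g = lambda x: np.max(ys * x - ys ** 2)   # pointwise sup over the family

# The sup of convex functions satisfies the convexity inequality.
for _ in range(500):
    x1, x2 = rng.standard_normal(2)
    t = rng.uniform()
    assert g(t * x1 + (1 - t) * x2) <= t * g(x1) + (1 - t) * g(x2) + 1e-12
```

For this particular family the supremum works out to $g(x) = x^2/4$ on $|x| \le 4$, a convex function, as the lemma predicts.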
[Figure: two functions $f(x)$ and $g(x)$, with the epigraph of the function $\max\{f(x), g(x)\}$ shaded.]
One of the most important properties of convex functions is that they are essentially continuous; that is, they are continuous over open sets.
Theorem 2 Let $C \subseteq \mathbb{R}^n$ be an open convex set. If $f$ is a convex function on $C$, then $f$ is continuous.
We omit the proof of this theorem, but note that it need not hold if $C$ contains some points of its boundary. For example, the function on $[0, 1]$ that equals 0 on $[0, 1)$ and 1 at $x = 1$ is convex but discontinuous at the boundary point.
Another important property of convex functions is that their set of minimizers is itself convex.
Theorem 3 Let $C \subseteq \mathbb{R}^n$ be a convex set and $f : C \to \mathbb{R}$ a convex function. Consider the problem of minimizing $f(x)$ over $C$. Then the set of minimizers $\{x^* \mid f(x^*) = \min_{x \in C} f(x)\}$ is convex.
Proof: Let $x_1, x_2$ be minimizers, so $z^* = f(x_1) = f(x_2) = \min_{x \in C} f(x)$. For any $0 \le \lambda \le 1$ we have $f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda)f(x_2) = z^*$; since $z^*$ is the minimum value, $\lambda x_1 + (1-\lambda)x_2$ is also a minimizer. Therefore the set of minimizers is convex.
Figure 2: Graph of the function $\max(2x^2 + 3y^2, 3)$. The set of minimizers is $\{(x, y) \mid 2x^2 + 3y^2 \le 3\}$, on which the function attains its minimum value 3.
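Theorem 3 can be checked on the function of Figure 2 (my sketch, not from the notes): its minimizers form the ellipse $\{2x^2 + 3y^2 \le 3\}$, and convex combinations of minimizers remain minimizers.

```python
import numpy as np

rng = np.random.default_rng(9)

f = lambda x, y: max(2 * x ** 2 + 3 * y ** 2, 3.0)   # the function in Figure 2

def is_min(x, y, tol=1e-9):
    """A point minimizes f iff its value equals the minimum, 3."""
    return f(x, y) <= 3.0 + tol

for _ in range(500):
    # rejection-sample two minimizers inside the ellipse 2x^2 + 3y^2 <= 3
    while True:
        x1, y1 = rng.uniform(-1.3, 1.3, 2)
        if is_min(x1, y1):
            break
    while True:
        x2, y2 = rng.uniform(-1.3, 1.3, 2)
        if is_min(x2, y2):
            break
    t = rng.uniform()
    # their convex combination is again a minimizer
    assert is_min(t * x1 + (1 - t) * x2, t * y1 + (1 - t) * y2)
```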