Semidefinite and Second Order Cone Programming Seminar
Fall 2001
Lecture 4
Instructor: Farid Alizadeh
Scribe: Haengju Lee
Date: 10/1/2001
1 Overview
We examine the dual of the Fermat-Weber problem. Next we study optimality conditions in the form of a generalized complementary slackness theorem.
Finally we start the study of the eigenvalue optimization problem as a semidefinite program.
2 Dual of the Fermat-Weber Problem
Recall that the Fermat-Weber problem seeks a point in m-dimensional space whose weighted sum of Euclidean distances to a set of n given points is minimum (see Lecture 1). Given points $v_1, v_2, \ldots, v_n \in \mathbb{R}^m$ and weights $w_1, w_2, \ldots, w_n$, this problem can be formulated as follows:
\[
\min_x \ \sum_{i=1}^n w_i \|v_i - x\|.
\]
The problem can be written equivalently as a cone-LP over $Q$, the second order cone:
\[
\begin{array}{ll}
\min & w_1 z_1 + \cdots + w_n z_n \\
\text{s.t.} & z_i \ge \|v_i - x\|, \quad i = 1, \ldots, n.
\end{array}
\]
But
\[
\begin{aligned}
z_i \ge \|v_i - x\|
&\iff \begin{pmatrix} z_i \\ x - v_i \end{pmatrix} \succeq_Q 0 \\
&\iff \begin{pmatrix} z_i \\ x \end{pmatrix} \succeq_Q \begin{pmatrix} 0 \\ v_i \end{pmatrix} \\
&\iff z_i e + \hat{x} \succeq_Q \begin{pmatrix} 0 \\ v_i \end{pmatrix},
\end{aligned}
\]
where $e = (1, 0, \ldots, 0)^T$ and $\hat{x} = \begin{pmatrix} 0 \\ x \end{pmatrix}$. Now the cone-LP formulation is:
Primal:
\[
\begin{array}{ll}
\min & w_1 z_1 + \cdots + w_n z_n \\
\text{s.t.} & z_i e + \hat{x} \succeq_Q \begin{pmatrix} 0 \\ v_i \end{pmatrix}, \quad i = 1, \ldots, n.
\end{array}
\]
If we define the dual variable $\begin{pmatrix} y_{i0} \\ y_i \end{pmatrix}$ corresponding to the $i$th second order cone inequality in the primal, then the dual can be formulated as:
Dual:
\[
\begin{array}{lll}
\max & \displaystyle\sum_{i=1}^n v_i^T y_i & \\
\text{s.t.} & y_{i0} = w_i, \quad i = 1, \ldots, n & (\text{from } z_i) \\
& y_1 + \cdots + y_n = 0 & (\text{from } x) \\
& \begin{pmatrix} y_{i0} \\ y_i \end{pmatrix} \succeq_Q 0 & (\text{since they arise from } \succeq_Q).
\end{array}
\]
After simplification (for instance eliminating $y_{i0}$) we get:

Dual:
\[
\begin{array}{ll}
\max & \displaystyle\sum_{i=1}^n v_i^T y_i \\
\text{s.t.} & \displaystyle\sum_{i=1}^n y_i = 0 \\
& \|y_i\| \le w_i, \quad i = 1, \ldots, n.
\end{array}
\]
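To make the primal-dual pair concrete, here is a minimal numerical sketch (assuming the cvxpy modeling library and randomly generated data; neither appears in the original notes) that solves both formulations and checks strong duality and the force-balance constraint.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2                                  # five points in the plane (made-up data)
V = rng.standard_normal((n, m))              # rows are the points v_i
w = rng.uniform(1.0, 2.0, n)                 # positive weights w_i

# Primal: min sum_i w_i ||v_i - x||
x = cp.Variable(m)
primal = cp.Problem(cp.Minimize(sum(w[i] * cp.norm(V[i] - x) for i in range(n))))
primal.solve()

# Dual: max sum_i v_i^T y_i  s.t.  sum_i y_i = 0,  ||y_i|| <= w_i
Y = cp.Variable((n, m))                      # row i is y_i
dual = cp.Problem(cp.Maximize(cp.sum(cp.multiply(V, Y))),
                  [cp.sum(Y, axis=0) == 0] + [cp.norm(Y[i]) <= w[i] for i in range(n)])
dual.solve()

print(primal.value, dual.value)              # equal at the optimum (strong duality)
print(np.sum(Y.value, axis=0))               # forces add up to ~0: equilibrium
```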
The dual of the Fermat-Weber problem has an interesting interpretation in mechanics. Assume that the $w_i$ are weights of objects hanging from threads that go through a set of holes in a table. We are to take the other ends of the threads and tie them together at a position of equilibrium, spending a minimal amount of energy. Then the $y_i$ are interpreted as forces, and they must add up to zero so that we have equilibrium. The condition $\|y_i\| \le w_i$ simply states that the magnitude of the force exerted at the knot by the $i$th object cannot be larger than its weight. Assuming that the optimal location is $x^*$, we can write the value of the objective function as $-\sum_i (x^* - v_i)^T y_i$ because $(x^*)^T \sum_i y_i = 0$. Then the objective is simply the potential energy, minimized at the equilibrium location. (Question:
Can you give an interpretation of the primal and explain why the primal and dual problems are equal at the optimum?)
[Figure: weights $w_1, \ldots, w_5$ hanging from threads through holes in a table.]
3 Duality in different spaces
In many situations an m-dimensional cone can be expressed as the intersection of an n-dimensional cone and a linear space: $K_1 = K \cap L$ where $n > m$. Then, remembering that a linear space is also a cone and its dual as a cone is simply its orthogonal complement $L^\perp$ (why?), we get $K_1^* = K^* + L^\perp$. Here $K_1^*$ is the dual of $K_1$ in the space $\mathbb{R}^n$. But if we take the dual in the space $L$, then the dual cone will be m-dimensional and different from $K_1^*$; let us call the dual of $K_1$ in the space $L$ by $K_1^+$. If it is at all possible to find a good characterization of $K_1^+$, we should use it instead of $K_1^*$. Let us look at an example and see what the problems would be if we don't.
In linear programming our cone is the non-negative orthant $\mathbb{R}^n_+$ and cone-LP is simply the ordinary LP:

Primal:
\[
\begin{array}{ll}
\min & c^T x \\
\text{s.t.} & a_i^T x = b_i, \quad i = 1, \ldots, m \\
& x_i \ge 0, \quad i = 1, \ldots, n.
\end{array}
\]

Dual:
\[
\begin{array}{ll}
\max & b^T y \\
\text{s.t.} & \displaystyle\sum_{i=1}^m y_i a_i + s = c \\
& s_i \ge 0, \quad i = 1, \ldots, n.
\end{array}
\]
Now suppose that we express the non-negative orthant as the intersection of the positive semidefinite cone and the linear space $L$ consisting of the diagonal matrices, that is, $X \in L$ iff $x_{ij} = 0$ for all $i \ne j$. We define the diagonal matrices $C = \mathrm{Diag}(c)$ and $A_i = \mathrm{Diag}(a_i)$, that is, matrices whose $(j, j)$ diagonal entries are $c_j$ (or $(a_i)_j$) and whose off-diagonal entries are all zero. Now the primal linear programming problem can be written as a semidefinite programming problem:
\[
\text{Primal:} \quad \min\{\, C \bullet X \mid A_i \bullet X = b_i \ \text{for } i = 1, \ldots, m, \ X_{ij} = 0 \ \text{for } i \ne j, \ X \succeq 0 \,\}
\]
Note that the condition $X_{ij} = 0$ is the same as $(E_{ij} + E_{ji}) \bullet X = 0$, where $E_{ij}$ is the matrix with all entries 0 except the $(i, j)$ entry, which is one. Now taking the dual of this SDP we arrive at a problem that is not equivalent to the dual of the LP:
\[
\text{Dual:} \quad \max\Big\{\, b^T y \ \Big|\ \sum_{i=1}^m y_i A_i + \sum_{i < j} s_{ij} (E_{ij} + E_{ji}) \preceq C \,\Big\}
\]
Even if the original LP problem has unique primal and dual solutions, it is unlikely in general that the dual of the SDP formulation has unique solutions. The constraints in the dual imply that $\sum_i y_i a_i \le c$, but there are in general infinitely many $s_{ij}$ that can be added to a given optimal $y$. The lesson is that it is not a good idea to formulate an LP as an SDP (which was obvious at the outset). But for the same reason it is not generally a good idea to express the dual of a cone-LP over $K \cap L$ as $K^* + L^\perp$.
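To see how many redundant constraints the embedding creates, here is a small sketch (numpy only, with hypothetical LP data) that builds the SDP data $C = \mathrm{Diag}(c)$, $A_i = \mathrm{Diag}(a_i)$ and the off-diagonal matrices $E_{ij} + E_{ji}$:

```python
import numpy as np

def lp_to_sdp_data(c, A):
    """Diagonal embedding of the LP  min c^T x, Ax = b, x >= 0  as SDP data."""
    n = c.size
    C = np.diag(c)                           # C = Diag(c)
    As = [np.diag(row) for row in A]         # A_i = Diag(a_i)
    Es = []                                  # one E_ij + E_ji per pair i < j, forcing X_ij = 0
    for i in range(n):
        for j in range(i + 1, n):
            E = np.zeros((n, n))
            E[i, j] = E[j, i] = 1.0
            Es.append(E)
    return C, As, Es

c = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 0.0]])              # a single LP equality constraint
C, As, Es = lp_to_sdp_data(c, A)
print(len(As), len(Es))                      # 1 constraint from the LP, 3 extra from L
```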
As another example consider the second order cone $Q$. We know that $x \in Q$ iff $\mathrm{Arw}(x) \succeq 0$. Thus again SOCP can be expressed as an SDP: write $Q = P_{n \times n} \cap L$ where $L$ is the linear space of arrow-shaped matrices, i.e. $X_{ij} = 0$ if $i \ne j$, $i \ne 0$ and $j \ne 0$, and $X_{ii} = X_{jj}$ for all $i, j$. But again formulating SOCP as an SDP is not a good idea. If we form the dual as an SDP we will have extra and unnecessary variables that play no essential role and can make the solution numerically unstable, even if the original SOCP does not have numerical problems. In future lectures we will see even more compelling reasons why the SOCP problem should be treated in its own right rather than as a special case of SDP.
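Since the operator $\mathrm{Arw}$ reappears in Theorem 2 below, here is a small numpy sketch of the arrow matrix under the convention just described ($X_{0j} = X_{j0} = x_j$, $X_{ii} = x_0$); the eigenvalue check illustrates that $x \in Q$ iff $\mathrm{Arw}(x) \succeq 0$:

```python
import numpy as np

def arw(x):
    """Arrow matrix of x = (x0; x_bar): [[x0, x_bar^T], [x_bar, x0*I]]."""
    x0, xb = x[0], x[1:]
    A = x0 * np.eye(x.size)    # x0 on the whole diagonal
    A[0, 1:] = xb              # x_bar in the first row
    A[1:, 0] = xb              # and first column
    return A

x = np.array([2.0, 1.0, 1.0])          # x0 = 2 >= ||(1,1)|| ~ 1.414, so x is in Q
print(np.linalg.eigvalsh(arw(x)))      # eigenvalues x0 - ||x_bar||, x0, x0 + ||x_bar||: all >= 0
```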
4 Generalization of Complementary Slackness Conditions
Consider the pair of cone-LP problems:

Primal:
\[
\begin{array}{ll}
\min & c^T x \\
\text{s.t.} & Ax = b \\
& x \succeq_K 0.
\end{array}
\]

Dual:
\[
\begin{array}{ll}
\max & b^T y \\
\text{s.t.} & A^T y + s = c \\
& s \succeq_{K^*} 0.
\end{array}
\]
We studied before that at the optimum the following three relations hold:
\[
x \succeq_K 0, \qquad s \succeq_{K^*} 0, \qquad x^T s = 0.
\]
In the case of LP, SDP and SOCP these conditions actually imply stronger
relations which we now examine.
Example 1 (non-negative orthant). When $K = K^* = \mathbb{R}^n_+$: at the optimum, $x_i \ge 0$ for $i = 1, \ldots, n$, $s_i \ge 0$ for $i = 1, \ldots, n$, and $x^T s = 0$ imply $x_i s_i = 0$ for $i = 1, \ldots, n$, because a sum of non-negative numbers $x_i s_i$ can be zero only if each of them is zero. This is the familiar complementary slackness theorem of linear programming.
Example 2 (the semidefinite cone). When $K = K^* = P_{n \times n}$: at the optimum, $X \succeq 0$, $S \succeq 0$, and $X \bullet S = \mathrm{tr}(XS) = 0$. Since the matrix $S$ is symmetric, it can be expressed as
\[
S = Q^T \Omega Q = \big(Q^T \sqrt{\Omega}\, Q\big)\big(Q^T \sqrt{\Omega}\, Q\big) = S^{1/2} S^{1/2},
\]
where $Q$ is an orthogonal matrix and $\Omega$ is a diagonal matrix containing the eigenvalues of $S$ on its diagonal. This shows that each positive semidefinite matrix has a unique positive semidefinite square root, denoted $S^{1/2}$. Now,
\[
0 = \mathrm{tr}(XS) = \mathrm{tr}\big(X S^{1/2} S^{1/2}\big) = \mathrm{tr}\big(S^{1/2} X S^{1/2}\big).
\]
This implies that $S^{1/2} X S^{1/2} = 0$, because $S^{1/2} X S^{1/2}$ is also a positive semidefinite matrix, with non-negative eigenvalues and trace zero. Since the trace is the sum of the eigenvalues, this is possible only when all eigenvalues are zero, which, in the case of symmetric matrices, implies that the matrix $S^{1/2} X S^{1/2}$ is zero. Thus $0 = S^{1/2} X S^{1/2} = (S^{1/2} X^{1/2})(X^{1/2} S^{1/2}) = A^T A$ with $A = X^{1/2} S^{1/2}$. We know that $A^T A = 0$ iff $A = 0$, thus $X^{1/2} S^{1/2} = 0$, which implies $XS = 0$. We have shown:
Theorem 1 (Complementary slackness theorem for SDP). If $X$ is optimal for the primal SDP, $(y, S)$ is optimal for the dual SDP, and the duality gap $X \bullet S = 0$, then $XS = 0$.
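Theorem 1 is easy to check numerically. The following sketch (with fabricated matrices, not from the notes) builds complementary $X, S \succeq 0$ whose eigenvector supports are orthogonal, so that $X \bullet S = 0$, and verifies that the product $XS$ itself vanishes:

```python
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal matrix
X = Q @ np.diag([3.0, 1.0, 0.0, 0.0]) @ Q.T        # PSD, supported on the first two eigenvectors
S = Q @ np.diag([0.0, 0.0, 2.0, 5.0]) @ Q.T        # PSD, supported on the other two
print(np.trace(X @ S))                             # duality gap X • S = 0 (up to roundoff)
print(np.linalg.norm(X @ S))                       # and in fact XS = 0, as Theorem 1 asserts
```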
Example 3 (the second order cone). When $K = K^* = Q$: we have $x \succeq_Q 0$, $s \succeq_Q 0$ and $x^T s = 0$, where $x, s \in \mathbb{R}^{n+1}$ and $x$ and $s$ are indexed from 0. This means that $x_0 \ge \|\bar{x}\|$, $s_0 \ge \|\bar{s}\|$, and $x^T s = 0$, or equivalently,
\[
x_0^2 \ge x_1^2 + \cdots + x_n^2 \ \Rightarrow\ x_0 s_0 \ge \frac{\big(\sum_i x_i^2\big) s_0}{x_0} \tag{1}
\]
\[
s_0^2 \ge s_1^2 + \cdots + s_n^2 \ \Rightarrow\ x_0 s_0 \ge \frac{\big(\sum_i s_i^2\big) x_0}{s_0} \tag{2}
\]
\[
-x_0 s_0 = x_1 s_1 + \cdots + x_n s_n. \tag{3}
\]
Now, adding (1), (2) and twice (3), we get
\[
0 \ge \sum_i \left( \frac{x_i^2 s_0}{x_0} + \frac{s_i^2 x_0}{s_0} + 2 x_i s_i \right)
= \sum_i \frac{x_i^2 s_0^2 + s_i^2 x_0^2 + 2 x_i s_i x_0 s_0}{x_0 s_0}
\ \Rightarrow\ 0 \ge \sum_i \frac{(x_i s_0 + s_i x_0)^2}{x_0 s_0}.
\]
Again, a sum of non-negative numbers is less than or equal to zero, therefore all of them must be zero. We thus have $x_i s_0 + x_0 s_i = 0$ for $i = 1, \ldots, n$, and $x^T s = 0$.
We have shown:

Theorem 2 (Complementary slackness for SOCP). If $x \succeq_Q 0$, $s \succeq_Q 0$, and $x^T s = 0$, then $x_0 s_i + x_i s_0 = 0$ for $i = 1, \ldots, n$. These conditions (along with $x^T s = 0$) can be written more succinctly as
\[
\mathrm{Arw}(x)\,\mathrm{Arw}(s)\, e = 0.
\]
We have implicitly assumed that $x_0 \ne 0$ and $s_0 \ne 0$. If $x_0 = 0$, then $0 = x_0 \ge \|\bar{x}\|$ implies that $x = 0$ and the theorem above is trivially true. The same holds when $s_0 = 0$.
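A numerical sanity check of Theorem 2 (a sketch reusing the arw() helper from the Section 3 sketch, with a made-up boundary point):

```python
import numpy as np
# reuses arw() from the sketch in Section 3

xb = np.array([0.6, 0.8])       # unit vector, so x = (1; xb) is on the boundary of Q
x = np.r_[1.0, xb]
s = np.r_[1.0, -xb]             # s is in Q as well, and x^T s = 1 - ||xb||^2 = 0
e = np.array([1.0, 0.0, 0.0])
print(x @ s)                    # 0: a complementary pair
print(arw(x) @ arw(s) @ e)      # the zero vector, as Theorem 2 predicts
```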
5 A general complementary slackness theorem
For a proper cone $K \subseteq \mathbb{R}^n$, define
\[
C(K) = \left\{ \begin{pmatrix} x \\ s \end{pmatrix} \,\middle|\, x \succeq_K 0,\ s \succeq_{K^*} 0,\ x^T s = 0 \right\} \subseteq \mathbb{R}^{2n}.
\]
Now, on the surface, the set $C(K)$ seems to be a $(2n-1)$-dimensional set: its members have $2n$ coordinates, and since $x^T s = 0$ we are left with $2n - 1$ "degrees of freedom". The condition $x \in K$ by itself does not impose a restriction on the dimension of the set, nor does the condition $s \in K^*$. Nevertheless, it turns out that $C(K)$ is actually an n-dimensional set! Here is why:

Theorem 3. There is a one-to-one, onto and continuous mapping from $C(K)$ to $\mathbb{R}^n$.
Before we proceed to the proof we recall the following basic fact.

Fact 1. Let $S \subseteq \mathbb{R}^n$ be a closed convex set and $a \in \mathbb{R}^n$. Then there is a unique point $x = \Pi_S(a)$ in $S$ which is closest to $a$, i.e. there is a unique point $x \in S$ such that $x = \mathrm{argmin}_{y \in S} \|a - y\|$.

The unique point above is called the projection of $a$ onto $S$. The proof of this fact can be found in many texts and is based on Weierstrass's theorem. Now we give the proof of Theorem 3.
Proof: Let $a \in \mathbb{R}^n$ be an arbitrary point, let $x = \Pi_K(a)$ be its projection onto $K$, and define $s = x - a$. We will first show that $s \in K^*$, and then show that the correspondence between $a$ and $(x, s)$ is one-to-one, onto and continuous. First we show that $s \in K^*$. For every $u \in K$, define the convex combination $u_\alpha = \alpha u + (1 - \alpha) x$ where $0 \le \alpha \le 1$, and define $\zeta(\alpha) = \|a - u_\alpha\|^2$. Then $\zeta(\alpha)$ is a differentiable function on the interval $[0, 1]$ and $\min_{0 \le \alpha \le 1} \zeta(\alpha)$ is attained at $\alpha = 0$.

Claim:
\[
\left.\frac{d\zeta}{d\alpha}\right|_{\alpha=0} \ge 0.
\]
Proof of Claim: Otherwise there would exist $\alpha$ in some neighborhood of 0 such that $\|a - u_\alpha\| < \|a - u_0\|$, contradicting the fact that $x = u_0$ is the closest point to $a$ in $K$.

From this claim,
\[
\begin{aligned}
\left.\frac{d\zeta}{d\alpha}\right|_{\alpha=0} = -2(a - x)^T (u - x) &\ge 0 \\
\iff 2(x - a)^T (u - x) &\ge 0 \\
\iff 2 s^T (u - x) &\ge 0.
\end{aligned} \tag{4}
\]
This latter inequality is true for any $u \in K$. If we choose $u = 2x$ then we get $s^T x \ge 0$. If we choose $u = x/2$ then $s^T x \le 0$. We conclude that $x^T s = 0$. If we plug this into (4) we get $s^T u \ge 0$, which means $s \in K^*$. Thus, for each $a$ we get a pair $(x, s) \in C(K)$. Clearly each $a$ results in a unique $(x, s)$, as the projection $x$ is unique and thus so is $s = x - a$. Also, both the projection operation and the map $s = x - a$ are continuous.
Conversely, if $(x, s) \in C(K)$, then we can set $a = x - s$. All we have to show now is that the projection of $a$ onto $K$ is $x$. Assume otherwise. Then there is a point $u \in K$ such that $\|a - u\| < \|a - x\|$, that is,
\[
\begin{aligned}
&(a - x)^T (a - x) > (a - u)^T (a - u) \\
\iff\ & x^T x - 2(x - s)^T x > u^T u - 2(x - s)^T u \quad \text{(noting that } x^T s = 0\text{)} \\
\iff\ & 0 > u^T u + x^T x - 2 x^T u + 2 u^T s \\
\iff\ & 0 > \|u - x\|^2 + 2 s^T u,
\end{aligned}
\]
which implies that $s^T u < 0$, contradicting the fact that $s \in K^*$. (This proof is due to Osman Güler.)
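For $K = \mathbb{R}^n_+$ the mapping of Theorem 3 has a simple closed form, since projection onto the non-negative orthant is the coordinatewise $\max(a_i, 0)$. A short sketch of the correspondence $a \leftrightarrow (x, s)$:

```python
import numpy as np

def split(a):
    """Map a -> (x, s): x is the projection of a onto R^n_+, s = x - a."""
    x = np.maximum(a, 0.0)      # projection onto the non-negative orthant
    s = x - a                   # equals max(-a, 0), so s lies in the dual cone R^n_+
    return x, s

a = np.array([1.5, -2.0, 0.0, 3.0])
x, s = split(a)
print(x, s, x @ s)              # x >= 0, s >= 0, x^T s = 0: a point of C(R^n_+)
print(np.allclose(a, x - s))    # the inverse map of Theorem 3 recovers a
```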
Example 4 (dual of the half-line). Let us see what $C(K)$ looks like in the case of the half-line, that is, when $K = K^* = \mathbb{R}_+$:
\[
C(\mathbb{R}_+) = \left\{ \begin{pmatrix} x \\ s \end{pmatrix} \,\middle|\, x \ge 0,\ s \ge 0,\ xs = 0 \right\} \subseteq \mathbb{R}^2.
\]
In other words, $C(\mathbb{R}_+)$ is the union of the non-negative parts of the $x$ and $s$ axes: it is the real line $\mathbb{R}$ bent at the origin by a $90°$ angle.
Now the implication of this theorem is that since $C(K)$ is n-dimensional, there must exist a set of n equations, independent in some sense, that define the manifold $C(K)$. These n equations are precisely the complementary slackness conditions. In the case of the non-negative orthant, semidefinite and second order cones we were able to get these equations explicitly. When the cone $K$ is given by a set of inequalities of the form $g_i(x) \le 0$ for $i = 1, \ldots, n$, where the $g_i(x)$ are homogeneous and convex functions, the classical Karush-Kuhn-Tucker conditions give us a method of obtaining these equations.
6 Eigenvalue Optimization
In this section we study optimization problems involving the eigenvalues $\lambda_1(A) \ge \lambda_2(A) \ge \cdots \ge \lambda_n(A)$ of a symmetric matrix $A \in S^{n \times n}$.
Let us find an SDP formulation of the largest eigenvalue, $\lambda_1(A)$. This problem can be formulated by primal and dual SDPs as follows.

Primal:
\[
\begin{array}{ll}
\min & z \\
\text{s.t.} & zI \succeq A.
\end{array}
\]

Dual:
\[
\begin{array}{ll}
\max & A \bullet Y \\
\text{s.t.} & I \bullet Y = \mathrm{tr}(Y) = 1 \\
& Y \succeq 0.
\end{array}
\]
The primal formulation simply says: find the smallest $z$ such that $z$ is at least as large as every eigenvalue of $A$. But $z$ is at least as large as every eigenvalue of $A$ iff $zI - A$ is positive semidefinite. The dual characterization is obtained by simply taking the dual. Now define the feasible set of the dual to be $S$; that is:
Definition 1
\[
S = \{ Y \in S^{n \times n} \mid \mathrm{tr}(Y) = 1,\ Y \succeq 0 \} \tag{5}
\]
\[
E = \{ q q^T \mid \|q\| = 1 \} \tag{6}
\]
We can characterize the extreme points of S as follows:
Theorem 4. $S$ is a convex set and the set of extreme points of $S$ is $E$.
Proof: Convexity of $S$ is obvious, since it is the intersection of the semidefinite cone and an affine set. $Y \in S$ implies that $Y = \omega_1 q_1 q_1^T + \cdots + \omega_k q_k q_k^T$ where $\sum_i \omega_i = 1$, $\omega_i \ge 0$, and $\|q_i\| = 1$ (the spectral decomposition, with $\sum_i \omega_i = \mathrm{tr}(Y) = 1$). This shows that the extreme points of $S$ are among the elements of $E$. Now we prove that all elements of $E$ are extreme points. Otherwise, for some $qq^T$ there are $p$ and $r$ with $\|p\| = \|r\| = 1$ and
\[
qq^T = \alpha p p^T + (1 - \alpha) r r^T
= \begin{pmatrix} \sqrt{\alpha}\, p & \sqrt{1 - \alpha}\, r \end{pmatrix}
  \begin{pmatrix} \sqrt{\alpha}\, p & \sqrt{1 - \alpha}\, r \end{pmatrix}^T .
\]
If $\alpha \ne 0$ or $1$ we have a contradiction to the fact that $\mathrm{rank}(qq^T) = 1$. So the elements of $E$ are extreme points.
Since the optimum of a linear function over a compact convex set is attained at an extreme point, it follows that the $Y^*$ that maximizes $A \bullet Y$ in the dual characterization above is of the form $Y^* = qq^T$ with $\|q\| = 1$. That is,
\[
\lambda_1(A) = \max_{\|q\| = 1} q^T A q.
\]
This is a well-known result in linear algebra, which we have proved using duality of SDP. In future lectures we will use this characterization to express optimization of eigenvalues over an affine class of matrices.
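The characterization can be verified numerically; in the sketch below (with a random symmetric matrix), the top unit eigenvector $q$ attains $\lambda_1(A) = q^T A q$, and $Y^* = qq^T$ is feasible for the dual SDP with objective value $\lambda_1(A)$:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                      # a random symmetric matrix
lam, Qm = np.linalg.eigh(A)            # eigenvalues in ascending order
q = Qm[:, -1]                          # unit eigenvector of the largest eigenvalue
print(lam[-1], q @ A @ q)              # lambda_1(A) = max_{||q||=1} q^T A q

Y = np.outer(q, q)                     # the extreme point Y* = q q^T of S
print(np.trace(Y), np.trace(A @ Y))    # tr(Y) = 1 and A • Y = lambda_1(A)
```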