Semidefinite and Second Order Cone Programming Seminar
Fall 2001
Lecture 5
Instructor: Farid Alizadeh
Scribe: Anton Riabov
10/08/2001
1 Overview
We continue studying the maximum eigenvalue SDP and generalize it to an affine set of matrices. Next, we define the Lovász theta function and prove Lovász's "sandwich theorem". Finally, we introduce further extensions of the eigenvalue SDP for finding the minimum sum of the k largest eigenvalues over an affine set of matrices.
2 Finding Largest Eigenvalue (Continued)
In the previous lecture we introduced the sets

S := {X : tr X = 1, X ⪰ 0},
E := {qqᵀ : ‖q‖₂ = 1}.

We showed that E ⊂ S and that the members of E are the extreme points of S. For a fixed symmetric matrix A we defined the largest eigenvalue SDP as:

λ1(A) = min z
        s.t. zI ⪰ A

The dual to this problem is:

λ1(A) = max A · Y
        s.t. tr Y = 1
             Y ⪰ 0
Note that S is exactly the feasible set of the dual. As we know, the optimal value is attained at an extreme point of this set, so there is an optimal solution Y* of the dual with Y* ∈ E. Thus there exists q such that Y* = qqᵀ, and we can write:

λ1(A) = max_{Y∈S} Y · A = max_{‖q‖₂=1} qqᵀ · A = max_{‖q‖₂=1} qᵀAq.

Thus,

λ1(A) = max_{‖q‖₂=1} qᵀAq,    (1)

which is a well-known result from linear algebra, here proven using semidefinite programming duality.
Note 1 Similarly, we can show that the smallest eigenvalue λn(A) can be expressed as:

λn(A) = min_{‖q‖₂=1} qᵀAq.
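As a quick numerical sanity check of (1), here is a minimal sketch, assuming Python with numpy and a randomly generated symmetric matrix: it compares qᵀAq over random unit vectors with the extreme eigenvalues returned by a symmetric eigensolver.

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.standard_normal((5, 5))
    A = (B + B.T) / 2                      # a random symmetric matrix

    eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    lam_n, lam_1 = eigvals[0], eigvals[-1]

    # The maximum in (1) is attained at the eigenvector of the largest eigenvalue.
    q_top = eigvecs[:, -1]
    assert np.isclose(q_top @ A @ q_top, lam_1)

    # Any other unit vector gives a value between lambda_n and lambda_1.
    for _ in range(1000):
        q = rng.standard_normal(5)
        q /= np.linalg.norm(q)
        assert lam_n - 1e-9 <= q @ A @ q <= lam_1 + 1e-9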
3 Lovász Theta Function
In this section we describe extensions of the largest eigenvalue problem. Interested students are referred to the classic book “Geometric Algorithms and
Combinatorial Optimization” by M. Grötschel, L. Lovász and A. Schrijver,
Chapter 9.
3.1 Optimizing Largest Eigenvalue Over an Affine Set
Suppose now that we are solving the same largest eigenvalue problem, but the matrix A is not fixed, and we are looking for a way to minimize the largest eigenvalue over a set of matrices. For example, consider the affine combination of symmetric matrices A := A0 + x1A1 + · · · + xmAm. The problem in this case translates into the following unconstrained optimization problem:

min_{x∈ℝᵐ} λ1(A0 + x1A1 + · · · + xmAm).

It can be shown that this function is not linear. Furthermore, it is not smooth, and this makes optimization a very difficult task. However, we will prove that this function is convex, and therefore the task of finding its minimum should be tractable.
Proposition 1 Function f(x) := λ1 (A0 + x1 A1 + · · · + xm Am ) is convex.
Proof: We will prove that λ1(A + B) ≤ λ1(A) + λ1(B). Together with the positive homogeneity λ1(αA) = αλ1(A) for α ≥ 0, this gives λ1(αA + (1 − α)B) ≤ αλ1(A) + (1 − α)λ1(B) for 0 ≤ α ≤ 1, and convexity of f follows because its matrix argument is affine in x. Using (1), there exists q with ‖q‖₂ = 1 such that:

λ1(A + B) = qᵀ(A + B)q = qᵀAq + qᵀBq.
This q is feasible, but not necessarily optimal, when we apply (1) to A and B separately. Therefore,

qᵀAq + qᵀBq ≤ λ1(A) + λ1(B),

and the result follows.
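This subadditivity is easy to check numerically; a minimal sketch, again assuming numpy and random symmetric matrices chosen purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    def random_sym(n):
        B = rng.standard_normal((n, n))
        return (B + B.T) / 2

    lam1 = lambda M: np.linalg.eigvalsh(M)[-1]   # largest eigenvalue

    for _ in range(1000):
        A, B = random_sym(6), random_sym(6)
        assert lam1(A + B) <= lam1(A) + lam1(B) + 1e-9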
We can rewrite the problem as an equivalent semidefinite programming problem:

min z
s.t. zI − x1A1 − · · · − xmAm ⪰ A0

The dual of this problem is:

max A0 · Y
s.t. tr Y = 1
     Ai · Y = 0,  i = 1, . . . , m
     Y ⪰ 0
People have studied this problem from the point of view of unconstrained optimization. It turns out to be a very difficult problem when formulated this way: the graph of the function is not smooth, and Newton's method cannot be applied directly, since the derivative does not always exist. Semidefinite programs, in turn, can be solved efficiently to any given precision. We will describe algorithms for solving SDPs later in the course.
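As an illustration of how such a problem might be handed to an SDP solver in practice, here is a hedged sketch assuming the cvxpy modeling package and randomly generated data; cvxpy's lambda_max atom expresses exactly the convex function f discussed above and is compiled internally to an SDP of the form just described.

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(1)
    def random_sym(n):
        B = rng.standard_normal((n, n))
        return (B + B.T) / 2

    n, m = 4, 2
    A0 = random_sym(n)
    As = [random_sym(n) for _ in range(m)]

    x = cp.Variable(m)
    A_of_x = A0 + sum(x[i] * As[i] for i in range(m))
    prob = cp.Problem(cp.Minimize(cp.lambda_max(A_of_x)))
    prob.solve()

    # Cross-check the reported optimum against a direct eigenvalue computation.
    A_opt = A0 + sum(x.value[i] * As[i] for i in range(m))
    print(prob.value, np.linalg.eigvalsh(A_opt)[-1])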
3.2 Maximum Clique, Stable Set and Graph Coloring Problems
Assume G = (V, E) is an undirected graph with no loops attached to vertices, i.e. edges (i, i) are not allowed in G.
Definition 1 Maximum clique problem: find a maximum fully connected subgraph of G (a clique). That is, find a subgraph G′ = (V′, E′), V′ ⊆ V, E′ ⊆ E, such that ∀i, j ∈ V′, i ≠ j ⇒ (i, j) ∈ E′, and |V′| is maximal.
The maximum clique problem is NP-complete. In fact, it was one of the
original problems shown to be NP-complete.
Example 1 (Maximum Clique) Consider the graph in Figure 1. In this graph the maximal cliques with more than 2 vertices are {2,3,4,5} and {1,2,5}. The former is a maximum clique.
Definition 2 Maximum independent (or stable) set problem: find a maximum set of vertices, no two of which are connected by an edge.
Figure 1: An example of an undirected graph (vertices 1-5; edges 1-2, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, 4-5).
The maximum independent set problem is in some sense a dual of the maximum clique problem. The maximum independent set of a graph is the maximum
clique of its complement.
Definition 3 The complement graph Ḡ = (V, Ē) of a graph G = (V, E) is the graph on the same set of vertices satisfying the following property: every pair of vertices that is connected in the original graph is not connected in the complement, and vice versa.
Example 2 (Stable Set) Using the graph in Figure 1, the optimal solutions
are {1,4} and {1,3}.
Figure 2: Complement of the graph in Figure 1 (vertices 1-5; edges 1-3 and 1-4).
Example 3 (Complement Graph) Figure 2 shows the complement graph for the graph shown in Figure 1.
Note 2 Graph G is a complement of Ḡ.
Note 3 Every solution to the maximum stable set problem on a graph G is a
solution for the maximum clique problem on the graph Ḡ, and vice versa.
Definition 4 Minimum graph coloring problem: assign colors to vertices such
that every pair of adjacent vertices has different colors, using the smallest possible total number of colors. The number of colors used in the optimal solution
to the problem is referred to as the chromatic number of the graph.
Finding the chromatic number of a graph is also NP-complete. In fact, it can be proven that no ε-approximation algorithm exists for this problem. In other words, for any given ε ≥ 1 it is impossible to give a polynomial-time algorithm that for every graph G finds a number guaranteed to be at most εχ(G), unless P = NP.
We will use the following notation to denote optimal values of these problems:
• ω(G) - size of maximum clique in graph G;
• α(G) - size of largest stable set in graph G;
• χ(G) - chromatic number of graph G.
Proposition 2 ω(G) ≤ χ(G).
Proof: The vertices of each clique must have different colors.
Example 4 (Strict Inequality) However, equality does not always hold. The example of a cycle on 5 vertices shows that in some cases the inequality above is strict.
Figure 3: Odd cycle example (a cycle on 5 vertices).
It is easy to see that the graph shown in Figure 3 cannot be colored with fewer than 3 colors, therefore χ(G) = 3. However, the largest clique in this graph has only 2 vertices, so ω(G) = 2 < 3 = χ(G). This is true for any odd cycle with at least 5 vertices.
Proposition 3 If a symmetric matrix A ∈ ℝ^{n×n} and its submatrix B ∈ ℝ^{k×k} are such that

A = [ B   C ]
    [ Cᵀ  D ],

with B the leading k × k principal block, then

λ1(B) ≤ λ1(A).
Proof: From (1) it follows that there exists q ∈ ℝᵏ, satisfying ‖q‖₂ = 1, such that:

λ1(B) = qᵀBq.
Recall that the vector q at which the maximum in (1) is achieved is the eigenvector corresponding to the eigenvalue λ1(B). Further, writing [q; 0] ∈ ℝⁿ for q padded with zeros,

λ1(B) = qᵀBq = [q; 0]ᵀ A [q; 0] ≤ λ1(A),

where the inequality follows from the fact that the unit vector [q; 0] is not necessarily optimal for the entire matrix A in terms of (1).
Corollary 1 If A ∈ ℝ^{n×n} is a symmetric matrix and its submatrix B ∈ ℝ^{k×k} is composed of rows and columns corresponding to the same index subset, i.e. for each column i of A included in B, the row i of A is included in B, and vice versa, then λ1(B) ≤ λ1(A).
Proof: This corollary states that the submatrix B does not have to sit in the corner of A; it can be distributed over A, as long as corresponding rows and columns participate. We will make use of permutation matrices to prove this.

The matrix A can be transformed to have the structure required in Proposition 3 via multiplication by permutation matrices. A permutation matrix is a square matrix composed of 0 and 1 elements, having exactly one non-zero entry in each row and in each column. If the permutation matrix P rearranges columns when multiplied on the right side, then Pᵀ on the left side rearranges the corresponding rows. Therefore there exists a permutation matrix P such that PᵀAP has the required structure. It is known from linear algebra that permutation matrices are orthogonal, so Pᵀ = P⁻¹. Therefore the permutation does not change eigenvalues, and the matrix PᵀAP has the same eigenvalues as A. Thus, λ1(B) ≤ λ1(PᵀAP) = λ1(A), where the inequality follows from Proposition 3.
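The following small numpy sketch mirrors this argument on illustrative data: it extracts a principal submatrix B of A given by an index subset, builds the corresponding permutation matrix P, and checks both that PᵀAP has the same eigenvalues as A and that λ1(B) ≤ λ1(A).

    import numpy as np

    rng = np.random.default_rng(2)
    n = 6
    M = rng.standard_normal((n, n))
    A = (M + M.T) / 2

    idx = [1, 3, 4]                          # rows/columns selected for B
    B = A[np.ix_(idx, idx)]                  # principal submatrix of A

    # Permutation bringing the selected indices to the front.
    order = idx + [i for i in range(n) if i not in idx]
    P = np.eye(n)[:, order]                  # P^T A P has B as its leading block

    assert np.allclose(np.linalg.eigvalsh(P.T @ A @ P), np.linalg.eigvalsh(A))
    assert np.allclose((P.T @ A @ P)[:3, :3], B)
    assert np.linalg.eigvalsh(B)[-1] <= np.linalg.eigvalsh(A)[-1] + 1e-9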
Let us now apply this result to the adjacency matrix A of an arbitrary graph G. An adjacency matrix is defined as A = (aij), where aij = 1 if there exists an edge (i, j), and aij = 0 otherwise.

We will illustrate the procedure on the example graph in Figure 1. For this graph, the adjacency matrix A is

        [ 0 1 0 0 1 ]                 [ 1 1 0 0 1 ]
        [ 1 0 1 1 1 ]                 [ 1 1 1 1 1 ]
    A = [ 0 1 0 1 1 ]   and   A + I = [ 0 1 1 1 1 ]
        [ 0 1 1 0 1 ]                 [ 0 1 1 1 1 ]
        [ 1 1 1 1 0 ]                 [ 1 1 1 1 1 ]
It is easy to see that cliques in the graph correspond to submatrices of all ones in the matrix A + I, composed of the corresponding rows and columns. Now we can apply Corollary 1. Thus, for each such clique submatrix Jk = 11ᵀ of size k × k,

λ1(Jk) ≤ λ1(A + I).    (2)
The eigenvalues of Jk are easy to compute. It is a rank 1 matrix, therefore it has only one non-zero eigenvalue; it is easy to see that the vector 1 is the eigenvector corresponding to this non-zero eigenvalue, and the eigenvalue itself is equal to k. Thus,

λ1(Jk) = k.

Substituting into (2),

k ≤ λ1(A + I).

Since this inequality holds for any clique size k, it also holds for the maximum clique size:

ω(G) ≤ λ1(A + I).    (3)
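For the graph of Figure 1, where ω(G) = 4 (the clique {2,3,4,5}), the bound (3) is easy to check numerically; a minimal sketch assuming numpy:

    import numpy as np

    # Adjacency matrix of the graph in Figure 1 (vertices 1..5).
    A = np.array([[0, 1, 0, 0, 1],
                  [1, 0, 1, 1, 1],
                  [0, 1, 0, 1, 1],
                  [0, 1, 1, 0, 1],
                  [1, 1, 1, 1, 0]], dtype=float)

    print(np.linalg.eigvalsh(A + np.eye(5))[-1])   # >= 4 = omega(G), as (3) predicts

    # The clique {2,3,4,5} gives the 4x4 all-ones submatrix J_4 of A + I,
    # whose largest eigenvalue is exactly 4.
    idx = [1, 2, 3, 4]                             # zero-based indices of vertices 2,3,4,5
    J4 = (A + np.eye(5))[np.ix_(idx, idx)]
    print(np.linalg.eigvalsh(J4)[-1])              # 4.0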
3.3 Lovász Theta Function
Let us see if the inequality (3) can be made tighter, so that we can obtain a better estimate of ω(G). Note that Corollary 1 can still be applied in the same way, and (3) holds, even if the zeros in A + I are replaced with arbitrary values, as long as the matrix remains symmetric. The reason for this is that there are no zeros inside any clique submatrix Jk, so that part (corresponding to B in Corollary 1) remains unchanged. Formally, let

[A(x)]ij = 1,           if (i, j) ∈ E or i = j;
           xij = xji,   otherwise.

Now we can write a stronger version of inequality (3):

ω(G) ≤ min_x λ1(A(x)).    (4)
Definition 5 The right-hand side of (4) is referred to as the Lovász θ-function of a graph. It is denoted θ(G) = min_x λ1(A(x)).

Note that A(x) defined above can be written as A(x) = A0 + x1A1 + x2A2 + · · · + xmAm. The example below illustrates how this can be done.
Example 5 (Rewriting A(x) as a sum)

[ 1 x y ]
[ x 1 1 ] = x(E12 + E21) + y(E13 + E31) + (A + I),
[ y 1 1 ]

where Eij denotes the matrix with a single 1 in position (i, j) and zeros elsewhere.
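As a concrete illustration, θ(G) can be computed directly from its definition for the 5-cycle of Figure 3 by fixing the entries at edges and on the diagonal to 1, leaving the remaining entries free, and minimizing the largest eigenvalue. A sketch assuming cvxpy:

    import cvxpy as cp

    n = 5
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # the 5-cycle

    X = cp.Variable((n, n), symmetric=True)            # X plays the role of A(x)
    constraints = [X[i, i] == 1 for i in range(n)]
    constraints += [X[i, j] == 1 for (i, j) in edges]  # symmetry handles (j, i)
    prob = cp.Problem(cp.Minimize(cp.lambda_max(X)), constraints)
    prob.solve()
    print(prob.value)   # about 2.236 = sqrt(5)

The value √5 ≈ 2.236 lies strictly between ω(C5) = 2 and χ(C5) = 3, consistent with (4) and with the sandwich theorem proven below.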
We know that ω(G) ≤ χ(G). Now we can formulate the following theorem, which makes a stronger statement. This theorem is also known as Lovász's "sandwich theorem".

Theorem 1 ω(G) ≤ θ(G) ≤ χ(G).
Before we prove this theorem, let us make several observations regarding
graph sums.
Definition 6 Given graphs G1 = (V1, E1) and G2 = (V2, E2) with disjoint vertex sets, the sum of graphs G1 ⊕ G2 is the graph formed by putting the graphs G1 and G2 together. In other words, G1 ⊕ G2 = (V1 ∪ V2, E1 ∪ E2).
The following propositions are straightforward to prove, so the proofs are
not given.
Proposition 4 ω(G1 ⊕ G2 ) = max{ω(G1 ), ω(G2 )}.
Proposition 5 χ(G1 ⊕ G2 ) = max{χ(G1 ), χ(G2 )}.
Although the following proposition may be anticipated given the two propositions above, its proof is not as obvious.
Proposition 6 θ(G1 ⊕ G2 ) = max{θ(G1 ), θ(G2 )}.
Proof: Write an SDP for finding the value of θ(G):

θ(G) = min z
       s.t. zI − X ⪰ 0
            xij = 1   if i = j or (i, j) ∈ E

Here is the dual of this problem:

max  Σ_{(i,j)∈E} yij + Σi yii = (A + I) • Y
s.t. tr Y = 1
     yij = 0   if (i, j) ∉ E and i ≠ j
     Y ⪰ 0
Now let us write the dual for the graph sum G1 ⊕ G2. The dual variable will have the following block structure (since there are no edges between the graphs in the sum):

Y = [ Y1  0  ]
    [ 0   Y2 ]

Thus, with A1 and A2 denoting the adjacency matrices of G1 and G2, the dual program is

max  (A1 + I) • Y1 + (A2 + I) • Y2
s.t. tr Y1 + tr Y2 = 1    (5)
     yij = 0   if (i, j) ∉ E and i ≠ j
     Y1, Y2 ⪰ 0
The formulation above resembles the original dual formulation. It could be separated into two problems (for graphs G1 and G2), and the proposition would be proven immediately, were it not for the constraint (5), which makes Y1 dependent on Y2.

Note that replacing the constraint tr Y = 1 in the dual problem with tr Y = α for some α > 0 simply scales the optimal value by α (replace an optimal Y by αY). Now let us choose α with 0 ≤ α ≤ 1, replace (5) with the following two constraints, and solve the two resulting dual problems:

tr Y1 = α
tr Y2 = 1 − α

The optimal values of these independent problems are, as we know, αθ(G1) and (1 − α)θ(G2) respectively. The optimal value of the dual problem for G1 ⊕ G2 is then

max_{0≤α≤1} [ αθ(G1) + (1 − α)θ(G2) ] = max{θ(G1), θ(G2)},

where the equality follows from the linearity of the expression in α, so the maximum is attained at an endpoint.
Now we are ready to prove Theorem 1.
Proof: Denote k = χ(G). The nodes of the graph G can be partitioned into k classes according to their color. We can make all classes contain the same number of nodes, say r, by introducing additional singleton nodes that are not connected to any nodes in the graph and therefore can receive any color. This procedure does not change θ, according to Proposition 6, since θ of a single vertex is 1. Denote this new graph G1. Then

θ(G) = θ(G1),    |V(G1)| = kr.
Consider the matrix A(α) ∈ ℝ^{kr×kr},

          [ αJr + (1 − α)Ir   Jr                 · · ·   Jr               ]
A(α) :=   [ Jr                 αJr + (1 − α)Ir   · · ·   Jr               ]
          [ · · ·              · · ·             · · ·   · · ·            ]
          [ Jr                 Jr                · · ·   αJr + (1 − α)Ir  ],

i.e. the k × k block matrix with diagonal blocks αJr + (1 − α)Ir and all off-diagonal blocks equal to Jr. Note that αJr + (1 − α)Ir is a symmetric r × r matrix having the following structure:

                     [ 1   α   α   · · ·   α ]
                     [ α   1   α   · · ·   α ]
αJr + (1 − α)Ir  =   [ α   α   1   · · ·   α ]
                     [ · · ·                 ]
                     [ α   α   α   · · ·   1 ]

Jr, as before, is the matrix of all ones of size r × r.
It is easy to see that the matrix A(α) is a special case of the matrix A(x) for graph G1. Vertices within a color class are not connected, and if we rename the vertices so that the classes follow one another, we will have zero blocks of size r × r
on the diagonal of the adjacency matrix of G1. In A(x), the diagonal entries become 1 and the entries at non-edges become free variables; we choose to set the free entries inside the diagonal blocks to α and the free entries between different classes to 1, which preserves matrix symmetry. Since A(α) is such a special case, by the definition of θ(G1),

θ(G1) ≤ λ1(A(α)).
Note that we can rewrite A(α) as

A(α) = Jkr − (1 − α)(Jr ⊕ Jr ⊕ · · · ⊕ Jr) + (1 − α)I,    (6)

where ⊕ here denotes the block-diagonal (direct) sum of matrices.
Now we would like to find the value of λ1 (A(α)) as a function of α in closed
form, and choose α such that λ1 (A(α)) ≤ k = χ(G1 ) = χ(G). Note that α does
not have to be positive.
The eigenvalues of a sum of matrices are sums of the eigenvalues of the summands, provided the matrices commute. It is known from linear algebra that, for matrices A and B, AB = BA if and only if A and B share a common system of eigenvectors. For symmetric matrices this is almost obvious using the eigenvalue decomposition: if A = QᵀDQ and B = QᵀCQ with the same orthogonal Q, then A + B = Qᵀ(D + C)Q.

We claim that Jkr, (Jr ⊕ Jr ⊕ · · · ⊕ Jr) and I commute. For I this is easy, since it commutes with every matrix. It is also straightforward to verify that the vector of all ones, together with any basis of its orthogonal complement, can be taken as eigenvectors of Jkr. The vector of all ones is also an eigenvector of (Jr ⊕ Jr ⊕ · · · ⊕ Jr); thus take any orthogonal eigenbasis of (Jr ⊕ Jr ⊕ · · · ⊕ Jr) containing it: the remaining basis vectors are orthogonal to 1 and so are also eigenvectors of Jkr. This proves the claim.
The table below summarizes the eigenvalues of the 3 terms in the sum (6) on a common eigenbasis (the all-ones vector; the k − 1 vectors that are constant on each block and orthogonal to 1; and the k(r − 1) remaining vectors orthogonal to all block-wise constant vectors):

eigenvector                  Jkr   −(1 − α)(Jr ⊕ Jr ⊕ · · · ⊕ Jr)   (1 − α)I
all-ones (× 1)               kr    −(1 − α)r                        1 − α
block-constant (× (k − 1))   0     −(1 − α)r                        1 − α
remaining (× k(r − 1))       0     0                                1 − α
Thus, in order to complete the proof, we need to show that there exists an α that satisfies:

kr − (1 − α)r + (1 − α) ≤ k
(1 − α) − (1 − α)r ≤ k
1 − α ≤ k
Solving the first inequality for α we obtain α ≤ 1 − k. Solving the last one we get α ≥ 1 − k. Setting α = 1 − k satisfies all three inequalities, and then λ1(A(α)) = k, so θ(G) = θ(G1) ≤ λ1(A(1 − k)) = k = χ(G). Together with (4), this proves the theorem.
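The closed-form eigenvalues above are easy to double-check numerically; a minimal sketch assuming numpy, with illustrative values k = 3 and r = 4, builds A(α) from (6) with α = 1 − k and confirms that its largest eigenvalue equals k.

    import numpy as np

    k, r = 3, 4                            # illustrative: 3 color classes of 4 vertices each
    alpha = 1 - k

    J = lambda s: np.ones((s, s))
    block_J = np.kron(np.eye(k), J(r))     # J_r (+) J_r (+) ... (+) J_r
    A_alpha = J(k * r) - (1 - alpha) * block_J + (1 - alpha) * np.eye(k * r)

    print(np.linalg.eigvalsh(A_alpha)[-1])   # 3.0 = k, as the proof requires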
As we have just shown, this theorem gives a bound on a very hard integer program (IP) using results obtained with semidefinite programming (SDP) analysis. Later we will show that any IP can be relaxed to an SDP in a similar way. SDP relaxations are typically tighter than relaxations to linear programs, and we will give a general framework for creating such relaxations.
4 Computing Sums of Eigenvalues
Consider a further generalization of the optimization problem we have been working with. As before, define

A(x) = A0 + x1A1 + x2A2 + · · · + xmAm.

But now we want to find the sum of a given number of largest eigenvalues, instead of just the single largest eigenvalue:

f(x1, x2, . . . , xm) = (λ1 + λ2 + · · · + λk)(A0 + x1A1 + x2A2 + · · · + xmAm).

And, as in the previous sections, we would like to find the unconstrained minimum:

min_x f(x1, x2, . . . , xm).
We can write an SDP for this problem, but the straightforward approach (writing constraints for all possible sums of k eigenvalues) results in an exponential number of constraints. A more sophisticated approach should be used. We will first illustrate the ideas we are going to employ on a small linear program.
Example 6 (Linear Programming) Consider the following linear program (LP):

max  w1x1 + · · · + wnxn
s.t. x1 + · · · + xn = k    (7)
     0 ≤ xi ≤ 1

Suppose k is an integer. This program finds the k largest values of w and sets the corresponding entries of x to 1.
All extreme points of the feasible set of this LP are integral. This can be proven, for example, using complementary slackness. Let us write the dual of this problem:

min  kz + Σi yi
s.t. z + yi ≥ wi
     yi ≥ 0

Now, using complementary slackness, we can show that a fractional solution is either non-optimal or not an extreme point.
Another way to show that all the extreme points of the LP are integral is to notice that the constraint matrix of this problem is totally unimodular (all determinants of its square sub-matrices are either 0, 1, or −1).

Either way, it can be shown that the extreme points of this polytope are 0-1 vectors having exactly k ones in them. Introducing just one constraint on the sum (equation (7) in the LP) greatly restricts the degrees of freedom of x.
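The behaviour of this LP is easy to see on a tiny instance; a minimal sketch assuming scipy, with illustrative weights:

    import numpy as np
    from scipy.optimize import linprog

    w = np.array([3.0, 7.0, 1.0, 9.0, 4.0])
    n, k = len(w), 2

    # linprog minimizes, so negate w to maximize w^T x.
    res = linprog(c=-w,
                  A_eq=np.ones((1, n)), b_eq=[k],
                  bounds=[(0, 1)] * n)
    print(res.x)   # approximately [0, 1, 0, 1, 0]: picks the two largest weights (7 and 9)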