8.7 Exploiting Structure

By default, the functions conelp() and coneqp() exploit no problem structure except (to some limited extent) sparsity. Two mechanisms are provided for implementing customized solvers that take advantage of problem structure.

Providing a function for solving KKT equations.
The most expensive step of each iteration of conelp() or coneqp() is the solution of a set of linear equations (‘KKT equations’) of the form
⌊      T      T   ⌋ ⌊    ⌋   ⌊    ⌋
⌈ P   A     G     ⌉ ⌈ ux ⌉   ⌈ bx ⌉
  A    0     0T       uy   =   by
  G    0  - W  W      uz       bz
(8.3)

(with P = 0 in conelp()). The matrix W depends on the current iterates and is defined as follows. We use the notation of sections 8.1 and 8.2. Suppose

u = (ul, uq,0, ..., uq,M- 1, vec(us,0), ..., vec(us,N -1)), ul ∈ Rl, uq,k ∈ Rrk , k = 0,...,M - 1, us,k ∈ Stk, k = 0,...,N - 1.

Then W is a block-diagonal matrix,

W u = (Wlul, Wq,0uq,0, ..., Wq,M- 1uq,M- 1, Ws,0vec(us,0), ..., Ws,N-1vec (us,N-1))

with the following diagonal blocks.

It is often possible to exploit problem structure to solve (8.3) faster than by standard methods. The last argument kktsolver of conelp() and coneqp() allows the user to supply a Python function for solving the KKT equations. This function will be called as ”f = kktsolver(W)”, where W is a dictionary that contains the parameters of the scaling:

The function call ”f = kktsolver(W)” should return a routine for solving the KKT system (8.3) defined by W. It will be called as ”f(bx, by, bz)”. On entry, bx, by, bz contain the righthand side. On exit, they should contain the solution of the KKT system, with the last component scaled, i.e., on exit,

bx := ux,    by := uy,  bz := W uz.

In other words, the function returns the solution of

⌊ P  AT   GTW  -1 ⌋⌊ ˆux ⌋   ⌊ bx ⌋
⌈ A   0      0    ⌉⌈ ˆuy ⌉ = ⌈ by ⌉ .
  G   0    - W T     ˆuz       bz

Specifying constraints via Python functions.
In the default use of conelp() and coneqp(), the linear constraints and the quadratic term in the objective are parameterized by CVXOPT matrices G, A, P. It is possible to specify these parameters via Python functions that evaluate the corresponding matrix-vector products and their adjoints.

If the argument G of conelp() or coneqp() is a Python function, it should be defined as follows:

G(x, y [, alpha[, beta[, trans]]])

This evaluates the matrix-vector products

y := αGx  + βy  (trans = ′N′),   y := αGT x +βy   (trans = ′T′).

The default values of the optional arguments must be alpha = 1.0, beta = 0.0, trans = ’N’.

Similarly, if the argument A is a Python function, then it must be defined as follows.

A(x, y [, alpha[, beta[, trans]]])

This evaluates the matrix-vector products

y := αAx  + βy  (trans = ′N′),   y := αAT x +βy   (trans = ′T′).

The default values of the optional arguments must be alpha = 1.0, beta = 0.0, trans = ’N’.

If the argument P of coneqp() is a Python function, then it must be defined as follows:

P(x, y [, alpha[, beta]])

This evaluates the matrix-vector products

y := αP x+ βy.

The default values of the optional arguments must be alpha = 1.0, beta = 0.0.

If G, A or P are Python functions, then the argument kktsolver must also be provided.

We illustrate these features with three applications.

Example: 1-norm approximation

The optimization problem

minimize  ∥Pu - q∥1

can be formulated as a linear program

minimize  1Tv
subject to - v ≼ P u- q ≼ v.

By exploiting the structure in the inequalities, the cost of an iteration of an interior-point method can be reduced to the cost of least-squares problem of the same dimensions. (See section 11.8.2 in the book Convex Optimization.) The code belows takes advantage of this fact.

from cvxopt import base, blas, lapack, solvers  
from cvxopt.base import matrix, spmatrix, mul, div  
 
def l1(P, q):  
    """  
 
    Returns the solution u, w of the l1 approximation problem  
 
        (primal) minimize    ||P*u - q||_1  
 
        (dual)   maximize    q’*w  
                 subject to  P’*w = 0  
                             ||w||_infty <= 1.  
    """  
 
    m, n = P.size  
 
    # Solve the equivalent LP  
    #  
    #     minimize    [0; 1]’ * [u; v]  
    #     subject to  [P, -I; -P, -I] * [u; v] <= [q; -q]  
    #  
    #     maximize    -[q; -q]’ * z  
    #     subject to  [P’, -P’]*z  = 0  
    #                 [-I, -I]*z + 1 = 0  
    #                 z >= 0.  
 
    c = matrix(n*[0.0] + m*[1.0])  
 
    def G(x, y, alpha = 1.0, beta = 0.0, trans = ’N’):  
 
        if trans==’N’:  
            # y := alpha * [P, -I; -P, -I] * x + beta*y  
            u = P*x[:n]  
            y[:m] = alpha * ( u - x[n:]) + beta * y[:m]  
            y[m:] = alpha * (-u - x[n:]) + beta * y[m:]  
 
        else:  
            # y := alpha * [P’, -P’; -I, -I] * x + beta*y  
            y[:n] =  alpha * P.T * (x[:m] - x[m:]) + beta * y[:n]  
            y[n:] = -alpha * (x[:m] + x[m:]) + beta * y[n:]  
 
    h = matrix([q, -q])  
    dims = {’l’: 2*m, ’q’: [], ’s’: []}  
 
    def F(W):  
 
        """  
        Returns a function f(x, y, z) that solves  
 
            [ 0  0  P’      -P’      ] [ x[:n] ]   [ bx[:n] ]  
            [ 0  0 -I       -I       ] [ x[n:] ]   [ bx[n:] ]  
            [ P -I -D1^{-1}  0       ] [ z[:m] ] = [ bz[:m] ]  
            [-P -I  0       -D2^{-1} ] [ z[m:] ]   [ bz[m:] ]  
 
        where D1 = diag(di[:m])^2, D2 = diag(di[m:])^2 and di = W[’di’].  
        """  
 
        # Factor A = 4*P’*D*P where D = d1.*d2 ./(d1+d2) and d1 = di[:m].^2, d2 = di[m:].^2.  
 
        di = W[’di’]  
        d1, d2 = di[:m]**2, di[m:]**2  
        D = div( mul(d1,d2), d1+d2 )  
        A = P.T * spmatrix(4*D, range(m), range(m)) * P  
        lapack.potrf(A)  
 
        def f(x, y, z):  
 
            """  
            On entry bx, bz are stored in x, z.  On exit x, z contain the solution,  
            with z scaled: z./di is returned instead of z.  
            """"  
 
            # Solve for x[:n]:  
            #  
            #    A*x[:n] = bx[:n] + P’ * ( ((D1-D2)*(D1+D2)^{-1})*bx[n:]  
            #              + (2*D1*D2*(D1+D2)^{-1}) * (bz[:m] - bz[m:]) ).  
 
            x[:n] += P.T * ( mul(div(d1-d2, d1+d2), x[n:]) + mul(2*D, z[:m]-z[m:]) )  
            lapack.potrs(A, x)  
 
            # x[n:] := (D1+D2)^{-1} * (bx[n:] - D1*bz[:m] - D2*bz[m:] + (D1-D2)*P*x[:n])  
 
            u = P*x[:n]  
            x[n:] =  div(x[n:] - mul(d1, z[:m]) - mul(d2, z[m:]) + mul(d1-d2, u), d1+d2)  
 
            # z[:m] := d1[:m] .* ( P*x[:n] - x[n:] - bz[:m])  
            # z[m:] := d2[m:] .* (-P*x[:n] - x[n:] - bz[m:])  
 
            z[:m] = mul(d[:m],  u - x[n:] - z[:m])  
            z[m:] = mul(d[m:], -u - x[n:] - z[m:])  
 
        return f  
 
    sol = solvers.conelp(c, G, h, dims, kktsolver = F)  
    return sol[’x’][:n],  sol[’z’][m:] - sol[’z’][:m]

Example: SDP with diagonal linear term

The SDP

           T
minimize  1 x
subject to  W  + diag (x) ≽ 0

can be solved efficiently by exploiting properties of the diag operator.

from cvxopt import base, blas, lapack, solvers  
from cvxopt.base import matrix  
 
def mcsdp(w):  
    """  
    Returns solution x, z to  
 
        (primal)  minimize    sum(x)  
                  subject to  w + diag(x) >= 0  
 
        (dual)    maximize    -tr(w*z)  
                  subject to  diag(z) = 1  
                              z >= 0.  
    """  
 
    n = w.size[0]  
    c = matrix(1.0, (n,1))  
 
    def G(x, y, alpha = 1.0, beta = 0.0, trans = ’N’):  
        """  
            y := alpha*(-diag(x)) + beta*y.  
        """  
 
        if trans==’N’:  
            # x is a vector; y is a symmetric matrix in column major order.  
            y *= beta  
            y[::n+1] -= alpha * x  
 
        else:  
            # x is a symmetric matrix in column major order; y is a vector.  
            y *= beta  
            y -= alpha * x[::n+1]  
 
 
    def cngrnc(r, x, alpha = 1.0):  
        """  
        Congruence transformation  
 
            x := alpha * r’*x*r.  
 
        r and x are square matrices.  
        """  
 
        # Scale diagonal of x by 1/2.  
        x[::n+1] *= 0.5  
 
        # a := tril(x)*r  
        a = +r  
        tx = matrix(x, (n,n))  
        blas.trmm(tx, a, side = ’L’)  
 
        # x := alpha*(a*r’ + r*a’)  
        blas.syr2k(r, a, tx, trans = ’T’, alpha = alpha)  
        x[:] = tx[:]  
 
    dims = {’l’: 0, ’q’: [], ’s’: [n]}  
 
    def F(W):  
        """  
        Returns a function f(x, y, z) that solves  
 
                      -diag(z)     = bx  
            -diag(x) - r*r’*z*r*r’ = bz  
 
        where r = W[’r’][0] = W[’rti’][0]^{-T}.  
        """  
 
        rti = W[’rti’][0]  
 
        # t = rti*rti’ as a nonsymmetric matrix.  
        t = matrix(0.0, (n,n))  
        blas.gemm(rti, rti, t, transB = ’T’)  
 
        # Cholesky factorization of tsq = t.*t.  
        tsq = t**2  
        lapack.potrf(tsq)  
 
        def f(x, y, z):  
            """  
            On entry, x contains bx, y is empty, and z contains bz stored  
            in column major order.  
            On exit, they contain the solution, with z scaled  
            (vec(r’*z*r) is returned instead of z).  
 
            We first solve  
 
               ((rti*rti’) .* (rti*rti’)) * x = bx - diag(t*bz*t)  
 
            and take z = - rti’ * (diag(x) + bz) * rti.  
            """  
 
            # tbst := t * bz * t  
            tbst = +z  
            cngrnc(t, tbst)  
 
            # x := x - diag(tbst) = bx - diag(rti*rti’ * bz * rti*rti’)  
            x -= tbst[::n+1]  
 
            # x := (t.*t)^{-1} * x = (t.*t)^{-1} * (bx - diag(t*bz*t))  
            lapack.potrs(tsq, x)  
 
            # z := z + diag(x) = bz + diag(x)  
            z[::n+1] += x  
 
            # z := -vec(rti’ * z * rti)  
            #    = -vec(rti’ * (diag(x) + bz) * rti  
            cngrnc(rti, z, alpha = -1.0)  
 
        return f  
 
    sol = solvers.conelp(c, G, w[:], dims, kktsolver = F)  
    return sol[’x’], sol[’z’]  

Example: Minimizing 1-norm subject to a 2-norm constraint
In the second example, we use a similar trick to solve the problem
minimize  ∥u∥1
subject to  ∥Au - b∥2 ≤ 1.

The code below is efficient, if we assume that the number of rows in A is greater than or equal to the number of columns.

def qcl1(A, b):  
    """  
    Returns the solution u, z of  
 
        (primal)  minimize    || u ||_1  
                  subject to  || A * u - b ||_2  <= 1  
 
        (dual)    maximize    b^T z - ||z||_2  
                  subject to  || A’*z ||_inf <= 1.  
 
    Exploits structure, assuming A is m by n with m >= n.  
    """  
 
    m, n = A.size  
 
    # Solve equivalent cone LP with variables x = [u; v].  
    #  
    #     minimize    [0; 1]’ * x  
    #     subject to  [ I  -I ] * x <=  [  0 ]   (componentwise)  
    #                 [-I  -I ] * x <=  [  0 ]   (componentwise)  
    #                 [ 0   0 ] * x <=  [  1 ]   (SOC)  
    #                 [-A   0 ]         [ -b ]  
    #  
    #     maximize    -t + b’ * w  
    #     subject to  z1 - z2 = A’*w  
    #                 z1 + z2 = 1  
    #                 z1 >= 0,  z2 >=0,  ||w||_2 <= t.  
 
    c = matrix(n*[0.0] + n*[1.0])  
    h = matrix( 0.0, (2*n + m + 1, 1))  
    h[2*n] = 1.0  
    h[2*n+1:] = -b  
 
    def G(x, y, alpha = 1.0, beta = 0.0, trans = ’N’):  
        y *= beta  
        if trans==’N’:  
            # y += alpha * G * x  
            y[:n] += alpha * (x[:n] - x[n:2*n])  
            y[n:2*n] += alpha * (-x[:n] - x[n:2*n])  
            y[2*n+1:] -= alpha * A*x[:n]  
 
        else:  
            # y += alpha * G’*x  
            y[:n] += alpha * (x[:n] - x[n:2*n] - A.T * x[-m:])  
            y[n:] -= alpha * (x[:n] + x[n:2*n])  
 
 
    def Fkkt(W):  
        """  
        Returns a function f(x, y, z) that solves  
 
            [ 0   G’   ] [ x ] = [ bx ]  
            [ G  -W’*W ] [ z ]   [ bz ].  
        """  
 
        # First factor  
        #  
        #     S = G’ * W**-1 * W**-T * G  
        #       = [0; -A]’ * W3^-2 * [0; -A] + 4 * (W1**2 + W2**2)**-1  
        #  
        # where  
        #  
        #     W1 = diag(d1) with d1 = W[’d’][:n] = 1 ./ W[’di’][:n]  
        #     W2 = diag(d2) with d2 = W[’d’][n:] = 1 ./ W[’di’][n:]  
        #     W3 = beta * (2*v*v’ - J),  W3^-1 = 1/beta * (2*J*v*v’*J - J)  
        #        with beta = W[’beta’][0], v = W[’v’][0], J = [1, 0; 0, -I].  
 
        # As = W3^-1 * [ 0 ; -A ] = 1/beta * ( 2*J*v * v’ - I ) * [0; A]  
        beta, v = W[’beta’][0], W[’v’][0]  
        As = 2 * v * (v[1:].T * A)  
        As[1:,:] *= -1.0  
        As[1:,:] -= A  
        As /= beta  
 
        # S = As’*As + 4 * (W1**2 + W2**2)**-1  
        S = As.T * As  
        d1, d2 = W[’d’][:n], W[’d’][n:]  
        d = 4.0 * (d1**2 + d2**2)**-1  
        S[::n+1] += d  
        lapack.potrf(S)  
 
        def f(x, y, z):  
 
            # z := - W**-T * z  
            z[:n] = -div( z[:n], d1 )  
            z[n:2*n] = -div( z[n:2*n], d2 )  
            z[2*n:] -= 2.0*v*( v[0]*z[2*n] - blas.dot(v[1:], z[2*n+1:]) )  
            z[2*n+1:] *= -1.0  
            z[2*n:] /= beta  
 
            # x := x - G’ * W**-1 * z  
            x[:n] -= div(z[:n], d1) - div(z[n:2*n], d2) + As.T * z[-(m+1):]  
            x[n:] += div(z[:n], d1) + div(z[n:2*n], d2)  
 
            # Solve for x[:n]:  
            #  
            #    S*x[:n] = x[:n] - (W1**2 - W2**2)(W1**2 + W2**2)^-1 * x[n:]  
 
            x[:n] -= mul( div(d1**2 - d2**2, d1**2 + d2**2), x[n:])  
            lapack.potrs(S, x)  
 
            # Solve for x[n:]:  
            #  
            #    (d1**-2 + d2**-2) * x[n:] = x[n:] + (d1**-2 - d2**-2)*x[:n]  
 
            x[n:] += mul( d1**-2 - d2**-2, x[:n])  
            x[n:] = div( x[n:], d1**-2 + d2**-2)  
 
            # z := z + W^-T * G*x  
            z[:n] += div( x[:n] - x[n:2*n], d1)  
            z[n:2*n] += div( -x[:n] - x[n:2*n], d2)  
            z[2*n:] += As*x[:n]  
 
        return f  
 
    dims = {’l’: 2*n, ’q’: [m+1], ’s’: []}  
    sol = solvers.conelp(c, G, h, dims, kktsolver = Fkkt)  
    if sol[’status’] == ’optimal’:  
        return sol[’x’][:n],  sol[’z’][-m:]  
    else:  
        return None, None

Example: 1-norm regularized least-squares
As an example that illustrates how structure can be exploited in coneqp(), we consider the 1-norm regularized least-squares problem
minimize  ∥Ax - y∥22 + ∥x∥1

with variable x. The problem is equivalent to the quadratic program

minimize  ∥Ax - y∥22 + 1T u
subject to - u ≼ x ≼ u

with variables x and u. The implementation below is efficient when A has many more columns than rows.

from cvxopt.base import matrix, spdiag, mul, div  
from cvxopt import base, blas, lapack, solvers  
import math  
 
def l1regls(A, y):  
    """  
 
    Returns the solution of l1-norm regularized least-squares problem  
 
        minimize || A*x - y ||_2^2  + || x ||_1.  
 
    """  
 
    m, n = A.size  
    q = matrix(1.0, (2*n,1))  
    q[:n] = -2.0 * A.T * y  
 
    def P(u, v, alpha = 1.0, beta = 0.0 ):  
        """  
            v := alpha * 2.0 * [ A’*A, 0; 0, 0 ] * u + beta * v  
        """  
        v *= beta  
        v[:n] += alpha * 2.0 * A.T * (A * u[:n])  
 
 
    def G(u, v, alpha=1.0, beta=0.0, trans=’N’):  
        """  
            v := alpha*[I, -I; -I, -I] * u + beta * v  (trans = ’N’ or ’T’)  
        """  
 
        v *= beta  
        v[:n] += alpha*(u[:n] - u[n:])  
        v[n:] += alpha*(-u[:n] - u[n:])  
 
    h = matrix(0.0, (2*n,1))  
 
 
    # Customized solver for the KKT system  
    #  
    #     [  2.0*A’*A  0    I      -I     ] [x[:n] ]     [bx[:n] ]  
    #     [  0         0   -I      -I     ] [x[n:] ]  =  [bx[n:] ].  
    #     [  I        -I   -D1^-1   0     ] [zl[:n]]     [bzl[:n]]  
    #     [ -I        -I    0      -D2^-1 ] [zl[n:]]     [bzl[n:]]  
    #  
    # where D1 = W[’di’][:n]**2, D2 = W[’di’][:n]**2.  
    #  
    # We first eliminate zl and x[n:]:  
    #  
    #     ( 2*A’*A + 4*D1*D2*(D1+D2)^-1 ) * x[:n] =  
    #         bx[:n] - (D2-D1)*(D1+D2)^-1 * bx[n:] +  
    #         D1 * ( I + (D2-D1)*(D1+D2)^-1 ) * bzl[:n] -  
    #         D2 * ( I - (D2-D1)*(D1+D2)^-1 ) * bzl[n:]  
    #  
    #     x[n:] = (D1+D2)^-1 * ( bx[n:] - D1*bzl[:n]  - D2*bzl[n:] )  
    #         - (D2-D1)*(D1+D2)^-1 * x[:n]  
    #  
    #     zl[:n] = D1 * ( x[:n] - x[n:] - bzl[:n] )  
    #     zl[n:] = D2 * (-x[:n] - x[n:] - bzl[n:] ).  
    #  
    # The first equation has the form  
    #  
    #     (A’*A + D)*x[:n]  =  rhs  
    #  
    # and is equivalent to  
    #  
    #     [ D    A’ ] [ x:n] ]  = [ rhs ]  
    #     [ A   -I  ] [ v    ]    [ 0   ].  
    #  
    # It can be solved as  
    #  
    #     ( A*D^-1*A’ + I ) * v = A * D^-1 * rhs  
    #     x[:n] = D^-1 * ( rhs - A’*v ).  
 
    S = matrix(0.0, (m,m))  
    Asc = matrix(0.0, (m,n))  
    v = matrix(0.0, (m,1))  
 
    def Fkkt(W):  
 
        # Factor  
        #  
        #     S = A*D^-1*A’ + I  
        #  
        # where D = 2*D1*D2*(D1+D2)^-1, D1 = d[:n]**-2, D2 = d[n:]**-2.  
 
        d1, d2 = W[’di’][:n]**2, W[’di’][n:]**2  
 
        # ds is square root of diagonal of D  
        ds = math.sqrt(2.0) * div( mul( W[’di’][:n], W[’di’][n:]),  
            base.sqrt(d1+d2) )  
        d3 =  div(d2 - d1, d1 + d2)  
 
        # Asc = A*diag(d)^-1/2  
        Asc = A * spdiag(ds**-1)  
 
        # S = I + A * D^-1 * A’  
        blas.syrk(Asc, S)  
        S[::m+1] += 1.0  
        lapack.potrf(S)  
 
        def g(x, y, z):  
 
            x[:n] = 0.5 * ( x[:n] - mul(d3, x[n:]) +  
                mul(d1, z[:n] + mul(d3, z[:n])) - mul(d2, z[n:] -  
                mul(d3, z[n:])) )  
            x[:n] = div( x[:n], ds)  
 
            # Solve  
            #  
            #     S * v = 0.5 * A * D^-1 * ( bx[:n] -  
            #         (D2-D1)*(D1+D2)^-1 * bx[n:] +  
            #         D1 * ( I + (D2-D1)*(D1+D2)^-1 ) * bzl[:n] -  
            #         D2 * ( I - (D2-D1)*(D1+D2)^-1 ) * bzl[n:] )  
 
            blas.gemv(Asc, x, v)  
            lapack.potrs(S, v)  
 
            # x[:n] = D^-1 * ( rhs - A’*v ).  
            blas.gemv(Asc, v, x, alpha=-1.0, beta=1.0, trans=’T’)  
            x[:n] = div(x[:n], ds)  
 
            # x[n:] = (D1+D2)^-1 * ( bx[n:] - D1*bzl[:n]  - D2*bzl[n:] )  
            #         - (D2-D1)*(D1+D2)^-1 * x[:n]  
            x[n:] = div( x[n:] - mul(d1, z[:n]) - mul(d2, z[n:]), d1+d2 )\  
                - mul( d3, x[:n] )  
 
            # zl[:n] = D1^1/2 * (  x[:n] - x[n:] - bzl[:n] )  
            # zl[n:] = D2^1/2 * ( -x[:n] - x[n:] - bzl[n:] ).  
            z[:n] = mul( W[’di’][:n],  x[:n] - x[n:] - z[:n] )  
            z[n:] = mul( W[’di’][n:], -x[:n] - x[n:] - z[n:] )  
 
        return g  
 
    return solvers.coneqp(P, q, G, h, kktsolver = Fkkt)[’x’][:n]