X. Conjugacy in Convex Analysis
Prerequisites. Definitions, properties and operations concerning convex sets (Chap. III) and convex functions (Chap. IV); definition of sublinear functions and associated convex sets (Chap. V); definition of the subdifferential of a finite convex function (§VI.1). One-dimensional conjugacy (§I.6) can be helpful but is not necessary.
Introduction. In classical real analysis, the gradient of a differentiable function f : R^n → R plays a key role - to say the least. Considering this gradient as a mapping x ↦ s(x) = ∇f(x) from (some subset X of) R^n to (some subset S of) R^n, an interesting object is then its inverse: to a given s ∈ S, associate the x ∈ X such that s = ∇f(x). This question may be meaningless: not all mappings are invertible! but could for example be considered locally, taking for X × S a neighborhood of some (x_0, s_0 = ∇f(x_0)), with ∇²f continuous and invertible at x_0 (use the local inverse theorem).
Let us skip for the moment all such technicalities. Geometrically, we want to find x such that the hyperplane in R^n × R defined by the given (s, −1), and passing through (x, f(x)), is tangent to gr f at x; the problem is meaningful when this x exists and is unique. Its construction is rather involved but analytically, an amazing fact is that the new mapping x(·) = (∇f)^{-1} thus defined is itself a gradient mapping: say x(·) = ∇h, with h : S ⊂ R^n → R. Even more surprising: this function h has a simple expression, namely
S ∋ s ↦ h(s) = (s, x(s)) − f(x(s)). (0.1)
To explain this, do a formal a posteriori calculation in (0.1): a differential ds induces the differentials dx and dh, which are linked by the relation
dh = (s, dx) + (ds, x) − (∇f(x), dx) = (s, dx) + (ds, x) − (s, dx) = (ds, x)
and this defines x as the gradient of h with respect to s.
Remark 0.1 When ∇f is invertible, the function h of (0.1) is called the Legendre transform (relative to f). It should never be forgotten that, from the above motivation itself, it is not really the function h which is primarily interesting, but rather its gradient ∇h. □
J.-B. Hiriart-Urruty et al., Convex Analysis and Minimization Algorithms II, © Springer-Verlag Berlin Heidelberg 1993
The gradients of f and of h are inverse to each other by definition, so they establish a reciprocal correspondence:
s = ∇f(x) ⟺ x = ∇h(s). (0.2)
In particular, applying the Legendre transform to h, we have to get back f. This symmetry appears in the expression of h itself: (0.1) tells us that, for x and s related by (0.2),
f(x) + h(s) = (s, x) .
Once again, the above statements are rather formal, insofar as they implicitly assume that (∇f)^{-1} is well-defined. Convex analysis, however, provides a nice framework to give this last operation a precise meaning.
First of all, observe that the mapping x ↦ ∇f(x) is now replaced by a set-valued mapping x ↦ ∂f(x) - see Chap. VI. To invert it is to find x such that ∂f(x) contains a given s; and we can accept a nonunique such x: a set-valued (∂f)^{-1} will be obtained, but the price has been already paid anyway.
Second, the construction of x(s) is now much simpler: s ∈ ∂f(x) means that 0 ∈ ∂f(x) − {s} and, thanks to convexity, the last property means that x minimizes f − (s, ·) over R^n. In other words, to find x(s), we have to solve
inf {f(x) − (s, x) : x ∈ R^n}; (0.3)
and the Legendre transform - in the classical sense of the term - is well-defined when this problem has a unique solution.
Let us sum up: if f is convex, (0.3) is a possible way of defining the Legendre transform when it exists unambiguously. It is easy to see that the latter holds when f satisfies three properties:
- differentiability - so that there is something to invert;
- strict convexity - to have uniqueness in (0.3);
- ∇f(R^n) = R^n - so that (0.3) does have a solution for all s ∈ R^n; this latter property essentially means that, when ‖x‖ → ∞, f(x) increases faster than any linear function: f is 1-coercive.
In all other cases, there is no well-defined Legendre transform; but then, the transformation implied by (0.3) can be taken as a new definition, generalizing the initial inversion of ∇f. We can even extend this definition to nonconvex f, namely to any function such that (0.3) is meaningful! Finally, an important observation is that the infimal value in (0.3) is a concave, and even closed, function of s; this is Proposition IV.2.1.2: the infimand is affine in s, f has little importance, R^n is nothing more than an index set.
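As a concrete numerical sketch (ours, not the book's; all names are illustrative), the transformation implied by (0.3) can be approximated by replacing the infimum over R^n with a search over a finite grid, for a function satisfying the three properties listed above:

```python
def conjugate(f, xs):
    """Discrete stand-in for the transform: s -> max over the grid xs
    of s*x - f(x), i.e. minus the infimal value of problem (0.3)."""
    def f_star(s):
        return max(s * x - f(x) for x in xs)
    return f_star

# f(x) = x^4 is differentiable, strictly convex, and increases faster
# than any linear function, so the Legendre transform is unambiguous:
# the sup is attained at the unique x with f'(x) = 4 x^3 = s.
f = lambda x: x ** 4
xs = [i / 1000.0 for i in range(-3000, 3001)]   # grid on [-3, 3]
f_star = conjugate(f, xs)

# At s = 4 the contact point is x(s) = 1, giving f*(4) = 4*1 - 1 = 3.
print(f_star(4.0))
```

On this grid the maximizer x = 1 is hit exactly, so the printed value agrees with the formula (0.1): h(s) = (s, x(s)) − f(x(s)).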
The concept of conjugacy in convex analysis results from all the observations above. It is often useful to simplify algebraic calculations; it plays an important role in deriving duality schemes for convex minimization problems; it is also a basic operation for formulating variational principles in optimization (convex or not), with applications in other areas of applied mathematics, such as probability, statistics, nonlinear elasticity, economics, etc.
1 The Convex Conjugate of a Function
1.1 Definition and First Examples
As suggested by (0.3), conjugating a function f essentially amounts to minimizing a perturbation of it. There are two degenerate situations we want to avoid:
- the result is +∞ for some s; observe that this is the case if and only if the result is +∞ for all s;
- the result is −∞ for all s.
Towards this end, we assume throughout that f : R^n → R ∪ {+∞} (not necessarily convex) satisfies
f ≢ +∞, and there is an affine function minorizing f on R^n. (1.1.1)
Note in particular that this implies f(x) > −∞ for all x. As usual, we use the notation dom f := {x : f(x) < +∞} ≠ ∅. We know from Proposition IV.1.2.1 that (1.1.1) holds for example if f ∈ Conv R^n.
Definition 1.1.1 The conjugate of a function f satisfying (1.1.1) is the function f* defined by
R^n ∋ s ↦ f*(s) := sup {(s, x) − f(x) : x ∈ dom f}. (1.1.2)
For simplicity, we may also let x run over the whole space instead of dom f. The mapping f ↦ f* will often be called the conjugacy operation, or the Legendre-Fenchel transform. □
A very first observation is that a conjugate function is associated with a scalar product on R^n. Of course, note also with relation to (0.3) that
f*(s) = − inf {f(x) − (s, x) : x ∈ dom f}.
As an immediate consequence of (1.1.2), we have for all (x, s) ∈ dom f × R^n
f*(s) + f(x) ≥ (s, x). (1.1.3)
Furthermore, this inequality is obviously true if x ∉ dom f: it does hold for all (x, s) ∈ R^n × R^n and is called Fenchel's inequality. Another observation is that f*(s) > −∞ for all s ∈ R^n; also, if x ↦ (s_0, x) + r_0 is an affine function smaller than f, we have
−f*(s_0) = inf_x [f(x) − (s_0, x)] ≥ r_0, i.e. f*(s_0) ≤ −r_0 < +∞.
In a word, f* satisfies (1.1.1).
Thus, dom f* - the set where f* is finite - is the set of slopes of all the possible affine functions minorizing f over R^n. Likewise, any x ∈ dom f is the slope of an affine function smaller than f*.
Theorem 1.1.2 For f satisfying (1.1.1), the conjugate f* is a closed convex function: f* ∈ Conv R^n.
PROOF. See Example IV.2.1.3. □
Example 1.1.3 (Convex Quadratic Functions) Let Q be a symmetric positive definite linear operator on R^n, b ∈ R^n and consider
f(x) := ½(x, Qx) + (b, x) for all x ∈ R^n. (1.1.4)
A straightforward calculation gives the optimal x = Q^{-1}(s − b) in the defining problem (1.1.2), and the resulting f* is
f*(s) = ½(s − b, Q^{-1}(s − b)) for all s ∈ R^n.
In particular, the function ½‖·‖² is its own conjugate.
Needless to say, the Legendre transformation is present here, in a particularly simple setting: ∇f is the affine mapping x ↦ Qx + b, its inverse is the affine mapping s ↦ Q^{-1}(s − b), the gradient of f*. What we have done is a parametrization of R^n via the change of variable s = ∇f(x); with respect to the new variable, f is given by
f(x(s)) = ½(Q^{-1}(s − b), s − b) + (b, Q^{-1}(s − b)).
Adding f*(s) on both sides gives
f(x(s)) + f*(s) = (Q^{-1}(s − b), s) = (x(s), s),
which illustrates (0.1). Taking b = 0 and applying Fenchel's inequality (1.1.3) gives
(x, Qx) + (s, Q^{-1}s) ≥ 2(s, x) for all (x, s),
a generalization of the well-known inequality (obtained for Q = cI, c > 0):
c‖x‖² + (1/c)‖s‖² ≥ 2(s, x) for all c > 0 and x, s ∈ R^n. □
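Example 1.1.3 is easy to verify in coordinates. Here is a small sketch (ours; the 2×2 matrix is an arbitrary positive definite choice) checking the conjugate formula and the equality f(x) + f*(s) = (s, x) at the contact point x = Q^{-1}(s − b):

```python
# A 2x2 instance of Example 1.1.3: f(x) = 1/2 (x, Qx) + (b, x),
# whose conjugate is f*(s) = 1/2 (s - b, Q^{-1}(s - b)).
Q = [[2.0, 1.0], [1.0, 3.0]]          # symmetric, positive definite
b = [1.0, -1.0]

def inv2(M):
    """Inverse of a 2x2 matrix, written out by hand."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[ M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det,  M[0][0] / det]]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def f(x):
    return 0.5 * dot(x, matvec(Q, x)) + dot(b, x)

def f_star(s):
    d = [si - bi for si, bi in zip(s, b)]
    return 0.5 * dot(d, matvec(inv2(Q), d))

# Optimal x in (1.1.2) is x = Q^{-1}(s - b); there f + f* = (s, x),
# illustrating (0.1).
s = [3.0, 2.0]
x = matvec(inv2(Q), [si - bi for si, bi in zip(s, b)])
print(abs(f(x) + f_star(s) - dot(s, x)) < 1e-12)
```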
When Q in the above example is merely positive semi-definite, a meaning can still be given to (∇f)^{-1} provided that two problems are taken care of: first, s must be restricted to ∇f(R^n), which is the affine manifold b + Im Q; second, ∇f(x + y) = ∇f(x) for all y ∈ Ker Q, so (∇f)^{-1}(s) can be defined only up to the subspace Ker Q. Using the definition of the conjugate function, we obtain the following:
Example 1.1.4 (Convex Degenerate Quadratic Functions) Take the convex quadratic function (1.1.4), but with Q symmetric positive semi-definite. The supremum in (1.1.2) is finite only if s − b ∈ (Ker Q)⊥, i.e. s − b ∈ Im Q; it is attained at an x such that s − ∇f(x) = 0 (optimality condition VI.2.2.1). In a word, we obtain
f*(s) = { +∞ if s ∉ b + Im Q, ½(x, s − b) otherwise,
1 The Convex Conjugate of a Function 39
where x is any element satisfying Qx + b = s. This formulation can be condensed to
f*(Qx + b) = ½(x, Qx) for all x ∈ R^n,
which is one more illustration of (0.1): add f(x) to both sides and obtain
f(x) + f*(Qx + b) = (x, Qx + b) = (x, ∇f(x)).
It is also interesting to express f* in terms of a pseudo-inverse: for example, Q⁻ denoting the Moore-Penrose pseudo-inverse of Q (see §A.3.4), f* can be written
f*(s) = { +∞ if s ∉ b + Im Q, ½(s − b, Q⁻(s − b)) if s ∈ b + Im Q. □
In the above example, take b = 0 and let Q = P_H be the orthogonal projection onto a subspace H of R^n. Then Im Q = H and Q⁻ = Q, so the conjugate of x ↦ ½(x, P_H x) is
f*(s) = { +∞ if s ∉ H, ½‖s‖² if s ∈ H.
Another interesting example is when Q is a rank-one operator, i.e. Q = uuᵀ, with 0 ≠ u ∈ R^n (we assume the usual dot-product for (·, ·)). Then Im Q = Ru and, for x ∈ Im Q, Qx = ‖u‖²x. Therefore, the conjugate of x ↦ ½(x, uuᵀx) is
f*(s) = { ‖s‖² / (2‖u‖²) if s and u are collinear, +∞ otherwise.
Example 1.1.5 Let I_C be the indicator function of a nonempty set C ⊂ R^n. Then
(I_C)*(s) = sup_{x ∈ dom I_C} [(s, x) − I_C(x)] = sup_{x ∈ C} (s, x)
is just the support function of C. If C is a closed convex cone, we conclude from Example V.2.3.1 that (I_C)* is the indicator of its polar C°. If C is a subspace, (I_C)* is the indicator of its orthogonal C⊥.
If C = R^n, C° = {0}: I_C ≡ 0 and the conjugate of I_C is 0 at 0, +∞ elsewhere (this is the indicator of {0}, or the support of R^n). A similar example is the nonconvex function x ↦ f(x) = |||x|||^{1/2}, where |||·||| is some norm. Then f*(0) = 0; but if s ≠ 0, take x of the form x = ts to realize that
f*(s) ≥ sup_{t ≥ 0} [t‖s‖² − √t |||s|||^{1/2}] = +∞.
In other words, f* is still the indicator function of {0}. The conjugacy operation has ignored the difference between f and the zero function, simply because f increases at infinity more slowly than any linear function. □
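The first computation of Example 1.1.5 amounts to maximizing a linear form over C. A small sketch (ours; the set and names are illustrative), with C the square [−1, 1]², whose support function is (s_1, s_2) ↦ |s_1| + |s_2|:

```python
import itertools

def indicator_conjugate(s, points):
    # (I_C)*(s) = sup over C of (s, x), since I_C vanishes on C;
    # here C is replaced by a finite sample of points.
    return max(s[0] * x[0] + s[1] * x[1] for x in points)

grid = [i / 10.0 for i in range(-10, 11)]
C = list(itertools.product(grid, grid))   # sample of the square [-1,1]^2

s = (2.0, -3.0)
print(indicator_conjugate(s, C))   # 5.0 = |2| + |-3|
```

The maximum is attained at the vertex (1, −1), the face of C exposed by s, in line with the geometric interpretations of §1.2 below.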
1.2 Interpretations
Geometrically, the computation of f* can be illustrated in the graph-space R^n × R. For given s ∈ R^n, consider the family of affine functions x ↦ (s, x) − r, parametrized by r ∈ R. They correspond to affine hyperplanes H_{s,r} orthogonal to (s, −1) ∈ R^{n+1}; see Fig. 1.2.1. From (1.1.1), H_{s,r} is below gr f whenever (s, r) is properly chosen, namely s ∈ dom f* and r is large enough. To construct f*(s), we lift H_{s,r} as much as possible subject to supporting gr f. Then, admitting that there is contact at some (x, f(x)), we write
(s, x) − r = f(x), or rather r = (s, x) − f(x),
to see that r = f*(s). This means that the best H_{s,r} intersects the vertical axis {0} × R at the altitude −f*(s).
Fig. 1.2.1. Computation of f* in the graph-space
Naturally, the horizontal hyperplanes H_{0,r} correspond to minimizing f:
−f*(0) = inf {f(x) : x ∈ R^n}.
The picture illustrates another definition of f*: being normal to H_{s,r}, the vector (s, −1) is also normal to gr f (more precisely to epi f) at the contact when it exists.
Proposition 1.2.1 There holds for all s ∈ R^n
f*(s) = σ_{epi f}(s, −1) = sup {(s, x) − r : (x, r) ∈ epi f}. (1.2.1)
It follows that the support function of epi f has the expression
σ_{epi f}(s, −u) = { u f*(s/u) if u > 0; σ_{dom f}(s) if u = 0; +∞ if u < 0. (1.2.2)
PROOF. In (1.2.1), the right-most term can be written
sup_x sup_{r ≥ f(x)} [(s, x) − r] = sup_x [(s, x) − f(x)]
and the first equality is established. As for (1.2.2), the case u < 0 is trivial; when u > 0, use the positive homogeneity of support functions to get
σ_{epi f}(s, −u) = u σ_{epi f}(s/u, −1) = u f*(s/u);
finally, for u = 0, we have by definition
σ_{epi f}(s, 0) = sup {(s, x) : (x, r) ∈ epi f for some r ∈ R},
and we recognize σ_{dom f}(s). □
Assume f ∈ Conv R^n. This result, illustrated in Fig. 1.2.1, confirms that the contact-set between the optimal hyperplane and epi f is the face of epi f exposed by the given (s, −1). From (1.2.2), we see that σ_{epi f} and the perspective-function of f* (§IV.2.2) coincide for u ≠ 0 - up to the change of variable u ↦ −u. As a closed function, σ_{epi f} therefore coincides (still up to the change of sign in u) with the closure of the perspective of f*. As for u = 0, we obtain a relation which will be used on several occasions:
Proposition 1.2.2 For f ∈ Conv R^n,
σ_{dom f}(s) = σ_{epi f}(s, 0) = (f*)'∞(s) for all s ∈ R^n. (1.2.3)
PROOF. Use direct calculations; or see Proposition IV.2.2.2 and the calculations in Example IV.3.2.4. □
The additional variable u introduced in Proposition 1.2.1 gives a second geometric construction: Fig. 1.2.1 plays the role of Fig. V.2.1.1, knowing that R^n is replaced here by R^{n+1}. Here again, note in passing that the closed convex hull of epi f gives the same f*. Now, playing the same game as we did for Interpretation V.2.1.6, f* can be looked at from the point of view of projective geometry, R^n being identified with R^n × {−1}.
Consider the set epi f × {−1} ⊂ R^n × R × R, i.e. the copy of epi f, translated down vertically by one unit. It generates a cone K_f ⊂ R^n × R × R:
K_f := {t(x, r, −1) : t > 0, (x, r) ∈ epi f}. (1.2.4)
Now take the polar cone (K_f)° ⊂ R^n × R × R. We know from Interpretation V.2.1.6 that it is the epigraph (in R^{n+1} × R!) of the support function of epi f. In view of Proposition 1.2.1, its intersection with R^n × {−1} × R is therefore epi f*. A short calculation confirms this:
(K_f)° = {(s, α, β) ∈ R^n × R × R : t(s, x) + tαr − tβ ≤ 0 for all (x, r) ∈ epi f and t > 0}
= {(s, α, β) : (s, x) + αr ≤ β for all (x, r) ∈ epi f}.
Imposing α = −1, we just obtain the translated epigraph of the function described in (1.2.1).
Fig. 1.2.2. A projective view of conjugacy
Figure 1.2.2 illustrates the construction, with n = 1 and epi f drawn horizontally; this epi f plays the role of S in Fig. V.2.1.2. Note once more the symmetry: K_f [resp. (K_f)°] is defined in such a way that its intersection with the hyperplane R^n × R × {−1} [resp. R^n × {−1} × R] is just epi f [resp. epi f*].
Finally, we mention a simple economic interpretation. Suppose R^n is a set of goods and its dual (R^n)* a set of prices: to produce the goods x costs f(x), and to sell x brings an income (s, x). The net benefit associated with x is then (s, x) − f(x), whose supremal value f*(s) is the best possible profit, resulting from the given set of unit prices s. Incidentally, this last interpretation opens the way to nonlinear conjugacy, in which the selling price would be a nonlinear (but concave) function of x.
Remark 1.2.3 This last interpretation confirms the warning already made on several occasions: R^n should not be confused with its dual; the arguments of f and of f* are not comparable, an expression like x + s is meaningless (until an isomorphism is established between the Euclidean space R^n and its dual).
On the other hand, f-values and f*-values are comparable, indeed: for example, they can be added to each other - which is just done by Fenchel's inequality (1.1.3)! This is explicitly due to the particular value "−1" in (1.2.1), which goes together with the "−1" of (1.2.4). □
1.3 First Properties
Some properties of the conjugacy operation f ↦ f* come directly from its definition.
(a) Elementary Calculus Rules Direct arguments prove easily a first result:
Proposition 1.3.1 The functions f, f_j appearing below are assumed to satisfy (1.1.1).
(i) The conjugate of the function g(x) := f(x) + α is g*(s) = f*(s) − α.
(ii) With α > 0, the conjugate of the function g(x) := αf(x) is g*(s) = αf*(s/α).
(iii) With α ≠ 0, the conjugate of the function g(x) := f(αx) is g*(s) = f*(s/α).
(iv) More generally: if A is an invertible linear operator, (f ∘ A)* = f* ∘ (A^{-1})*.
(v) The conjugate of the function g(x) := f(x − x_0) is g*(s) = f*(s) + (s, x_0).
(vi) The conjugate of the function g(x) := f(x) + (s_0, x) is g*(s) = f*(s − s_0).
(vii) If f_1 ≤ f_2, then f_1* ≥ f_2*.
(viii) "Convexity" of the conjugation: if dom f_1 ∩ dom f_2 ≠ ∅ and α ∈ ]0,1[,
[αf_1 + (1 − α)f_2]* ≤ αf_1* + (1 − α)f_2*;
(ix) The Legendre-Fenchel transform preserves decomposition: with
R^n := R^{n_1} × ··· × R^{n_m} ∋ x ↦ f(x) := Σ_{j=1}^m f_j(x_j)
and assuming that R^n has the scalar product of a product-space,
f*(s_1, ..., s_m) = Σ_{j=1}^m f_j*(s_j). □
Among these results, (iv) deserves comment: it gives the effect of a change of variables on the conjugate function; this is of interest for example when the scalar product is put in the form (x, y) = (Ax)ᵀ Ay, with A invertible. Using Example 1.1.3, an illustration of (vii) is: the only function f satisfying f = f* is ½‖·‖² (start from Fenchel's inequality); and also, for symmetric positive definite Q and P:
Q ≼ P ⟹ P^{-1} ≼ Q^{-1};
and an illustration of (viii) is:
[αQ + (1 − α)P]^{-1} ≼ αQ^{-1} + (1 − α)P^{-1}.
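Rules such as (ii) and (v) of Proposition 1.3.1 can be spot-checked numerically. The sketch below (ours; grid bounds and test values are arbitrary) again uses f = exp with its closed-form conjugate f*(s) = s log s − s, and a crude grid maximization for the left-hand sides:

```python
import math

def conj(g):
    # crude grid approximation of sup_x [s x - g(x)]
    xs = [x / 1000.0 for x in range(-20000, 5001)]   # grid on [-20, 5]
    return lambda s: max(s * x - g(x) for x in xs)

f = math.exp
f_star = lambda s: s * math.log(s) - s   # conjugate of exp, for s > 0

x0, alpha, s = 0.5, 2.0, 1.5

g_v = conj(lambda x: f(x - x0))          # rule (v): g(x) = f(x - x0)
assert abs(g_v(s) - (f_star(s) + s * x0)) < 1e-2

g_ii = conj(lambda x: alpha * f(x))      # rule (ii): g = alpha * f
assert abs(g_ii(s) - alpha * f_star(s / alpha)) < 1e-2
print("rules (v) and (ii) of Proposition 1.3.1 check out numerically")
```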
Our next result expresses how the conjugate is transformed, when the starting function is restricted to a subspace.
Proposition 1.3.2 Let f satisfy (1.1.1), let H be a subspace of R^n, and call P_H the operator of orthogonal projection onto H. Suppose that there is a point in H where f is finite. Then f + I_H satisfies (1.1.1) and its conjugate is
(f + I_H)* = (f ∘ P_H)* ∘ P_H. (1.3.1)
PROOF. When y describes R^n, P_H y describes H, so we can write, knowing that P_H is symmetric:
(f + I_H)*(s) := sup {(s, x) − f(x) : x ∈ H}
= sup {(s, P_H y) − f(P_H y) : y ∈ R^n}
= sup {(P_H s, y) − f(P_H y) : y ∈ R^n}. □
When conjugating f + I_H, we disregard f outside H, i.e. we consider f as a function defined on the space H; this is reflected in the "f ∘ P_H"-part of (1.3.1). We then obtain a "partial" conjugate, say f̃* ∈ Conv H; the last "∘ P_H" of (1.3.1) says that, to recover the whole conjugate (f + I_H)*, which is a function of Conv R^n, we just have to translate horizontally the graph of f̃* along H⊥.
Remark 1.3.3 Thus, if a subspace H and a function g (standing for f + I_H) are such that dom g ⊂ H, then g*(· + s) = g* for all s ∈ H⊥. It is interesting to note that this property has a converse: if, for some x_0 ∈ dom g, we have g(x_0 + y) = g(x_0) for all y ∈ H, then dom g* ⊂ H⊥. The proof is immediate: take x_0 as above and, for s ∉ H⊥, take y_0 ∈ H such that (s, y_0) = a ≠ 0; there holds (s, x_0 + λy_0) − g(x_0 + λy_0) = (s, x_0) + λa − g(x_0) for all λ ∈ R; hence
g*(s) ≥ sup_λ [(s, x_0 + λy_0) − g(x_0 + λy_0)] = (s, x_0) + sup_λ [λa − g(x_0)] = +∞. □
The above formulae can be considered from a somewhat opposite point of view: suppose that R^n, the space on which our function f is defined, is embedded in some larger space R^{n+p}. Various corresponding extensions of f to the whole of R^{n+p} are then possible. One is to set f := +∞ outside R^n (which is often relevant when minimizing f), the other is the horizontal translation
f(x + y) = f(x) for all y ∈ R^p;
these two possibilities are, in a sense, dual to each other (see Proposition 1.3.4 below). Such a duality, however, holds only if the extended scalar product preserves the structure of R^n × R^p as a product-space: so is the case of the decomposition H + H⊥ appearing in Proposition 1.3.2, because
(s, x) = (P_H s, P_H x) + (P_{H⊥} s, P_{H⊥} x).
Considering affine manifolds instead of subspaces, we mention the following useful result:
Proposition 1.3.4 For f satisfying (1.1.1), let a subspace V contain the subspace parallel to aff dom f and set U := V⊥. For any z ∈ aff dom f and any s ∈ R^n decomposed as s = s_U + s_V, there holds
f*(s) = (s_U, z) + f*(s_V).
PROOF. In (1.1.2), the variable x can range through z + V ⊃ aff dom f:
f*(s) = sup_{v ∈ V} [(s_U + s_V, z + v) − f(z + v)]
= (s_U, z) + sup_{v ∈ V} [(s_V, z + v) − f(z + v)] = (s_U, z) + f*(s_V). □
(b) The Biconjugate of a Function What happens if we take the conjugate of f* again? Remember that f* automatically satisfies (1.1.1) if f does. We can therefore compute the biconjugate function of f: for all x ∈ R^n,
f**(x) := (f*)*(x) = sup {(s, x) − f*(s) : s ∈ R^n}. (1.3.2)
This operation appears as fundamental. The function f** thus defined is the "closed-convexification" of f, in the sense that its epigraph is the closed convex hull of epi f:
Theorem 1.3.5 For f satisfying (1.1.1), the function f** of (1.3.2) is the pointwise supremum of all the affine functions on R^n majorized by f. In other words
epi f** = c̄o (epi f). (1.3.3)
PROOF. Call Σ ⊂ R^n × R the set of pairs (s, r) defining affine functions x ↦ (s, x) − r majorized by f:
(s, r) ∈ Σ ⟺ f(x) ≥ (s, x) − r for all x ∈ R^n
⟺ r ≥ sup {(s, x) − f(x) : x ∈ R^n}
⟺ r ≥ f*(s) (and s ∈ dom f*!).
Then we obtain, for x ∈ R^n,
sup_{(s,r) ∈ Σ} [(s, x) − r] = sup {(s, x) − r : s ∈ dom f*, −r ≤ −f*(s)}
= sup {(s, x) − f*(s) : s ∈ dom f*} = f**(x).
Geometrically, the epigraphs of the affine functions associated with (s, r) ∈ Σ are the (non-vertical) closed half-spaces containing epi f. From §IV.2.5, the epigraph of their supremum is the closed convex hull of epi f, and this proves (1.3.3). □
Note: the biconjugate of an f ∈ Conv R^n is not exactly f itself but its closure: f** = cl f. Thanks to Theorem 1.3.5, the general notation
c̄o f := f** (1.3.4)
can be - and will be - used for a function simply satisfying (1.1.1); it reminds one more directly that f** is the closed convex hull of f:
f**(x) = sup_{r,s} {(s, x) − r : (s, y) − r ≤ f(y) for all y ∈ R^n}. (1.3.5)
Corollary 1.3.6 If g is a function satisfying c̄o f ≤ g ≤ f, then g* = f*. The function f is equal to its biconjugate f** if and only if f ∈ Conv R^n.
PROOF. Immediate. □
Thus, the conjugacy operation defines an involution on the set of closed convex functions. When applied to strictly convex quadratic functions, it corresponds to the inversion of symmetric positive definite operators (Example 1.1.3 with b = 0). When applied to indicators of closed convex cones, it corresponds to the polarity correspondence (Example 1.1.5 - note also in this example that the biconjugate of the square root of a norm is the zero-function). For general f ∈ Conv R^n, it has a geometric counterpart, also based on polarity, which is the correspondence illustrated in Fig. 1.3.1. Of course, this involution property implies a lot of symmetry, already alluded to, for pairs of conjugate functions.
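The involution can be watched numerically for a closed convex function. A sketch (ours; the grid of slopes is an arbitrary discretization): take f = exp, whose conjugate is known in closed form, and compute the biconjugate (1.3.2) by a crude maximization over sampled slopes:

```python
import math

f = math.exp
# conjugate of exp in closed form; only slopes s > 0 are sampled below
f_star = lambda s: s * math.log(s) - s

ss = [i / 100.0 for i in range(1, 5001)]             # slopes s in (0, 50]
f_bi = lambda x: max(s * x - f_star(s) for s in ss)  # f**(x), cf. (1.3.2)

for x in (-1.0, 0.0, 0.5, 2.0):
    assert abs(f_bi(x) - f(x)) < 1e-2                # f** = f here
print("the biconjugate recovers exp at the test points")
```

The supremum in (1.3.2) is attained at s = exp(x), which lies inside the sampled range for the chosen test points; that is why the grid approximation is accurate.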
Fig. 1.3.1. The *-involution: f ∈ Conv R^n ⇄ f* ∈ Conv R^n, the arrow being (·)* in both directions
(c) Conjugacy and Coercivity A basic question in (1.1.2) is whether the supremum is going to be +∞. This depends only on the behaviour of f at infinity, so we extend to non-convex situations the concepts seen in Definition IV.3.2.6:
Definition 1.3.7 A function f satisfying (1.1.1) is said to be 0-coercive [resp. 1-coercive] when
lim_{‖x‖→+∞} f(x) = +∞ [resp. lim_{‖x‖→+∞} f(x)/‖x‖ = +∞]. □
Proposition 1.3.8 If f satisfying (1.1.1) is 1-coercive, then f*(s) < +∞ for all s ∈ R^n.
PROOF. For given s, the 1-coercivity of f implies the existence of a number R such that
‖x‖ ≥ R ⟹ f(x) ≥ ‖x‖(‖s‖ + 1),
so that we have in (1.1.2)
(s, x) − f(x) ≤ −‖x‖ for all x such that ‖x‖ ≥ R,
hence
sup {(s, x) − f(x) : ‖x‖ ≥ R} ≤ −R.
On the other hand, (1.1.1) implies an upper bound
sup {(s, x) − f(x) : ‖x‖ ≤ R} ≤ M. □
For a converse to this property, we have the following:
Proposition 1.3.9 Let f satisfy (1.1.1). Then
(i) x_0 ∈ int dom f ⟹ f* − (x_0, ·) is 0-coercive;
(ii) in particular, if f is finite over R^n, then f* is 1-coercive.
PROOF. We know from (1.2.3) that σ_{dom f} = (f*)'∞ so, using Theorem V.2.2.3(iii), x_0 ∈ int dom f ⊂ int(co dom f) implies
(f*)'∞(s) − (x_0, s) > 0 for all s ≠ 0.
By virtue of Proposition IV.3.2.5, this means exactly that f* − (x_0, ·) has compact sublevel-sets; (i) is proved.
Then, as seen in Definition IV.3.2.6, 0-coercivity of f* − (x_0, ·) for all x_0 means 1-coercivity of f*. □
Piecing together, we see that the 1-coercivity of a function f implies that f* is finite everywhere, and this in turn implies 1-coercivity of c̄o f.
Remark 1.3.10 If we assume in particular f ∈ Conv R^n, (i) and (ii) become equivalences:
x_0 ∈ int dom f ⟺ f* − (x_0, ·) is 0-coercive,
dom f = R^n ⟺ f* is 1-coercive. □
1.4 Subdifferentials of Extended-Valued Functions
For a function f satisfying (1.1.1), consider the following set:
∂f(x) := {s ∈ R^n : f(y) ≥ f(x) + (s, y − x) for all y ∈ R^n}. (1.4.1)
When f happens to be convex and finite-valued, this is just the subdifferential of f at x, defined in VI.1.2.1; but (1.4.1) can be used in a much more general framework. We therefore keep the terminology subdifferential for the set of (1.4.1), and subgradients for its elements. Note here that ∂f(x) is empty if x ∉ dom f: take y ∈ dom f in (1.4.1).
Theorem 1.4.1 For f satisfying (1.1.1) and ∂f defined by (1.4.1), s ∈ ∂f(x) if and only if
f*(s) + f(x) − (s, x) = 0 (or ≤ 0). (1.4.2)
PROOF. To say that s is in the set (1.4.1) is to say that
(s, y) − f(y) ≤ (s, x) − f(x) for all y ∈ dom f,
i.e. f*(s) ≤ (s, x) − f(x);
but this is indeed an equality, in view of Fenchel's inequality (1.1.3). □
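The criterion (1.4.2) is easy to probe numerically. A sketch (ours; names and grids are illustrative) with f(x) = |x|: its conjugate is the indicator of [−1, 1], so the criterion recovers the well-known subdifferential ∂f(0) = [−1, 1]:

```python
f = abs
xs = [i / 1000.0 for i in range(-5000, 5001)]   # grid on [-5, 5]

def f_star(s):
    # grid version of (1.1.2); equals 0 for |s| <= 1, grows otherwise
    return max(s * x - f(x) for x in xs)

def in_subdiff(s, x, tol=1e-9):
    # s in the subdifferential at x  <=>  f*(s) + f(x) - s*x <= 0,
    # which is criterion (1.4.2)
    return f_star(s) + f(x) - s * x <= tol

print(in_subdiff(0.5, 0.0), in_subdiff(1.5, 0.0))   # inside / outside [-1, 1]
```

Only the finite grid limits the accuracy: for |s| > 1 the true supremum is +∞, but even its truncated value already violates (1.4.2).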
As before, ∂f(x) is closed and convex: it is a sublevel-set of the closed convex function f* − (·, x), namely at the level −f(x). A subgradient of f at x is the slope of an affine function minorizing f and coinciding with f at x; ∂f(x) can therefore be empty: for example if epi f has a vertical tangent hyperplane at (x, f(x)); or also if f is not convex-like near x.
Theorem 1.4.2 Let f ∈ Conv R^n. Then ∂f(x) ≠ ∅ whenever x ∈ ri dom f.
PROOF. This is Proposition IV.1.2.1. □
When ∂f(x) ≠ ∅, we obtain a particular relationship at x between f and its convexified version c̄o f:
Proposition 1.4.3 For f satisfying (1.1.1), the following properties hold:
∂f(x) ≠ ∅ ⟹ (c̄o f)(x) = f(x); (1.4.3)
c̄o f ≤ g ≤ f and g(x) = f(x) ⟹ ∂g(x) = ∂f(x); (1.4.4)
s ∈ ∂f(x) ⟹ x ∈ ∂f*(s). (1.4.5)
PROOF. Let s be a subgradient of f at x. From the definition (1.4.1) itself, the function y ↦ ℓ_s(y) := f(x) + (s, y − x) is affine and minorizes f, hence ℓ_s ≤ c̄o f ≤ f; because ℓ_s(x) = f(x), this implies (1.4.3).
Now, s ∈ ∂f(x) if and only if
f*(s) + f(x) − (s, x) = 0.
From our assumption, f* = g* = (c̄o f)* (see Corollary 1.3.6) and g(x) = f(x); the above equality can therefore be written
g*(s) + g(x) − (s, x) = 0,
which expresses exactly that s ∈ ∂g(x), and (1.4.4) is proved.
Finally, we know that f** = c̄o f ≤ f; so, when s satisfies (1.4.2), we have
f*(s) + f**(x) − (s, x) = f*(s) + (c̄o f)(x) − (s, x) ≤ 0,
which means x ∈ ∂f*(s): we have just proved (1.4.5). □
Among the consequences of (1.4.3), we note the following sufficient condition for convexity: if ∂f(x) is nonempty for all x ∈ R^n, then f is convex and finite-valued on R^n. Another consequence is important:
Corollary 1.4.4 If f ∈ Conv R^n, the following equivalences hold:
f(x) + f*(s) − (s, x) = 0 (or ≤ 0) ⟺ s ∈ ∂f(x) ⟺ x ∈ ∂f*(s).
PROOF. This is a rewriting of Theorem 1.4.1, taking into account (1.4.5) and the symmetric role played by f and f* when f ∈ Conv R^n. □
If s ∈ ∂f(x) (which in particular implies x ∈ dom f), the property s ∈ dom f* comes immediately; beware that, conversely, dom f* is not entirely covered by ∂f(R^n): take f(x) = exp x, for which f*(0) = 0; but 0 is only in the closure of f'(R).
Even though it is attached to some designated x, the concept of subdifferential, as defined by (1.4.1), is global, in the sense that it uses the values of f on the whole of R^n. For example,
inf {f(x) : x ∈ R^n} = −f*(0)
and obviously
x minimizes f satisfying (1.1.1) ⟺ 0 ∈ ∂f(x).
Then a consequence of Corollary 1.4.4 is:
Argmin_{x ∈ R^n} f(x) = ∂f*(0) if f ∈ Conv R^n. (1.4.6)
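The relation inf f = −f*(0) in a two-line numerical sketch (ours; the function is an arbitrary smooth convex choice): f(x) = e^x − x is convex with unique minimizer x = 0 and minimal value f(0) = 1:

```python
import math

f = lambda x: math.exp(x) - x                 # convex, Argmin f = {0}
xs = [i / 1000.0 for i in range(-5000, 5001)] # grid on [-5, 5]

f_star_at_0 = max(-f(x) for x in xs)          # f*(0) = sup_x [(0, x) - f(x)]
print(-f_star_at_0, min(f(x) for x in xs))    # both equal inf f = 1.0
```

On the same grid the two quantities coincide identically, which is exactly the identity inf f = −f*(0); the minimizer x = 0 is likewise the unique element of ∂f*(0), in line with (1.4.6).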
1.5 Convexification and Subdifferentiability
For a function f satisfying (1.1.1), the biconjugate defined in §1.3(b) is important in optimization. First of all, minimizing f or c̄o f is almost the same problem:
- As a consequence of Corollary 1.3.6, we have the equality in R ∪ {−∞}
inf {f(x) : x ∈ R^n} = inf {(c̄o f)(x) : x ∈ R^n}.
- We also have from Proposition 1.4.3 that the minimizers of f minimize c̄o f as well; and since the latter form a closed convex set,
c̄o(Argmin f) ⊂ Argmin(c̄o f). (1.5.1)
This inclusion can be strict: think of f(x) = |x|^{1/2}. Equality holds under appropriate assumptions, as can be seen from our analysis below (see Remark 1.5.7).
The next results relate the smoothness of f and of c̄o f.
Proposition 1.5.1 Suppose f satisfying (1.1.1) is Gateaux differentiable at x, and has at x a nonempty subdifferential in the sense of (1.4.1). Then
∂f(x) = {∇f(x)} = ∂(c̄o f)(x).
PROOF. Let s ∈ ∂f(x): for all d ∈ R^n and t > 0,
[f(x + td) − f(x)] / t ≥ (s, d).
Let t ↓ 0 to obtain (∇f(x), d) ≥ (s, d) for all d, and this implies ∇f(x) = s. The second equality then follows using (1.4.3) and (1.4.4). □
For a given x to minimize a differentiable f, a necessary condition is "∇f(x) = 0". A natural question is then: what additional condition ensures that a stationary point is a global minimum of f? Clearly, the missing condition has to be global: it involves the behaviour of f on the whole space. The conjugacy correspondence turns out to be an appropriate tool for obtaining such a condition.
Corollary 1.5.2 Let f be Gateaux differentiable on R^n. Then, x is a global minimum of f on R^n if and only if
(i) ∇f(x) = 0 and (ii) (c̄o f)(x) = f(x).
In such a case, c̄o f is differentiable at x and ∇(c̄o f)(x) = 0.
PROOF. Let x minimize f; then ∂f(x) is nonempty, hence (ii) follows from (1.4.3); furthermore, ∂(c̄o f)(x) reduces to {0} (Proposition 1.5.1).
Conversely, let x satisfy (i), (ii); because the differentiable function f is finite everywhere, the convex function c̄o f is so as well. Then, by virtue of Proposition 1.5.1, 0 ∈ ∂(c̄o f)(x); we therefore obtain immediately that x minimizes f on R^n. □
Thus, the global property (ii) is just what is missing in the local property (i) for a stationary point to be a global minimum. One concludes for example that differentiable functions whose stationary points are global minima are those f for which
∇f(x) = 0 ⟹ (c̄o f)(x) = f(x).
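A numerical sketch of this criterion (ours; the grids are crude discretizations): for the double well f(x) = (x² − 1)², the point x = 0 is stationary but not a global minimum, and indeed condition (ii) fails there, since the convexified function vanishes between the two minimizers while f(0) = 1:

```python
f = lambda x: (x * x - 1.0) ** 2       # stationary points: -1, 0, 1
xs = [i / 100.0 for i in range(-300, 301)]   # x-grid on [-3, 3]
ss = [i / 50.0 for i in range(-500, 501)]    # slope grid on [-10, 10]

f_star = lambda s: max(s * x - f(x) for x in xs)
co_f = lambda x: max(x * s - f_star(s) for s in ss)  # biconjugate (1.3.2)

# (ii) fails at the stationary point x = 0: the convexification is 0 there
print(co_f(0.0), f(0.0))
# (ii) holds at the global minimizers x = +1 and x = -1
print(co_f(1.0), f(1.0))
```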
We turn now to a characterization of the subdifferential of c̄o f in terms of that of f, which needs the following crucial assumption:
f satisfies (1.1.1), and is lower semi-continuous and 1-coercive. (1.5.2)
Lemma 1.5.3 For f satisfying (1.5.2), co(epi f) is a closed set.
PROOF. Take a sequence {(x_k, r_k)} in co(epi f) converging to (x, r) for k → +∞; in order to establish (x, r) ∈ co(epi f), we will prove that (x, ρ) ∈ co(epi f) for some ρ ≤ r.
By definition of a convex hull in R^{n+1}, there are n + 2 sequences {(x_k^i, r_k^i)} with f(x_k^i) ≤ r_k^i, and a sequence {α_k} in the unit simplex Δ_{n+2} such that
(x_k, r_k) = Σ_{i=1}^{n+2} α_k^i (x_k^i, r_k^i).
[Step 1] For each i, the sequence {α_k^i r_k^i} is bounded: in fact, because of (1.5.2), f is bounded from below, say by μ; then we write
r_k ≥ α_k^i r_k^i + Σ_{j≠i} α_k^j f(x_k^j) ≥ α_k^i r_k^i + (1 − α_k^i)μ,
and our claim is true because {r_k} and {α_k^i} are bounded.
The sequences {r_k^i} are also bounded from below by μ. Now, if some {r_k^i} is not bounded from above, go to Key Step; otherwise go to Last Step.
[Key Step] We proceed to prove that, if {r_k^i} is not bounded from above for some index i, the corresponding sequence can be omitted from the convex combination. Assume without loss of generality that 1 is such an index and extract a subsequence if necessary, so that r_k^1 → +∞. Then

α_k^1 ‖x_k^1‖ = (α_k^1 r_k^1) · ‖x_k^1‖ / r_k^1 → 0:

this is clear if {x_k^1} is bounded; if not, it is a consequence of 1-coercivity. Remembering Step 1, we thus have the key properties:

α_k^1 → 0  and  α_k^1 x_k^1 → 0.
Now, for each k, define β_k ∈ Δ_{n+1} by

β_k^i := α_k^{i+1} / (1 − α_k^1)  for i = 1, ..., n + 1.

We have

Σ_{i=1}^{n+1} β_k^i x_k^{i+1} = (x_k − α_k^1 x_k^1) / (1 − α_k^1) → x,

μ ≤ Σ_{i=1}^{n+1} β_k^i r_k^{i+1} = (r_k − α_k^1 r_k^1) / (1 − α_k^1) ≤ (r_k − α_k^1 μ) / (1 − α_k^1) → r.
Let us summarize this key step: starting from the n + 2 sequences of triples {(α_k^i, x_k^i, r_k^i)}, we have eliminated one having {r_k^i} unbounded, and thus obtained ℓ = n + 1 sequences of triples {(β_k^i, x_k^i, r_k^i)} satisfying

β_k ∈ Δ_ℓ,  Σ_{i=1}^ℓ β_k^i x_k^i → x,  Σ_{i=1}^ℓ β_k^i r_k^i → p ≤ r.  (1.5.3)

Execute this procedure as many times as necessary, to end up with ℓ ≥ 1 sequences of triples satisfying (1.5.3), and all having {r_k^i} bounded.
[Last Step] At this point of the proof, each {r_k^i} is bounded; from 1-coercivity of f, each {x_k^i} is bounded as well. Extracting subsequences if necessary, we are therefore in the following situation: there are sequences of triples {(β_k^i, x_k^i, r_k^i)}, i = 1, ..., ℓ ≤ n + 2, satisfying

β_k ∈ Δ_ℓ,  β_k → β ∈ Δ_ℓ;

f(x_k^i) ≤ r_k^i,  x_k^i → x^i,  r_k^i → r^i  for i = 1, ..., ℓ;

Σ_{i=1}^ℓ β_k^i x_k^i → x,  Σ_{i=1}^ℓ β_k^i r_k^i → Σ_{i=1}^ℓ β^i r^i = p ≤ r.

Because f is lower semi-continuous, f(x^i) ≤ r^i for i = 1, ..., ℓ, and the definition (IV.2.5.3) of co f gives

(co f)(x) ≤ Σ_{i=1}^ℓ β^i f(x^i) ≤ Σ_{i=1}^ℓ β^i r^i = p ≤ r.

In a word, (x, p) ∈ co(epi f), as announced. □
This closedness property has important consequences:

Proposition 1.5.4 Let f satisfy (1.5.2). Then

(i) co f = \overline{co} f (hence co f ∈ Conv ℝ^n).

(ii) For any x ∈ dom co f = co dom f, there are x_j ∈ dom f and convex multipliers α_j for j = 1, ..., n + 1 such that

x = Σ_{j=1}^{n+1} α_j x_j  and  (co f)(x) = Σ_{j=1}^{n+1} α_j f(x_j).
PROOF. By definition, \overline{co} f is the function whose epigraph is the closure of co(epi f); but the latter set is already closed. Since the relations

co(epi f) ⊂ epi(co f) ⊂ epi(\overline{co} f) = \overline{co}(epi f)

are always true, (i) is proved.

Using the definition of co f from Proposition IV.2.5.1, and knowing that co(epi f) is closed, we infer that the point (x, (co f)(x)) is on the boundary of co(epi f). Then, invoking Proposition III.4.2.3, we can describe (x, (co f)(x)) as a convex combination of n + 1 elements of epi f:

(x, (co f)(x)) = Σ_{j=1}^{n+1} α_j (x_j, r_j),

with f(x_j) ≤ r_j. Actually, each r_j has to be f(x_j) if the corresponding α_j is positive, simply because

Σ_{j=1}^{n+1} α_j (x_j, f(x_j)) ∈ co(epi f),

and, again from the definition of (co f)(x):

Σ_{j=1}^{n+1} α_j r_j = (co f)(x) ≤ Σ_{j=1}^{n+1} α_j f(x_j). □
The combinations exhibited in (ii) are useful for an explicit calculation of co f and ∂(co f), given in the next two results.

Theorem 1.5.5 Let f satisfy (1.1.1). For a given x ∈ co dom f, suppose there exists a family {x_j, α_j} as described in Proposition 1.5.4(ii); set

J := {j : α_j > 0}.

Then

(i) f(x_j) = (co f)(x_j) for all j ∈ J,
(ii) co f is affine on the polyhedron P := co{x_j : j ∈ J}.
PROOF. [(i)] The function co f is convex and minorizes f:

(co f)(x) ≤ Σ_{j=1}^{n+1} α_j (co f)(x_j) ≤ Σ_{j=1}^{n+1} α_j f(x_j) = (co f)(x);

following an argument already seen on several occasions, this implies

(co f)(x_j) = f(x_j)  for all j ∈ J,

and (i) is proved.
[(ii)] Consider the affine function (here {β_j} is a set of convex multipliers)

P ∋ x' = Σ_{j∈J} β_j x_j ↦ ℓ(x') := Σ_{j∈J} β_j f(x_j).

In view of (i), the convexity of co f implies that co f ≤ ℓ on P, with equality at the given x.

Now take x' ≠ x in P. The affine line passing through x and x' cuts rbd P at two points y_1 and y_2, both different from x (since the latter is in ri P). Then write (see Fig. 1.5.1)

x = βx' + (1 − β)y_1  with β ∈ ]0, 1[.

By convexity,

(co f)(x) ≤ β(co f)(x') + (1 − β)(co f)(y_1) ≤ βℓ(x') + (1 − β)ℓ(y_1) = ℓ(x),

where the second inequality holds because co f ≤ ℓ, and the last equality because ℓ is affine. Since (co f)(x) = ℓ(x), this is actually a chain of equalities:

β[ℓ(x') − (co f)(x')] + (1 − β)[ℓ(y_1) − (co f)(y_1)] = 0.

Once again, we have two nonnegative numbers having a zero convex combination; they are both zero and (ii) is proved. □
Fig. 1.5.1. Affinity of a convex hull
Theorem 1.5.6 Under the hypotheses and notation of Theorem 1.5.5,

∂(co f)(x) = ∩_{j∈J} ∂f(x_j);  (1.5.4)

moreover, for all s ∈ ∂(co f)(x),

⟨s, x⟩ − (co f)(x) = ⟨s, x_j⟩ − f(x_j)  for all j ∈ J.
PROOF. A subgradient of co f at x is an s characterized by

(co f)*(s) + (co f)(x) − ⟨s, x⟩ = 0  [Theorem 1.4.1]
⟺ f*(s) + Σ_{j∈J} α_j f(x_j) − ⟨s, Σ_{j∈J} α_j x_j⟩ = 0  [(co f)* = f*]
⟺ Σ_{j∈J} α_j [f*(s) + f(x_j) − ⟨s, x_j⟩] = 0
⟺ f*(s) + f(x_j) − ⟨s, x_j⟩ = 0  for all j ∈ J.  [Fenchel (1.1.3)]

The last line means precisely that s ∈ ∂f(x_j) for all j ∈ J; furthermore, we can write

f(x_j) − ⟨s, x_j⟩ = −f*(s) = −(co f)*(s) = (co f)(x) − ⟨s, x⟩. □
Note, in the right-hand side of (1.5.4), that each ∂f(x_j) could be replaced by ∂(co f)(x_j): this is due to (1.4.4).
Remark 1.5.7 As a supplement to (1.5.1), the above result implies the following: for f satisfying (1.5.2), Argmin f is a compact set (trivial), whose convex hull coincides with Argmin(co f). Indeed, use (1.5.4): an x ∈ Argmin(co f) is characterized as being a convex combination of points {x_j}_{j∈J} such that 0 ∈ ∂f(x_j), i.e. these x_j minimize f. □
Corollary 1.5.8 Let the function f : ℝ^n → ℝ be lower semi-continuous, Gateaux differentiable, and 1-coercive. Then co f = \overline{co} f is continuously differentiable on ℝ^n; furthermore,

∇(co f)(x) = ∇f(x_j)  for all j ∈ J,  (1.5.5)

where we have used the notation of Theorem 1.5.5.

PROOF. Our f satisfies (1.5.2), so Proposition 1.5.4(i) directly gives co f = \overline{co} f. The latter function is finite everywhere, hence it has a nonempty subdifferential at any x (Theorem 1.4.2): by virtue of (1.5.4), all the ∂f(x_j)'s are nonempty. Together with Proposition 1.5.1, we obtain that all these ∂f(x_j)'s are actually the same singleton, described by (1.5.5). Finally, the continuity of ∇(co f) follows from Theorem VI.6.2.4. □

It is worth mentioning that, even if we impose more regularity on f (say C^∞), co f as a rule is not C².
2 Calculus Rules on the Conjugacy Operation
The function f, whose conjugate is to be computed, is often obtained from some other functions f_i, whose conjugates are known. In this section, we develop a set of calculus rules expressing f* in terms of the f_i* (some rudimentary such rules were already given in Proposition 1.3.1).
2.1 Image of a Function Under a Linear Mapping
Given a function g : ℝ^m → ℝ ∪ {+∞} satisfying (1.1.1), and a linear mapping A : ℝ^m → ℝ^n, we recall that the image of g under A is the function defined by

ℝ^n ∋ x ↦ (Ag)(x) := inf{g(y) : Ay = x}.  (2.1.1)

Let g* be associated with a scalar product ⟨·,·⟩_m in ℝ^m; we denote by ⟨·,·⟩_n the scalar product in ℝ^n, with the help of which we want to define (Ag)*. To make sure that Ag satisfies (1.1.1), some additional assumption is needed: among all the affine functions minorizing g, there is one with slope in (Ker A)^⊥ = Im A*.

Theorem 2.1.1 With the above notation, assume that Im A* ∩ dom g* ≠ ∅. Then Ag satisfies (1.1.1); its conjugate is

(Ag)* = g* ∘ A*.
PROOF. First, it is clear that Ag ≢ +∞ (take x = Ay, with y ∈ dom g). On the other hand, our assumption implies the existence of some p_0 = A*s_0 such that g*(p_0) < +∞; with Fenchel's inequality (1.1.3), we have for all y ∈ ℝ^m:

g(y) ≥ ⟨A*s_0, y⟩_m − g*(p_0) = ⟨s_0, Ay⟩_n − g*(p_0).

For each x ∈ ℝ^n, take the infimum over those y satisfying Ay = x: the affine function ⟨s_0, ·⟩_n − g*(p_0) minorizes Ag. Altogether, Ag satisfies (1.1.1).

Then we have for s ∈ ℝ^n

(Ag)*(s) = sup_{x∈ℝ^n} [⟨s, x⟩_n − inf_{Ay=x} g(y)] = sup_{x∈ℝ^n, Ay=x} [⟨s, x⟩_n − g(y)]
  = sup_{y∈ℝ^m} [⟨s, Ay⟩_n − g(y)] = g*(A*s). □
For example, when m = n and A is invertible, Ag reduces to g ∘ A^{-1}, whose conjugate is therefore g* ∘ A*, in agreement with Proposition 1.3.1(iv).
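As an illustration of Theorem 2.1.1 (a numerical sketch with arbitrarily chosen data, not from the text), take g(y) = ½‖y‖² on ℝ² and the linear map A(y₁, y₂) = y₁ + y₂, whose adjoint is A*s = (s, s). The image function is (Ag)(x) = x²/4, and the theorem predicts (Ag)*(s) = g*(A*s) = s².

```python
import numpy as np

# g(y) = 0.5*||y||^2 on R^2 and A(y1,y2) := y1 + y2, so that A*s = (s, s).
# Image function: (Ag)(x) = inf{g(y) : y1 + y2 = x} = x^2/4, and
# Theorem 2.1.1 predicts (Ag)*(s) = g*(A*s) = 0.5*(s^2 + s^2) = s^2.
g = lambda y1, y2: 0.5 * (y1**2 + y2**2)
ygrid = np.linspace(-20.0, 20.0, 4001)

def Ag(x):
    # brute-force the infimum over the affine set y1 + y2 = x
    return np.min(g(ygrid, x - ygrid))

xs = np.linspace(-10.0, 10.0, 401)
Agx = np.array([Ag(x) for x in xs])
conj = {s: np.max(s * xs - Agx) for s in (-1.0, 0.0, 2.0)}   # (Ag)*(s)
print(conj)   # values close to s^2 for each tested s
```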
As a first application of Theorem 2.1.1, it is straightforward to compute the conjugate of a marginal function:

f(x) := inf{g(x, z) : z ∈ ℝ^p},  (2.1.2)

where g operates on the product space ℝ^n × ℝ^p. Indeed, just call A the projection onto ℝ^n: A(x, z) = x ∈ ℝ^n, so that f is clearly Ag. We obtain:

Corollary 2.1.2 With g : ℝ^n × ℝ^p =: ℝ^m → ℝ ∪ {+∞} not identically +∞, let g* be associated with a scalar product preserving the structure of ℝ^m as a product space:

⟨·,·⟩_m = ⟨·,·⟩_n + ⟨·,·⟩_p,

and suppose that there is s_0 ∈ ℝ^n such that (s_0, 0) ∈ dom g*. Then the conjugate of f defined by (2.1.2) is

f*(s) = g*(s, 0)  for all s ∈ ℝ^n.

PROOF. It suffices to observe that, A being the projection defined above, there holds for all y_1 = (x_1, z_1) ∈ ℝ^m and x_2 ∈ ℝ^n,

⟨Ay_1, x_2⟩_n = ⟨x_1, x_2⟩_n = ⟨x_1, x_2⟩_n + ⟨z_1, 0⟩_p = ⟨y_1, (x_2, 0)⟩_m,

which defines the adjoint A*x = (x, 0) for all x ∈ ℝ^n. Then apply Theorem 2.1.1. □
More will be said on this operation in §2.4. Here we consider another application: the infimal convolution which, to f_1 and f_2 defined on ℝ^n, associates (see §IV.2.3)

(f_1 □ f_2)(x) := inf{f_1(y_1) + f_2(y_2) : y_1 + y_2 = x}.  (2.1.3)

To use Theorem 2.1.1, we take m = 2n and

g(y_1, y_2) := f_1(y_1) + f_2(y_2),  A(y_1, y_2) := y_1 + y_2.
Corollary 2.1.3 Let two functions f_1 and f_2 from ℝ^n to ℝ ∪ {+∞}, not identically +∞, satisfy dom f_1* ∩ dom f_2* ≠ ∅. Then f_1 □ f_2 satisfies (1.1.1), and

(f_1 □ f_2)* = f_1* + f_2*.

PROOF. Equipping ℝ^n × ℝ^n with the scalar product ⟨·,·⟩ + ⟨·,·⟩, we obtain g*(s_1, s_2) = f_1*(s_1) + f_2*(s_2) (Proposition 1.3.1(ix)) and A*(s) = (s, s). Then apply the definitions. □
Example 2.1.4 As sketched in Proposition 1.2.2.4, a function can be regularized by taking its infimal convolution with one of the kernels (c/2)‖·‖² or c‖·‖. Then Corollary 2.1.3 gives, with the help of Proposition 1.3.1(iii) and Example 1.1.3:

(f □ (c/2)‖·‖²)* = f* + (1/(2c))‖·‖²,  (f □ c‖·‖)* = f* + I_{B(0,c)}.

In particular, if f is the indicator of a nonempty set C ⊂ ℝ^n, the above formulae yield

((c/2) d_C²)* = σ_C + (1/(2c))‖·‖²,  (c d_C)* = σ_C + I_{B(0,c)}. □
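The first formula can be checked by brute force. The sketch below is not from the text: it picks f = |·| on ℝ and c = 2 (the regularized function is then the Huber function) and compares a discrete conjugate of the envelope with f* + (1/(2c))|·|² = I_[−1,1] + s²/(2c) at points inside [−1, 1].

```python
import numpy as np

# f = |.| and c = 2: the infimal convolution of f with the kernel
# (c/2)|.|^2 is the Huber function, and by the formula above its
# conjugate must be f* + (1/(2c))|.|^2 = I_[-1,1] + s^2/(2c).
c = 2.0
f = lambda x: np.abs(x)
ygrid = np.linspace(-30.0, 30.0, 6001)

def huber(x):
    # brute-force the infimum defining the infimal convolution
    return np.min(f(ygrid) + 0.5 * c * (x - ygrid)**2)

xs = np.linspace(-30.0, 30.0, 1201)
hx = np.array([huber(x) for x in xs])
conj = {s: np.max(s * xs - hx) for s in (-0.8, 0.0, 0.5)}  # s inside [-1,1]
print(conj)   # each value close to s^2/(2c) = s^2/4
```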
2.2 Pre-Composition with an Affine Mapping
In view of the symmetry of the conjugacy operation, Theorem 2.1.1 suggests that the conjugate of g ∘ A, when A is a linear mapping, is the image-function A*g*. In §2.1, a condition was needed to prevent (Ag)(x) = −∞ for some x; likewise, a condition will be needed here to prevent g ∘ A ≡ +∞. The symmetry is not quite perfect, though: the composition of a closed convex function with a linear mapping is still a closed convex function; but an image-function need not be closed, and therefore cannot be a conjugate function.
We use notation similar to that of §2.1, but we find it convenient to distinguish between the linear and affine cases.
Theorem 2.2.1 With g ∈ Conv ℝ^m and A_0 linear from ℝ^n to ℝ^m, define A(x) := A_0x + y_0 ∈ ℝ^m and suppose that A(ℝ^n) ∩ dom g ≠ ∅. Then g ∘ A ∈ Conv ℝ^n and its conjugate is the closure of the convex function

ℝ^n ∋ s ↦ inf_p {g*(p) − ⟨y_0, p⟩_m : A_0*p = s}.  (2.2.1)

PROOF. We start with the linear case (y_0 = 0): suppose that h ∈ Conv ℝ^m satisfies Im A_0 ∩ dom h ≠ ∅. Then Theorem 2.1.1 applied to g := h* and A := A_0* gives (A_0*h*)* = h ∘ A_0; conjugating both sides, we see that the conjugate of h ∘ A_0 is the closure of the image-function A_0*h*.

In the affine case, consider the function h := g(· + y_0) ∈ Conv ℝ^m; its conjugate is given by Proposition 1.3.1(v): h* = g* − ⟨y_0, ·⟩_m. Furthermore, it is clear that

(g ∘ A)(x) = g(A_0x + y_0) = h(A_0x) = (h ∘ A_0)(x),

so (2.2.1) follows from the linear case. □
Thus, the conjugation of a composition by a linear (or affine) mapping is not quite straightforward, as it requires a closure operation. A natural question is therefore: when does (2.2.1) define a closed function of s? In other words, when is an image-function closed? Also, when is the infimum in (2.2.1) attained at some p? We start with a technical result.
Lemma 2.2.2 Let g ∈ Conv ℝ^m be such that 0 ∈ dom g and let A_0 be linear from ℝ^n to ℝ^m. Make the following assumption:

Im A_0 ∩ ri dom g ≠ ∅, i.e. 0 ∈ ri dom g − Im A_0 [= ri(dom g − Im A_0)].

Then (g ∘ A_0)* = A_0*g* and, for every s ∈ dom(g ∘ A_0)*, the problem

inf_p {g*(p) : A_0*p = s}  (2.2.2)

has an optimal solution p̄, which therefore satisfies g*(p̄) = (g ∘ A_0)*(s) = (A_0*g*)(s).
PROOF. To prove (g ∘ A_0)* = A_0*g*, we have to prove that A_0*g* is a closed function, i.e. that its sublevel sets are closed (Definition IV.1.2.3). Thus, for given r ∈ ℝ, take a sequence {s_k} such that

(A_0*g*)(s_k) ≤ r  and  s_k → s.

Take also δ_k ↓ 0; from the definition of the image-function, we can find p_k ∈ ℝ^m such that

g*(p_k) ≤ r + δ_k  and  A_0*p_k = s_k.

Let q_k be the orthogonal projection of p_k onto the subspace V := lin dom g + Im A_0. Because V contains lin dom g, Proposition 1.3.4 (with z = 0) gives g*(p_k) = g*(q_k). Furthermore, V^⊥ = (lin dom g)^⊥ ∩ Ker A_0*; in particular, q_k − p_k ∈ Ker A_0*. In summary, we have singled out q_k ∈ V such that

g*(q_k) ≤ r + δ_k  and  A_0*q_k = s_k.  (2.2.3)
Suppose we can bound {q_k}. Extracting a subsequence if necessary, we will have q_k → q and, passing to the limit, we will obtain (since g* is l.s.c.)

g*(q) ≤ liminf g*(q_k) ≤ r  and  A_0*q = s.

The required closedness property (A_0*g*)(s) ≤ r will follow by definition. Furthermore, this q will be a solution of (2.2.2) in the particular case s_k ≡ s and r = (A_0*g*)(s); in this case, {q_k} will actually be a minimizing sequence of (2.2.2).
To prove boundedness of {q_k}, use the assumption: for some ε > 0, B_m(0, ε) ∩ V is included in dom g − Im A_0. Thus, for arbitrary z ∈ B_m(0, ε) ∩ V, we can find y ∈ dom g and x ∈ ℝ^n such that z = y − A_0x. Then

⟨q_k, z⟩_m = ⟨q_k, y⟩_m − ⟨A_0*q_k, x⟩_n
  ≤ g(y) + g*(q_k) − ⟨A_0*q_k, x⟩_n  [Fenchel (1.1.3)]
  ≤ g(y) + r + δ_k − ⟨s_k, x⟩_n.  [(2.2.3)]
We conclude that

sup{⟨q_k, z⟩_m : k = 1, 2, ...} is bounded for any z ∈ B_m(0, ε) ∩ V,

which implies that {q_k} is bounded; this is Proposition V.2.1.3 applied in the vector space V. □
Adapting this result to the general case is now a matter of translation:
Theorem 2.2.3 With g ∈ Conv ℝ^m and A_0 linear from ℝ^n to ℝ^m, define A(x) := A_0x + y_0 ∈ ℝ^m. Make the following assumption:

A(ℝ^n) ∩ ri dom g ≠ ∅.  (2.2.4)

Then, for every s ∈ dom(g ∘ A)*, the following minimization problem has a solution:

min_p {g*(p) − ⟨p, y_0⟩ : A_0*p = s} = (g ∘ A)*(s).  (2.2.5)
PROOF. By assumption, we can choose x̄ ∈ ℝ^n such that ȳ := A(x̄) ∈ ri dom g. Consider the function g̃ := g(ȳ + ·) ∈ Conv ℝ^m. Observing that

(g ∘ A)(x) = g̃(A(x) − ȳ) = (g̃ ∘ A_0)(x − x̄),

we obtain from the calculus rule 1.3.1(v)

(g̃ ∘ A_0)* = (g ∘ A)* − ⟨·, x̄⟩.

Then Lemma 2.2.2 allows the computation of this conjugate. We have 0 in the domain of g̃, and even in its relative interior:

ri dom g̃ = ri dom g − {ȳ} ∋ 0, with 0 ∈ Im A_0.

We can therefore write, for all s ∈ dom(g̃ ∘ A_0)* [= dom(g ∘ A)*],

(g̃ ∘ A_0)*(s) = min_p {g̃*(p) : A_0*p = s},

or also, since g̃* = g* − ⟨·, ȳ⟩,

(g ∘ A)*(s) − ⟨s, x̄⟩ = min_p {g*(p) − ⟨p, ȳ⟩ : A_0*p = s}
  = −⟨s, x̄⟩ + min_p {g*(p) − ⟨p, y_0⟩ : A_0*p = s}. □
An example will be given in Remark 2.2.4 below, to show that the calculus rule (2.2.5) does need a qualification assumption such as (2.2.4). First of all, we certainly need to avoid g ∘ A ≡ +∞, i.e.

A(ℝ^n) ∩ dom g ≠ ∅;  (2.2.Q.i)

but this is not sufficient, unless g is a polyhedral function (this situation has essentially been treated in §V.3.4).
A "comfortable" sharpening of (2.2.Q.i) is
A(lRn) n intdomg =1= Ii'J i.e. 0 E intdomg - A(lRn) , (2.2.Q.ii)
but it is fairly restrictive, implying in particular that dom g is full-dimensional. More tolerant is
o E int[domg - A (lRn)] , (2.2.Q.iii)
which is rather common. Use various results from Chap. V, in particular Theorem Y.2.2.3(iii), to see that (2.2.Q.iii) means
O"domg(P) + (p, YO) > 0 for all nonzero P E Ker A~ .
Knowing that O"domg = (g*)~ (Proposition 1.2.2), this condition has already been alluded to at the end of §IY.3.
Naturally, our assumption (2.2.4) is a further weakening; actually, it is only a slight generalization of (2.2.Q.iii). It is interesting to note that, if (2.2.4) is replaced by (2.2.Q.iii), the solution-set in problem (2.2.5) becomes bounded; to see this, read again the proof of Lemma 2.2.2, in which V becomes Rm.
Remark 2.2.4 Condition (2.2.4) is rather natural: when it does not hold, almost all information on g is ignored by A, and pathological things may happen when conjugating. Consider the following example with m = n = 2, A(ξ, η) := (ξ, 0),

ℝ² ∋ (ξ, η) ↦ g(ξ, η) := η log η for η > 0,  0 for η = 0,  +∞ elsewhere,

and the scalar product

⟨s, x⟩ = ⟨(p, τ), (ξ, η)⟩ = (p + τ)ξ + (p + 2τ)η.

Easy calculations give

g*(p, τ) = exp(τ − 1) if p + τ = 0,  +∞ otherwise,

and A*(p, τ) = (p + τ)(2, −1). Note: taking the canonical basis of ℝ² (which is not orthonormal for ⟨·,·⟩!), we can define

A := [1 0; 0 0],  M := [1 1; 1 2],

so that

⟨s, x⟩ = sᵀMx  and  A* = [2 2; −1 −1] = M⁻¹AᵀM.

Thus dom g ∩ Im A = ℝ × {0}; none of the assumptions in Theorem 2.2.3 is satisfied; dom g* ∩ Im A* = {0},

(A*g*)(0) = inf{exp(τ − 1) : p + τ = 0} = 0,

and the infimum is not attained "at finite distance". Note also that A*g* is closed (how could it not be, its domain being a singleton!) and is the conjugate of g ∘ A ≡ 0.

The trouble illustrated by this example is that composition with A extracts from g only its nasty behaviour, namely its vertical tangent plane at η = 0. □
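The computations of this remark can be reproduced numerically. The sketch below (grids and tolerances are arbitrary choices, not from the text) checks the adjoint matrix M⁻¹AᵀM and the value exp(τ − 1) of g* on the line p + τ = 0.

```python
import numpy as np

# Adjoint for <s,x> = s^T M x is A* = M^{-1} A^T M; on the line
# p + tau = 0, the conjugate g*(p,tau) should equal exp(tau - 1).
A = np.array([[1.0, 0.0], [0.0, 0.0]])
M = np.array([[1.0, 1.0], [1.0, 2.0]])
Astar = np.linalg.inv(M) @ A.T @ M
print(Astar)                                   # matrix [2 2; -1 -1]

etas = np.linspace(1e-9, 50.0, 200001)
def gstar(p, tau):
    # on p + tau = 0 the sup over xi is harmless (its coefficient
    # p + tau vanishes); there remains the sup over eta > 0
    return np.max((p + 2.0 * tau) * etas - etas * np.log(etas))

for tau in (-1.0, 0.0, 1.0):
    print(tau, gstar(-tau, tau), np.exp(tau - 1.0))   # last two agree
```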
The message of the above counter-example is that brutally conjugating g ∘ A, with a scalar product defined on the whole of ℝ^m, is clumsy; this is the first line in Fig. 2.2.1. Indeed, the restriction of g to Im A, considered as a Euclidean space, is a closed convex function by itself, and this is the relevant function to consider: see the second line of Fig. 2.2.1 (but a scalar product is then needed in Im A). Alternatively, if we insist on working in the whole environment space ℝ^m, the relevant function is rather g + I_{Im A} (third line in Fig. 2.2.1, which requires the conjugate of a sum). All three constructions give the same g ∘ A, but the intermediate conjugates are quite different.

x ∈ ℝ^n → A ∈ L(ℝ^n, ℝ^m) → Ax ∈ ℝ^m → g ∈ Conv ℝ^m → (g ∘ A)(x)
x ∈ ℝ^n → A ∈ L(ℝ^n, Im A) → Ax ∈ Im A → g ∈ Conv Im A → (g ∘ A)(x)
x ∈ ℝ^n → A ∈ L(ℝ^n, ℝ^m) → Ax ∈ ℝ^m → g + I_{Im A} ∈ Conv ℝ^m → (g ∘ A)(x)

Fig. 2.2.1. Three possible expressions for g ∘ A
Corollary 2.2.5 Let g ∈ Conv ℝ^m and let A be linear from ℝ^n to ℝ^m with Im A ∩ dom g ≠ ∅. Then

(g ∘ A)* = A*(g + I_{Im A})*  (2.2.6)

and, for every s ∈ dom(g ∘ A)*, the problem

inf_p {(g + I_{Im A})*(p) : A*p = s}

has an optimal solution p̄, which therefore satisfies

(g + I_{Im A})*(p̄) = (g ∘ A)*(s).  (2.2.7)

PROOF. The closed convex functions g ∘ A and (g + I_{Im A}) ∘ A are identical on ℝ^n, so they have the same conjugate. Also, g + I_{Im A} is a closed convex function whose domain is included in Im A, hence

ri dom(g + I_{Im A}) ∩ Im A = ri dom(g + I_{Im A}) ≠ ∅.

The result follows from Theorem 2.2.3. □
We leave it as an exercise to study how Remark 2.2.4 is modified by this result; to make it more interesting, one can also take g(ξ, η) + ½ξ² instead of g. Theorem 2.2.3 and Corollary 2.2.5 are two different ways of stating essentially the same thing: the former requires checking a qualification assumption such as (2.2.4); the latter requires computing the conjugate of the additional function g + I_{Im A}. Either result may be simpler to use, depending on the particular problem under consideration.
Remark 2.2.6 We know from Proposition 1.3.2 that (g + I_{Im A})* = (g ∘ P_A)* ∘ P_A, where P_A : ℝ^m → ℝ^m is the orthogonal projection onto the subspace Im A (beware that, by contrast to A, P_A operates on ℝ^m). It follows that (2.2.6) and (2.2.7) can be replaced respectively by

inf_p {(g ∘ P_A)*(P_A p) : A*p = s},

(g ∘ P_A)*(P_A p̄) = A*[(g ∘ P_A)* ∘ P_A](s) = (g ∘ A)*(s).

When the qualification assumption allows the application of Theorem 2.2.3, we are bound to realize that

A*g* = A*(g + I_{Im A})* = A*[(g ∘ P_A)* ∘ P_A].  (2.2.8)

Indeed, Theorem 2.2.3 actually applies also to the couple (P_A, g) in this case: (g ∘ P_A)* can be developed further, to obtain (since P_A is symmetric)

[(g ∘ P_A)* ∘ P_A](p) = [(P_A g*) ∘ P_A](p) = inf{g*(q) : P_A q = P_A p}.

Now, to say that P_A q = P_A p, i.e. that P_A(q − p) = 0, is to say that q − p ∈ (Im A)^⊥ = Ker A*, i.e. A*p = A*q. Altogether, we deduce

[A*(g + I_{Im A})*](s) = inf_{p,q} {g*(q) : A*p = A*q = s} = (A*g*)(s),

and (2.2.8) does hold. □
To conclude this subsection, consider an example with n = 1: given a function g ∈ Conv ℝ^m, fix x_0 ∈ dom g and 0 ≠ d ∈ ℝ^m, and set

ψ(t) := g(x_0 + td)  for all t ∈ ℝ.

This ψ is the composition of g with the affine mapping which, to t ∈ ℝ, assigns x_0 + td ∈ ℝ^m. It is an exercise to apply Theorem 2.2.1 and obtain the conjugate:

ψ*(α) = min{g*(s) − ⟨s, x_0⟩ : ⟨s, d⟩ = α}

whenever, for example, x_0 and d are such that x_0 + td ∈ ri dom g for some t (this is (2.2.4)). This example can be further particularized to various functions mentioned in §1 (quadratic, indicator, ...).
2.3 Sum of Two Functions
The formula for conjugating a sum will supplement Proposition 1.3.1(ii), to obtain the conjugate of a positive combination of closed convex functions. Summing two functions is a simple operation (at least it preserves closedness); but Corollary 2.1.3 shows that a sum is the conjugate of something rather involved: an inf-convolution. Likewise, compare the simplicity of the composition g ∘ A with the complexity of its dual counterpart Ag; as seen in §2.2, difficulties are therefore encountered when conjugating a function of the type g ∘ A. The same kind of difficulties must be expected when conjugating a sum, and the development in this section is quite parallel to that of §2.2.
Theorem 2.3.1 Let g_1 and g_2 be in Conv ℝ^n and assume that dom g_1 ∩ dom g_2 ≠ ∅. The conjugate (g_1 + g_2)* of their sum is the closure of the convex function g_1* □ g_2*.
PROOF. Apply Corollary 2.1.3 with f_i := g_i*, so that f_i* = g_i for i = 1, 2: this gives (g_1* □ g_2*)* = g_1 + g_2; then take the conjugate again. □
The above calculus rule is very useful in the following framework: suppose we want to compute an inf-convolution, say h = f □ g with f and g in Conv ℝ^n. Compute f* and g*; if their sum happens to be the conjugate of some known function in Conv ℝ^n, this function has to be the closure of h.

Just as in the previous section, it is of interest to ask whether the closure operation is necessary, and whether the inf-convolution is exact.
Theorem 2.3.2 The assumptions are those of Theorem 2.3.1. Assume in addition that

the relative interiors of dom g_1 and dom g_2 intersect, or equivalently: 0 ∈ ri(dom g_1 − dom g_2).  (2.3.1)

Then (g_1 + g_2)* = g_1* □ g_2* and, for every s ∈ dom(g_1 + g_2)*, the problem

inf{g_1*(p) + g_2*(q) : p + q = s}

has an optimal solution (p̄, q̄), which therefore satisfies

g_1*(p̄) + g_2*(q̄) = (g_1 + g_2)*(s).
PROOF. Define g ∈ Conv(ℝ^n × ℝ^n) by g(x_1, x_2) := g_1(x_1) + g_2(x_2) and the linear operator A : ℝ^n → ℝ^n × ℝ^n by Ax := (x, x). Then g_1 + g_2 = g ∘ A, and we proceed to use Theorem 2.2.3. As seen in Proposition 1.3.1(ix), g*(p, q) = g_1*(p) + g_2*(q), and a straightforward calculation shows that A*(p, q) = p + q. Thus, if we can apply Theorem 2.2.3, we can write

(g_1 + g_2)*(s) = (g ∘ A)*(s) = (A*g*)(s)
  = inf_{p,q} {g_1*(p) + g_2*(q) : p + q = s} = (g_1* □ g_2*)(s),

and the above minimization problem does have an optimal solution.

To check (2.2.4), we note that dom g = dom g_1 × dom g_2, and Im A is the diagonal set Δ := {(s, s) : s ∈ ℝ^n}. Taking x̄ in ri dom g_1 ∩ ri dom g_2, given by (2.3.1), we have

(x̄, x̄) ∈ ri dom g_1 × ri dom g_2 = ri(dom g_1 × dom g_2)

(Proposition III.2.1.11), and this just means that Im A = Δ has a nonempty intersection with ri dom g. □
As with Theorem 2.2.3, a qualification assumption (taken here as (2.3.1), playing the role of (2.2.4)) is necessary to ensure the stated properties; but other assumptions exist that do the same thing. First of all,

dom g_1 ∩ dom g_2 ≠ ∅, i.e. 0 ∈ dom g_1 − dom g_2,  (2.3.Q.j)

is obviously necessary to have g_1 + g_2 ≢ +∞. We saw that this "minimal" condition was sufficient in the polyhedral case. Here, the results of Theorem 2.3.2 do hold if (2.3.1) is replaced by:

g_1 and g_2 are both polyhedral, or also: g_1 is polyhedral and dom g_1 ∩ ri dom g_2 ≠ ∅.

The "comfortable" assumption playing the role of (2.2.Q.ii) is

int dom g_1 ∩ int dom g_2 ≠ ∅,  (2.3.Q.jj)

which obviously implies (2.3.1). We mention a non-symmetric assumption, particular to a sum:

(int dom g_1) ∩ dom g_2 ≠ ∅.  (2.3.Q.jj')

Finally, the weakening corresponding to (2.2.Q.iii) is

0 ∈ int(dom g_1 − dom g_2),  (2.3.Q.jjj)

which means

σ_{dom g_1}(s) + σ_{dom g_2}(−s) > 0  for all s ≠ 0;

we leave it as an exercise to prove (2.3.Q.jj') ⇒ (2.3.Q.jjj).
The following application of Theorem 2.3.2 is important in optimization: take two functions g_1 and g_2 in Conv ℝ^n and assume that

inf{g_1(x) + g_2(x) : x ∈ ℝ^n}

is a finite number. Under some qualification assumption such as (2.3.1),

inf_x {g_1(x) + g_2(x)} = −min_s {g_1*(s) + g_2*(−s)},  (2.3.2)

a relation known as Fenchel's duality theorem. If, in particular, g_1 = f and g_2 = I_C is an indicator function, we read (still under some qualification assumption)

inf{f(x) : x ∈ C} = −min{f*(s) + σ_C(−s) : s ∈ ℝ^n}.

It is appropriate to recall here that conjugate functions are interesting primarily via their arguments and subgradients; thus, it is the existence of a minimizing s in the right-hand side of (2.3.2) which is useful, rather than a mere equality between two real numbers.
Remark 2.3.3 Our proof of Theorem 2.3.2 is based on the fact that the sum of two functions can be viewed as the composition of a function with a linear mapping. It is interesting to demonstrate the converse mechanism: suppose we want to prove Theorem 2.2.3, assuming that Theorem 2.3.2 is already proved.

Given g ∈ Conv ℝ^m and A : ℝ^n → ℝ^m linear, we then select the two functions with argument (x, y) ∈ ℝ^n × ℝ^m:

g_1(x, y) := g(y)  and  g_2 := I_{gr A}.

Direct calculations give
g_1*(s, p) = g*(p) if s = 0,  +∞ if not,  (2.3.3)

g_2*(s, p) = sup{⟨s, x⟩_n + ⟨p, y⟩_m : y = Ax} = sup_x ⟨x, s + A*p⟩_n,

hence

g_2*(s, p) = 0 if s + A*p = 0,  +∞ if not.  (2.3.4)

We want to study (g ∘ A)*, which is obtained from the following calculation:

(g_1 + g_2)*(s, 0) = sup{⟨s, x⟩_n − g(y) : y = Ax} = (g ∘ A)*(s),

so the whole question is to compute the conjugate of g_1 + g_2. If (2.3.1) holds, we have

(g_1 + g_2)*(s, 0) = min{g_1*(s_1, p_1) + g_2*(s_2, p_2) : s_1 + s_2 = s, p_1 + p_2 = 0};

but, due to the particular structure appearing in (2.3.3) and (2.3.4), the above minimization problem can be written

inf{g*(p) : s − A*p = 0} = (A*g*)(s).

There remains to check (2.3.1), and this is easy:

dom g_1 = ℝ^n × dom g, hence ri dom g_1 = ℝ^n × ri dom g;
dom g_2 = gr A, hence ri dom g_2 = gr A.

Under these conditions, (2.3.1) expresses the existence of an x_0 such that Ax_0 ∈ ri dom g, and this is exactly (2.2.4), which was our starting hypothesis. □
To conclude this study of the sum, we mention one among the possible results concerning its dual operation: the infimal convolution.
Corollary 2.3.4 Take f_1 and f_2 in Conv ℝ^n, with f_1 0-coercive and f_2 bounded from below. Then the inf-convolution problem (2.1.3) has a nonempty compact set of solutions; furthermore f_1 □ f_2 ∈ Conv ℝ^n.

PROOF. Letting μ denote a lower bound for f_2, we have

f_1(y) + f_2(x − y) ≥ f_1(y) + μ  for all y ∈ ℝ^n,

so that, for each x, the function minimized in (2.1.3) is lower semi-continuous and 0-coercive, and the first part of the claim follows.

For closedness of the infimal convolution, we set g_i := f_i*, i = 1, 2; because of the 0-coercivity of f_1, 0 ∈ int dom g_1 (Remark 1.3.10), and g_2(0) ≤ −μ < +∞. Thus, we can apply Theorem 2.3.2 with the qualification assumption (2.3.Q.jj'). □
2.4 Infima and Suprema
A result of the previous sections is that the conjugacy correspondence establishes a symmetry between the sum and the inf-convolution, and also between the image and the pre-composition with a linear mapping. Here we will see that the operations "supremum" and "closed convex hull of the infimum" are likewise symmetric to each other.
Theorem 2.4.1 Let {f_j}_{j∈J} be a collection of functions satisfying (1.1.1) and assume that there is a common affine function minorizing all of them:

sup{f_j*(s) : j ∈ J} < +∞  for some s ∈ ℝ^n.

Then their infimum f := inf_{j∈J} f_j satisfies (1.1.1), and its conjugate is the supremum of the f_j*'s:

(inf_{j∈J} f_j)* = sup_{j∈J} f_j*.  (2.4.1)

PROOF. By definition, for all s ∈ ℝ^n,

f*(s) = sup_x [⟨s, x⟩ − inf_{j∈J} f_j(x)] = sup_x sup_j [⟨s, x⟩ − f_j(x)]
  = sup_j sup_x [⟨s, x⟩ − f_j(x)] = sup_{j∈J} f_j*(s). □
This result should be compared to Corollary 2.1.2, which after all expresses the conjugate of the same function, and which is proved in just the same way; compare the above proof with that of Theorem 2.1.1. The only difference (apart from the notation j instead of z) is that the space ℝ^p is now replaced by an arbitrary set J, with no special structure, in particular no scalar product. In fact, Theorem 2.4.1 supersedes Corollary 2.1.2: the latter just says that, for g defined on a product space, the easy-to-prove formula

g*(s, 0) = sup_z g_x*(s, z)

holds, where the subscript x of g_x* indicates conjugacy with respect to the first variable only. In a word, the conjugate with respect to the couple (x, z) is of no use for computing the conjugate at (·, 0). Beware that this is true only when the scalar product considers x and z as two separate variables (i.e. is compatible with the product space). Otherwise, trouble may occur: see Remark 2.2.4.
Example 2.4.2 The Euclidean distance to a nonempty set C ⊂ ℝ^n is an inf-function:

x ↦ d_C(x) = inf{f_y(x) : y ∈ C}  with f_y(x) := ‖x − y‖.

Remembering that the conjugate of the norm ‖·‖ is the indicator of the unit ball (combine (V.2.3.1) and Example 1.1.5), the calculus rule 1.3.1(v) gives

(f_y)*(s) = I_{B(0,1)}(s) + ⟨s, y⟩  (2.4.2)
and (2.4.1) may be written as

d_C*(s) = I_{B(0,1)}(s) + sup_{y∈C} ⟨s, y⟩ = I_{B(0,1)}(s) + σ_C(s),

a formula already given in Example 2.1.4. The same exercise can be carried out with the squared distance. □
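This formula can be checked by brute force in one dimension. The sketch below uses the arbitrary choice C = [0, 1] (not from the text), for which d_C(x) = max(0, −x, x−1) and σ_C(s) = max(s, 0).

```python
import numpy as np

# C = [0,1]: d_C(x) = max(0, -x, x-1), sigma_C(s) = max(s, 0), and the
# formula of Example 2.4.2 predicts d_C*(s) = sigma_C(s) whenever
# |s| <= 1 (and +infinity outside the unit ball).
dC = lambda x: np.maximum.reduce([np.zeros_like(x), -x, x - 1.0])

xs = np.linspace(-50.0, 50.0, 10001)
conj = {s: np.max(s * xs - dC(xs)) for s in (-1.0, -0.3, 0.0, 0.7, 1.0)}
print(conj)   # value max(s, 0) for each tested s in [-1, 1]
```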
Example 2.4.3 (Directional Derivative) With f ∈ Conv ℝ^n, let x_0 be a point where ∂f(x_0) is nonempty and consider for t > 0 the functions

ℝ^n ∋ d ↦ f_t(d) := [f(x_0 + td) − f(x_0)] / t.  (2.4.3)

They are all minorized by ⟨s_0, ·⟩, with s_0 ∈ ∂f(x_0); also, the difference quotient is an increasing function of the stepsize; their infimum is therefore obtained for t ↓ 0. This infimum is denoted by f'(x_0, ·), the directional derivative of f at x_0 (already encountered in Chap. VI, in a finite-valued setting). Their conjugates can be computed directly or using Proposition 1.3.1:

s ↦ (f_t)*(s) = [f*(s) + f(x_0) − ⟨s, x_0⟩] / t,  (2.4.4)

so we obtain from (2.4.1)

[f'(x_0, ·)]*(s) = sup_{t>0} [f*(s) + f(x_0) − ⟨s, x_0⟩] / t.

Observe that the supremand is always nonnegative, and that it is 0 if and only if s ∈ ∂f(x_0) (cf. Theorem 1.4.1). In a word:

[f'(x_0, ·)]* = I_{∂f(x_0)}.

Taking the conjugate again, we see that the support function of the subdifferential is the closure of the directional derivative; a result which must be compared to §VI.1.1. □
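For a concrete check (a sketch with the arbitrary choice f = |·| and x_0 = 0, so that every difference quotient equals |·|, f'(0, ·) = |·| and ∂f(0) = [−1, 1]):

```python
import numpy as np

# f = |.| at x0 = 0: f'(0, d) = |d|, and its conjugate should be the
# indicator of the subdifferential df(0) = [-1, 1].
ds = np.linspace(-100.0, 100.0, 20001)
fprime = np.abs(ds)                            # f'(0, d) = |d|

inside = {s: np.max(s * ds - fprime) for s in (-1.0, 0.0, 0.4, 1.0)}
outside = np.max(1.5 * ds - fprime)            # s = 1.5 lies outside [-1,1]
print(inside, outside)   # ~0 on [-1,1]; large (-> +infinity) outside
```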
As already mentioned, sup-functions are fairly important in optimization, so a logical supplement to (2.4.1) is the conjugate of a supremum.
Theorem 2.4.4 Let {g_j}_{j∈J} be a collection of functions in Conv ℝ^n. If their supremum g := sup_{j∈J} g_j is not identically +∞, it is in Conv ℝ^n, and its conjugate is the closed convex hull of the g_j*'s:

(sup_{j∈J} g_j)* = \overline{co}(inf_{j∈J} g_j*).  (2.4.5)

PROOF. Call f_j := g_j*, hence f_j* = g_j, and g is nothing but the f* of (2.4.1). Taking the conjugate of both sides, the result follows from (1.3.4). □
Example 2.4.5 Given, as in Example 2.4.2, a nonempty bounded set C, let

ℝ^n ∋ x ↦ Δ_C(x) := sup{‖x − y‖ : y ∈ C}

be the distance from x to the most remote point of cl C. Using (2.4.2),

inf_{y∈C} (f_y)*(s) = I_{B(0,1)}(s) + inf_{y∈C} ⟨s, y⟩,

so (2.4.5) gives

Δ_C* = \overline{co}(I_{B(0,1)} − σ_{−C}). □
Example 2.4.6 (Asymptotic Function) With f ∈ Conv ℝⁿ, x0 ∈ dom f and f_t as defined by (2.4.3), consider

f′∞(d) := sup {f_t(d) : t > 0} = lim_{t→+∞} f_t(d)

(see §IV.3.2). Clearly, f_t ∈ Conv ℝⁿ and f_t(0) = 0 for all t > 0, so we can apply Theorem 2.4.4: f′∞ ∈ Conv ℝⁿ. In view of (2.4.4), the conjugate of f′∞ is therefore the closed convex hull of the function

s ↦ inf_{t>0} [f*(s) + f(x0) − ⟨s, x0⟩] / t .

Since the infimand is in [0, +∞], the infimum is 0 if s ∈ dom f*, +∞ if not. In summary, (f′∞)* is the indicator of cl dom f*. Conjugating again, we obtain

f′∞ = σ_{dom f*} ,

a formula already seen in (1.2.3).

The comparison with Example 2.4.3 is interesting. In both cases we extremize the same function, namely the difference quotient (2.4.3); and in both cases a support function is obtained (neglecting the closure operation). Naturally, the supported set is larger in the present case of maximization. Indeed, dom f* and the union of all the subdifferentials of f have the same closure. □
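As a quick numerical sanity check of the identity f′∞ = σ_{dom f*}, one can take the one-dimensional function f(x) = eˣ (a hypothetical choice, not taken from the text): then f*(s) = s log s − s has domain [0, +∞), so σ_{dom f*}(d) should be 0 for d ≤ 0 and +∞ for d > 0. A minimal sketch:

```python
import math

# Hypothetical test case: f(x) = exp(x), so dom f* = [0, +inf) and
# sigma_{dom f*}(d) = 0 for d <= 0, +inf for d > 0.
def f(x):
    return math.exp(x)

def f_t(d, t, x0=0.3):
    # difference quotient (2.4.3); it is nondecreasing in t,
    # so evaluating at a large t approximates the supremum f'inf(d)
    return (f(x0 + t * d) - f(x0)) / t

print(f_t(-1.0, 300.0))   # close to 0: f'inf(-1) = 0
print(f_t(2.0, 300.0))    # astronomically large: f'inf(2) = +inf
```

The monotonicity of the difference quotient in t is what makes the single evaluation at a large t a valid approximation of the sup.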
The conjugate of a sup-function appears to be rather difficult to compute. One possibility is to take the supremum of all the affine functions minorizing all the f_j*'s, i.e. to solve (1.3.5) (a problem with infinitely many constraints, where f stands for inf_j f_j*). Another possibility, based on (IV.2.5.3), is to solve for {α_j, s_j}

inf { Σ_{j=1}^{n+1} α_j f_j*(s_j) : α ∈ Δ_{n+1} , s_j ∈ dom f_j* , Σ_{j=1}^{n+1} α_j s_j = s } ;   (2.4.6)

but then we still have to take the lower semi-continuous hull of the result. Both possibilities are rather involved; let us mention a situation where the second one takes an easier form.
Theorem 2.4.7 Let f_1, ..., f_p be finitely many convex functions from ℝⁿ to ℝ, and let f := max_j f_j; denote by m := min{p, n+1}. For every

s ∈ dom f* = co ∪ {dom f_j* : j = 1, ..., p} ,

there exist m vectors s_j ∈ dom f_j* and convex multipliers α_j such that

s = Σ_j α_j s_j and f*(s) = Σ_j α_j f_j*(s_j) .   (2.4.7)

In other words: an optimal solution exists in the minimization problem (2.4.6), and the optimal value is a closed convex function of s.
PROOF. Set g := min_j f_j*, so f* = cl co g; observe that dom g = ∪_j dom f_j*. Because each f_j* is closed and 1-coercive, so is g. Then we apply Proposition 1.5.4: first, cl co g = co g; second, for every

s ∈ dom f* = dom co g = co {∪_j dom f_j*} ,

there are s_j ∈ dom g and convex multipliers α_j, j = 1, ..., n+1 such that

s = Σ_{j=1}^{n+1} α_j s_j and (co g)(s) = f*(s) = Σ_{j=1}^{n+1} α_j g(s_j) .

In the last sum, each g(s_j) is a certain f_i*(s_j); furthermore, several s_j's having the same f_i* can be compressed to a single convex combination (thanks to convexity of each f_i*, see Proposition IV.2.5.4). Thus, we obtain (2.4.7). □
The framework in which we proved this result may appear very restrictive; however, enlarging it is not easy. For example, extended-valued functions f_j create a difficulty: (2.4.7) does not hold with n = 1 and

f_1(x) = exp x and f_2 = I_{{0}} .
Example 2.4.8 Consider the function g⁺ := max{0, g}, where g : ℝⁿ → ℝ is convex. With f_1 := g and f_2 ≡ 0, we have f_2* = I_{{0}} and, according to Theorem 2.4.7: for all s ∈ dom(g⁺)* = co {{0} ∪ dom g*},

(g⁺)*(s) = min {α g*(s_1) : s_1 ∈ dom g*, α ∈ [0,1], α s_1 = s} .

For s ≠ 0, the value α = 0 is infeasible: we can write

(g⁺)*(s) = min_{0<α≤1} α g*(s/α) for s ≠ 0 .

For s = 0, the feasible solutions have either α = 0 or s_1 = 0. Both cases are covered by the condensed formulation

(g⁺)*(0) = min {g*(0), 0} ,

which could also have been obtained by successive applications of the formula inf f = −f*(0). □
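The minimization formula of this example can be checked numerically. Below, g(x) = ½x² − 1 is a hypothetical one-dimensional test function (so g*(p) = ½p² + 1); the conjugate of g⁺ computed by brute force on a grid is compared with min_{0<α≤1} α g*(s/α):

```python
import numpy as np

# Hypothetical data: g(x) = x^2/2 - 1, hence g*(p) = p^2/2 + 1.
xs = np.linspace(-20.0, 20.0, 400001)
alphas = np.linspace(1e-3, 1.0, 2000)

def gplus(x):                       # g+ = max{0, g}
    return np.maximum(0.0, 0.5 * x**2 - 1.0)

def gstar(p):
    return 0.5 * p**2 + 1.0

def conj_brute(s):                  # (g+)*(s) = sup_x [s x - g+(x)], on a grid
    return np.max(s * xs - gplus(xs))

def conj_formula(s):                # min over 0 < alpha <= 1 of alpha * g*(s/alpha)
    return np.min(alphas * gstar(s / alphas))

for s in [0.5, 1.0, 3.0]:
    print(s, conj_brute(s), conj_formula(s))
```

For |s| ≤ √2 the minimizing α is interior (α = |s|/√2), while for larger s the constraint α ≤ 1 becomes active; both regimes agree with the brute-force values.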
2.5 Post-Composition with an Increasing Convex Function
For f ∈ Conv ℝⁿ and g ∈ Conv ℝ, the function g∘f is in Conv ℝⁿ if g is increasing (we assume f(ℝⁿ) ∩ dom g ≠ ∅ and we set g(+∞) = +∞). A relevant question is then how to express its conjugate in terms of f* and g*.

We start with some preliminary observations, based on the particular nature of g. The domain of g is an interval unbounded from the left, whose relative interior is int dom g ≠ ∅. Also, dom g* ⊂ ℝ₊ and (remember §1.3.2)

g(y) = g**(y) = sup {py − g*(p) : p ∈ ri dom g*} .
Theorem 2.5.1 With f and g as above, assume that f(ℝⁿ) ∩ int dom g ≠ ∅. For all s ∈ dom(g∘f)*, define the function ψ_s ∈ Conv ℝ by

ℝ ∋ α ↦ ψ_s(α) = { α f*(s/α) + g*(α)     if α > 0 ,
                   σ_{dom f}(s) + g*(0)   if α = 0 ,
                   +∞                     if α < 0 .

Then

(g∘f)*(s) = min_{α∈ℝ} ψ_s(α) .

PROOF. By definition,

−(g∘f)*(s) = inf_x [g(f(x)) − ⟨s, x⟩]
           = inf_{x,r} {g(r) − ⟨s, x⟩ : f(x) ≤ r}   [g is increasing]
           = inf_{x,r} [g(r) − ⟨s, x⟩ + I_{epi f}(x, r)] .

Then we must compute the conjugate of a sum; we set

ℝⁿ × ℝ ∋ (x, r) ↦ f_1(x, r) := g(r) − ⟨s, x⟩

and f_2 := I_{epi f}. We have

dom f_1 = ℝⁿ × dom g , int dom f_1 = ℝⁿ × int dom g ;

so, by assumption:

int dom f_1 ∩ dom f_2 = (ℝⁿ × int dom g) ∩ epi f ≠ ∅ .
Theorem 2.3.2, more precisely Fenchel's duality theorem (2.3.2), can be applied with the qualification condition (2.3.Q.jj'):

(g∘f)*(s) = min {f_1*(−p, α) + f_2*(p, −α) : (p, α) ∈ ℝⁿ × ℝ} .

The computation of the above two conjugates is straightforward and gives

(g∘f)*(s) = min_{p,α} [g*(α) + I_{{−s}}(−p) + σ_{epi f}(p, −α)] = min_α ψ_s(α) ,

where the second equality comes from (1.2.2). □
Let us take two examples, using the ball-pen function defined on its domain B(0,1) by f(x) = 1 − √(1 − ‖x‖²); here f(ℝⁿ) = [0, 1] ∪ {+∞}.
- With g(r) = r⁺, we leave it to the reader to compute the relevant conjugates: ψ_s(α) has the unique minimum point α = 0 for all s.
- With g = I_{]−∞,0]}, the qualification assumption in the above theorem is no longer satisfied. We have (g∘f)*(s) = 0 for all s, while min_α ψ_s(α) = ‖s‖ − 1.
Remark 2.5.2 Under the qualification assumption, ψ_s always attains its minimum at some α ≥ 0. As a result, if for example σ_{dom f}(s) = (f*)′∞(s) = +∞, or if g*(0) = +∞, we are sure that α > 0. □

Theorem 2.5.1 takes an interesting form in case of positive homogeneity, which can apply to f or to g. For the two examples below, we recall that a closed sublinear function is the support function of its subdifferential at the origin.
Example 2.5.3 If g is positively homogeneous, say g = σ_L for some closed interval L, the domain of ψ_s is contained in L, on which the term g*(α) disappears. The interval L supported by g has to be contained in ℝ₊ (to preserve monotonicity) and there are essentially two instances: Example 2.4.8 was one, in which L was [0, 1]; in the other instance, L is unbounded (a half-line), and the following example gives an application.

Consider the set C := {x ∈ ℝⁿ : f(x) ≤ 0}, with f : ℝⁿ → ℝ convex; its support function can be expressed in terms of f*. Indeed, write the indicator of C as the composed function

I_C = I_{]−∞,0]} ∘ f .

This places us in the present framework: g is the support function of ℝ₊. To satisfy the hypotheses of Theorem 2.5.1, we have only to assume the existence of an x0 with f(x0) < 0; then the result is: for 0 ≠ s ∈ dom σ_C, there exists ᾱ > 0 such that

σ_C(s) = min_{α>0} α f*(s/α) = ᾱ f*(s/ᾱ) .

Indeed, we are in the case of Remark 2.5.2 because dom f = ℝⁿ. □
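For a concrete instance of this formula (with hypothetical data, not taken from the text), take f(x) = ‖x‖² − 4, so that C is the ball of radius 2 and σ_C(s) = 2‖s‖; here f*(p) = ¼‖p‖² + 4, and the one-dimensional minimization over α can be done on a grid:

```python
import numpy as np

# Hypothetical data: f(x) = ||x||^2 - 4, so C = {f <= 0} = B(0, 2) and
# sigma_C(s) = 2||s||; the conjugate is f*(p) = ||p||^2 / 4 + 4.
def fstar_of_norm(r):               # value of f* as a function of ||p||
    return r**2 / 4.0 + 4.0

def sigma_C(snorm):                 # min over alpha > 0 of alpha * f*(s/alpha)
    a = np.linspace(1e-3, 10.0, 100000)
    return np.min(a * fstar_of_norm(snorm / a))

for r in [0.5, 1.0, 3.0]:
    print(r, sigma_C(r), 2.0 * r)   # the two columns should agree
```

The minimizing ᾱ here is ‖s‖/4, strictly positive as announced by Remark 2.5.2.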
Example 2.5.4 In our second example, it is f that is positively homogeneous. Then, for all s ∈ dom(g∘f)*,

ψ_s(α) = { g*(α)                 if α > 0 and s/α ∈ ∂f(0) ,
           σ_{dom f}(s) + g*(0)  if α = 0 ,
           +∞                    if α < 0 .

If f is finite everywhere, i.e. ∂f(0) is bounded, we have for all s ≠ 0

(g∘f)*(s) = min {g*(α) : α > 0 and s/α ∈ ∂f(0)} .
As an application, let Q be symmetric positive definite and define the function

ℝⁿ ∋ x ↦ f(x) := √⟨Q⁻¹x, x⟩ ,

which is the support function of the elliptic set (see Example V.2.3.4)

E_Q := ∂f(0) = {s ∈ ℝⁿ : ⟨s, Qs⟩ ≤ 1} .

The composition of f with the function r ↦ g(r) := ½r² is the quadratic form associated with Q⁻¹, whose conjugate is known (see Example 1.1.3):

(g∘f)(x) = ½⟨Q⁻¹x, x⟩ , (g∘f)*(s) = ½⟨s, Qs⟩ .

Then

½⟨s, Qs⟩ = min {½α² : α ≥ 0 and s ∈ αE_Q}

is the half-square of the gauge function γ_{E_Q} (Example V.2.3.4). □
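The identity (g∘f)*(s) = ½⟨s, Qs⟩ can be verified by a brute-force conjugate on a grid; the choice Q = diag(2, ½) below is hypothetical, made only for the check:

```python
import numpy as np

# Hypothetical choice Q = diag(2, 1/2); (g o f)(x) = <Q^{-1}x, x> / 2.
Q = np.diag([2.0, 0.5])
Qinv = np.linalg.inv(Q)

t = np.linspace(-6.0, 6.0, 601)
X, Y = np.meshgrid(t, t)
gof = 0.5 * (Qinv[0, 0] * X**2 + Qinv[1, 1] * Y**2)

def conj(s):                        # brute-force conjugate on the grid
    return np.max(s[0] * X + s[1] * Y - gof)

s = np.array([1.0, -0.5])
print(conj(s), 0.5 * s @ Q @ s)     # both close to 1.0625
```

The supremum is attained at x = Qs, well inside the grid, so the two numbers agree up to grid resolution.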
2.6 A Glimpse of Biconjugate Calculus
Taking the biconjugate of a function, i.e. "close-convexifying" it, is an important operation. A relevant question is therefore to derive rules giving the biconjugate of a function obtained from other functions having known biconjugates.
First of all, Proposition 1.3.1 can easily be reproduced to give the following statements, in which the various f's and fj's are assumed to satisfy (1.1.1).
(j) The biconjugate of g(·) := f(·) + a is (cl co g)(·) = (cl co f)(·) + a.
(jj) With α > 0, the biconjugate of g := αf is cl co g = α cl co f.
(jjj) With α ≠ 0, the biconjugate of the function g(x) := f(αx) is (cl co g)(x) = (cl co f)(αx).
(jv) More generally: if A is an invertible linear operator, cl co(f∘A) = (cl co f)∘A.
(v) The biconjugate of g(·) := f(· − x0) is (cl co g)(·) = (cl co f)(· − x0).
(vj) The biconjugate of g := f + ⟨s0, ·⟩ is cl co g = cl co f + ⟨s0, ·⟩.
(vjj) If f_1 ≤ f_2, then cl co f_1 ≤ cl co f_2.
(jx) The biconjugate of f(x) = Σ_j f_j(x_j) is cl co f = Σ_j cl co f_j.

Note: there is nothing corresponding to (vjjj). All these results are straightforward from the very definition of the closed convex hull. Other calculus rules are not easy to derive: for most of the operations involved, only inequalities are obtained.
Consider first the sum: let f_1 and f_2 satisfy (1.1.1), as well as

dom f_1 ∩ dom f_2 ≠ ∅ .

Clearly, cl co f_1 + cl co f_2 is then a function of Conv ℝⁿ which minorizes f_1 + f_2. Therefore

cl co f_1 + cl co f_2 ≤ cl co (f_1 + f_2) .

This inequality is the only general comparison result one can get, as far as sums are concerned; and closed convexity of f_1, say, does not help: it is only when f_1 is affine that equality holds.
Let now (f_j)_{j∈J} be a collection of functions and f := sup_j f_j. Starting from f_j ≤ f for j ∈ J, we obtain immediately cl co f_j ≤ cl co f and

sup_{j∈J} (cl co f_j) ≤ cl co f .

Nothing more accurate can be said in general. The case of an infimum is hardly more informative:
Proposition 2.6.1 Let {f_j}_{j∈J} be a collection of functions satisfying (1.1.1) and assume that there is a common affine function minorizing all of them. Then

cl co (inf_{j∈J} f_j) = cl co (inf_{j∈J} cl co f_j) ≤ inf_{j∈J} cl co f_j .

PROOF. The second relation is trivial. As for the first, Theorem 2.4.1 gives (inf_j f_j)* = sup_j f_j*. The left-hand function is in Conv ℝⁿ, so we can take the conjugate of both sides and apply Theorem 2.4.4. □
So far, the calculus developed in the previous sections has been of little help. The situation improves for the image function under a linear mapping.
Proposition 2.6.2 Let g : ℝᵐ → ℝ ∪ {+∞} satisfy (1.1.1) and let A be linear from ℝᵐ to ℝⁿ. Assuming Im A* ∩ ri dom g* ≠ ∅, the equality cl co(Ag) = A(cl co g) holds. Actually, for each x ∈ dom cl co(Ag), there exists y such that

[cl co(Ag)](x) = (cl co g)(y) and Ay = x .

PROOF. Since Im A* ∩ dom g* ≠ ∅, Theorem 2.1.1 applies: (Ag)* = g* ∘ A*. Likewise, Theorem 2.2.3 (with g and A replaced by g* and A*) allows the computation of the conjugate of the latter. □
The qualification assumption needed in this result is somewhat abstract; a more concrete one is for example 0-coercivity of cl co g: then Im A* ∋ 0 ∈ int dom g* (see Remark 1.3.10).

The cases of marginal functions and inf-convolutions are immediately taken care of by Proposition 2.6.2; the corresponding statements and proofs are left to the reader.
3 Various Examples
The examples listed below illustrate some applications of the conjugacy operation. It is an obviously useful tool thanks to its convexification ability; note also that the knowledge of f* fully characterizes cl co f, i.e. f itself in case the latter is already in Conv ℝⁿ; another fundamental property is the characterization (1.4.2) of the subdifferential. All this concerns convex analysis more or less directly, but the conjugacy operation is instrumental in several areas of mathematics.
3.1 The Cramer Transformation
Let P be a probability measure on ℝⁿ and define its Laplace transform L_P : ℝⁿ → ]0, +∞] by

ℝⁿ ∋ s ↦ L_P(s) := ∫_{ℝⁿ} exp⟨s, z⟩ dP(z) .
The so-called Cramer transform of P is the function
ℝⁿ ∋ x ↦ C_P(x) := sup {⟨s, x⟩ − log L_P(s) : s ∈ ℝⁿ} .

It is used for example in statistics. We recognize in C_P the conjugate of the function log L_P, which can be shown to be convex (as a consequence of Hölder's inequality) and lower semi-continuous (from Fatou's lemma). We conclude that C_P is a closed convex function, whose conjugate is log L_P.
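As an illustration (with hypothetical data), take P = N(μ, σ²) on ℝ: then log L_P(s) = μs + ½σ²s², and the Cramer transform should come out as C_P(x) = (x − μ)²/(2σ²), the familiar large-deviations rate function of the Gaussian:

```python
import numpy as np

# Hypothetical example: P = N(mu, sig^2), so log L_P(s) = mu*s + sig^2 * s^2 / 2.
mu, sig = 1.0, 2.0
sgrid = np.linspace(-50.0, 50.0, 200001)

def log_laplace(s):
    return mu * s + 0.5 * (sig * s)**2

def cramer(x):                      # conjugate of log L_P, computed on a grid
    return np.max(x * sgrid - log_laplace(sgrid))

for x in [-1.0, 0.0, 3.0]:
    print(x, cramer(x), (x - mu)**2 / (2.0 * sig**2))
```

The supremum is attained at s = (x − μ)/σ², well inside the grid for the chosen points.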
3.2 Some Results on the Euclidean Distance to a Closed Set
Let S be a nonempty closed set in ℝⁿ and define the function f := ½‖·‖² + I_S:

f(x) := { ½‖x‖²  if x ∈ S ,
          +∞     if not .   (3.2.1)

Clearly, f satisfies (1.1.1), and is lower semi-continuous and 1-coercive; it is convex if and only if S is convex; its conjugate turns out to be of special interest. By definition,

f*(s) = sup {⟨s, x⟩ − ½‖x‖² : x ∈ S} ,

which is the function φ_S of Example IV.2.1.4:

f*(s) = ½‖s‖² − ½ d_S²(s) .   (3.2.2)
From Theorem 1.5.4, cl co f = co f and, for each x ∈ co S, there are x_1, ..., x_{n+1} in S and convex multipliers α_1, ..., α_{n+1} such that

x = Σ_{j=1}^{n+1} α_j x_j and (co f)(x) = ½ Σ_{j=1}^{n+1} α_j ‖x_j‖² .
The (nonconvex) function f of (3.2.1) has a subdifferential in the sense of (1.4.1), which is characterized via the projection operation onto S:

P_S(x) := {y ∈ S : d_S(x) = ‖y − x‖}

(a compact-valued multifunction, having the whole space as domain).
Proposition 3.2.1 Let x ∈ S. With f given by (3.2.1),

(i) s is a subgradient of f at x if and only if its projection onto S contains x; in other words,

∂f(x) = {s ∈ ℝⁿ : x ∈ P_S(s)} = {s ∈ ℝⁿ : d_S(s) = ‖s − x‖} ;

in particular, x ∈ ∂f(x);

(ii) (co f)(x) = f(x) = ½‖x‖² and ∂(co f)(x) = ∂f(x).
PROOF. From the expression (3.2.2) of f*, s ∈ ∂f(x) if and only if

½‖x‖² + I_S(x) + ½‖s‖² − ½ d_S²(s) = ⟨s, x⟩ .

Using I_S(x) = 0, this means d_S(s) = ‖s − x‖, i.e. x ∈ P_S(s). In particular, s = x is in ∂f(x), just because P_S(x) = {x}. Hence ∂f(x) is nonempty, and (ii) is a consequence of (1.4.3), (1.4.4). □
Having thus characterized ∂f, we can compute ∂(co f) using Theorem 1.5.6, and this in turn allows the computation of ∂f*:
Proposition 3.2.2 For all s ∈ ℝⁿ,

∂f*(s) = co P_S(s) .   (3.2.3)

PROOF. Because co f is closed, x ∈ ∂f*(s) ⟺ s ∈ ∂(co f)(x), which in turn is expressed by (1.5.4): there are x_1, ..., x_{n+1} in S and convex multipliers α_1, ..., α_{n+1} such that

x = Σ_{j=1}^{n+1} α_j x_j and s ∈ ∩{∂f(x_j) : α_j > 0} .

It then suffices to apply Proposition 3.2.1(i): s ∈ ∂f(x_j) ⟺ x_j ∈ P_S(s) for j = 1, ..., n+1. □
This result is interesting in itself. For example, we deduce that the projection onto a closed set is almost everywhere a singleton, just because ∇f* exists almost everywhere. When S is convex, P_S(s) is for any s a singleton {p_S(s)}, which thus appears as the gradient of f* at s:

∇(½ d_S²) = I − p_S if S is convex .

Another interesting result is that the converse to this property is true: if d_S² (or equivalently f*) is differentiable, then S is convex, and we obtain a convexity criterion:
Theorem 3.2.3 For a nonempty closed set S, the following statements are equivalent:

(i) S is convex;
(ii) the projection operation P_S is single-valued on ℝⁿ;
(iii) the squared distance d_S² is differentiable on ℝⁿ;
(iv) the function f = I_S + ½‖·‖² is convex.
PROOF. We know that (i) ⟹ (ii); when (ii) holds, (3.2.3) tells us that the function f* = ½(‖·‖² − d_S²) is differentiable, i.e. (iii) holds. When f in (iv) is convex, its domain S is convex. There remains to prove (iii) ⟹ (iv).

Thus, assuming (iii), we want to prove that f = cl co f (= co f). Take first x ∈ ri dom co f, so that there is an s ∈ ∂(co f)(x) (Theorem 1.4.2). In view of (1.5.4), there are x_j ∈ S and positive convex multipliers α_j such that

x = Σ_j α_j x_j , (co f)(x) = Σ_j α_j f(x_j) , s ∈ ∩_j ∂f(x_j) .

The last property implies that each x_j is the unique element of ∂f*(s): they are all equal, and it follows that (co f)(x) = f(x).

Now use Proposition IV.1.2.5, expressing co f outside the relative interior of its domain: for x ∈ dom co f, there is a sequence {x_k} ⊂ ri dom co f tending to x such that

f(x) ≥ (co f)(x) = lim_{k→+∞} (co f)(x_k) = lim f(x_k) ≥ f(x) ,

where the last inequality comes from the lower semi-continuity of f. □
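A one-dimensional illustration of these results, with the hypothetical nonconvex set S = {−1, 1}: f*(s) = ½(s² − d_S²(s)) is differentiable except at s = 0 (where P_S(0) = S is not a singleton), and where it exists its derivative is the projection p_S:

```python
import numpy as np

# Hypothetical nonconvex closed set S = {-1, +1} in R.
S = np.array([-1.0, 1.0])

def dS(s):    return np.min(np.abs(s - S))     # distance to S
def pS(s):    return S[np.argmin(np.abs(s - S))]  # (a) nearest point of S
def fstar(s): return 0.5 * (s**2 - dS(s)**2)   # formula (3.2.2)

h = 1e-6
for s in [-2.0, 0.7, 3.0]:          # points away from the kink at s = 0
    grad = (fstar(s + h) - fstar(s - h)) / (2.0 * h)
    print(s, grad, pS(s))            # grad f*(s) equals p_S(s)
```

At s = 0, the two-sided difference quotient would not converge: ∂f*(0) = co P_S(0) = [−1, 1], in accordance with (3.2.3).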
3.3 The Conjugate of Convex Partially Quadratic Functions
A simple case in §3.2 is one in which S is a subspace: we essentially obtain a quadratic function "restricted" to a subspace, resembling the f* obtained in Example 1.1.4. Thus, given a symmetric positive semi-definite linear operator B and a subspace H, it is of interest to compute the conjugate of

g(x) := { ½⟨Bx, x⟩  if x ∈ H ,
          +∞        otherwise .   (3.3.1)
Such a function (closed, convex, homogeneous of degree 2) is said to be partially quadratic. Its conjugate turns out to have just the same form, so the set of convex partially quadratic functions is stable with respect to the conjugacy operation (cf. Example 1.1.4).
Proposition 3.3.1 The function g of (3.3.1) has the conjugate

g*(s) = { ½⟨s, (p_H ∘ B ∘ p_H)⁻ s⟩  if s ∈ Im B + H⊥ ,
          +∞                         otherwise ,   (3.3.2)

where p_H is the operator of orthogonal projection onto H and (·)⁻ is the Moore-Penrose pseudo-inverse.
PROOF. Writing g as f + I_H with f := ½⟨B·, ·⟩, use Proposition 1.3.2: g* = (f ∘ p_H)* ∘ p_H; knowing that

(f ∘ p_H)(x) = ½⟨(p_H ∘ B ∘ p_H)x, x⟩ ,

we obtain from Example 1.1.4 g*(s) under the form

(f ∘ p_H)*(p_H s) = { ½⟨s, (p_H ∘ B ∘ p_H)⁻ s⟩  if p_H s ∈ Im(p_H ∘ B ∘ p_H) ,
                      +∞                          otherwise .
It could be checked directly that Im(p_H ∘ B ∘ p_H) + H⊥ = Im B + H⊥. A simpler argument, however, is obtained via Theorem 2.3.2, which can be applied since dom f = ℝⁿ. Thus,

g*(s) = (f* □ σ_H)(s) = min {½⟨p, B⁻p⟩ : p ∈ Im B, s − p ∈ H⊥} ,

which shows that dom g* = Im B + H⊥. □

For example, suppose H = Im B. Calling A := B⁻ the pseudo-inverse of B (hence A⁻ = (B⁻)⁻ = B), we see that g of (3.3.1) is just the f* of Example 1.1.4 with b = 0. Since Im B + H⊥ = ℝⁿ, and using the relations p_H ∘ B = B ∘ p_H = B, we obtain finally that g* is the f of Example 1.1.4; and this is perfectly normal: g* = f** = f.
As another example, take the identity for B; then Im B = ℝⁿ and Proposition 3.3.1 gives immediately

g*(s) = ½‖p_H(s)‖² .

This last example fits exactly into the framework of §3.2, and (3.2.2) becomes

‖s‖² = ‖p_H(s)‖² + d_H²(s) ,

a classical relation known as Pythagoras' Theorem.
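Proposition 3.3.1 lends itself to a direct numerical check with numpy's pseudo-inverse; the data below (a random positive definite B and a random two-dimensional subspace H of ℝ³) are of course hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
B = M @ M.T                                        # symmetric positive definite
U, _ = np.linalg.qr(rng.standard_normal((3, 2)))   # orthonormal basis of H
P = U @ U.T                                        # orthogonal projection p_H

s = rng.standard_normal(3)          # here dom g* = Im B + H_perp = R^3
g_star = 0.5 * s @ np.linalg.pinv(P @ B @ P) @ s   # formula (3.3.2)

# direct computation: maximize <s,x> - <Bx,x>/2 over x = Uz in H
z = np.linalg.solve(U.T @ B @ U, U.T @ s)
x = U @ z
g_star_direct = s @ x - 0.5 * x @ B @ x

print(g_star, g_star_direct)        # the two values coincide
```

Parameterizing H by an orthonormal basis reduces the constrained maximization to an unconstrained quadratic problem, whose solution reproduces the pseudo-inverse formula exactly.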
3.4 Polyhedral Functions
For given (s_i, b_i) ∈ ℝⁿ × ℝ, i = 1, ..., k, set

f_i(x) := ⟨s_i, x⟩ − b_i for i = 1, ..., k

and define the piecewise affine function

ℝⁿ ∋ x ↦ f(x) := max {f_i(x) : i = 1, ..., k} .   (3.4.1)
Proposition 3.4.1 At each s ∈ co{s_1, ..., s_k} = dom f*, the conjugate of f has the value (Δ_k is the unit simplex)

f*(s) = min { Σ_{i=1}^k α_i b_i : α ∈ Δ_k , Σ_{i=1}^k α_i s_i = s } .   (3.4.2)

PROOF. Theorem 2.4.7 is directly applicable: since f_i* = I_{{s_i}} + b_i, the variables s_j in the minimization problem (2.4.6) become the given s_i. This problem simplifies to (3.4.2), which at the same time reveals dom f*. □
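Formula (3.4.2) can be tested numerically. The data below are hypothetical: with slopes s_i = (−1, 0, 1) and offsets b_i = (1, 0, 1), f(x) = max{0, |x| − 1} and f*(s) = |s| on [−1, 1]; the simplex minimization is done by a crude grid search:

```python
import numpy as np
from itertools import product

# Hypothetical data: f(x) = max_i (s_i x - b_i) = max{0, |x| - 1} on R.
sl = np.array([-1.0, 0.0, 1.0])     # the s_i
b  = np.array([ 1.0, 0.0, 1.0])     # the b_i

xs = np.linspace(-30.0, 30.0, 600001)

def fstar_brute(s):                 # f*(s) = sup_x [s x - f(x)] on a grid
    fx = np.max(sl[:, None] * xs - b[:, None], axis=0)
    return np.max(s * xs - fx)

def fstar_simplex(s, N=200):        # grid search over the unit simplex
    best = np.inf
    for i, j in product(range(N + 1), repeat=2):
        if i + j <= N:
            a = np.array([i, j, N - i - j]) / N
            if abs(a @ sl - s) < 1.0 / N:   # relaxed constraint sum a_i s_i = s
                best = min(best, a @ b)
    return best

for s in [-0.8, 0.0, 0.5]:
    print(s, fstar_brute(s), fstar_simplex(s))   # both close to |s|
```

A real implementation would of course solve (3.4.2) as a linear program rather than by enumeration; the grid search only serves to make the statement concrete.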
In ℝⁿ × ℝ (considered here as the dual of the graph-space), each of the k vertical half-lines {(s_i, r) : r ≥ b_i}, i = 1, ..., k is the epigraph of f_i*, and these half-lines are the "needles" of Fig. IV.2.5.1. Now, the convex hull of these half-lines is a closed convex polyhedron, which is just epi f*. Needless to say, (3.4.1) is obtained by conjugating (3.4.2) again. This double operation has an interesting application in the design of minimization algorithms:
Example 3.4.2 Suppose that the only available information about a function f ∈ Conv ℝⁿ is a finite sampling of function- and subgradient-values, say

f(x_i) and s_i ∈ ∂f(x_i) for i = 1, ..., k .

By convexity, we therefore know that φ_1 ≤ f ≤ φ_2, where

φ_1(y) := max {f(x_i) + ⟨s_i, y − x_i⟩ : i = 1, ..., k} ,
φ_2 := cl co (min_i g_i) with g_i(y) := I_{{x_i}}(y) + f(x_i) for i = 1, ..., k .

The resulting bracket on f is drawn in the left part of Fig. 3.4.1. It has a counterpart in the dual space: φ_2* ≤ f* ≤ φ_1*, illustrated in the right part of Fig. 3.4.1, where

φ_1*(s) = cl co [min_i {I_{{s_i}}(s) + ⟨s_i, x_i⟩ − f(x_i)}] ,
φ_2*(s) = max {⟨s, x_i⟩ − f(x_i) : i = 1, ..., k} .

Note: these formulae can be made more symmetric, with the help of the relations ⟨s_i, x_i⟩ − f(x_i) = f*(s_i).
Fig. 3.4.1. Sandwiching a function and its conjugate
In the framework of optimization, the relation φ_1 ≤ f is well-known; it will be studied in more detail in Chapters XII and XV. It is certainly richer than the relation f ≤ φ_2, which ignores all the first-order information contained in the subgradients s_i. The interesting point is that the situation is reversed in the dual space: the "cutting-plane" approximation of f* (namely φ_2*) is obtained from the poor primal approximation φ_2, and vice versa. □
Because the f of (3.4.1) increases at infinity no faster than linearly, its conjugate f* has a bounded domain: it is not piecewise affine, but rather polyhedral. A natural idea is then to develop a full calculus for polyhedral functions, just as was done in §3.3 for the quadratic case.

A polyhedral function has the general form g = f + I_P, where P is a closed convex polyhedron and f is defined by (3.4.1). The conjugate of g is given by Theorem 2.3.2: g* = f* □ σ_P, i.e.

g*(s) = min_{α∈Δ_k} [ Σ_{i=1}^k α_i b_i + σ_P(s − Σ_{i=1}^k α_i s_i) ] .   (3.4.3)
This formula may take different forms, depending on the particular description of P. When

P = co {p_1, ..., p_m}

is described as a convex hull, σ_P is the maximum of just as many linear functions ⟨p_j, ·⟩ and (3.4.3) becomes an explicit formula. Dual descriptions of P are more frequent in applications, though.
Example 3.4.3 With the notation above, suppose that P is described as an intersection (assumed nonempty) of half-spaces:

H_j := {x ∈ ℝⁿ : ⟨c_j, x⟩ ≤ d_j} for j = 1, ..., m and P := ∩ H_j = {x ∈ ℝⁿ : Cx ≤ d} .

It is convenient to introduce the following notation: ℝᵏ and ℝᵐ are equipped with their standard dot-products; A [resp. C] is the linear operator which, to x ∈ ℝⁿ, associates the vector whose coordinates are the ⟨s_i, x⟩ in ℝᵏ [resp. the ⟨c_j, x⟩ in ℝᵐ]. Then it is not difficult to compute the adjoints of A and C: for α ∈ ℝᵏ and y ∈ ℝᵐ,

A*α = Σ_{i=1}^k α_i s_i , C*y = Σ_{j=1}^m y_j c_j .
Then we want to compute the conjugate of the polyhedral function

x ↦ g(x) := { max_{i=1,...,k} ⟨s_i, x⟩ − b_i  if Cx ≤ d ,
              +∞                              if not ,

and (3.4.3) becomes

g*(s) = min_{α∈Δ_k} [ bᵀα + σ_P(s − A*α) ] .   (3.4.4)

To compute σ_P, we have its conjugate I_P as the sum of the indicators of the H_j's; using Theorem 2.3.2 (with the qualification assumption (2.3.Q.j), since all the functions involved are polyhedral),

σ_P(p) = min { Σ_{j=1}^m σ_{H_j}(s_j) : Σ_{j=1}^m s_j = p } .

Using Example V.3.4.4 for the support function of H_j, it comes

σ_P(p) = min {dᵀy : y ≥ 0 , C*y = p} .

Piecing together, we finally obtain g*(s) as the optimal value of the problem in the pair of variables (α, y):

g*(s) = min { bᵀα + dᵀy : α ∈ Δ_k , y ≥ 0 , A*α + C*y = s } .   (3.4.5)
Observe the nice image-function exhibited by the last constraint, characterized by the linear operator [A* C*]. The only difference between the variables α (involving the piecewise affine f) and y (involving the constraints in the primal space) is that the latter do not have to sum up to 1: they have no a priori bound. Also, note that the constraint of (3.4.4) can be more generally expressed as Cx − d ∈ K (a closed convex polyhedral cone), which induces in (3.4.5) the constraint y ∈ K° (another closed convex polyhedral cone, standing for the nonnegative orthant).

Finally, g* is a closed convex function. Therefore, there is some s ∈ ℝⁿ such that the feasible domain of (3.4.5) is nonempty, and the minimal value is never −∞. These properties hold provided that g in (3.4.4) satisfies (1.1.1), i.e. that the domain {x : Cx ≤ d} is nonempty. □
4 Differentiability of a Conjugate Function
For f ∈ Conv ℝⁿ, we know from Corollary 1.4.4 that

s ∈ ∂f(x) if and only if x ∈ ∂f*(s) ;

and this actually was our very motivation for defining f*, see the introduction to this chapter. Geometrically, the graphs of ∂f and of ∂f* in ℝⁿ × ℝⁿ are images of each other under the mapping (x, s) ↦ (s, x). Knowing that a convex function is differentiable when its subdifferential is a singleton, smoothness properties of f* correspond to monotonicity properties of ∂f, in the sense of §VI.6.1.
4.1 First-Order Differentiability
Theorem 4.1.1 Let f ∈ Conv ℝⁿ be strictly convex. Then int dom f* ≠ ∅ and f* is continuously differentiable on int dom f*.

PROOF. For arbitrary x0 ∈ dom f and nonzero d ∈ ℝⁿ, consider Example 2.4.6. Strict convexity of f implies that

0 < [f(x0 − td) − f(x0)] / t + [f(x0 + td) − f(x0)] / t for all t > 0 ,

and this inequality extends to the suprema:

0 < f′∞(−d) + f′∞(d) .

Remembering that f′∞ = σ_{dom f*} (Proposition 1.2.2), this means

σ_{dom f*}(−d) + σ_{dom f*}(d) > 0 ,
i.e. dom f* has a positive breadth in every nonzero direction d: its interior is nonempty (Theorem V.2.2.3(iii)).
Now, suppose that there is some s ∈ int dom f* such that ∂f*(s) contains two distinct points x_1 and x_2. Then s ∈ ∂f(x_1) ∩ ∂f(x_2) and, by convex combination of the relations

f*(s) + f(x_i) = ⟨s, x_i⟩ for i = 1, 2 ,

we deduce, using Fenchel's inequality (1.1.3):

f*(s) + Σ_{i=1}^2 α_i f(x_i) = ⟨s, Σ_{i=1}^2 α_i x_i⟩ ≤ f*(s) + f(Σ_{i=1}^2 α_i x_i) ,

which implies that f is affine on [x_1, x_2], a contradiction. In other words, ∂f* is single-valued on int dom f*, and this means that f* is continuously differentiable there. □
For an illustration, consider

ℝⁿ ∋ x ↦ f(x) := √(1 + ‖x‖²) ,

whose conjugate is

ℝⁿ ∋ s ↦ f*(s) = { −√(1 − ‖s‖²)  if ‖s‖ ≤ 1 ,
                   +∞             otherwise .

Here, f is strictly convex (compute ∇²f to check this), dom f* is the unit ball, on the interior of which f* is differentiable, but ∂f* is empty on the boundary.
Incidentally, observe also that f* is strictly convex; as a result, f is differentiable. Such is not the case in our next example: with n = 1, take

f(x) := |x| + ½x² = max {½x² + x, ½x² − x} .   (4.1.1)

Use direct calculations or calculus rules from §2 to compute

f*(s) = { ½(s + 1)²  for s ≤ −1 ,
          0           for −1 ≤ s ≤ 1 ,
          ½(s − 1)²   for s ≥ 1 .
This example is illustrated by the instructive Fig. 4.1.1. Its left part displays gr f, made up of two parabolas; f*(s) is obtained by leaning onto the relevant parabola a straight line with slope s. The right part illustrates Theorem 2.4.4: it displays ½(s − 1)² and ½(s + 1)², the conjugates of the two functions making up f; epi f* is then the convex hull of the union of their epigraphs.
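The piecewise formula for f* in this example is easy to confirm by brute force (a purely numerical check, with the maximization done on a grid):

```python
import numpy as np

# f(x) = |x| + x^2/2 from (4.1.1).
xs = np.linspace(-40.0, 40.0, 800001)
fx = np.abs(xs) + 0.5 * xs**2

def fstar(s):                       # brute-force conjugate sup_x [s x - f(x)]
    return np.max(s * xs - fx)

def fstar_formula(s):               # the announced closed form
    return 0.5 * (abs(s) - 1.0)**2 if abs(s) >= 1.0 else 0.0

for s in [-3.0, -0.5, 0.0, 2.5]:
    print(s, fstar(s), fstar_formula(s))
```

The flat piece of f* on [−1, 1] is exactly the effect of the kink of f at the origin: every slope in [−1, 1] is a subgradient of f at 0.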
Example 4.1.2 We have studied in §IV.1.3(f) the volume of an ellipsoid: on the Euclidean space (S_n(ℝ), ⟨⟨·, ·⟩⟩) of symmetric matrices with the standard scalar product of ℝⁿˣⁿ, the function

f(A) := { −log(det A) − n/2  if A is positive definite ,
          +∞                  otherwise

is convex. It is also differentiable on its domain, with gradient ∇f(A) = −A⁻¹ (cf. §A.4.3). Its conjugate can then be computed using the Legendre transform (Remark 0.1):

f*(B) := { −log(det(−B)) − n/2  if B is negative definite ,
           +∞                    otherwise .

Thus f* is again a differentiable convex function; actually, f is strictly convex. This property, by no means trivial, can be seen either from the Hessian operator ∇²f(A) = A⁻¹ ⊗ A⁻¹, i.e.

∂²f / (∂a_ij ∂a_kl) (A) = (A⁻¹)_{ik} (A⁻¹)_{lj} ,

or from Theorem IV.4.1.4: ∇f is strictly monotone because

⟨⟨−A⁻¹ + B⁻¹, A − B⟩⟩ > 0

for all symmetric positive definite matrices A ≠ B. □
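A numerical confirmation of this conjugate pair (with a hypothetical random positive definite A): Fenchel's equality f(A) + f*(B) = ⟨⟨B, A⟩⟩ must hold at B = ∇f(A) = −A⁻¹, and both sides then equal −n:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # hypothetical symmetric positive definite A

def f(A):    return -np.log(np.linalg.det(A)) - n / 2.0
def fstar(B): return -np.log(np.linalg.det(-B)) - n / 2.0

B = -np.linalg.inv(A)                    # B = grad f(A), negative definite
print(f(A) + fstar(B), np.trace(B @ A))  # both equal <<B, A>> = -n
```

Indeed ⟨⟨−A⁻¹, A⟩⟩ = −tr(I) = −n, while f(A) + f*(−A⁻¹) = (−log det A − n/2) + (log det A − n/2) = −n.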
Actually, the strict convexity of f in Example 4.1.2 can be directly established in our present framework. In fact, Theorem 4.1.1 gives a sufficient condition for smoothness of a closed convex function (namely f*). Apart from possible side-effects on the boundary, this condition is also necessary:

Theorem 4.1.3 Let f ∈ Conv ℝⁿ be differentiable on the set Ω := int dom f. Then f* is strictly convex on each convex subset C ⊂ ∇f(Ω).
PROOF. Let C be a convex set as stated. Suppose that there are two distinct points s_1 and s_2 in C such that f* is affine on the line-segment [s_1, s_2]. Then, setting s := ½(s_1 + s_2) ∈ C ⊂ ∇f(Ω), there is x ∈ Ω such that ∇f(x) = s, i.e. x ∈ ∂f*(s). Using the affine character of f*, we have

0 = f(x) + f*(s) − ⟨s, x⟩ = ½ Σ_{j=1}^2 [f(x) + f*(s_j) − ⟨s_j, x⟩]

and, in view of Fenchel's inequality (1.1.3), this implies that each term in the bracket is 0: x ∈ ∂f*(s_1) ∩ ∂f*(s_2), i.e. ∂f(x) contains the two points s_1 and s_2, a contradiction to the existence of ∇f(x). □
The strict convexity of f* cannot in general be extended outside ∇f(Ω); and be aware that this set may be substantially smaller than one might initially guess: the function f_5* in Example I.6.2.4(e) is finite everywhere, but not strictly convex.

We gather the results of this section, with a characterization of the "ideal situation", in which the Legendre transform alluded to in the introduction is well-defined.
Corollary 4.1.4 Let f : ℝⁿ → ℝ be strictly convex, differentiable and 1-coercive. Then

(i) f* is likewise finite-valued on ℝⁿ, strictly convex, differentiable and 1-coercive;
(ii) the continuous mapping ∇f is one-to-one from ℝⁿ onto ℝⁿ, and its inverse is continuous;
(iii) f*(s) = ⟨s, (∇f)⁻¹(s)⟩ − f((∇f)⁻¹(s)) for all s ∈ ℝⁿ. □
The simplest such situation occurs when f is a strictly convex quadratic function, as in Example 1.1.3, corresponding to an affine Legendre transform. Another example is f(x) := exp(½‖x‖²).
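Corollary 4.1.4(iii) can be exercised numerically on the hypothetical function f(x) = ¼x⁴ + ½x² on ℝ (strictly convex, differentiable and 1-coercive): invert ∇f by bisection and compare the Legendre formula with a brute-force conjugate:

```python
import numpy as np

# Hypothetical f(x) = x^4/4 + x^2/2; grad f(x) = x^3 + x is a bijection of R.
def f(x):  return 0.25 * x**4 + 0.5 * x**2
def df(x): return x**3 + x

def inv_df(s, lo=-100.0, hi=100.0):
    for _ in range(200):               # bisection: df is strictly increasing
        mid = 0.5 * (lo + hi)
        if df(mid) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fstar_legendre(s):                 # formula (iii) of Corollary 4.1.4
    x = inv_df(s)
    return s * x - f(x)

xs = np.linspace(-50.0, 50.0, 1000001)
def fstar_brute(s):
    return np.max(s * xs - f(xs))

for s in [-2.0, 0.3, 5.0]:
    print(s, fstar_legendre(s), fstar_brute(s))
```

The bisection is a stand-in for whatever root-finder one prefers; the point is only that (∇f)⁻¹ exists globally and determines f* completely.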
4.2 Towards Second-Order Differentiability
Along the same lines as in §4.1, better than strict convexity of f and better than differentiability of f* correspond to each other. We start with the connection between strong convexity of f and Lipschitz continuity of ∇f*.
(a) Lipschitz Continuity of the Gradient Mapping. The next result is stated for a finite-valued f, mainly because the functions considered in Chap. VI were such; but this assumption is actually useless.
Theorem 4.2.1 Assume that f : ℝⁿ → ℝ is strongly convex with modulus c > 0 on ℝⁿ: for all (x_1, x_2) ∈ ℝⁿ × ℝⁿ and α ∈ ]0, 1[,

f(αx_1 + (1−α)x_2) ≤ αf(x_1) + (1−α)f(x_2) − ½cα(1−α)‖x_1 − x_2‖² .   (4.2.1)

Then dom f* = ℝⁿ and ∇f* is Lipschitzian with constant 1/c on ℝⁿ:

‖∇f*(s_1) − ∇f*(s_2)‖ ≤ (1/c)‖s_1 − s_2‖ for all (s_1, s_2) ∈ ℝⁿ × ℝⁿ .
PROOF. We use the various equivalent definitions of strong convexity (see Theorem VI.6.1.2). Fix x0 and s0 ∈ ∂f(x0): for all 0 ≠ d ∈ ℝⁿ and t ≥ 0,

f(x0 + td) ≥ f(x0) + t⟨s0, d⟩ + ½ct²‖d‖² ;

hence f′∞(d) = σ_{dom f*}(d) = +∞, i.e. dom f* = ℝⁿ. Also, f is in particular strictly convex, so we know from Theorem 4.1.1 that f* is differentiable (on ℝⁿ). Finally, the strong convexity of f can also be written

⟨s_1 − s_2, x_1 − x_2⟩ ≥ c‖x_1 − x_2‖² ,

in which we have s_i ∈ ∂f(x_i), i.e. x_i = ∇f*(s_i), for i = 1, 2. The rest follows from the Cauchy-Schwarz inequality. □
This result is quite parallel to Theorem 4.1.1: improving the convexity of f from "strict" to "strong" amounts to improving ∇f* from "continuous" to "Lipschitzian". The analogy can even be extended to Theorem 4.1.3:
Theorem 4.2.2 Let f : ℝⁿ → ℝ be convex and have a gradient mapping Lipschitzian with constant L > 0 on ℝⁿ: for all (x_1, x_2) ∈ ℝⁿ × ℝⁿ,

‖∇f(x_1) − ∇f(x_2)‖ ≤ L‖x_1 − x_2‖ .

Then f* is strongly convex with modulus 1/L on each convex subset C ⊂ dom ∂f*. In particular, there holds for all (x_1, x_2) ∈ ℝⁿ × ℝⁿ

⟨∇f(x_1) − ∇f(x_2), x_1 − x_2⟩ ≥ (1/L)‖∇f(x_1) − ∇f(x_2)‖² .   (4.2.2)
PROOF. Let s_1 and s_2 be arbitrary in dom ∂f* ⊂ dom f*; take s and s′ on the segment [s_1, s_2]. To establish the strong convexity of f*, we need to minorize the remainder term f*(s′) − f*(s) − ⟨x, s′ − s⟩, with x ∈ ∂f*(s). For this, we minorize f*(s′) = sup_y [⟨s′, y⟩ − f(y)], i.e. we majorize f(y):

f(y) = f(x) + ⟨∇f(x), y − x⟩ + ∫₀¹ ⟨∇f(x + t(y − x)) − ∇f(x), y − x⟩ dt
     ≤ f(x) + ⟨∇f(x), y − x⟩ + ½L‖y − x‖² = −f*(s) + ⟨s, y⟩ + ½L‖y − x‖²

(we have used the property ∫₀¹ t dt = ½, as well as x ∈ ∂f*(s), i.e. ∇f(x) = s). In summary, we have

f*(s′) ≥ f*(s) + sup_y [⟨s′ − s, y⟩ − ½L‖y − x‖²] .

Observe that the last supremum is nothing but the value at s′ − s of the conjugate of ½L‖· − x‖². Using the calculus rule 1.3.1, we have therefore proved

f*(s′) ≥ f*(s) + ⟨s′ − s, x⟩ + (1/2L)‖s′ − s‖²   (4.2.3)

for all s, s′ in [s_1, s_2] and all x ∈ ∂f*(s). Replacing s′ in (4.2.3) by s_1 and by s_2, and setting s = αs_1 + (1−α)s_2, the strong convexity (4.2.1) for f* is established by convex combination.
On the other hand, replacing (s, s′) by (s_1, s_2) in (4.2.3):

f*(s_2) ≥ f*(s_1) + ⟨s_2 − s_1, x_1⟩ + (1/2L)‖s_2 − s_1‖² for all x_1 ∈ ∂f*(s_1) .

Then, replacing (s, s′) by (s_2, s_1) and summing:

⟨x_1 − x_2, s_1 − s_2⟩ ≥ (1/L)‖s_1 − s_2‖² .

In view of the differentiability of f, this is just (4.2.2), which has to hold for all (x_1, x_2) simply because Im ∂f* = dom ∇f = ℝⁿ. □
Remark 4.2.3 For a convex function, the Lipschitz property of the gradient mapping thus appears as equivalently characterized by (4.2.2); a result which is of interest in itself. As an application, let us return to §3.2: for a convex set C, the convex function ½(‖·‖² − d_C²) has gradient p_C, a nonexpansive mapping. Therefore

‖p_C(x) − p_C(y)‖² ≤ ⟨p_C(x) − p_C(y), x − y⟩

for all x and y in ℝⁿ. Likewise, ∇(½ d_C²) = I − p_C is also nonexpansive:

‖(I − p_C)(x) − (I − p_C)(y)‖² ≤ ⟨(I − p_C)(x) − (I − p_C)(y), x − y⟩ . □
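The first inequality of Remark 4.2.3 is easy to probe numerically; the convex set below, the box C = [−1, 1]³, is a hypothetical choice whose projection is a simple componentwise clip:

```python
import numpy as np

# Hypothetical convex set: the box C = [-1, 1]^3, with projection np.clip.
rng = np.random.default_rng(1)
pC = lambda x: np.clip(x, -1.0, 1.0)

violations = 0
for _ in range(1000):
    x = 3.0 * rng.standard_normal(3)
    y = 3.0 * rng.standard_normal(3)
    lhs = np.sum((pC(x) - pC(y))**2)
    rhs = (pC(x) - pC(y)) @ (x - y)
    if lhs > rhs + 1e-12:
        violations += 1
print("violations:", violations)       # 0: p_C is firmly nonexpansive
```

The same loop with I − p_C in place of p_C would verify the second inequality.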
Naturally, Corollary 4.1.4 has also its equivalent, namely: if f is strongly convex and has a Lipschitzian gradient-mapping on ]Rn, then f* enjoys the same properties. These properties do not leave much room, though: f (and f*) must be "sandwiched" between two positive definite quadratic functions.
(b) Second-Order Approximations. Some Lipschitz continuity of the gradient is of course necessary for second-order differentiability, but is certainly not sufficient: (4.1.1) is strongly convex but its conjugate is not C². More must therefore be assumed to ensure the existence of ∇²f*, and we recall from §I.6.2 that finding minimal assumptions is a fairly complex problem.
In the spirit of the "unilateral" and "conical" character of convex analysis, we proceed as follows. First, a function φ ∈ Conv ℝⁿ is said to be directionally quadratic if it is positively homogeneous of degree 2:

φ(tx) = t²φ(x) for all x ∈ ℝⁿ and t > 0.

If φ is directionally quadratic, immediate consequences of the definition are:

- φ(0) = 0 (see the beginning of §V.1.1);
- φ* is directionally quadratic as well (easy to check);
- hence φ*(0) = 0, which in turn implies that φ is minimal at 0: a directionally quadratic function is nonnegative;
- as a result (see Example V.1.2.7), √(2φ) is the support function of a closed convex set containing 0.

Of particular importance are the directionally quadratic functions which are finite everywhere, and/or positive (except at 0); these two properties are dual to each other:
4 Differentiability of a Conjugate Function 85
Lemma 4.2.4 For a directionally quadratic function φ, the following properties are equivalent:

(i) φ is finite everywhere;
(ii) ∇φ(0) exists (and is equal to 0);
(iii) there is C ≥ 0 such that φ(x) ≤ ½C‖x‖² for all x ∈ ℝⁿ;
(iv) there is c > 0 such that φ*(s) ≥ ½c‖s‖² for all s ∈ ℝⁿ;
(v) φ*(s) > 0 for all s ≠ 0.
PROOF. [(i) ⇔ (ii) ⇒ (iii)] When (i) holds, φ is continuous; call ½C ≥ 0 its maximal value on the unit sphere: (iii) holds by positive homogeneity. Furthermore, compute difference quotients to observe that φ'(0, ·) ≡ 0, and (ii) follows from the differentiability criterion IV.4.2.1.

Conversely, existence of ∇φ(0) implies finiteness of φ in a neighborhood of 0 and, by positive homogeneity, on the whole space.

[(iii) ⇒ (iv) ⇒ (v)] When (iii) holds, (iv) comes from the calculus rule 1.3.1(vii), with for example 0 < c := 1/C (or rather 1/(C + 1), to take care of the case C = 0); this implies (v) trivially.

[(v) ⇒ (i)] The lower semi-continuous function φ* is positive on the unit sphere, and has a minimal value c > 0 there. Being positively homogeneous of degree 2, φ* is then 1-coercive; finiteness of its conjugate φ follows from Proposition 1.3.8. □
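The equivalences of Lemma 4.2.4 can be tested on the simplest directionally quadratic functions. A sketch, assuming φ(x) = ½⟨Qx, x⟩ with Q symmetric positive definite of our own choosing; then (iii) holds with C = λ_max(Q), φ*(s) = ½⟨Q⁻¹s, s⟩, and (iv) holds with c = 1/C:

```python
# Hedged illustration of Lemma 4.2.4 on a positive definite quadratic
# phi(x) = 1/2 <Qx, x> (our own test data). Then
#   (iii): phi(x)  <= 1/2 C ||x||^2 with C = lambda_max(Q),
#   (iv):  phi*(s) >= 1/2 (1/C) ||s||^2, where phi*(s) = 1/2 <Q^{-1}s, s>.
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
Q = M.T @ M + np.eye(n)          # symmetric positive definite
Qinv = np.linalg.inv(Q)
C = np.linalg.eigvalsh(Q).max()

phi = lambda x: 0.5 * x @ Q @ x
phi_star = lambda s: 0.5 * s @ Qinv @ s   # conjugate of a pd quadratic

for _ in range(100):
    x = rng.standard_normal(n)
    assert phi(x) <= 0.5 * C * (x @ x) + 1e-10                # (iii)
    assert phi_star(x) >= 0.5 * (1.0 / C) * (x @ x) - 1e-10   # (iv), c = 1/C

# cross-check phi* at one point: sup_x <s,x> - phi(x) is attained at x = Qinv s
s = rng.standard_normal(n)
xhat = Qinv @ s
assert abs((s @ xhat - phi(xhat)) - phi_star(s)) < 1e-8
```
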
Definition 4.2.5 We say that the directionally quadratic function φ_s defines a minorization to second order of f ∈ Conv ℝⁿ at x₀ ∈ dom f, associated with s ∈ ∂f(x₀), when

f(x₀ + h) ≥ f(x₀) + ⟨s, h⟩ + φ_s(h) + o(‖h‖²).    (4.2.4)

We say that φ_s likewise defines a majorization to second order when

f(x₀ + h) ≤ f(x₀) + ⟨s, h⟩ + φ_s(h) + o(‖h‖²).    (4.2.5) □

Note in passing that, because φ_s ≥ 0, (4.2.4) could not hold if s were not a subgradient of f at x₀. Whenever x₀ ∈ dom ∂f, the zero function and the indicator I_{0} define trivial minorizations and majorizations to second order respectively, valid for all s ∈ ∂f(x₀). This observation motivates the particular class of directionally quadratic functions introduced in Lemma 4.2.4.
We proceed to show that the correspondences established in Lemma 4.2.4 are somehow conserved when remainder terms are introduced, as in Definition 4.2.5. First observe that, if (4.2.5) holds with φ_s finite everywhere, then ∇f(x₀) exists (and is equal to s). The next two results concern the equivalence (iii) ⇔ (iv).
Proposition 4.2.6 With the notation of Definition 4.2.5, suppose that there is a directionally quadratic function φ_s satisfying, for some c > 0,

φ_s(h) ≥ ½c‖h‖² for all h ∈ ℝⁿ,    (4.2.6)

and defining a minorization to second order of f ∈ Conv ℝⁿ, associated with s ∈ ∂f(x₀). Then φ_s* defines a majorization to second order of f* at s, associated with x₀. This implies in particular ∇f*(s) = x₀.
PROOF. [Preamble] First note that φ_s* is finite everywhere: as already mentioned after Definition 4.2.5, differentiability of f* will then follow from the majorization property of f*.

We take the tilted function

ℝⁿ ∋ h ↦ g(h) := f(x₀ + h) − f(x₀) − ⟨s, h⟩,

so that the assumption means g(h) ≥ φ_s(h) + o(‖h‖²). In view of the relation

g*(p) = f*(s + p) + f(x₀) − ⟨s + p, x₀⟩ = f*(s + p) − f*(s) − ⟨p, x₀⟩,

we have only to prove that φ_s* defines a majorization of g* at 0, i.e. g*(p) ≤ φ_s*(p) + o(‖p‖²).

[Step 1] By assumption, for any ε > 0, there is a δ > 0 such that

g(h) ≥ φ_s(h) − ε‖h‖² whenever ‖h‖ ≤ δ;

and, in view of (4.2.6), we write:

(*)  g(h) ≥ (1 − 2ε/c) φ_s(h)  whenever ‖h‖ ≤ δ,
(**) g(h) ≥ ½(c − 2ε)‖h‖²  whenever ‖h‖ ≤ δ.

Our aim is then to establish that, even though (*) holds locally only, the calculus rule 1.3.1(vii) applies, at least in a neighborhood of 0.
[Step 2] For ‖h‖ > δ, write the convexity of g on [0, h] and use g(0) = 0 to obtain with (**)

(δ/‖h‖) g(h) ≥ g(δh/‖h‖) ≥ ½(c − 2ε)δ²,

hence g(h) ≥ ½(c − 2ε)δ‖h‖.

Then, if ‖p‖ ≤ ½(c − 2ε)δ =: δ', we have

⟨p, h⟩ − g(h) ≤ ½(c − 2ε)δ‖h‖ − g(h) ≤ 0 whenever ‖h‖ > δ.

Since g* ≥ 0, this certainly implies that
g*(p) = sup_{‖h‖ ≤ δ} [⟨p, h⟩ − g(h)] ≤ sup_{‖h‖ ≤ δ} [⟨p, h⟩ − (1 − 2ε/c)φ_s(h)] ≤ [(1 − 2ε/c)φ_s]*(p) for ‖p‖ ≤ δ'.
[Step 3] Thus, using the calculus rule 1.3.1(ii) and positive homogeneity of φ_s*, we have for p ∈ B(0, δ'),

g*(p) ≤ (1/(1 − 2ε/c)) φ_s*(p) = φ_s*(p) + (2ε/(c − 2ε)) φ_s*(p) ≤ φ_s*(p) + (ε/(c(c − 2ε))) ‖p‖².  [from (4.2.6)]

Let us sum up: given ε' > 0, choose ε in Step 1 such that ε/(c(c − 2ε)) ≤ ε'. This gives δ > 0 and δ' > 0 in Step 2 yielding the required majorization

g*(p) ≤ φ_s*(p) + ε'‖p‖² for all p ∈ B(0, δ'). □
Proposition 4.2.7 With the notation of Definition 4.2.5, suppose that there is a directionally quadratic function φ_s satisfying (4.2.6) for some c > 0 and defining a majorization to second order of f ∈ Conv ℝⁿ, associated with s ∈ ∂f(x₀). Then φ_s* defines a minorization to second order of f* at s, associated with x₀.
PROOF. We use the proof pattern of Proposition 4.2.6; using in particular the same tilting technique, we assume x₀ = 0, f(0) = 0 and s = 0. For any ε > 0, there is by assumption δ > 0 such that

f(h) ≤ φ_s(h) + ε‖h‖² ≤ (1 + 2ε/c)φ_s(h) whenever ‖h‖ ≤ δ.

It follows that

f*(p) ≥ sup {⟨p, h⟩ − (1 + 2ε/c)φ_s(h) : ‖h‖ ≤ δ}    (4.2.7)

for all p; we have to show that, for ‖p‖ small enough, the right-hand side is close to φ_s*(p).

Because of (4.2.6), φ_s* is finite everywhere and ∇φ_s*(0) = 0 (Lemma 4.2.4). It follows from the outer semi-continuity of ∂φ_s* (Theorem VI.6.2.4) that, for some δ' > 0,

∅ ≠ ∂φ_s*(p/(1 + 2ε/c)) ⊂ B(0, δ) for all p ∈ B(0, δ').

Said otherwise: whenever ‖p‖ ≤ δ', there exists h₀ ∈ B(0, δ) such that

φ_s*(p/(1 + 2ε/c)) = ⟨p, h₀⟩/(1 + 2ε/c) − φ_s(h₀).

Thus, the function ⟨p, ·⟩ − (1 + 2ε/c)φ_s attains its (unconstrained) maximum at h₀. In summary, provided that ‖p‖ is small enough, the index h in (4.2.7) can run in the whole space, to give

f*(p) ≥ [(1 + 2ε/c)φ_s]*(p) for all p ∈ B(0, δ').
To conclude, mimic Step 3 in the proof of Proposition 4.2.6. □

To show that (4.2.6) cannot be removed in this last result, take

ℝ² ∋ (ξ, η) ↦ f(ξ, η) = ½ξ² + ¼η⁴,

so that φ₀(ξ, η) := ½ξ² defines a majorization to second order of f at x₀ = 0. We leave it as an exercise to compute f* and φ₀*, and to realize that the latter is not suitable for a minorization to second order.
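For readers who want to check their answer to this exercise, here is a numerical sketch; the closed forms below are our own computation, not stated in the text:

```python
# Working the exercise numerically (this computation is ours): f is
# separable, so
#   f*(sigma, tau)     = 1/2 sigma^2 + 3/4 |tau|^{4/3},
#   phi_0*(sigma, tau) = 1/2 sigma^2 + (0 if tau == 0 else +inf).
# Since f*(0, tau) is finite for tau != 0 while phi_0* is +inf there,
# phi_0* cannot minorize f* to second order off the sigma-axis.
import numpy as np

def f_star(sigma, tau):
    return 0.5 * sigma**2 + 0.75 * abs(tau)**(4.0 / 3.0)

# check the closed form against a brute-force conjugate on a grid
sigma, tau = 0.3, 0.2
etas = np.linspace(-3, 3, 4001)
brute = max(sigma * xi + tau * eta - (0.5 * xi**2 + 0.25 * eta**4)
            for xi in (sigma,)          # optimal xi = sigma (1-D quadratic)
            for eta in etas)
assert abs(brute - f_star(sigma, tau)) < 1e-4

# the failure: along the tau-axis, phi_0* = +inf but f* stays finite
assert f_star(0.0, 0.1) < np.inf
print("f*(0, 0.1) is finite, so the minorization by phi_0* fails there")
```
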
Example 4.2.8 Take

f(x) = max {f_j(x) : j = 1, ..., m},

where each f_j : ℝⁿ → ℝ is convex and twice differentiable at x₀. Using the notation J := {j : f_j(x₀) = f(x₀)} for the active index-set at x₀, and Δ being the associated unit simplex, assume for simplicity that A_j := ∇²f_j(x₀) is positive definite for each j ∈ J. For α ∈ Δ, define

s_α := Σ_{j∈J} α_j ∇f_j(x₀) ∈ ∂f(x₀)

and

φ_α(h) := ½ Σ_{j∈J} α_j ⟨A_j h, h⟩;

this last function is convex quadratic and yields the minorization

f(x₀ + h) ≥ Σ_{j∈J} α_j f_j(x₀ + h) = f(x₀) + ⟨s_α, h⟩ + φ_α(h) + o(‖h‖²).
Because φ_α is strongly convex, Proposition 4.2.6 tells us that

f*(s_α + p) ≤ f*(s_α) + ⟨p, x₀⟩ + φ_α*(p) + o(‖p‖²),

where

φ_α*(p) = ½⟨[Σ_{j∈J} α_j A_j]⁻¹ p, p⟩.

Note that the function φ := co(min_{α∈Δ} φ_α) also defines a minorization to second order of f at x₀, which is no longer quadratic, but which still satisfies (4.2.6), and is independent of α, i.e. of s ∈ ∂f(x₀). Dually, it corresponds to

f*(s + p) ≤ f*(s) + ⟨p, x₀⟩ + ½ max_{j∈J} ⟨A_j⁻¹ p, p⟩ + o(‖p‖²)

for all p ∈ ℝⁿ, and this majorization is valid at all s ∈ ∂f(x₀). □
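The dual formulas of Example 4.2.8 can be probed numerically. A sketch with two randomly generated positive definite matrices A₁, A₂ of our own (playing the role of the Hessians A_j): the conjugate of a positive definite quadratic has the stated inverse form, and α ↦ ⟨(αA₁ + (1 − α)A₂)⁻¹p, p⟩ is convex, so its maximum over the simplex sits at a vertex, which is where the max_{j∈J} in the last display comes from.

```python
# Numeric sketch of Example 4.2.8's dual formulas (A1, A2 are our own
# test data). With Q_alpha = alpha*A1 + (1-alpha)*A2,
#   phi_alpha*(p) = 1/2 <Q_alpha^{-1} p, p>,
# and the max of this convex function of alpha over [0,1] is at a vertex.
import numpy as np

rng = np.random.default_rng(3)
n = 3
def rand_pd():
    M = rng.standard_normal((n, n))
    return M.T @ M + np.eye(n)

A1, A2 = rand_pd(), rand_pd()
p = rng.standard_normal(n)

def quad_conj(Q, p):              # conjugate of h -> 1/2 <Qh, h> at p
    return 0.5 * p @ np.linalg.solve(Q, p)

vals = [quad_conj(a * A1 + (1 - a) * A2, p) for a in np.linspace(0, 1, 101)]
vertex_max = max(quad_conj(A1, p), quad_conj(A2, p))
assert max(vals) <= vertex_max + 1e-10

# sanity check of the conjugate formula via the maximizer h = Q^{-1} p
Q = 0.4 * A1 + 0.6 * A2
h = np.linalg.solve(Q, p)
assert abs((p @ h - 0.5 * h @ Q @ h) - quad_conj(Q, p)) < 1e-8
```
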
If a directionally quadratic function φ_s defines both a minorization and a majorization of f (at x₀, associated with s), it will be said to define an approximation to second order of f. As already stated, existence of such an approximation implies a rather strong smoothness property of f at x₀.

Convex quadratic functions are particular instances of directionally quadratic functions. With Propositions 4.2.6 and 4.2.7 in mind, the next result is straightforward:
Corollary 4.2.9 Let f ∈ Conv ℝⁿ have an approximation at x₀ which is actually quadratic: for s := ∇f(x₀) and some symmetric positive semi-definite linear operator A,

f(x₀ + h) = f(x₀) + ⟨s, h⟩ + ½⟨Ah, h⟩ + o(‖h‖²).

If, in addition, A is positive definite, then f* also has a quadratic approximation at s, namely

f*(s + p) = f*(s) + ⟨p, x₀⟩ + ½⟨A⁻¹p, p⟩ + o(‖p‖²). □
The global version of Corollary 4.2.9 is:
Corollary 4.2.10 Let f : ℝⁿ → ℝ be convex, twice differentiable and 1-coercive. Assume, moreover, that ∇²f(x) is positive definite for all x ∈ ℝⁿ. Then f* enjoys the same properties and

∇²f*(∇f(x)) = [∇²f(x)]⁻¹ for all x ∈ ℝⁿ.

PROOF. The 1-coercivity of f implies that dom f* = ℝⁿ, so Im ∇f = dom f* = ℝⁿ: apply Corollary 4.2.9 to any x ∈ ℝⁿ with A = ∇²f(x). □
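Corollary 4.2.10 lends itself to a one-dimensional numerical check. A sketch, with the particular f(x) = ¼x⁴ + ½x² chosen by us (convex, 1-coercive, f''(x) = 3x² + 1 > 0); we evaluate f* through the maximizer x(s) solving f'(x) = x³ + x = s, then compare a finite-difference (f*)'' with 1/f''(x(s)):

```python
# 1-D numerical check of Corollary 4.2.10 (the particular f is our own
# choice): (f*)''(s) should equal 1 / f''(x(s)) at s = f'(x(s)).
import numpy as np

f = lambda x: 0.25 * x**4 + 0.5 * x**2

def x_of_s(s):                    # the unique real root of x^3 + x - s = 0
    r = np.roots([1.0, 0.0, 1.0, -s])
    return float(r[np.argmin(np.abs(r.imag))].real)

def f_star(s):
    x = x_of_s(s)
    return s * x - f(x)

s0, t = 2.0, 1e-3
fd2 = (f_star(s0 + t) - 2 * f_star(s0) + f_star(s0 - t)) / t**2
x0 = x_of_s(s0)                   # here x0 = 1, so 1/f''(x0) = 1/4
assert abs(fd2 - 1.0 / (3 * x0**2 + 1)) < 1e-4
```
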
The assumptions of Corollary 4.2.10 are certainly satisfied if the Hessian of f is uniformly positive definite, i.e. if

∃c > 0 such that ⟨∇²f(x)d, d⟩ ≥ c‖d‖² for all (x, d) ∈ ℝⁿ × ℝⁿ.

Use the 1-dimensional counter-example f(x) := exp x to realize that 1-coercivity is really more than mere positive definiteness of ∇²f(x) for all x ∈ ℝⁿ.
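The exp counter-example can be made explicit: its conjugate has the standard closed form s log s − s, whose domain is only [0, +∞), so f* fails to be finite everywhere even though f''(x) = eˣ > 0 for all x. A short check:

```python
# The 1-D counter-example, computed explicitly (standard closed forms):
# f(x) = exp(x) has f'' > 0 everywhere but is not 1-coercive, and
#   f*(s) = s*log(s) - s  for s > 0,  f*(0) = 0,  f*(s) = +inf for s < 0.
import math

def f_star(s):
    if s < 0:
        return math.inf
    if s == 0:
        return 0.0               # sup_x (0*x - e^x) = 0, not attained
    return s * math.log(s) - s

# cross-check against a crude numerical sup of s*x - exp(x)
s = 0.7
num = max(s * x - math.exp(x) for x in [i * 1e-3 for i in range(-5000, 5000)])
assert abs(num - f_star(s)) < 1e-4
assert f_star(-1.0) == math.inf  # dom f* excludes s < 0
```
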
Remark 4.2.11 To conclude, let us return to Example 4.2.8, which illustrates a fundamental difficulty for a second-order analysis. Under the conditions of that example, f can be approximated to second order in a number of ways.

(a) For h ∈ ℝⁿ, define

J_h := {j ∈ J : f'(x₀, h) = ⟨∇f_j(x₀), h⟩};

in other words, the convex hull of the gradients indexed by J_h forms the face of ∂f(x₀) exposed by h. The function

φ(h) := ½ max {⟨A_j h, h⟩ : j ∈ J_h}

is directionally quadratic and, for all d ∈ ℝⁿ, we have the estimate

f(x₀ + td) = f(x₀) + t f'(x₀, d) + t²φ(d) + o(t²) for t > 0.

Unfortunately, the remainder term depends on d, and actually φ is not even continuous (too bad if we want to approximate a nicely convex function). To see this, just take m = 2, f₁ and f₂ quadratic, and x₀ at a kink: f₁(x₀) = f₂(x₀). Then let d₀ be tangent to the kinky surface (if necessary look at Fig. VI.2.1.3, which displays such a
surface). For d in the neighborhood of this d₀, φ(d) is equal to either ½⟨A₁d, d⟩ or ½⟨A₂d, d⟩, depending on which index prevails at x₀ + td for small t > 0.
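The discontinuity described in (a) can be exhibited on a tiny instance; the two quadratics below are our own choice, both active at x₀ = 0 with gradients ±(1, 0), so the kink is the plane {h₁ = 0}:

```python
# A concrete instance of the discontinuity in Remark 4.2.11(a) (our own
# data): f = max(f1, f2) with
#   f1(x, y) = x + 1/2 x^2   (gradient (1, 0),  A1 = diag(1, 0)),
#   f2(x, y) = -x + y^2      (gradient (-1, 0), A2 = diag(0, 2)),
# both active at x0 = 0. J_h keeps the gradients attaining f'(0, h) = |h1|,
# and phi(h) = 1/2 max_{j in J_h} <A_j h, h>.
import numpy as np

grads = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
hessians = [np.diag([1.0, 0.0]), np.diag([0.0, 2.0])]

def phi(h):
    slopes = [g @ h for g in grads]
    top = max(slopes)
    J_h = [j for j, v in enumerate(slopes) if abs(v - top) < 1e-12]
    return 0.5 * max(h @ hessians[j] @ h for j in J_h)

d0 = np.array([0.0, 1.0])          # tangent to the kink {h1 = 0}
assert phi(d0) == 1.0              # both indices active: 1/2 * 2
d_eps = np.array([1e-6, 1.0])      # arbitrarily close to d0 ...
assert phi(d_eps) < 1e-6           # ... but only j = 1 active: 1/2 * h1^2
```
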
(b) Another second-order estimate is

f(x₀ + h) = f(x₀) + max_{j∈J} [⟨∇f_j(x₀), h⟩ + ½⟨A_j h, h⟩] + o(‖h‖²).

It does not suffer the non-uniformity effect of (a) and the approximating function is now nicely convex. However, it patches together the first- and second-order information and is no longer the sum of a sublinear function and a directionally quadratic function. □