
[Grundlehren der mathematischen Wissenschaften] Convex Analysis and Minimization Algorithms II, Volume 306: Approximate Subdifferentials of Convex Functions


XI. Approximate Subdifferentials of Convex Functions

Prerequisites. Sublinear functions and associated convex sets (Chap. V); characterization of the subdifferential via the conjugacy correspondence (§X.1.4); calculus rules on conjugate functions (§X.2); and also: behaviour at infinity of one-dimensional convex functions (§I.2.3).

Introduction. There are two motivations for the concepts introduced in this chapter: a practical one, related to descent methods, and a more theoretical one, in the framework of differential calculus.

- In §VIII.2.2, we have seen that the steepest-descent method is not convergent, essentially because the subdifferential is not a continuous mapping. Furthermore, we have defined Algorithm IX.1.6 which, to find a descent direction, needs to extract limits of subgradients: an impossible task on a computer.

- On the theoretical side, we have seen in Chap. VI the directional derivative of a finite convex function, which supports a convex set: the subdifferential. This latter set was generalized to extended-valued functions in §X.1.4; and infinite-valued directional derivatives have also been seen (Proposition I.4.1.3, Example X.2.4.3). A natural question is then: is the supporting property still true in the extended-valued case? The answer is not quite yes; see the example illustrated by Fig. 2.1.1 below.

The two difficulties above are overcome altogether by the so-called ε-subdifferential of f, denoted ∂_ε f, which is a certain perturbation of the subdifferential studied in Chap. VI for finite convex functions. While the two sets are identical for ε = 0, the properties of ∂_ε f turn out to be substantially different from those of ∂f. We therefore study ∂_ε f with the help of the relevant tools, essentially the conjugate function f* (which was of no use in Chap. VI). In return, particularizing our study to the case ε = 0 enables us to generalize the results of Chap. VI to extended-valued functions.

Throughout this chapter, and unless otherwise specified, we therefore have

f ∈ Conv ℝⁿ and ε ≥ 0.

However, keeping in mind that our development has a practical importance for numerical optimization, we will often pay special attention to the finite-valued case.

J.-B. Hiriart-Urruty et al., Convex Analysis and Minimization Algorithms II © Springer-Verlag Berlin Heidelberg 1993


1 The Approximate Subdifferential

1.1 Definition, First Properties and Examples

Definition 1.1.1 Given x ∈ dom f, the vector s ∈ ℝⁿ is called an ε-subgradient of f at x when the following property holds:

f(y) ≥ f(x) + ⟨s, y − x⟩ − ε for all y ∈ ℝⁿ.   (1.1.1)

Of course, s is still an ε-subgradient if (1.1.1) holds only for y ∈ dom f. The set of all ε-subgradients of f at x is the ε-subdifferential (of f at x), denoted by ∂_ε f(x).

Even though we will rarely take ε-subdifferentials of functions not in Conv ℝⁿ, it goes without saying that the defining relation (1.1.1) can be applied to any function finite at x. Also, one could set ∂_ε f(x) = ∅ for x ∉ dom f. □
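As a concrete sanity check of Definition 1.1.1, inequality (1.1.1) can be tested numerically on a grid of points. The sketch below is our own illustration (the helper name is hypothetical, not the book's); it does this for f = |·| at x = 1 with ε = 0.5, where the ε-subdifferential turns out to be the segment [0.5, 1] (this is computed explicitly later in this section):

```python
# Grid test of the eps-subgradient inequality (1.1.1); the helper name
# is_eps_subgradient is ours, not the book's.
def is_eps_subgradient(f, x, s, eps, ys):
    """True if f(y) >= f(x) + s*(y - x) - eps holds at every test point y."""
    return all(f(y) >= f(x) + s * (y - x) - eps - 1e-12 for y in ys)

ys = [k / 100.0 for k in range(-1000, 1001)]   # grid on [-10, 10]

# For f = |.| at x = 1 and eps = 0.5, the eps-subdifferential is [0.5, 1]:
assert is_eps_subgradient(abs, 1.0, 0.5, 0.5, ys)      # left endpoint
assert is_eps_subgradient(abs, 1.0, 1.0, 0.5, ys)      # right endpoint
assert not is_eps_subgradient(abs, 1.0, 0.4, 0.5, ys)  # just outside
assert not is_eps_subgradient(abs, 1.0, 1.1, 0.5, ys)  # just outside
```

Of course such a grid test is only a finite check of an inequality over all of ℝⁿ; it is meant as an illustration, not a proof.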

It follows immediately from the definition that

∂_ε f(x) ⊂ ∂_ε′ f(x) whenever ε ≤ ε′;   (1.1.2)

∂f(x) = ∂_0 f(x) = ∩{∂_ε f(x) : ε > 0} [= lim_{ε↓0} ∂_ε f(x)];   (1.1.3)

∂_{αε+(1−α)ε′} f(x) ⊃ α ∂_ε f(x) + (1 − α) ∂_ε′ f(x) for all α ∈ ]0, 1[.   (1.1.4)

The last relation means that the graph of the multifunction ℝ₊ ∋ ε ↦ ∂_ε f(x) is a convex set in ℝⁿ⁺¹; more will be said about this set later (Proposition 1.3.3).

We will continue to use the notation ∂f, rather than ∂_0 f, for the exact subdifferential, knowing that ∂_ε f can be called an approximate subdifferential when ε > 0.

Figure 1.1.1 gives a geometric illustration of Definition 1.1.1: s, together with r ∈ ℝ, defines the affine function y ↦ a_{s,r}(y) := r + ⟨s, y − x⟩; we say that s is an ε-subgradient of f at x when it is possible to have simultaneously r ≥ f(x) − ε and a_{s,r} ≤ f. The condition r = f(x), corresponding to exact subgradients, is thus relaxed by ε; thanks to the closed convexity of f, this relaxation makes it possible to find such an a_{s,r}:

Fig. 1.1.1. Supporting hyperplanes within ε

Theorem 1.1.2 For all x ∈ dom f, ∂_ε f(x) ≠ ∅ whenever ε > 0.

PROOF. Theorem III.4.1.1 implies the existence in ℝⁿ × ℝ of a hyperplane H_{s,α} separating strictly the point (x, f(x) − ε) from the closed convex set epi f: for all y ∈ ℝⁿ

⟨s, x⟩ + α[f(x) − ε] < ⟨s, y⟩ + αf(y).

Taking y = x in this inequality shows that

α[f(x) − ε] < αf(x) < +∞,

hence α > 0 (the hyperplane is non-vertical). Then s′ := −s/α ∈ ∂_ε f(x). □

Incidentally, the above proof shows that, among the ε-subgradients, there is one for which strict inequality holds in (1.1.1). A consequence of this result is that, for ε > 0, the domain of the multifunction x ↦ ∂_ε f(x) is the convex set dom f. Here is a difference with the case ε = 0: we know that dom ∂f need not be the whole of dom f; it may even be a nonconvex set.

Consider for example the one-dimensional convex function

f(x) := −2√x if x ≥ 0, +∞ otherwise.   (1.1.5)

Applying the definition, an ε-subgradient of f at x = 0 is an s ∈ ℝ satisfying

sy + 2√y − ε ≤ 0 for all y ≥ 0,

and easy calculation shows that the ε-subdifferential of f at 0 is ]−∞, −1/ε]: it is nonempty for all ε > 0, even though ∂f(0) = ∅. Also, note that it is unbounded, just because 0 is on the boundary of dom f.
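The claim ∂_ε f(0) = ]−∞, −1/ε] is easy to confirm numerically; the following sketch is ours (the grid and tolerance are arbitrary choices). It checks the defining inequality for s = −1/ε and exhibits a violation for a slightly larger s:

```python
import math

eps = 0.5

def is_eps_subgradient_at_0(s, ys):
    # s lies in the eps-subdifferential of f(x) = -2*sqrt(x) at x = 0
    # iff s*y + 2*sqrt(y) - eps <= 0 for all y >= 0 (displayed inequality above)
    return all(s * y + 2.0 * math.sqrt(y) - eps <= 1e-12 for y in ys)

ys = [k / 1000.0 for k in range(0, 100001)]      # grid on [0, 100]
assert is_eps_subgradient_at_0(-1.0 / eps, ys)           # s = -2: accepted
assert not is_eps_subgradient_at_0(-1.0 / eps + 0.1, ys) # s = -1.9: rejected
```

The rejected value fails near y = 0.28, where s·y + 2√y peaks above ε, in line with the elementary calculation giving the threshold −1/ε.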

For a further illustration, let f : ℝ → ℝ be f(x) := |x|. We have

∂_ε f(x) = [−1, −1 − ε/x] if x < −ε/2,
∂_ε f(x) = [−1, +1] if −ε/2 ≤ x ≤ ε/2,
∂_ε f(x) = [1 − ε/x, 1] if x > ε/2.

The two parts of Fig. 1.1.2 display this set, as a multifunction of ε and x respectively. It is always a segment, reduced to the singleton {f′(x)} only for ε = 0 (when x ≠ 0). This example suggests that the approximate subdifferential is usually a proper enlargement of the exact subdifferential; this will be confirmed by Proposition 1.2.3 below.

Another interesting instance is the indicator function of a nonempty closed convex set:

Definition 1.1.3 The set of ε-normal directions to a closed convex set C at x ∈ C, or the ε-normal set for short, is the ε-subdifferential of the indicator function I_C at x:

N_{C,ε}(x) := ∂_ε I_C(x) = {s ∈ ℝⁿ : ⟨s, y − x⟩ ≤ ε for all y ∈ C}.   (1.1.6) □

Fig. 1.1.2. Graphs of the approximate subdifferential of |x|, for fixed ε and for fixed x

The ε-normal set is thus an intersection of half-spaces but is usually not a cone; it contains the familiar normal cone N_C(x), to which it reduces when ε = 0. A condensed form of (1.1.6) uses the polar of the set C − {x}:

∂_ε I_C(x) = ε(C − x)° for all x ∈ C and ε > 0.

These examples raise the question of the boundedness of ∂_ε f(x).

Theorem 1.1.4 For ε ≥ 0, ∂_ε f(x) is a closed convex set, which is nonempty and bounded if and only if x ∈ int dom f.

PROOF. Closedness and convexity come immediately from the definition (1.1.1). Now, if x ∈ int dom f, then ∂_ε f(x) contains the nonempty set ∂f(x) (Theorem X.1.4.2). Then let δ > 0 be such that B(x, δ) ⊂ int dom f, and let L be a Lipschitz constant for f on B(x, δ) (Theorem IV.3.1.2). For 0 ≠ s ∈ ∂_ε f(x), take y = x + δs/‖s‖:

f(x) + Lδ ≥ f(y) ≥ f(x) + δ⟨s, s/‖s‖⟩ − ε,

i.e. ‖s‖ ≤ L + ε/δ. Thus, the nonempty ∂_ε f(x) is also bounded.

Conversely, take any s₁ in the normal cone to dom f at x ∈ dom f:

⟨s₁, y − x⟩ ≤ 0 for all y ∈ dom f.

If ∂_ε f(x) ≠ ∅, add this inequality to (1.1.1) to obtain

f(y) ≥ f(x) + ⟨s + s₁, y − x⟩ − ε

for all y ∈ dom f. We conclude s + s₁ ∈ ∂_ε f(x): ∂_ε f(x) contains (actually: is equal to) ∂_ε f(x) + N_{dom f}(x), which is unbounded if x ∉ int dom f. □

In the introduction to this chapter, it was mentioned that one motivation for the ε-subdifferential was practical. The next result gives a first explanation: ∂_ε f can be used to characterize the ε-solutions of a convex minimization problem - but, starting from Chap. XIII, we will see that its role is much more important than that.

Theorem 1.1.5 The following two properties are equivalent:

0 ∈ ∂_ε f(x);

f(x) ≤ f(y) + ε for all y ∈ ℝⁿ.

PROOF. Apply the definition. □
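Theorem 1.1.5 says that 0 ∈ ∂_ε f(x) exactly when x minimizes f within ε. A minimal numerical illustration (our own toy function, not from the text):

```python
# f(x) = (x - 3)**2 has infimum 0 at x = 3, so by the equivalence above
# 0 lies in d_eps f(x)  <=>  f(x) <= inf f + eps = eps.
f = lambda x: (x - 3.0) ** 2
eps = 0.01

assert f(3.05) <= eps      # 3.05 is an eps-solution: 0 is an eps-subgradient there
assert not f(3.2) <= eps   # 3.2 is not
```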


1.2 Characterization via the Conjugate Function

The conjugate function f* allows a condensed writing for (1.1.1):

Proposition 1.2.1 The vector s ∈ ℝⁿ is an ε-subgradient of f at x ∈ dom f if and only if

f*(s) + f(x) − ⟨s, x⟩ ≤ ε.   (1.2.1)

As a result,

s ∈ ∂_ε f(x) ⟺ x ∈ ∂_ε f*(s).   (1.2.2)

PROOF. The definition (1.1.1) of ∂_ε f(x) can be written

f(x) + [⟨s, y⟩ − f(y)] − ⟨s, x⟩ ≤ ε for all y ∈ dom f

which, remembering that f*(s) is the supremum of the bracket, is equivalent to (1.2.1). This implies that ∂_ε f(x) ⊂ dom f* and, applying (1.2.1) with f replaced by f*:

x ∈ ∂_ε f*(s) ⟺ f**(x) + f*(s) − ⟨s, x⟩ ≤ ε.

The conclusion follows since Conv ℝⁿ ∋ f = f**. □
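The characterization (1.2.1) is easy to exercise in one dimension (a sketch under our own choice of function): for f(x) = x²/2 one has f*(s) = s²/2, so (1.2.1) becomes (s − x)²/2 ≤ ε, i.e. ∂_ε f(x) is the segment [x − √(2ε), x + √(2ε)].

```python
import math

def in_eps_subdiff(s, x, eps):
    # (1.2.1) for f(z) = z**2/2, whose conjugate is f*(s) = s**2/2
    return 0.5 * s * s + 0.5 * x * x - s * x <= eps + 1e-12

x, eps = 2.0, 0.125
r = math.sqrt(2 * eps)                 # segment half-length, here 0.5
assert in_eps_subdiff(x - r, x, eps)       # boundary points accepted
assert in_eps_subdiff(x + r, x, eps)
assert not in_eps_subdiff(x + r + 1e-6, x, eps)  # just outside rejected
```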

In contrast with the case ε = 0, note that

∂_ε f(ℝⁿ) = dom f* whenever ε > 0.

Indeed, the inclusion "⊂" comes directly from (1.2.1); conversely, if s ∈ dom f*, we know from Theorem 1.1.2 that there exists x ∈ ∂_ε f*(s), i.e. s ∈ ∂_ε f(x).

Likewise, for fixed x ∈ dom f,

[lim_{ε→+∞} ∂_ε f(x) =] ∪{∂_ε f(x) : ε > 0} = dom f*.   (1.2.3)

Here again, the "⊂" comes directly from (1.2.1) while, if s ∈ dom f*, set

ε_s := f*(s) + f(x) − ⟨s, x⟩ ∈ [0, +∞[

to see that s is in the corresponding ∂_{ε_s} f(x).

Example 1.2.2 (Convex Quadratic Functions) Consider the function

ℝⁿ ∋ x ↦ f(x) := ½⟨Qx, x⟩ + ⟨b, x⟩

where Q is symmetric positive semi-definite with pseudo-inverse Q⁻. Using Example X.1.1.4, we see that ∂_ε f(x) is the set

{s ∈ b + Im Q : f(x) + ½⟨s − b, Q⁻(s − b)⟩ ≤ ⟨s, x⟩ + ε}.

This set has a nicer expression if we single out ∇f(x): setting p = s − b − Qx, we see that s − b ∈ Im Q means p ∈ Im Q and, via some algebra,

∂_ε f(x) = {∇f(x)} + {p ∈ Im Q : ½⟨p, Q⁻p⟩ ≤ ε}.   (1.2.4)

Another equivalent formula is obtained if we set y := Q⁻p (so p = Qy):

∂_ε f(x) = {∇f(x) + Qy : ½⟨Qy, y⟩ ≤ ε}.   (1.2.5)

To illustrate Theorem 1.1.5, we see from (1.2.4) that

x minimizes f within ε ⟺ ∇f(x) = Qx + b ∈ Im Q [i.e. b ∈ Im Q] and ½⟨∇f(x), Q⁻∇f(x)⟩ ≤ ε.

When Q is invertible and b = 0, f defines the norm |||x||| = ⟨Qx, x⟩^{1/2}. Its ε-subdifferential is then a neighborhood of the gradient for the metric associated with the dual norm |||s|||* = ⟨s, Q⁻¹s⟩^{1/2}. □
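Formula (1.2.5) can be cross-checked against the conjugate characterization (1.2.1). The sketch below is ours (the particular Q, b, x are arbitrary); it works in dimension 2 with an invertible Q, so that Q⁻ = Q⁻¹ and f*(s) = ½⟨s − b, Q⁻¹(s − b)⟩; points of the form ∇f(x) + Qy with ½⟨Qy, y⟩ = ε land exactly on the boundary of ∂_ε f(x):

```python
import math
import random

Q = [[2.0, 0.5], [0.5, 1.0]]   # symmetric positive definite
b = [1.0, -1.0]
x = [0.3, 0.7]
eps = 0.2

def mat_vec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
Qinv = [[Q[1][1] / det, -Q[0][1] / det], [-Q[1][0] / det, Q[0][0] / det]]

def f(z):
    return 0.5 * dot(mat_vec(Q, z), z) + dot(b, z)

def in_eps_subdiff(s):
    # conjugate characterization (1.2.1), with f*(s) = 0.5<s-b, Qinv(s-b)>
    sb = [s[0] - b[0], s[1] - b[1]]
    fstar = 0.5 * dot(mat_vec(Qinv, sb), sb)
    return fstar + f(x) - dot(s, x) <= eps + 1e-10

grad = [mat_vec(Q, x)[0] + b[0], mat_vec(Q, x)[1] + b[1]]   # grad f(x) = Qx + b

random.seed(0)
for _ in range(100):
    y = [random.uniform(-1, 1), random.uniform(-1, 1)]
    t = dot(mat_vec(Q, y), y)            # <Qy, y> > 0 since Q is positive definite
    c = math.sqrt(2 * eps / t)           # scale so that 0.5 <Q(cy), cy> = eps
    Qy = mat_vec(Q, y)
    s_on = [grad[0] + c * Qy[0], grad[1] + c * Qy[1]]
    s_out = [grad[0] + 1.01 * c * Qy[0], grad[1] + 1.01 * c * Qy[1]]
    assert in_eps_subdiff(s_on)          # on the boundary: accepted
    assert not in_eps_subdiff(s_out)     # slightly scaled up: rejected
```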

The above example suggests once more that ∂_ε f is usually a proper enlargement of ∂f; in particular it is "never" reduced to a singleton, in contrast with ∂f, which is "often" reduced to the gradient of f. This is made precise by the next results, which somehow describe two opposite situations.

Proposition 1.2.3 Let f ∈ Conv ℝⁿ be 1-coercive, i.e. f(x)/‖x‖ → +∞ if ‖x‖ → ∞. Then, for all x ∈ dom f,

0 ≤ ε < ε′ ⟹ ∂_ε f(x) ⊂ int ∂_ε′ f(x).

Furthermore, any s ∈ ℝⁿ is an ε-subgradient of f at x, provided that ε is large enough.

PROOF. Written in the form (1.2.1), an approximate subdifferential of f appears as a sublevel-set of the convex function f* − ⟨·, x⟩; but the 1-coercivity of f means that this function is finite everywhere (Proposition X.1.3.8). For ε′ > ε, the conditions of Proposition VI.1.3.3 are then satisfied to compute the interior of our sublevel-set:

int ∂_ε′ f(x) = {s ∈ ℝⁿ : f*(s) + f(x) − ⟨s, x⟩ < ε′} ⊃ ∂_ε f(x).

As for the second claim, it comes directly from (1.2.3). □

Proposition 1.2.4 Let f ∈ Conv ℝⁿ and suppose that ∂_{ε₀} f(x₀) is a singleton for some x₀ ∈ dom f and ε₀ > 0. Then f is affine on ℝⁿ.

PROOF. Denote by s₀ the unique ε₀-subgradient of f at x₀. Let ε ∈ ]0, ε₀[; in view of the monotonicity property (1.1.2), the nonempty set ∂_ε f(x₀) can only be the same singleton {s₀}. Then let ε′ > ε₀; the graph-convexity property (1.1.4) easily shows that ∂_ε′ f(x₀) is again {s₀}.

Thus, using the characterization (1.2.1) of an approximate subgradient, we have proved:

s ≠ s₀ ⟹ f*(s) > ε + ⟨s, x₀⟩ − f(x₀) for all ε > 0,

and not much room is left for f*: indeed

f* = r + I_{{s₀}} for some real number r.

The affine character of f follows; and naturally r = f*(s₀) = ⟨s₀, x₀⟩ − f(x₀). □

Our next example somehow generalizes this last situation.

Example 1.2.5 (Support and Indicator Functions) Let σ = σ_C be a closed sublinear function, i.e. the support of the nonempty closed convex set C = ∂σ(0). Its conjugate σ* is just the indicator of C, hence

∂_ε σ_C(d) = {s ∈ C : σ(d) ≤ ⟨s, d⟩ + ε}.   (1.2.6)

Remembering that σ(0) = 0, we see that

∂_ε σ_C(d) ⊂ ∂σ_C(0) = ∂_ε σ_C(0) = C for all d ∈ ℝⁿ.

Keeping Proposition 1.2.4 in mind, the approximate subdifferential of a linear - or even affine - function is a singleton. The set ∂_ε σ_C(d) could be called the "ε-face" of C exposed by d, i.e. the set of ε-maximizers of ⟨·, d⟩ over C. As shown in Fig. 1.2.1, it is the intersection of the convex set C with a certain half-space depending on ε. For ε = 0, we do obtain the exposed face itself: ∂σ_C(d) = F_C(d).

Fig. 1.2.1. An ε-face

Along the same lines, the conjugate of I_C in (1.1.6) is σ_C, so the ε-normal set to C at x is equivalently defined as

N_{C,ε}(x) = {s ∈ ℝⁿ : σ_C(s) ≤ ⟨s, x⟩ + ε}.

Beware of the difference with (1.2.6); one set looks like a face, the other like a cone; the relation linking them connotes the polarity relation, see Remark V.3.2.6.

Consider for example a half-space: with s ≠ 0,

H⁻ := {x ∈ ℝⁿ : ⟨s, x⟩ ≤ r}.

Its support function is computed by straightforward calculations, already seen in Example V.2.3.1; we obtain for x ∈ H⁻:

N_{H⁻,ε}(x) = {λs : 0 ≤ λ and λ(r − ⟨s, x⟩) ≤ ε}.

In other words, the approximate normal set to a half-space is a segment, which stretches to the whole normal cone when x approaches the boundary. Figure 2.1.1 will illustrate an ε-normal set in a nonlinear situation. □


1.3 Some Useful Properties

(a) Elementary Calculus. First, we list some properties coming directly from the definitions.

Proposition 1.3.1

(i) For the function g(x) := f(x) + r, ∂_ε g(x) = ∂_ε f(x).
(ii) For the function g(x) := αf(x) and α > 0, ∂_ε g(x) = α ∂_{ε/α} f(x).
(iii) For the function g(x) := f(αx) and α ≠ 0, ∂_ε g(x) = α ∂_ε f(αx).
(iv) More generally, if A is an invertible linear operator, ∂_ε(f ∘ A)(x) = A* ∂_ε f(Ax).
(v) For the function g(x) := f(x − x₀), ∂_ε g(x + x₀) = ∂_ε f(x).
(vi) For the function g(x) := f(x) + ⟨s₀, x⟩, ∂_ε g(x) = ∂_ε f(x) + {s₀}.
(vii) If f₁ ≤ f₂ and f₁(x) = f₂(x), then ∂_ε f₁(x) ⊂ ∂_ε f₂(x).

PROOF. Apply (1.1.1), or combine (1.2.1) with the elementary calculus rules X.1.3.1, whichever seems easier. □

Our next result expresses how the approximate subdifferential is transformed, when the starting function is restricted to a subspace.

Proposition 1.3.2 Let H be a subspace containing a point of dom f and call p_H the operator of orthogonal projection onto H. For all x ∈ dom f ∩ H,

p_H(∂_ε(f + I_H)(x)) = ∂_ε(f ∘ p_H)(x), i.e. ∂_ε(f + I_H)(x) = ∂_ε(f ∘ p_H)(x) + H⊥.

PROOF. From the characterization (1.2.1), s ∈ ∂_ε(f + I_H)(x) means

ε ≥ (f + I_H)(x) + (f + I_H)*(s) − ⟨s, x⟩
= f(x) + [(f ∘ p_H)* ∘ p_H](s) − ⟨s, x⟩   [from Prop. X.1.3.2]
= f(p_H(x)) + (f ∘ p_H)*(p_H(s)) − ⟨s, p_H(x)⟩   [x = p_H(x)]
= (f ∘ p_H)(x) + (f ∘ p_H)*(p_H(s)) − ⟨p_H(s), x⟩.   [p_H is symmetric]

This just means that p_H(s) is in the ε-subdifferential of f ∘ p_H at x. □

A particular case is when dom f ⊂ H: then f + I_H coincides with f and we have ∂_ε f = ∂_ε(f ∘ p_H) + H⊥.

(b) The Tilted Conjugate Function. From (1.2.1), ∂_ε f(x) appears as the sublevel-set at level ε of the "tilted conjugate function"

ℝⁿ ∋ s ↦ f*(s) − ⟨s, x⟩ + f(x) =: g_x*(s),   (1.3.1)

which is clearly in Conv ℝⁿ (remember x ∈ dom f!) and plays an important role. Its infimum on ℝⁿ is 0 (Theorem 1.1.2), attained at the subgradients of f at x, if there are any. Its conjugate (g_x*)* = g_x is easily computed, say via Proposition X.1.3.1, and is simply the "shifted" function

ℝⁿ ∋ h ↦ g_x(h) := f(x + h) − f(x).   (1.3.2)

Indeed, when connecting g_x* to the approximate subdifferential, we just perform the conjugacy operation on f, but with a special role attributed to the particular point x under consideration.

Proposition 1.3.3 For x ∈ dom f, the epigraph of g_x* is the graph of the multifunction ε ↦ ∂_ε f(x):

{(s, ε) ∈ ℝⁿ × ℝ : g_x*(s) ≤ ε} = {(s, ε) ∈ ℝⁿ × ℝ : s ∈ ∂_ε f(x)}.   (1.3.3)

The support function of this set has, at (d, −u) ∈ ℝⁿ × ℝ, the value

σ_{epi g_x*}(d, −u) = sup_{(s,ε)} {⟨s, d⟩ − εu : s ∈ ∂_ε f(x)} =
u[f(x + d/u) − f(x)] if u > 0,
σ_{dom f*}(d) = f′∞(d) if u = 0,
+∞ if u < 0.   (1.3.4)

PROOF. The equality of the two sets in (1.3.3) is trivial, in view of (1.2.1) and (1.3.1). Then (1.3.4) comes either from direct calculations, or via Proposition X.1.2.1, with f replaced by g_x*, whose domain is just dom f* (Proposition X.1.3.1 may also be used). □

Remember §IV.2.2 and Example 1.3.2.2 to realize that, up to the closure operation at u = 0, the function (1.3.4) is just the perspective of the shifted function h ↦ f(x + h) − f(x).

Figure 1.3.1 displays the graph of ε ↦ ∂_ε f(x), with the variable s plotted along the horizontal axis, as usual (see also the right part of Fig. 1.1.2). Rotating the picture so that this axis becomes vertical, we obtain epi g_x*.

Fig. 1.3.1. Approximate subdifferential as a sublevel-set in the dual space


(c) A Critical ε-Value.

Proposition 1.3.4 Assume that the infimal value f̄ of f over ℝⁿ is finite. For all x ∈ dom f, there holds

inf {ε > 0 : 0 ∈ ∂_ε f(x)} = f(x) − f̄.

PROOF. Because g_x* is a nonnegative function, we always have

g_x*(0) = inf {ε > 0 : ε ≥ g_x*(0)}

which, in view of (1.3.3) and of the definition of g_x*, can also be written

inf {ε > 0 : 0 ∈ ∂_ε f(x)} = f*(0) + f(x) = −f̄ + f(x).   (1.3.5) □
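The critical value of Proposition 1.3.4 can be observed numerically; in the sketch below (our own toy example) f = |·| and x = 2, so f̄ = 0 and the critical value is f(x) − f̄ = 2:

```python
ys = [k / 100.0 for k in range(-500, 501)]   # grid on [-5, 5]

def zero_in_eps_subdiff(x, eps):
    # 0 lies in d_eps f(x) for f = |.|  <=>  |y| >= |x| - eps for all y
    return all(abs(y) >= abs(x) - eps - 1e-12 for y in ys)

assert zero_in_eps_subdiff(2.0, 2.0)       # eps at the critical value f(x) - inf f
assert not zero_in_eps_subdiff(2.0, 1.9)   # below it, 0 drops out
```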

The relation expressed in (1.3.5) is rather natural, and can be compared to Theorem 1.1.5. It defines a sort of "critical" value for ε, satisfying the following property:

Proposition 1.3.5 Assume that f is 1-coercive. For a non-optimal x ∈ dom f, define ε̄ := f(x) − f̄ > 0. Then the normal cone to ∂_ε̄ f(x) at 0 is the set of directions pointing from x to the minimizers of f:

N_{∂_ε̄ f(x)}(0) = ℝ₊(Argmin f − {x}).

PROOF. We know that ∂_ε f(x) is the sublevel-set of g_x* at level ε and we use Theorem VI.1.3.5 to compute its normal cone. From the coercivity of f, g_x* is finite everywhere (Proposition X.1.3.8); if, in addition, ε > 0, the required Slater assumption is satisfied; hence, for an s satisfying g_x*(s) = ε [i.e. s ∈ bd ∂_ε f(x)],

N_{∂_ε f(x)}(s) = ℝ₊ ∂g_x*(s) = ℝ₊(∂f*(s) − {x}).

In this formula, we can set ε = ε̄ > 0 and s = 0 because

g_x*(0) = f*(0) + f(x) − ⟨0, x⟩ = ε̄.

The result follows from the expression (X.1.4.6) of Argmin f as ∂f*(0). □

(d) Representations of a Closed Convex Function. An observation inspired by the function g_x* is that the multifunctions ε ↦ F(ε) that are ε-subdifferentials (of a function f at a point x) are those whose graph is a closed convex epigraph; see §IV.1.3(g). Also, if ∂_ε f(x) is known for all ε > 0 at a given x ∈ dom f, then f* is also known, so f itself is completely determined:

Theorem 1.3.6 For x ∈ dom f, there holds

f(x + h) = f(x) + sup {⟨s, h⟩ − ε : ε > 0, s ∈ ∂_ε f(x)}   (1.3.6)

for all h ∈ ℝⁿ; or, using support functions:

f(x + h) = f(x) + sup_{ε>0} [σ_{∂_ε f(x)}(h) − ε].   (1.3.7)


PROOF. Fix x ∈ dom f. Then set u = 1 in (1.3.4) to obtain σ_{epi g_x*}(h, −1) = g_x(h), i.e. (1.3.7), which is just a closed form for (1.3.6). □
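The representation (1.3.7) can be checked on the function f(z) = z²/2, for which ∂_ε f(x) = [x − √(2ε), x + √(2ε)] (Example 1.2.2 with Q = 1, b = 0) and hence σ_{∂_ε f(x)}(h) = xh + √(2ε)|h|. The sketch below is ours (the grid over ε is an arbitrary choice); it recovers f(x + h) from f(x) and the family of ε-subdifferentials alone:

```python
import math

x, h = 1.5, -0.7
f = lambda z: 0.5 * z * z

# sup over eps > 0 of [support of d_eps f(x) at h] - eps, cf. (1.3.7);
# the true supremum is attained at eps = h**2/2
sup_val = max(x * h + math.sqrt(2 * e) * abs(h) - e
              for e in (k / 10000.0 for k in range(1, 100001)))
assert abs((f(x) + sup_val) - f(x + h)) < 1e-4
```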

Using the positive homogeneity of a support function, another way of writing (1.3.7) is: if x ∈ dom f,

f(x + td) = f(x) + t sup_{α>0} [σ_{∂_{tα} f(x)}(d) − α] for all t > 0.

Theorem 1.3.6 gives a converse to Proposition 1.3.1(vii): if, for a particular x ∈ dom f₁ ∩ dom f₂,

∂_ε f₁(x) ⊂ ∂_ε f₂(x) for all ε > 0,

then

f₁(y) − f₁(x) ≤ f₂(y) − f₂(x) for all y ∈ ℝⁿ.

Remark 1.3.7 The function g_x* differs from f* by the affine function f(x) − ⟨·, x⟩ only. In the primal graph-space, g_x is obtained from f by a mere translation of the origin from (0, 0) to (x, f(x)). Basically, considering the graph of ∂_ε f, instead of the epigraph of f*, just amounts to this change of variables. In the framework of numerical optimization, the distinction is significant: it is near the current iterate x of an algorithm that the behaviour of f matters. Here is one more practical motivation for introducing the approximate subdifferential.

Theorem 1.3.6 confirms this observation: numerical algorithms need a model of f near the current x - remember §II.2. A possibility is to consider a model for f*: it induces via (1.2.1) a model for the multifunction ε ↦ ∂_ε f(x), which in turn gives via (1.3.7) a model for f, hence a possible strategy for finding the next iterate x₊. □

This remark suggests that, in the framework of numerical optimization, (1.3.7) will be especially useful with h close to 0. More globally, we have

[f**(x) =] f(x) = sup {⟨s, x⟩ − f*(s) : s ∈ ℝⁿ},

which also expresses a closed convex function as a supremum, but of affine functions. The index s can be restricted to dom f*, or even to ri dom f*; another, better idea: restrict s to dom ∂f* (⊃ ri dom f*), in which case f*(s) = ⟨s, y⟩ − f(y) for some y ∈ dom ∂f. In other words, we can write

f(x) = sup {f(y) + ⟨s, x − y⟩ : y ∈ dom ∂f, s ∈ ∂f(y)}.

This formula has a refined form, better suited to numerical optimization where only one subgradient at a particular point is usually known (see again the concept of black box (U1) in §VIII.3.5).

Theorem 1.3.8 Associated with f ∈ Conv ℝⁿ, let s : ri dom f → ℝⁿ be a mapping satisfying s(y) ∈ ∂f(y) for all y ∈ ri dom f. Then there holds

f(x) = sup_{y ∈ ri dom f} [f(y) + ⟨s(y), x − y⟩] for all x ∈ dom f.   (1.3.8)


PROOF. The existence of the mapping s is guaranteed by Theorem X.1.4.2; besides, only the inequality ≥ in (1.3.8) must be proved. From Lemma III.2.1.6, we can take d ∈ ℝⁿ such that

y_t := x + td ∈ ri dom f for t ∈ ]0, 1].

Then

f(x + d) ≥ f(y_t) + ⟨s(y_t), x + d − y_t⟩,

so that

⟨s(y_t), d⟩ ≤ [f(x + d) − f(y_t)] / (1 − t).

Then write

f(y_t) + ⟨s(y_t), x − y_t⟩ = f(y_t) − t⟨s(y_t), d⟩ ≥ [f(y_t) − t f(x + d)] / (1 − t)

and let t ↓ 0; use for example Proposition IV.1.2.5 to see that the right-hand side tends to f(x). □

2 The Approximate Directional Derivative

Throughout this section, x is fixed in dom f. As a (nonempty) closed convex set, the approximate subdifferential ∂_ε f(x) then has a support function, for any ε > 0. We denote it by f′_ε(x, ·):

ℝⁿ ∋ d ↦ f′_ε(x, d) := σ_{∂_ε f(x)}(d) = sup {⟨s, d⟩ : s ∈ ∂_ε f(x)},   (2.0.1)

a closed sublinear function. The notation f′_ε is motivated by §VI.1.1: f′(x, ·) supports ∂f(x), so it is natural to denote by f′_ε(x, ·) the function supporting ∂_ε f(x). The present section is devoted to a study of this support function, which is obtained via an "approximate difference quotient".

2.1 The Support Function of the Approximate Subdifferential

Theorem 2.1.1 For x ∈ dom f and ε > 0, the support function of ∂_ε f(x) is

ℝⁿ ∋ d ↦ f′_ε(x, d) = inf_{t>0} [f(x + td) − f(x) + ε] / t,   (2.1.1)

which will be called the ε-directional derivative of f at x.

PROOF. We use Proposition 1.3.3: embedding the set ∂_ε f(x) in the larger space ℝⁿ × ℝ, we view it as the intersection of epi g_x* with the horizontal hyperplane H_ε := ℝⁿ × {ε} (rotate and contemplate thoroughly Fig. 1.3.1). Correspondingly, f′_ε(x, d) is the value at (d, 0) of the support function of our embedded set epi g_x* ∩ H_ε:

f′_ε(x, d) = (I_{epi g_x*} + I_{H_ε})*(d, 0).

Our aim is then to apply the calculus rule X.2.3.2, so we need to check the qualification assumption (X.2.3.1). The relative interior of the hyperplane H_ε being H_ε itself, what we have to do is to find in ri epi g_x* a point of the form (s, ε). Denote by A the linear operator which, to (s, α) ∈ ℝⁿ × ℝ, associates A(s, α) = α: we want

ε ∈ A(ri epi g_x*) = ri [A(epi g_x*)],

where the last equality comes from Proposition III.2.1.12. But we know from Theorem 1.1.2 and Proposition 1.3.3 that A(epi g_x*) is ℝ₊ or ]0, +∞[; in both cases, its relative interior is ]0, +∞[, which contains the given ε > 0. Our assumption is checked.

As a result, the following problem

min {σ_{H_ε}(p, u) + σ_{epi g_x*}(q, v) : p + q = d, u + v = 0} = f′_ε(x, d)   (2.1.2)

has an optimal solution. Now look at its minimand:

- unless p = 0, σ_{H_ε}(p, u) = +∞; and σ_{H_ε}(0, u) = uε;
- unless v ≤ 0, σ_{epi g_x*}(q, v) = +∞ - see (1.3.4).

In a word,

f′_ε(x, d) = min {σ_{epi g_x*}(d, −u) + uε : u ≥ 0}.

Remembering that, as a support function, u ↦ σ_{epi g_x*}(d, −u) is in particular continuous for u ↓ 0, and using its value (1.3.4), we finally obtain (2.1.1). □
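Formula (2.1.1) is straightforward to test numerically. In the sketch below (ours), f = |·|, x = 1, d = −1 and ε = 0.5; by the formula of §1.1, ∂_ε f(1) = [1 − ε, 1] = [0.5, 1], whose support function at d = −1 equals −0.5, and the infimum of the difference quotient agrees (it is attained at t = 1):

```python
f = abs
x, d, eps = 1.0, -1.0, 0.5

ts = [k / 1000.0 for k in range(1, 10001)]   # t ranging over (0, 10]
inf_quot = min((f(x + t * d) - f(x) + eps) / t for t in ts)
assert abs(inf_quot - (-0.5)) < 1e-9         # = support of [0.5, 1] at d = -1
```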

The above proof strongly suggests to consider the change of variable t = 1/u in (2.1.1), and to set

r_ε(u) := u[f(x + d/u) − f(x) + ε] if u > 0,
r_ε(u) := f′∞(d) = σ_{dom f*}(d) if u = 0,
r_ε(u) := +∞ if u < 0.   (2.1.3)

When ε > 0, this function does achieve its minimum on ℝ₊; in the t-language, this means that the case t ↓ 0 never occurs in the minimization problem (2.1.1). On the other hand, 0 may be the unique minimum point of r_ε, i.e. (2.1.1) may have no solution "at finite distance".

The importance of the assumption ε > 0 cannot be overemphasized; Theorem X.2.3.2 cannot be invoked without it, and the proof breaks down. Consider Fig. 2.1.1 for a counter-example: B(0, 1) is the unit Euclidean ball of ℝ², f its indicator function and x = (0, −1). Clearly enough, the directional derivative f′(x, d) for d = (γ, δ) ≠ 0 is

f′(x, d) = 0 if δ > 0, +∞ if δ ≤ 0,


which cannot be a support function: it is not even closed. On the other hand, the infimum in (2.1.1), which is obtained for x + td on the unit sphere, is easy to compute: for d = (γ, δ) ≠ 0,

f′_ε(x, d) = ε(γ² + δ²)/(2δ) if δ > 0, +∞ if δ ≤ 0.

It is the support function of the set

{(σ, τ) ∈ ℝ² : τ ≤ ε/2 − σ²/(2ε)},   (2.1.4)

which, in view of Theorem 2.1.1, is the ε-normal set to B(0, 1) at x.

Fig. 2.1.1. The approximate subdifferential of an indicator function

Note: to obtain (2.1.4), it is fun to use elementary geometry (carry out in Fig. 2.1.1 the inversion operation from the pole x), but the argument of Remark VI.6.3.7 is more systematic, and is also interesting.

Remark 2.1.2 We have here an illustration of Example X.2.4.3: closing f′(x, ·) is just what is needed to obtain the support function of ∂f(x). For x ∈ dom ∂f, the function

ℝⁿ ∋ d ↦ σ_{∂f(x)}(d) = sup {⟨s, d⟩ : s ∈ ∂f(x)}

is the closure of the function

ℝⁿ ∋ d ↦ f′(x, d) = lim_{t↓0} [f(x + td) − f(x)] / t.

This property appears also in the proof of Theorem 2.1.1: for ε = 0, the closure operation must still be performed after (2.1.2) is solved. □

Figure 2.1.2 illustrates (2.1.1) in a less dramatic situation: for ε > 0, the line representing t ↦ f(x) − ε + t f′_ε(x, d) supports the graph of t ↦ f(x + td), not at t = 0 but at some point t_ε > 0; among all the slopes joining (0, f(x) − ε) to an arbitrary point on the graph of f(x + ·d), the right-hand side of (2.1.1) is the smallest possible one.

On the other hand, Fig. 2.1.2 is the trace in ℝ × ℝ of a picture in ℝⁿ × ℝ: among all the possible hyperplanes passing through (x, f(x) − ε) and supporting epi f, there is one touching gr f somewhere along the given d; this hyperplane therefore gives the maximal slope along d, which is the value (2.0.1). The contact point x + t_ε d plays an important role in minimization algorithms and we will return to it later.

Fig. 2.1.2. The ε-directional derivative and the minimizing t_ε

The same picture illustrates Theorem 1.3.6: consider the point x + t_ε d as fixed and call it y. Now, for arbitrary δ ≥ 0, draw a hyperplane supporting epi f and passing through (x, f(x) − δ). Its altitude at y is f(x) − δ + f′_δ(x, y − x) which, by definition of a support, is not larger than f(y), but equal to it when δ = ε.

In summary: fix (x, d) ∈ dom f × ℝⁿ and consider a pair (ε, y) ∈ ]0, +∞[ × ℝⁿ linked by the relation y = x + t_ε d.

- To obtain y from ε, use (2.1.1): as a function of the "horizontal" variable t > 0, draw the line passing through (x, f(x) − ε) and (x + td, f(x + td)); the resulting slope must be minimized.

- To obtain ε from y, use (1.3.7): as a function of the "vertical" variable δ ≥ 0, draw a support passing through (x, f(x) − δ); its altitude at y must be maximized.

Example 2.1.3 Take again the convex quadratic function of Example 1.2.2: f′_ε(x, d) is the optimal value of the one-dimensional minimization problem

inf_{t>0} [½⟨Qd, d⟩t² + ⟨∇f(x), d⟩t + ε] / t.

If ⟨Qd, d⟩ = 0, this infimum is ⟨∇f(x), d⟩. Otherwise, it is attained at

t_ε = √(2ε / ⟨Qd, d⟩)

and

f′_ε(x, d) = ⟨∇f(x), d⟩ + √(2ε⟨Qd, d⟩).

This is a general formula: it also holds for ⟨Qd, d⟩ = 0. It is interesting to observe that [f′_ε(x, d) − f′(x, d)]/ε → +∞ when ε ↓ 0, but

(1/2ε)[f′_ε(x, d) − f′(x, d)]² ≡ ⟨Qd, d⟩.

This suggests that, for C² convex functions,

lim_{ε↓0} (1/2ε)[f′_ε(x, d) − f′(x, d)]² = f″(x, d),

where

f″(x, d) := lim_{t→0} [f(x + td) − f(x) − t f′(x, d)] / (½t²) = lim_{t→0} [f′(x + td, d) − f′(x, d)] / t

is the second derivative of f at x in the direction d. Here is one more motivation for the approximate subdifferential: it accounts for second-order behaviour of f. □
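The closed-form value f′_ε(x, d) = ⟨∇f(x), d⟩ + √(2ε⟨Qd, d⟩) can be compared with a direct minimization of the quotient in (2.1.1). A one-dimensional sketch (ours; the data are arbitrary) with f(z) = qz²/2, so that ⟨Qd, d⟩ = qd² and ∇f(x) = qx:

```python
import math

q, x, d, eps = 3.0, 0.4, -1.2, 0.05
f = lambda z: 0.5 * q * z * z

ts = [k / 10000.0 for k in range(1, 50001)]    # t ranging over (0, 5]
inf_quot = min((f(x + t * d) - f(x) + eps) / t for t in ts)
closed_form = q * x * d + math.sqrt(2 * eps * q * d * d)
assert abs(inf_quot - closed_form) < 1e-6

# the minimizing t is t_eps = sqrt(2*eps / <Qd,d>), as in the example
t_eps = math.sqrt(2 * eps / (q * d * d))
assert abs((f(x + t_eps * d) - f(x) + eps) / t_eps - closed_form) < 1e-12
```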

2.2 Properties of the Approximate Difference Quotient

For $x \in \operatorname{dom} f$, the function $q_\varepsilon$ encountered in (2.1.1) and defined by
\[ ]0, +\infty[ \ \ni\ t \mapsto q_\varepsilon(t) := \frac{f(x + td) - f(x) + \varepsilon}{t} \tag{2.2.1} \]
is called the approximate difference quotient. In what follows, we will set $q := q_0$; to avoid the trivial case $q_\varepsilon \equiv +\infty$, we assume
\[ [x, x + td] \subset \operatorname{dom} f \quad\text{for some } t > 0 \,. \]

Our aim is now to study the minimization of $q_\varepsilon$, a relevant problem in view of Theorem 2.1.1.

(a) Behaviour of $q_\varepsilon$. In order to characterize the set of minimizers
\[ T_\varepsilon := \{t > 0 : q_\varepsilon(t) = f'_\varepsilon(x, d)\}, \tag{2.2.2} \]
two numbers are important. One is
\[ f'_\infty(d) := \sup\,\{q(t) : t > 0\}, \]
which describes the behaviour of $f(x + td)$ for $t \to +\infty$ (see §IV.3.2 and Example X.2.4.6 for the asymptotic function $f'_\infty$). The other,
\[ t_\ell := \sup\,\{t \geq 0 : f(x + td) - f(x) = t f'(x, d)\}, \]
concerns the case $t \downarrow 0$: $t_\ell = 0$ if $f(x + td)$ has a "positive curvature for $t \downarrow 0$". When $t_\ell = +\infty$, i.e. $f$ is affine on the half-line $x + \mathbb{R}_+ d$, we have $q_\varepsilon(t) = f'(x, d) + \varepsilon/t$; then it is clear that $f'_\varepsilon(x, d) = f'(x, d) = f'_\infty(d)$ and that $T_\varepsilon$ is empty for $\varepsilon > 0$.

Example 2.2.1 Before making precise statements, let us see in Fig. 2.2.1 what can be expected, with the help of the example
\[ 0 \leq t \mapsto f(x + td) := \max\,\{t,\ 3t - 1\} \,. \]
The lower part of the picture gives the correspondence $\varepsilon \to T_\varepsilon$, with the same abscissa-axis as the upper part (namely $t$). Some important properties are thus introduced:

- As already known, $q$ increases from $f'(x, d) = 1$ to $f'_\infty(d) = 3$;

- indeed, $q(t)$ is constantly equal to its minimal value $f'(x, d)$ for $0 < t \leq 1/2$, and $t_\ell = 1/2$;

2 The Approximate Directional Derivative 107

[Figure: the graph of $q_\varepsilon$, rising from $f'(x, d) = 1$ to $f'_\infty(d) = 3$, together with the correspondence $\varepsilon \to T_\varepsilon$ below it.]

Fig. 2.2.1. A possible $q_\varepsilon$ and $T_\varepsilon$

- $f'_\varepsilon(x, d)$ stays between $f'(x, d)$ and $f'_\infty(d)$, and reaches this last value for $\varepsilon \geq 1$;

- $T_0$ is the segment $]0, 1/2]$; $T_1$ is the half-line $[1/2, +\infty[$, and $T_\varepsilon$ is empty for $\varepsilon > 1$. $\square$
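The numbers quoted in this example can be recomputed. In the sketch below (the closed form $f'_\varepsilon(x, d) = \min(1 + 2\varepsilon,\, 3)$ is our own elementary computation, not a formula quoted from the book), the minimum of $q_\varepsilon$ is attained at $t = 1/2$ when $0 < \varepsilon < 1$, and the infimum 3 is not attained for $\varepsilon > 1$.

```python
# Numeric sketch of Example 2.2.1: phi(t) := f(x+td) = max{t, 3t-1},
# with f(x) = phi(0) = 0. Our own closed form: f'_eps(x,d) = min(1+2*eps, 3).

def q_eps(t, eps):
    # approximate difference quotient (2.2.1) along this half-line
    return (max(t, 3.0*t - 1.0) + eps) / t

def f_eps(eps):
    # brute-force inf of q_eps over the grid ]0, 100]
    return min(q_eps(0.001*k, eps) for k in range(1, 100001))

for eps in (0.25, 0.5, 1.0):
    print(abs(f_eps(eps) - min(1.0 + 2.0*eps, 3.0)) < 1e-9)   # True

# for eps > eps_inf = 1 the infimum f'_inf(d) = 3 is not attained: T_eps empty
print(3.0 < f_eps(2.0) < 3.02)   # True: q_eps(t) > 3 for all t, tends to 3
```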

This example reveals another important number associated with large $t$, which we will call $\varepsilon^\infty$ (equal to 1 in the example). The statements (i) and (ii) in the next result are already known, but their proof is more natural in the $t$-language.

Proposition 2.2.2 The notations and assumptions are as above.

(i) If $\varepsilon > 0$, then $q_\varepsilon(t) \to +\infty$ when $t \downarrow 0$.
(ii) When $t \to +\infty$, $q_\varepsilon(t) \to f'_\infty(d)$, independently of $\varepsilon \geq 0$.
(iii) The set
\[ E := \{\varepsilon \geq 0 : f'_\varepsilon(x, d) < f'_\infty(d)\} \tag{2.2.3} \]
is empty if and only if $t_\ell = +\infty$. Its upper bound
\[ \sup E =: \varepsilon^\infty \in [0, +\infty] \]
satisfies $T_\varepsilon \neq \emptyset$ if $\varepsilon < \varepsilon^\infty$ and $T_\varepsilon = \emptyset$ if $\varepsilon > \varepsilon^\infty$.

PROOF. For $\varepsilon > 0$, let $s_0 \in \partial_{\varepsilon/2} f(x)$: for all $t > 0$,
\[ q_\varepsilon(t) \geq \frac{f(x) + t\langle s_0, d\rangle - \varepsilon/2 - f(x) + \varepsilon}{t} = \langle s_0, d\rangle + \frac{\varepsilon}{2t} \]
and this takes care of (i). For $t \to +\infty$, $q_\varepsilon(t) = q(t) + \varepsilon/t$ has the same limit as $q(t)$, so (ii) is straightforward.

Thus $f'_\varepsilon(x, d) \leq f'_\infty(d)$, and non-vacuousness of $T_\varepsilon$ depends only on the behaviour of $q_\varepsilon$ at infinity. To say that $E$ is empty means from its definition (2.2.3):
\[ f'_\infty(d) \leq \inf_{\varepsilon \geq 0}\,\inf_{t > 0}\, q_\varepsilon(t) = \inf_{t > 0}\,\inf_{\varepsilon \geq 0}\, q_\varepsilon(t) = f'(x, d) \,. \]


Because $f'(x, \cdot) \leq f'_\infty$, this in turn means that $q(t)$ is constant for $t > 0$, or that $t_\ell = +\infty$.

Now observe that $\varepsilon \mapsto f'_\varepsilon(x, d)$ is increasing (just as $\partial_\varepsilon f$); so $E$, when nonempty, is an interval containing 0: $\varepsilon < \varepsilon^\infty$ implies $\varepsilon \in E$, hence $T_\varepsilon \neq \emptyset$. Conversely, if $\varepsilon > \varepsilon^\infty$, take $\varepsilon'$ in $]\varepsilon^\infty, \varepsilon[$ (so $\varepsilon' \notin E$) and $t > 0$ arbitrary:
\[ q_\varepsilon(t) = q_{\varepsilon'}(t) + \frac{\varepsilon - \varepsilon'}{t} > q_{\varepsilon'}(t) \geq f'_{\varepsilon'}(x, d) = f'_\infty(d) = f'_\varepsilon(x, d)\,: \]
$t$ cannot be in $T_\varepsilon$, which is therefore empty. $\square$

The example $t \mapsto f(x + td) = \sqrt{1 + t^2}$ illustrates the case $\varepsilon^\infty < +\infty$ (here $\varepsilon^\infty = 1$); it corresponds to $f(x + td)$ having the asymptote $t \mapsto f(x) - \varepsilon^\infty + f'_\infty(d)\,t$. Also, one directly sees in (2.2.3) that $\varepsilon^\infty = +\infty$ whenever $f'_\infty(d)$ is infinite. With the example $f$ of (1.1.5) (and taking $d = 1$), we have $t_\ell = 0$, $f'_\infty(1) = 0$ and $\varepsilon^\infty = +\infty$.

(b) The Closed Convex Function $r_\varepsilon$. For a more accurate study of $T_\varepsilon$, we now use the function $r_\varepsilon$ of (2.1.3), obtained via the change of variable $u = 1/t$: $r_\varepsilon(u) = q_\varepsilon(1/u)$ for $u > 0$. It is the trace on $\mathbb{R}$ of the function (1.3.4), which is known to be in $\operatorname{Conv}(\mathbb{R}^n \times \mathbb{R})$; therefore
\[ r_\varepsilon \in \operatorname{Conv} \mathbb{R} \,. \]

We know from Theorem 2.1.1 that $r_\varepsilon$ is minimized on a closed interval
\[ U_\varepsilon := \{u \geq 0 : r_\varepsilon(u) = f'_\varepsilon(x, d)\} \tag{2.2.4} \]
which, in view of Proposition 2.2.2(i), is nonempty and bounded if $\varepsilon > 0$. Likewise,
\[ T_\varepsilon = \{t = 1/u : u \in U_\varepsilon \text{ and } u > 0\} \]
is a closed interval, empty if $U_\varepsilon = \{0\}$, a half-line if $\{0\} \subsetneq U_\varepsilon$. Knowing that $U_\varepsilon$ is a closed interval, it will be convenient to set, with the conventions $1/0 = +\infty$, $1/\infty = 0$:
\[ \underline{t}_\varepsilon := 1/\sup U_\varepsilon \,, \qquad \overline{t}_\varepsilon := 1/\inf U_\varepsilon \,. \tag{2.2.5} \]

There remains to characterize $T_\varepsilon$, simply by expressing the minimality condition $0 \in \partial r_\varepsilon(u)$.

Lemma 2.2.3 Let $\varphi$ be the convex function $0 \leq t \mapsto \varphi(t) := f(x + td)$. For $\varepsilon \geq 0$ and $0 < u \in \operatorname{dom} r_\varepsilon$, the subdifferential of $r_\varepsilon$ at $u$ is given by
\[ \partial r_\varepsilon(u) = \frac{r_\varepsilon(u) - \partial\varphi(1/u)}{u} \,. \tag{2.2.6} \]


PROOF. The whole point is to compute the subdifferential of the convex function $u \mapsto \psi(u) := u\varphi(1/u)$, and this amounts to computing its one-sided derivatives. Take positive $u' = 1/t'$ and $u = 1/t$, with $u \in \operatorname{dom} r_\varepsilon$ (hence $\varphi(1/u) < +\infty$), and compute the difference quotient of $\psi$ (cf. the proof of Theorem I.1.1.6)
\[ \frac{u'\varphi(1/u') - u\varphi(1/u)}{u' - u} = \varphi(t) - t\,\frac{\varphi(t') - \varphi(t)}{t' - t} \,. \]
Letting $u' \downarrow u$, i.e. $t' \uparrow t$, we obtain the right-derivative
\[ D_+\psi(u) = \varphi(t) - t\,D_-\varphi(t)\,; \]
the left-derivative is obtained likewise. Thus, the subdifferential of $\psi$ at $u$ is the closed interval
\[ \partial\psi(u) = \varphi(t) - t\,\partial\varphi(t) \,. \]
Knowing that $r_\varepsilon(u') = \psi(u') - u'[f(x) - \varepsilon]$ for all $u' > 0$, we readily obtain
\[ \partial r_\varepsilon(u) = \varphi(t) - t\,\partial\varphi(t) - f(x) + \varepsilon = r_\varepsilon(u)/u - t\,\partial\varphi(t) \,. \quad\square \]

The minimality condition $0 \in \partial r_\varepsilon(u)$ is therefore $r_\varepsilon(u) \in \partial\varphi(1/u)$. In $t$-language, we say: $t \in T_\varepsilon$ if and only if there is some $\sigma \in \partial\varphi(t)$ such that
\[ f(x + td) - t\sigma = f(x) - \varepsilon \,. \tag{2.2.7} \]

Remark 2.2.4 Using the calculus rule from Theorem 3.2.1 below, we will see that $\partial\varphi$ can be expressed with the help of $\partial f$. More precisely, if $x + \mathbb{R}_+ d$ meets $\operatorname{ri} \operatorname{dom} f$, we have
\[ \partial r_\varepsilon(u) = \frac{r_\varepsilon(u) - \langle \partial f(x + d/u), d\rangle}{u} \,. \]
Then Fig. 2.1.2 gives a nice interpretation of (2.2.7): associated with $t > 0$ and $s \in \partial f(x + td)$, consider the affine function
\[ 0 \leq \tau \mapsto \ell(\tau) := f(x + td) + \langle s, d\rangle(\tau - t), \]
whose graph supports $\operatorname{epi}\varphi$ at $\tau = t$. Its value $\ell(0)$ at the vertical axis $\tau = 0$ is a subderivative of $u \mapsto u\varphi(1/u)$ at $u = 1/t$, and $t$ is optimal in (2.1.1) when $\ell(0)$ reaches the given $f(x) - \varepsilon$. In this case, $T_\varepsilon$ is the contact-set between $\operatorname{gr}\ell$ and $\operatorname{gr}\varphi$. Note: convexity and Proposition 2.2.2(iii) tell us that $f(x) - \varepsilon^\infty \leq \ell(0) \leq f(x)$. $\square$

Let us summarize the results of this section concerning the optimal set $T_\varepsilon$, or $U_\varepsilon$ of (2.2.4).

- First, we have the somewhat degenerate case $t_\ell = +\infty$, meaning that $f$ is affine on the half-line $x + \mathbb{R}_+ d$. This can be described by one of the following equivalent statements:

$f'(x, d) = f'_\infty(d)$;

$f'_\varepsilon(x, d) = f'(x, d)$ for all $\varepsilon > 0$;

$\forall t > 0$, $q_\varepsilon(t) > f'_\infty(d)$ for all $\varepsilon > 0$;

$T_\varepsilon = \emptyset$ for all $\varepsilon > 0$;

$U_\varepsilon = \{0\}$ for all $\varepsilon > 0$.


- The second situation, more interesting, is when $t_\ell < +\infty$; then three essentially different cases may occur, according to the value of $\varepsilon$.

- When $0 < \varepsilon < \varepsilon^\infty$, one has equivalently:

$f'(x, d) < f'_\varepsilon(x, d) < f'_\infty(d)$;

$\exists t > 0$ such that $q_\varepsilon(t) < f'_\infty(d)$;

$T_\varepsilon$ is a nonempty compact interval;

$0 \notin U_\varepsilon$.

- When $\varepsilon^\infty < \varepsilon$:

$f'_\varepsilon(x, d) = f'_\infty(d)$;

$\forall t > 0$, $q_\varepsilon(t) > f'_\infty(d)$;

$T_\varepsilon = \emptyset$;

$U_\varepsilon = \{0\}$.

- When $\varepsilon = \varepsilon^\infty$:

$f'_\varepsilon(x, d) = f'_\infty(d)$;

$T_\varepsilon$ is empty or unbounded;

$0 \in U_\varepsilon$.

Note: in the last case, $T_\varepsilon$ nonempty but unbounded means that $f(x + td)$ touches its asymptote $f(x) - \varepsilon + t f'_\infty(d)$ for $t$ large enough.

2.3 Behaviour of $f'_\varepsilon$ and $T_\varepsilon$ as Functions of $\varepsilon$

In this section, we assume again $[x, x + td] \subset \operatorname{dom} f$ for some $t > 0$. From the fundamental relation (2.1.1), the function $\varepsilon \mapsto -f'_\varepsilon(x, d)$ appears as a supremum of affine functions, and is therefore closed and convex: it is the conjugate of some other function. Following our general policy, we extend it to the whole real line, setting
\[ v(\varepsilon) := \begin{cases} -f'_\varepsilon(x, d) & \text{if } \varepsilon \geq 0, \\ +\infty & \text{otherwise.} \end{cases} \]
Then $v \in \operatorname{Conv} \mathbb{R}$: in fact, $\operatorname{dom} v$ is either $[0, +\infty[$ or $]0, +\infty[$. When $\varepsilon \downarrow 0$, $v(\varepsilon)$ tends to $-f'(x, d) \in \mathbb{R} \cup \{+\infty\}$.

Lemma 2.3.1 With $r := r_0$ of (2.1.3) and for all real numbers $\varepsilon$ and $u$:
\[ v(\varepsilon) = r^*(-\varepsilon), \quad\text{i.e.}\quad r(u) = v^*(-u) \,. \]

PROOF. For $\varepsilon \geq 0$, just apply the definitions:
\[ -v(\varepsilon) = f'_\varepsilon(x, d) = \inf_{u \in \operatorname{dom} r}\,[r(u) + \varepsilon u] = -r^*(-\varepsilon) \,. \]
For $-\varepsilon > 0$, we have trivially (remember that $f'(x, d) < +\infty$)
\[ r^*(-\varepsilon) \geq \lim_{u \to +\infty}\,[-\varepsilon u - r(u)] = +\infty - f'(x, d) = +\infty = v(\varepsilon) \,. \quad\square \]


Observe that the case $u = 1$ is just Theorem 1.3.6; $u = 0$ gives
\[ [f'_\infty(d) =]\ \ r(0) = \sup_{\varepsilon > 0} f'_\varepsilon(x, d)\ \ [= \lim_{\varepsilon \to +\infty} f'_\varepsilon(x, d)], \]
and this confirms the relevance of the notation $f'_\infty$ for an asymptotic function.
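Lemma 2.3.1 can be illustrated numerically. On the quadratic function of Example 2.1.3 one has, along a fixed $(x, d)$, $q(t) = \tfrac{a}{2}t + b$ with $a = \langle Qd, d\rangle$ and $b = \langle\nabla f(x), d\rangle$, hence $r(u) = \tfrac{a}{2u} + b$ and $v(\varepsilon) = -b - \sqrt{2\varepsilon a}$. The sketch below (the sample values of $a$ and $b$ are our own choices) checks $v(\varepsilon) = r^*(-\varepsilon)$ by grid search.

```python
import math

# Hedged check of Lemma 2.3.1 on the quadratic example: r(u) = a/(2u) + b,
# so r*(-eps) should equal v(eps) = -b - sqrt(2*eps*a) for eps >= 0.

a, b = 3.0, -0.7          # arbitrary sample values (our choice)

def r(u): return a/(2.0*u) + b

def r_star(slope):
    # conjugate r*(slope) = sup_{u>0} [slope*u - r(u)], by grid search on ]0, 10]
    return max(slope*u - r(u) for u in (0.0001*k for k in range(1, 100000)))

for eps in (0.1, 0.5, 1.0):
    v = -(b + math.sqrt(2.0*eps*a))
    print(abs(r_star(-eps) - v) < 1e-3)   # True: v(eps) = r*(-eps)
```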

Theorem 2.3.2 With the notation of Lemma 2.3.1 and (2.2.4),
\[ \partial v(\varepsilon) = -U_\varepsilon \quad\text{for all } \varepsilon \geq 0 \,. \tag{2.3.1} \]
Furthermore, $\varepsilon \mapsto f'_\varepsilon(x, d)$ is strictly increasing on $[0, \varepsilon^\infty[$ and is constantly equal to $f'_\infty(d)$ for $\varepsilon \geq \varepsilon^\infty$.

PROOF. By definition and using Lemma 2.3.1, $-u \in \partial v(\varepsilon)$ if and only if
\[ -u\varepsilon = v(\varepsilon) + v^*(-u) = -f'_\varepsilon(x, d) + r(u) \iff f'_\varepsilon(x, d) = r_\varepsilon(u), \]
which exactly means that $u \in U_\varepsilon$. The rest follows from the conclusions of §2.2. $\square$

Figure 2.3.1 illustrates the graph of $\varepsilon \mapsto f'_\varepsilon(x, d)$ in the example of Fig. 2.2.1. Since the derivative of $\varepsilon \mapsto q_\varepsilon(t)$ is $1/t$, a formal application of Theorem VI.4.4.2 would give (2.3.1) directly; but the assumptions of that theorem are hardly satisfied - and certainly not for $\varepsilon \geq \varepsilon^\infty$.

[Figure: the graph of $\varepsilon \mapsto f'_\varepsilon(x, d)$, increasing from $f'(x, d) = 1$ to the value $f'_\infty(d) = 3$.]

Fig. 2.3.1. The $\varepsilon$-directional derivative as a function of $\varepsilon$

Remark 2.3.3 Some useful formulae follow from (2.3.1): whenever $T_\varepsilon \neq \emptyset$, we have
\[ f'_\eta(x, d) \leq f'_\varepsilon(x, d) + \frac{\eta - \varepsilon}{t} \quad\text{for all } t \in T_\varepsilon \,, \]
\[ f'_\eta(x, d) = f'_\varepsilon(x, d) + \frac{\eta - \varepsilon}{\overline{t}_\varepsilon} - o(\eta - \varepsilon) \quad\text{if } \eta \geq \varepsilon \,, \]
\[ f'_\eta(x, d) = f'_\varepsilon(x, d) + \frac{\eta - \varepsilon}{\underline{t}_\varepsilon} - o(\eta - \varepsilon) \quad\text{if } \eta \leq \varepsilon \,, \]
where $\underline{t}_\varepsilon$ and $\overline{t}_\varepsilon$ are respectively the smallest and largest elements of $T_\varepsilon$, and the remainder terms $o(\cdot)$ are nonnegative (of course, $\underline{t}_\varepsilon = \overline{t}_\varepsilon$ except possibly for countably many values of $\varepsilon$). We also have the integral representation (I.4.2.6)
\[ f'_\varepsilon(x, d) = f'(x, d) + \int_0^\varepsilon u_\alpha \,\mathrm{d}\alpha \quad\text{for all } 0 \leq \varepsilon < \varepsilon^\infty \,. \quad\square \]


A natural question is now: what happens when $\varepsilon \downarrow 0$? We already know that $f'_\varepsilon(x, d) \to f'(x, d)$, but at what speed? A qualitative answer is as follows (see also Example 2.1.3).

Proposition 2.3.4 Assuming $-\infty < f'(x, d)$, there holds
\[ \lim_{\varepsilon \downarrow 0} \frac{f'_\varepsilon(x, d) - f'(x, d)}{\varepsilon} = \frac{1}{t_\ell} \in [0, +\infty] \,. \]

PROOF. We know from (2.2.5) and (2.3.1) that $\partial v(0) = -U_0 = \,]-\infty, -1/t_\ell]$, so $D_+ v(0) = -1/t_\ell$ and everything comes from the elementary results of §I.4.2. $\square$

For fixed $x$ and $d$, use the notation of Remark 2.2.4 and look again at Fig. 2.1.2, with the results of the present section in mind: it is important to meditate on the correspondence between the horizontal set of stepsizes and the vertical set of $f$-decrements.

To any stepsize $t \geq 0$ and slope $\langle s, d\rangle$ of a line supporting $\operatorname{epi}\varphi$ at $(t, \varphi(t))$, is associated a value
\[ \varepsilon_{t,s} := f(x) - \ell(0) = f(x) - f(x + td) + t\langle s, d\rangle \geq 0 \,. \tag{2.3.2} \]
Likewise, to any decrement $\delta \geq 0$ from $f(x)$ is associated a stepsize $t_\delta \geq 0$ via the slope $f'_\delta(x, d)$. This defines a pair of multifunctions $t \mapsto -\partial r(1/t)$ and $\delta \mapsto T_\delta$, inverse to each other, and monotone in the sense that
\[ \text{for } t \in T_\delta \text{ and } t' \in T_{\delta'}, \quad \delta > \delta' \iff t > t' \,. \]
To go analytically from $t$ to $\delta$, one goes first to the somewhat abstract set of inverse stepsizes $u$, from which $\delta$ is obtained by the duality correspondence of Lemma 2.3.1. See also the lower part of Fig. 2.2.1, for an instance of a mapping $T$ (or rather its inverse).

To finish this section, we give an additional first-order development: there always holds the estimate
\[ f(x + td) = f(x) + t f'(x, d) + o(t) \,, \]
and various ways exist to eliminate the remainder term $o(t)$: one is (1.3.7); the mean-value Theorem VI.2.3.3 is more classical; here is a third one.

Proposition 2.3.5 With the hypotheses and notations given so far, assume $t_\ell < +\infty$ (i.e. $f$ is not affine on the whole half-line $x + \mathbb{R}_+ d$). For $t > 0$, there is a unique $\varepsilon(t) \in [0, \varepsilon^\infty[$ such that
\[ f(x + td) = f(x) + t f'_{\varepsilon(t)}(x, d) \,. \tag{2.3.3} \]

PROOF. We have to solve the equation in $\varepsilon$: $v(\varepsilon) = -q(t)$. In view of the second part of Theorem 2.3.2, this equation has a unique solution provided that the right-hand side $-q(t)$ is in the interval
\[ \{v(\alpha) : 0 \leq \alpha < \varepsilon^\infty\} = -[f'(x, d), f'_\infty(d)[ \,. \]
As a result, the unique $\varepsilon(t)$ of (2.3.3) is
\[ \varepsilon(t) = \begin{cases} 0 & \text{for } t \in\, ]0, t_\ell], \\ (-v)^{-1}(q(t)) & \text{for } t > t_\ell \,. \end{cases} \quad\square \]
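On the example of 2.2.1 the function $\varepsilon(t)$ of Proposition 2.3.5 can be written down explicitly (the closed forms below are our own elementary computation): $q(t) = \max(1,\, 3 - 1/t)$ and $f'_\varepsilon(x, d) = \min(1 + 2\varepsilon,\, 3)$, so $\varepsilon(t) = 0$ on $]0, t_\ell] = \,]0, 1/2]$ and $\varepsilon(t) = 1 - 1/(2t) \in [0, 1[$ beyond.

```python
# Sketch of Proposition 2.3.5 on the example phi(t) = max{t, 3t-1}
# of Example 2.2.1; closed forms are our own computation, not the book's.

def q(t):
    # difference quotient q(t) = (phi(t) - phi(0))/t = max(1, 3 - 1/t)
    return max(t, 3.0*t - 1.0) / t

def eps_of_t(t):
    # unique eps in [0, eps_inf[ = [0, 1[ solving 1 + 2*eps = q(t)
    return max(0.0, (q(t) - 1.0) / 2.0)

for t in (0.2, 0.5, 1.0, 5.0):
    print(t, eps_of_t(t))
print(all(0.0 <= eps_of_t(t) < 1.0 for t in (0.2, 0.5, 1.0, 5.0, 100.0)))  # True
```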

Figure 2.3.2 emphasizes the difference between this $\varepsilon(t)$ and the $\varepsilon_{t,s}$ defined in (2.3.2).

[Figure: the graph of $\varphi$, the chord of slope $q(t)$ cutting the vertical axis at $f(x) - \varepsilon(t)$, and a support of $\operatorname{gr}\varphi$ cutting it at $f(x) - \varepsilon_{t,s}$.]

Fig. 2.3.2. The numbers $\varepsilon(t)$ and $\varepsilon_{t,s}$

3 Calculus Rules on the Approximate Subdifferential

In §VI.4, we developed a calculus to compute the subdifferential of a finite-valued convex function $f$ constructed from other functions $f_j$. Here, we extend the results of §VI.4 to the case of $\partial_\varepsilon f$, with $\varepsilon \geq 0$ and $f \in \operatorname{Conv} \mathbb{R}^n$. Some rudimentary such calculus has already been given in Proposition 1.3.1.

The reader may have observed that the approximate subdifferential is a global concept (in contrast to the exact subdifferential, which is purely local): for $\varepsilon$ arbitrarily large, $\partial_\varepsilon f(x)$ depends on the behaviour of $f$ arbitrarily far from $x$. It can therefore be guessed that $\partial_\varepsilon f$ has to depend on the global behaviour of the $f_j$'s, while $\partial f(x)$ depends exclusively on the $\partial f_j(x)$'s. It turns out that the conjugacy operation gives a convenient tool for taking this global behaviour into account. Indeed, knowing that $f$ results from some operation on the $f_j$'s, the characterization (1.2.1) in terms of $f^*$ shows that the whole issue is to determine the effect of the conjugacy on this operation: it is the set of calculus rules of §X.2 that is at stake.

Recall that all the functions involved are in $\operatorname{Conv} \mathbb{R}^n$.

3.1 Sum of Functions

From Theorem X.2.3.1, the conjugate of a sum of two functions $f_1 + f_2$ is the closure of the infimal convolution $f_1^* \mathbin{\square} f_2^*$. Expressing $\partial_\varepsilon(f_1 + f_2)$ will require an expression for this infimal convolution, which in turn requires the following basic assumption: when $s \in \operatorname{dom}(f_1 + f_2)^*$,
\[ (f_1 + f_2)^*(s) = f_1^*(p_1) + f_2^*(p_2) \quad\text{for some } p_i \text{ satisfying } p_1 + p_2 = s \,. \tag{3.1.1} \]
This just expresses that the inf-convolution of $f_1^*$ and $f_2^*$ is exact at $s = p_1 + p_2$: the couple $(p_1, p_2)$ actually minimizes the function $(q_1, q_2) \mapsto f_1^*(q_1) + f_2^*(q_2)$ over the pairs summing up to $s$. Furthermore, we know (Theorem X.2.3.2) that this property holds under various conditions on $f_1$ and $f_2$; one of them is
\[ \operatorname{ri} \operatorname{dom} f_1 \cap \operatorname{ri} \operatorname{dom} f_2 \neq \emptyset \,, \tag{3.1.2} \]
which is slightly more stringent than the minimal assumption
\[ \operatorname{dom} f_1 \cap \operatorname{dom} f_2 \neq \emptyset \quad [\iff f_1 + f_2 \not\equiv +\infty] \,. \]

Theorem 3.1.1 For any $\varepsilon \geq 0$ and $x \in \operatorname{dom}(f_1 + f_2) = \operatorname{dom} f_1 \cap \operatorname{dom} f_2$, there holds
\[ \partial_\varepsilon(f_1 + f_2)(x) \supset \bigcup\,\{\partial_{\varepsilon_1} f_1(x) + \partial_{\varepsilon_2} f_2(x) : \varepsilon_i \geq 0,\ \varepsilon_1 + \varepsilon_2 \leq \varepsilon\} \tag{3.1.3} \]
with equality under assumption (3.1.1), for example if (3.1.2) holds.

PROOF. Let $\varepsilon \geq 0$ be given. For arbitrary nonnegative $\varepsilon_1, \varepsilon_2$ with $\varepsilon_1 + \varepsilon_2 \leq \varepsilon$, Definition 1.1.1 clearly implies
\[ \partial_{\varepsilon_1} f_1(x) + \partial_{\varepsilon_2} f_2(x) \subset \partial_\varepsilon(f_1 + f_2)(x) \,. \]
Conversely, take $s \in \partial_\varepsilon(f_1 + f_2)(x)$, i.e.
\[ (f_1 + f_2)^*(s) + (f_1 + f_2)(x) - \langle s, x\rangle \leq \varepsilon \,. \tag{3.1.4} \]
This $s$ is therefore in $\operatorname{dom}(f_1 + f_2)^*$ and we can apply (3.1.1): with the help of some $p_1$ and $p_2$, we write (3.1.4) as $\varepsilon_1 + \varepsilon_2 \leq \varepsilon$, where we have set
\[ \varepsilon_i := f_i^*(p_i) + f_i(x) - \langle p_i, x\rangle \quad\text{for } i = 1, 2 \,. \]
We see that $p_i \in \partial_{\varepsilon_i} f_i(x)$ for $i = 1, 2$, and the required converse inclusion is proved. $\square$

Naturally, if $f = \sum_{i=1}^m f_i$, the right-hand side in (3.1.3) becomes
\[ \bigcup\,\Big\{\sum_{i=1}^m \partial_{\varepsilon_i} f_i(x) : \varepsilon_i \geq 0,\ \sum_{i=1}^m \varepsilon_i \leq \varepsilon\Big\} \,. \tag{3.1.5} \]
Since $\partial_\varepsilon f(x)$ increases with $\varepsilon$, the constraints $\sum_i \varepsilon_i \leq \varepsilon$ can be replaced by $\sum_i \varepsilon_i = \varepsilon$ in (3.1.3) and (3.1.5). Also, when $\varepsilon = 0$, there is only one possibility for the $\varepsilon_i$ in (3.1.3):

Corollary 3.1.2 For $x \in \operatorname{dom} f_1 \cap \operatorname{dom} f_2$, there holds
\[ \partial(f_1 + f_2)(x) \supset \partial f_1(x) + \partial f_2(x), \]
with equality under, for example, assumption (3.1.2). $\square$


To emphasize the need of an assumption such as (3.1.2), consider in $\mathbb{R}^2$ the Euclidean balls $C_1$ and $C_2$, of radius 1 and centered respectively at $(-1, 0)$ and $(1, 0)$: they meet at the unique point $x = (0, 0)$. Then take for $f_i$ the indicator function of $C_i$, so $f$ is the indicator function $I_{\{0\}}$. We have $\partial f(0) = \mathbb{R}^2$, while $\partial f_1(0) + \partial f_2(0) = \mathbb{R} \times \{0\}$.
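A one-dimensional numeric sketch of the easy inclusion in Theorem 3.1.1 (the functions, point and particular subgradients below are our own choices): any $s_1 \in \partial_{\varepsilon_1} f_1(x)$ and $s_2 \in \partial_{\varepsilon_2} f_2(x)$ must give $s_1 + s_2 \in \partial_{\varepsilon_1 + \varepsilon_2}(f_1 + f_2)(x)$, which is checked here directly against Definition 1.1.1 on a grid.

```python
# Hedged 1-D illustration of Theorem 3.1.1 with f1 = |.| and f2 = (.)^2.
# The candidate subgradients s1, s2 below are our own closed-form picks.

def in_eps_subdiff(f, x, s, eps, lo=-50.0, hi=50.0, n=100000):
    # definition check: f(y) >= f(x) + s*(y - x) - eps on a fine grid
    step = (hi - lo) / n
    for k in range(n + 1):
        y = lo + k*step
        if f(y) < f(x) + s*(y - x) - eps - 1e-9:
            return False
    return True

f1 = abs
f2 = lambda y: y*y
x, eps1, eps2 = 1.0, 0.3, 0.2

s1 = 1.0 - eps1/x               # an element of d_eps1 f1(x) (x > 0)
s2 = 2.0*x + 2.0*(eps2 ** 0.5)  # an element of d_eps2 f2(x)

print(in_eps_subdiff(f1, x, s1, eps1))                              # True
print(in_eps_subdiff(f2, x, s2, eps2))                              # True
print(in_eps_subdiff(lambda y: f1(y) + f2(y), x, s1 + s2, eps1 + eps2))  # True
```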

Example 3.1.3 ($\varepsilon$-Normal Sets to Closed Convex Polyhedra) Let $C$ be a closed convex polyhedron described by its supporting hyperplanes:
\[ H_j^- := \{x \in \mathbb{R}^n : \langle s_j, x\rangle \leq r_j\} \ \text{ for } j = 1, \ldots, m, \qquad C := \bigcap_{j=1}^m H_j^- \,. \tag{3.1.6} \]
Its indicator function is clearly
\[ I_C = \sum_{j=1}^m I_{H_j^-} \,. \]
Let us compute the $\varepsilon$-subdifferential of this function, i.e. the $\varepsilon$-normal set of Definition 1.1.3. The approximate normal set to $H_j^-$ has been given in Example 1.2.5; we obtain for $x \in C$
\[ N_{C,\varepsilon}(x) = \Big\{\sum_{j=1}^m \mu_j s_j : \mu_j \geq 0,\ \sum_{j=1}^m \mu_j c_j(x) \leq \varepsilon\Big\}, \tag{3.1.7} \]
where we have set $c_j(x) := r_j - \langle s_j, x\rangle \geq 0$. In fact, $I_C$ is a sum of polyhedral functions, and Theorem X.2.3.2 - with its qualification assumption (X.2.3.Q$_j$) - does imply (3.1.1) in this case.

Still for $x \in C$, set $K(x) := \max_j c_j(x)$. Clearly enough
\[ \frac{\varepsilon}{K(x)}\,\operatorname{co}\{s_1, \ldots, s_m\} \subset N_{C,\varepsilon}(x) \]
(and the normal cone $N_C(x)$ can even be added to the left-hand set). Likewise, set $k(x) := \min_j c_j(x)$. If $k(x) > 0$,
\[ N_{C,\varepsilon}(x) \subset \frac{\varepsilon}{k(x)}\,\operatorname{co}\{0, s_1, \ldots, s_m\} \,. \quad\square \]
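Formula (3.1.7) can be probed on a concrete polyhedron (the unit square below is our own instance, not the book's): any $\sum_j \mu_j s_j$ with $\mu_j \geq 0$ and $\sum_j \mu_j c_j(x) \leq \varepsilon$ must satisfy the $\varepsilon$-normal inequality $\langle s, y - x\rangle \leq \varepsilon$ for all $y \in C$.

```python
# Hedged check of (3.1.7) on C = [0,1]^2, written with the four constraints
# <s_j, y> <= r_j below; x, eps and the multipliers mu are our own choices.

S = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
R = [1.0, 0.0, 1.0, 0.0]
x, eps = (0.25, 0.5), 0.3

slack = [rj - (sj[0]*x[0] + sj[1]*x[1]) for sj, rj in zip(S, R)]  # c_j(x)
mu = [0.2, 0.1, 0.0, 0.2]                     # chosen so that
assert sum(m*cj for m, cj in zip(mu, slack)) <= eps   # sum mu_j c_j(x) <= eps

s = (sum(m*sj[0] for m, sj in zip(mu, S)),
     sum(m*sj[1] for m, sj in zip(mu, S)))

# a linear function attains its maximum over the square at a vertex
verts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
worst = max(s[0]*(v[0] - x[0]) + s[1]*(v[1] - x[1]) for v in verts)
print(worst <= eps + 1e-12)   # True: s lies in N_{C,eps}(x)
```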

Example 3.1.4 (Approximate Minimality Conditions) Let there be given a convex function $f : \mathbb{R}^n \to \mathbb{R}$ and a nonempty closed convex set $C$; assume that
\[ \bar f := \inf\,\{f(x) : x \in C\} \tag{3.1.8} \]
is not $-\infty$. The $\varepsilon$-minimizers of $f$ on $C$ are those $x \in C$ such that $f(x) \leq \bar f + \varepsilon$; clearly enough, an $\varepsilon$-minimizer is an $x$ such that (remember Theorem 1.1.5)
\[ (f + I_C)(x) \leq \inf_{\mathbb{R}^n}(f + I_C) + \varepsilon \quad\text{or equivalently}\quad 0 \in \partial_\varepsilon(f + I_C)(x) \,. \]
Here $\operatorname{dom} f = \mathbb{R}^n$: we conclude from Theorem 3.1.1 that an $\varepsilon$-minimizer is an $x$ such that
\[ 0 \in \partial_\alpha f(x) + N_{C,\varepsilon-\alpha}(x) \quad\text{for some } \alpha \in [0, \varepsilon], \]
i.e. $f$ has at $x$ an $\alpha$-subgradient whose opposite lies in the $(\varepsilon - \alpha)$-normal set to $C$. The situation simplifies in some cases.


- Set $\varepsilon = 0$ to obtain the standard minimality condition of (VII.1.1.3):
\[ x \text{ solves (3.1.8)} \iff -\partial f(x) \cap N_C(x) \neq \emptyset \,. \]

- Another case is when $C = \{x_0\} + H$ is an affine manifold. Then $N_{C,\beta}(x) = H^\perp$ for all $\beta \geq 0$ and $\partial_\alpha f(x)$ increases with $\alpha$: our $\varepsilon$-minimality condition becomes $-\partial_\varepsilon f(x) \cap H^\perp \neq \emptyset$.

- Also, when $f = \langle s_0, \cdot\rangle$ is linear, $\partial_\alpha f(x) = \partial f(x) = \{s_0\}$ for all $\alpha \geq 0$, while $N_{C,\beta}(x)$ increases with $\beta$; so our $\varepsilon$-minimality condition is: $-s_0 \in N_{C,\varepsilon}(x)$, a triviality if we replace $N_{C,\varepsilon}(x)$ by its definition (1.1.6).

- If $C$ is a closed convex polyhedron as in (3.1.6), its $\varepsilon$-normal set can be specified as in (3.1.7). Take for (3.1.8) a linear programming problem
\[ \min\,\{\langle s_0, x\rangle : \langle s_j, x\rangle \leq r_j \ \text{for } j = 1, \ldots, m\} \,. \]
An $\varepsilon$-minimizer is then an $x \in C$ for which there exists $\mu = (\mu_1, \ldots, \mu_m) \in \mathbb{R}^m$ such that
\[ \sum_{j=1}^m \mu_j s_j + s_0 = 0, \qquad \sum_{j=1}^m \mu_j r_j + \langle s_0, x\rangle \leq \varepsilon, \qquad \mu_j \geq 0 \ \text{for } j = 1, \ldots, m \,. \quad\square \]
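The last item can be tried on a toy linear program (the instance is ours): minimizing $x$ over $C = [0, 1]$, whose optimal value is 0, so the $\varepsilon$-minimizers are exactly the points of $[0, \varepsilon]$, and the multiplier conditions certify precisely those points.

```python
# Hedged LP sketch for Example 3.1.4: minimize <s0, x> = x over C = [0, 1],
# written as s_1 = 1, r_1 = 1 (x <= 1) and s_2 = -1, r_2 = 0 (-x <= 0).

def certify(x, mu, eps, S=(1.0, -1.0), R=(1.0, 0.0), s0=1.0):
    feasible = all(sj*x <= rj + 1e-12 for sj, rj in zip(S, R))
    # sum mu_j s_j + s0 = 0
    stationarity = abs(sum(m*sj for m, sj in zip(mu, S)) + s0) < 1e-12
    # sum mu_j r_j + <s0, x> <= eps
    gap = sum(m*rj for m, rj in zip(mu, R)) + s0*x <= eps + 1e-12
    nonneg = all(m >= 0 for m in mu)
    return feasible and stationarity and gap and nonneg

eps = 0.1
print(certify(0.05, (0.0, 1.0), eps))   # True: 0.05 is an eps-minimizer
print(certify(0.50, (0.0, 1.0), eps))   # False: its gap 0.5 exceeds eps
```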

We leave it as an exercise to redo the above examples when
\[ C := \{x \in K : \langle s_j, x\rangle = r_j \ \text{for } j = 1, \ldots, m\} \]
is described in standard form ($K$ being a closed convex polyhedral cone, say the nonnegative orthant).

Remark 3.1.5 In the space $\mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$ equipped with the scalar product of a product-space, take a decomposable function:
\[ f(x_1, x_2) := f_1(x_1) + f_2(x_2) \,. \]
Because of the calculus rule X.1.3.1(ix), the basic assumption (3.1.1) holds automatically, so we always have
\[ \partial_\varepsilon f(x_1, x_2) = \bigcup\,\{\partial_{\varepsilon_1} f_1(x_1) \times \partial_{\varepsilon_2} f_2(x_2) : \varepsilon_i \geq 0,\ \varepsilon_1 + \varepsilon_2 \leq \varepsilon\}, \]
but beware that this set is not a product-set in $\mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$, except for $\varepsilon = 0$. $\square$

3.2 Pre-Composition with an Affine Mapping

Given $g \in \operatorname{Conv} \mathbb{R}^m$ and an affine mapping $A : \mathbb{R}^n \to \mathbb{R}^m$ ($Ax = A_0 x + y_0 \in \mathbb{R}^m$ with $A_0$ linear), take $f := g \circ A \in \operatorname{Conv} \mathbb{R}^n$. As in §3.1, we need an assumption, which is in this context: when $s \in \operatorname{dom}(g \circ A)^*$,
\[ (g \circ A)^*(s) = g^*(p) - \langle p, y_0\rangle \quad\text{for some } p \text{ such that } A_0^* p = s \tag{3.2.1} \]
($\langle\cdot, \cdot\rangle$ will denote indifferently the scalar product in $\mathbb{R}^n$ or $\mathbb{R}^m$). As was the case with (3.1.1), Theorem X.2.2.1 tells us that the above $p$ actually minimizes the function $p \mapsto g^*(p) - \langle p, y_0\rangle$ on the affine manifold of equation $A_0^* p = s$. Furthermore we know (Theorem X.2.2.3) that (3.2.1) holds under various conditions on $f$ and $A_0$; one of them is
\[ \operatorname{Im} A \cap \operatorname{ri} \operatorname{dom} g \neq \emptyset \,. \tag{3.2.2} \]
Once again, note that removing the words "relative interior" from (3.2.2) amounts to assuming $g \circ A \not\equiv +\infty$.

Theorem 3.2.1 Let $g$ and $A$ be defined as above. For all $\varepsilon \geq 0$ and $x$ such that $Ax \in \operatorname{dom} g$, there holds
\[ \partial_\varepsilon(g \circ A)(x) \supset A_0^*\,\partial_\varepsilon g(Ax), \tag{3.2.3} \]
with equality under assumption (3.2.1), for example if (3.2.2) holds.

PROOF. Fix $x$ such that $Ax \in \operatorname{dom} g$ and let $p \in \partial_\varepsilon g(Ax) \subset \mathbb{R}^m$:
\[ g(z) \geq g(Ax) + \langle p, z - Ax\rangle - \varepsilon \quad\text{for all } z \in \mathbb{R}^m \,. \]
Taking in particular $z = Ay$ with $y$ describing $\mathbb{R}^n$:
\[ g(Ay) \geq g(Ax) + \langle A_0^* p, y - x\rangle - \varepsilon \quad\text{for all } y \in \mathbb{R}^n, \]
where we have used the property $Ay - Ax = A_0(y - x)$. Thus we have proved that $A_0^* p \in \partial_\varepsilon(g \circ A)(x)$.

Conversely, let $s \in \partial_\varepsilon(g \circ A)(x)$, i.e.
\[ (g \circ A)^*(s) + (g \circ A)(x) - \langle s, x\rangle \leq \varepsilon \,. \tag{3.2.4} \]
Apply (3.2.1): with the help of some $p$ such that $A_0^* p = s$, (3.2.4) can be written
\[ \varepsilon \geq g^*(p) - \langle p, y_0\rangle + g(Ax) - \langle p, A_0 x\rangle = g^*(p) + g(Ax) - \langle p, Ax\rangle \,. \]
This shows that $p \in \partial_\varepsilon g(Ax)$. Altogether, we have proved that our $s$ is in $A_0^*\,\partial_\varepsilon g(Ax)$. $\square$

Naturally, only the linear part of $A$ counts in the right-hand side of (3.2.3): the translation is taken care of by Proposition X.1.3.1(v).

As an illustration of this calculus rule, take $x_0 \in \operatorname{dom} g$, a direction $d \neq 0$, and compute the approximate subdifferential of the function
\[ \mathbb{R} \ni t \mapsto \varphi(t) := g(x_0 + td) \,. \]
If $x_0 + \mathbb{R} d$ meets $\operatorname{ri} \operatorname{dom} g$, we can write
\[ \partial_\varepsilon \varphi(t) = \langle \partial_\varepsilon g(x_0 + td), d\rangle \quad\text{for all } t \in \operatorname{dom} \varphi \,. \]


3.3 Image and Marginal Functions

We recall that, for $g \in \operatorname{Conv} \mathbb{R}^m$ and $A$ linear from $\mathbb{R}^m$ to $\mathbb{R}^n$, the image of $g$ under $A$ is the function $Ag$ defined by
\[ \mathbb{R}^n \ni x \mapsto (Ag)(x) := \inf\,\{g(y) : Ay = x\} \,. \tag{3.3.1} \]
Once again, we need an assumption for characterizing the $\varepsilon$-subdifferential of $Ag$, namely that the infimum in (3.3.1) is attained "at finite distance". A sufficient assumption for this is
\[ \operatorname{Im} A^* \cap \operatorname{ri} \operatorname{dom} g^* \neq \emptyset, \tag{3.3.2} \]
which implies at the same time that $Ag \in \operatorname{Conv} \mathbb{R}^n$ (see Theorem X.2.2.3). As already seen for condition (X.2.2.Q.iii), this assumption is implied by
\[ g'_\infty(d) > 0 \quad\text{for all nonzero } d \in \operatorname{Ker} A \,. \]

Theorem 3.3.1 Let $\varepsilon \geq 0$ and $x \in \operatorname{dom} Ag = A(\operatorname{dom} g)$. Suppose that there is some $y \in \mathbb{R}^m$ with $Ay = x$ and $g(y) = (Ag)(x)$; for example assume (3.3.2). Then
\[ \partial_\varepsilon(Ag)(x) = \{s \in \mathbb{R}^n : A^* s \in \partial_\varepsilon g(y)\} \,. \tag{3.3.3} \]

PROOF. To say that $s \in \partial_\varepsilon(Ag)(x)$ is to say that
\[ (Ag)^*(s) + g(y) - \langle s, Ay\rangle \leq \varepsilon, \]
where we have made use of the existence and properties of $y$. Then apply Theorem X.2.1.1: $(Ag)^* = g^* \circ A^*$, so
\[ s \in \partial_\varepsilon(Ag)(x) \iff g^*(A^* s) + g(y) - \langle A^* s, y\rangle \leq \varepsilon \,. \quad\square \]

This result can of course be compared to Theorem VI.4.5.1: the hypotheses are just the same - except for the extended-valuedness possibility. Thus, we see that the inverse image under $A^*$ of $\partial_\varepsilon g(y_x)$ does not depend on the particular $y_x$ optimal in (3.3.1).

We know that a particular case is the marginal function:
\[ \mathbb{R}^n \ni x \mapsto f(x) := \inf\,\{g(x, z) : z \in \mathbb{R}^p\}, \tag{3.3.4} \]
where $g \in \operatorname{Conv}(\mathbb{R}^n \times \mathbb{R}^p)$. Indeed, $f$ is the image of $g$ under the projection mapping from $\mathbb{R}^{n+p}$ to $\mathbb{R}^n$ defined by $A(x, z) = x$. The above result can be particularized to this case:

Corollary 3.3.2 With $g \in \operatorname{Conv}(\mathbb{R}^n \times \mathbb{R}^p)$, let $g^*$ be associated with a scalar product preserving the structure of $\mathbb{R}^m = \mathbb{R}^n \times \mathbb{R}^p$ as a product space, namely:
\[ \langle (s, \sigma), (x, z)\rangle_{n+p} := \langle s, x\rangle_n + \langle \sigma, z\rangle_p, \tag{3.3.5} \]
and consider the marginal function $f$ of (3.3.4). Let $\varepsilon \geq 0$, $x \in \operatorname{dom} f$; suppose that there is some $z \in \mathbb{R}^p$ such that $g(x, z) = f(x)$; $z$ exists for example when
\[ \exists s_0 \in \mathbb{R}^n \ \text{such that} \ (s_0, 0) \in \operatorname{ri} \operatorname{dom} g^* \,. \tag{3.3.6} \]
Then
\[ \partial_\varepsilon f(x) = \{s \in \mathbb{R}^n : (s, 0) \in \partial_\varepsilon g(x, z)\} \,. \tag{3.3.7} \]


PROOF. Set $A : (x, z) \mapsto x$. With the scalar product (3.3.5), $A^* : \mathbb{R}^n \to \mathbb{R}^n \times \mathbb{R}^p$ is defined by $A^* s = (s, 0)$, $\operatorname{Im} A^* = \mathbb{R}^n \times \{0\}$, so (3.3.6) and (3.3.7) are just (3.3.2) and (3.3.3) respectively. $\square$

3.4 A Study of the Infimal Convolution

As another example of image functions, consider the infimal convolution
\[ (f_1 \mathbin{\square} f_2)(x) := \inf\,\{f_1(y_1) + f_2(y_2) : y_1 + y_2 = x\}, \]
where $f_1$ and $f_2$ are both in $\operatorname{Conv} \mathbb{R}^n$. With $m = 2n$, and $\mathbb{R}^{2n}$ being equipped with the Euclidean structure of a product-space, this is indeed an image function:
\[ \langle (s_1, s_2), (y_1, y_2)\rangle_{2n} := \langle s_1, y_1\rangle + \langle s_2, y_2\rangle, \qquad g(y_1, y_2) := f_1(y_1) + f_2(y_2), \qquad A(y_1, y_2) := y_1 + y_2 \,, \]
\[ g^*(s_1, s_2) = f_1^*(s_1) + f_2^*(s_2)\,; \qquad A^* s = (s, s) \,. \tag{3.4.1} \]

Theorem 3.4.1 Let $\varepsilon \geq 0$ and $x \in \operatorname{dom}(f_1 \mathbin{\square} f_2) = \operatorname{dom} f_1 + \operatorname{dom} f_2$. Suppose that there are $y_1$ and $y_2$ such that the inf-convolution is exact at $x = y_1 + y_2$; this is the case for example when
\[ \operatorname{ri} \operatorname{dom} f_1^* \cap \operatorname{ri} \operatorname{dom} f_2^* \neq \emptyset \,. \tag{3.4.2} \]
Then
\[ \partial_\varepsilon(f_1 \mathbin{\square} f_2)(x) = \bigcup\,\{\partial_{\varepsilon_1} f_1(y_1) \cap \partial_{\varepsilon_2} f_2(y_2) : \varepsilon_i \geq 0,\ \varepsilon_1 + \varepsilon_2 \leq \varepsilon\} \,. \tag{3.4.3} \]

PROOF. We apply Theorem 3.3.1 to $g$ and $A$ of (3.4.1). First of all,
\[ \operatorname{dom} g^* = \operatorname{dom} f_1^* \times \operatorname{dom} f_2^*, \]
so (Proposition III.2.1.11)
\[ \operatorname{ri} \operatorname{dom} g^* = \operatorname{ri} \operatorname{dom} f_1^* \times \operatorname{ri} \operatorname{dom} f_2^* \]
and (3.3.2) means:
\[ \exists s \in \mathbb{R}^n \ \text{such that} \ (s, s) \in \operatorname{ri} \operatorname{dom} f_1^* \times \operatorname{ri} \operatorname{dom} f_2^*, \]
which is nothing other than (3.4.2). Now, with $(y_1, y_2)$ as stated and $s \in \mathbb{R}^n$, set
\[ \varepsilon_i := f_i^*(s) + f_i(y_i) - \langle s, y_i\rangle \geq 0 \quad\text{for } i = 1, 2, \tag{3.4.4} \]
so that $s \in \partial_{\varepsilon_i} f_i(y_i)$ for $i = 1, 2$. Particularizing (3.3.3) to our present situation:
\[ s \in \partial_\varepsilon(f_1 \mathbin{\square} f_2)(x) \iff A^* s = (s, s) \in \partial_\varepsilon g(y_1, y_2), \]
which, in view of the definitions (3.4.1), means
\[ f_1^*(s) + f_2^*(s) + f_1(y_1) + f_2(y_2) - \langle s, y_1 + y_2\rangle \leq \varepsilon \,. \]
With (3.4.4), this is just (3.4.3). $\square$

As with Theorem 3.1.1, nothing is changed if we impose $\varepsilon_1 + \varepsilon_2 = \varepsilon$ in (3.4.3). The particular value $\varepsilon = 0$ in the above result brings
\[ \partial(f_1 \mathbin{\square} f_2)(x) = \partial f_1(y_1) \cap \partial f_2(y_2) \ \text{ when the inf-convolution is exact at } x = y_1 + y_2 \,. \tag{3.4.5} \]

Some more information, not directly contained in (3.4.3), can also be derived: a sort of converse result ensuring that the inf-convolution is exact.

Proposition 3.4.2 With $f := f_1 \mathbin{\square} f_2$, consider $y_i \in \operatorname{dom} f_i$ for $i = 1, 2$ and $x := y_1 + y_2$ ($\in \operatorname{dom} f$). Then
\[ \partial f_1(y_1) \cap \partial f_2(y_2) \subset \partial f(x) \,. \tag{3.4.6} \]
If $\partial f_1(y_1) \cap \partial f_2(y_2) \neq \emptyset$, the inf-convolution is exact at $x = y_1 + y_2$ and equality holds in (3.4.6).

PROOF. Take $s \in \partial f_1(y_1) \cap \partial f_2(y_2)$; adding the two corresponding equalities $f_i^*(s) + f_i(y_i) = \langle s, y_i\rangle$ gives
\[ f_1^*(s) + f_1(y_1) + f_2^*(s) + f_2(y_2) = \langle s, y_1 + y_2\rangle = \langle s, x\rangle \,. \tag{3.4.7} \]
Then, using $f_1^* + f_2^* = (f_1 \mathbin{\square} f_2)^*$ (Corollary X.2.1.3) and the definition of an inf-convolution,
\[ (f_1 \mathbin{\square} f_2)^*(s) + (f_1 \mathbin{\square} f_2)(x) \leq \langle s, x\rangle\,; \]
in view of the Fenchel inequality (X.1.1.3), this is actually an equality, i.e. $s \in \partial f(x)$. Now use this last equality as a value for $\langle s, x\rangle$ in (3.4.7) to obtain
\[ f_1(y_1) + f_2(y_2) = (f_1 \mathbin{\square} f_2)(x), \]
i.e. the inf-convolution is exact at $x = y_1 + y_2$; equality in (3.4.6) follows from (3.4.5). $\square$

In summary, fix $x \in \operatorname{dom}(f_1 \mathbin{\square} f_2)$ and denote by $H \subset \mathbb{R}^n \times \mathbb{R}^n$ the hyperplane of equation $y_1 + y_2 = x$. When $(y_1, y_2)$ describes $H$, the set
\[ D(y_1, y_2) := \partial f_1(y_1) \cap \partial f_2(y_2) \]
assumes at most two values: the empty set and $\partial(f_1 \mathbin{\square} f_2)(x)$. Actually, there are two possibilities:

- either $D(y_1, y_2) = \emptyset$ for all $(y_1, y_2) \in H$;

- or $D(y_1, y_2) \neq \emptyset$ for some $(y_1, y_2) \in H$. This implies $\partial(f_1 \mathbin{\square} f_2)(x) \neq \emptyset$ and then, we have for all $(y_1, y_2) \in H$:
\[ D(y_1, y_2) \neq \emptyset \iff D(y_1, y_2) = \partial(f_1 \mathbin{\square} f_2)(x) \iff \text{the inf-convolution is exact at } x = y_1 + y_2 \,. \]


Remark 3.4.3 Beware that $D$ may be empty on the whole of $H$ while $\partial(f_1 \mathbin{\square} f_2)(x) \neq \emptyset$. Take for example
\[ f_1(y) = \exp y \quad\text{and}\quad f_2(y) = \exp(-y)\,; \]
then $f_1 \mathbin{\square} f_2 \equiv 0$, hence $\partial(f_1 \mathbin{\square} f_2) \equiv \{0\}$. Yet $D(y_1, y_2) = \emptyset$ for all $(y_1, y_2) \in H$: the inf-convolution is nowhere exact.

Note also that (3.4.5) may express the equality between empty sets: take for example $f_1 = I_{[0, +\infty[}$ and
\[ \mathbb{R} \ni y_2 \mapsto f_2(y_2) := \begin{cases} -\sqrt{y_2} & \text{if } y_2 \geq 0, \\ +\infty & \text{otherwise.} \end{cases} \]
It is easy to see that
\[ (f_1 \mathbin{\square} f_2)(x) = \inf\,\{-\sqrt{y_2} : 0 \leq y_2 \leq x\} = -\sqrt{x} \quad\text{for all } x \geq 0 \,. \]
This inf-convolution is exact in particular at $0 = 0 + 0$, yet $\partial(f_1 \mathbin{\square} f_2)(0) = \emptyset$. $\square$

Example 3.4.4 (Moreau-Yosida Regularizations) For $c > 0$ and $f \in \operatorname{Conv} \mathbb{R}^n$, let $f_{(c)} := f \mathbin{\square} \big(\tfrac{c}{2}\|\cdot\|^2\big)$, i.e.
\[ f_{(c)}(x) = \min\,\{f(y) + \tfrac{c}{2}\|x - y\|^2 : y \in \mathbb{R}^n\}\,; \]
this $f_{(c)}$ is called the Moreau-Yosida regularization of $f$. Denote by $x_c$ the unique minimal $y$ above, characterized by
\[ 0 \in \partial f(x_c) + c(x_c - x) \,. \]
Using the approximate subdifferential of $\tfrac{c}{2}\|\cdot\|^2$ (see Example 1.2.2 if necessary), Theorem 3.4.1 gives
\[ \partial_\varepsilon f_{(c)}(x) = \bigcup_{0 \leq \alpha \leq \varepsilon} \big[\partial_{\varepsilon-\alpha} f(x_c) \cap B\big(c(x - x_c), \sqrt{2c\alpha}\big)\big] \,. \]
It is interesting to note that, when $\varepsilon = 0$, this formula reduces to
\[ \nabla f_{(c)}(x) = c(x - x_c) \ \ [\in \partial f(x_c)] \,. \tag{3.4.8} \]
Thus $f_{(c)}$ is a differentiable convex function. It can even be said that $\nabla f_{(c)}$ is Lipschitzian with constant $c$ on $\mathbb{R}^n$. To see this, recall that the conjugate
\[ f_{(c)}^* = f^* + \tfrac{1}{2c}\|\cdot\|^2 \]
is strongly convex with modulus $1/c$; then apply Theorem X.4.2.1.

In the particular case where $c = 1$ and $f$ is the indicator function of a closed convex set $C$, $f_{(c)} = \tfrac12 d_C^2$ (the squared distance to $C$) and $x_c = p_C(x)$ (the projection of $x$ onto $C$). Using the notation of (1.1.6):
\[ \partial_\varepsilon\big(\tfrac12 d_C^2\big)(x) = \bigcup_{0 \leq \alpha \leq \varepsilon} \big[N_{C,\varepsilon-\alpha}\big(p_C(x)\big) \cap B\big(x - p_C(x), \sqrt{2\alpha}\big)\big] \,. \quad\square \]
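A hedged one-dimensional illustration of (3.4.8), with $f = |\cdot|$ (our own instance, not the book's): the minimizer $x_c$ is then the classical soft-threshold of $x$, $f_{(c)}$ is the Huber function, and its derivative agrees with $c(x - x_c)$.

```python
# Hedged 1-D check of (3.4.8) for f = |.|: the prox x_c is soft-thresholding
# at 1/c, and the numerical derivative of f_(c) should equal c*(x - x_c).

def x_c(x, c):
    # argmin_y |y| + (c/2)(x-y)^2, i.e. soft-threshold of x at level 1/c
    return max(abs(x) - 1.0/c, 0.0) * (1.0 if x > 0 else -1.0)

def grad_fc(x, c, h=1e-6):
    # central-difference derivative of f_(c)(x) = min_y |y| + (c/2)(x-y)^2
    def fc(z):
        y = x_c(z, c)
        return abs(y) + 0.5*c*(z - y)**2
    return (fc(x + h) - fc(x - h)) / (2.0*h)

c = 2.0
for x in (-3.0, -0.2, 0.1, 0.7, 5.0):
    print(abs(grad_fc(x, c) - c*(x - x_c(x, c))) < 1e-5)   # True each time
```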


Another regularization is $f_{[c]} := f \mathbin{\square} c\|\cdot\|$, i.e.
\[ f_{[c]}(x) := \inf\,\{f(y) + c\|x - y\| : y \in \mathbb{R}^n\}, \tag{3.4.9} \]
which will be of importance in §4.1.

Proposition 3.4.5 For $f \in \operatorname{Conv} \mathbb{R}^n$, define $f_{[c]}$ as above; assume
\[ c > \underline{c} := \inf\,\{\|s\| : s \in \partial f(x),\ x \in \operatorname{dom} \partial f\} \]
and consider
\[ C_{[c]} := \{x \in \mathbb{R}^n : f_{[c]}(x) = f(x)\}, \]
the coincidence set of $f$ and $f_{[c]}$. Then

(i) $f_{[c]}$ is convex, finite-valued, and Lipschitzian with Lipschitz constant $c$ on $\mathbb{R}^n$;

(ii) the coincidence set is nonempty and characterized by
\[ C_{[c]} = \{x \in \mathbb{R}^n : \partial f(x) \cap B(0, c) \neq \emptyset\}\,; \]

(iii) for all $x \in C_{[c]}$, there holds
\[ \partial_\varepsilon f_{[c]}(x) = \partial_\varepsilon f(x) \cap B(0, c) \,. \]

PROOF. [(i)] There exist by assumption $x_0$ and $s_0 \in \partial f(x_0)$ with $\|s_0\| \leq c$, hence
\[ f \geq f(x_0) + \langle s_0, \cdot - x_0\rangle \quad\text{and}\quad c\|\cdot\| \geq \langle s_0, \cdot\rangle\,; \]
$f$ and $c\|\cdot\|$ are minorized by a common affine function: from Proposition IV.2.3.2, $f_{[c]} \in \operatorname{Conv} \mathbb{R}^n$. Furthermore, $f_{[c]}$ is finite everywhere by construction.

Now take $x$ and $x'$ in $\mathbb{R}^n$. For any $\eta > 0$, there is $y_\eta$ such that
\[ f(y_\eta) + c\|x - y_\eta\| \leq f_{[c]}(x) + \eta \]
and, by definition of $f_{[c]}$,
\[ f_{[c]}(x') \leq f(y_\eta) + c\|x' - y_\eta\| \leq f(y_\eta) + c\|x' - x\| + c\|x - y_\eta\| \leq f_{[c]}(x) + c\|x' - x\| + \eta \,. \]
This finishes the proof of (i), since $\eta$ was arbitrary.

[(ii) and (iii)] To say that $x \in C_{[c]}$ is to say that the inf-convolution of $f$ and $c\|\cdot\|$ is exact at $x = x + 0$. From Proposition 3.4.2, this means that $\partial f(x) \cap B(0, c)$ is nonempty: indeed
\[ \partial(c\|\cdot\|)(0) = cB(0, 1) = B(0, c) = \partial_\varepsilon(c\|\cdot\|)(0) \,. \]
The last equality comes from the sublinearity of $c\|\cdot\|$ (cf. Example 1.2.5) and implies with the aid of Theorem 3.4.1:
\[ \partial_\varepsilon f_{[c]}(x) = \bigcup\,\{\partial_\alpha f(x) \cap B(0, c) : 0 \leq \alpha \leq \varepsilon\} = \partial_\varepsilon f(x) \cap B(0, c) \,. \quad\square \]
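Proposition 3.4.5(ii) can be checked numerically on $f(y) = y^2$ (our own instance): since $\partial f(x) = \{2x\}$, the coincidence set of $f$ and $f_{[c]}$ should be $\{x : |2x| \leq c\}$.

```python
# Hedged check of Proposition 3.4.5(ii) with f(y) = y^2 and c = 1:
# the coincidence set should be the interval |x| <= c/2 = 0.5.

def f_c(x, c, lo=-5.0, hi=5.0, n=100000):
    # brute-force inf of y^2 + c|x - y| over a fine grid
    step = (hi - lo) / n
    return min((lo + k*step)**2 + c*abs(x - (lo + k*step)) for k in range(n + 1))

c = 1.0
for x in (0.0, 0.3, 0.5, 0.8, 2.0):
    coincide = abs(f_c(x, c) - x*x) < 1e-3
    print(x, coincide)   # True for |x| <= 0.5, False beyond
```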

Just as in Example 3.4.4, we can particularize $f_{[c]}$ to the case where $f$ is the indicator function of a closed convex set $C$: we get $f_{[c]} = c\,d_C$ and therefore
\[ \partial_\varepsilon(c\,d_C)(x) = N_{C,\varepsilon}(x) \cap B(0, c) \quad\text{for all } x \in C \,. \]
Thus, when $\varepsilon$ increases, the set $K'$ in Fig. V.2.3.1 increases but stays confined within $B(0, c)$.


Remark 3.4.6 For $x \notin C_{[c]}$, there is some $y_c \neq x$ yielding the infimum in (3.4.9). At such a $y_c$, the Euclidean norm is differentiable, so we obtain that $f_{[c]}$ is differentiable as well and
\[ \nabla f_{[c]}(x) = c\,\frac{x - y_c}{\|x - y_c\|} \in \partial f(y_c) \,. \]
Compare this with Example 3.4.4: here $y_c$ need not be unique, but the direction $y_c - x$ depends only on $x$. $\square$

3.5 Maximum of Functions

In view of Theorem X.2.4.4, computing the approximate subdifferential of a supremum involves the closed convex hull of an infimum. Here, we limit ourselves to the case of finitely many functions, say $f := \max_{j=1,\ldots,p} f_j$, a situation complicated enough. Thus, to compute $f^*(s)$, we have to solve
\[ \inf_{\alpha_j,\, s_j}\ \sum_{j=1}^p \alpha_j f_j^*(s_j), \qquad \sum_{j=1}^p \alpha_j s_j = s, \qquad \alpha_j \geq 0,\ \sum_{j=1}^p \alpha_j = 1 \ \ [\text{i.e. } \alpha \in \Delta_p] \,. \tag{3.5.1} \]
An important issue is whether this minimization problem has a solution, and whether the infimal value is a closed function of $s$. Here again, the situation is not simple when the functions are extended-valued.

Theorem 3.5.1 Let $f_1, \ldots, f_p$ be a finite number of convex functions from $\mathbb{R}^n$ to $\mathbb{R}$ and let $f := \max_j f_j$; set $m := \min\{p, n + 1\}$. Then, $s \in \partial_\varepsilon f(x)$ if and only if there exist $m$ vectors $s_j \in \operatorname{dom} f_j^*$, convex multipliers $\alpha_j$, and nonnegative $\varepsilon_j$ such that
\[ s_j \in \partial_{\varepsilon_j/\alpha_j} f_j(x) \ \text{for all } j \text{ such that } \alpha_j > 0, \qquad s = \sum_j \alpha_j s_j, \qquad \sum_j [\varepsilon_j - \alpha_j f_j(x)] + f(x) \leq \varepsilon \,. \tag{3.5.2} \]

PROOF. Theorem X.2.4.7 tells us that
\[ f^*(s) = \sum_{j=1}^m \alpha_j f_j^*(s_j), \]
where the $(\alpha_j, s_j) \in \mathbb{R}_+ \times \operatorname{dom} f_j^*$ form a solution of (3.5.1) (after possible renumbering of the $j$'s). By the characterization (1.2.1), the property $s \in \partial_\varepsilon f(x)$ is therefore equivalent to the existence of $(\alpha_j, s_j) \in \mathbb{R}_+ \times \operatorname{dom} f_j^*$ - more precisely a solution of (3.5.1) - such that


∑_{j=1}^m α_j f_j*(s_j) + f(x) ≤ ε + ∑_{j=1}^m α_j ⟨s_j, x⟩ ,

which we write

∑_{j=1}^m α_j [f_j*(s_j) + f_j(x) − ⟨s_j, x⟩] ≤ ε + ∑_{j=1}^m α_j f_j(x) − f(x) .   (3.5.3)

Thus, if s ∈ ∂_ε f(x), i.e. if (3.5.3) holds, we can set

ε_j := α_j [f_j*(s_j) + f_j(x) − ⟨s_j, x⟩]

so that s_j ∈ ∂_{ε_j/α_j} f_j(x) if α_j > 0: (s_j, α_j, ε_j) are exhibited for (3.5.2). Conversely, if (3.5.2) holds, multiply by α_j each of the inequalities

f_j*(s_j) + f_j(x) − ⟨s_j, x⟩ ≤ ε_j/α_j ;

add to the list thus obtained those inequalities having α_j = 0 (they hold trivially!); then sum up to obtain (3.5.3).  □

It is important to realize what (3.5.2) means. Its third relation (which, incidentally, could be replaced by an equality) can be written

∑_{j=1}^m (ε_j + α_j e_j) ≤ ε ,   (3.5.4)

where, for each j, the number

e_j := f(x) − f_j(x)

is nonnegative and measures how close f_j comes to the maximal f at x. Using elementary calculus rules for approximate subdifferentials, a set-formulation of (3.5.2) is

∂_ε f(x) = ∪ { ∑_{j=1}^m α_j ∂_{ε_j/α_j} f_j(x) : α ∈ Δ_m , ∑_{j=1}^m (ε_j + α_j e_j) ≤ ε }   (3.5.5)

(remember that the ε-subdifferential of the zero function is identically {0}). Another observation is as follows: those elements (α_j, s_j) with α_j = 0 do not matter and can be dropped from the combination making up s in (3.5.2). Then set η_j := ε_j/α_j to realize that s ∈ ∂_ε f(x) if and only if there are positive α_j summing up to 1, η_j ≥ 0 and s_j ∈ ∂_{η_j} f_j(x) such that

∑_j α_j (η_j + e_j) ≤ ε  and  ∑_j α_j s_j = s .   (3.5.6)

The above formulae are rather complicated, even more so than (3.1.5) corresponding to a sum, which was itself not simple. In Theorem 3.5.1, denote by

J_ε(x) := {j : f_j(x) ≥ f(x) − ε}

the set of ε-active indices; then we have the inclusion

∪_{j ∈ J_ε(x)} ∂f_j(x) ⊂ ∂_ε f(x) ,

which is already simpler. Unfortunately, equality need not hold: with n = 1, take f_1(ξ) = ξ, f_2(ξ) = −ξ. At ξ = 1, the left-hand side is {1} for all ε ∈ [0, 2[, while the right-hand side has been computed in §1.1: ∂_ε f(1) = [1 − ε, 1].
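This small example is easy to verify numerically from the definition of the ε-subdifferential alone; the grid bounds and tolerance below are ad hoc choices for illustration:

```python
# For f = max(xi, -xi) = |xi|, check on a grid that the eps-subdifferential
# at x = 1 equals [1 - eps, 1] (here eps = 0.5), strictly larger than the
# union {1} of the exact subdifferentials of the eps-active pieces.

eps, x = 0.5, 1.0

def f(y):
    return max(y, -y)

ys = [k * 0.5 - 100.0 for k in range(401)]   # grid on [-100, 100], incl. 0

def in_eps_subdiff(s):
    # definition: f(y) >= f(x) + s*(y - x) - eps for all y
    return all(f(y) >= f(x) + s * (y - x) - eps - 1e-9 for y in ys)

assert in_eps_subdiff(1.0 - eps) and in_eps_subdiff(1.0)   # the endpoints
assert in_eps_subdiff(0.75)                                # an interior point
assert not in_eps_subdiff(1.0 - eps - 0.01)                # below 1 - eps
assert not in_eps_subdiff(1.01)                            # above 1
```

The kink of f at 0 is what enlarges ∂_ε f(1) beyond the active gradients: the subgradient inequality becomes tight at y = 0 for s = 1 − ε.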

In some cases, the exact formulae simplify:


Example 3.5.2 Consider the function f⁺ := max{0, f}, where f : ℝ^n → ℝ is convex. We get

∂_ε(f⁺)(x) = ∪ { ∂_δ(αf)(x) : 0 ≤ α ≤ 1 , δ + f⁺(x) − αf(x) ≤ ε } .

Setting δ(α) := ε − f⁺(x) + αf(x), this can be written as

∂_ε(f⁺)(x) = ∪ { ∂_{δ(α)}(αf)(x) : 0 ≤ α ≤ 1 } ,

with the convention ∂_η(αf)(x) = ∅ if η < 0.  □

Another important simplification is obtained when each f_j is affine:

Example 3.5.3 (Piecewise Affine Functions) Take

f(x) := max { ⟨s_j, x⟩ + b_j : j = 1, ..., p } .   (3.5.7)

Each ε_j/α_j-subdifferential in (3.5.2) is constantly {s_j}: the ε_j's play no role and can be eliminated from (3.5.4), which can be used as a mere definition

0 [≤ ∑_{j=1}^p ε_j] ≤ ε − ∑_{j=1}^p α_j e_j .

Thus, the approximate subdifferential of the function (3.5.7) is the compact convex polyhedron

∂_ε f(x) = { ∑_{j=1}^p α_j s_j : α ∈ Δ_p , ∑_{j=1}^p α_j e_j ≤ ε } .

In this formulation, e_j = f(x) − ⟨s_j, x⟩ − b_j. The role of e_j appears more explicitly if the origin of the graph-space is carried over to (x, f(x)) (remember Example VI.3.4): f of (3.5.7) can be alternatively defined by

ℝ^n ∋ y ↦ f(y) = f(x) + max { −e_j + ⟨s_j, y − x⟩ : j = 1, ..., p } .

The constant term f(x) is of little importance, as far as subdifferentials are concerned. Neglecting it, e_j thus appears as the value at x (the point where ∂_ε f is computed) of the j-th affine function making up f.

Geometrically, ∂_ε f(x) dilates when ε increases, and describes a sort of spider web with ∂f(x) as "kernel". When ε reaches the value max_j e_j, ∂_ε f(x) stops at co{s_1, ..., s_p}.  □
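The polyhedron formula can be sketched on a small one-dimensional instance; the data (three affine pieces) are illustrative, and the simplex is simply sampled on a grid instead of solving the underlying linear program:

```python
# Sketch of the polyhedron formula for a one-dimensional piecewise affine
# f(y) = max_j (s_j*y + b_j), at x = 1 with eps = 0.6.  The data
# s = (2, -1, 0.5), b = (-1, 0, 0) are illustrative choices.

S = (2.0, -1.0, 0.5)
B = (-1.0, 0.0, 0.0)
x, eps = 1.0, 0.6

fx = max(s * x + b for s, b in zip(S, B))        # f(x) = 1
E = [fx - (s * x + b) for s, b in zip(S, B)]     # e_j = f(x) - s_j*x - b_j
assert E == [0.0, 2.0, 0.5]

vals = []                # sampled values of sum_j a_j s_j over the polytope
n = 200
for i in range(n + 1):
    for j in range(n + 1 - i):
        a = (i / n, j / n, (n - i - j) / n)      # a in Delta_3 (grid)
        if sum(ak * ek for ak, ek in zip(a, E)) <= eps + 1e-12:
            vals.append(sum(ak * sk for ak, sk in zip(a, S)))

lo, hi = min(vals), max(vals)
assert abs(hi - 2.0) < 1e-12    # alpha = (1,0,0) is feasible since e_1 = 0
assert abs(lo - 0.4) < 0.02     # exact LP vertex: alpha = (0, 1/15, 14/15)
```

The sampled interval [lo, hi] is ∂_ε f(1) up to discretization; in practice one would solve the two bounding linear programs exactly rather than enumerate the simplex.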

Finally, if ε = 0, (3.5.4) can be satisfied only by ε_j = 0 and α_j e_j = 0 for all j. We thus recover the important Corollary VI.4.3.2. Comparing it with (3.5.6), we see that, for larger e_j or η_j, the η_j-subgradient s_j is "more remote from ∂f(x)", and as a result, its weight α_j must be smaller.

3.6 Post-Composition with an Increasing Convex Function

Theorem 3.6.1 Let f : ℝ^n → ℝ be convex, g ∈ Conv ℝ be increasing, and assume the qualification condition f(ℝ^n) ∩ int dom g ≠ ∅. Then, for all x such that f(x) ∈ dom g,

s ∈ ∂_ε(g ∘ f)(x)  ⟺  ∃ ε₁, ε₂ ≥ 0 and α ≥ 0 such that
ε₁ + ε₂ = ε ,  s ∈ ∂_{ε₁}(αf)(x) ,  α ∈ ∂_{ε₂} g(f(x)) .   (3.6.1)


PROOF. [⇒] Let α ≥ 0 be a minimum of the function ψ_s in Theorem X.2.5.1; there are two cases:

(a) α = 0. Because dom f = ℝ^n, this implies s = 0 and (g ∘ f)*(0) = g*(0). The characterization of s = 0 ∈ ∂_ε(g ∘ f)(x) is

g*(0) + g(f(x)) ≤ ε , i.e. 0 ∈ ∂_ε g(f(x)) .

Thus, (3.6.1) holds with ε₂ = ε, ε₁ = 0, and α = 0 (note: ∂(0f) ≡ {0} because f is finite everywhere).

(b) α > 0. Then (g ∘ f)*(s) = α f*(s/α) + g*(α) and the characterization of s ∈ ∂_ε(g ∘ f)(x) is

α f*(s/α) + g*(α) + g(f(x)) − ⟨s, x⟩ ≤ ε ,

i.e. (αf)*(s) + αf(x) − αf(x) + g*(α) + g(f(x)) − ⟨s, x⟩ ≤ ε .

Split the above left-hand side into

(αf)*(s) + αf(x) − ⟨s, x⟩ =: ε₁ ≥ 0 ,   g*(α) + g(f(x)) − αf(x) =: ε₂ ≥ 0 ,

which can be enunciated as: s ∈ ∂_{ε₁}(αf)(x), α ∈ ∂_{ε₂} g(f(x)), ε₁ + ε₂ ≤ ε. Because, once again, η-subdifferentials increase with η, (3.6.1) is established.

[⇐] (3.6.1) means:

α[f(y) − f(x)] ≥ ⟨s, y − x⟩ − ε₁ for all y ∈ ℝ^n ,

g(r) ≥ g(f(x)) + α[r − f(x)] − ε₂ for all r ∈ ℝ ,

hence, with r = f(y):

g(f(y)) ≥ g(f(x)) + ⟨s, y − x⟩ − ε for all y ∈ ℝ^n .  □

As an application, consider the set

C := {x ∈ ℝ^n : c(x) ≤ 0} with c : ℝ^n → ℝ convex.   (3.6.2)

Writing I_C as the composition I_{]−∞,0]} ∘ c, the ε-normal set N_{C,ε}(x) of Definition 1.1.3 can be characterized in terms of approximate subdifferentials of c at x. Theorem 3.6.1 is valid under our qualification condition c(ℝ^n) ∩ ]−∞, 0[ ≠ ∅, in which we recognize Slater's assumption. Furthermore, the approximate subdifferential of I_{]−∞,0]} = g has been computed in Example 1.2.5:

∂_ε I_{]−∞,0]}(r) = {α ≥ 0 : αr + ε ≥ 0} for r ≤ 0 .

Then we obtain: s ∈ N_{C,ε}(x) if and only if there are nonnegative α, ε₁, ε₂, with ε₁ + ε₂ = ε, such that

s ∈ ∂_{ε₁}(αc)(x) and αc(x) + ε₂ ≥ 0 .   (3.6.3)


Corollary 3.6.2 Let C be described by (3.6.2) and assume that c(x₀) < 0 for some x₀. Then

N_{C,ε}(x) = ∪ { ∂_δ(αc)(x) : α ≥ 0 , δ ≥ 0 , δ − αc(x) ≤ ε } .   (3.6.4)

In particular, if x ∈ bd C, i.e. if c(x) = 0,

N_{C,ε}(x) = ∪ { ∂_ε(αc)(x) : α ≥ 0 } .   (3.6.5)

PROOF. Eliminate ε₂ and set δ := ε₁ in (3.6.3) to obtain the set-formulation (3.6.4). Then remember (VI.1.3.6) and use again the monotonicity of the multifunction δ ↦ ∂_δ(αc)(x).  □

Naturally, (3.6.5) with ε = 0 reduces to Theorem VI.1.3.5.
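A one-dimensional sketch of (3.6.5), with C = [−1, 1] described by c(x) = |x| − 1, at the boundary point x = 1; the closed form used for ∂_ε(αc)(1) comes from the interval ∂_η|·|(1) = [max(−1, 1 − η), 1] of §1.1, and the grids and sample points are illustrative:

```python
# One-dimensional sketch of (3.6.5): C = [-1, 1] given by c(x) = |x| - 1,
# at the boundary point x = 1, eps = 0.5.  Here
# d_eps(alpha*c)(1) = alpha * d_{eps/alpha} c(1) = [max(-alpha, alpha - eps), alpha].

eps = 0.5

def in_normal_set(s):
    # s in N_{C,eps}(1) iff s*(y - 1) <= eps for all y in C = [-1, 1]
    return all(s * (y - 1.0) <= eps + 1e-12
               for y in [k * 0.01 - 1.0 for k in range(201)])

def union_over_alpha(s):
    # does s belong to some d_eps(alpha*c)(1), alpha >= 0 ?
    for k in range(1, 2001):
        a = k * 0.01                              # alpha grid on (0, 20]
        lo = max(-a, a - eps)
        if lo - 1e-12 <= s <= a + 1e-12:
            return True
    return s == 0.0                               # alpha = 0 contributes {0}

for s in [-0.25, 0.0, 1.0, 10.0]:                 # inside [-eps/2, +inf)
    assert in_normal_set(s) and union_over_alpha(s)
for s in [-0.26, -1.0]:                           # below -eps/2
    assert not in_normal_set(s) and not union_over_alpha(s)
```

Both descriptions give the half-line [−ε/2, +∞): the multiplier α = ε/2 realizes the lower endpoint.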

All the necessary material to reproduce Chap. VII is now at hand. For example, Corollary 3.6.2 and various calculus rules from the present Section 3 allow the derivation of necessary and sufficient conditions for approximate minimality in a constrained convex minimization problem. With the help of Example 3.5.2, the theory of exact penalty can be reproduced, etc. It is interesting to note that the various (Q)-type assumptions, giving the calculus rules in §X.2.2 and §X.2.3, are intimately related to the constraint qualification conditions of §VII.2.

Let us conclude with a general remark: in §III.5.3, we have alluded to some calculus rules for normal and tangent cones. They can be developed in a rigorous manner, by an application of the present calculus to functions of the form f_j = I_{C_j}. All these exercises are left to the reader.

4 The Approximate Subdifferential as a Multifunction

4.1 Continuity Properties of the Approximate Subdifferential

We will see in this section that the multifunction ∂_ε f is much more regular when ε > 0 than the exact subdifferential, studied in §VI.6.2. We start with two useful properties, stating that the approximate subdifferential (ε, x) ↦ ∂_ε f(x) has a closed graph, and is locally bounded on the interior of dom f; see §A.5 for the terminology.

Proposition 4.1.1 Let {(ε_k, x_k, s_k)} be a sequence converging to (ε, x, s), with s_k ∈ ∂_{ε_k} f(x_k) for all k. Then s ∈ ∂_ε f(x).

PROOF. By definition,

f(y) ≥ f(x_k) + ⟨s_k, y − x_k⟩ − ε_k for all y ∈ ℝ^n .

Pass to the limit on k and use the lower semi-continuity of f.  □

Proposition 4.1.2 Assume int dom f ≠ ∅; let δ > 0 and L be such that f is Lipschitzian with constant L on some ball B(x, δ), where x ∈ int dom f. Then, for all δ′ < δ,

‖s‖ ≤ L + ε/(δ − δ′)   (4.1.1)

whenever s ∈ ∂_ε f(y), with y ∈ B(x, δ′).


PROOF. We know (Theorem IV.3.1.2) that f is locally Lipschitzian on int dom f, so we can take x, δ, L as stated. To prove (4.1.1), take δ′, y and s as stated, assume s ≠ 0 and set z := y + (δ − δ′) s/‖s‖ in the definition

f(z) ≥ f(y) + ⟨s, z − y⟩ − ε .

Observe that z and y are in B(x, δ) and conclude

L(δ − δ′) ≥ (δ − δ′)‖s‖ − ε .  □

As a result, the multifunction ∂_ε f is outer semi-continuous, just as is the exact subdifferential. But for ε > 0, it also enjoys fairly strong continuity properties. We first assume that f is finite everywhere, and then we consider the general case via the regularized version f_[c] := inf_y { f(y) + c‖· − y‖ } of f. In the result below, recall from Theorem V.3.3.8 that the Hausdorff distance between two (nonempty) compact convex sets A and B has the expression

Δ_H(A, B) = max { |σ_A(d) − σ_B(d)| : ‖d‖ ≤ 1 } .

Theorem 4.1.3 Let f : ℝ^n → ℝ be a convex Lipschitzian function on ℝ^n. Then there exists K > 0 such that, for all x, x′ in ℝ^n and ε, ε′ positive:

Δ_H(∂_ε f(x), ∂_{ε′} f(x′)) ≤ [K / min{ε, ε′}] (‖x − x′‖ + |ε − ε′|) .   (4.1.2)

PROOF. With d of norm 1, use (2.1.1): for any η > 0, there is t_η > 0 such that

f′_ε(x, d) ≥ q_ε(x, t_η) − η ,   (4.1.3)

where we have used the notation q_ε(x, t) for the approximate difference quotient. By assumption, we can let δ → +∞ in Proposition 4.1.2: there is a global Lipschitz constant, say L, such that f′_ε(x, d) ≤ L. From

−L + ε/t_η ≤ q_ε(x, t_η) ≤ f′_ε(x, d) + η ≤ L + η

we therefore obtain

1/t_η ≤ (2L + η)/ε .

Then we can write, using (2.1.1) and (4.1.3) again:

f′_{ε′}(x′, d) − f′_ε(x, d) − η ≤ q_{ε′}(x′, t_η) − q_ε(x, t_η) =

= [f(x′ + t_η d) − f(x + t_η d) + f(x) − f(x′) + ε′ − ε] / t_η ≤

≤ [2L‖x′ − x‖ + |ε′ − ε|] / t_η ≤ [(2L + η)/ε] (2L‖x′ − x‖ + |ε′ − ε|) .

Remembering that η > 0 is arbitrary and inverting (x, ε) with (x′, ε′), we do obtain

|f′_{ε′}(x′, d) − f′_ε(x, d)| ≤ [2L / min{ε, ε′}] (2L‖x′ − x‖ + |ε′ − ε|) ,

so the theorem is proved, for example with K = max{2L, 4L²}.  □

This result definitely implies the inner semi-continuity of (x, ε) ↦ ∂_ε f(x) for a Lipschitz-continuous f. In particular, for fixed ε > 0,

∂_ε f(y) ⊂ ∂_ε f(x) + ‖y − x‖ B(0, K/ε) for all x and y ,

a property already illustrated by Fig. 1.1.2. Remember from §VI.6.2 that the multifunction ∂f need not be inner semi-continuous; when ε = 0, no inclusion resembling the above can hold, unless x is fixed.
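For f = |·| on ℝ the intervals ∂_ε f(x) are available in closed form (derived directly from the definition, for x > 0), so the estimate (4.1.2) can be observed numerically; L = 1 and K = max{2L, 4L²} = 4 as in the proof above, and the sample points are illustrative:

```python
# For f = |.| on R (Lipschitz constant L = 1), the closed form
# d_eps f(x) = [max(-1, 1 - eps/x), 1] for x > 0 follows from the
# definition (test the inequality at y = 0 and y -> +-infinity).
# Check the Hausdorff-Lipschitz estimate with K = max(2L, 4L^2) = 4.

def eps_subdiff_abs(x, eps):        # interval, valid for x > 0
    return (max(-1.0, 1.0 - eps / x), 1.0)

def hausdorff(I, J):                # Hausdorff distance of two intervals
    return max(abs(I[0] - J[0]), abs(I[1] - J[1]))

eps, K = 0.5, 4.0
pts = [0.2 + 0.1 * k for k in range(30)]       # points in (0, 3.2)
for x in pts:
    for xp in pts:
        d = hausdorff(eps_subdiff_abs(x, eps), eps_subdiff_abs(xp, eps))
        assert d <= (K / eps) * abs(x - xp) + 1e-12
assert eps_subdiff_abs(1.0, 0.5) == (0.5, 1.0)
```

With ε = 0 the lower endpoint jumps from −1 to 1 at x = 0, so no such Lipschitz estimate can survive; the factor 1/ε in (4.1.2) is the price of regularity.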

A local version of Theorem 4.1.3 can similarly be proved: (4.1.2) holds on the compact sets included in int dom f. Here, we consider an extension of the result to unbounded subdifferentials. Recall that the Hausdorff distance is not convenient for unbounded sets. A better distance is obtained by comparing the bounded parts of closed convex sets: for c ≥ 0, we take

Δ_{H,c}(A, B) := Δ_H(A ∩ B(0, c), B ∩ B(0, c)) .

Corollary 4.1.4 Let f ∈ Conv ℝ^n. Suppose that S ⊂ dom f and ℓ > 0 are such that ∂f(x) ∩ B(0, ℓ) ≠ ∅ for all x ∈ S. Then, for all c > ℓ, there exists K_c such that, for all x, x′ in S and ε, ε′ positive,

Δ_{H,c}(∂_ε f(x), ∂_{ε′} f(x′)) ≤ [K_c / min{ε, ε′}] (‖x′ − x‖ + |ε′ − ε|) .

PROOF. Consider f_[c] := inf_y { f(y) + c‖· − y‖ } and observe that we are in the conditions of Proposition 3.4.5: f_[c] is Lipschitzian on ℝ^n and Theorem 4.1.3 applies. The rest follows because the coincidence set of f_[c] and f contains S.  □

Applying this result to an f finite everywhere, we obtain for example the following local Lipschitz continuity:

Corollary 4.1.5 Let f : ℝ^n → ℝ be convex. For any δ ≥ 0, there is K_δ > 0 such that

Δ_H(∂_ε f(x), ∂_ε f(x′)) ≤ (K_δ/ε) ‖x − x′‖ for all x and x′ in B(0, δ) .

PROOF. We know from Proposition 4.1.2 that ∂_ε f is bounded on B(0, δ), so the result is a straightforward application of Corollary 4.1.4.  □

4.2 Transportation of Approximate Subgradients

In §VI.6.3, we have seen that ∂f(x) can be constructed by piecing together limits of subgradients along directional sequences. From a practical point of view, this is an important property: for example, it is the basis for descent schemes in nonsmooth optimization, see Chap. IX. This kind of property is even more important for approximate subdifferentials: remember from §II.1.2 that the only information obtainable from f is a black box (U1), which computes an exact subgradient at designated points. The concept of approximate subgradient is therefore of no use, as long as there is no "black box" to compute one. Starting from this observation, we study here the problem of constructing ∂_ε f(x) with the sole help of the same (U1).

Theorem 4.2.1 (A. Brøndsted and R.T. Rockafellar) Let there be given f ∈ Conv ℝ^n, x ∈ dom f and ε ≥ 0. For any η > 0 and s ∈ ∂_ε f(x), there exist x_η ∈ B(x, η) and s_η ∈ ∂f(x_η) such that ‖s_η − s‖ ≤ ε/η.

PROOF. The data are x ∈ dom f, ε > 0 (if ε = 0, just take x_η = x and s_η = s!), η > 0 and s ∈ ∂_ε f(x). Consider the closed convex function

ℝ^n ∋ y ↦ φ(y) := f(y) + f*(s) − ⟨s, y⟩ .

It is nonnegative (Fenchel's inequality), satisfies φ(x) ≤ ε (cf. (1.2.1)), and its subdifferential is

∂φ(y) = ∂f(y) − {s} for all y ∈ dom φ = dom f .

Perturb φ to the closed convex function

ℝ^n ∋ y ↦ ψ(y) := φ(y) + (ε/η)‖y − x‖ ,

whose subdifferential at y ∈ dom f is (apply Corollary 3.1.2: (3.1.2) obviously holds)

∂ψ(y) = ∂φ(y) + (ε/η) ∂(‖· − x‖)(y) ⊂ ∂φ(y) + B(0, ε/η) .

Because φ is bounded from below, the 0-coercivity of the norm implies the 0-coercivity of ψ; there exists a point, say x_η, minimizing ψ on ℝ^n; then 0 ∈ ∂ψ(x_η) is written

0 ∈ ∂f(x_η) − {s} + B(0, ε/η) .

It remains to prove that x_η ∈ B(x, η). Using the nonnegativity of φ and optimality of x_η:

(ε/η)‖x_η − x‖ ≤ φ(x_η) + (ε/η)‖x_η − x‖ = ψ(x_η) ≤ ψ(x) = φ(x) ≤ ε ,

whence ‖x_η − x‖ ≤ η.  □

This result can be written in a set formulation:

∂_ε f(x) ⊂ ∩_{η>0} ∪_{‖y−x‖≤η} { ∂f(y) + B(0, ε/η) } .   (4.2.1)

It says that any ε-subgradient at x can be approximated by some exact subgradient, computed at some y, possibly different from x. For y close to x (η small) the approximation may be coarse; an accurate approximation (η large) may require seeking y far from x. The value η = √ε is a compromise, which equates the deviation from x and the degree of approximation: (4.2.1) implies

∂_ε f(x) ⊂ ∪ { ∂f(y) + B(0, √ε) : y ∈ B(x, √ε) } .

An illustrative example is the one-dimensional function f : x ↦ x + 1/x, with x = 1 and ε = 1 (which is the value ε_∞ of §2.2). Then 1 ∈ ∂_ε f(x) but 1 is nowhere the derivative of f: in the above proof, x_η is unbounded for η → +∞.
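The Brøndsted–Rockafellar estimate is easy to observe on this very example; the choice x_η = 1 + η below is ours, made for illustration:

```python
# f(x) = x + 1/x on (0, +inf), x = 1, eps = 1: s = 1 is a 1-subgradient at
# x = 1 (since f(y) - y = 1/y > 0), yet f'(y) = 1 - 1/y**2 < 1 everywhere.
# The points x_eta = 1 + eta are an illustrative choice satisfying the theorem.

def f(y):
    return y + 1.0 / y

def in_eps_subdiff(s, x, eps, ys):
    # f(y) >= f(x) + s*(y - x) - eps for all y in dom f
    return all(f(y) >= f(x) + s * (y - x) - eps - 1e-12 for y in ys)

ys = [k * 0.01 for k in range(1, 100001)]   # grid on (0, 1000]
assert in_eps_subdiff(1.0, 1.0, 1.0, ys)    # 1 lies in the 1-subdifferential

for eta in [0.5, 1.0, 5.0, 50.0]:
    x_eta = 1.0 + eta                       # x_eta in B(1, eta)
    s_eta = 1.0 - 1.0 / x_eta**2            # exact derivative at x_eta
    assert abs(s_eta - 1.0) <= 1.0 / eta + 1e-12   # ||s_eta - s|| <= eps/eta
```

As η grows, x_η must escape to +∞ to make f′(x_η) approach 1: this is the unboundedness mentioned in the text, and the reason the closure of ∂f(ℝ^n) is needed below.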

We also see from (4.2.1) that ∂_ε f(x) is contained in the closure of ∂f(ℝ^n), and the example above shows that the closure operation is necessary. Our next question is then: given an exact subgradient s computed somewhere, how can we recognize whether this s is in the set ∂_ε f(x) that we are interested in? The answer turns out to be very simple:

Proposition 4.2.2 (Transportation Formula) With x and x′ in dom f, let s′ ∈ ∂f(x′). Then s′ ∈ ∂_ε f(x) if and only if

f(x′) ≥ f(x) + ⟨s′, x′ − x⟩ − ε .   (4.2.2)

PROOF. The condition is obviously necessary, since the relation of definition (1.1.1) must in particular hold at y = x′. Conversely, for s′ ∈ ∂f(x′), we have for all y

f(y) ≥ f(x′) + ⟨s′, y − x′⟩ = f(x) + ⟨s′, y − x⟩ + [f(x′) − f(x) + ⟨s′, x − x′⟩] .

If (4.2.2) holds, s′ ∈ ∂_ε f(x).  □
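A minimal numerical sketch of the transportation formula for the smooth convex function f(y) = y², with points chosen for illustration; here the linearization error is exactly (x − x′)², and the gradient taken at x′ is an e-subgradient at x but lies in no tighter approximate subdifferential:

```python
# Transportation formula for f(y) = y**2: the gradient s' = 2x' computed at
# x' is an e(x, x', s')-subgradient at x, and is in no tighter one.  Here
# e(x, x', s') = (x - x')**2; the points and grid are illustrative.

def f(y):
    return y * y

def lin_err(x, xp, sp):
    # e(x, x', s') of Definition 4.2.3
    return f(x) - f(xp) - sp * (x - xp)

def in_eps_subdiff(s, x, eps, ys):
    return all(f(y) >= f(x) + s * (y - x) - eps - 1e-12 for y in ys)

ys = [k * 0.01 - 50.0 for k in range(10001)]    # grid on [-50, 50]
x, xp = 1.0, 3.0
sp = 2.0 * xp                                   # f'(x')
e = lin_err(x, xp, sp)
assert abs(e - (x - xp)**2) < 1e-12             # here e = 4
assert in_eps_subdiff(sp, x, e, ys)             # s' is an e-subgradient at x
assert not in_eps_subdiff(sp, x, e - 0.1, ys)   # ... and in nothing tighter
```

The tightness comes from the contact at y = x′, where the affine minorant touches the graph of f.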

The next chapters, dealing with numerical algorithms, will use (4.2.2) intensively. We call it the transportation formula, since it "transports" at x a given subgradient at x′. Geometrically, Fig. 4.2.1 shows that it is very natural, and it reveals an important concept:

Fig. 4.2.1. Linearization errors and the transportation formula

Definition 4.2.3 (Linearization Error) For (x, x′, s′) ∈ dom f × dom f × ℝ^n, the linearization error made at x, when f is linearized at x′ with slope s′, is the number

e(x, x′, s′) := f(x) − f(x′) − ⟨s′, x − x′⟩ .

This linearization error is of particular interest when s′ ∈ ∂f(x′); then, calling

ℓ_{x′,s′}(y) := f(x′) + ⟨s′, y − x′⟩

the corresponding affine approximation of f, there hold the relations

ℓ_{x′,s′} ≤ f ,  ℓ_{x′,s′}(x′) = f(x′) ,  ℓ_{x′,s′}(x) = f(x) − e(x, x′, s′) .

The definition of s′ ∈ ∂f(x′) via the conjugate function can also be used:

e(x, x′, s′) = f(x) + f*(s′) − ⟨s′, x⟩ ≥ 0 if s′ ∈ ∂f(x′) .  □

The content of the transportation formula (4.2.2), illustrated by Fig. 4.2.1, is that any s′ ∈ ∂f(x′) is an e(x, x′, s′)-subgradient of f at x; and also that it is not in a tighter approximate subdifferential, i.e.

ε < e(x, x′, s′)  ⟹  s′ ∉ ∂_ε f(x) .

This latter property relies on the contact (at x′) between gr ℓ_{x′,s′} and gr f. In fact, a result slightly different from Proposition 4.2.2 is:

Proposition 4.2.4 Let s′ ∈ ∂_η f(x′). Then s′ ∈ ∂_ε f(x) if

f(x′) ≥ f(x) + ⟨s′, x′ − x⟩ − ε + η ,

or equivalently e(x, x′, s′) + η ≤ ε.

PROOF. Proceed as for the "if"-part of Proposition 4.2.2.  □

The transportation formula gives an easy answer to the pragmatic question: "Let a subgradient be computed at some point by the black box (U1); is it an approximate subgradient at some other point?" Answer: just compare f- and ℓ-values. Returning to a more theoretical framework, we now ask: given x and ε, what are those x′ such that (4.2.2) holds? This question is ambiguous when ∂f(x′) is not a singleton, so we define two sets:

V_ε(x) := {x′ ∈ dom ∂f : ∂f(x′) ⊂ ∂_ε f(x)} ,
Ṽ_ε(x) := {x′ ∈ ℝ^n : ∂f(x′) ∩ ∂_ε f(x) ≠ ∅} .

Equivalent definitions are

V_ε(x) := {x′ ∈ dom ∂f : e(x, x′, s′) ≤ ε for all s′ ∈ ∂f(x′)} ,
Ṽ_ε(x) := {x′ ∈ ℝ^n : e(x, x′, s′) ≤ ε for some s′ ∈ ∂f(x′)} ,   (4.2.3)

and it is clear that V_ε(x) ⊂ Ṽ_ε(x). If f is differentiable on ℝ^n, the two sets obviously coincide:

V_ε(x) = Ṽ_ε(x) = {x′ ∈ ℝ^n : e(x, x′, ∇f(x′)) ≤ ε} .

The next result motivates our notation.

Proposition 4.2.5 Suppose that x ∈ int dom f. Then

(i) V_ε(x) is a neighborhood of x if ε > 0.
(ii) Ṽ_ε(x) is the closure of V_ε(x).


PROOF. [(i)] Apply Proposition 4.1.2: there exist δ > 0 and a constant K such that, for all x′ ∈ B(x, δ) and s′ ∈ ∂f(x′),

e(x, x′, s′) ≤ |f(x′) − f(x)| + ‖s′‖ ‖x′ − x‖ ≤ 2K‖x′ − x‖ ;

so V_ε(x) contains the ball of center x and radius min{δ, ε/(2K)}.

[(ii)] Take a point x′ ∈ Ṽ_ε(x). We claim that

∂f(x + t(x′ − x)) ⊂ ∂_ε f(x) for all t ∈ ]0, 1[ ;   (4.2.4)

see Fig. 4.2.1: we insert a point between x and x′. A consequence of (4.2.4) will be Ṽ_ε(x) ⊂ cl V_ε(x) (let t ↑ 1).

To prove our claim, set d := x′ − x and use the function r = r₀ of (2.1.3) to realize with Remark 2.2.4 that

{e(x, x + td, s) : s ∈ ∂f(x + td)} = −∂r(1/t) for all t > 0 .

By definition of Ṽ_ε(x), we can pick s′ ∈ ∂f(x′) such that

−∂r(1) ∋ e(x, x′, s′) ≤ ε .   (4.2.5)

Then we take arbitrary 1/t = u > 1 and s″ ∈ ∂f(x + td), so that −e(x, x + td, s″) ∈ ∂r(u). The monotonicity property (VI.6.1.1) or (I.4.2.1) of the subdifferential ∂r gives

[−e(x, x + td, s″) + e(x, x′, s′)](u − 1) ≥ 0 ,

hence e(x, x + td, s″) ≤ e(x, x′, s′) ≤ ε, which proves (4.2.4).

There remains to prove that Ṽ_ε(x) is closed, so let {x_k} be a sequence of Ṽ_ε(x) converging to some x′. To each x_k we associate s_k ∈ ∂f(x_k) ∩ ∂_ε f(x); {s_k} is bounded by virtue of Theorem 1.1.4: extracting a subsequence if necessary, we may assume that {s_k} has a limit s′. Then, Proposition 4.1.1 and Theorem 1.1.4 show that s′ ∈ ∂f(x′) ∩ ∂_ε f(x), which means that x′ ∈ Ṽ_ε(x).  □

From a practical point of view, the set V_ε(x) and the property (i) above are both important: provided that y is close enough to x, the s(y) ∈ ∂f(y) computed by the black box (U1) is guaranteed to be an ε-subgradient at x. Now, V_ε(x) and Ṽ_ε(x) differ very little (they are numerically indistinguishable), and the latter has a strong intrinsic value: by definition, x′ ∈ Ṽ_ε(x) if and only if

∃ s′ ∈ ∂_ε f(x) such that s′ ∈ ∂f(x′), i.e. such that x′ ∈ ∂f*(s′) .

In the language of multifunctions, this can be written

Ṽ_ε(x) = ∂f*(∂_ε f(x)) .   (4.2.6)

Furthermore the above proof, especially (4.2.5), establishes a connection between Ṽ_ε(x) and §2.2. First, Ṽ_ε(x) − {x} is star-shaped. Also, consider the intersection of Ṽ_ε(x) with a direction d issuing from x. Keeping (4.2.3) in mind, we see that this set is the closed interval where the perturbed difference quotient q_ε is decreasing. In a word, let t_ε(d) be the largest element of T_ε = T_ε(d) defined by (2.2.2), with the convention t_ε(d) = +∞ if T_ε is empty or unbounded (meaning that the approximate difference quotient is decreasing on the whole of ℝ₊). Then

Ṽ_ε(x) = {x + t d : t ∈ [0, t_ε(d)], d ∈ B(0, 1)} .   (4.2.7)

See again the geometric interpretations of §2, mainly the end of §2.3.

Our neighborhoods V_ε(x) and Ṽ_ε(x) enjoy some interesting properties if additional assumptions are made on f:

Proposition 4.2.6

(i) If f is 1-coercive, then Ṽ_ε(x) is bounded.
(ii) If f is differentiable on ℝ^n, then V_ε(x) = Ṽ_ε(x) and ∇f(V_ε(x)) ⊂ ∂_ε f(x).
(iii) If f is 1-coercive and differentiable, then ∇f(V_ε(x)) = ∂_ε f(x).

PROOF. In case (i), f* is finite everywhere, so the result follows from (4.2.6) and the local boundedness of the subdifferential mapping (Proposition VI.6.2.2).

When f is differentiable, the equality between the two neighborhoods has already been observed, and the stated inclusion comes from the very definition of V_ε(x).

Finally, let us establish the converse inclusion in case (iii): for s ∈ ∂_ε f(x), pick y ∈ ∂f*(s), i.e. s = ∇f(y). The result follows from (4.2.6): y ∈ Ṽ_ε(x) = V_ε(x).  □

Example 4.2.7 Let f be convex quadratic as in Example 1.2.2. Substitute in (1.2.5)

∇f(x) + Qy = Qx + b + Qy = Q(x + y) + b = ∇f(x + y)

so as to obtain the form

∂_ε f(x) = { ∇f(x′) : ½⟨Q(x′ − x), x′ − x⟩ ≤ ε } ,   (4.2.8)

which discloses the neighborhood

Ṽ_ε(x) = { x′ ∈ ℝ^n : ½⟨Q(x′ − x), x′ − x⟩ ≤ ε } .   (4.2.9)

When Q is invertible, we have a perfect duality correspondence: Ṽ_ε(x) [resp. ∂_ε f(x)] is the ball of radius √(2ε) associated with the metric of Q [resp. Q⁻¹], and centered at x [resp. at ∇f(x)].

Finally note that we can use a pseudo-inverse in (4.2.9):

x′ = Q⁻(Qx′ + b − b) = Q⁻[∇f(x′) − b]

and obtain the form

Ṽ_ε(x) = { Q⁻[∇f(x′) − b] : ½⟨Q(x′ − x), x′ − x⟩ ≤ ε } .

This illustrates (4.2.6), knowing that ∂f*(s) = Q⁻(s − b) + Ker Q for s − b ∈ Im Q and that x + Ker Q ⊂ Ṽ_ε(x).  □
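A sketch of (4.2.8) with an illustrative diagonal Q, so that every step stays in closed form: the gradient taken at x′ is an ε-subgradient at x exactly when x′ lies in the ellipsoid (4.2.9):

```python
# Sketch of (4.2.8) for f(y) = 0.5*<Qy, y> with an illustrative diagonal
# Q = diag(2, 1) at x = (1, 1): s = grad f(x') is an eps-subgradient at x
# exactly when 0.5*<Q(x'-x), x'-x> <= eps (closed-form computations only).

Q = (2.0, 1.0)                     # diagonal entries of Q
x = (1.0, 1.0)
eps = 0.5

def f(y):
    return 0.5 * (Q[0] * y[0]**2 + Q[1] * y[1]**2)

def grad(y):
    return (Q[0] * y[0], Q[1] * y[1])

def metric_half(xp):               # 0.5*<Q(x'-x), x'-x>
    return 0.5 * (Q[0] * (xp[0] - x[0])**2 + Q[1] * (xp[1] - x[1])**2)

def worst_gap(s):
    # min over y of f(y) - f(x) - <s, y-x> + eps; the minimum of this
    # convex quadratic is at the y solving Qy = s (closed form)
    y = (s[0] / Q[0], s[1] / Q[1])
    return f(y) - f(x) - s[0] * (y[0] - x[0]) - s[1] * (y[1] - x[1]) + eps

inside = (1.3, 1.4)                # metric_half = 0.17 <= eps
outside = (1.6, 1.8)               # metric_half = 0.68 >  eps
assert metric_half(inside) <= eps and worst_gap(grad(inside)) >= -1e-12
assert metric_half(outside) > eps and worst_gap(grad(outside)) < 0
```

Here worst_gap(grad(x′)) equals ε − ½⟨Q(x′ − x), x′ − x⟩ exactly, which is the duality between the two ellipsoids of the example.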


A final remark: we have seen in §II.2 the importance of defining a neighborhood of a given iterate x = x_k, to construct suitable directions of search. For the sake of efficiency, this neighborhood had to reflect the behaviour of f near x. Here, Ṽ_ε(x) is a possible candidate: interpret the constraint in (II.2.3.1) as defining a neighborhood to be compared with (4.2.9). However, it is only because ∇f and ∇f* are affine mappings that Ṽ_ε(x) has such a nice expression in the quadratic case. For a general f, Ṽ_ε(x) is hardly acceptable; (4.2.7) shows that it need not be bounded, and (4.2.6) suggests that it need not be convex, being the image of a convex set by a mapping which is rather complicated.

Example 4.2.8 Let f be defined by

f(x) = max{0, ξ² + η² − 1} for all x = (ξ, η) ∈ ℝ² .

Consider x = (1, 1), at which f(x) = 1 and ∇f(x) = (2, 2), and take ε = 1/2. Look at Fig. 4.2.2 to see how Ṽ_ε(x) is constructed: if f were the quadratic function ξ² + η² − 1, we would obtain the ball

D = {(ξ, η) : (ξ − 1)² + (η − 1)² ≤ 1/2}

(see Example 4.2.7). However, f is minimal on the unit ball B(0, 1), i.e. 0 ∈ ∂f(x′) for all x′ ∈ B(0, 1); furthermore, 0 ∉ ∂_{1/2} f(x). In view of its definition (4.2.3), V_ε(x) therefore does not meet B(0, 1):

V_{1/2}(x) ⊂ D \ B(0, 1) .

On the other hand, it suffices to remove B(0, 1) from D, and this can be seen from the definition (4.2.3) of Ṽ_ε(x): the linearization error e(x, ·, ·) is left unperturbed by the max-operation defining f. In summary:

Ṽ_ε(x) = {(ξ, η) : (ξ − 1)² + (η − 1)² ≤ 1/2, ξ² + η² ≥ 1} .

We thus obtain a nonconvex neighborhood; observe its star-shaped character.  □

Fig. 4.2.2. A nonconvex neighborhood Ṽ_ε(x)

By contrast, another interesting neighborhood is indeed convex: it is obtained by inverting the role of x and x′ in (4.2.3). For x ∈ int dom f, define the set illustrated in Fig. 4.2.3:

V_ε*(x) := {x′ ∈ ℝ^n : e(x′, x, s) ≤ ε for all s ∈ ∂f(x)} .

Another expression is

V_ε*(x) = ∩_{s ∈ ∂f(x)} {x′ ∈ ℝ^n : f(x′) − ⟨s, x′ − x⟩ ≤ f(x) + ε} ,

which shows that V_ε*(x) is closed and convex. Reproduce the proof of Proposition 4.2.5(i) to see that V_ε*(x) is a neighborhood of x. For the quadratic function of Example 4.2.7, V_ε*(x) and V_ε(x) coincide.

Fig. 4.2.3. A second-order neighborhood