
Adaptive Frame Based Regularization Methods

for Linear Ill-Posed Inverse Problems

by Mariya Zhariy

Dissertation

submitted for the award of the academic degree Doktorin der Naturwissenschaften

– Dr. rer. nat. –

Submitted to Fachbereich III of the Universität Bremen in October 2008

Advisors: Prof. Dr. Gerd Teschke and Prof. Dr. Ronny Ramlau

Abstract

This thesis is concerned with the development and analysis of adaptive regularization methods for solving linear ill-posed inverse problems. Based on nonlinear approximation theory, the adaptivity concept has become popular in the field of well-posed problems, especially in the solution of elliptic PDEs. Under certain conditions on the smoothness of the solution and the compressibility of the operator it has been shown that nonlinear approximation guarantees a more efficient approximation with respect to the sparsity of the solution and to the computational effort.

In the area of inverse problems, the sparse approximation approach has been applied in solving de-noising and de-blurring problems as well as in general regularization. An essential cost reduction has been achieved by newly developed strategies like domain decomposition and specific projection methods. However, the option of adaptive operator application, leading simultaneously to cost reduction and sparse approximation, has not been taken into account yet.

In our research we combine the advantages of nonlinear approximation with the classical regularization and parameter identification strategies, like the Tikhonov and Landweber methods. We modify the classical approach for solving ill-posed inverse problems in order to obtain a sparse approximation with essentially reduced numerical effort. The main novelty of the developed regularization methods is the adaptive operator approximation.

In general, we have shown that it is possible to construct adaptive regularization methods which yield the same convergence rates as the conventional regularization methods, but are much more efficient with respect to the numerical costs.

The analytic results of this work have been confirmed and complemented by numerical experiments that illustrate the sparsity of the obtained solution and the desired convergence rates.

Zusammenfassung

The main objective of this work is the development of adaptive regularization methods for the solution of linear ill-posed inverse problems. Based on nonlinear approximation, adaptive methods have gained popularity in recent years, in particular in the field of well-posed problems. Under certain assumptions on the smoothness of the solution and the compressibility of the operator, it has been shown that the solution of a well-posed problem can be approximated by a sparse representation with linear complexity.

Recently, approaches based on sparse representations have been applied successfully several times in the field of ill-posed problems, for instance for improving image quality by denoising and deblurring. The numerical methods developed subsequently, such as domain decomposition algorithms or certain projection strategies, reduce the number of iteration steps, but do not take the possibility of an adaptive application into account.

The adaptive regularization methods presented in this work are based on the classical methods for solving ill-posed problems, such as the Landweber iteration and Tikhonov regularization, together with the corresponding rules for choosing the optimal regularization parameter. The classical methods are modified with the goal of approximating the solution of the ill-posed problem in an adaptive way while simultaneously reducing the computational cost. Sparse reconstructions arise naturally in this process.

The theoretical results of this work have been verified by means of a numerical example, the reconstruction of tomography data. We were able to reduce the computational effort while retaining the convergence rates of the non-adaptive methods.

Contents

I Wavelet Bases And Frames For Operator Equations

1 Wavelet Bases
1.1 Multiresolution and Wavelet Riesz Bases
1.2 Riesz Basis Property and Biorthogonality
1.3 Characterization of Function Spaces
1.3.1 Function Spaces
1.3.2 Direct and Inverse Estimates
1.4 Wavelets on Bounded Domains

2 Frames
2.1 Basic Frame Theory
2.2 Gelfand Frames
2.3 Localization of Frames
2.4 Frame Based Operator Discretization

II Inverse Problems

3 Basic Definitions
3.1 Ill-Posedness and Generalized Inverse
3.2 Regularization and Order Optimality
3.3 Compact Linear Operators

4 Filter Based Formulation
4.1 Regularization Property and Parameter Choice Rules
4.1.1 A-priori Parameter Choice
4.1.2 Morozov's Discrepancy Principle
4.1.3 Balancing Principle
4.2 Regularization in Hilbert Scales
4.2.1 Wavelet Based Regularization
4.3 Tikhonov Regularization
4.3.1 Regularization Result
4.3.2 Tikhonov Regularization with an A-Priori Parameter Choice
4.3.3 Tikhonov Regularization with Morozov's Discrepancy Principle
4.3.4 Tikhonov Regularization with Balancing Principle
4.3.5 Tikhonov Regularization in Hilbert Scales
4.4 Landweber Iteration
4.4.1 Regularization Result
4.4.2 A-priori Truncation Rule
4.4.3 Morozov's Discrepancy Principle
4.4.4 Landweber Iteration in Hilbert Scales

III Adaptivity Issues

5 Nonlinear Approximation and Adaptivity
5.1 Nonlinear Approximation
5.2 Best N-term Approximation in ℓ2

6 Adaptive Iterative Scheme for Well-Posed Problems
6.1 Frame Based Setting
6.2 Adaptive Discretization of the Operator Equation
6.3 Frame Based Adaptive Solver

IV Adaptive Regularization Methods

7 Approximate Filter Based Methods
7.1 Approximate Filter Based Regularization
7.1.1 Approximate Wavelet Based Regularization
7.2 Approximate Morozov's Discrepancy Principle
7.3 Order Optimality by Approximate Balancing Principle

8 Adaptive Tikhonov Regularization
8.1 Adaptive Tikhonov Regularization: Formulation
8.1.1 Regularization Result
8.2 Order Optimality by A-priori Parameter Choice
8.2.1 Standard Filter Based Method
8.2.2 A-priori Error Estimate in Hilbert Scales
8.3 Complexity of Adaptive Tikhonov Regularization
8.4 Approximate Application of Balancing Principle

9 Adaptive Landweber Iteration
9.1 Modified Landweber Iteration
9.2 Adaptive Formulation of Landweber Iteration
9.3 Order Optimality by A-Priori Parameter Choice
9.4 Regularization by A-Posteriori Parameter Choice
9.4.1 Adaptive Residual Discrepancy
9.4.2 A Monotonicity of Adaptive Iteration
9.4.3 Convergence in Exact Data Case
9.4.4 Regularization Result

10 Example Problem
10.1 Linear Radon Transform
10.2 Regularity of Solution
10.3 Requirements on the System
10.3.1 Requirements Imposed by the Regularity of the Solution
10.3.2 Requirements Imposed by the Operator Discretization
10.4 Numerics
10.4.1 Adaptive Discretization

11 Summary and Outlook

Introduction

Adaptivity is a promising concept in signal representation whenever a structure or a signal under consideration is in some sense sparse, i.e. it can be approximated by only a few components in terms of some basis or frame. The two main questions in sparse approximation are how to find an appropriate decomposing system and which components to use.

In this thesis we apply the concept of nonlinear approximation [28] in terms of wavelets, which has its origin in the multiresolution analysis [20, 40]. The multiscale decomposition benefits from the simultaneous representation of the signal components on different resolution levels, which in many relevant cases results in a more efficient approximation than the representation on the finest resolution level. However, given a multilevel decomposition, there are different ways to approximate the decomposed signal from its building components. The efficiency of approximation can be considerably improved if one considers the significant components of the signal instead of working with linear subspaces built by truncation in scale. Unlike the level-by-level approximation, which is linear, this best N-term approximation is nonlinear. Nevertheless, for certain classes of smooth functions the best N-term approximation can be computed efficiently.
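To fix ideas, the following minimal Python sketch (added for illustration; the coefficient vector and the value of N are arbitrary placeholders, and this is not the adaptive scheme developed later in the thesis) retains only the N largest entries of a given coefficient vector, which is exactly the best N-term approximation in the ℓ2 sense.

import numpy as np

def best_n_term(c, N):
    # Keep the N largest (in magnitude) entries of the coefficient vector c
    # and set all other entries to zero.
    c = np.asarray(c, dtype=float)
    approx = np.zeros_like(c)
    if N > 0:
        idx = np.argsort(np.abs(c))[-N:]   # indices of the N largest entries
        approx[idx] = c[idx]
    return approx

# Example: a vector with only three significant coefficients.
c = np.array([2.0, -0.01, 0.0, 1.5, 0.02, -3.0, 0.005])
print(best_n_term(c, 3))   # [ 2.   0.   0.   1.5  0.  -3.   0. ]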

There are several research areas which are based on sparse representation, among them de-blurring and de-noising of sparse signals, see e.g. [2, 24]. In our case the signal is not given directly, but is transformed by means of some linear operator and has to be reconstructed from its image, which in practice can only be obtained (measured) approximately. The process of solving operator equations depends considerably on the invertibility of the forward operator. If the operator is not continuously invertible, it is in general not possible to reconstruct the solution from the perturbed data with satisfactory accuracy. In this case one deals with so-called ill-posed inverse problems, cf. [32, 39], which will be one of the main subjects of this work. A common approach to solving ill-posed inverse problems consists in a regularization of the inverse operator and a subsequent identification of the optimal regularization parameter, which provides the best possible order of convergence.

Nonlinear approximation by means of wavelets has been widely exploited in solving well-posed problems, in particular elliptic PDEs [10, 11, 58, 14, 15]. There, iterative reconstruction algorithms have been constructed which are essentially based on the adaptive application of the forward operator and on adaptive thresholding. It has been shown that in such a way it is possible to approximate the solution with order optimal computational complexity. In general, this means that the number of computational steps (as well as of computed components) is of the same order as N, where N is the number of significant coefficients needed for the approximation with the prescribed accuracy. One of the crucial conditions needed for the convergence of the mentioned adaptive algorithms is the continuous invertibility of the operator, which fails to hold in the ill-posed case.

In order to reconstruct sparse solutions for ill-posed problems, regularization methods with sparsity constraints have been introduced in [2, 6, 21, 23, 24, 25, 26, 54, 61], where convergent reconstruction methods favoring sparse approximations have been developed. If a proper basis or frame is chosen, then the use of a sparsity constraint makes it possible to capture local morphological properties of the solution, e.g. discontinuities, which are represented with only a few coefficients, leading to very efficient expansions. In the past this could only be achieved at a high computational complexity combined with a slow convergence speed of the employed iteration method. Currently, there are several accelerated iteration methods (or so-called compressed algorithms) on the market that yield reasonable results with far fewer iterations, for instance through domain decomposition strategies, see [33], or through accelerated projection strategies, see [22]. These methods, however, act up to the domain decomposition or projection step on all involved frame functions in a uniform way (even on those atoms that are not at all involved in representing the solution).

In our work, we combine the adaptive routines developed for well-posed problems with the basic regularization strategies, such as Tikhonov regularization [62, 63, 50] and truncated Landweber iteration [37]. As parameter identification rules we take the balancing principle, first introduced in [64, 38], and Morozov's discrepancy rule [44]. Due to the adaptive application of the operator and the adaptive thresholding, the iterative regularization methods can only be evaluated approximately. In this context, we pursue the ideas of [48, 52, 53] about iterative regularization methods which take perturbations of the forward operator into account. However, in these methods the modeling error is measured in the operator norm, which is a stronger requirement than can be realized by the adaptive routines, which provide just a pointwise approximation.

In our methods we incorporate the pointwise error caused by the adaptive approximation into the basic regularization strategies. The first adaptive approach presented here, the adaptive Tikhonov regularization, is a direct application of the existing adaptive algorithms [10, 11, 58, 14, 15] to the well-posed regularized normal equation which results from the minimization of the Tikhonov functional. We introduce and compare two adaptively applied parameter identification rules, based on Morozov's discrepancy principle and on the balancing rule, and deduce order optimal convergence rates. It has turned out that the adaptive balancing principle, essentially based on [49], is more suitable for the adaptive Tikhonov method than the adaptive version of the discrepancy rule.

In the second approach, we adaptively modify the classical Landweber iteration, which results from the minimization of the least-squares residual. For ill-posed operators, the iteration is in general not convergent. As in the classical case [27], the regularization follows from Morozov's discrepancy rule, which can also be formulated adaptively. The main inspiration for the modified Landweber iteration comes from [52, 53]. We show the convergence of the adaptive Landweber method and compare its computational complexity with that of the original one.
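For orientation, the following Python sketch shows the classical, non-adaptive truncated Landweber iteration with the discrepancy principle as stopping rule (an added illustration under simplifying assumptions: the matrix L, the data g_delta, the noise level delta, the step size beta and the safety factor tau are placeholders; the adaptive variant of Chapter 9 additionally replaces the exact applications of L and its adjoint by approximate, adaptive ones).

import numpy as np

def landweber(L, g_delta, delta, beta=None, tau=2.0, max_iter=10000):
    # Truncated Landweber iteration f_{k+1} = f_k + beta * L^T (g_delta - L f_k),
    # stopped by Morozov's discrepancy principle ||g_delta - L f_k|| <= tau * delta.
    if beta is None:
        beta = 1.0 / np.linalg.norm(L, 2) ** 2   # guarantees a convergent step size
    f = np.zeros(L.shape[1])
    for k in range(max_iter):
        residual = g_delta - L @ f
        if np.linalg.norm(residual) <= tau * delta:
            break
        f = f + beta * (L.T @ residual)
    return f, k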

A closely related approach to regularization affected by pointwise approximation errors has been developed recently in [5]. There, the ℓp-regularization from [21] has been re-investigated with the aim of introducing a pointwise error at the operator evaluation step, in order to be able to substitute the full operator application by the adaptive one. However, the allowed pointwise error in [5] is limited by the order of the data noise. This means that in the case of exact data no adaptivity is possible. At this point our scheme differs from the one proposed in [5]. Generally speaking, the adaptive Landweber iteration starts with some coarse pointwise accuracy, which does not depend on the data noise, and becomes finer during the iteration.

We want to stress that all the methods considered here can be realized by wavelet bases as well as by wavelet frames. As over-complete systems, frames offer even more opportunities for a sparse approximation, which might benefit from the combination of the approximation properties of the single bases [61]. Another practical advantage of using frames is that they provide a simple approach for the discretization on polygonal domains, cf. [51, 58, 14] etc. However, a comparison between bases and frames under the adaptivity aspect is not the subject of this research.

The thesis is organized as follows. In Part I we give an introduction into the multiresolution analysis and the decomposition of a signal with respect to bases and frames. We point out the specific requirements imposed by the adaptive approximation and select some construction strategies for the analyzing systems. In Part II we recall the basic definitions and solution strategies for ill-posed inverse problems, cf. Chapter 3. In Chapter 4 we introduce the so-called filter based regularization, which provides a generalization of the different methods, like Tikhonov or Landweber regularization. In this chapter we discuss the convergence and order optimality of the classical filter based methods, equipped with the classical parameter choice rules. In Part III we discuss the issues of nonlinear approximation, which gives rise to the adaptivity concept. As an application we recall the adaptive approach for solving well-posed problems in Chapter 6. Part IV is concerned with adaptive regularization methods, which form the main subject of this work. First, we give a general formulation of approximate filter based methods in Chapter 7 and guide through a general strategy of proving their convergence and order optimality. However, every single regularization method exhibits its own specifics. For instance, the adaptive Tikhonov method, cf. Chapter 8, works better with the approximate version of the balancing principle, see Section 7.3. On the other hand, a convergent adaptive version of the Landweber method, introduced in Chapter 9, can be achieved with the discrepancy principle and not with the balancing principle. Of course, this is only the current state of the art. We still do not have any theoretical estimate of the convergence rates for the adaptive Landweber iteration, while numerical examples show the same order of convergence as in the non-adaptive case. For the verification of the developed numerical scheme we consider the inversion of the Radon operator, which arises in computerized tomography, see Chapter 10. It is a widely known fact that the Radon operator is compact and, consequently, its inversion is ill-posed. In Section 10.4 we compare the results of the adaptive reconstruction made by the adaptive Landweber iteration with the full one, in order to show the reduction of computational costs.

This thesis is the result of more than three years of research at the University of Bremen, the Konrad Zuse Zentrum Berlin and the Johann Radon Institute in Linz. First of all, I want to thank my advisors Gerd Teschke and Ronny Ramlau for their guidance and patience. Many thanks go to my colleagues for their helpful advice, in mathematical questions especially to Thorsten Raasch, Sergei Pereversyev Sr and Massimo Fornasier, and for their encouragement, among others to Esther Klann, Lutz Justen, Jenny Niebsch, Philipp Schroder, Stefanie Schwaar, Lin He, Hui Cao, Shuai Lu, Mourad Sini, Nataliya Metla, Hanna Pikkarainen, Sergey Pereverzyev Jr, Daniel Wachsmuth, Clemens Zarzer, Kattrin Arning, Stephan Anzengruber, Stefan Kinderman and Stefan Muller. Furthermore, I would like to thank my friends who accompanied me during the years of the PhD, especially Ursula Splettstosser, Simon Loer, Alessandro Zaupa and Tarik Hofmann, my capoeira colleagues and my family, in particular my mother for the emotional backup. Throughout three years of my work on this thesis I received financial support from Deutsche Forschungsgemeinschaft Grant TE 354/3-1, and for eight months I was funded by the Johann Radon Institute in Linz.


Part I

Wavelet Bases And Frames For

Operator Equations


Chapter 1

Wavelet Bases

Multiresolution analysis is a powerful tool in signal processing which enables the decomposition of a signal into spatially localized smooth components on different scales. Especially functions with a certain regularity can be efficiently decomposed in terms of wavelets. By efficiency we mean that smooth functions can be approximated with high accuracy using only a few wavelet components. The multilevel approximation can be carried out in different ways. It turns out that the approximation by significant components, which is in general nonlinear, is in some cases more economical than the linear scale-by-scale approximation, cf. Chapter 5. There, the advantage of the nonlinear approximation is exploited, which gives rise to the notion of adaptivity.

In Section 1.1 we define the term multiresolution analysis using the Riesz basis concept for a general Hilbert space.

In Section 1.2 we deduce biorthogonality as a necessary condition for the Riesz property. Further on, we formulate sufficient conditions for the multiresolution analysis, such as the direct and the inverse estimates, which reflect accuracy and regularity properties of the sequence of multiresolution subspaces. Biorthogonality enables the construction of wavelet Riesz bases whose regularity and approximation properties are independent of each other. In the orthogonal setting such constructions are in general not possible. As we will see in Section 2.4, these specific requirements on the multiresolution analysis are of importance for the compressed operator representation.

In Section 1.3 we recall the definition of some important classes of smooth functions, such as Sobolev and Besov spaces, and their characterizations in terms of wavelets. The resulting norm equivalences are of particular interest regarding the adaptive approach, cf. Chapter 5.

In Section 1.4 we give a brief overview of the biorthogonal wavelet Riesz bases on the unit cube Ω = [0, 1]^d in ℝ^d, d being the spatial dimension, which satisfy desirable accuracy and regularity requirements and enable appropriate characterizations of the mentioned smoothness spaces.

1.1 Multiresolution and Wavelet Riesz Bases

The notion of multiresolution analysis has its origin in the shift-invariant setting [19, 41, 43], where the construction of a wavelet Riesz basis is made for L2(ℝ). Most applications, however, are restricted to bounded domains Ω in ℝ^d, so that a suitable generalization of the term multiresolution is required. In this section we resume the introduction to wavelet Riesz bases as proposed in [17, 51] for the adaptive approximation of well-posed operator equations.

Let H be a separable Hilbert space with inner product 〈·, ·〉_H and associated norm ‖·‖_H. Let Δ be some countable index set. A countable collection of functions Φ = {φ_λ}_{λ∈Δ} ⊂ H is called stable, if

‖c‖_{ℓ2(Δ)} ∼ ‖c^T Φ‖_H,   c ∈ ℓ2(Δ),    (1.1)

where by a ∼ b we mean that each of the terms a, b can be bounded by a positive constant multiple of the other. Note that stability (1.1) of the system Φ implies its non-redundancy; another necessary condition for a basis is completeness of Φ with respect to H.

Definition 1.1. A system Ψ = {ψ_λ : λ ∈ ∇} ⊂ H with a countable index set ∇ is called a Riesz basis for H, if any f ∈ H has a unique expansion c = (c_λ)_{λ∈∇} with respect to the family Ψ, i.e.

f = c^T Ψ = ∑_{λ∈∇} c_λ ψ_λ,    (1.2)

such that for some 0 < c_Ψ ≤ C_Ψ < ∞ the norm equivalence

c_Ψ ‖c‖_{ℓ2(∇)} ≤ ‖f‖_H ≤ C_Ψ ‖c‖_{ℓ2(∇)}    (1.3)

holds for any f ∈ H. The condition (1.3) is also called the Riesz stability condition.

Remark 1.2. The Riesz stability property (1.3) provides a characterization of the Hilbert space H by means of the wavelet coefficients. Such a characterization might be useful since it reduces the estimates of the norm in the function space H to norm estimates in the sequence space ℓ2.
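As a simple illustration (an example added here for concreteness, not taken from the text): for an orthonormal wavelet basis, e.g. the Haar basis of H = L2([0, 1]), Parseval's identity gives

‖f‖²_{L2([0,1])} = ∑_{λ∈∇} |〈f, ψ_λ〉|²,

so (1.3) holds with c_Ψ = C_Ψ = 1; biorthogonal spline wavelet bases satisfy (1.3) only with nontrivial constants c_Ψ < C_Ψ, and the sequence norm then controls the function norm only up to these constants.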

Definition 1.3. A multiresolution sequence V = (V_j)_{j≥j0} consists of nested closed subspaces V_j ⊂ H whose union is dense in H, i.e.

V_j ⊂ V_{j+1},   \overline{⋃_{j≥j0} V_j} = H.    (1.4)

Here j0 ∈ ℤ is some coarsest level.

Define for any countable set Φ ⊂ H its closed linear span

S(Φ) := \overline{span{φ ∈ Φ}} (closure in H);    (1.5)

the spaces {V_j}_{j≥j0} in (1.4) are usually of the form

V_j := S(Φ_j),   Φ_j = {φ_λ : λ = (j, k), k ∈ Δ_j},    (1.6)

for some (possibly infinite) countable index set Δ_j. The multiindex notation λ = (j, k) encodes the scale information j, usually denoted by |λ| := j, and the spatial location k. From the nestedness (1.4) of the V_j it follows that every φ_{j,k} ∈ S(Φ_j) possesses an expansion

φ_{j,k} = ∑_{l∈Δ_{j+1}} m^j_{l,k} φ_{j+1,l}    (1.7)

with a mask or filter sequence m^j_k = {m^j_{l,k}}_{l∈Δ_{j+1}}. The expression (1.7) is usually called the two-scale relation.
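As a standard worked example (added for illustration and not part of the original text): for the Haar multiresolution analysis of L2([0, 1]) with normalized generators φ_{j,k} = 2^{j/2} χ_{[2^{−j}k, 2^{−j}(k+1))}, each generator is a combination of its two children on the next finer scale,

φ_{j,k} = 2^{−1/2} φ_{j+1,2k} + 2^{−1/2} φ_{j+1,2k+1},

so the two-scale relation (1.7) holds with the mask m^j_{l,k} = 2^{−1/2} for l ∈ {2k, 2k+1} and m^j_{l,k} = 0 otherwise.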

Definition 1.4. A family {Φ_j}_{j≥j0} is called uniformly stable, if the stability (1.1) of the sets Φ_j holds uniformly in j, i.e.

c ‖c‖_{ℓ2(Δ_j)} ≤ ‖c^T Φ_j‖_H ≤ C ‖c‖_{ℓ2(Δ_j)},   c ∈ ℓ2(Δ_j),  j ≥ j0,    (1.8)

with constants 0 < c, C < ∞ independent of j.


Based on the multiresolution property (1.4) of the sequence V, one defines complements W_j between two successive spaces V_j, V_{j+1}, such that

V_{j+1} = V_j ⊕ W_j,    (1.9)

and looks for collections Ψ_j = {ψ_{j,k} : k ∈ ∇_j} which span the complement subspaces W_j, i.e. W_j = S(Ψ_j). Repeating the decomposition (1.9), one can write each space V_j = S(Φ_j) as a sum of complement spaces

V_j = S(Φ_j) = V_{j0} ⊕ ⊕_{j0≤i<j} W_i = S(Φ_{j0}) ⊕ ⊕_{j0≤i<j} S(Ψ_i).    (1.10)

Accordingly, any element g ∈ S(Φ_j) can be expressed in a single-scale form with respect to Φ_j as well as in a multiscale form with respect to a truncated wavelet basis

Ψ^j := Φ_{j0} ∪ ⋃_{j0≤i<j} Ψ_i.    (1.11)

Hence, by denseness (1.4) of V, the union

Ψ := Φ_{j0} ∪ ⋃_{j≥j0} Ψ_j =: {ψ_λ : λ ∈ ∇}    (1.12)

is a candidate for a Riesz basis for the whole space H.

Definition 1.5. If the collection (1.12) forms a Riesz basis for H, then we call the sequence V = {V_j}_{j≥j0}, defined by (1.6), a multiresolution analysis for H.

1.2 Riesz Basis Property and Biorthogonality

Due to the estimate (1.3), the coefficient functionals c_λ = c_λ(f) in the expansion (1.2) are bounded on H. Hence, by the Riesz representation theorem [1] for linear bounded functionals on a Hilbert space, there exists a unique family of dual functionals Ψ̃ = {ψ̃_λ} ⊂ H such that c_λ = c_λ(f) = 〈f, ψ̃_λ〉_H. As a consequence of duality, the sets Ψ and Ψ̃ are biorthogonal, i.e., the relation

〈Ψ, Ψ̃〉_H = I    (1.13)

holds, and Ψ̃ is also a Riesz basis for H, with Riesz constants c_Ψ̃ = C_Ψ^{−1} and C_Ψ̃ = c_Ψ^{−1}. As we see, biorthogonality is a necessary condition for the Riesz basis property to hold; nevertheless, there is no equivalence. In the following theorem we recall the requirements on the multiresolution sequence V for H which assure the so-called basis-free Riesz stability and imply (1.3), if bases are considered.

Theorem 1.6 ([16, Thm. 3.2]). Let S = (S_j)_{j≥j0} be a multiresolution sequence for H, i.e. satisfying (1.4). Assume that Q = (Q_j)_{j≥j0} is a sequence of uniformly bounded projectors Q_j : H → S_j, such that

Q_l Q_j = Q_l,   l ≤ j.    (1.14)

Let S̃ = (S̃_j)_{j≥j0} be the ranges of the sequence of adjoints Q′ = (Q′_j)_{j≥j0}. Moreover, suppose that there exists a family of uniformly bounded subadditive functionals ω(·, t) : H → ℝ₀⁺, t > 0, such that lim_{t→0⁺} ω(f, t) = 0 for each f ∈ H, and some γ_V > 0, such that the pair of estimates

inf_{v∈V_j} ‖f − v‖_H ≲ ω(f, 2^{−j}),   f ∈ H,    (1.15)

and

ω(v_j, t) ≲ (min{1, t 2^j})^{γ_V} ‖v_j‖_H,   v_j ∈ V_j,    (1.16)

holds for both sequences V ∈ {S, S̃}. Then we have the norm equivalence

‖·‖_H ∼ N_Q(·) ∼ N_{Q′}(·),    (1.17)

where

N_Q(v) := ( ∑_{j≥j0} ‖(Q_j − Q_{j−1}) v‖²_H )^{1/2},   v ∈ H,    (1.18)

and

N_{Q′}(v) := ( ∑_{j≥j0} ‖(Q′_j − Q′_{j−1}) v‖²_H )^{1/2},   v ∈ H,    (1.19)

with Q_{j0−1} := Q′_{j0−1} := 0.

Remark 1.7. Estimates of the kind (1.15) are usually called direct or Jackson type estimates with respect to the modulus of smoothness ω. They measure the approximation power of the underlying nested sequence of spaces V_j as j → ∞. Conversely, estimates like (1.16) describe smoothness properties of the spaces V_j and are called inverse or Bernstein type estimates.

Remark 1.8. Given some biorthogonal wavelet system {Ψ, Ψ̃}, the original Riesz condition (1.3) follows from Theorem 1.6 by the specific choice

Q_j v := 〈v, Ψ̃^j〉 Ψ^j,   j ≥ j0,    (1.20)

and

Q′_j v := 〈v, Ψ^j〉 Ψ̃^j,   j ≥ j0,    (1.21)

where the truncated multiscale bases Ψ^j, Ψ̃^j are defined as in (1.11), (1.13). Note that the operator Q_j defined by (1.20) is indeed a projector by the biorthogonality relation (1.13), and the idempotence condition (1.14) is automatically fulfilled.

In a specific situation, we are therefore left to verify the sufficient conditions for the Riesz basis property as in Theorem 1.6, i.e. the uniform boundedness of the operators Q_j and the validity of the corresponding Jackson and Bernstein inequalities. In Section 1.3 we will use the estimates (1.15) and (1.16) for the characterization of further function spaces. In Section 1.4 we will discuss how this is done in the case of wavelet constructions on bounded domains.

1.3 Characterization of Function Spaces

This section contains an introduction to the smoothness spaces from the multiresolution point of view, and essentially repeats considerations made in [51]. One of the most important properties of wavelets is that they can be used to characterize function spaces such as Sobolev spaces H^s or Besov spaces B^s_q(L_p); see, for example, [30, 43]. The Riesz basis property (1.3) for H = L2 is a special case in a whole scale of such relations.

1.3.1 Function Spaces

In the following we review the definition of some classical function spaces and show how they can be characterized by weighted sequence norms of wavelet expansions.

Let Ω be some bounded Lipschitz domain in ℝ^d. For p ∈ (0, ∞], let L_p(Ω) := L_p(Ω; dx) be the space of Lebesgue-measurable functions f : Ω → ℂ such that

‖f‖_{L_p(Ω)} := ( ∫_Ω |f(x)|^p dx )^{1/p}  if p < ∞,   and   ‖f‖_{L_p(Ω)} := ess sup_{x∈Ω} |f(x)|  if p = ∞,    (1.22)

is finite. For p ≥ 1, the L_p-spaces are Banach spaces, whereas for p < 1 they are only quasi-Banach spaces, since the triangle inequality then holds only up to a constant. The most important special case is p = 2, as L2(Ω) is a Hilbert space with inner product

〈v, w〉_{L2(Ω)} := ∫_Ω v(x) w(x) dx    (1.23)

and ‖v‖²_{L2(Ω)} = 〈v, v〉_{L2(Ω)}. For a fixed domain Ω we will abbreviate 〈·, ·〉 := 〈·, ·〉_{L2(Ω)} and ‖·‖ := ‖·‖_{L2(Ω)} in the sequel.

Given a positive integer m ∈ ℕ, the Sobolev space W^m(L_p(Ω)) is defined as the space of all functions f ∈ L_p(Ω) with weak partial derivatives ∂^α f in L_p(Ω) for all multiindices α ∈ ℕ₀^d with |α| = α_1 + · · · + α_d = m and

|f|_{W^m(L_p(Ω))} := ( ∑_{|α|=m} ‖∂^α f‖^p_{L_p(Ω)} )^{1/p}.    (1.24)

Equipped with the norm ‖·‖_{W^m(L_p(Ω))} = ‖·‖_{L_p(Ω)} + |·|_{W^m(L_p(Ω))}, W^m(L_p(Ω)) is a Banach space. The most important special case is again p = 2, where we get the Hilbert spaces H^m(Ω) := W^m(L2(Ω)). For a real positive smoothness parameter s, the Sobolev spaces H^s are defined as interpolation spaces between L2 and H^m, cf. [4, 65] for details:

H^s(Ω) := [L2(Ω), H^m(Ω)]_{s/m,2},   s ∈ (0, m).    (1.25)

For s < 0, we define H^s(Ω) := (H^{−s}_0(Ω))′ by duality, where H^{−s}_0(Ω) is defined as the closure in H^{−s}(Ω) of the space C^∞_c(Ω) of infinitely differentiable compactly supported functions on Ω. We shall also be concerned with Besov spaces B^s_q(L_p(Ω)), which arise as interpolation spaces between L_p(Ω) and W^r(L_p(Ω)), cf. [28, 66], as follows:

B^s_q(L_p(Ω)) = [L_p(Ω), W^r(L_p(Ω))]_{s/r,q},   s ∈ (0, r).    (1.26)

For a more concrete definition of the corresponding Besov norm, one may use the r-th L_p-modulus of smoothness

ω_r(f, t)_{L_p(Ω)} := sup_{‖h‖≤t} ‖Δ^r_h f‖_{L_p(Ω_{rh})},   t > 0.    (1.27)

Here, Δ^r_h is the r-th forward difference operator

Δ^0_h f := f,   Δ^1_h f := f(· + h) − f,   Δ^{k+1}_h := Δ^1_h Δ^k_h,    (1.28)

and the admissible sets Ω_h are given by

Ω_h := {x ∈ Ω : x + th ∈ Ω, t ∈ [0, 1]},   h ∈ ℝ^d.    (1.29)

Then, for these parameters, one can introduce the Besov spaces as follows:

Definition 1.9. For s > 0 and p, q ∈ (0, ∞], Besov spaces are given by

B^s_q(L_p(Ω)) := {f ∈ L_p(Ω) : |f|_{B^s_q(L_p(Ω))} < ∞},    (1.30)

where, for r := ⌊s⌋ + 1,

|f|_{B^s_q(L_p(Ω))} := ( ∫_0^∞ ( t^{−s} ω_r(f, t)_{L_p(Ω)} )^q dt/t )^{1/q}  for 0 < q < ∞,   and   |f|_{B^s_∞(L_p(Ω))} := sup_{t>0} t^{−s} ω_r(f, t)_{L_p(Ω)}    (1.31)

is the Besov semi-(quasi-)norm, and ‖·‖_{B^s_q(L_p(Ω))} := ‖·‖_{L_p(Ω)} + |·|_{B^s_q(L_p(Ω))} is the corresponding Besov (quasi-)norm.

For the equivalence of Definition 1.9 and (1.26) cf. [35]. By the monotonicity of ω_r, Besov spaces can also be endowed with the equivalent seminorm [28, 67]

‖( 2^{sj} ω_r(f, 2^{−j})_{L_p(Ω)} )_{j≥0}‖_{ℓ_q(ℕ)} ∼ |f|_{B^s_q(L_p(Ω))}.    (1.32)

Note that for p = q = 2 it follows from (1.25) and (1.26) that H^s(Ω) = B^s_2(L2(Ω)). In this case, using the equivalence (1.32), one obtains the following result on the characterization of the Sobolev spaces H^s:

Theorem 1.10 ([17, Thm. 5.8]). In the situation of Theorem 1.6 for H = L2(Ω), assume that the direct estimate (1.15) holds in the form

inf_{v∈V_j} ‖f − v‖_{L2(Ω)} ≲ 2^{−sj} ‖f‖_{H^s(Ω)},   f ∈ H^s(Ω),  0 ≤ s ≤ m_V,    (1.33)

and that the inverse estimate (1.16) holds in the form

‖v‖_{H^s(Ω)} ≲ 2^{sj} ‖v‖_{L2(Ω)},   v ∈ V_j,  s < γ_V,    (1.34)

for both scales of spaces V ∈ {S, S̃}, where

0 < γ := min{γ_S, m_S}   and   0 < γ̃ := min{γ_{S̃}, m_{S̃}},    (1.35)

respectively. Then we have the norm equivalence

‖f‖_{H^s(Ω)} ∼ ( ∑_{j=0}^∞ 2^{2sj} ‖(Q_j − Q_{j−1}) f‖²_{L2(Ω)} )^{1/2},   s ∈ (−γ̃, γ).    (1.36)

Remark 1.11. Using again the projectors Q_j from (1.20), from the basis-free formulation (1.36) we obtain a characterization of the Sobolev spaces H^s(Ω) in terms of wavelet coefficients on L2(Ω):

‖f‖_{H^s(Ω)} ∼ ( ∑_{j=0}^∞ 2^{2sj} ∑_{|λ|=j} |〈f, ψ̃_λ〉|² )^{1/2},   s ∈ (−γ̃, γ),    (1.37)

which gives rise to wavelet Riesz bases for H^s(Ω). In fact, the system D^{−s}Ψ with diagonal matrix D,

(D)_{λ,λ′} := 2^{|λ|} δ_{λ,λ′},   λ, λ′ ∈ ∇,    (1.38)

is a Riesz basis in H^s(Ω), with stability estimate

‖f‖_{H^s(Ω)} ∼ ‖〈f, D^s Ψ̃〉^T‖_{ℓ2(∇)},    (1.39)

where 〈f, D^s Ψ̃〉^T are the expansion coefficients of f with respect to D^{−s}Ψ.


Remark 1.12. For a single wavelet f = ψ_λ, the norm equivalence (1.37) implies for the Sobolev norm of ψ_λ

‖ψ_λ‖_{H^s(Ω)} ∼ 2^{s|λ|} ‖ψ_λ‖_{L2(Ω)} ∼ 2^{s|λ|},   λ ∈ ∇.    (1.40)

Remark 1.13. It should be noted that the results of Theorems 1.6 and 1.10 can be generalized towards a wavelet based characterization of Besov spaces for 1 < p < ∞. Using (1.32) and a given L2-biorthogonal wavelet basis of polynomial approximation order m > s and of sufficiently high regularity, and employing the special projectors Q_j from (1.20), one obtains

‖f‖_{B^s_q(L_p(Ω))} ∼ ( ∑_{j≥j0} 2^{jq(s + d(1/2 − 1/p))} ( ∑_{|λ|=j} |〈f, ψ̃_λ〉|^p )^{q/p} )^{1/q}.    (1.41)

This equivalence can be shown to hold also for the case p, q < 1, given that B^s_q(L_p(Ω)) is embedded in L1(Ω), cf. [9], which is in particular important in connection with nonlinear approximation, cf. Section 5.1. We will be interested in characterizing the approximation spaces for the best N-term approximation, which are exactly the Besov spaces B^{sd+t}_τ(L_τ(Ω)), where τ and s are connected via τ = (s + 1/2)^{−1}, cf. Section 5.2. Inserting this special case into (1.41) yields the equivalence

‖f‖_{B^{sd+t}_τ(L_τ(Ω))} ∼ ( ∑_{j≥j0} 2^{jτt} ∑_{|λ|=j} |〈f, ψ̃_λ〉|^τ )^{1/τ} = ‖D^t 〈f, Ψ̃〉^T‖_{ℓ_τ},    (1.42)

where the right hand side is the ℓ_τ-norm of the expansion of f in terms of a Riesz basis D^{−t}Ψ in H^t.
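To make the interplay of the parameters concrete (a worked instance added here, with numbers chosen purely for illustration): in dimension d = 2 with energy space exponent t = 0, the decay rate s = 1 corresponds to τ = (1 + 1/2)^{−1} = 2/3, so the relevant approximation space in (1.42) is B^2_{2/3}(L_{2/3}(Ω)); membership of a function in this space, i.e. ℓ_{2/3}-summability of its wavelet coefficients, is what yields a best N-term approximation rate of order N^{−1} in L2(Ω), cf. Section 5.2.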

1.3.2 Direct and Inverse Estimates

As we have seen in Sections 1.1 and 1.3, for the characterization of the smoothness spaces by means of wavelets, the multiresolution sequence V must satisfy specific direct and inverse estimates. Let Ω ⊆ ℝ^d be a connected domain with sufficiently smooth boundary, such that there exists an extension operator E : L_p(Ω) → L_p(ℝ^d) that is bounded in W^m(L_p(Ω)) for any m ∈ ℕ. As we will see in Section 1.4, for our purposes it is sufficient to consider wavelet constructions on the unit interval [0, 1]. We next give a simple criterion for verifying a direct estimate that will apply in our case.

Proposition 1.14 ([17, Prop. 5.1]). Let Φ_j ⊂ L_p(Ω) and Φ̃_j ⊂ L_{p′}(Ω) with 1/p + 1/p′ = 1 (that do not necessarily generate multiresolution sequences) have the following properties:

(i) Φ_j and Φ̃_j are biorthogonal,

〈Φ_j, Φ̃_j〉_Ω = I,    (1.43)

where 〈·, ·〉_Ω denotes the dual pairing on L_p(Ω) × L_{p′}(Ω).

(ii) The elements of Φ_j and Φ̃_j are uniformly bounded, that is,

‖φ_λ‖_{L_p}, ‖φ̃_λ‖_{L_{p′}} = O(1),   λ = (j, k),  k ∈ Δ_j,  j ≥ j0.    (1.44)

(iii) The collections Φ_j, Φ̃_j are locally finite, that is, there exists a constant C < ∞ such that

#{k′ : □_{j,k} ∩ □_{j,k′} ≠ ∅} ≤ C,   diam(□_{j,k}) ≲ 2^{−j},    (1.45)

where □_{j,k} is the smallest cube containing supp(φ_{j,k}) and supp(φ̃_{j,k}).

(iv) The spaces S(Φ_j) contain all polynomials of order m (degree ≤ m − 1) on Ω,

Π_m ⊆ S(Φ_j).    (1.46)

Then one has

‖f − 〈f, Φ̃_j〉_Ω Φ_j‖_{L_p(Ω)} ≲ 2^{−mj} ‖f‖_{W^m(L_p(Ω))}.    (1.47)

Proof. The proof can be found in [17].

Remark 1.15. Analogous estimates for the Sobolev spaces H^s or for the Besov spaces B^s_p(L_q) can be derived from (1.47) by interpolation [17].

Remark 1.16. Under assumptions (1.44) and (1.45), the collections Φ_j are uniformly stable relative to ‖·‖_{L_p(Ω)} and ‖·‖_{ℓ_p(Δ_j)}. The uniform stability of Φ_j implies the uniform boundedness of the projectors Q_j, as required in Theorem 1.6.

Remark 1.17. The direct or Jackson estimates as derived in Proposition 1.14 describe the approximation property of the multiresolution sequence V. From the biorthogonality relation (1.43) and the polynomial exactness Π_m ⊆ S(Φ_j) we infer that the primal wavelets have vanishing moments of order m, i.e.

〈x^i, ψ_{j,k}〉 = 0,   0 ≤ i ≤ m − 1,  k ∈ ∇_j.    (1.48)

Analogously, the dual wavelets have vanishing moments of order m̃. An important consequence of the polynomial exactness is that inner products of ψ_{j,k} with smooth functions decay exponentially fast as the scale j = |λ| tends to infinity, i.e.

|〈f, ψ_{j,k}〉| ≲ 2^{−j(d/2 + m)} |f|_{W^m(L_∞(supp(ψ_{j,k})))},   k ∈ ∇_j.    (1.49)

The annihilation of polynomials by wavelets can be exploited in the localization theory of frames, see Section 2.3. Moreover, vanishing moments or cancellation properties play a crucial role in matrix compression and in the adaptive application of certain differential and integral operators in wavelet representation, as we shall see in Chapter 5. The inverse or Bernstein estimates (1.16) or (1.34) describe the regularity property of the trial spaces V_j. For the wavelet constructions on the interval [0, 1], the following inverse estimate is required:

‖v‖_{H^s([0,1])} ≲ 2^{sj} ‖v‖_{L2([0,1])},   v ∈ S(Φ_j),  s ≤ γ.    (1.50)

As we will see in Section 1.4, there are wavelet constructions on the interval satisfying the approximation requirements of Proposition 1.14 and regularity requirements like (1.50) simultaneously.

1.4 Wavelets on Bounded Domains

The aim of this section is to construct a wavelet Riesz basis Ψ for the energy space H^t(Ω), t ∈ ℝ, where Ω ⊂ ℝ^d is a bounded domain with sufficiently smooth boundary. For our needs it is sufficient to take the unit cube [0, 1]^d; a generalization to more complex domains can be found in [17, 51]. The tensor product approach, frequently used in multivariate wavelet construction, allows us to reduce the consideration to the univariate case, i.e. Ω = [0, 1]. For the adaptivity purposes it is important to have an analogous characterization (1.42) of specific Besov spaces, as we will see in Chapter 5.

As we have seen in the previous section, to characterize the appropriate scales of Sobolev and Besov spaces, the trial spaces V_j should exhibit certain polynomial accuracy m and regularity γ. Moreover, at least the primal wavelets Ψ should be accessible analytically, making point evaluations and quadrature possible. These requirements already exclude various classical wavelet constructions on the interval. To state some examples, periodic wavelet bases are useless for our purpose, since they do not reproduce the correct set of polynomials. In orthogonal wavelet constructions, it is not possible to increase the number of vanishing moments independently of the order of accuracy; moreover, the construction rules are often given only implicitly [13].

The interval bases we shall consider are based on the construction [18]. There, the primal generators φ_{j,k} live in a spline space of order m and coincide in the interior of the domain Ω = (0, 1) with dilates and translates of cardinal B-splines of order m. As a consequence of the embedding into spline spaces, the critical L2-Sobolev regularity γ of the primal multiresolution spaces is given by m − 1/2. It should be noted that in the numerical discretization of operator equations using wavelets, the dual wavelets are not needed explicitly. Moreover, for some fixed order m of the primal generators the dual generators can be constructed of arbitrarily high order m̃ ≥ m, with the only condition that m + m̃ be even, so that the critical L2-Sobolev regularity γ̃ is proportional to m̃ [12]; it is positive as soon as the dual generators are L2-integrable [68]. Therefore, we need no exact knowledge about the accuracy and regularity of the duals, since both can be chosen arbitrarily large.

For the adaptivity purposes, cf. Chapter 5, Theorem 5.5, the wavelets Ψ have to be chosen so that they build a Riesz basis of some subspace H of the Sobolev space H^t(Ω) and enable the characterization (1.42) of Besov spaces B^{sd+t}_τ(L_τ(Ω)), with τ = (s + 1/2)^{−1} for some s > 0, which are the approximation spaces for nonlinear approximation in H^t(Ω), cf. Proposition 5.3. This is the case when s satisfies

0 < s < (m − t)/d,    (1.51)

where m is the polynomial order of Ψ, and when Ψ is contained in B^{sd+t}_τ(L_τ(Ω)), with s ≤ 1/2 if t < −n/2 [58]. If the wavelets are piecewise smooth, globally C^r functions for some r ∈ ℕ ∪ {−1}, with r = −1 meaning that they satisfy no global smoothness requirements, then it is known that they are contained in B^ν_τ(L_τ(Ω)) when ν < r + 1 + 1/τ [58, 8]. In the case of spline wavelets, r = m − 2, and we are looking for those which are contained in B^{sd+t}_τ(L_τ(Ω)) with τ = (s + 1/2)^{−1} for the critical value s = s*, with m = s*d + t. This means that the order m should be such that sd + t < m − 1 + s + 1/2, or, for the critical value s* = (m − t)/d,

(m − t)/d > 1/2.    (1.52)

Another important aspect of the nonlinear wavelet-based approximation is the adaptive operator application. Generalized towards frames for Hilbert spaces, in Section 2.4 we recall that linear operators of a certain class exhibit fast off-diagonal decay if discretized in terms of wavelets. However, such discretizations are usually densely populated biinfinite matrices. In Section 6.3 we show how matrices with fast off-diagonal decay can be compressed, i.e. approximated by means of sparse matrices with sufficiently high accuracy.


Chapter 2

Frames

Frames [7] provide a generalization of the Riesz basis concept, in the sense that the linear independence of the decomposing elements is no longer required. There are many reasons to use frames instead of bases. One of them is that frames allow a comfortable decomposition of polygonal domains [58, 51], which can be represented as an overlapping union of smooth images of the unit cube. A frame on the polygonal domain can then be constructed by taking the union of the wavelet Riesz bases of the overlapping parts. Another reason might be that a frame which combines Riesz bases with different approximation properties in some cases yields a more efficient representation of a signal than a single basis, cf. [61].

In Section 2.1 we introduce some basic definitions from frame theory for Hilbert spaces, such as the analysis and synthesis operators and the concept of dual frames. For the adaptivity reasons, the analyzing system must enable specific norm equivalences, needed for the characterization of the smoothness spaces, cf. Section 1.3. Note that the characterization (1.39) of the Sobolev space H^s is nothing else than the frame property of the rescaled dual system D^s Ψ̃.

In Section 2.2 we recall the concept of Gelfand frames, which allows the construction of frames for the Sobolev space H^s by rescaling the dual frames Ψ, Ψ̃ for the energy space L2. A sufficient condition for Gelfand frames, the localization property of frames, will be considered in Section 2.3. Another aspect of the localization property is that it is related to the nonlinear operator discretization and matrix compression. In Section 2.4 we recall under what conditions linear bounded operators can be adaptively discretized in terms of frames.

2.1 Basic Frame Theory

Following [7], we introduce the concept of classical Hilbert frames in a separable Hilbert space H. The corresponding inner product shall be denoted by 〈·, ·〉 := 〈·, ·〉_H, and the induced norm by ‖·‖ := ‖·‖_H := 〈·, ·〉_H^{1/2}. Let Λ be a countable index set. A system Ψ = {ψ_λ}_{λ∈Λ} ⊂ H is called a Hilbert frame or simply a frame for H, if there exist frame bounds 0 < A_Ψ, B_Ψ < ∞ such that the condition

A_Ψ ‖f‖² ≤ ∑_{λ∈Λ} |〈f, ψ_λ〉|² ≤ B_Ψ ‖f‖²    (2.1)

holds for all f ∈ H. It can be shown that any Riesz basis Ψ for H is also a frame for H, due to the estimate (1.3). As we will see below, the span of any frame Ψ for H is automatically dense, i.e. S(Ψ) = H. However, the frame condition (2.1) does not imply that the elements of Ψ are linearly independent. This means that the frame decomposition of a function f ∈ S(Ψ) is not necessarily unique, i.e. there might exist different coefficient vectors f_1, f_2 ∈ ℓ2(Λ) with f = f_1^T Ψ = f_2^T Ψ.
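A standard finite-dimensional example (added for illustration; it does not appear in the thesis): the three unit vectors ψ_k = (cos θ_k, sin θ_k) with θ_k = π/2 + 2πk/3, k = 0, 1, 2, form a frame for H = ℝ², since a direct computation shows

∑_{k=0}^{2} |〈f, ψ_k〉|² = (3/2) ‖f‖²   for all f ∈ ℝ²,

so that (2.1) holds with A_Ψ = B_Ψ = 3/2 (a so-called tight frame), although the three vectors are obviously linearly dependent.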

In the following, we call the mapping F,

F : H → ℓ2(Λ),   f ↦ 〈f, Ψ〉_H^T,    (2.2)

the analysis or frame operator, and its ℓ2(Λ)-adjoint,

F* : ℓ2(Λ) → H,   c ↦ c^T Ψ,    (2.3)

the synthesis or adjoint frame operator corresponding to the frame Ψ. Both operators F and F* are bounded, with ‖F‖ = ‖F*‖ ≤ √B_Ψ. Moreover, R(F) is closed, i.e. \overline{R(F)} = R(F). As a consequence of the orthogonal decomposition

ℓ2(Λ) = R(F) ⊕ N(F*),    (2.4)

a frame Ψ for H is a Riesz basis if and only if R(F) = ℓ2(Λ), which is equivalent to N(F*) = {0}. In the sequel we assume the nontrivial case, i.e. N(F*) ≠ {0}. The self-adjoint operator F*F, mapping as follows:

F*F : H → H,   f ↦ ∑_{λ∈Λ} 〈f, ψ_λ〉_H ψ_λ,    (2.5)

is bounded with ‖F*F‖ ≤ B_Ψ. Moreover, F*F is boundedly invertible with

A_Ψ 〈f, f〉_H ≤ 〈F*F f, f〉_H ≤ B_Ψ 〈f, f〉_H,   f ∈ H.    (2.6)

The orthogonal projection of ℓ2(Λ) onto R(F) is given by the operator

Q := F (F*F)^{−1} F* : ℓ2(Λ) → ℓ2(Λ).    (2.7)

The system Ψ̃ = {ψ̃_λ}_{λ∈Λ} with ψ̃_λ := (F*F)^{−1} ψ_λ is again a frame for H, with frame bounds B_Ψ^{−1} and A_Ψ^{−1}. Ψ̃ is usually called the canonical dual frame. The frame operator of Ψ̃ and its adjoint are given by

F̃ = F (F*F)^{−1},   F̃* = (F*F)^{−1} F*.    (2.8)

From the bounded invertibility of F*F we infer that any f ∈ H can be represented as follows:

f = (F*F)^{−1} F*F f = ∑_{λ∈Λ} 〈f, ψ_λ〉_H ψ̃_λ = F*F (F*F)^{−1} f = ∑_{λ∈Λ} 〈f, ψ̃_λ〉_H ψ_λ.    (2.9)

The latter equality proves the denseness of Ψ in H. For a frame Ψ, in the nontrivial case of N(F*) ≠ {0}, the coefficient functionals {c_λ}_{λ∈Λ} in the representation

f = ∑_{λ∈Λ} c_λ(f) ψ_λ,   f ∈ H,    (2.10)

may not be unique. Hence there may exist other, non-canonical dual frames Ξ = {ξ_λ}_{λ∈Λ} which also fulfill a reconstruction equation of the form

f = ∑_{λ∈Λ} 〈f, ψ_λ〉_H ξ_λ = ∑_{λ∈Λ} 〈f, ξ_λ〉_H ψ_λ,   f ∈ H.    (2.11)

However, the operator P_Ξ analogous to Q from (2.7), given by

P_Ξ : ℓ2(Λ) → ℓ2(Λ),   P_Ξ c := ( ∑_{μ∈Λ} 〈c^T Ψ, ξ_μ〉 〈ξ_μ, ψ_λ〉 )_{λ∈Λ},    (2.12)

is then only a projector onto R(F), and not an orthogonal one.
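To make the operators of this section concrete, the following small Python illustration (an added sketch; the three frame vectors of ℝ² are an arbitrary placeholder choice) assembles the analysis operator F as a matrix, the operator F*F, the canonical dual frame ψ̃_λ = (F*F)^{−1}ψ_λ, and checks the reconstruction formula (2.9) as well as the projector Q from (2.7).

import numpy as np

# A redundant frame of R^2: three vectors, clearly linearly dependent.
Psi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])

F = Psi                               # analysis operator: (F f)_k = <f, psi_k>
FsF = F.T @ F                         # operator F*F on R^2
Psi_dual = Psi @ np.linalg.inv(FsF)   # canonical dual frame (row k is psi~_k)

f = np.array([0.3, -1.2])
recon = Psi.T @ (Psi_dual @ f)        # sum_k <f, psi~_k> psi_k, cf. (2.9)
print(np.allclose(recon, f))          # True

Q = F @ np.linalg.inv(FsF) @ F.T      # orthogonal projector of l2(Lambda) onto R(F), cf. (2.7)
print(np.allclose(Q @ Q, Q), np.allclose(Q, Q.T))   # True True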

2.2 Gelfand Frames

As we have seen in Section 1.4, wavelet Riesz bases are usually constructed for L2(Ω) first. In Theorem 1.10 it has been shown that rescaled versions of the original basis are Riesz bases for other smoothness spaces. To generalize this strategy towards frames, we use the powerful concept of Gelfand frames, cf. [34].

Let V be some Hilbert space, and let B be some Banach space, densely embedded in V, B ↪→ V. We call (B, V, B′) a Gelfand triple, if

B ⊂ V ≃ V′ ⊂ B′,    (2.13)

and the right inclusion is also dense. A given Hilbert frame Ψ = {ψ_λ}_{λ∈Λ} for V with canonical dual frame Ψ̃ = {ψ̃_λ}_{λ∈Λ} is called a Gelfand frame for the Gelfand triple (B, V, B′), if Ψ ⊂ B, Ψ̃ ⊂ B′ and there exists a Gelfand triple (b, ℓ2(Λ), b′) of sequence spaces such that

F* : b → B, c ↦ c^T Ψ   and   F̃ : B → b, f ↦ (〈f, ψ̃_λ〉_{B×B′})_{λ∈Λ}    (2.14)

are bounded operators. By duality arguments in the Gelfand triple (B, V, B′), we can infer from the identity F*F̃ = F̃*F = I_V that for a Gelfand frame Ψ also the operators

F̃* : b′ → B′, c ↦ c^T Ψ̃   and   F : B′ → b′, f ↦ (〈f, ψ_λ〉_{B′×B})_{λ∈Λ}    (2.15)

are bounded.

For any strictly positive diagonal matrix W = diag(w_λ)_{λ∈Λ} and 1 < p < ∞, let us introduce the weighted ℓ_p spaces

ℓ_{p,W} := {c : ‖c‖_{ℓ_{p,W}} := ‖W c‖_{ℓ_p} < ∞};    (2.16)

then we can immediately relate ℓ_{2,W} and ℓ2(Λ) by the isomorphism

φ_W : ℓ_{2,W} → ℓ2(Λ),   c ↦ W c,    (2.17)

with dual mapping

φ*_W : ℓ2(Λ) → ℓ_{2,W^{−1}},   c ↦ W c.    (2.18)

We recall Proposition 2.3 from [51].

Proposition 2.1. Let H be a Hilbert space and Ψ = {ψ_λ}_{λ∈Λ} be a Gelfand frame for (H, V, H′) with the Gelfand triple of sequence spaces (ℓ_{2,W}, ℓ2(Λ), ℓ_{2,W^{−1}}), where w : Λ → ℝ₊ is a strictly positive weight function. Then the systems W^{−1}Ψ = {w_λ^{−1} ψ_λ}_{λ∈Λ} and W Ψ̃ = {w_λ ψ̃_λ}_{λ∈Λ} are (Hilbert) frames for H and H′, respectively.

Proof. For the proof see [51].


Remark 2.2. Proposition 2.1 indicates how to construct wavelet frames for a subspace H of the Sobolev space H^t, given a frame Ψ of L2. Namely, if Ψ is a Gelfand frame for the Gelfand triple (H, L2, H′) with a corresponding Gelfand triple of sequence spaces (ℓ_{2,D^t}, ℓ2(Λ), ℓ_{2,D^{−t}}), then D^{−t}Ψ and D^t Ψ̃ are frames for H and H′, respectively. As already mentioned in [51], the concept of Gelfand frames can be avoided, as in [58], if it is assumed a priori that D^{−t}Ψ is a frame for H and if H is identified with H′ via the Riesz mapping.

2.3 Localization of Frames

A sufficient condition for the Gelfand frame property, and consequently for the construction of frames by rescaling as in Proposition 2.1, is given by the localization property of frames, as stated in [51].

Definition 2.3. A frame Ψ = {ψ_λ}_{λ∈Λ} for a Hilbert space H is called (intrinsically) localized, if the Gramian matrix entries G(Ψ)_{λ,λ′} := 〈ψ_λ, ψ_{λ′}〉 exhibit a certain decay in terms of δ(λ, λ′), where

δ : Λ × Λ → ℝ₀⁺    (2.19)

is an appropriate (generalized) distance function.

Here we are particularly interested in decay estimates of Lemarié type, i.e. when the Gramian matrix of the frame Ψ belongs to the Lemarié class.

Definition 2.4. We call A_{σ,β} = A_{σ,β,δ} the Lemarié class, if for some σ, β > 0 it consists of all matrices B = (b_{λ,λ′})_{λ,λ′∈Λ} with the decay estimate

|b_{λ,λ′}| ≲ 2^{−σ||λ|−|λ′||} (1 + δ(λ, λ′))^{−β},   λ, λ′ ∈ Λ,    (2.20)

where the distance function δ in our considerations will be

δ(λ, λ′) = 2^{min{|λ|,|λ′|}} dist(supp(ψ_λ), supp(ψ_{λ′})),   λ, λ′ ∈ Λ.    (2.21)

In [51], the concept of a generalized Lemarié metric was applied in order to prove the Gelfand frame property of localized wavelet frames satisfying the Lemarié estimate (2.20). We do not go into more detail here.

2.4 Frame Based Operator Discretization

Another important aspect, related to the frame localization property, is the frame-based discretization of linear bounded operators, which under certain circumstances also yields Lemarié matrices, cf. [17]. In the case of some arbitrary linear operator L, the values of σ, β in the estimate (2.20) with b_{λ,λ′} = 〈Lψ_λ, ψ_{λ′}〉 will depend on the regularity and cancellation properties of the analyzing frame as well as on the properties of the operator L as a mapping between specific Sobolev spaces.

Theorem 2.5. Let L : H^{t+s} → H^{−t+s} be bounded for |s| ≤ τ, τ > 0, i.e.

‖Lf‖_{H^{−t+s}} ≲ ‖f‖_{H^{t+s}}.    (2.22)

Let (Ψ, Ψ̃) be a pair of dual frames for L2 with regularity bounds γ, γ̃ defined by (1.35) and cancellation property (1.49) of order m, m̃, respectively. Then the stiffness matrix 〈LΨ, Ψ〉 belongs to the Lemarié class with σ = min{τ, γ − t, t + m}, i.e. the estimate

2^{−(|λ|+|λ′|)t} |〈Lψ_λ, ψ_{λ′}〉| ≲ 2^{−||λ|−|λ′||σ} (1 + δ(λ, λ′))^{−β},   λ, λ′ ∈ ∇,    (2.23)

holds with

δ(λ, λ′) = 2^{min{|λ|,|λ′|}} dist(supp(ψ_λ), supp(ψ_{λ′})),   λ, λ′ ∈ ∇.    (2.24)

The value of β is greater than or equal to d + 2m + 2t in the case of integral operators, and arbitrarily large in the case of differential operators.

Proof. The proof can be found in [17].

Remark 2.6. The preconditioned matrix (2^{−(|λ|+|λ′|)t} 〈Lψ_λ, ψ_{λ′}〉)_{λ,λ′} in (2.23) is nothing else than the stiffness matrix of L with respect to the rescaled frame D^{−t}Ψ, which is a frame for the Sobolev space H^t, as stated in Remark 2.2.


Part II

Inverse Problems


Chapter 3

Basic Definitions

This chapter is concerned with the basic definitions related to the subject of linear ill-posed inverse problems. Equipped with the wavelet analysis tools, our aim is to construct an implementable adaptive scheme for the solution of ill-posed equations, as has been done in the well-posed case in [58, 51] and elsewhere.

In Section 3.1 we introduce the classical concepts of ill-posedness and generalized inverse. The main challenge of the theory consists in the fact that the solution of an ill-posed operator equation Lf = g cannot be computed by direct inversion of the considered operator L : X → Y, since L is usually not boundedly invertible. In the case that only an inexact version gδ of the data g with ‖gδ − g‖ ≤ δ is given, a solution might not exist or might not be unique. The generalization L† : D(L†) → X of the inverse operator handles the existence and the uniqueness of the solution on the extended domain D(L†) := R(L) ⊕ R(L)^⊥, but is in general unbounded. This means that for gδ ∈ D(L†) the approximate solution L†gδ does not depend continuously on the noise level δ. Moreover, for g ∉ D(L†) no generalized solution is defined at all.

A common approach to the solution of ill-posed problems is regularization, cf. Section 3.2. It suggests an approximation of the generalized inverse L† by means of a family of bounded operators R_α : Y → X, α ∈ ℝ₊. The regularizing family R_α and the regularization parameter α = α(δ, gδ) should be chosen so that R_α gδ approximates the generalized solution f† := L†g as δ → 0. The quality of a chosen regularization can be expressed by means of convergence rate estimates of the total error ‖R_α gδ − f†‖. Equipped with additional information about the solution f†, one derives the best possible convergence rates of the total error, summarized in the term order optimal convergence.

In Section 3.3 we illustrate the basic ideas of the theory of ill-posed inverse problems, e.g. the concept of a function of an operator, by means of compact operators. The concept of filter functions, applied to the operators, will be exploited in Chapter 4 in the context of filter based regularization. A detailed introduction to the basic concepts of linear ill-posed inverse problems can be found in [32, 39]. This chapter essentially follows the lines of [32].

3.1 Ill-Posedness and Generalized Inverse

Let L : X → Y be a linear bounded operator between Hilbert spaces X,Y . Our aim is to find a solutionof the operator equation

Lf = g (3.1)

for some given right hand side g. Usually we assume that the exact right hand side g is not available,but an approximation gδ with


‖gδ − g‖ ≤ δ, (3.2)

for some δ > 0. It is obvious that gδ → g as δ → 0. In some cases, if R(L) ≠ Y, there will be no solution of (3.1) for g /∈ R(L). One can get a least squares solution of (3.1) for g ∈ R(L) ⊕ R(L)⊥ by minimizing the functional

‖Lf − g‖ → minf, (3.3)

which is equivalent to solving the normal equation

L∗Lf = L∗g, g ∈ R(L)⊕R(L)⊥. (3.4)

If L is not injective, the solution of (3.4) is in general not unique. To get uniqueness, one chooses somef† with the smallest norm, which solves (3.4). Such a function f† is called generalized solution of (3.1)with the data g ∈ R(L)⊕R(L)⊥.Hence, we get a mapping

L† : R(L)⊕R(L)⊥ → X, g → f†, (3.5)

which is called the Moore-Penrose generalized inverse. In what follows we use the notation D(L†) := R(L) ⊕ R(L)⊥.

Remark 3.1. Let L be a linear bounded operator, then

• N (L†) = R(L)⊥, i.e. the generalized solution f† is not affected by the parts of g in R(L)⊥, onecan assume without limitation g ∈ R(L)

• R(L†) = N (L)⊥ = R(L∗), which means that R(L†) is closed in X

• L† is linear

• L† is bounded if and only if R(L) is closed

Existence and uniqueness are as good as possible treated by the generalized inverse. Another difficultycan appear, if only an approximation gδ with (3.2) is given. If R(L) is not closed, then in general,

L†gδ ↛ L†g as δ → 0. (3.6)

In this chapter we consider the ill-posed case, i.e. linear bounded operators L with non-closed rangeR(L).

3.2 Regularization and Order Optimality

The main idea of regularization is to replace the unbounded inverse L† by means of bounded operators.

Definition 3.2. A regularization of L† is a family of bounded operators {Rα}α>0

Rα : Y → X, (3.7)

with

α(δ, gδ) : R+ × Y → R+, (3.8)


chosen such that for all g ∈ R(L)⊕R(L)⊥ and for all gδ ∈ Y with ‖gδ − g‖ ≤ δ holds

limδ→0

Rα(δ,gδ)gδ = L†g (3.9)

and

limδ→0

α(δ, gδ) → 0. (3.10)

The parameter α = α(δ, gδ) is called regularization parameter and fδα := Rαgδ is called regularized

solution for the right hand side gδ.

If the regularization parameter α does not depend on gδ but only on the noise level δ, then we call αan a-priori parameter choice rule and write α = α(δ). Otherwise, we call α an a-posteriori parameterchoice rule.The quality of different regularization methods can be compared by estimating the convergence rates ofthe total error

‖fδα − f†‖ (3.11)

for δ → 0. It has been shown in [32, Prop. 3.11], [57] that, without any additional information on the solution f† or on the exact right hand side g, it is not possible to specify a uniform convergence rate of the total error (3.11) for all regularization methods Rα, i.e. the convergence might be arbitrarily slow. This means that it is in general not possible to find a function φ : R+ → R+ with φ(δ) → 0 for δ → 0 and

‖Rαgδ − L†g‖ ≤ φ(δ). (3.12)

Convergence rates in form (3.12) can be only given using an additional assumption on the exact data gor on the generalized solution f†. We introduce a source condition on f† by means of the μ-norm. Letfor some μ > 0 and some ρ > 0 the generalized solution f† satisfy

‖f†‖μ ≤ ρ, (3.13)

where the μ-norm

‖f‖μ := ‖(L∗L)−μf‖ (3.14)

is defined using the notion (3.29) for F (L∗L) := (L∗L)−μ.

Remark 3.3. If K is a compact operator with the singular system (σ_n; v_n, u_n), then the boundedness of the μ-norm of f,

‖f‖²_μ = Σ_n σ_n^{−4μ} |〈f, v_n〉|² < ∞, (3.15)

means that the expansion coefficients |〈f, v_n〉| decay fast enough to compensate the factor σ_n^{−4μ} as n → ∞. For the corresponding right hand side g, the boundedness of the μ-norm of f implies

Σ_n |〈g, u_n〉|² / σ_n^{2+4μ} < ∞. (3.16)

To analyze the convergence rates with respect to the source condition (3.13)-(3.14), we define the worstcase error

Eμ(δ, ρ,Rα) := sup {‖Rαgδ − L†g‖ : g ∈ R(L)⊕R(L)⊥, ‖gδ − g‖ ≤ δ, ‖L†g‖μ ≤ ρ}. (3.17)


Definition 3.4. We call the regularization Rα with parameter rule α = α(δ) order optimal for some μ > 0, if there exists a constant c > 1, such that for all δ > 0 and all ρ > 0 holds

E_μ(δ, ρ, R_α) ≤ c δ^{2μ/(2μ+1)} ρ^{1/(2μ+1)}. (3.18)

The supremum μ_0 of all possible μ for which a regularization R_α can be order optimal is called the qualification of the method.

3.3 Compact Linear Operators

Definition 3.5. An operator K : X → Y is called compact, if the image under K of any bounded subsetof X is a relatively compact subset of Y .

If K is a compact linear operator, there exists an eigensystem (λ_n; v_n) of the self-adjoint operator K∗K : X → X, consisting of eigenvalues λ_n, which are all positive, and corresponding eigenvectors v_n, which build a complete orthonormal system of R(K∗). With such an eigensystem, the operator K∗K can be diagonalized as follows:

K∗Kf = Σ_{n=1}^{∞} λ_n 〈f, v_n〉 v_n, f ∈ X. (3.19)

Based on the eigensystem (λ_n, v_n) of the self-adjoint operator K∗K, a singular system (σ_n; v_n, u_n) of K is defined as follows:

σ_n := ‖Kv_n‖ = √λ_n; u_n := Kv_n / σ_n. (3.20)

For the singular system the following holds:

Kvn = σnun, (3.21)

K∗un = σnvn, (3.22)

Kf = Σ_{n∈N} σ_n 〈f, v_n〉 u_n, f ∈ X, (3.23)

K∗g = Σ_{n∈N} σ_n 〈g, u_n〉 v_n, g ∈ Y. (3.24)

Expressions (3.23) and (3.24) are called the singular value expansions of K and K∗, respectively. For compact operators the following fact [32] is of importance: the range R(K) of K is closed if and only if it is finite dimensional. Therefore, if dim(R(K)) = ∞, K† is an unbounded linear operator due to Remark 3.1. If R(K) is infinite dimensional, there are infinitely many singular values σ_n, which accumulate at 0, i.e.

lim_{n→∞} σ_n = 0, (3.25)

and are usually numbered in decreasing order. In the following proposition we recall the characterization of the generalized inverse K† in terms of the singular system of K.

Proposition 3.6. [32, Thm. 2.8] Let (σn; vn, un) be a singular system for the compact linear operatorK : X → Y , g ∈ Y . Then we have

(i)

g ∈ D(K†) ⇔ Σ_{n∈N} |〈g, u_n〉|² / σ_n² < ∞. (3.26)


(ii) For any g ∈ D(K†) holds

K†g = Σ_{n∈N} (〈g, u_n〉 / σ_n) v_n. (3.27)

In order to introduce regularization methods, we need to define functions of bounded operators, alsocalled filter functions. For the general bounded operator L, the term of the filter function is based on thespectral decomposition of the self-adjoint operator L∗L.

Definition 3.7. Let F be a (piecewise) continuous function. For a compact operator K with the singular system {σ_n, v_n, u_n}, a function F of the self-adjoint operator K∗K is defined by

F(K∗K)f = Σ_{n∈N} F(σ_n²) 〈f, v_n〉 v_n, f ∈ X. (3.28)

For a bounded linear operator L, a function F of the self-adjoint operator L∗L is defined by

F(L∗L) := ∫₀ᶜ F(λ) dE_λ, (3.29)

where

L∗L = ∫₀ᶜ λ dE_λ (3.30)

is the spectral decomposition of L∗L and c is a constant with c ≥ ‖L∗L‖.

Functions F, applied to operators as in (3.29), are usually called filters.
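To make the filter concept of Definition 3.7 concrete, the following minimal numpy sketch applies a filter function to a synthetic discretized compact operator through its singular system, as in (3.27)-(3.28). The operator, the noise level and the filter choices are illustrative assumptions and not taken from the text; the helper name apply_filter is hypothetical.

import numpy as np

def apply_filter(K, g, F):
    # Evaluate F(K^T K) K^T g via the singular system of K, cf. (3.28):
    # F(K^T K) K^T g = sum_n F(sigma_n^2) sigma_n <g, u_n> v_n
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    return Vt.T @ (F(s**2) * s * (U.T @ g))

# synthetic ill-conditioned operator with rapidly decaying singular values, cf. (3.25)
rng = np.random.default_rng(0)
n = 50
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 2.0 ** (-np.arange(n))
K = U @ np.diag(sigma) @ V.T

f_true = V[:, :5] @ rng.standard_normal(5)
g_delta = K @ f_true + 1e-6 * rng.standard_normal(n)   # noisy data, delta ~ 1e-6

# F(lambda) = 1/lambda reproduces K^dagger g, cf. (3.27), and amplifies the noise,
# while a damped filter (here 1/(lambda + alpha), cf. Section 4.3) stays stable.
f_naive = apply_filter(K, g_delta, lambda lam: 1.0 / lam)
f_reg = apply_filter(K, g_delta, lambda lam: 1.0 / (lam + 1e-4))
print(np.linalg.norm(f_naive - f_true), np.linalg.norm(f_reg - f_true))

The blow-up of f_naive illustrates why the unbounded generalized inverse has to be replaced by a family of bounded operators, which is the subject of the next chapter.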


Chapter 4

Filter Based Formulation

A general approach to regularization of the ill-posed problem (3.1),(3.2) is based on the filter concept,cf. [39, 32].

Formally, we can apply Definition 3.7 of a filter function with F (λ) = λ−1 to obtain the filter basednotion of the generalized solution

f† = (L∗L)−1L∗g. (4.1)

The formal expression in (4.1) is in general not well-defined as follows from Proposition 3.6. To get asuitable approximation for f† we have to replace F (λ) = λ−1 by a family of filter functions Fα(λ) in away that Rα given by

Rαgδ := Fα(L∗L)L∗gδ (4.2)

defines a regularization of L†. In Section 4.1 we specify sufficient conditions on the filter function Fα,which guarantee the regularization property of the filter based method Rα. In Section 4.1.1 we formulatea common a-priori parameter choice strategy, which yields order optimal convergence of Rα defined by(4.2).

As an example of an a-posteriori parameter choice rule, we recall in Section 4.1.2 the Morozov’s discrep-ancy principle, first introduced in [44], and show that it is order optimal up to μ0 − 1/2, where μ0 is thequalification of the a-priori method. (For the detailed description of the μ-notion cf. Chapter 3).

In Section 4.1.3 we review an alternative approach for an a-posteriori parameter rule, the balancing principle [49], and show how it can be incorporated into the filter based concept. We outline some advantages of the balancing principle compared to Morozov's discrepancy principle, such as its order optimality up to the qualification μ_0. The differences of the two methods concerning their numerical realization will be specified in more detail in Chapter 8.

The concept of the μ-norm, cf. Section 3.2, is quite artificial, since the operator L is involved intothe estimate of the smoothness of f†. In order to deal with more realistic source conditions, such asSobolev smoothness of the solution, we consider in Section 4.2 the regularization concept in Hilbert scales[45, 47, 60, 31].

In Sections 4.3 and 4.4 we illustrate all the mentioned filter-based concepts with two well known andwidely used regularization strategies, Tikhonov regularization and Landweber iteration. Equipped withthe basic strategies, we develop in Chapter 8 and Chapter 9 adaptive Tikhonov and adaptive Landweberregularization methods, with the objective to reduce essentially the computational complexity of thestandard regularization schemes.


4.1 Regularization Property and Parameter Choice Rules

Proposition 4.1. The family Rα given by (4.2) defines a linear regularization approach, i.e.

fδα := Rαgδ → f† as δ → 0 , (4.3)

if Fα : [0, ‖L‖2] → R is a piecewise continuous function with the following properties:

(i) |λFα(λ)| ≤ C for some C > 0,

(ii) lim_{α→0} Fα(λ) = 1/λ pointwise for λ ∈ (0, ‖L‖²],

(iii) α = α(δ) is chosen so that δ²G(α) → 0 as δ → 0, where

G(α) := sup_{λ∈[0,‖L‖²]} |Fα(λ)|. (4.4)

Remark 4.2. As usual, we drop the superscript δ in (4.2) if the exact data g are mentioned. If only thefirst two requirements (i) and (ii) are fulfilled, then for all g ∈ D(L†) holds

fα = Rαg → f†, α→ 0. (4.5)

Proof. The proof follows from two facts. First, conditions (i) and (ii) on the filter Fα yield the convergence(4.5) in the exact data case, cf. [32, Thm. 4.1]. Using the stability estimate

‖fδα − fα‖ ≤ δ√CG(α) (4.6)

from [32, Thm. 4.2] we get with (iii) and (4.5)

‖fδα − f†‖ ≤ ‖fδα − fα‖+ ‖fα − f†‖ → 0, as δ → 0. (4.7)

4.1.1 A-priori Parameter Choice

Order optimality of the filter based approach (4.2) for an a-priori parameter rule is shown by the following

Proposition 4.3. Let the assumptions of Proposition 4.1 be fulfilled. Let f† satisfy the μ-source condition (3.13)-(3.14) for some μ, ρ > 0. If, additionally,

(i) G(α) = O(1/α) as α → 0,

(ii) ωμ(α) ≤ cα^μ for some c > 0, where

ωμ(α) := sup_{λ∈[0,‖L‖²]} λ^μ |1 − λFα(λ)|, (4.8)

then, with the a-priori parameter choice rule

α ∼ (δ/ρ)^{2/(2μ+1)} (4.9)

the regularization method Rα is of optimal order.

Proof. The proof is based on the fact [32, Cor. 4.4] that the approximation error fα−f† can be estimatedas follows:

‖fα− f†‖ ≤ cαμρ. (4.10)

Order optimality of Rα with parameter α chosen by (4.9) follows from (4.10) and the stability estimate(4.6).


4.1.2 Morozov’s Discrepancy Principle

We recall a widely used a-posteriori parameter choice rule, the Morozov’s discrepancy principle.

Proposition 4.4. Let Fα be defined as in Proposition 4.1, and let the conditions of Proposition 4.3 besatisfied for some range 0 < μ < μ0. Furthermore, let

τ > sup {|1− λFα(λ)| : α > 0, λ ∈ [0, ‖L‖2]}. (4.11)

Then, the regularization method Rα with α chosen by the Morozov’s discrepancy principle

α(δ, gδ) := sup {α > 0 : ‖Lfδα − gδ‖ ≤ τδ} (4.12)

is convergent for all g ∈ R(L), and of optimal order for μ ∈ (0, μ0 − 1/2].

Proof. For the proof see [32, Thm. 4.17].

Remark 4.5. The expression ‖Lf^δ_α − g^δ‖ is called the residuum or discrepancy, and is often used in a-posteriori parameter choice rules, since its evaluation does not require any explicit smoothness information on the solution f†.
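As an illustration of the rule (4.12), the following sketch scans a geometric grid of candidate values from large to small and returns the largest α whose discrepancy falls below τδ. The grid, the value τ = 1.5 and the spectral cut-off regularizer (truncated singular value expansion) are assumptions of this sketch; any regularization R_α could be plugged in instead.

import numpy as np

def spectral_cutoff(L, g, alpha):
    # truncated singular value expansion: keep components with sigma_n^2 >= alpha
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    keep = s**2 >= alpha
    return Vt[keep].T @ ((U[:, keep].T @ g) / s[keep])

def morozov_alpha(L, g_delta, delta, regularize, tau=1.5, alphas=None):
    # largest alpha on a decreasing grid with ||L f_alpha - g_delta|| <= tau*delta, cf. (4.12)
    if alphas is None:
        alphas = np.logspace(0, -12, 49)
    f = regularize(L, g_delta, alphas[-1])          # fallback: smallest alpha on the grid
    for alpha in alphas:
        f_alpha = regularize(L, g_delta, alpha)
        if np.linalg.norm(L @ f_alpha - g_delta) <= tau * delta:
            return alpha, f_alpha
    return alphas[-1], f

# usage: alpha, f_alpha = morozov_alpha(L, g_delta, delta, spectral_cutoff)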

Remark 4.6. As follows from Proposition 4.4 applied to methods of finite qualification μ_0 < ∞, the Morozov discrepancy rule (4.12) does not yield optimal convergence rates for μ_0 − 1/2 < μ ≤ μ_0. An example of such a method is Tikhonov regularization with μ_0 = 1, described in more detail in Section 4.3. In such a case other a-posteriori parameter choice rules are preferable, especially if it is known that the source condition ‖f†‖_μ < ρ is fulfilled for some μ with μ_0 − 1/2 < μ ≤ μ_0 and ρ < ∞. In the next section we therefore introduce another a-posteriori rule, the balancing principle, which avoids such a restriction.

4.1.3 Balancing Principle

First results on balancing principle can be found in works of Tikhonov and Glasko [64] and Lepskii [38].We recall the setting proposed in [49]:The idea of the balancing principle is to find a balance between the approximation part fα − f† and thestability part fδα − fα of the total error

fδα − f† = (fδα − fα) + (fα − f†). (4.13)

Usually, the approximation error and the data error can be estimated as follows:

‖fα − f†‖ ≤ ω(α) (4.14)

and

‖fδα − fα‖ ≤ δξ(α), (4.15)

respectively. The functions ω and ξ are both positive and usually continuous and monotone functions.Due to the regularization effect, the estimate ω is decreasing, s.t. ω(α) → 0 as α → 0. Conversely, thefunction ξ is increasing, s.t. ξ(α) →∞ as α→ 0.

Remark 4.7. Recall that any filter based regularization (4.2) satisfies the estimate (4.14) with ω(α) = ωμ(α)ρ, with ωμ, ρ as defined in Proposition 4.3. The data error estimate (4.15) is satisfied by the filter based regularization (4.2) with ξ(α) = √(C G(α)), where G(α) is defined as in (4.4), cf. (4.6).


Given the information (4.14) and (4.15), the estimate of the total error can be minimized by the followingparameter choice:

α = argmin{ω(α) + δξ(α)}. (4.16)

As shown in [42], for continuous functions ξ(α) and ω(α) a choice of αopt satisfying

ω(αopt) = δξ(αopt) (4.17)

yields the same order of convergence of the total error (4.13) as the principle (4.16). In this context theterm order optimality has been defined in [49] as follows:

Definition 4.8. Let the estimates (4.14) and (4.15) be satisfied for some functions ω(α) and ξ(α). Wecall a parameter choice rule α = α(δ, gδ) order optimal, if the following estimate of the total error holds:

‖f^δ_α − f†‖ ≲ ω((ω/ξ)^{−1}(δ)). (4.18)

Remark 4.9. Obviously, α = α_opt = (ω/ξ)^{−1}(δ) from (4.17) satisfies (4.18) with the generic constant in ≲ equal to 2.

In the following proposition we show that Definition 4.8 implies Definition 3.4, at least if the filter functionFα satisfies the requirements of Proposition 4.3.

Proposition 4.10. Let Rα be a filter based method (4.2) with the filter function Fα. Let f†, Fα satisfy the requirements of Proposition 4.3 for some μ, ρ > 0, so that the estimates (4.14) and (4.15) are fulfilled with ω(α) = cα^μ ρ and ξ(α) = √(C G(α)).

Then Rα with parameter α = αopt, chosen by the principle (4.17), is order optimal in the sense ofDefinition 3.4.

Proof. For Fα as in Proposition 4.3, we set ω(α) = cα^μ ρ and ξ(α) = √(C G(α)). Substituting ω(α) and G(α) = O(1/α) in (4.17), we get

α_opt = (ω/√(CG))^{−1}(δ) ∼ (cα^μρ/√(Cα^{−1}))^{−1}(δ) = ((√C/c)(δ/ρ))^{2/(2μ+1)}. (4.19)

The order optimal convergence of the total error ‖f^δ_{α_opt} − f†‖ as in (3.18) can then be derived from (4.18) as follows:

‖f^δ_{α_opt} − f†‖ ≲ ω((ω/√(CG))^{−1}(δ)) = c((√C/c)(δ/ρ))^{2μ/(2μ+1)} ρ ∼ δ^{2μ/(2μ+1)} ρ^{1/(2μ+1)}. (4.20)

In practical realizations, the parameter α can only be chosen from some discrete set ℵ, as proposed in[49]:

ℵ := {α_i}_{0≤i≤N}, α_i = α_0 q^i, q > 1, N ≥ 0. (4.21)

However, the set ℵ should be chosen appropriately, i.e. α0 ≤ αopt ≤ αN .

Remark 4.11. For the small values of δ the regularization requirement implies δξ(αopt) < 1. Therefore,if α0 is chosen such that

δξ(α0) = 1, (4.22)


then α_0 < α_opt follows from the monotonicity of ξ. The requirement α_opt ≤ α_N is of less importance, since the process stops once an order optimal value α_i = α_0 q^i is found. The criterion for the order optimal parameter choice is given in the following proposition.

Proposition 4.12. Let α0, q,N in (4.21) be chosen so that αopt satisfies α0 ≤ αopt ≤ αN .If ξ in the stability estimate (4.15) fulfills

ξ(αi−1) ≤ ζξ(αi) for some ζ > 1, (4.23)

then the parameter α+ chosen as

α_+ := max_{0≤i≤N} {α_i : ‖f^δ_{α_i} − f^δ_{α_j}‖ ≤ 4δξ(α_j), j = 0, . . . , i} (4.24)

yields order optimal convergence rates (4.18).If additionally, ξ satisfies a strong Δ2-condition, i.e.

κ2ξ(2α) ≤ ξ(α) ≤ κ1ξ(2α) for some κ1 > κ2 > 1 and all α > 0, (4.25)

then α++ chosen as

α_{++} := max_{0≤i≤N} {α_i : ‖f^δ_{α_j} − f^δ_{α_{j−1}}‖ ≤ 4δξ(α_{j−1}), j = 1, . . . , i} (4.26)

also yields order optimal convergence rates (4.18).

Proof. For the proof cf. [49, Thm. 1.1-Thm. 1.2].
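A schematic implementation of the rule (4.26) on the grid (4.21) could look as follows. The helper name balancing_alpha, the grid parameters and the callable regularize(g, alpha) are assumptions of this sketch; ξ is the stability bound from (4.15), e.g. ξ(α) = α^{−1/2} for Tikhonov regularization (cf. Section 4.3), and α_0 may be chosen according to (4.22).

import numpy as np

def balancing_alpha(regularize, g_delta, delta, xi, alpha0, q=1.5, N=30):
    # select alpha_{++} from the grid alpha_i = alpha0 * q^i by the rule (4.26)
    alphas = alpha0 * q ** np.arange(N + 1)
    f = [regularize(g_delta, a) for a in alphas]
    i_best = 0
    for i in range(1, N + 1):
        ok = all(np.linalg.norm(f[j] - f[j - 1]) <= 4 * delta * xi(alphas[j - 1])
                 for j in range(1, i + 1))
        if not ok:
            break           # the conditions are nested, so no larger i can qualify
        i_best = i
    return alphas[i_best], f[i_best]

Note that only distances between successive regularized solutions are needed, which is what makes this rule attractive for the adaptive methods of Chapter 8.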

Corollary 4.13. Let ω, ξ be chosen as in Proposition 4.10 for some μ, ρ > 0. Then both parameterchoice rules (4.24) and (4.26) yield order optimal regularization in the sense of Definition 3.4, i.e.

‖f^δ_{α_+} − f†‖ ≲ δ^{2μ/(2μ+1)} ρ^{1/(2μ+1)} (4.27)

and

‖f^δ_{α_{++}} − f†‖ ≲ δ^{2μ/(2μ+1)} ρ^{1/(2μ+1)}. (4.28)

Proof. G(α) = 1/α fulfills the requirements (4.23) and (4.25) of Proposition 4.12. If, additionally, ω has the form ω(α) := cα^μρ for some ρ > 0, μ > 0, then with Proposition 4.10, Proposition 4.12 yields (ρ, μ)-order optimality.

4.2 Regularization in Hilbert Scales

We use the concept of Hilbert scales in order to incorporate the smoothness information like f ∈ Hp(Ω)into the analysis of regularization methods. First, we recall some basic facts about filter based regular-ization in Hilbert scales.For the first time the source condition in Hilbert scales has been formulated in [45, 47] for the Tikhonovregularization approach. Here we recall a generalization towards arbitrary filter based methods, developedin [60].Let X,Y be Hilbert spaces, L : X → Y some linear bounded injective operator. Let (Xr)r∈R be a Hilbertscale, cf. [36], i.e. Xr is a Hilbert space with inner product 〈f1, f2〉r = 〈Br/2f1, Br/2f2〉X and norm

‖f‖r = ‖Br/2f‖X , r ∈ R, (4.29)


where B is a linear, not necessarily bounded, self-adjoint and strictly positive operator in X. In general,the following source condition

‖f†‖p ≤ ρ (4.30)

is assumed to hold for some p ≥ 0 and ρ > 0. Additionally, the operator L has to satisfy the continuityassumption

‖f‖−a ∼ ‖Lf‖Y , for some a ≥ 0. (4.31)

In case of Hilbert scales, generated by the operator B, and given some filter function Fα we define Rα asfollows:

fδα := Rαgδ := Fα(B−sL∗L)B−sL∗gδ, (4.32)

the admissible values of s will be specified later. Note that in the particular case s = 0 we get the basicfilter-based regularization approach (4.2).

Definition 4.14. Let L, f† satisfy (4.30) and (4.31) with some p > 0, a ≥ 0. We call the regularizationin Hilbert scales (4.32) order optimal, if for some r ∈ R the following total error estimate

‖f^δ_α − f†‖_r ≤ c ρ^{(a+r)/(p+a)} δ^{(p−r)/(p+a)} (4.33)

holds with some constant c independent of δ and ρ.

Remark 4.15. Note that in case r = 0, we get the following order optimal estimate

‖f^δ_α − f†‖_X ≤ c ρ^{1/(p/a+1)} δ^{(p/a)/(p/a+1)}, (4.34)

which obviously coincides with the μ-notion of the order optimality (3.18), if p/a = 2μ and B is chosenso that B−a = L∗L.

As usual, we assume that the filter function Fα in (4.32) is (piecewise) continuous on (0, c] with Fα(λ) → 1/λ as α → 0 for all λ ∈ (0, c]. Additional assumptions on the filter function Fα guarantee the regularization property and the order optimality of the method Rα defined in (4.32) with appropriately chosen parameter α. First, we recall an order optimal a-priori parameter choice rule.

Proposition 4.16. [60, Thm. 2.3]Let f† and L satisfy assumptions (4.30) and (4.31) with some p, a ≥ 0 and ρ > 0. Moreover, let Fα besuch that there exist constants cγ , cβ <∞ and β0 > 0 such that with c = ‖B−s/2L∗LB−s/2‖ the followingestimates are satisfied:

(i) supλ∈[0,c]|λγFα(λ)| ≤ cγαγ−1 for 0 ≤ γ ≤ 1,

(ii) supλ∈[0,c]|λβ [1− λFα(λ)]| ≤ cβαβ for 0 ≤ β ≤ β0.

Then for p ≤ 2s+ a the family Rα defined by (4.32) with parameter α = α(δ), chosen a-priori by

α ∼ (δ/ρ)^{2(s+a)/(p+a)}, (4.35)

is a regularization of (3.1),(3.2) and the order optimal error estimate (4.33) holds for

r ∈ [max {p− 2β0(s+ a),−a}, p]. (4.36)


A possible a-posteriori parameter choice rule is given by Morozov’s discrepancy principle. We recall thecorresponding result concerning regularization in Hilbert scales.

Proposition 4.17. [60, Thm. 3.3] Let the assumptions (4.30) and (4.31) be satisfied by L, f† for somep, a ≥ 0, ρ > 0. Moreover, let the filter function Fα be continuous and monotonically decreasing withrespect to α, such that

(i) λFα(λ) → 1 for α→ 0,

(ii) λFα(λ) → 0 for α→∞,

(iii) 0 ≤ Fα(λ) ≤ kα for λ ∈ [0, c],

(iv) 0 ≤ kα(1− λFα(λ)) ≤ Fα(λ) for λ ∈ [0, c],

where k_α is a positive constant depending on the regularization parameter α. Moreover, let ‖P_{N(L∗)} g^δ‖ < Cδ < ‖g^δ‖ for some C > 1. If s satisfies the inequality s ≤ p ≤ 2s + a and α is chosen by Morozov's discrepancy principle, i.e.

‖Lfδα − gδ‖ = Cδ, (4.37)

then the order optimal error estimate (4.33) holds for all r ∈ [−a, s].

4.2.1 Wavelet Based Regularization

Let Ω ⊂ R^d be some bounded domain. Let Ψ = {ψλ}_{λ∈∇}, Ψ̃ = {ψ̃λ}_{λ∈∇} be a dual pair of wavelet frames for X = L2(Ω) with the analysis operator F : X → ℓ2 : f → 〈f, Ψ〉, the synthesis operator F∗ : ℓ2 → X : f → f^T Ψ for the primal frame Ψ, and the corresponding operators F̃ and F̃∗ for the dual frame Ψ̃, respectively. Consider a Hilbert scale X_r on ℓ2(∇), generated by B := D², with D defined in (1.38), and with ‖ · ‖_r defined as follows:

‖f‖_r := ‖D^r f‖_{ℓ2(∇)}, f ∈ ℓ2(∇). (4.38)

Let the frame Ψ exhibit sufficient localization property, as required to constitute a Gelfand frame for theGelfand triple (Hr(Ω), L2(Ω),H−r(Ω)), cf. Section 2.3. Subsequently, the rescaled pair D−rΨ,DrΨ is adual pair of frames for the Sobolev space Hr(Ω), cf. Section 2.2, i.e.

‖f‖Hr(Ω) ∼ ‖〈f,D−rΨ〉‖�2(∇) ∼ ‖〈f,DrΨ〉‖�2(∇). (4.39)

With the definition (4.38) of the Hilbert scale Xr we get the following equivalence:

‖f‖Hr(Ω) ∼ ‖〈f,DrΨ〉‖�2(∇) ∼ ‖Dr〈f, Ψ〉‖�2(∇) ∼ ‖〈f, Ψ〉‖r. (4.40)

Since 〈f, Ψ〉 = F f , and F ∗F = I by the duality relation, we get 〈f, Ψ〉 = f , where f is a framerepresentation of the function f , i.e. f = F ∗f . Finally, we obtain the equivalence between the Sobolevnorm of the function f and the r-norm of its wavelet representation f in the Hilbert scale Xr:

‖f‖Hr(Ω) ∼ ‖f‖r. (4.41)

Consider an equivalent formulation of the ill-posed equation Lf = g in terms of the frame Ψ:

Af = g, ‖gδ − g‖ ≤ δ, (4.42)

where A = LF ∗ : �2 → Y is again a linear bounded ill-posed operator, δ > 0. We denote the generalizedsolution of (4.42) by f †. The continuous solution f† and the discrete solution f † are related by f† = F ∗f †.


For the approximation of the generalized solution f † of the equation (4.42) we apply the regularizationin Hilbert scale with B = D2:

f δα := Rαgδ := Fα(D−2sA∗A)D−2sA∗gδ. (4.43)

Theorem 4.18. Let the assumption on the filter function Fα of Proposition 4.16 be fulfilled for someβ0 > 0. Let f† be in some Sobolev space Hp(Ω), i.e.

‖f†‖Hp(Ω) < ρ (4.44)

for some p, ρ > 0. Let L be continuous in the following sense:

‖f‖H−a(Ω) ∼ ‖Lf‖Y . (4.45)

Let α = α(δ) be chosen a-priori as in (4.35).Then for p ≤ 2s+a the method Rα given by Rαgδ := fδα := F ∗f δα defines an order optimal regularizationof L†, i.e.

‖f^δ_α − f†‖_{H^r(Ω)} ≲ ρ^{(a+r)/(p+a)} δ^{(p−r)/(p+a)} for all r ∈ [max{p − 2β_0(s+a), −a}, p]. (4.46)

Proof. First note that the method Rα (4.43) is an order optimal regularization of the frame basedformulation (4.42). In fact, from the Sobolev source condition (4.44) due to the equivalence (4.41) weinfer the Hilbert scales source condition

‖f‖p ≤ ρ. (4.47)

By the same argument, we get the Hilbert scales continuity of the operator A, using the continuityassumption (4.45):

‖f‖−a ∼ ‖f‖H−a(Ω) ∼ ‖Lf‖Y = ‖Af‖Y . (4.48)

Hence, the requirements of Proposition 4.16 are satisfied, so that the following estimate in the Hilbertscale

‖f^δ_α − f†‖_r ≲ ρ^{(a+r)/(p+a)} δ^{(p−r)/(p+a)} (4.49)

holds for r ∈ [max{p − 2β_0(s+a), −a}, p]. Consequently, with the norm equivalence (4.41) we obtain the final estimate of the total error:

‖f^δ_α − f†‖_{H^r(Ω)} ∼ ‖f^δ_α − f†‖_r ≲ ρ^{(a+r)/(p+a)} δ^{(p−r)/(p+a)}. (4.50)

4.3 Tikhonov Regularization

In this section we recall a prominent example of the filter-based methods, the Tikhonov-Phillips regular-ization, cf. [62, 63, 50].Consider the minimization problem

‖Lf − gδ‖2Y + α‖f‖2X , α > 0, (4.51)

which is equivalent [32, Thm. 5.1] to the solution of the regularized normal equation


(L∗L+ αI)f = L∗gδ. (4.52)

The equation (4.52) is well-posed for α > 0. The Tikhonov method Rα is defined as the unique solutionfδα of (4.52):

Rαgδ := fδα := (L∗L+ αI)−1L∗gδ, gδ ∈ Y. (4.53)

The filter-based formulation of (4.53) is given by (4.2) with the filter function

Fα(λ) = 1/(λ + α). (4.54)
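The equivalence of the variational form (4.52)-(4.53) and the filter form (4.2) with (4.54) can be checked numerically. The following sketch, with purely synthetic data, evaluates both expressions and compares them.

import numpy as np

rng = np.random.default_rng(1)
L = rng.standard_normal((40, 30))
g_delta = rng.standard_normal(40)
alpha = 1e-3

# variational form: solve the regularized normal equation (4.52)
f_normal = np.linalg.solve(L.T @ L + alpha * np.eye(30), L.T @ g_delta)

# filter form (4.2) with F_alpha(lambda) = 1/(lambda + alpha), cf. (4.54),
# evaluated through the singular system of L
U, s, Vt = np.linalg.svd(L, full_matrices=False)
f_filter = Vt.T @ ((s / (s**2 + alpha)) * (U.T @ g_delta))

print(np.max(np.abs(f_normal - f_filter)))   # agreement up to rounding error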

4.3.1 Regularization Result

Proposition 4.19. The Tikhonov method Rα given by (4.53) is a regularization of (3.1),(3.2), if α = α(δ) is chosen so that

α(δ) → 0 and δ²/α(δ) → 0, for δ → 0. (4.55)

Proof. The proof is based on the fact that the filter function (4.54) satisfies the requirements (i) and (ii) of Proposition 4.1 with C = 1 and G(α) from (4.4) given by

G(α) = 1/α. (4.56)

4.3.2 Tikhonov Regularization with an A-Priori Parameter Choice

Next we recall an order optimality result for an a-priori parameter choice rule.

Proposition 4.20. Supplied by the a-priori parameter choice rule

α ∼ (δ/ρ)^{2/(2μ+1)}, (4.57)

Tikhonov regularization (4.53) is order optimal for μ ∈ (0, 1], ρ > 0.

Proof. It is easy to show, cf. [32, Ex. 4.15], that the requirements of Proposition 4.3 are fulfilled by thefilter function (4.54) for μ ∈ (0, 1].

We conclude that the qualification of Tikhonov regularization is μ_0 = 1, with the best possible convergence rate O(δ^{2/3}).

4.3.3 Tikhonov Regularization with Morozov’s Discrepancy Principle

Analogously, the order optimality of Morozov’s discrepancy principle (4.12) can be shown by applyingProposition 4.4.

Proposition 4.21. Supplied by the Morozov’s a-posteriori parameter choice rule (4.12), Tikhonov regu-larization Rα is order optimal for μ ∈ (0, 1/2].

Remark 4.22. According to [32, Prop. 4.20], the best convergence rates for non-degenerate compact operators are reached at μ = 1/2; consequently, for μ ∈ (1/2, 1], Tikhonov regularization is neither order optimal nor does it yield better convergence rates than O(√δ), so that other a-posteriori strategies, which are of optimal order up to μ_0 = 1, are of interest.


4.3.4 Tikhonov Regularization with Balancing Principle

Theorem 4.23. Tikhonov regularization (4.53) with parameter values α+ and α++ chosen a-posterioriby the balancing principles (4.24) and (4.26), respectively, is order optimal in the sense of the implicitformulation (4.18) and in the sense of the μ-norm formulation (3.18) for 0 < μ ≤ 1, ρ > 0, supposedthat f† ∈ Xμ,ρ.

Proof. Recall that any filter based regularization satisfies the estimate (4.14) with ω(α) = ωμ(α)ρ, with ωμ as defined in Proposition 4.3. In case of Tikhonov regularization, ωμ satisfies the requirement (ii) of Proposition 4.3 with μ ∈ (0, 1], and CG(α) = 1/α, cf. (4.56). Both formulations of order optimality follow from Proposition 4.12 and Corollary 4.13.

4.3.5 Tikhonov Regularization in Hilbert Scales

It has been shown in [60, Ex. 2.4] that Tikhonov regularization in Hilbert scales, determined by

fδα := (L∗L+ αBs)−1L∗gδ, (4.58)

satisfies the requirements of Proposition 4.16 with the filter function Fα(λ) = (α + λ)^{−1}, which is the same as in (4.54), and β_0 = 1. Moreover, Fα fulfills the assumptions (i)-(iv) of Proposition 4.17 with k_α = 1/α.
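A minimal sketch of (4.58), assuming a diagonal operator B as in the wavelet setting of Section 4.2.1; the weights and parameters are purely illustrative.

import numpy as np

rng = np.random.default_rng(2)
m, n, s, alpha = 40, 30, 1.0, 1e-3
L = rng.standard_normal((m, n))
g_delta = rng.standard_normal(m)

w = np.linspace(1.0, 4.0, n) ** 2          # diagonal of B, cf. (4.29)
Bs = np.diag(w ** s)                       # B^s

# Tikhonov in Hilbert scales: minimizes ||L f - g_delta||^2 + alpha ||B^{s/2} f||^2
f_alpha = np.linalg.solve(L.T @ L + alpha * Bs, L.T @ g_delta)

For s = 0 this reduces to the standard Tikhonov solution (4.53).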

4.4 Landweber Iteration

The formal minimization of the least squares functional

‖Lf − gδ‖2 (4.59)

yields the normal equation

L∗Lf = L∗gδ, (4.60)

which is in general not solvable, if gδ /∈D(L†), cf. Section 3.1. Even if gδ ∈D(L†), the solution of (4.60)does not depend continuously on the noisy data gδ.The steepest descent approach for the normal equation (4.60) with some initial guess fδ0 yields followingsubsequent approximation step:

f^δ_{k+1} = f^δ_k + β_L L∗(g^δ − Lf^δ_k), k ≥ 0, f^δ_0 ∈ X, (4.61)

usually called Landweber iteration. The approximation (4.61) is in general not convergent for arbitrarygδ ∈ Y . However, in [37] Landweber proved its convergence for compact operators and g ∈ D(L†).Moreover, it is known that iterative methods exhibit a “self-regularizing property” in the sense that earlytermination has a regularizing effect. In this case the termination index plays the role of the regularizationparameter, and the stopping rule plays the role of the parameter choice method.As in the former sections, we will drop the superscript δ in (4.61), if the exact data g are mentioned.Moreover, in this section it will be convenient to have ‖L‖ ≤ 1.If we replace gδ by gδ − Lfδ0 in (4.61), and set fδ0 = 0, we get the same iterates as in (4.61). In whatfollows, we assume fδ0 = 0, meaning that the initial choice is already incorporated into the right handside.In order to show the regularization property we recall the filter based formulation of (4.61).

Lemma 4.24. For the Landweber iteration (4.61) with the initial value fδ0 = 0 the following holds:


(i)

f^δ_{k+1} = β_L Σ_{j=0}^{k} (I − β_L L∗L)^j L∗g^δ (4.62)

(ii)

f^δ_{k+1} = F_{k+1}(L∗L) L∗g^δ, (4.63)

with

F_{k+1}(λ) = Σ_{j=0}^{k} (1 − β_L λ)^j. (4.64)

Proof. For the proof see e.g. [32, Thm. 6.1].

Remark 4.25. For 0 < β_L < 2/‖L‖² and λ ≤ ‖L‖², we have |1 − β_L λ| < 1.

4.4.1 Regularization Result

We recall the regularization result for the truncated Landweber iteration (4.61).

Proposition 4.26. Let 0 < β_L < 2/‖L‖², δ > 0. The Landweber iteration (4.61) is a regularization method if it is truncated at some index k = k(δ) with

√k δ → 0 as δ → 0, (4.65)

while k^{−1} plays the role of the regularization parameter α.

Proof. It is easy to show that the filter (4.64) fulfills the requirements of Proposition 4.1 with α = k−1.

4.4.2 A-priori Truncation Rule

Proposition 4.27. The Landweber iteration (4.61) yields, for 0 < β_L < 2/‖L‖² and μ, ρ > 0, an order optimal regularization with

c_0 = (1 + 2μ)(4μe)^{−μ/(2μ+1)}, (4.66)

if it is truncated at

k = ⌊β_L (β_L μ e)^{2μ/(2μ+1)} (ρ/δ)^{2/(2μ+1)}⌋, (4.67)

where ⌊x⌋ denotes the largest integer smaller than or equal to x.

Proof. For the proof see e.g. [39], p. 109.

4.4.3 Morozov’s Discrepancy Principle

A suitable a-posteriori parameter choice rule is given by Morozov's discrepancy principle (cf. (4.12)), i.e. k = k(δ, g^δ) is defined as

k(δ, g^δ) = min_{k≥0} {k : ‖g^δ − Lf^δ_k‖ ≤ τδ}, for some τ > 1. (4.68)

The regularization property of Landweber iteration (4.61), truncated at the step k, given by (4.68), hasbeen shown by Defrise and de Mol [27], using the following fact:

Lemma 4.28. Let g ∈ R(L) and let f be a solution of Lf = g. If ‖g^δ − Lf^δ_k‖ > 2δ, then f^δ_{k+1} is a better approximation of f than f^δ_k.


However, the monotonicity of the data error, described in Lemma 4.28, is not sufficient for convergence. We conclude this section with a regularization and order optimality result from [32]:

Proposition 4.29. If g ∈ R(L), then the Landweber iteration (4.61) with the discrepancy principle(4.68) is an order optimal regularization method for all μ > 0, i.e.

‖f^δ_{k(δ,g^δ)} − f†‖ ≲ δ^{2μ/(2μ+1)}. (4.69)

Proof. The proof is based on the fact that the filter function (4.64) fulfills the requirements of Proposition 4.4 concerning the order optimality of a filter based regularization supplied with Morozov's parameter choice rule.

Note that there is no upper limit on μ in Proposition 4.29, i.e. the a-posteriori truncated Landweber iteration (4.61),(4.68) has infinite qualification μ_0.
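The truncated iteration (4.61) together with the stopping rule (4.68) is straightforward to implement. In the following numpy sketch the step size β_L = 1/‖L‖² and τ = 2 are illustrative admissible choices.

import numpy as np

def landweber(L, g_delta, delta, tau=2.0, beta=None, max_iter=10000):
    # Landweber iteration (4.61) with f_0 = 0, stopped by the discrepancy principle (4.68)
    if beta is None:
        beta = 1.0 / np.linalg.norm(L, 2) ** 2   # 0 < beta_L < 2/||L||^2, cf. Remark 4.25
    f = np.zeros(L.shape[1])
    for k in range(max_iter):
        residual = g_delta - L @ f
        if np.linalg.norm(residual) <= tau * delta:
            break                                 # stopping index k(delta, g^delta)
        f = f + beta * (L.T @ residual)
    return f, k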

4.4.4 Landweber Iteration in Hilbert scales

It has been shown in [60, Ex. 2.7] that the Landweber iteration in Hilbert scales, given by R_α g^δ := f^δ_N, where f^δ_N is computed iteratively by

f^δ_{k+1} = f^δ_k − β_L B^{−s} L∗(Lf^δ_k − g^δ), k = 0, . . . , N − 1, α = N^{−1}, (4.70)

satisfies the requirements of Proposition 4.16 with the filter function Fα(λ) = [1 − (1 − β_L λ)^N]/(β_L λ), which coincides with (4.64), and β_0 = ∞. Moreover, the Landweber filter function Fα fulfills the assumptions (i)-(iv) of Proposition 4.17, as stated in [60].


Part III

Adaptivity Issues


Chapter 5

Nonlinear Approximation and

Adaptivity

This chapter is concerned with nonlinear approximation, which gives rise to the adaptivity concept. The basic idea of nonlinear approximation is to work with significant components of the analyzed signal instead of approximating by elements of linear subspaces. Of course, the efficiency of the nonlinear approximation depends on the analyzing system and on the smoothness of the signal itself. In Section 5.1 we introduce the notion of nonlinear approximation following the lines of [28], and specify the smoothness classes of functions for which the adaptive approximation pays off. In Section 5.2 we recall the definition of the best N-term approximation, which is the sequence space analogue of nonlinear approximation in Hilbert spaces. We recapitulate how the smoothness conditions needed for the nonlinear approximation in function spaces can be translated into conditions on the coefficient sequences.

5.1 Nonlinear Approximation

Let (X, ‖ · ‖X) be a normed space, v ∈ X, and assume that we are looking for a numerical approximation of v with some prescribed accuracy ε > 0, using only finitely many basis (or frame) functions from a set Ψ = {ψλ}_{λ∈∇} ⊂ X. Here we compare two different approximation strategies. If the approximation is a linear combination of the first N elements of Ψ, i.e. it is taken from

S_N = S({ψλ}_{1≤λ≤N}), (5.1)

then we deal with linear approximation. Another situation arises if the approximating sets are definedonly by the cardinality N of the generating system ΨΛ ⊂ Ψ, so that the approximating sets

ΣN = ∪#ΨΛ≤NS(ΨΛ), (5.2)

consist of all linear combinations from Ψ with at most N nontrivial coefficients. Obviously, this is a case of nonlinear approximation, since the sum of two elements with N nonzero coefficients each is not necessarily in Σ_N. This kind of nonlinear approximation is usually called N-term approximation. For some T ⊂ X and some element v ∈ X we define the best approximation error by

dist_X(v, T) = inf_{w∈T} ‖v − w‖_X. (5.3)


A question is, given some approximating system T = {T_N}_{N≥0}, which elements v ∈ X can be approximated by elements of T_N with a certain decay of dist_X(v, T_N) in terms of N for all N ≥ 0. A typical example for such an approximation space is the set of all v ∈ X which can be approximated within the sets T_N with the following rate

dist_X(v, T_N) ≲ N^{−s}, v ∈ X, (5.4)

for some s > 0.

Definition 5.1. In general, if T = {TN}N≥0 is a family of nested and asymptotically dense subsetsTN ⊂ X, the approximation spaces Asq(X) related to T are defined by

Asq(X) := {v ∈ X : |v|Asq(X) <∞}, (5.5)

where for s > 0, 0 < q ≤ ∞

|v|_{A^s_q(X)} := (Σ_{n≥0} ((n^s dist_X(v, T_n))^q (1/n)))^{1/q} for 0 < q < ∞, and |v|_{A^s_q(X)} := sup_{n≥0} {n^s dist_X(v, T_n)} for q = ∞. (5.6)

Remark 5.2. Note that from Definition 5.1 follows that the whole scale of spaces Asq(X) for 0 < q ≤ ∞satisfies the requirement (5.4).

Under appropriate conditions on the sequence T of approximating sets, it can be shown that Asq(X)coincide with more classical function spaces [9]. Here we recall the main result about approximation inSobolev spaces Ht, cf. [8, 9, 28]:

Proposition 5.3 ([51, Corollary 4.2]). Under appropriate approximation and regularity requirements onthe underlying wavelet basis Ψ of some Sobolev space Ht(Ω), the following holds:

(i) For linear approximation in Ht(Ω), the corresponding approximation space for q = 2 is given byAs2(Ht(Ω)) = Hsd+t(Ω).

(ii) For N-term approximation with the spaces T_n = Σ_n, the corresponding approximation space for q = 2 is given as A^s_q(H^t(Ω)) = B^{sd+t}_τ(L_τ(Ω)), where τ and s are related via τ = (s + 1/2)^{−1}.

The main conclusion of this section, following Proposition 5.3, is: if the target object v has higher smoothness in the Besov scale B^{sd+t}_τ(L_τ(Ω)) than in the Sobolev scale H^{sd+t}(Ω), then the nonlinear approximation methods pay off compared with the uniform space refinement. In other words, if for some fixed values of d, t and some function f, the value

sup {s > 0 : f ∈ B^{sd+t}_τ(L_τ(Ω))} (5.7)

is higher than the value

sup {s > 0 : f ∈ H^{sd+t}(Ω)}, (5.8)

then the rates (5.4) attained by the nonlinear approximation in the system {Σ_N}_{N≥0}, (5.2), are higher than the rates obtained by the linear approximation in the system {S_N}_{N≥0}, (5.1).

5.2 Best N-term Approximation in �2

Given a Riesz basis Ψ = {ψλ}λ∈∇ in X, we can use the norm equivalence (1.3) in order to replace theapproximation in X by the approximation in the corresponding coefficient space �2.


Assume that we are looking for some vector v_ε approximating v = (v_λ)_{λ∈∇} in ℓ2(∇) with some given accuracy ε > 0. For some N ≥ 0, the most economical way to form the approximation v_N is to take the N coefficients of v with the largest absolute values, and to replace the remaining coefficients by zero. Then one looks for the smallest N = N(ε, v) such that

‖v − v_N‖ ≤ ε. (5.9)

Such a vector v_N is usually called the best N-term approximation of v (given some accuracy ε). It is obvious that v_N minimizes the ℓ2-distance among all vectors w_N with at most N nonzero coefficients, i.e.

‖v − vN‖�2 := σN (v) := inf {‖v −wN‖�2 : #supp(wN ) ≤ N}. (5.10)

Consequently, v_N is the vector of the expansion coefficients of v_N ∈ X in terms of the basis Ψ which minimizes the distance to the nonlinear set Σ_N, i.e.

‖v_N − v‖_X = inf_{w∈Σ_N} ‖w − v‖_X. (5.11)

As in the case of nonlinear approximation in X, we are particularly interested in the elements v ∈ �2

with the following decay of the best N -term approximation error σN (v):

σN (v) ≤ C(v)N−s (5.12)

for some s > 0.

Remark 5.4. If some vector v ∈ �2 satisfies the decay condition (5.12), then the number N = N(ε,v)of the nonzero coefficients of the best N -term approximation vN must be bounded by a fixed multiple ofε−1/s for any ε > 0, as it follows from the estimate (5.9).
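The quantities N(ε, v) and v_N from (5.9)-(5.10) can be computed directly by ordering the entries by modulus; a short numpy sketch with the hypothetical helper best_n_term:

import numpy as np

def best_n_term(v, eps):
    # smallest N and corresponding v_N with ||v - v_N||_2 <= eps, cf. (5.9)-(5.10)
    order = np.argsort(np.abs(v))[::-1]                  # entries by decreasing modulus
    a2 = np.abs(v[order]) ** 2
    # sigma_N(v)^2 = sum of all but the N largest squared entries
    sigma2 = np.concatenate([np.cumsum(a2[::-1])[::-1], [0.0]])
    N = int(np.argmax(np.sqrt(sigma2) <= eps))
    vN = np.zeros_like(v)
    vN[order[:N]] = v[order[:N]]
    return N, vN

v = np.array([5.0, -0.1, 3.0, 0.01, -2.0, 0.2])
N, vN = best_n_term(v, eps=0.5)
print(N, vN)     # N = 3: only the entries 5.0, 3.0, -2.0 are kept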

All vectors in ℓ2 satisfying the decay condition (5.12) form the approximation spaces A^s := A^s_∞(ℓ2), defined as in (5.5), endowed with the equivalent (quasi-)norm

‖v‖_{A^s} := sup_{N≥0} (N + 1)^s σ_N(v), σ_0(v) := ‖v‖_{ℓ2}. (5.13)

It turns out that the approximation spaces As coincide with the weak �τ spaces [29]

ℓ^w_τ := {v ∈ ℓ2 : |v|_{ℓ^w_τ} := sup_{n≥1} n^{1/τ} |γ_n(v)| < ∞}, 0 < τ < 2, (5.14)

where by γ_n(v) we denote the n-th largest coefficient in modulus of v. The expression |·|_{ℓ^w_τ} is a quasi-seminorm on ℓ^w_τ, since the triangle inequality holds only up to a τ-dependent constant C1(τ):

|v + w|�wτ ≤ C1(τ)(|v|�wτ + |w|�wτ ), v,w ∈ �wτ . (5.15)

The continuous and dense embeddings

�τ ↪→ �wτ ↪→ �τ+δ, 0 < δ < 2− τ, (5.16)

are the reason why ℓ^w_τ are called weak ℓ_τ spaces. Moreover, the norms ‖ · ‖_{A^s} and ‖ · ‖_{ℓ^w_τ} := ‖ · ‖_{ℓ2} + |·|_{ℓ^w_τ} are equivalent, with

|v|_{ℓ^w_τ} ∼ sup_{N≥1} N^s σ_N(v), v ∈ ℓ^w_τ. (5.17)

Especially v ∈ �wτ implies the decay condition (5.12), where the smallest possible value of C(v) is |v|�wτ .


We recall a sufficient condition for the estimate (5.12), cf. [28, 8, 58], based on the characterization ofthe Besov spaces (1.42).

Theorem 5.5. Let Ψ be a wavelet Riesz basis of H^t(Ω), Ω ⊂ R^d, with order of vanishing moments m. Moreover, let the wavelets Ψ be sufficiently smooth, so that they are contained in B^{sd+t}_τ(L_τ(Ω)), and s ≤ 1/2 if t < −d/2. Under the assumption

0 < s < (m− t)/d, (5.18)

the following equivalence between the regularity of the function u and its wavelet coefficients u holds:

u ∈ �τ ⇔ u ∈ Bsd+tτ (Lτ (Ω)), τ = (s+ 1/2)−1. (5.19)

Moreover, from the embedding (5.16) we get the implication

u ∈ Bsd+tτ (Lτ (Ω)), τ = (s+ 1/2)−1 ⇒ σN (u) � N−s, N ≥ 0. (5.20)

Remark 5.6. In the context of the adaptive approximation of the operator equation Lf = g, Theorem5.5 imposes a smoothness condition on the solution f . In a specific application we will have to verifythe condition (5.19) and assure that the smoothness of the solution f in the Besov scale Bsd+tτ (Lτ (Ω)) ishigher than the corresponding smoothness in the Sobolev scale Hsd+t(Ω).

Remark 5.7. [58, Rem. 4.3] In case that the wavelets are globally C^r-functions for some r ∈ N_0 ∪ {−1}, with r = −1 meaning that they are piecewise smooth, it is known that they are contained in B^ν_τ(L_τ(Ω)) whenever ν < r + 1 + 1/τ. For spline wavelets with r = m − 2 this means that if the following condition

(m − t)/d ≥ 1/2 (5.21)

holds, then the smoothness of the wavelets does not limit the range (5.18).


Chapter 6

Adaptive Iterative Scheme for

Well-Posed Problems

This chapter provides a brief introduction into the adaptive methods for solving well-posed inverse prob-lems. It is mainly based on the works of [51, 58].In Section 6.1 we give an equivalent �2-formulation of an operator equation Lf = g for some boundedlyinvertible operator L : H → H ′ in terms of a Hilbert frame for the solution space H. We show how thesolution of the equivalent �2-problem can be approximated by means of Richardson iteration, and recallcorresponding convergence result.In Section 6.2 we introduce a convergent adaptive approach from [58] for solving well-posed operatorequations, which is essentially based on the nonlinear approximation of the Richardson iteration.The detailed description of the whole scheme will be given in Section 6.3.

6.1 Frame Based Setting

Let L : H → H′ be a bounded self-adjoint linear operator with bounded inverse L^{−1}. Let Ψ = {ψλ} be a Gelfand frame for the Gelfand triple (H, L2(Ω), H′) with a corresponding Gelfand triple of sequence spaces (ℓ_{2,D^t}, ℓ2, ℓ_{2,D^{−t}}). Then it follows from Proposition 2.1 that D^{−t}Ψ is a (Hilbert) frame for H. We consider the operator equation

Lf = g, (6.1)

or, in case L is not self-adjoint, the normal equation

L∗Lf = L∗g, (6.2)

which is equivalent to (6.1) for invertible L.Discretization of (6.1) with respect to the frame D−tΨ yields an �2-system

Sf = g, (6.3)

where

S := D−tFLF ∗D−t = D−t〈LΨ,Ψ〉TD−t (6.4)

is the system or stiffness matrix of operator L, f is a frame representation of f with

f = F ∗D−tf (6.5)


and

g = D−tFg (6.6)

is the discretized right hand side.S is a bounded operator from �2 to �2, moreover, if L is a boundedly invertible operator from H to H′,then S is boundedly invertible on R(S) = R(D−tF ), cf. [51, Lem. 5.1].In the sequel, we denote by Q : �2 → R(S) the orthogonal projector onto R(S).

Proposition 6.1. Let L : H → H′ be a symmetric bounded linear operator with bounded inverse L−1.Let 0 < a < 2/‖S‖L(�2).The unique solution f in R(S) of the �2-system (6.3) can be computed in infinite series form as

f = a Σ_{n=0}^{∞} (I − aS)^n g. (6.7)

Moreover, a convergent iterative approximation on R(S) of f is made by the Richardson iteration

fn+1 = fn − a(Sfn − g), n ≥ 0, f0 ∈ R(S). (6.8)

Proof. The proof, cf. [51, 58], is based on the fact that I − aS is a contraction on R(S) with

q(a,S) := ‖I − aS|R(S)‖ = max {aλmax − 1, 1− aλmin} < 1, (6.9)

where λ_max, λ_min are the maximal and the minimal eigenvalues of S|_{R(S)}, which are strictly positive. The minimal value of the contraction parameter q(a, S) is given by

q_min(S) := min_{a>0} q(a, S) = (λ_max − λ_min)/(λ_max + λ_min), (6.10)

cf. [58]. Note that, since g, f_0 ∈ R(S), the iterates f_{n+1} in (6.8) stay in R(S).

Let f = Qf be the result of the infinite series (6.7), or equivalently, the limit of the Richardson iteration (6.8). The unique solution f of the well-posed operator equation (6.1) can be computed by f = F∗D^{−t}f = F∗D^{−t}Qf.
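For a symmetric positive definite matrix S the damped Richardson iteration (6.8) with the contraction-optimal step from (6.9)-(6.10) reads, in a schematic numpy version (on R(S) the restricted operator S|_{R(S)} is positive definite; for simplicity the sketch assumes this for S itself):

import numpy as np

def richardson(S, g, n_iter=200, a=None):
    # damped Richardson iteration (6.8) for S f = g, starting from f_0 = 0
    lam = np.linalg.eigvalsh(S)                  # eigenvalues in ascending order
    if a is None:
        a = 2.0 / (lam[-1] + lam[0])             # minimizes q(a, S), cf. (6.10)
    f = np.zeros(S.shape[0])
    for _ in range(n_iter):
        f = f - a * (S @ f - g)                  # f_{n+1} = f_n - a (S f_n - g)
    return f

The adaptive scheme of the following sections replaces the exact applications of S and g in this loop by the approximate routines APPLY, RHS and COARSE.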

6.2 Adaptive Discretization of the Operator Equation

Given some approximation accuracy ε > 0, the idea of the adaptive scheme is to calculate the bestN = N(f , ε)-term approximation fN with (5.9),(5.10) on the solution f on R(S) of the system (6.3).Since in practical applications the solution f can only be approximated, e.g. by the Richardson iteration(6.8), the aim of the adaptive frame based iteration [58] is to replace the n-th exact iteration step fn byits best N(fn, εn)-term approximation, for some appropriately chosen accuracy εn, so that the adaptiveapproximation converges towards f with n→∞.

Definition 6.2. Let {ε^C_n}_{n≥0}, {ε^A_n}_{n≥0}, {ε^R_n}_{n≥0} be a-priori known positive sequences with ε^{C,A,R}_n → 0 as n → ∞. The adaptive Richardson iteration is given by

f^{(n+1)} = COARSE_{ε^C_n}[f^{(n)} − a(APPLY_{ε^A_n}[Sf^{(n)}] − RHS_{ε^R_n}[g])], n ≥ 0, (6.11)


where APPLY, RHS and COARSE are adaptive routines for the approximate application of the operator matrix S, the adaptive approximation of the right hand side g, and adaptive thresholding, respectively. The notation ROUTINE_ε[INPUT] means that the routine ROUTINE approximates the input INPUT with accuracy ε, i.e.

‖ROUTINEε[INPUT]− INPUT‖�2 ≤ ε, ε > 0. (6.12)

The detailed realization of the building blocks will be given in Section 6.3. In what follows, we use theabbreviations

COARSE_n := COARSE_{ε^C_n}, APPLY_n := APPLY_{ε^A_n}, RHS_n := RHS_{ε^R_n}. (6.13)

The approximation accuracies ε^C_n, ε^A_n and ε^R_n have to be chosen appropriately in order to ensure the convergence of the scheme (6.11). A description of the convergent scheme based on (6.11) will be given in Section 6.3. Another important issue is the computational complexity. As we have seen in Section 5.2, a desired decay of the error made by the best N-term approximation is N^{−s} for some s > 0. Given some accuracy ε > 0, the optimal number of coefficients N should then be at most of order ε^{−1/s}. We recall the definition of optimal computational complexity from [58]:

Definition 6.3. Let v_ε be the output vector of an algorithm ROUTINE_ε[v] → v_ε for some v ∈ ℓ2, ε > 0. Let N(v, ε) be defined by (5.9)-(5.10). If the number of nonzero coefficients of v_ε as well as the number of computation steps needed to calculate the output can be bounded by some fixed multiple of N(v, ε), or, equivalently, by some multiple of ε^{−1/s}|v|^{1/s}_{ℓ^w_τ} for some 0 < s ≤ s∗, then we say that the algorithm ROUTINE performs with optimal computational complexity.

In Chapter 5 we already discussed which functions f can be approximated efficiently, i.e. with optimalcomplexity by the best N -term approximation, cf. Theorem 5.5 and when this kind of nonlinear approx-imation pays off compared to the linear one. In the following section we give a detailed description of theadaptive scheme, i.e. of the building blocks COARSE, APPLY and RHS, and recall the requirementson the system components L and g, which permit construction of an efficient adaptive routines.

6.3 Frame Based Adaptive Solver

The numerical calculation of the best N -term approximation of some vector v with a finite number ofnonzero coefficients (in practical numerical calculation we will always assume the number of inputs to befinite) consists in finding the N largest by modulus coefficients of v and requires complete sorting of thenonzero components by their absolute value. In this case one needs #supp(v) log(#supp(v)) operations.Another approach, suggested in [3, 58], shows that it is possible to compute the largest CN componentsof v, where C is a fixed constant, with order optimal computational complexity, i.e. so that the number ofcomputational steps is also bounded by some constant multiple of N . All the adaptive routines, presentedin this Section, use a different sorting technique, the bin sorting, in order to avoid the additional log-factorin the complexity estimate.

Adaptive Thresholding

As we have seen in Section 5.2, only the vectors from �wτ (5.14) with τ = (s + 1/2)−1, s > 0 can beefficiently approximated by the best N -term approximation, i.e. with the rate N−s.Note that �wτ has another characterization


ℓ^w_τ = {v ∈ ℓ2 : #{λ : 2^{−(j+1)} < |v_λ| ≤ 2^{−j}} ≲ 2^{jτ}, j ∈ Z}. (6.14)

Therefore, instead of complete sorting, we use a natural way to sort the entries of v by their dyadic order

V_i := {(λ, v_λ) : 2^{−(i+1)} < |v_λ|/‖v‖_{ℓ2} ≤ 2^{−i}}, (6.15)

which is usually called bin sorting. In order to get rid of small coefficients, and therefore to increase the adaptivity effect, an adaptive thresholding procedure COARSE_ε[v] → v_ε with

‖vε − v‖ ≤ ε (6.16)

for v ∈ �2 with finitely many nonzero coefficients is defined as follows [58]:

Algorithm 6.4 (COARSEε[v] → vε).

• Set q := ⌈log((#supp(v))^{1/2} ‖v‖_{ℓ2} / ε)⌉.

• Use the bin sorting (6.15) to build the sets Vi, 0 ≤ i < q and put the remaining elements into theset Vq.

• Create vε by successively extracting elements from V0, V1 and so on, until the accuracy requirement

‖v − vε‖�2 ≤ ε (6.17)

holds.
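A schematic realization of Algorithm 6.4 could look as follows; the bin boundaries follow (6.15), and the dyadic logarithm in the choice of q is an assumption of this sketch.

import numpy as np

def coarse(v, eps):
    # adaptive thresholding COARSE_eps[v] via bin sorting, cf. Algorithm 6.4
    nrm = np.linalg.norm(v)
    if nrm <= eps:
        return np.zeros_like(v)
    q = int(np.ceil(np.log2(np.sqrt(np.count_nonzero(v)) * nrm / eps)))
    bins = [[] for _ in range(q + 1)]
    for idx in np.flatnonzero(v):
        # bin i collects entries with 2^{-(i+1)} < |v_lambda| / ||v|| <= 2^{-i}, cf. (6.15)
        i = int(np.floor(-np.log2(np.abs(v[idx]) / nrm)))
        bins[min(max(i, 0), q)].append(idx)
    v_eps = np.zeros_like(v)
    err2 = nrm ** 2                               # squared error ||v - v_eps||^2
    for b in bins:                                # extract from the largest bins first
        for idx in b:
            if err2 <= eps ** 2:
                return v_eps
            v_eps[idx] = v[idx]
            err2 -= np.abs(v[idx]) ** 2
    return v_eps

Within each bin no ordering is required, which is what avoids the logarithmic factor of a complete sort.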

Remark 6.5. For vectors v ∈ �2 with finitely many nonzero elements, the adaptive thresholding routineCOARSE fulfills the optimality requirements of Definition 6.3, as shown in [58].

The following fact is important concerning the adaptive thresholding of vectors in �wτ :

Remark 6.6. [58, Prop 3.2] Let 0 < θ < 1/3 be fixed, τ ∈ (0, 2) and τ = (s + 1/2)^{−1}. Then, for any ε > 0, v ∈ ℓ^w_τ, and a finitely supported approximation u ∈ ℓ2 with ‖v − u‖_{ℓ2} ≤ θε, the output w := COARSE_{(1−θ)ε}[u] fulfills ‖v − w‖_{ℓ2} ≤ ε, and the number of significant entries in w as well as the number of computational steps is bounded by ε^{−1/s}|v|^{1/s}_{ℓ^w_τ}.

The proof of the last statement can be found in [10, 58]. As a consequence, there is a constant C2(τ),only depending on τ , such that

|w|�wτ ≤ C2(τ)|v|�wτ . (6.18)

Approximate Input Data

To get an adaptive approximation of the right hand side of the system (6.3), we assume as in [58] thatthe wavelet expansion coefficients g = 〈g,D−tΨ〉T are known, at least so that in practice g can beapproximated in �2 by some gε, containing finitely many expansion coefficients, up to any given accuracyε > 0:

‖g − gε‖�2 ≤ ε. (6.19)


Definition 6.7. We call the input vector g ∈ ℓ^w_τ s∗-optimal, if for some s∗ > 0 there exists a routine RHS_ε[g] → g_ε which satisfies (6.19) and

#supp(g_ε) ≲ |g|^{1/s}_{ℓ^w_τ} ε^{−1/s}, s ≤ s∗, (6.20)

moreover, the associated computational expense is of the same order as #supp(gε).

Since the right hand side of (6.3) can be written as g = D^{−t}Fg, we sometimes use the notation RHS_ε[g] := RHS_ε[D^{−t}Fg], if a frame expansion of g in terms of a fixed frame is meant. It seems very natural to use the adaptive thresholding routine COARSE to compose g_ε. A sufficient condition for (6.20) could then be that the right hand side g belongs to some space ℓ^w_τ, τ = (s + 1/2)^{−1}, for all 0 < s ≤ s∗, cf. Remark 6.6, or that the data g itself belongs to a specific Besov space, cf. Theorem 5.5.

Adaptive Operator Discretization

Another important question is the adaptive approximation of the operator matrix S in the matrix equa-tion (6.3), or, more precisely, the adaptive calculation of the matrix-vector product Sfn in (6.11) by theroutine APPLY.As in the case of adaptive thresholding and input data approximation, we aim to construct a routineAPPLYε[Sf ] → wε, approximating the matrix-vector product Sf for any finite input vector f withsome given accuracy ε > 0:

‖Sf −wε‖�2 ≤ ε, (6.21)

which is optimal in the sense of Definition 6.3.The notation of the routine APPLY can vary in different contexts, we will also use APPLY[S,f ] or,for simplicity of notation APPLY[f ] since S is usually a fixed matrix.It has been shown in [58] that the matrix-vector product Sf can be approximated by APPLY with orderoptimal computational complexity, if the operator matrix S is quasi-sparse, i.e. it can be approximatedwell by sparse matrices. In this context we recall the compressibility definition from [10]:

Definition 6.8. We call a bounded operator S ∈ L(ℓ2) s∗-compressible, when for each j ∈ N there exists an infinite matrix S_j with at most α_j 2^j nonzero entries in each row and column, with Σ_{j∈N} α_j < ∞, such that for any s < s∗ we have

‖S − S_j‖ ≤ C_j, (6.22)

and the constants C_j fulfill Σ_{j∈N} C_j 2^{js} < ∞.

Proposition 6.9. The s∗-compressibility definition is satisfied by Lemarie matrices (2.20)-(2.21) with

s∗ = min{σ/d − 1/2, β/d − 1}, (6.23)

if σ > d/2, β > d.

Proof. For the proof see [10].

Estimates of the form (2.20)-(2.21) are known to hold as soon as the underlying primal wavelets aresufficiently smooth and possess adequate cancelation properties, and the operator L is continuous betweencertain Sobolev spaces, see Theorem 2.5. From Proposition 6.9 follows the s∗-compressibility for specificstiffness matrices, satisfying the Lemarie property:


Corollary 6.10. Let L, D^{−t}Ψ, D^tΨ satisfy the conditions of Theorem 2.5 with some m, γ > 0. Then the stiffness matrix S = 〈LD^{−t}Ψ, D^{−t}Ψ〉 is s∗-compressible with

s∗ = min{τ, γ − t, t + m}/d − 1/2. (6.24)

Remark 6.11. As we will see in Remark 6.14, the value s∗ defines the range 0 < s ≤ s∗, where APPLYsatisfies Definition 6.3, and therefore yields approximation rates of order N−s.

Remark 6.12. In [59], it has been shown that the requirement s∗ ≤ (γ − t)/d − 1/2 can be relaxed to s∗ > (m − t)/d. Since 0 < s < (m − t)/d is the range for the best N-term approximation with the rate N^{−s}, cf. (5.18), the adaptive scheme with s∗ > (m − t)/d does not limit the convergence rates prescribed by the (Besov) regularity of the solution.

Having an s∗-compressible matrix S at hand, we shall work with the following version of APPLY from[58]:

Algorithm 6.13 (APPLYε[S,v] → wε).

• Set q := ⌈log(2(#supp(v))^{1/2} ‖v‖_{ℓ2} ‖S‖_{L(ℓ2)} / ε)⌉, where ⌈x⌉ denotes the smallest integer larger than or equal to x.

• Regroup the elements of v into the sets V0, . . . , Vq as in Algorithm 6.4.

• Find the smallest integer l ≥ 0 with

‖S‖_{L(ℓ2)} ‖v − Σ_{k=0}^{l} v_{[k]}‖_{ℓ2} ≤ ε/2, (6.25)

where for k = 0, . . . , l − 1 the vectors v_{[k]} are generated by successively extracting 1 element for k = 0 or 2^{k−1} elements for 0 < k < l, respectively, from the set sequence V_i, 0 ≤ i ≤ q. For k = l, extract successively at most 2^k − 2^{k−1} elements until v_{[l]} satisfies (6.25).

• Compute the smallest j ≥ l such that

Σ_{k=0}^{l} C_{j−k} ‖v_{[k]}‖_{ℓ2} ≤ ε/2. (6.26)

• For 0 ≤ k ≤ l, compute the columns of the matrices Sj−k, defined by the positions of the elementsof the vectors v[k] and calculate

w_ε := Σ_{k=0}^{l} S_{j−k} v_{[k]}. (6.27)
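The following simplified sketch mimics Algorithm 6.13 for a dense matrix S: the entries of v are blocked by decreasing modulus, and coarser compressed versions of S are used for the blocks of smaller entries. The column-wise compression rule and the direct measurement of the compression errors (instead of the a-priori bounds C_j) are assumptions of this sketch.

import numpy as np

def apply_adaptive(S, v, eps):
    # schematic APPLY_eps[S, v] in the spirit of Algorithm 6.13
    norm_S = np.linalg.norm(S, 2)
    supp = np.flatnonzero(v)
    if supp.size == 0:
        return np.zeros(S.shape[0])
    # block the entries of v by decreasing modulus into pieces of size 1, 1, 2, 4, ...
    order = supp[np.argsort(np.abs(v[supp]))[::-1]]
    blocks, pos, k = [], 0, 0
    while pos < order.size:
        size = 1 if k == 0 else 2 ** (k - 1)
        blocks.append(order[pos:pos + size])
        pos, k = pos + size, k + 1
        if norm_S * np.linalg.norm(v[order[pos:]]) <= eps / 2:   # cf. (6.25)
            break
    l = len(blocks) - 1

    def compressed(j, cols):
        # S_j keeps, per column, the 2^j entries of largest modulus (a stand-in for (6.22))
        Sj = np.zeros((S.shape[0], len(cols)))
        for c, col in enumerate(cols):
            keep = np.argsort(np.abs(S[:, col]))[::-1][:2 ** j]
            Sj[keep, c] = S[keep, col]
        return Sj

    j = l
    while j < l + 40:
        # increase j until the compression error term is below eps/2, cf. (6.26)
        err = sum(np.linalg.norm(S[:, blk] - compressed(j - k, blk), 2) * np.linalg.norm(v[blk])
                  for k, blk in enumerate(blocks))
        if err <= eps / 2:
            break
        j += 1

    w = np.zeros(S.shape[0])
    for k, blk in enumerate(blocks):
        w += compressed(j - k, blk) @ v[blk]      # w_eps = sum_k S_{j-k} v_[k], cf. (6.27)
    return w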

Remark 6.14. For the proof of the estimate (6.21), as well as of the order optimal computational complexity of the routine APPLY for 0 < s ≤ s∗, see [58, Prop. 3.8].

The Main Algorithm

Based on Richardson iteration (6.8), used to approximate the solution f of a well-posed �2-problem (6.3),and the building blocks COARSE, RHS and APPLY, a convergent frame-based approximation of(6.8) for the well-posed case can be formulated, which essentially consists of approximate iteration step(6.11). We recall a possible realization of the adaptive scheme from [58].


Algorithm 6.15 (SOLVE_ε[S, g] → f_ε). Let ε > 0 be given. Let θ < 1/3 and K ∈ N be fixed such that

3q(a,S)^K < θ. (6.28)

Set i := 0, v^{(0,0)} := 0,

ε_0 := ‖(S|_{R(S)})^{−1}‖_{L(ℓ2)} ‖g‖_{ℓ2}. (6.29)

While ε_i > ε
    ε_i := (3q(a,S)^K / θ) ε_{i−1}
    ε^A_i := ε^R_i := θ ε_i / (6aK)
    ε^C_i := (1 − θ) ε_i
    For j = 1, . . . , K
        v^{(i,j)} := v^{(i,j−1)} − a(APPLY_i[S v^{(i,j−1)}] − RHS_i[g])
    end
    v^{(i+1,0)} := COARSE_i[v^{(i,K)}]
    i := i + 1
end
f_ε := v^{(i,0)}.

Note that although the exact iterates of (6.8) stay in R(S) for appropriately chosen initial value f0,the output of the iteration (6.11) need not be contained in R(S) due to the perturbations made by theadaptive approximation. Therefore, convergence of SOLVE can be expected only on R(S). In order toget convergence on �2, a modified routine modSOLVE, containing an additional projector on R(S) canbe developed, see [58] for details.

Theorem 6.16 ([51, Theorem 5.4]). In the situation of Proposition 6.1, let f ∈ ℓ2 be a solution of (6.3), so that f = F∗D^{−t}f is the unique solution of the well-posed problem (6.1). Then SOLVE_ε[S, g] produces finitely supported vectors v^{(i,j)} with

‖Q(f − v^{(i+1,0)})‖_{ℓ2} ≤ ε_i, i ≥ 0. (6.30)

In particular, one has

‖f − F∗D^{−t}f_ε‖_H ≤ ‖F∗‖_{L(ℓ_{2,D^t},H)} ε. (6.31)

For the analysis of complexity we recall the following result from [58]:

Theorem 6.17 ([58, Theorem 3.12]). For some s∗ > 0 assume that S is s∗-compressible, g is s∗-optimal, and that for some s ∈ (0, s∗), τ = (s + 1/2)^{−1}, the system Sf = g has a solution f ∈ ℓ^w_τ. Moreover, assume that there exists an s̄ ∈ (s, s∗) such that for τ̄ = (s̄ + 1/2)^{−1} the operator Q is bounded on ℓ^w_{τ̄}. If the parameter K in SOLVE is sufficiently large, so that

3q(a,S)^K < θ min{1, (C1(τ̄)C2(τ̄)|(I − Q)|_{L(ℓ^w_{τ̄})})^{s/(s̄−s)}}, (6.32)

where C1(·) and C2(·) are as defined in (5.15) and (6.18), respectively, then for all ε > 0 the routine SOLVE works with order optimal computational complexity, i.e. the output f_ε has at most #supp(f_ε) ≲ ε^{−1/s}|f|^{1/s}_{ℓ^w_τ} nontrivial entries, and the number of arithmetic operations needed to compute f_ε stays bounded by at most a multiple of #supp(f_ε).

Remark 6.18 (The requirement f ∈ ℓ^w_τ). If D^{−t}Ψ is a Riesz basis in H, then regularity estimates of the form f ∈ B^{sd+t}_τ(L_τ(Ω)) with τ = (s + 1/2)^{−1} are equivalent to f = 〈f, D^tΨ〉^T ∈ ℓ_τ, where D^tΨ is the dual Riesz basis in H′. As a consequence, we get f ∈ ℓ^w_τ without further assumptions on Ψ.


In case of frames, the decay of the expansion coefficients 〈f, ψλ〉 of f with respect to some (not necessarily canonical) dual frame Ψ will be sufficiently fast, if Ψ is smooth and has an appropriate number of vanishing moments.

56

Part IV

Adaptive Regularization Methods

57

Chapter 7

Approximate Filter Based Methods

This chapter is concerned with approximate regularization methods, which can be obtained by the point-wise approximation of the classical regularization methods. Let Rα be a regularization of the generalizedinverse L† : Y → X. Let the mapping Rα,ε := Rα,ε(gδ) define a pointwise approximation on Rα, s.t. forany ε > 0, gδ ∈ Y holds

‖Rα,ε(gδ)gδ −Rαgδ‖ ≤ ε. (7.1)

Note that the mapping Rα,ε is nonlinear, since it depends on the space variable gδ. The aim of approxi-mate regularization methods is to find a pair of parameters (α, ε) depending on the noise level δ and onthe measured data gδ such that

α(δ, gδ) → 0 and ε(δ, gδ) → 0 as δ → 0, (7.2)

and

Rα,ε(gδ)gδ → L†g as δ → 0 for all g ∈ D(L†). (7.3)

Furthermore, the order optimality concept can be adapted to the parameter pair (α, ε) in order to achievethe same order of convergence as can be obtained by the order optimal regularization Rα.In Section 7.1 we apply the concept of approximate regularization to the general filter based methodsand derive order optimal a-priori parameter choice rules based on the μ-notion and in Hilbert scales.In Section 7.2 we derive an approximate version of the Morozov’s discrepancy principle and point out thegap between the approximation of the discrepancy and of the total error, which makes impossible theorder optimal choice of the accuracy ε. Moreover, if fδα,ε is a sparse approximation of fδα, the discrepancyterm ‖Lfδα,ε − gδ‖ still requires full application of the operator L. A further sparse approximation of thediscrepancy term might cause additional error terms which have also to be considered. However, as wewill see in Chapter 9, Morozov’s discrepancy principle can still be applied to the approximate Landweberiteration in order to achieve a regularization result.In Section 7.3 we suggest an alternative a-posteriori parameter choice method, based on the balancingprinciple, cf. Section 4.1.3. Unlike the Morozov’s discrepancy principle, the approximate balancingprinciple yields order optimal estimates of the total error up to the qualification μ0, which is of importancefor the methods with finite qualification, e.g. for Tikhonov regularization. Another advantage of theapproximate balancing principle is that it is based on the calculation of the distances ‖fδαj ,ε − fδαj−1,ε‖between two subsequent regularizations. These distances are easy to implement and do not need anyfurther approximation once the terms fδαj ,ε are sparse. In Chapter 8 we apply the approximate balancingprinciple in order to develop an order optimal adaptive Tikhonov regularization.

59

7.1 Approximate Filter Based Regularization

Let L : X → Y be a linear bounded operator between some Hilbert spaces X and Y . In order to solvethe ill-posed problem

Lf = g, (7.4)

given some measurement gδ of g with

‖gδ − g‖ ≤ δ, (7.5)

we suggest an approximate realization of the standard filter based regularization method Rα, defined by

Rαgδ := Fα(L∗L)L∗gδ, (7.6)

and alternatively, of the filter based regularization in Hilbert scales Rα, given as

Rαgδ := Fα(B−sL∗L)B−sL∗gδ, (7.7)

cf. Section 4.2 for details.Generally, we will show that a regularization of (7.4)-(7.5) can be given by an approximation Rα,ε of Rα,satisfying the estimate

‖Rα,ε(gδ)gδ −Rαgδ‖ ≤ ε (7.8)

with some appropriately chosen accuracy ε > 0.

Theorem 7.1. Let fδα := Rαgδ be given by the method (7.6), such that the requirements of Proposition

4.3 on the filter function Fα are fulfilled. Let f† satisfy the μ-source condition (3.13)-(3.14) for someμ, ρ > 0.Let fδα,ε := Rα,ε(gδ)gδ be an approximation to fδα with some accuracy ε > 0, i.e. (7.8) holds.If α = α(δ) is chosen a-priori as in (4.9) and ε = ε(δ) satisfies

ε ∼ δ2μ

2μ+1 ρ1

2μ+1 , (7.9)

then Rα,ε is an order optimal regularization of the ill-posed problem (7.4)-(7.5).

Proof. Using the result of Proposition 4.3 and the assumptions (7.8),(7.9), we can estimate the total erroras follows:

‖fδα,ε − f†‖ ≤ ‖fδα,ε − fδα‖+ ‖fδα − f†‖ � δ2μ

2μ+1 ρ1

2μ+1 . (7.10)

An analogous result to the stated in Theorem 7.1 can be obtained for the filter based regularization inHilbert space.

Theorem 7.2. Let the operator Rα, defined in (7.7) fulfill the requirements of Proposition 4.16 and letRα,ε be an approximation of Rα as in (7.8). Moreover, let f†, L satisfy the assumptions (4.30) and(4.31) with some p, a ≥ 0, ρ > 0.If ε = ε(δ) is chosen a-priori, such that

ε ∼ ρa+rp+a δ

p−rp+a , (7.11)

then Rα,ε defines an order optimal regularization in Hilbert scales.

Proof. For the proof we use the order optimality of the method (7.7) and the estimates (7.8),(7.11).

60

7.1.1 Approximate Wavelet Based Regularization

Let Ω ⊂ Rd be some bounded domain, Ψ = {ψλ}λ∈∇, Ψ = {ψλ}λ∈∇ be a frame for L2(Ω) as in Sec-

tion 4.2.1.

Let the Hilbert scale Xr on �2(∇) be generated by B := D2, D defined in (1.38) with norm ‖ · ‖r :=‖Dr · ‖�2 .Recall that in Section 4.2.1 we obtained the equivalence (4.41) between the Sobolev norm ‖f‖Hr(Ω) andthe Hilbert scales norm ‖f‖r of the frame representation f of f .

Consider again an equivalent wavelet based formulation

Af = g, ‖gδ − g‖ ≤ δ, (7.12)

of the ill-posed equation (7.4) with A = LF ∗ : �2 → Y and generalized solution f †.

Theorem 7.3. Let Rα be a filter based regularization (4.43) in Hilbert scales with some filter functionFα of the wavelet formulation (7.12).

Let the filter function Fα fulfill the assumptions of Proposition 4.16 for some β0 > 0.

Let f† satisfy the Sobolev smoothness condition

‖f†‖Hp(Ω) < ρ (7.13)

for some p, ρ > 0, and let the operator L be continuous in the following sense:

‖f‖H−a(Ω) ∼ ‖Lf‖Y . (7.14)

Suppose that for any accuracy ε > 0 there exists an approximation f δα,ε of f δα with respect to the r-normsuch that

‖f δα,ε − f δα‖r ≤ ε. (7.15)

Let α = α(δ) and ε = ε(δ) be chosen a-priori as in (4.35) and (7.11), respectively.

Then for p ≤ 2s + a the method Rα,ε given by Rα,ε(gδ)gδ := fδα,ε := F ∗f δα,ε defines an order optimalapproximate regularization approach to (7.4)-(7.5), i.e.

‖fδα,ε − f†‖Hr(Ω) � ρa+rp+a δ

p−rp+a for all r ∈ [max {p− 2β0(s+ a),−a}, p]. (7.16)

Proof. The assertion follows from Theorem 4.18 and the definition of the approximate regularization interms of wavelets (7.15),(7.11). In fact, with the triangle inequality, we yield

‖fδα,ε − f†‖Hr(Ω) ≤ ‖fδα,ε − fδα‖Hr(Ω) + ‖fδα − f†‖Hr(Ω)

(4.41)∼ ‖f δα,ε − f δα‖r + ‖fδα − f†‖Hr(Ω)

(7.15)

≤ ε+ ‖f δα − f †‖Hr(Ω)

(7.11),(4.46)

� ρa+rp+a δ

p−rp+a . (7.17)

61

7.2 Approximate Morozov’s Discrepancy Principle

In the current section we formulate a modified version of the Morozov’s discrepancy principle for theapproximate filter based regularization (7.8).Recall that Morozov’s discrepancy principle defines the optimal value of the regularization parameter αMfor some regularization method Rα as a solution of the following equation:

‖Lfδα − gδ‖ = τδ, τ > 1. (7.18)

If only an approximation fδα,ε on fδα is known with ‖fδα,ε − fδα‖ ≤ ε, the exact discrepancy ‖Lfδα − gδ‖can be estimated by the value of the approximate discrepancy ‖Lfδα,ε − gδ‖ as follows:

∣∣∣‖Lfδα − gδ‖ − ‖Lfδα,ε − gδ‖∣∣∣ ≤ ‖Lfδα − Lfδα,ε‖ ≤ ‖L‖ε. (7.19)

In order to use the order optimality of the exact filter based method Rα combined with Morozov’sdiscrepancy principle (7.18), we have to assume ε � δ.

Theorem 7.4. Let

ε∗ ≤ cδ with 0 < c <τ

2‖L‖ . (7.20)

Let the discrepancy principle for Rα,ε be defined as follows:

α∗ := max {α > 0 : (τ − c‖L‖)δ ≤ ‖Lfδα,ε∗ − gδ‖ ≤ (τ + c‖L‖)δ}. (7.21)

Then for the method Rα∗,ε∗ we obtain the order optimal convergence of the total error, i.e.

‖fδα∗,ε∗ − f†‖ � ρ

12μ+1 δ

2μ2μ+1 , 0 < μ ≤ μ0 − 1/2, (7.22)

where μ0 is the qualification of the method Rα. Here we implicitly assume that ‖f†‖μ ≤ ρ for someμ > 0, ρ > 1.

Proof. From the definition (7.18) of αM , the estimate (7.19) and the requirement (7.20) we infer thatαM satisfies

‖LfδαM ,ε∗ − gδ‖ ≤ ‖LfδαM

− gδ‖+ ‖L‖ε∗δ ≤ (τ + c‖L‖), (7.23)

analogously

‖LfδαM ,ε∗ − gδ‖ ≥ ‖LfδαM

− gδ‖ − ‖L‖ε∗ ≤ (τ − c‖L‖)δ. (7.24)

From the definition (7.37) of α∗ follows then αM ≤ α∗.By the same argument, we can show that α∗ satisfies the inequality

(τ − 2c‖L‖)δ ≤ ‖Lfδα − gδ‖ ≤ (τ + 2c‖L‖)δ. (7.25)

Taking the maximum over all α in (7.25), we get α∗∗, which also yields order optimal regularizationRα∗∗ . By definition, α∗∗ is greater or equal α∗. It follows that, due to the fact that αM ≤ α∗ ≤ α∗∗, theregularization Rα∗ is order optimal as well.The last statement can be shown by means of the μ-notion. As we know, if f† satisfies the sourcecondition ‖f†‖μ ≤ ρ for some μ, ρ > 0, then the order optimal parameter behave as α ∼ (δ/ρ)

22μ+1 . Now

with αM , α∗∗ ∼ (δ/ρ)2

2μ+1 we obtain the same behavior for α∗.For the total error we get by the triangle inequality

62

‖fδα∗,ε∗ − f†‖ ≤ ‖fδα∗,ε∗ − f

δα∗‖+ ‖fδα∗ − f

†‖(7.8)

≤ ε∗ + ‖fδα∗ − f†‖

(7.20)

≤ cδ + c1ρ1

2μ+1 δ2μ

2μ+1 (7.26)0<δ<1,ρ>1

� ρ1

2μ+1 δ2μ

2μ+1 .

Remark 7.5. As we can see from (7.26), the estimates of the approximate term ‖fδα∗,ε∗ − fδα∗‖ and ofthe classical total error ‖fδα∗ − f†‖ are of different order with respect to the noise level δ. In general, itwould be sufficient to approximate fδα∗ by fδα∗,ε with accuracy ε ∼ ρ

12μ+1 δ

2μ2μ+1 , as it has been done by the

a-priori choice (7.9). An approximation with higher accuracy ε ∼ δ, as proposed in Theorem 7.4, meansmore computational effort and is therefore not optimal.

In the next section we suggest an alternative a-posteriori parameter choice method for the approximateregularization Rα,ε, based on the application of the balancing principle, cf. Section 4.1.3.

7.3 Order Optimality by Approximate Balancing Principle

Let Rα : Y → X be a regularization of the generalized inverse L†. Suppose that for any ε > 0 and gδ ∈ Ythere exists a pointwise approximation Rα,ε(gδ) : Y → X of Rα satisfying (7.8).The main idea of the parameter identification by using the balancing principle is to find the intersectionpoint αopt of the estimates ω(α) and δξ(α) of the approximation error ‖fα − f†‖ and of the data error‖fδα − fα‖, respectively:

ω(αopt) = δξ(αopt). (7.27)

Obviously, αopt is the maximum value satisfying

ω(α) ≤ δξ(α). (7.28)

From the implicit definition (7.27) one immediately obtains(ωξ

)(αopt) = δ, and hence

αopt =(ωξ

)−1(δ). (7.29)

From the definition (7.27) of αopt we infer the following estimate of the total error for the regularizationRαopt

:

‖fδαopt− f†‖ ≤ 2ω(αopt) = 2ω

((ωξ

)−1(δ)). (7.30)

In this context, a choice α is called order optimal, if the regularization Rα yields following convergencerates:

‖fδα − f†‖ � ω((ωξ

)−1(δ)). (7.31)

Concerning the approximate regularization Rα,ε, our aim is to specify an a-posteriori parameter choicefor the parameter pair (α, ε), providing the same order of convergence, i.e.

‖fδα,ε − f†‖ � ω((ωξ

)−1(δ)). (7.32)

63

Following [49], we look for an approximation of αopt from the discrete set ℵ, defined for some α0 > 0,q > 1, N ∈ N as follows:

ℵ := {αi : αi = α0 · qi, i = 0, . . . , N}. (7.33)

In what follows we will focus on the a-posteriori parameter choice rule (4.26) for the regularization Rα,which identifies an order optimal parameter choice α++ by the following rule:

α++ := max {αi ∈ ℵ : ‖fδαj− fδαj−1

‖ ≤ 4δξ(αj−1), j = 1, . . . , i}, (7.34)

for the motivation cf. Section 4.1.3. All the results of this section can be proven analogously for theparameter choice rule formulated in (4.24).The order optimality of α++ in the sense of the implicit definition (7.31) has been shown in [49], see alsoProposition 4.10.In what follows, we formulate the approximate balancing principle for the parameter pair (α, ε), and showthe order optimality of the resulting approximate regularization Rα,ε.

Theorem 7.6 (Approximate Balancing Principle). Let Rα,ε be an approximation on Rα satisfying (7.8).Let ω and ξ be positive on R+, monotone continuous estimators from (4.14) and (4.15), respectively.Moreover, let ξ satisfy the strong Δ2-condition (4.25) for some κ1 > κ2 > 1.If the approximation accuracy ε = ε(δ, α) is chosen such that

ε(δ, α) = δξ(α), (7.35)

and α = αε(δ) is given by the approximate balancing principle as follows:

αε(δ) := max0≤i≤N

{αi : ‖fδαj ,ε − f

δαj−1,ε‖ ≤ 6δξ(αj−1), j = 1, . . . , i

}, (7.36)

then Rαε,ε is an order optimal regularization in the sense of Definition 4.8, i.e. it fulfills the estimate(7.32).

Proof. Let α∗ be the maximal value from the discrete set ℵ, satisfying (7.28), i.e.

α∗ := max {α ∈ ℵ : ω(α) ≤ δξ(α)}. (7.37)

Note that, for αi ≤ α++ with triangle inequality follows

‖fδαi,ε − fδαi−1,ε‖ ≤ ‖fδαi,ε − f

δαi‖+ ‖fδαi

− fδαi−1‖+ ‖fδαi−1

− fδαi−1,ε‖(7.8),(7.34)

≤ ε(αi, δ) + 4δξ(αi−1) + ε(αi−1, δ)(7.35)

≤ δξ(αi) + 4δξ(αi−1) + δξ(αi−1)

Due to the monotony of ξ, i.e. ξ(αi) ≤ ξ(αi−1) we get for αi ≤ α++

‖fδαi,ε − fδαi−1,ε‖ ≤ 6δξ(αi−1), (7.38)

so that, by the definition (7.36) of αε holds α++ ≤ αε.On the other hand, in the proofs of [49, Thm. 1.1, Thm. 1.2] it has been shown that α∗ ≤ α++.Introducing the notation αl := αε, αk := α∗, consider

64

‖fδαε,ε − f†‖ ≤ ‖fδα∗,ε − f

†‖+l∑

j=k+1

‖fδαj ,ε − fδαj−1,ε‖

(7.36)

≤ ‖fδα∗,ε − fδα∗‖+ ‖fδα∗ − fα∗‖+ ‖fα∗ − f†‖+

l∑j=k+1

6δξ(αj−1)

(7.8),(4.14),(4.15)

≤ ε(α∗, δ) + δξ(α∗) + ω(α∗) +l−k−1∑i=0

6δξ(α∗qi)

(7.35),(7.37)

≤ 3δξ(α∗) +l−k−1∑i=0

6δξ(α∗qi) (7.39)

Since q > 1, there exists m ∈ N0 with 2m ≤ q ≤ 2m+1. From the successive application of (4.25) oneobtains

ξ(qiα∗) ≤ ξ(2imα∗) ≤1κim2

ξ(α∗) ≤κ2

κlog2(q

i)2

ξ(α∗) (7.40)

and

ξ(αi+1) = ξ(qαi) > ξ(2m+1αi) ≥1

κm+11

ξ(αi) ≥1

κlog2q+11

G(αi), (7.41)

Using the fact that κlog2 q2 > 1 we infer from (7.39)

‖fδαε,ε − f†‖

(7.40)

≤ 3δξ(α∗) + 6κ2δξ(α∗)l−k−1∑i=0

( 1

κlog2 q2

)i

≤ δξ(α∗)[3 +

6κ1+log2 q2

κlog2 q2 − 1

]= c1δξ(αk)

(7.41)

≤ c1

√κ

log2 q+11 δξ(αk+1)

(7.37)

≤ c1

√κ

log2 q+11 δξ(αopt)

(7.28)= c2ω(αopt)

(7.29)= c2ω

((ωξ

)−1δ), (7.42)

where c1 := 3 + 6κ1+log2 q2

κlog2 q2 −1

, c2 := c1

√κ

log2 q+11 .

65

66

Chapter 8

Adaptive Tikhonov Regularization

This chapter is concerned with the adaptive approximation of the Tikhonov regularization method, cf.Section 4.3, for the approximate solution of the linear ill-posed problem

Lf = g, ‖gδ − g‖ ≤ δ. (8.1)

The numerical realization of the proposed adaptive Tikhonov method is based on the adaptive solver forthe well-posed problems, cf. Chapter 6, for some recent works in this area see [10, 14, 15, 51, 58]. As wehave seen in Chapter 5, adaptive approximation performs a kind of thresholding of the frame coefficientsof the approximating functions, in order to get rid off the non-significant components of the signal. Ofcourse such an approximation makes sense only if the solution f† of the ill-posed equation (8.1) is sparseor can be good approximated by sparse functions. In this context, adaptivity is closely related withsparsity.A prominent example of sparse regularization has been obtained in [21], where the regularization of theleast squares functional is made by adding a weighted �p-norm of the wavelet coefficients in some basisΦ = {φλ}:

minf‖Lf − gδ‖2 + α

∑λ

wλ|〈f, φλ〉|p. (8.2)

Following [21], in the case 1 ≤ p ≤ 2 the solution of the minimization problem (8.2) can be approximatedby the soft shrinkage iteration

fn+1 =∑λ

Sαwλ,p[(fn − L∗(Lfn − gδ))λ]φλ, (8.3)

with the soft shrinkage operator

Sαw,p(x) =

⎧⎨⎩

sign(x)[|x| − αw2 ]+ p = 1

G−1αw,p(x) 1 < p ≤ 2

(8.4)

and Gαw,p given by

Gαw,p(x) = x+αwp

2sign(x)|x|p−1. (8.5)

The iterative soft shrinkage (8.3) promotes sparsity, especially in the case p = 1, since all components xwith |x| < αw

2 are set to zero. In case p = 2, the soft shrinkage operator does not perform any thresholdingat all, but only a rescaling of the coefficients, as we can see from (8.5).

67

However, the numerical realization of the soft shrinkage iteration (8.3) requires calculation of the expres-sion fn −L∗(Lfn − gδ), which is in general very expensive, if the operator L does not possesses a sparserepresentation.In [5], an adaptive approximation of the soft shrinkage iteration (8.3) has been analyzed, with the aimto prove the strong convergence of the adaptive iteration

fn+1 =∑λ

Sλw,p[(fn − [L∗([Lfn]h − gδ)]h∗)λ

]φλ, (8.6)

where [Lf ]h, [L∗f∗]h∗ are approximations on Lf and L∗f∗ with accuracies h and h∗, respectively. Thestrong convergence of (8.6) has been shown to hold for h, h∗ < δ. It means that for the small valuesof δ the approximating accuracies h, h∗ have initially to be very small, even in the first iteration steps.In our approach we will derive an adaptive scheme with variable approximation accuracies, which al-low to approximate the coarse components of the solution at the begin of iteration, and to refine theapproximation by adding the finer components at further steps.The adaptive scheme developed in this chapter relies on the Tikhonov regularization, which correspondsto the minimization problem (8.2) with p = 2 and wλ = 1. The corresponding normal equation

(L∗L+ αI)f = L∗gδ (8.7)

is evidently well-posed, so that we can apply the adaptive solver from Chapter 6. As we will see inChapter 8, the adaptive solver approximates the solution fδα of the normal equation (8.7) iteratively by

fδ,n+1α = APPROXεn

[fδ,nα − αL∗(Lfδ,nα − gδ)], (8.8)

where the notation APPROXε[f ] means that ‖APPROXε[f ]− f‖ ≤ ε. The abbreviation APPROXencodes the adaptive building blocks APPLY for the operator application L∗Lfδ,nα , RHS for the adaptiveapproximation of the right hand side L∗gδ and COARSE for the adaptive thresholding of the resultingiteration step, cf. Chapter 6 for details. The approximation accuracies εn in the well-posed schemebuild a decreasing geometric sequence, which better fits the idea of the subsequent requirement than theapproximation with a constant accuracy as in the adaptive approach (8.6). Due to the well-posedness ofthe regularized normal equation (8.7), the solution fδα can be approximated by fδ,nα with any arbitraryaccuracy ε. Since εn → 0, the iteration stops when εn reaches the required value ε > 0, and thereforeperfectly fits the setting of Chapter 7 about approximate filter based regularization.The parameter verification question follows the setting of approximate regularization. In Section 8.2we specify the a-priori parameter choice rules for the adaptive choice of the parameter pair (α, ε) withrespect to the μ-notion and in Hilbert scales.For the well-posed problems, satisfying specific quasi-sparsity requirements, the adaptive scheme (8.8)performs with order optimal computational complexity, as shown in [58]. It means that for the recon-struction with some approximation error ε the number of needed computational steps is at most a fixedmultiple of the number N , defined by the best N -term approximation with the accuracy ε. However, incase of Tikhonov regularization we deal with a family of problems (8.7), which depend on the regulariza-tion parameter α. Due to the presence of noise, it is in general not possible to approximate the solutionwith the arbitrary accuracy ε. To insure the regularization property of Rα,ε in (8.8), both parameters αand ε have to decrease with decreasing noise level δ. It means that the number of iteration steps neededto achieve the accuracy ε increases with δ → 0. In Section 8.3 we estimate the number of the iterationsteps required for the order optimal adaptive regularization.In Section 8.4 we essentially apply the results of Section 7.3 about the balancing principle for theapproximate filter based regularization.

68

8.1 Adaptive Tikhonov Regularization: Formulation

Due to the well-posedness of the operator (L∗L+αI) for α > 0 we can solve the operator equation (8.7)using the adaptive scheme from Chapter 6.First we recall the frame-based formulation of (8.7):

Sαf = gδ, (8.9)

where Sα = F (L∗L + αI)F ∗ is the operator matrix of (L∗L + αI) in terms of a frame Ψ of X, f is aframe representation of f = F ∗f , and gδ = FL∗gδ is a discretized right hand side.Let SOLVEε be an adaptive routine for solving well-posed problems defined by Algorithm 6.15, then

‖f δα,ε − f δα‖ ≤ ε, (8.10)

where f δα is a solution of (8.9) and

f δα,ε := SOLVEε[Sα, gδ] (8.11)

is an approximation of f δα by the adaptive routine SOLVE.

Definition 8.1. We call the function fδα,ε with the frame representation (8.11) adaptive Tikhonov ap-proximation of the (generalized) solution f† of the ill-posed problem (3.1).Accordingly, the method Rα,ε defined as

Rα,ε(gδ)gδ := fδα,ε := F ∗SOLVEε[Sα, gδ] (8.12)

is called adaptive Tikhonov method.

8.1.1 Regularization Result

Theorem 8.2. Let the regularization parameter α(δ) satisfy the requirements of Proposition 4.19, i.e.

α(δ) → 0 andδ2

α(δ)→ 0, for δ → 0. (8.13)

Moreover, let the final accuracy ε = ε(δ) be chosen so that

ε→ 0 as δ → 0. (8.14)

Then, the adaptive Tikhonov method Rα,ε, defined in (8.12) is a regularization of L†, i.e.

Rα,ε(gδ)gδ → L†g, as δ → 0. (8.15)

Proof. Obviously, the total approximation error ‖Rα,ε(gδ)gδ − L†g‖ can be estimated as follows:

‖Rα,ε(gδ)gδ − L†g‖ ≤ ‖Rα,ε(gδ)gδ −Rαgδ‖+ ‖Rαgδ − L†g‖. (8.16)

The adaptive approximation error ‖Rα,ε(gδ)gδ − Rαgδ‖ tends to zero for ε → 0 due to the definition

of SOLVE, cf. (8.10)-(8.11). The second part ‖Rαgδ − L†g‖ is the regularization error made by thestandard Tikhonov regularization Rα, and it tends to zero due to Proposition 4.19.

69

8.2 Order Optimality by A-priori Parameter Choice

8.2.1 Standard Filter Based Method

Theorem 8.3. Let f† satisfy the source condition (3.13)-(3.14) for μ ∈ (0, 1], ρ > 0. Then the adaptiveTikhonov method Rα,ε (8.12) is an order optimal regularization for μ ∈ (0, 1] with α = α(δ) chosena-priori as in (4.57) and ε = ε(δ) chosen as

ε ∼ δ2μ

2μ+1 ρ1

2μ+1 , (8.17)

i.e.

‖f δα,ε − f †‖ � δ2μ

2μ+1 ρ1

2μ+1 . (8.18)

Proof. The order optimality property of the adaptive Tikhonov method (8.12) follows from Theorem 7.1.

8.2.2 A-priori Error Estimate in Hilbert scales

Let A = LF ∗. Applied to the discretized problem

Af = g, ‖g − gδ‖ ≤ δ, (8.19)

an alternative to the adaptive Tikhonov regularization (8.12) is given by the adaptive approximation ofthe solution f δα of the Tikhonov regularization problem in Hilbert scales

(A∗A+ αD2s)f = A∗gδ, (8.20)

cf. (4.58).Since for s > 0 the operator D2s is unbounded, the whole operator A∗A + αD2s is unbounded too.Therefore, we restrict our consideration to the case s = 0, i.e. we get the standard Tikhonov regulariza-tion, discretized in terms of wavelets (8.9). Nevertheless, the current approach allows us to handle theSobolev type source condition, which can be useful in practical realizations.We define the adaptive approximation fδα,ε as in (8.12).As we know from Section 4.3.5, the Tikhonov filter function Fα satisfies requirements of Proposition 4.16with β0 = 1. Therefore, we can apply Theorem 7.3 for the adaptive Tikhonov method Rα,ε defined in(8.12).

Theorem 8.4. Let Fα be the Tikhonov filter function (4.54). Let f† satisfy the Sobolev type sourcecondition ‖f†‖Hp(Ω) ≤ ρ for some p, ρ > 0. Let the operator L satisfy the Sobolev continuity assumption‖f‖H−a(Ω) ∼ ‖Lf‖Y for some a > 0.Then, for p ≤ a, the adaptive Tikhonov approach Rα,ε is an order optimal regularization of the ill-posedproblem (7.4)-(7.5), if α = α(δ) and ε = ε(δ) are chosen a-priori so that

α ∼( δρ

)2a/(p+a)

(8.21)

and

ε ∼ ρa+tp+a δ

p−tp+a , (8.22)

i.e. it holds the error estimate

‖fδ(α,ε) − f†‖Ht(Ω) � ρa+tp+a δ

p−tp+a for all t ∈ [max {p− 2a,−a}, p]. (8.23)

70

Proof. The assertion follows directly from Theorem 7.3 for s = 0 and β0 = 1.

Remark 8.5. For the regularization in L2(Ω), we get with r = 0 the convergence order O(δp

p+a ). Com-pared with the order optimal rates δ

2μ2μ+1 in terms of the μ-norm, we yield that the two methods coincide

if 2μ = p/a. The requirement p ≤ 2a is then equivalent to the requirement μ ≤ 1, i.e. the qualification ofthe Tikhonov method in Hilbert scales can be described by p/a ≤ 2.

8.3 Complexity of Adaptive Tikhonov Regularization

The number of iterations, needed to achieve an order optimal ε as in (8.17), can be estimated by thefollowing

Theorem 8.6. Let Rα,ε, α, ε be as in Theorem 8.3. Let K and n be the number of the inner andouter loop steps of the routine SOLVE, defined by Algorithm 6.15, respectively. Then, to satisfy therequirement 3q(a,Sα)K < θ, cf. (6.28), the number K should be at least of the order

K � α−1 as α→ 0. (8.24)

Moreover, the minimal number of the outer steps nmin, needed to achieve the optimal accuracy ε from(8.17), is bounded by

nmin � log(α−1) as α→ 0, (8.25)

supposed that q(a,Sα) in Algorithm 6.15 is set equal to qmin(Sα), defined as in (6.10).The total number (nmin ·K) of inner iteration steps needed to get an order optimal regularization scheme(8.12) is of the order

nminK ∼ α−1 log(α−1) as α→ 0. (8.26)

Proof. First, we show the estimate (8.24). Note that from (6.10) follows that

qmin =λmax − λmin

λmax + λmin=‖Sα‖ − cα‖Sα‖+ cα

= 1− 2cα‖Sα‖+ cα

, (8.27)

with a constant c := c(Ψ) > 0, which does not depend on α.Then, the condition 3q(a,Sα)K < θ implies

3qKmin < θ, (8.28)

whereas with 0 < qmin < 1 follows

K > logqmin

θ

3=

log θ3

log(qmin)=

log θ3

log(1− 2cα

‖Sα‖+cα) . (8.29)

Since log(1− x) ∼ −x, as x→ 0, we get from (8.29)

K � 12cα

‖Sα‖+cα∼ 1α, as α→ 0. (8.30)

Now, consider the number n of outer iteration steps of SOLVE.From Algorithm 6.15 we get that the stopping index n is connected with the stopping accuracy εn asfollows:

εn :=3q(a,Sα)K

θεn−1 =

(3q(a,Sα)K

θ

)nε0. (8.31)

71

where for ε0 defined in (6.29) holds

ε0 := ‖(Sα|R(Sα))−1‖‖gδ‖ =

1λmin

‖gδ‖ ∼ 1α. (8.32)

From (8.31) we get that n is minimal for q(a,Sα) = qmin(Sα) =: qmin, so that

nmin := log 3qKminθ

(εnmin

ε0

)=

log( εnminε0

)

log(3qKminθ )

. (8.33)

To get order optimality, nmin should be chosen such that εnmin ∼ δ√α, with α = α(δ) satisfying (4.57).

Using (8.32) we get

εnmin

ε0∼ δ√

αα = δ

√α ∼ α

2μ+12 + 1

2 = αμ+1. (8.34)

We get from (8.33) and (8.30) that

nmin ∼logαμ+1

K log qmin� logαμ+1

1α log

(1− 2cα

‖Sα‖+cα) ∼ α(μ+ 1) logα

− 2cα‖Sα‖+cα

∼ − logα = log(α−1). (8.35)

Analogously, we get for nminK

nminK ∼ logαμ+1

log qmin∼ logαμ+1

log(1− 2cα

‖Sα‖+cα) ∼ (μ+ 1) logα

− 2cα‖Sα‖+cα

∼ α−1 log(α−1). (8.36)

8.4 Approximate Application of Balancing Principle

Theorem 8.7. Let Rα,ε be an adaptive Tikhonov approximation (8.12). Let ε = ε(α, δ) be chosenaccording to

ε(α, δ) =δ√α, (8.37)

and let α = α(δ, gδ) satisfy the following balancing principle:

α := max {αi ∈ ℵ : ‖fαj ,εj − fαj−1,εj−1‖ ≤6δ

√αj−1

, j = 1, . . . , i}. (8.38)

Then, the adaptive Tikhonov method Rα,ε is order optimal regularization in the sense of Definition 4.8.

Proof. We use the fact that the regularization error ‖fα − f†‖ in Tikhonov case can be estimated byδξ(α) with ξ(α) = 1√

α. Such a function ξ satisfies the κ-assumption (4.25) with κ1 = κ2 =

√2. The

assertion follows from Theorem 7.6.

72

Chapter 9

Adaptive Landweber Iteration

Many iterative methods exhibit a “self-regularizing property”, in the sense that early termination has aregularizing effect. In this case the termination index plays the role of the regularization parameter, andthe stopping rule plays the role of the parameter choice method. Here we emphasize on the Landweberiteration, first introduced by L. Landweber in [37], and the appropriate parameter choice rules. Later on,we recall some results on the modified Landweber iteration, attained by R. Ramlau in [52], which providea framework for the further consideration on adaptive Landweber, cf. Section 9.2.In this chapter we combine the classical Landweber iteration, cf. Section 4.4, with the adaptive discre-tization in terms of wavelet frames, presented in Section 6.2, in order to obtain an adaptive frame-basedLandweber regularization for linear ill-posed inverse problems.In Section 9.3 we show that the adaptive Landweber iteration supplied by an appropriate adaptive a-priori parameter choice rule, fulfills the regularization property. An adaptive formulation of Morozov’sdiscrepancy principle is given in Section 9.4. The regularization proof in the a-posteriori case is basedon the monotonic behavior of the approximation error, cf. Section 4.4 for the classical and modifiedLandweber, and follows the lines of [52, 53].

9.1 Modified Landweber Iteration

In numerical realizations, continuous operator equations Lf = g are usually approximated by discreteones. Typically a discretization consists in projection onto some finite dimensional subspaces, which canbe linear or nonlinear, cf. Section 6.2, where we will present an adaptive discretization. In order to dealwith perturbations of the operator L, e.g. in the case of numerical approximation, a modification of theLandweber method was suggested in [52].Suppose that a bounded linear operator L with ‖L‖ ≤ 1 can be approximated by a family of linearbounded operators Lm such that

‖Lm − L‖ ≤ εm, (9.1)

where {εm}m≥0 is a monotonically decreasing sequence with εm → 0 as m → ∞. Then, the modifiedLandweber iteration is given by

fδk = fδk−1 + βL∗rδ(k−1)(gδ − Lrδ(k−1)fk−1), k ≥ 1. (9.2)

An a-priori parameter choice rule, which in the case of iterative methods is usually defined by truncatingthe iteration at some step k∗, can be formulated for a fixed approximation index rδ(k) = r0 as well as

73

for a variable index rδ(k). In both cases, the order optimality of (9.2) can bee shown, see [52, Satz 15,Satz 20] for details.Let f† be the generalized solution of Lf = g, with ‖f†‖ ≤ ρ, ρ > 0. Following [52], an a-posterioriparameter choice rule for the modified Landweber iteration (9.2) is formulated as

Definition 9.1 (Modified Discrepancy Rule for Modified Landweber). Let 0 < c < 1, c1 > 0, cL :=supn ‖Ln‖2, 0 < βL < min {2/‖L‖2, infn {2/‖Ln‖2}}.

• for k = 1 set r(0) = 0

• ifc‖gδ − Lrδ(k−1)fk−1‖ >

22− βLcL

(δ + ρεrδ(k−1)), (9.3)

then set rδ(k) = rδ(k − 1)

• ifc‖gδ − Lrδ(k−1)fk−1‖ ≤

22− βLcL

(δ + ρεrδ(k−1)), (9.4)

then

– ifc‖gδ − Lrδ(k−1)fk−1‖ >

22− βLcL

δ and εrδ(k−1) > c1δ, (9.5)

then if l > rδ(k − 1) with εl > c1δ and

c‖gδ − Llfk−1‖ >2

2− βLcL(δ + ρεl) (9.6)

exists, then set rδ(k) = l

– if (9.5) is not fulfilled or if l with (9.6) does not exist, set the stopping index k∗ = k

Remark 9.2. As in the classical Landweber case, the decreasing behavior of the approximation error‖fδk − f†‖ for k ≤ k∗, cf. Proposition 4.28, has been shown in [52, Satz 23].

Remark 9.3. The approximation accuracies εrδ(k), k ≤ k∗ are chosen in every iteration step k accordingto Definition (9.1). Due to the condition rδ(k) ≥ rδ(k − 1) the accuracies εrδ(k) get finer during theiteration, nevertheless the iteration works with the same accuracy εrδ(k), rδ(k) = rδ(k − 1) as long as(9.3) holds, i.e. the refinement of the approximation accuracy is made “on demand”. The conditionεl > c1δ guarantees that the next refinement index rδ(k) can be found at a finite number of steps.

Remark 9.4. The convergence and order optimality of the modified Landweber (9.2) with the a-posterioritruncation rule, given in Definition 9.1 are shown in [52, Satz 28,Satz 29].

In Chapter 9 we use the adaptive discretization from Section 6.2 for the adaptive formulation of Landwe-ber iteration, introduce in an analogous way adaptive a-priori and a-posteriori parameter choice ruleswith appropriate approximation strategies rδ(k) and assure the regularization property of the adaptivediscretization approach, presented in Section 6.2.

9.2 Adaptive Formulation of Landweber Iteration

Let L : X → Y be linear bounded operator between Hilbert spaces X,Y , and let the equation (9.48) beill-posed, with noisy data gδ, ‖g − gδ‖ ≤ δ.For any gδ ∈ D(L†) consider the �2-formulation of the normal equation with noisy data gδ

74

L∗Lf = L∗gδ (9.7)

in terms of a frame Φ for the Hilbert space X with corresponding frame operator F :

Sf = gδ, S := S(L∗L,Φ) := FL∗LF ∗, f = F ∗f , gδ := FL∗gδ. (9.8)

Note that both formulations (9.7) and (9.8) are equivalent, since f = F ∗f solves (9.7) for any solution f

of (9.8) and vice versa.Based on the discrete normal equation (9.8), we formulate a discrete version of Landweber iteration

f δm+1 = f δm − βS(Sf δm − gδ), 0 < βS < 2/‖S‖. (9.9)

Note that the classical Landweber iteration (4.61) follows from (9.9) by application of the adjoint frameoperator F ∗:

fδm+1 = F ∗f δm+1. (9.10)

We formulate an adaptive approximation of the frame based formulation of the Landweber iteration (9.9)using the same building blocks as in the well-posed case, cf. Section 6.2, i.e. the routines COARSE,RHS and APPLY for the adaptive thresholding, the adaptive approximation of the right hand side andthe adaptive operator application, respectively.Let {εCn }n≥0, {εRn }n≥0, {εAn }n≥0 be positive, strictly monotone sequences with ε

{C,R,A}n → 0 as n → ∞.

As in Section 6.2, we use the notation

COARSEn := COARSEεCn, RHSn := RHSεR

n, APPLYn := APPLYεA

n. (9.11)

Note that the operators F , L∗ and consequently the matrix S remain unchanged during the iteration,therefore we usually drop them in the following notation:

RHSn[g] := RHSn[FL∗g], (9.12)

and

APPLYn[f ] := APPLYn[S,f ]. (9.13)

Let the building blocks COARSE, APPLY and RHS be designed so that in the case δ > 0 with vδ → v

as δ → 0 follows

COARSEn[vδ] → COARSEn[v] (9.14)

APPLYn[vδ] → APPLYn[v] (9.15)

RHSn[vδ] → RHSn[v] (9.16)

for any fixed refinement index n. As we will see, the conditions (9.14)-(9.16) require some modificationof the basic routines from Section 6.2, which will be introduced in Section 9.4.

Definition 9.5. Let rδ be a refinement strategy, depending on the iteration step m. Let f0, fδ

0 be somevectors with finitely many nonzero coefficients. We define the adaptive Landweber iteration as follows:

m+1 = COARSErδ(m)[fδ

m − βS(APPLYrδ(m)[fδ

m]−RHSrδ(m)[gδ])], m ≥ 0. (9.17)

Typically, we drop the superscript δ once the exact data g are considered:

75

fm+1 = COARSEr(m)[fm − βS(APPLYr(m)[fm]−RHSr(m)[g])], m ≥ 0. (9.18)

Note that, up to the refinement strategy rδ, the adaptive Landweber iteration (9.17) coincides with theapproximate Richardson iteration (6.11) for solving well-posed problems.The refinement strategy rδ as well as an appropriate truncation index of the iteration (9.17) will bechosen by adaptive parameter choice rules, in order to assure the regularization property without affectingadaptivity.First, we formulate adaptive a-priori truncation rules.

9.3 Order Optimality by A-Priori Parameter Choice

Following Proposition 4.27, the discrete Landweber method (9.9) is an order optimal regularizationmethod for 0 < βS < 2/‖S‖, μ, ρ > 0, if the iteration is truncated at the a-priori chosen iteration index

m∗ =⌊βS

(βSμe) 2μ

2μ+1(ρδ

22μ+1 )⌋

. (9.19)

To get an auxiliary result, we consider the error between the exact version fm of the discrete Landweberiteration and its adaptive approximation fm.

Lemma 9.6. Let for some δ > 0 the iterations f δm and fδ

m be given as in (9.9) and (9.17), respectively.Then, for all m ≥ 0,

‖f δm+1 − fδ

m+1‖ ≤ (1 + βS‖S‖)‖f δm − fδ

m‖+ εCrδ(m) + βS(εArδ(m) + εRrδ(m)) . (9.20)

Proof. This result can be easily deduced, since for any m ≥ 0 one has

‖f δm+1 − f δm+1‖ ≤ ‖f δm+1 − (fδ

m − βSAPPLYrδ(m)[fδ

m] + βSRHSrδ(m)[gδ])‖

+‖f δm − βSAPPLYrδ(m)[fδ

m] + βSRHSrδ(m)[gδ]− (f δm − βSSf δm + βSgδ)‖

≤ εCrδ(m) + ‖f δm − f δm − βSSfδ

m + βSSf δm‖+ βS‖RHSrδ(m)[gδ]− gδ‖

+βS‖APPLYrδ(m)[fδ

m]− βSSfδ

m‖

≤ εCrδ(m) + βS(εArδ(m) + εRrδ(m)) + (1 + βS‖S‖)‖fδ

m − f δm‖ .

By a recursive application of Lemma 9.6, one obtains an explicit description of the difference.

Lemma 9.7. Assume that f δ0 = fδ

0. Then, for all m ≥ 0,

‖f δm+1 − fδ

m+1‖ ≤ βS

m∑i=0

(1 + βS‖S‖)i(εCrδ(m−i)/βS + εArδ(m−i) + εRrδ(m−i)). (9.21)

Proof. Thanks to Lemma 9.6, we have for m = 0

‖f δ1 − fδ

1‖ ≤ εCrδ(0) + βS(εArδ(0) + εRrδ(0)). (9.22)

Now by induction for m→ m+ 1, we apply (9.20) for m ≥ 1 and obtain consequently (9.21).

76

The latter considerations allow now to prove that the truncated adaptive Landweber iteration (9.17) withthe a-priori parameter choice rule (9.19) is an order optimal regularization method, supposed that therefinement map rδ is appropriately chosen.First, we formulate an adaptive a-priori parameter choice with fixed refinement index rδ.

Definition 9.8 (Adaptive a-priori parameter choice rule with fixed refinement).

i) Given error tolerances {ε{C,A,R}m } and routines COARSE, APPLY and RHS defined as above,

ii) for δ > 0 with ‖gδ − g‖ ≤ δ let the truncation index m∗ = m∗(δ, ρ) be given by (9.19),

iii) choose a fixed refinement index rδ(m) = k such that ε{C,A,R}k satisfy

(εCk + βS(εAk + εRk )) ≤ δ2μ/(2μ+1)ρ1/(2μ+1)∑mi=0(1 + βS‖S‖)i

, (9.23)

iv) define the regularization

Rαgδ := F ∗f

δ

m∗ (9.24)

with regularization parameter α = 1/m∗(δ, ρ).

Note that Definition 9.8 is a particular case of the following definition, where the refinement index rδ variesduring the iteration. Hence, we skip the proof of convergency and order optimality of the regularization(9.24) and refer to the variable refinement case.

Definition 9.9 (Adaptive a-priori parameter choice with variable refinement).

i) Given error tolerances {ε{C,A,R}m } and routines COARSE, APPLY and RHS defined as above,

ii) for δ > 0 with ‖gδ − g‖ ≤ δ derive the truncation index m∗ as in (9.19),

iii) define the quantities

Cm,rδ :=m∑i=0

(1 + βS‖S‖)i(εCrδ(m−i) + βS(εArδ(m−i) + εRrδ(m−i))) , (9.25)

iv) choose the sequence rδ such that Cm∗−1,rδ satisfies

Cm∗−1,rδ ≤ δ2μ/(2μ+1)ρ1/(2μ+1) , (9.26)

v) define the regularization

Rαgδ := F ∗f

δ

m∗ (9.27)

with regularization parameter α = 1/m∗(δ, ρ).

Theorem 9.10. Let the truncation index m∗ = m∗(δ, ρ) be chosen as in (9.19) Then, the adaptiveLandweber iteration (9.17) truncated at m∗ and updated with the refinement strategy rδ, as in Definition9.9, yields for α(δ, ρ) = 1/m∗(δ, ρ) a regularization method Rα, which is order optimal for all μ > 0 and0 < βS < 2/‖L‖2.

77

Proof. It follows from Lemma 9.7

‖f δm∗ − fδ

m∗‖ ≤ δ2μ/(2μ+1)ρ1/(2μ+1). (9.28)

Consequently, with Lemma 9.7 and Definition 9.9 iv)

‖Rαgδ − f†‖ ≤ ‖F ∗f δm∗ − F∗f δm∗‖+ ‖fδm∗ − f

†‖≤ (‖F‖+ c0)δ2μ/(2μ+1)ρ1/(2μ+1).

follows that the regularization Rα is order optimal in the sense of Definition 3.4.

9.4 Regularization by A-Posteriori Parameter Choice

As already mentioned in Section 4.4, a possible regularization of the ill-posed operator equation (9.48)is given by truncating Landweber iteration (4.61) according to the stopping rule (4.68). In this Sec-tion we prove that an analogous result holds for the adaptive Landweber iteration (9.17), supplied by anappropriate equivalent of Morozov’s discrepancy principle.

9.4.1 Adaptive Residual Discrepancy

First, we formulate an analogon to the residuum ‖Lf−g‖, using the frame based equivalents of the termsL∗Lf and L∗g and their adaptive approximation by APPLY and RHS.

Definition 9.11. For some g ∈ Y , f ∈ �2 and some integer m ≥ 0 the adaptive residual discrepancyRES is defined by

(RESm[f , g])2 := 〈APPLYm[f ],f〉 − 2〈RHSm[g],f〉+ ‖g‖2. (9.29)

The following lemma compares the adaptive residual discrepancy with the exact one.

Lemma 9.12. For f ∈ �2 with Ff = f , g ∈ H and m ≥ 0 holds

| ‖Lf − g‖2 − (RESm[f , g])2 | ≤ (εAm + 2εRm)‖f‖. (9.30)

Proof. This results follows easily by straightforward computations.

9.4.2 A Monotonicity of Adaptive Iteration

Inspired by the idea of monotonicity of the approximation error, applied in the modified Landweber case,cf. Section 4.4, we formulate a condition which assures the monotonic behavior of f

δ

m−f †, and is basedon the adaptive residual discrepancy RES.

Lemma 9.13. Let δ > 0, 0 < c < 1 and 0 < βS < 2/(3‖S‖). If for some m ≥ 0 a refinement indexrδ(m) exists, such that the approximate discrepancy RESrδ(m)[f

δ

m, gδ] fulfills

c(RESrδ(m)[fδ

m, gδ])2 >

δ2 + Crδ(m)(fδ

m)1− 3

2βS‖S‖, (9.31)

then

‖f δm+1 − f †‖ ≤ ‖f δm − f †‖. (9.32)

The quantity Crδ(m)(fδ

m), defined below in (9.37), depends on the iterate fδ

m, the generalized solution f †,‖S‖, βS and the error tolerances ε{C,A,R}

rδ(m).

78

Proof. At first, we observe that

‖f δm+1 − f †‖2 − ‖f δm − f †‖2 = 〈f δm+1 − fδ

m, fδ

m+1 + fδ

m − 2f †〉

= 〈f δm+1 − fδ

m + βSAPPLYrδ(m)[fδ

m]− βSRHSrδ(m)[gδ], f

δ

m+1 + fδ

m − 2f †〉

+ 〈−βSAPPLYrδ(m)[fδ

m] + βSRHSrδ(m)[gδ], f

δ

m+1 + fδ

m − 2f †〉

= 〈f δm+1 − fδ

m + βSAPPLYrδ(m)[fδ

m]− βSRHSrδ(m)[gδ],

m+1 − fδ

m + βSAPPLYrδ(m)[fδ

m]− βSRHSrδ(m)[gδ]〉

+ 〈f δm+1 − fδ

m + βSAPPLYrδ(m)[fδ

m]− βSRHSrδ(m)[gδ], 2f

δ

m − 2f †〉

+ 2〈−βSAPPLYrδ(m)[fδ

m] + βSRHSrδ(m)[gδ],

m+1 − fδ

m + βSAPPLYrδ(m)[fδ

m]− βSRHSrδ(m)[gδ]〉

+ 〈−βSAPPLYrδ(m)[fδ

m] + βSRHSrδ(m)[gδ],

−βSAPPLYrδ(m)[fδ

m] + βSRHSrδ(m)[gδ] + 2f

δ

m − 2f †〉

≤ (εCrδ(m))2 + 2εCrδ(m)(‖f

δ

m‖+ ‖f †‖)

+ 2〈βSA∗Afδ

m − βSAPPLYrδ(m)[fδ

m] + βSRHSrδ(m)[gδ]− βSA∗gδ,

m+1 − fδ

m + βSAPPLYrδ(m)[fδ

m]− βSRHSrδ(m)[gδ]〉

+ 2〈−βSA∗Afδ

m + βSA∗gδ, f

δ

m+1 − fδ

m + βSAPPLYrδ(m)[fδ

m]− βSRHSrδ(m)[gδ]〉

+ β2S‖APPLYrδ(m)[f

δ

m]−RHSrδ(m)[gδ]‖2︸ ︷︷ ︸

=:T1

+ 2βS〈−APPLYrδ(m)[fδ

m] + RHSrδ(m)[gδ], f

δ

m − f †〉︸ ︷︷ ︸=:T2

≤ (εCrδ(m))2 + 2εCrδ(m)(‖f

δ

m‖+ ‖f †‖) + βS(εArδ(m) + εRrδ(m))εCrδ(m)

+ βS‖FL∗‖‖Lfδm − gδ‖εCrδ(m) + T1 + T2

≤ (εCrδ(m))2 + 2εCrδ(m)(‖f

δ

m‖+ ‖f †‖) + βS(εArδ(m) + εRrδ(m))εCrδ(m)

+12(β2S‖S‖‖Lfδm − gδ‖2 + (εCrδ(m))

2)

+ T1 + T2 , (9.33)

where the abbreviations T1 and T2 are defined as follows

T1 = β2S‖RHSrδ(m)[g

δ]−APPLYrδ(m)[fδ

m]‖

≤ β2S

(‖RHSrδ(m)[g

δ]− FL∗gδ −APPLYrδ(m)[fδ

m] + Sfδ

m‖+ ‖FL∗gδ − Sfδ

m‖)2

≤ 2β2S

((εRrδ(m) + εArδ(m))

2 + ‖S‖‖gδ − Lfδm‖2)

(9.34)

79

and

T2 = 2βS[〈f δm,RHSrδ(m)[g

δ]−APPLYrδ(m)[fδ

m]〉

− 〈f †,RHSrδ(m)[gδ]−APPLYrδ(m)[f

δ

m]〉]

= 2βS[−‖gδ‖2 + 2〈RHSrδ(m)[g

δ], fδ

m〉 − 〈APPLYrδ(m)[fδ

m], fδ

m〉

+ ‖gδ‖2 + 〈FL∗gδ −RHSrδ(m)[gδ], f

δ

m〉 − 〈FL∗gδ, fδ

m〉+ 〈FL∗gδ −RHSrδ(m)[g

δ],f †〉 − 〈FL∗gδ,f †〉

+ 〈APPLYrδ(m)[fδ

m]− Sfδ

m,f†〉+ 〈Sf

δ

m,f†〉]

≤ 2βS[−(RESrδ(m)[f

δ

m, gδ])2 + ‖gδ‖2 + εRrδ(m)‖f

δ

m‖ − 〈gδ, LF ∗fδ

m〉

+ εRrδ(m)‖f †‖ − 〈gδ, LF ∗f †〉+ εArδ(m)‖f †‖+ 〈LF ∗f δm, LF ∗f †〉]

= 2βS[−(RESrδ(m)[f

δ

m, gδ])2 + εRrδ(m)‖f

δ

m‖+ (εRrδ(m) + εArδ(m))‖f †‖

+ ‖gδ‖2 − 〈gδ, LF ∗f δm〉 − 〈gδ, g〉+ 〈LF ∗f δm, g〉]

= 2βS[−(RESrδ(m)[f

δ

m, gδ])2 + εRrδ(m)‖f

δ

m‖+ (εRrδ(m) + εArδ(m))‖f †‖

+ 〈gδ − g, gδ − LF ∗f δm〉]

≤ 2βS[−(RESrδ(m)[f

δ

m, gδ])2 + εRrδ(m)‖f

δ

m‖+ (εRrδ(m) + εArδ(m))‖f †‖

+ ‖gδ − g‖‖gδ − LF ∗f δm‖]

≤ 2βS[−(RESrδ(m)[f

δ

m, gδ])2 + εRrδ(m)‖f

δ

m‖+ (εRrδ(m) + εArδ(m))‖f †‖]

+ βS(δ2 + ‖gδ − Lfδm‖2). (9.35)

Inserting the estimates (9.34) for T1 and (9.35) for T2 into (9.33) finally yields

‖f δm+1 − f †‖2 − ‖f δm − f †‖2

≤ 32(εCrδ(m))

2 + 2εCrδ(m)(‖fδ

m‖+ ‖f †‖) + βS(εArδ(m) + εRrδ(m))εCrδ(m)

+12β2S‖S‖‖Lfδm − gδ‖2

+ 2β2S(εRrδ(m) + εArδ(m))

2 + 2β2S‖S‖‖gδ − Lfδm‖2

− 2βS(RESrδ(m)[fδ

m, gδ])2 + 2βS

(εRrδ(m)‖f

δ

m‖+ (εRrδ(m) + εArδ(m))‖f †‖)

+ βSδ2 + βS‖gδ − Lfδm‖2

=32(εCrδ(m))

2 + 2εCrδ(m)(‖fδ

m‖+ ‖f †‖) + βS(εArδ(m) + εRrδ(m))εCrδ(m)

+ 2β2S(εRrδ(m) + εArδ(m))

2 + 2βS(εRrδ(m)‖f

δ

m‖+ (εRrδ(m) + εArδ(m))‖f †‖)

+ βSδ2 − 2βS(RESrδ(m)[f

δ

m, gδ])2 + βS

(1 +

32βS‖S‖

)‖gδ − Lfδm‖2

≤ βSCrδ(m)(fδ

m) + βSδ2 + βS

(32βS‖S‖ − 1

)(RESrδ(m)[f

δ

m, gδ])2, (9.36)

80

where we have introduced for ease of notation

Crδ(m)(fδ

m) :=3

2βS(εCrδ(m))

2 +2βSεCrδ(m)(‖f

δ

m‖+ ‖f †‖) + (εArδ(m) + εRrδ(m))εCrδ(m)

+ 2βS(εRrδ(m) + εArδ(m))2 + 2

(εRrδ(m)‖f

δ

m‖+ (εRrδ(m) + εArδ(m))‖f †‖)

+(1 +

32βS‖S‖

)(εArδ(m) + 2εRrδ(m))‖f

δ

m‖ . (9.37)

Since (9.31) was assumed, and thanks to (9.36), monotony can be deduced

‖f δm+1 − f †‖2 − ‖f δm − f †‖2 < βS(32βS‖S‖ − 1

)(1− c)(RESrδ(m)[f

δ

m, gδ])2 < 0 . (9.38)

9.4.3 Convergence in Exact Data Case

Lemma 9.13 holds true also for the exact data, i.e. for gδ = g with δ = 0. In this case, fδ

m is replacedby fm and rδ by r and the quantity (9.37) reduces to Cr(m)(fm). Hence, the sufficient condition for themonotone decay of ‖fm − f †‖ in the noise-free case simplifies to

c(RESr(m)[fm, g])2 ≥ Cr(m)(fm)

1− 32βS‖S‖

, (9.39)

which can be achieved if Cr(m)(fm) is chosen accordingly. In order to prove the regularization propertywe follow the idea in [52] and introduce an updating rule (U) for the refinement strategy r, cf. (9.31):

U(i) Let r(0) be the smallest integer ≥ 0 with

c(RESr(0)[f0, g])2 ≥ Cr(0)(f0)

1− 32βS‖S‖

, (9.40)

if r(0) with (9.40) does not exist, stop the iteration, set m0 = 0.

U(ii) if for m ≥ 1

c(RESr(m−1)[fm, g])2 ≥ Cr(m−1)(fm)

1− 32βS‖S‖

, (9.41)

set r(m) = r(m− 1)

U(iii) if

c(RESr(m−1)[fm, g])2 <

Cr(m−1)(fm)1− 3

2βS‖S‖, (9.42)

set r(m) = r(m− 1) + j, where j is the smallest integer with

c(RESr(m−1)+j [fm, g])2 ≥ Cr(m−1)+j(fm)

1− 32βS‖S‖

, (9.43)

U(iv) if there is no integer j with (9.43), then stop the iteration, set m0 = m.

Lemma 9.14. Let δ = 0 (i.e. gδ = g) and {fm}m≥0 be the sequence of iterates defined by the adaptiveLandweber iteration (9.17). Assume that r = r(m) is chosen according to the updating rule (U). Then, ifthe iteration never stops,

81

∞∑m=0

(RESr(m)[fm, g])2 ≤ 1

βS(1− c)(1− 3

2βS‖S‖)‖f0 − f †‖2. (9.44)

If the iteration stops after m0 steps,

m0−1∑m=0

(RESr(m)[fm, g])2 ≤ 1

βS(1− c)(1− 3

2βS‖S‖)‖f0 − f †‖2. (9.45)

Proof. Suppose first that the iteration never stops. Then, updating rule (U) yields for any m ≥ 0

c(RESr(m)[fm, g])2 ≥ Cr(m)(fm)

1− 32βS‖S‖

. (9.46)

Therefore, it follows from (9.38) for δ = 0,

(RESr(m)[fm, g])2 ≤ 1

βS(1− c)( 32βS‖S‖ − 1)

(‖fm − f †‖2 − ‖fm+1 − f †‖2

), (9.47)

and summing with respect to m yields (9.44). If, on the other hand, the iteration terminates after m0

steps, updating rule (9.46) holds for 0 ≤ m ≤ m0 − 1. Consequently, one can apply again (9.47), andsumming over the first m0 summands yields the result.

Combining the monotone decay of the approximation errors, cf. Lemma 9.13, with the uniform bounded-ness of the accumulated residual discrepancies, cf. Lemma 9.14, enables us to prove strong convergenceof the adaptive Landweber iteration (9.17) for exact data g ∈ R(L).

Theorem 9.15. Let L : X → Y be an ill-posed operator between Hilbert spaces X,Y . Consider theinverse problem

Lf = g (9.48)

with exact data g ∈ R(L).Let fm be defined by the adaptive Landweber iteration (9.17) with gδ = g, δ = 0, and refinement strategyr given by updating rule (U). Then, for f0 arbitrarily chosen, the sequence fm converges in norm tosome f †, i.e.

‖fm − f †‖�2 → 0, as m→∞ . (9.49)

where f † is a frame representative of f†, and f† = F ∗f † is a (not necessarily unique) solution of theinverse problem (9.48).The convergence holds true also in the function space, i.e.

‖fm − f†‖X → 0, as m→∞ . (9.50)

where fm := F ∗fm.

Proof. Consider at first the case in which the iteration stops after m0 steps. Thanks to (9.37) it followsthat C(i)(fm0

) → 0 as i→∞. Therefore, according to the updating rule (U), one has for all j ≥ 1

0 ≤ c(RESr(m0−1)+j [fm0, g])2 <

Cr(m0−1)+j(fm0)

1− 32βS‖S‖

→ 0 as j →∞ . (9.51)

Moreover, by Lemma 9.12 formula (9.30) one has for any i ≥ 0

‖Lfm0 − g‖2 ≤ (RES(i)[fm0, g])2 + (εAi + 2εRi )‖fm0

‖ → 0 as i→∞ , (9.52)

82

and therefore g = LF ∗fm0. Consider now the case in which the iteration never stops. For norm

convergence it is sufficient to show that

ek := f † − fk (9.53)

is a Cauchy sequence. From the updating rule (U) and Lemma 9.13 it is clear that ‖ek‖ forms decreasinga sequence. For n ≥ k select an index l = l(k, n) such that k ≤ l ≤ n and that

RESr(l)[f l, g] := mink≤i≤n

{RESr(i)[f i, g]} ≤ RESr(i)[f i, g] for k ≤ i ≤ n. (9.54)

Since

‖en − ek‖ ≤ ‖en − el‖+ ‖el − ek‖ , (9.55)

‖en − el‖2 = ‖en‖2 − ‖el‖2 + 2〈el − en, el〉‖el − ek‖2 = ‖ek‖2 − ‖el‖2 + 2〈el − ek, el〉

and since ‖ei‖ are monotonically decreasing, i.e. there is some ε ≥ 0 with ‖ei‖ → ε as i → ∞, it issufficient to verify

|〈el − en, el〉| → 0 and |〈el − ek, el〉| → 0 for k →∞. (9.56)

Consider

1βS|〈el − ek, el〉| =

1βS|〈fk − f l,f

† − f l〉| ≤1βS

l−1∑i=k

|〈f i+1 − f (i),f† − f l〉|

≤l−1∑i=k

|〈RHSr(i)[g]−APPLYr(i)[S, f (i)],f† − f l〉|

≤l−1∑i=k

[|〈RHSr(i)[g]− g,f † − f l〉|

+|〈Sf (i) −APPLYr(i)[S, f (i)],f† − f l〉|

+|〈g − Sf (i),f† − f l〉|

]

≤l−1∑i=k

[(εRr(i) + εAr(i))‖f † − f l‖+ ‖g − LF ∗f (i)‖‖g − LF ∗f l‖

]

≤l−1∑i=k

[(εRr(i) + εAr(i))‖f † − f (i)‖

+12(‖g − LF ∗f (i)‖2 + ‖g − LF ∗f l‖2)

]

≤l−1∑i=k

[(εRr(i) + εAr(i))‖f † − f (i)‖

+12((RESr(i)[f i, g])

2 + (2εRr(i) + εAr(i))‖f (i)‖

+(RESr(l)[f l, g])2 + (2εRr(l) + εAr(l))‖f l‖)

]From (9.37) one has for all k ≤ i ≤ l − 1,

83

(εRr(i) + εAr(i))‖f † − f (i)‖ ≤ (εRr(i) + εAr(i))(‖f †‖+ ‖f (i)‖) ≤ Cr(i) (9.57)

and

(2εRr(i) + εAr(i))‖f (i)‖ ≤ Cr(i) . (9.58)

Moreover, updating rule (U) yields

Cr(i)(f i) ≤ c(1− 32βS‖S‖)(RESr(i)[f i, g])

2. (9.59)

Consequently, since RESr(l)[f l, g] ≤ RESr(i)[f i, g], estimate (9.57) can be rewritten,

1βS|〈el − ek, el〉| ≤

l−1∑i=k

(Cr(i)(f i) +

12[(RESr(i)[f i, g])

2 + Cr(i)(f i)

+(RESr(l)[f l, g])2 + Cr(l)(f l)

])

≤l−1∑i=k

((RESr(i)[f i, g])

2 + 2c(1− 32βS‖S‖)(RESr(i)[f i, g])

2)

= constl−1∑i=k

(RESr(i)[f i, g])2 ,

which tends to zero as k →∞ thanks to Lemma 9.14. Analogously it can be shown that

|〈el − ej , el〉| → 0 as k →∞. (9.60)

Thus, the sequence ek = f † − fk forms a Cauchy sequence with ‖ek‖ → 0 and therefore,

fk → f † as k →∞ (9.61)

implying

‖fk − f†‖ ≤ ‖F ∗‖‖fk − f †‖ → 0 as k →∞ . (9.62)

9.4.4 Regularization Result

The regularization proof for the adaptive Landweber iteration , truncated by discrepancy rule (9.31) relieson comparison between noise free and noisy iterations (9.18) and (9.17). To use the results obtained inthe noise free case, one has to verify certain stability of the routines COARSE, APPLY and RHS withrespect to the noise. For the routine COARSE it means in particular the following: assume ‖vδ−v‖ ≤ δ,then for any fixed ε > 0 the desired behavior in the noisy case would be

‖COARSEε(vδ)−COARSEε(v)‖ → 0 as δ → 0, (9.63)

cf. (9.14)-(9.16).As we will see, the classical version of COARSE as proposed in [58] does not fulfill the requirement (9.63),so that a noise dependant version of COARSE, satisfying (9.63) has to be defined. Noise dependantversions of APPLY and RHS, satisfying analogous stability property, can be formulated in the sameway.

84

For the reason that the input of COARSE is already preprocessed in a way that it contains a finitenumber of entries, we assume here that the vectors v,vδ contain finitely many non-zero elements.First, we formulate a new version of COARSE, cf. Algorithm 6.4, with a slight modification concerningthe order of the output entries.

COARSEε[v] → vε

i) Let V be the set of non-zero coefficients of v, ordered by their original indexing in v. Defineq :=

⌈log

((#V )1/2‖v‖

ε

)⌉.

ii) Divide the elements of V into sets V 0, . . . ,V q, where for 0 ≤ k < q

V k := {v ∈ V : 2−k−1‖v‖ < |v| ≤ 2−k‖v‖}, (9.64)

and possible remaining elements are put into V q. Let the elements of a single V k be alsoordered by their original indexing in v. Denote the vector obtained by subsequently extractingthe elements of V 0, . . . ,V q by γ(v).

iii) Create vε by extracting elements from γ(v) and putting them at the original indices, until thesmallest l is found with

‖v − vε‖2 =∑i>l

|γi(v)|2 < ε2. (9.65)

Remark 9.16. Note that q in Definition of COARSE, i), is chosen so that∑v∈V q

|v|2 < ε2, i.e. noelements from V q are used to create vε in iii).

Remark 9.17. Note that taking in account the original order of the elements of v inside the containersV k, we get uniquely defined output of COARSE. Moreover, such a modification of COARSE does notcause any additional computational effort, since no additional sorting is required.

Remark 9.18. The uniqueness of the output of COARSE plays an important role, since we are goingto show that the noise dependant version converges to the noise free one for δ → 0, and the limit has to beunique. Unfortunately, the bin sorting used by COARSE can be affected by the noise, so that the indexin γ(vδ) of some noisy element vδi might vary from the index in γ(v) of it’s noise free equivalent vi. Tocircumvent this problem at least for sufficiently small δ, we define a noise dependant variant COARSEδ.

Let the noise dependant version COARSEδ be defined as follows:

COARSEδε[vδ] → vδε

i) Let V δ be the set of non-zero coefficients of vδ. Let the elements of V δ be ordered followingthe indexing of vδ. Define qδ :=

⌈log

((#V δ)1/2(‖vδ‖+δ)

ε

)⌉.

ii) Divide the elements of V δ into the bins V δ0, . . . ,V

δqδ , where for 0 ≤ k < qδ

V δk := {vδi ∈ V δ : 2−k−1(‖vδi ‖+ δ) + δ < |vδi | ≤ 2−k(‖vδ‖+ δ) + δ}, (9.66)

and possible remaining elements are put into V δqδ . Again, let the elements of a single V δ

k beordered using their indexing in vδ. Denote the vector obtained by the bin sorting by γδ(vδ).

iii) Create vδε by extracting elements from γδ(vδ) and putting them on the original places, untilthe first integer lδ is found with

‖vδ − vδε‖2 = ‖vδ‖2 −∑

1≤i≤lδ|γδi (vδ)|2 < ε2 − (lδ + 1)δ(2‖vδ‖+ δ). (9.67)

85

The following lemma shows the desired continuity property of COARSEδ.

Lemma 9.19. Let ε > 0, and for any δ > 0 let v, vδ ∈ �2 be some finitely supported vectors with‖vδ − v‖ ≤ δ.Then the routine COARSEδ is convergent in the sense that

‖COARSEδε[vδ]−COARSEε[v]‖ → 0 as δ → 0. (9.68)

Proof. Note that for δ = 0, it follows from the definition of COARSEδ that we obtain the “basic”routine COARSE.Now, let δ > 0. For sufficiently small δ, the number #V δ of the coefficients of vδ is greater or equal than#V . In fact, since ‖vδi − vi‖ ≤ δ, from |vi| > 0 follows |vδi | > |vi| − δ > 0 for all δ ≤ δ0 < infvi∈V |vi|.Therefore, due to the definition of qδ, we get

qδ =⌈log

( (#V δ)1/2(‖vδ‖+ δ)ε

)⌉≥

⌈log

( (#V )1/2‖v‖ε

)⌉= q. (9.69)

We show now that for δ ≤ δ0, the elements of any V δk are exactly the noisy counterparts of the elements

of V k, 0 ≤ k ≤ q − 1, and due to the indexing assumption they are ordered in the same way.Suppose that vδi belongs to some V δ

k, 0 ≤ k ≤ q − 1 < qδ. Then, with the left inequality of (9.66) and,using the facts ||vi| − |vδi || ≤ δ and |‖v‖ − ‖vδ‖| ≤ δ, we get

|vi| ≥ |vδi | − δ > 2−(k+1)(‖vδ‖+ δ) + δ − δ ≥ 2−(k+1)‖v‖. (9.70)

From the right inequality of (9.66) we get with δ → 0 that |vi| ≤ 2−k‖v‖. Thus, by (9.64) we obtainvi ∈ V k for 0 ≤ k ≤ q − 1. Conversely, if vi ∈ V k, 0 ≤ k ≤ q − 1, vδi must be in V δ

k, otherwise we wouldget a contradiction to the recently proven fact.Now we show that the thresholding index l in COARSE is equal to lδ in COARSEδ once δ is chosensmall enough. Let vε be the output of the routine COARSE. Then, l is the smallest integer with

‖v − vε‖2 = ‖v‖2 −∑

1≤i≤l|γi(v)|2 < ε2. (9.71)

Since ‖vδ‖ is bounded for δ → 0 and l is fixed for a fixed v, there is some δ1 ≤ δ0, so that

∑1≤i≤l

|γi(v)|2 > ‖v‖2 − ε2 + 2(l + 1)δ(2‖vδ‖+ δ) (9.72)

for all δ ≤ δ1. Since the l-th element of γ(v) is in one of the containers V k, 0 ≤ k ≤ q − 1, cf. Remark9.16, the first l elements of γδ(vδ) are exactly the noisy equivalents of the first l values of γ(v), i.e.|γδi (vδ)− γi(vδ)| ≤ δ, 1 ≤ i ≤ l. Consequently, we get

||γδi (vδ)|2 − |γi(v)|2| ≤ δ(|γδi (vδ)|+ |γi(v)|) ≤ δ(‖vδ‖+ ‖v‖) ≤ δ(2‖vδ‖+ δ). (9.73)

Analogously,

|‖vδ‖2 − ‖v‖2| = |‖vδ‖ − ‖v‖|(‖vδ‖+ ‖v‖) ≤ δ(‖vδ‖+ ‖v‖) ≤ δ(2‖vδ‖+ δ). (9.74)

Now, using the estimates (9.71)-(9.74) we get

∑1≤i≤l

|γδi (vδ)|2 ≥∑

1≤i≤l|γi(v)|2 − lδ(2‖vδ + δ‖)

> ‖v‖2 − ε2 + 2(l + 1)δ(2‖vδ‖+ δ)− lδ(2‖vδ‖+ δ)

≥ ‖vδ‖2 − δ(2‖vδ‖+ δ)− ε2 + (l + 2)δ(2‖vδ‖+ δ)

= ‖vδ‖2 − ε2 + (l + 1)δ(2‖vδ‖+ δ). (9.75)

86

From (9.75) follows that l satisfies (9.67), i.e. lδ ≤ l. Suppose that lδ < l. Using (9.67), (9.73) we get

∑1≤i≤lδ

|γi(v)|2 ≥∑

1≤i≤lδ|γi(vδ)|2 − lδδ(2‖vδ‖+ δ)

> ‖vδ‖2 − ε2 + (lδ + 1)δ(2‖vδ‖+ δ)− lδδ(2‖vδ‖+ δ)

> ‖v‖2 − δ(2‖vδ‖+ δ)− ε2 + δ(2‖vδ‖+ δ)

= ‖v‖2 − ε2,

which is a contradiction to the definition of l in COARSE. It follows that for sufficiently small δ > 0holds lδ = l, so that

‖vδε − vε‖ ≤ ‖vδ − v‖ ≤ δ, (9.76)

i.e. COARSEδ converges in the sense of (9.68).

Analogously to updating rule (U), we introduce an updating rule (D) that defines the refinement strategyrδ(m) and the truncating index m∗ for the noisy case (9.17).

D(i) Let rδ(0) be the smallest integer ≥ 0 with

c(RESrδ(0)[fδ

0, gδ])2 ≥

δ2 + Crδ(0)(fδ

0)1− 3

2βS‖S‖, (9.77)

if rδ(0) does not exist, stop the iteration, set m∗ = 0.

D(ii) if for m ≥ 1

c(RESrδ(m−1)[fδ

m, gδ])2 ≥

δ2 + Crδ(m−1)(fδ

m)1− 3

2βS‖S‖, (9.78)

set rδ(m) = rδ(m− 1)

D(iii) if

c(RESrδ(m−1)[fδ

m, gδ])2 <

δ2 + Crδ(m−1)(fδ

m)1− 3

2βS‖S‖, (9.79)

set rδ(m) = rδ(m− 1) + j, where j is the smallest integer with

c(RESrδ(m−1)+j [fδ

m, gδ])2 ≥

δ2 + Crδ(m−1)+j(fδ

m)1− 3

2βS‖S‖(9.80)

andCrδ(m−1)+j(f

δ

m) > c1δ2. (9.81)

D(iv) if (9.79) holds and no j with (9.80),(9.81) exists, then stop the iteration,set m∗ = m.

In the following theorem we show the regularization property of adaptive Landweber method (9.17) witha-posteriori parameter choice rule (D).


Theorem 9.20. Let f† again be a solution of (9.48) for exact data g. Suppose that for any δ > 0 and any g^δ satisfying ‖g^δ − g‖ ≤ δ, the iterate f^δ_m is derived by the adaptive Landweber iteration (9.17) in combination with the updating rule (D) for the refinement strategy r^δ and the stopping index m_* = m^δ_*. Then the family of operators R_α defined by

$$R_\alpha g^\delta := F^*\mathbf{f}^\delta_{m^\delta_*} \quad\text{with}\quad \alpha = \alpha(\delta, g^\delta) = \frac{1}{m^\delta_*} \qquad (9.82)$$

defines a regularization of the ill-posed operator L, i.e.

$$\|f^\delta_{m^\delta_*} - f^\dagger\|_X \to 0 \quad\text{as } \delta \to 0. \qquad (9.83)$$

Convergence also holds in the sequence space,

$$\|\mathbf{f}^\delta_{m^\delta_*} - \mathbf{f}^\dagger\|_{\ell_2} \to 0 \quad\text{as } \delta \to 0, \qquad (9.84)$$

where $\mathbf{f}^\dagger$ is some frame representative of f†, i.e. $f^\dagger = F^*\mathbf{f}^\dagger$.

Proof. First we give a brief sketch of the proof, which essentially follows the lines of [52]. Using induction over m we show that for sufficiently small δ the refinement strategy r^δ(m) in the noisy case equals the refinement strategy r(m) in the noise-free case and that, consequently, for each fixed m, f^δ_m → f_m as δ → 0. Then we analyze the possible limiting behavior of the truncation indices m^{δ_n}_* for an arbitrary zero sequence δ_n and deduce that in all cases the corresponding truncated adaptive iterates converge as n → ∞.

Compare the noise-free iteration f_{m+1} with the noisy one f^δ_{m+1}, defined by (9.18)+(U) and (9.17)+(D), respectively, and assume that both iterations start from the same initial value, i.e. f^δ_0 = f_0. The updating rule (U) provides for every 1 ≤ m ≤ m_* − 1 a refinement index r(m), namely the smallest integer greater than or equal to r(m−1) with

$$c\,\big(\mathrm{RES}_{r(m)}[\mathbf{f}_m, g]\big)^2 \;\ge\; \frac{C_{r(m)}(\mathbf{f}_m)}{1-\frac{3}{2}\beta_S\|S\|}. \qquad (9.85)$$

The updating rule (D) provides for every 1 ≤ m ≤ m^δ_* − 1 a refinement index r^δ(m), namely the smallest integer greater than or equal to r^δ(m−1) such that

$$c\,\big(\mathrm{RES}_{r^\delta(m)}[\mathbf{f}^\delta_m, g^\delta]\big)^2 \;\ge\; \frac{\delta^2 + C_{r^\delta(m)}(\mathbf{f}^\delta_m)}{1-\frac{3}{2}\beta_S\|S\|} \qquad (9.86)$$

holds. We show that, under the condition f^δ_0 = f_0, we have r^δ(0) = r(0) for sufficiently small δ. Note that for f^δ_m → f_m we have C_{r^δ(m)}(f^δ_m) → C_{r(m)}(f_m) as well as RES_{r^δ(m)}[f^δ_m, g^δ] → RES_{r(m)}[f_m, g], due to the continuity of APPLY and RHS with respect to the noise. Hence, as δ → 0, rule (D) yields that r^δ(0) is the smallest integer ≥ 0 with

$$c\,\big(\mathrm{RES}_{r^\delta(0)}[\mathbf{f}^\delta_0, g^\delta]\big)^2 \;\ge\; \frac{C_{r^\delta(0)}(\mathbf{f}^\delta_0)}{1-\frac{3}{2}\beta_S\|S\|}, \qquad (9.87)$$

which is the definition of r(0) from (U). Hence r^δ(0) = r(0), and therefore f^δ_1 → f_1. Now assume for some m ≥ 1 that r^δ(m−1) = r(m−1) and f^δ_m → f_m as δ → 0. With the same argument as for m = 0 we conclude that r^δ(m) = r(m) for sufficiently small δ, and therefore f^δ_{m+1} → f_{m+1}, since for fixed i the routines RHS_i, APPLY_i, COARSE_i are continuous with respect to the noise in the input vector.


First we show the convergence in the sequence space,

$$\|\mathbf{f}^\delta_{m^\delta_*} - \mathbf{f}^\dagger\| \to 0 \quad\text{as } \delta \to 0. \qquad (9.88)$$

Let δ_n be a positive sequence with δ_n → 0 as n → ∞, and write m_n := m_*(δ_n, g^{δ_n}).

a) Consider first the case that m^{δ_n}_* converges to some point m̄. Then m^{δ_n}_* = m̄ for sufficiently large n. Since m_n − 1 is the greatest integer with

$$C_{r^\delta(m_n-1)}(\mathbf{f}^\delta_{m_n-1}) > c_1\delta_n^2 \quad\text{and}\quad c\,\big(\mathrm{RES}_{r(m_n-1)}[\mathbf{f}^\delta_{m_n-1}, g^\delta]\big)^2 \ge \frac{\delta_n^2 + C_{r(m_n-1)}(\mathbf{f}^\delta_{m_n-1})}{1-\frac{3}{2}\beta_S\|S\|}, \qquad (9.89)$$

we get with n → ∞ that m̄ − 1 is the greatest integer with

$$c\,\big(\mathrm{RES}_{r(\bar m-1)}[\mathbf{f}_{\bar m-1}, g]\big)^2 \;\ge\; \frac{C_{r(\bar m-1)}(\mathbf{f}_{\bar m-1})}{1-\frac{3}{2}\beta_S\|S\|}, \qquad (9.90)$$

which is the definition of the stopping index m_* in the noise-free case. Then for sufficiently large n we have m^{δ_n}_* = m_*, and f^{δ_n}_{m^{δ_n}_*} = f^{δ_n}_{m_*} → f_{m_*} = f† for n → ∞.

b) Now suppose that m^{δ_n}_* → ∞ monotonically. For n > l we use the fact that the errors ‖f^δ_m − f†‖ decrease with increasing m. Consider

$$\|\mathbf{f}^{\delta_n}_{m_n} - \mathbf{f}^\dagger\| \le \|\mathbf{f}^{\delta_n}_{m_l} - \mathbf{f}^\dagger\| \le \|\mathbf{f}^{\delta_n}_{m_l} - \mathbf{f}_{m_l}\| + \|\mathbf{f}_{m_l} - \mathbf{f}^\dagger\|. \qquad (9.91)$$

Since the f_m converge to f†, choose l such that

$$\|\mathbf{f}_{m_l} - \mathbf{f}^\dagger\| \le \frac{\varepsilon}{2}. \qquad (9.92)$$

Since f^δ_m → f_m, choose n such that

$$\|\mathbf{f}^{\delta_n}_{m_l} - \mathbf{f}_{m_l}\| \le \frac{\varepsilon}{2}. \qquad (9.93)$$

From the last three expressions we obtain

$$\|\mathbf{f}^{\delta_n}_{m_n} - \mathbf{f}^\dagger\| \to 0 \quad\text{as } n \to \infty. \qquad (9.94)$$

c) Assume now that, in general, f^{δ_n}_{m_n} does not converge to f†. Then there exist an ε > 0 and a subsequence f^{δ_{n(k)}}_{m_{n(k)}} of f^{δ_n}_{m_n} with

$$\|\mathbf{f}^{\delta_{n(k)}}_{m_{n(k)}} - \mathbf{f}^\dagger\| > \varepsilon. \qquad (9.95)$$

If m_{n(k)} is bounded, then it contains a convergent subsequence; proceeding as in a), we obtain a contradiction to (9.95). If m_{n(k)} is unbounded, then it contains a monotonically increasing unbounded subsequence; proceeding as in b), we again obtain a contradiction to (9.95). Hence (9.94) holds, and since the sequence δ_n was chosen arbitrarily, we conclude (9.88). In the function space we use

$$\|f^\delta_{m^\delta_*} - f^\dagger\|_X = \|F^*(\mathbf{f}^\delta_{m^\delta_*} - \mathbf{f}^\dagger)\|_X \le \|F^*\|\,\|\mathbf{f}^\delta_{m^\delta_*} - \mathbf{f}^\dagger\|_{\ell_2}, \qquad (9.96)$$

so that from (9.88) we obtain the convergence in X:

$$\|f^\delta_{m^\delta_*} - f^\dagger\|_X \to 0 \quad\text{as } \delta \to 0. \qquad (9.97)$$


Chapter 10

Example Problem

10.1 Linear Radon Transform

The problem of computerized tomography (CT) consists in reconstructing the tissue density of the examined body, given the intensity loss of an X-ray beam after it has passed through the body. For a detailed introduction to the mathematics of computerized tomography we refer to [46]. Usually, the tissue density f is nonzero on some bounded domain Ω ⊂ R^d. The intensity loss of the parallel X-ray beam is mathematically modeled by integrating the density f over hyperplanes in R^d. A hyperplane in R^d can be uniquely described by its normal direction ω ∈ S^{d−1}, where S^{d−1} = {ω ∈ R^d : ‖ω‖ = 1} denotes the unit sphere in R^d, and its distance to the origin s ∈ R_+. In this simple case, the direct problem is modeled by the linear Radon operator R : L_2(R^d) → L_2(Z), given by

$$(Rf)(s, \omega) = \int_{\mathbb{R}^d} f(x)\,\delta(\langle x, \omega\rangle - s)\,dx, \qquad (10.1)$$

where Z = R × S^{d−1} is the unit cylinder. The tomographic reconstruction problem then consists in the inversion of the linear Radon equation

$$Rf = g. \qquad (10.2)$$

Due to the measurement process and the discrete sampling procedure, the intensity loss g is in general not known exactly. Therefore, the measured data g^δ ∈ L_2(Z) are usually assumed to be an approximation of the exact data g with some known accuracy δ > 0, i.e.

$$\|g^\delta - g\| \le \delta. \qquad (10.3)$$
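For intuition, data of the form (10.2)-(10.3) can be simulated with standard tools. The following Python sketch uses scikit-image (our choice for illustration; the thesis does not rely on it) to build a Shepp-Logan phantom, compute a discrete sinogram g = Rf and perturb it to a prescribed relative noise level δ_rel = ‖g^δ − g‖/‖g‖.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, resize

rng = np.random.default_rng(0)

f = resize(shepp_logan_phantom(), (128, 128))        # tissue density on [0,1]^2
theta = np.linspace(0.0, 180.0, 180, endpoint=False)  # directions omega (in degrees)
g = radon(f, theta=theta)                              # discrete Radon transform g = Rf

delta_rel = 0.05                                       # relative noise level, e.g. 5%
noise = rng.standard_normal(g.shape)
g_delta = g + delta_rel * np.linalg.norm(g) * noise / np.linalg.norm(noise)

delta = np.linalg.norm(g_delta - g)                    # absolute noise level ||g^delta - g||
print(delta / np.linalg.norm(g))                       # equals delta_rel by construction
```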

First, we recall some fundamental properties of the operator R. It is known that R is continuous and continuously invertible as a mapping between certain Sobolev spaces, i.e. there are constants c, C > 0 such that

$$c\,\|f\|_{H^\sigma_0(\Omega)} \le \|Rf\|_{H^{\sigma+(d-1)/2}(Z)} \le C\,\|f\|_{H^\sigma_0(\Omega)} \qquad (10.4)$$

for any σ ∈ R, cf. [46, Thm. 5.1]. Using a similar argument as in [46], one can show the corresponding continuity of the L_2-adjoint R* of the operator R between the Sobolev spaces H^σ(Z) and H^{σ+(d−1)/2}(Ω) for all σ ∈ R. Consequently, we deduce the continuous invertibility of the self-adjoint operator R*R, i.e.

$$\|f\|_{H^\sigma_0(\Omega)} \sim \|R^*Rf\|_{H^{\sigma+(d-1)}(\Omega)}. \qquad (10.5)$$


In our setting we aim to reconstruct the tissue density f ∈ L_2(Ω) from some perturbed data g^δ ∈ L_2(Z). Due to the presence of noise we cannot assume higher smoothness of the measured data g^δ. However, from the continuity property (10.4) we deduce that R is not continuously invertible as a mapping between L_2(Ω) and L_2(Z). In order to deal with noisy data, we therefore have to regularize (10.2).

10.2 Regularity of Solution

We are going to use typical information about the tissue structure of medical images in order to obtain information about the regularity of the solution f† of the Radon problem (10.2). The smoothness information is important for the convergence rate estimates: the Sobolev smoothness of the solution, e.g., determines the source condition in Hilbert scales, cf. Section 4.2, while the Besov regularity of the density plays a crucial role in adaptive approximation, cf. Section 5.1.

In this chapter we assume that the tissue density f is a piecewise constant or, more generally, a piecewise smooth function on R^d with discontinuities across smooth (d−1)-dimensional manifolds. In what follows, we will need to estimate the smoothness of the solution f in the scales H^{sd+t} and B^{sd+t}_τ(L_τ), τ = (s+1/2)^{-1}, for some appropriate range of t. Note that H^{sd+t} and B^{sd+t}_τ(L_τ), τ = (s+1/2)^{-1}, are the approximation spaces for linear and nonlinear approximation in H^t, respectively, cf. Proposition 5.3. The supremum of the values s for which f belongs to one of these scales determines the speed of approximation in the corresponding linear or nonlinear subspaces.

Proposition 10.1. Let f : Ω → R be a piecewise smooth function except for jumps across smooth (d−1)-dimensional manifolds in R^d. Then, for |t| < 1/2,

$$f \in H^{sd+t}(\Omega) \quad\text{if and only if}\quad s < \frac{1/2 - t}{d}. \qquad (10.6)$$

Proof. We use the fact that piecewise smooth functions with jump discontinuities belong to the Sobolev space H^p(Ω) precisely for p < 1/2, cf. [46, §IV.2]. Hence f ∈ H^{sd+t}(Ω) if and only if sd + t < 1/2, i.e. s < (1/2 − t)/d.

Remark 10.2. In the setting of the wavelet Hilbert scale X_r on ℓ_2, generated by the matrix B = D^2, D from (1.38), cf. Section 4.2.1, the requirement f† ∈ H^r(Ω) is equivalent to $\mathbf{f}^\dagger \in X_r$ due to the equivalence (4.41).

The next statement specifies the Besov regularity of f† in the scale B^{sd+t}_τ(L_τ), τ = (s+1/2)^{-1}.

Proposition 10.3. Let f : Ω → R be as in Proposition 10.1. Then, for |t| < 1/2,

$$f \in B^{sd+t}_\tau(L_\tau(\Omega)),\ \tau = (s+1/2)^{-1}, \quad\text{either for } 0 < s < \frac{1-2t}{2(d-1)} \text{ if } d \ge 2, \text{ or for any } s \text{ if } d = 1. \qquad (10.7)$$

Proof. Consider the following decomposition of f:

$$f = \sum_{k=1}^{n} f_{M_k}\,\chi_{M_k}, \qquad (10.8)$$

where f_{M_k} := f|_{M_k} is a smooth function on M_k, and the χ_{M_k} are the characteristic functions of the M_k. Without loss of generality we assume that the f_{M_k} are smooth on the whole domain Ω. For τ = (s+1/2)^{-1} it follows from [55, Thm. 4.6.3/3] that χ_{M_k} ∈ B^{sd+t}_τ(L_τ(Ω)) if one of the following conditions holds:

$$sd - d/2 < sd + t \le s + 1/2, \quad s \ge 1/2,$$
$$s - 1/2 < sd + t < s + 1/2, \quad 0 < s < 1/2,$$

which is equivalent to

$$-1/2 < t \le 1/2, \quad s \ge 1/2,$$
$$-1/2 < t < 1/2, \quad 0 < s < 1/2,$$

in the one-dimensional case and to

$$-\frac{d - 2t}{2(d-1)} < s \le \frac{1-2t}{2(d-1)}, \quad s \ge 1/2,$$
$$-\frac{1-2t}{2(d-1)} < s < \frac{1-2t}{2(d-1)}, \quad 0 < s < 1/2,$$

in the case d ≥ 2. With |t| < 1/2 we obtain the required regularity. The Besov regularity of the products f_{M_k}χ_{M_k} and, consequently, of the sum f follows from the multiplication theorem [55, Thm. 4.6.1/3].

Now we can compare the smoothness of f† in the above Sobolev and Besov scales.

Corollary 10.4. Let f : Ω → R be a piecewise smooth function except for jumps across smooth (d−1)-dimensional manifolds in R^d. Then, for |t| < 1/2 and d ≥ 1, the relation

$$\sup_s\{s : f \in B^{sd+t}_\tau(L_\tau(\Omega))\} \;>\; \sup_s\{s : f \in H^{sd+t}(\Omega)\} \qquad (10.9)$$

holds for τ = (s + 1/2)^{-1}. In other words, the smoothness of f in the Besov scale B^{sd+t}_τ(L_τ(Ω)) is higher than in the Sobolev scale H^{sd+t}(Ω).

Proof. The assertion follows from Proposition 10.1 and Proposition 10.3.

Remark 10.5. Note that if f is a piecewise C^∞, globally C^r function with r ∈ N_0, then the results of Proposition 10.1 and Proposition 10.3 can be applied to the (r+1)-st derivative of f. Thus the regularity statements generalize as follows:

$$f \in H^{sd+t}(\Omega) \quad\text{iff}\quad s < \frac{r + 3/2 - t}{d} \qquad (10.10)$$

and

$$f \in B^{sd+t}_\tau(L_\tau(\Omega)) \quad\text{if}\quad s < \frac{r + 3/2 - t}{d-1}. \qquad (10.11)$$

As in Corollary 10.4, also in this general case we conclude that the B^{sd+t}_τ(L_τ(Ω))-smoothness of f is higher than its H^{sd+t}(Ω)-smoothness.

10.3 Requirements on the System

10.3.1 Requirements Imposed by the Regularity of the Solution

The regularity/accuracy requirements on the dual wavelet frames Ψ, Ψ̃ arise when characterizations of Sobolev and Besov spaces by means of wavelets are considered. To this end, we summarize some crucial results from Sections 1.3 and 5.2. We formulate all results for piecewise smooth wavelets and refer to the spline wavelet constructions of [56]. Let m, m̃ be the orders of accuracy of the dual wavelet frames Ψ, Ψ̃, or, by duality, the orders of vanishing moments of Ψ̃, Ψ, respectively. Let γ, γ̃ be the regularity bounds of the systems Ψ, Ψ̃, i.e.

$$\gamma := \sup\{s > 0 : \Psi \subset H^s\}, \qquad \tilde\gamma := \sup\{s > 0 : \tilde\Psi \subset H^s\}. \qquad (10.12)$$

For piecewise smooth wavelets Ψ, Ψ̃ the relations γ = m − 1/2 and γ̃ = m̃ − 1/2 hold, respectively.

The characterization

$$\|f\|_{H^s} \sim \|D^s\langle f, \tilde\Psi\rangle^T\|_{\ell_2(\nabla)} \qquad (10.13)$$

of the Sobolev space H^s holds for all −γ̃ < s < γ. The characterization

$$\|f\|_{B^{sd+t}_\tau(L_\tau(\Omega))} \sim \|D^t\langle f, \tilde\Psi\rangle^T\|_{\ell_\tau} \qquad (10.14)$$

of the Besov space B^{sd+t}_τ(L_τ(Ω)) holds for

$$s < \frac{m - t}{d}, \qquad (10.15)$$

provided the wavelets Ψ̃ are contained in B^{sd+t}_τ(L_τ(Ω)). The value (m−t)/d then limits the rates of the best N-term approximation in ℓ_2. In the following theorem we show under which conditions on m, m̃ the best nonlinear approximation rates depend only on the smoothness of the function and not on the regularity of the approximating wavelets.

Theorem 10.6. Let r be the order of global smoothness of the function f, and let the dual frames Ψ, Ψ̃ consist of piecewise smooth functions with orders of accuracy m, m̃. If m, m̃ satisfy the requirements

$$\tilde m \ge r + 2 \qquad (10.16)$$

and

$$m \ge \frac{(r + \tfrac{3}{2})d - t}{d-1}, \qquad (10.17)$$

then the order s of the best N-term approximation in ℓ_2 does not depend on the regularity of the frames Ψ, Ψ̃ and is limited only by (10.11).

Proof. From Remark 10.5 it follows that Ψ̃ ⊂ B^{sd+t}_τ(L_τ(Ω)) is fulfilled if s < (γ̃ − t)/(d−1) = (m̃ − 1/2 − t)/(d−1), cf. [58, Rem. 4.3]. Assumption (10.16) ensures that this additional condition does not limit the rates of the nonlinear approximation (10.11) in H^t; nevertheless, the condition Ψ̃ ⊂ B^{sd+t}_τ(L_τ(Ω)) is necessary for the estimate of the best N-term approximation rates (10.15). Now, to show that the rates (10.15) do not limit the rates (10.11), we have to require

$$\frac{m - t}{d} \ge \frac{r + \tfrac{3}{2} - t}{d-1}, \qquad (10.18)$$

which is equivalent to (10.17).

Remark 10.7. Let, for example, f be piecewise constant, i.e. r = −1, which is usually the case in medical imaging; moreover, in all numerical realizations we consider the case d = 2. By Theorem 10.6 we then obtain, for m̃ ≥ 1 and m ≥ 1 − t, that the range for the best N-term approximation in ℓ_2,

$$s < 1/2 - t, \qquad (10.19)$$

coincides with the range (10.7) of the nonlinear approximation in H^t. In this case the speed of the nonlinear approximation is not affected by the discretization in terms of wavelets.


10.3.2 Requirements Imposed by the Operator Discretization

As we already know from Section 6.3, under certain conditions on the operator L and the wavelet system Ψ, the discretized operator S(L*L) = FL*LF* can be applied adaptively with order optimal computational complexity. This means that for the calculation of the matrix-vector product Sf up to some given accuracy ε > 0 one needs at most a constant multiple of N_{Sf,ε} ∼ ε^{−1/s}|Sf|^{1/s}_{ℓ^w_τ} operations. By N_{v,ε} we denote, as usual, the number of coefficients of the best N-term approximation of v for a given accuracy ε > 0, cf. Section 5.2.

In Section 6.3 we reviewed how a routine APPLY for the adaptive calculation of the matrix-vector product can be constructed for s*-compressible matrices, cf. Definition 6.8. In Proposition 6.9 we recalled the fact that a matrix S is s*-compressible once it belongs to the Lemarié class defined by (2.20)-(2.21). First, we show how the Lemarié estimate (2.20)-(2.21) can be proven in the case of the Radon operator discretized in terms of Haar wavelets.

Example 10.8. The entries of the stiffness matrix S of the self-adjoint operator R*R with respect to the Haar basis Ψ = {ψ_λ}_{λ∈∇} of L_2(Ω) with Ω = [0,1]^d ⊂ R^d can be estimated as

$$|S_{\lambda,\lambda'}| = \big|\langle R^*R\,\psi_\lambda, \psi_{\lambda'}\rangle\big| \;\lesssim\; \frac{2^{-(|\lambda|+|\lambda'|)(1+d/2)}}{\mathrm{dist}(\Omega_\lambda,\Omega_{\lambda'})^3} \qquad (10.20)$$

for all λ, λ′ with dist(Ω_λ, Ω_{λ'}) > 0, where Ω_λ denotes the support of ψ_λ.

Proof. First we recall the integral representation of the operator R*R, cf. [46, Thm. II.1.5]:

$$R^*Rf(x) = |S^{d-1}| \int_{\mathbb{R}^d} \frac{1}{|x-y|}\,f(y)\,dy, \qquad f \in \mathcal{S}(\mathbb{R}^d), \qquad (10.21)$$

where |S^{d−1}| is the volume of the unit sphere S^{d−1} ⊂ R^d and S(R^d) is the Schwartz space. By a density argument we extend the representation (10.21) to square integrable functions f with support in Ω ⊂ R^d. As we know, Haar wavelets are piecewise constant functions with one vanishing moment, i.e.

$$\langle \psi_\lambda, c\rangle = 0 \qquad (10.22)$$

for all constants c ∈ R. From the integral representation (10.21) we obtain

$$\big|\langle R^*R\,\psi_\lambda, \psi_{\lambda'}\rangle\big| = |S^{d-1}|\,\Big|\int_{\Omega_\lambda}\psi_\lambda(x)\int_{\Omega_{\lambda'}}\frac{1}{|x-y|}\,\psi_{\lambda'}(y)\,dy\,dx\Big| \overset{(10.22)}{=} |S^{d-1}|\,\Big|\int_{\Omega_\lambda}\psi_\lambda(x)\int_{\Omega_{\lambda'}}\Big(\frac{1}{|x-y|}-\frac{1}{|x-y'|}\Big)\psi_{\lambda'}(y)\,dy\,dx\Big|. \qquad (10.23)$$

With the Taylor expansion of the kernel |x−y|^{-1} at some point y = y′ ∈ Ω_{λ'} and

$$\nabla_y\Big(\frac{1}{|x-y|}\Big) = \frac{x-y}{|x-y|^3} \qquad (10.24)$$

we get, for some y″ ∈ Ω_{λ'},

$$\big|\langle R^*R\,\psi_\lambda, \psi_{\lambda'}\rangle\big| = |S^{d-1}|\,\Big|\int_{\Omega_\lambda}\psi_\lambda(x)\int_{\Omega_{\lambda'}}(y-y')\cdot\frac{x-y''}{|x-y''|^3}\,\psi_{\lambda'}(y)\,dy\,dx\Big|. \qquad (10.25)$$

By changing the order of integration, computing the partial derivative

$$\Big(\frac{x-y''}{|x-y''|^3}\Big)_x = -\frac{1}{2|x-y''|^3} \qquad (10.26)$$

and using the moment condition (10.22) once more, the Taylor expansion of (x−y″)/|x−y″|³ at some point x = x′ ∈ Ω_λ yields, for some x″ ∈ Ω_λ,

$$\begin{aligned}
\big|\langle R^*R\,\psi_\lambda, \psi_{\lambda'}\rangle\big| &= |S^{d-1}|\,\Big|\int_{\Omega_{\lambda'}}(y-y')\,\psi_{\lambda'}(y)\,dy \int_{\Omega_\lambda}(x-x')\Big(-\frac{1}{2|x''-y''|^3}\Big)\psi_\lambda(x)\,dx\Big| \\
&\lesssim \frac{1}{|x''-y''|^3}\,\sup_{y\in\Omega_{\lambda'}}|y-y'|\int_{\Omega_{\lambda'}}|\psi_{\lambda'}(y)|\,dy\;\sup_{x\in\Omega_\lambda}|x-x'|\int_{\Omega_\lambda}|\psi_\lambda(x)|\,dx. \qquad (10.27)
\end{aligned}$$

Due to the following properties of the Haar wavelets,

$$|x-x'| \le \mathrm{diam}(\Omega_\lambda) \sim 2^{-|\lambda|}, \quad x \in \Omega_\lambda, \qquad (10.28)$$

and

$$\int_{\Omega_\lambda}|\psi_\lambda(x)|\,dx \lesssim 2^{-d|\lambda|/2}, \qquad (10.29)$$

we get the final estimate

$$\big|\langle R^*R\,\psi_\lambda, \psi_{\lambda'}\rangle\big| \lesssim \frac{1}{|x''-y''|^3}\,2^{-2(|\lambda|+|\lambda'|)} \le 2^{-(1+d/2)(|\lambda|+|\lambda'|)}\,\frac{1}{(\mathrm{dist}(\Omega_\lambda,\Omega_{\lambda'}))^3}. \qquad (10.30)$$

Proposition 10.9. Let Ψ, Ψ̃ be such that min{γ − t, t + m̃} > 0, where 2t = 1 − d is the order of the self-adjoint operator R*R. Then the stiffness matrix S of the operator R*R with respect to the system Ψ satisfies the Lemarié estimate (2.20) with σ = min{γ − t, t + m̃} and β = d + 2t + 2m̃.

Proof. Applying an argument analogous to Example 10.8, we obtain the basic estimate

$$\big|\langle R^*R\,\psi_\lambda, \psi_{\lambda'}\rangle\big| \lesssim \frac{2^{-(\tilde m + d/2)(|\lambda|+|\lambda'|)}}{(\mathrm{dist}(\Omega_\lambda,\Omega_{\lambda'}))^{d+2t+2\tilde m}} \qquad (10.31)$$

for all entries S_{λ,λ'} of the stiffness matrix with dist(Ω_λ, Ω_{λ'}) > 0. Using the fact that 2t = 1 − d for the operator R*R, we get

$$\big|2^{-t(|\lambda|+|\lambda'|)}\langle R^*R\,\psi_\lambda, \psi_{\lambda'}\rangle\big| \lesssim \frac{2^{-(\tilde m + d/2 + t)\,||\lambda|-|\lambda'||}}{\big(1 + 2^{\min\{|\lambda|,|\lambda'|\}}\,\mathrm{dist}(\Omega_\lambda,\Omega_{\lambda'})\big)^{d+2t+2\tilde m}} \qquad (10.32)$$

for the stiffness entries S_{λ,λ'} with dist(Ω_λ, Ω_{λ'}) ≳ 2^{−min{|λ|,|λ'|}}. For all remaining stiffness entries one can use the continuous invertibility (10.5) of R*R. By the Schwarz inequality we get, for some σ > 0,

$$\big|\langle R^*R\,\psi_\lambda, \psi_{\lambda'}\rangle\big| \le \|R^*R\,\psi_\lambda\|_{H^{-t+\sigma}(\Omega)}\,\|\psi_{\lambda'}\|_{H^{t-\sigma}(\Omega)} \overset{(10.5)}{\lesssim} \|\psi_\lambda\|_{H^{t+\sigma}}\,\|\psi_{\lambda'}\|_{H^{t-\sigma}}, \qquad (10.33)$$

where 2t = 1 − d is the order of R*R. The last estimate is valid for all σ > 0 satisfying −γ̃ < t − σ < t + σ < γ. As the left inequality can be relaxed [59] towards −m̃ < t − σ, we get with the norm equivalence ‖ψ_λ‖_{H^s} ∼ 2^{s|λ|}‖ψ_λ‖_{L_2} that the stiffness entries of R*R can be estimated as

$$\big|2^{-t(|\lambda|+|\lambda'|)}\langle R^*R\,\psi_\lambda, \psi_{\lambda'}\rangle\big| \lesssim \frac{2^{-\sigma\,||\lambda|-|\lambda'||}}{\big(1 + 2^{\min\{|\lambda|,|\lambda'|\}}\,\mathrm{dist}(\Omega_\lambda,\Omega_{\lambda'})\big)^{d+2t+2\tilde m}} \qquad (10.34)$$

for 0 < σ = min{γ − t, m̃ + t}. Note that in (10.34) the Lemarié estimate is satisfied for the preconditioned matrix D^{−t}SD^{−t}. However, as we work with negative order t < 0, we can estimate the entries of S by

$$|\langle R^*R\,\psi_\lambda, \psi_{\lambda'}\rangle| \le \big|2^{-t(|\lambda|+|\lambda'|)}\langle R^*R\,\psi_\lambda, \psi_{\lambda'}\rangle\big|. \qquad (10.35)$$

Therefore, the non-preconditioned matrix S is in the Lemarié class with σ = min{γ − t, m̃ + t} and β = d + 2t + 2m̃.

The estimate (10.34) is a Lemarié estimate, which allows us to show the s*-compressibility of the stiffness matrix S.

Theorem 10.10. Let d = 2. If the wavelet system Ψ, Ψ̃ has regularity γ ≥ 3/2 and γ̃ ≥ 5/2, respectively, then the stiffness matrix S of the operator R*R is s*-compressible with s* ≥ 1/2.

Proof. We apply the result on the compressibility of Lemarié matrices, which states that a Lemarié matrix is s*-compressible with

$$s^* := \min\Big\{\frac{\sigma}{d} - \frac{1}{2},\ \frac{\beta}{d} - 1\Big\}. \qquad (10.36)$$

Now, with t = (1 − d)/2 = −1/2 for d = 2, we get from Proposition 10.9 the estimates σ ≥ 2 and β ≥ 7. Therefore we finally get s* ≥ min{2/2 − 1/2, 7/2 − 1} = 1/2.

Remark 10.11. It is easy to see that the Haar wavelets do not satisfy the condition min{γ − t, t + m̃}/d − 1/2 > 0 for t = (1 − d)/2. Therefore, it is in general not possible to show the compressibility of the stiffness matrix S for a discretization in terms of the Haar system Ψ. However, it is still possible to apply the compression techniques to this kind of discretization and to achieve a certain sparseness of the solution, as can be seen in our numerical experiments, cf. Section 10.4.

Remark 10.12. It is sometimes worthwhile to work with a preconditioned version D^{−s}SD^{−s} of the stiffness matrix S with 0 < s < −t. For instance, if we choose s = 1/4, the experiments show that the number of iterations can be reduced considerably, while the convergence rates remain the same. For more theoretical results in this direction we refer the reader to [60, 31].
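In a finite section of the wavelet discretization, this preconditioning is a plain diagonal scaling. The sketch below (Python) illustrates the operation D^{−s}SD^{−s} with D_{λ,λ'} = 2^{|λ|}δ_{λ,λ'}; the arrays S (a finite stiffness block) and levels (the level |λ| of each index) are assumptions of ours, supplied by whatever discretization is chosen.

```python
import numpy as np

def precondition(S, levels, s):
    """Diagonal wavelet preconditioning S_s = D^{-s} S D^{-s}, D = diag(2^{|lambda|}).

    S      : (n, n) stiffness matrix in wavelet coordinates (assumed given)
    levels : (n,)   array with the level |lambda| of each wavelet index
    s      : preconditioning exponent, e.g. s = 1/4
    """
    d = 2.0 ** (-s * np.asarray(levels, dtype=float))
    return (S * d[:, None]) * d[None, :]     # same as diag(d) @ S @ diag(d)

def precondition_rhs(g, levels, s):
    """The right-hand side is rescaled accordingly: g_s = D^{-s} g."""
    return g * 2.0 ** (-s * np.asarray(levels, dtype=float))
```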

10.4 Numerics

We remark that, for simplicity, only Haar wavelet systems have been used in the presented numerical experiment. Even in this situation, the numerical results of the proposed adaptive scheme are already considerably sparser than the solution of the full system. Moreover, the usage of Haar wavelets (or B-splines in general) allows an exact computation of the stiffness matrix entries, avoiding the additional numerical errors usually introduced by quadrature rules.

10.4.1 Adaptive Discretization

For the equivalent ℓ_2-formulation we have chosen the two-dimensional Haar wavelet basis Ψ = {ψ_λ}_{λ∈Λ} for the function space L_2([0,1]^2). By applying the corresponding frame operator F we transform the normal equation of (10.2) into the infinite-dimensional ℓ_2 system

$$S_s\,\mathbf{f} = \mathbf{g}^\delta, \qquad (10.37)$$

where S_s = D^{−s}FR*RF*D^{−s} is the stiffness matrix of R*R preconditioned by the matrix D^{−s}, F*D^{−s}$\mathbf{f}$ = f is the wavelet decomposition of f, and $\mathbf{g}^\delta$ = D^{−s}FR*g^δ is the preconditioned right-hand side. Wavelet-based preconditioning is a widely used tool that allows to speed up the computation of adaptive methods significantly, cf. [17, 51]. In our case we use the wavelet-based preconditioner D^{−s} with D_{λ,λ'} = 2^{|λ|}δ_{λ,λ'} and s = −1/4. For the admissible ranges of s in the preconditioning of the Landweber iteration we refer to [31]. In Figure 10.1 we compare the condition numbers of the matrix S = FR*RF* and the preconditioned one S_s = D^{−s}SD^{−s}.

Figure 10.1: Condition numbers of the matrix S and the preconditioned one Ss = D−sSD−s fors = −1/4 on different resolution levels Jmax

Applied to the Radon equation (10.2), the resulting adaptive Landweber iteration (cf. (9.17)) reads as

$$\mathbf{f}^\delta_{m+1} = \mathrm{APPROX}_{r^\delta(m)}[\mathbf{f}^\delta_m, \mathbf{g}^\delta] := \mathrm{COARSE}_{r^\delta(m)}\Big[\mathbf{f}^\delta_m - \beta\,\mathrm{APPLY}_{r^\delta(m)}[\mathbf{f}^\delta_m] + \beta\,\mathrm{RHS}_{r^\delta(m)}[\mathbf{g}^\delta]\Big], \qquad (10.38)$$

with accuracy/refinement map r^δ(m) and truncation index m_* chosen by the adaptive discrepancy principle (D). By $\mathbf{f}^\delta$ we denote the result of the truncated iteration (10.38) for a given noise level δ. A corresponding approximation f_δ of the continuous solution f can be computed from the preconditioned iteration (10.38) by f_δ = F*D^{−s}$\mathbf{f}^\delta$.
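Structurally, one sweep of (10.38) is an ordinary Landweber step in which the matrix-vector product, the right-hand side and the thresholding are each evaluated only up to the current accuracy level. The following Python sketch shows this loop; apply_r, rhs_r, coarse_r and update_rule_D are placeholders standing for APPLY, RHS, COARSE and the decision of the adaptive discrepancy principle (D), not implementations of them.

```python
def adaptive_landweber(f0, beta, apply_r, rhs_r, coarse_r, update_rule_D, m_max=1000):
    """Sketch of the truncated adaptive Landweber iteration (10.38).

    apply_r(r, f)  ~ APPLY_r[f]      (adaptive product S_s f up to accuracy eps_r)
    rhs_r(r)       ~ RHS_r[g^delta]  (adaptive right-hand side up to accuracy eps_r)
    coarse_r(r, f) ~ COARSE_r[f]     (thresholding up to accuracy eps_r)
    update_rule_D(m, r, f) -> (stop, r_new)  encodes the refinement/stopping rule (D)
    """
    f, r = f0, 0
    for m in range(m_max):
        stop, r = update_rule_D(m, r, f)
        if stop:
            return f, m                           # truncation index m_*
        f = coarse_r(r, f - beta * apply_r(r, f) + beta * rhs_r(r))
    return f, m_max
```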

The routine APPLY performs the adaptive multiplication of the preconditioned matrix S_s with the current approximation $\mathbf{f}^\delta_m$. As already mentioned in Section 6.3, the adaptive matrix-vector product is based on the approximation of the matrix S_s by sparse matrices S_{s,j}. In Figure 10.2 we illustrate the convergence of this approximation and the sparsity of the approximating matrices. As it is numerically impossible to compute the distances C_j = ‖S_s − S_{s,j}‖ for the infinite matrices, we estimated them by limiting the highest resolution to a finest level J_max. As we can see, the values C_j are numerically convergent as J_max → ∞. The sparsity of the matrices S_{s,j} can be estimated in the same truncated setting: the maximal number of nonzero elements in a row of S_{s,j} is consistent over the different values of J_max.

For the numerical simulation we have used the so-called Shepp-Logan phantom f and its associated Radon transform g, see Figure 10.3.


Figure 10.2: From top left to bottom right: approximation errors C_j = ‖S_s − S_{s,j}‖ measured on different resolution levels, log_2 of the constants C_j, maximal number of nonzero elements of S_{s,j} per row, and log_2 of the maximal number of nonzero elements.

Figure 10.3: Left: Shepp-Logan Phantom image, right: associated noise free sinogram.


Figure 10.4: Left: total approximation error vs. noise level; right: total computational complexity vs. noise level for 1. the adaptive preconditioned iteration, 2. the non-adaptive preconditioned iteration and 3. the non-adaptive iteration without preconditioning.

The reconstructions were done for different relative noise levels δ_rel = ‖g^δ − g‖/‖g‖, see Figure 10.6. In Figure 10.4 we compare the convergence rates as well as the computational complexity of the adaptive approximation (10.38) with the non-adaptive one. Here, non-adaptivity means that all iterations of the Landweber method were carried out with the full matrix and without adaptive thresholding. As we can see, for δ → 0 the adaptive approximations f_δ converge to the synthetic data f with the same rates as the non-adaptive ones. Moreover, the preconditioning does not affect the convergence rates but significantly reduces the computational effort.

At the same time, the computational complexity of the adaptive method as well as the number of nonzero frame coefficients increase as δ → 0, see Figures 10.4 and 10.5. This effect can be explained by the truncation condition (9.81) of the adaptive discrepancy principle (D), in which the accuracy term C_{r^δ(m)}(f^δ_m) is compared with the squared noise level δ². (Recall that the term C_{r^δ(m)}(f^δ_m) essentially involves the accuracy ε_{r^δ(m)} of the routine APPROX, implying that the truncation accuracy also tends to zero as δ → 0.) However, the numerical results show that if the truncation rule compares C_{r^δ(m)}(f^δ_m) with δ instead of δ², then the sparsity of the solution as well as the rates of the best N-term approximation increase considerably. Nevertheless, the convergence order decreases from δ^{2/3} for C_{r^δ(m)}(f^δ_m) ∼ δ² to δ^{1/3} for C_{r^δ(m)}(f^δ_m) ∼ δ, cf. Figure 10.5.

In Figure 10.3 the original (noise-free) data are shown. As our main aim is to reduce the computational complexity, we illustrate the results of the modified truncation rule, i.e. C_{r^δ(m−1)+j}(f^δ_m) ∼ δ, which yields a convergent numerical scheme with the sparsest approximations. In Figure 10.6, approximations f_δ for different noisy right-hand sides g^δ are shown. The final frame grids of the individual reconstructions are given in Figure 10.7.

We would like to emphasize that the numerical results illustrate the convergence of our adaptive method and that, moreover, we have obtained sparse approximations of the numerical solution. Additionally, the experiments show that the adaptive approximation leads to the same convergence rates as the conventional regularization approach. Thus, we have demonstrated numerically that the order optimality of the adaptive regularization can be achieved with reduced numerical effort. However, some questions, such as the degree of the obtained sparseness as well as the optimality of the computational complexity, still remain open.


Figure 10.5: Comparison of the truncation rules C_{r^δ(m)}(f^δ_m) ∼ δ² and C_{r^δ(m)}(f^δ_m) ∼ δ. From top left to bottom right: total error ε vs. noise level δ, estimates of the convergence order, number of nonzero coefficients at the truncation step vs. noise level, and total error vs. number of nonzero coefficients.


Figure 10.6: Adaptive Landweber approximations with the modified truncation rule C_{r^δ(m)}(f^δ_m) ∼ δ for different noise levels δ_rel = 10%, 5%, 2%, 0.00625%. The left column shows the noisy data g^δ; the right column shows the reconstructions.


Figure 10.7: Different resulting wavelet frame grids associated to the adaptive reconstructions for the noise levels δ_rel = 10%, 5%, 2%, 0.00625% shown in Figure 10.6. The individual sub-figures are three-dimensional: the layers are ordered from coarse to fine scale frame coefficients (from top to bottom). Within each layer we visualize the label/location of the frame coefficients (not their magnitude).


Chapter 11

Summary and Outlook

In this thesis we have applied the concept of nonlinear approximation to the solution of linear ill-posed inverse problems

$$Lf = g, \qquad (11.1)$$

where L : X → Y is a linear bounded operator between Hilbert spaces X and Y. In general, we call the problem (11.1) ill-posed if the solution does not exist, is not unique, or does not depend continuously on the data g. Typically, only some noisy version g^δ of the exact data g with ‖g^δ − g‖ ≤ δ is given. A standard approach to solving linear inverse problems is to replace the unbounded generalized inverse L† by a family of bounded operators R_α, depending on a parameter α = α(δ, g^δ), which satisfy the regularization property, i.e.

$$R_\alpha g^\delta \to L^\dagger g \quad\text{and}\quad \alpha(\delta, g^\delta) \to 0 \quad\text{as } \delta \to 0. \qquad (11.2)$$

The speed of the convergence (11.2) depends on R_α, on the choice of the regularization parameter α = α(δ, g^δ) and on the properties of the solution.

Based on the classical regularization methods, such as Tikhonov and Landweber regularization, we have developed two adaptive regularization schemes. There, we have exploited existing adaptive techniques for nonlinear operator approximation and nonlinear thresholding. Nonlinear approximation has been widely used for solving well-posed problems; in our work, we take advantage of the existing adaptive methods, which allow us to approximate the solution of a well-posed problem with any prescribed accuracy.

The first goal was to show that it is possible to modify the standard methods in an adaptive way while conserving their regularization property and convergence rates. In the case of the adaptive Tikhonov method R_{α,ε} we have shown the regularization property and have suggested order optimal a-priori and a-posteriori adaptive parameter choice rules. The crucial advantage of the Tikhonov method is that the regularization of an ill-posed operator equation (11.1) amounts to solving the family of well-posed problems

$$(L^*L + \alpha I)f = L^*g^\delta. \qquad (11.3)$$

The adaptive approach R_{α,ε} then consists in the nonlinear approximation of the solution f^δ_α of the well-posed problem (11.3) with accuracy ε. This allows us to split the total approximation error R_{α,ε}g^δ − L†g into two parts:

$$R_{\alpha,\varepsilon}g^\delta - L^\dagger g = (R_{\alpha,\varepsilon}g^\delta - R_\alpha g^\delta) + (R_\alpha g^\delta - L^\dagger g), \qquad (11.4)$$


and to use the classical results on a-priori parameter choice rules. We have also developed an a-posteriori parameter choice rule based on the balancing principle, which has been adaptively modified in order to be compatible with the adaptive scheme.

Another scheme follows the idea of regularizing iterative methods. It is a known fact that the Landweber iteration

fn+1 = fn − βL∗(Lfn − gδ), (11.5)

which originates from the ill-posed normal equation

L∗Lf = L∗gδ, (11.6)

has a regularizing property if it is truncated at some step n = n(δ). As (11.6) is ill-posed, its solution cannot be approximated directly, as was the case in (11.3). However, we can use the adaptive building blocks, such as adaptive operator application, adaptive thresholding etc., in order to formulate an inexact iteration

fn+1 = APPROXn[fn − βL∗(Lfn − gδ)], (11.7)

where APPROX_n[f] denotes an approximation of f with some accuracy ε_{r(n)}. The index function r(n) denotes the refinement strategy, which chooses the next accuracy value from some exponentially decreasing sequence ε_k. As in the exact case, the regularization property of the approximate Landweber iteration (11.7) is provided by an appropriate truncation rule. In our case we adaptively modify Morozov's discrepancy principle, which additionally specifies the refinement strategy r(n), and show convergence of the truncated adaptive iteration.

The analysis of convergence rates has not been carried out so far. Nevertheless, the numerical experiments show that the adaptive scheme exhibits the same speed of convergence as the conventional one. In this situation it is possible to compare the computational costs of the adaptive and the non-adaptive methods. In our numerical simulation we compare the full Landweber method with the adaptive one, applied to the tomographic reconstruction problem given by the linear Radon equation. We observe that the number of evaluated coefficients increases in every iteration step, but the iterates remain sparse during the iteration. We also see the effect of the adaptive approximation on the computational effort, which is considerably reduced in the adaptive case. However, the number of iteration steps is nearly the same in the full and the adaptive case, and it grows strongly as the noise level δ decreases. An improvement in this direction can be made by preconditioning methods: we implemented a wavelet-based preconditioner for the Radon operator and illustrated how its application reduces the numerical effort.

The analytical estimation of the order of computational complexity remains an open question for several reasons. On the one hand, we do not make any statement concerning the sparsity of the obtained solution, i.e. how the number of obtained coefficients is related to the number N of the best N-term approximation of the solution. Moreover, for the validation of the numerical costs one would also need to estimate theoretically the number of iterates with respect to the noise level δ, as well as the cost of every iteration step. First steps in this direction have been made for the Tikhonov method.

As future work we would like to point out the analysis of the convergence rates of the adaptive Landweber iteration and the estimation of the overall computational complexity of both adaptive schemes. Moreover, we would like to exploit the sparse structure of the solution in a better way, in order to locally stabilize the problem and thereby control the number of iterations. Furthermore, we expect that the adaptive approach can be applied to speed up other classical regularization methods, such as the conjugate gradient approach. It may also be possible to improve existing sparsity-based iterative methods by performing the evaluation of the forward operator in an adaptive way.


Bibliography

[1] Hans Wilhelm Alt. Lineare Funktionalanalysis. Springer, Berlin Heidelberg, 2006.

[2] S. Anthoine. Different wavelet-based approaches for the separation of noisy and blurred mixtures of components. Application to astrophysical data. PhD thesis, 2005.

[3] A. Barinka. Fast evaluation tools for adaptive wavelet schemes. PhD thesis, RWTH Aachen, 2005.

[4] J. Bergh and J. Löfström. Interpolation Spaces. Springer-Verlag, Berlin-Heidelberg-New York, 1976.

[5] T. Bonesky and P. Maass. Iterated soft shrinkage with adaptive operator evaluations. Journal of Inverse and Ill-Posed Problems, to appear, 2008.

[6] K. Bredies, D. A. Lorenz, and P. Maass. A generalized conditional gradient method and its connection to an iterative shrinkage method. Comput. Optim. Appl., 2005.

[7] O. Christensen. An Introduction to Frames and Riesz Bases. Birkhäuser, 2003.

[8] A. Cohen. Wavelet methods in numerical analysis. In Handbook of Numerical Analysis, VII:417–711, 2000.

[9] A. Cohen. Numerical analysis of wavelet methods. Studies in Mathematics and its Applications, 32, 2003.

[10] A. Cohen, W. Dahmen, and R. DeVore. Adaptive wavelet methods for elliptic operator equations – convergence rates. Math. Comput., 70(233):27–75, 2001.

[11] A. Cohen, W. Dahmen, and R. DeVore. Adaptive wavelet methods II – beyond the elliptic case. Found. Comput. Math., 2:203–245, 2002.

[12] A. Cohen, I. Daubechies, and J.-C. Feauveau. Biorthogonal bases of compactly supported wavelets. Comm. Pure Appl. Math., 45:485–560, 1992.

[13] A. Cohen, I. Daubechies, and P. Vial. Wavelets on the interval and fast wavelet transforms. Appl. Comput. Harmon. Anal., 1:54–81, 1993.

[14] S. Dahlke, M. Fornasier, M. Primbs, T. Raasch, and M. Werner. Nonlinear approximation with Gelfand frames. In preparation, 2007.

[15] S. Dahlke, M. Fornasier, and T. Raasch. Adaptive frame methods for elliptic operator equations. Adv. Comput. Math., 27(1):27–63, 2007.

[16] W. Dahmen. Stability of multiscale transformations. The Journal of Fourier Analysis and Applications, 2:341–361, 1996.

[17] W. Dahmen. Wavelet and multiscale methods for operator equations. Acta Numerica, Cambridge University Press, 6:55–228, 1997.

[18] W. Dahmen, A. Kunoth, and K. Urban. Biorthogonal spline-wavelets on the interval – stability and moment conditions. Appl. Comput. Harmon. Anal., 6:132–196, 1999.

[19] I. Daubechies. Orthonormal bases of compactly supported wavelets. Comm. Pure Appl. Math., 41:909–969, 1988.

[20] I. Daubechies. Ten Lectures on Wavelets. SIAM, Philadelphia, 1992.

[21] I. Daubechies, M. Defrise, and C. DeMol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math., 57:1413–1541, 2004.

[22] I. Daubechies, M. Fornasier, and I. Loris. Accelerated projected gradient methods for linear inverse problems with sparsity constraints. J. Fourier Anal. Appl., to appear, 2008.

[23] I. Daubechies and G. Teschke. Wavelet-based image decomposition by variational functionals. Proc. SPIE Vol. 5266, pages 94–105, Wavelet Applications in Industrial Processing, Frederic Truchetet, Ed., Feb. 2004.

[24] I. Daubechies and G. Teschke. Variational image restoration by means of wavelets: simultaneous decomposition, deblurring and denoising. Applied and Computational Harmonic Analysis, 19(1):1–16, 2005.

[25] I. Daubechies, G. Teschke, and L. Vese. Iteratively solving linear inverse problems with general convex constraints. Inverse Problems and Imaging, 1(1):29–46, 2007.

[26] I. Daubechies, G. Teschke, and L. Vese. On some iterative concepts for image restoration. Advances in Imaging and Electron Physics, 150, 2008.

[27] M. Defrise and C. DeMol. A note on stopping rules for iterative regularization methods and filtered SVD. In: P. C. Sabatier, ed., Inverse Problems: An Interdisciplinary Study, pages 261–268, 1987.

[28] R. DeVore. Nonlinear approximation. Acta Numerica, 7:51–150, 1998.

[29] R. DeVore and G. G. Lorentz. Constructive Approximation. Springer, Berlin, 1998.

[30] R. A. DeVore and B. J. Lucier. Wavelets. In Acta Numerica 1, pages 1–56. Cambridge University Press, 1991.

[31] H. Egger and A. Neubauer. Preconditioning Landweber iteration in Hilbert scales. Numer. Math., 101:643–662, 2005.

[32] H. W. Engl, M. Hanke, and A. Neubauer. Regularization of Inverse Problems. Kluwer, Dordrecht, 1996.

[33] M. Fornasier. Domain decomposition methods for linear inverse problems with sparsity constraints. Inverse Problems, 23:2505, 2007.

[34] K.-H. Gröchenig. Localization of frames, Banach frames, and the invertibility of the frame operator. J. Fourier Anal. Appl., 10(2):105–132, 2004.

[35] H. Johnen and K. Scherer. On the equivalence of the K-functional and moduli of continuity and some applications. Constr. Theory Funct. several Variables, Proc. Conf. Oberwolfach 1976, Lect. Notes Math., 571:119–140, 1977.

[36] S. G. Krein and Yu. I. Petunin. Scales of Banach spaces. Russ. Math. Surv., 21(2):85–159, 1966.

[37] L. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.

[38] O. V. Lepskii. On a problem of adaptive estimation in Gaussian white noise. Teor. Veroyatnost. i Primenen., 35:459–470, 1990.

[39] A. K. Louis. Inverse und schlecht gestellte Probleme. Teubner, Stuttgart, 1989.

[40] A. K. Louis, P. Maaß, and A. Rieder. Wavelets. Teubner, Stuttgart, 1998.

[41] S. Mallat. Multiresolution approximations and wavelet orthonormal bases of L2(R). Trans. Amer. Math. Soc., 315:69–87, September 1989.

[42] P. Mathé and S. Pereverzev. Regularization of some linear ill-posed problems with discretized random noisy data. Math. Comput., 75(256):1913–1929, 2006.

[43] Y. Meyer. Wavelets and operators. Cambridge Studies in Advanced Math., 37, 1992.

[44] V. A. Morozov. On the solution of functional equations by the method of regularization. Soviet Math. Dokl., 7:414–417, 1966.

[45] F. Natterer. Error bounds for Tikhonov regularization in Hilbert scales. Appl. Anal., 18:29–37, 1984.

[46] F. Natterer. The Mathematics of Computerized Tomography. Teubner, Stuttgart, 1986.

[47] A. Neubauer. An a posteriori parameter choice for Tikhonov regularization in Hilbert scales leading to optimal convergence rates. SIAM J. Numer. Anal., 25:1313–1326, 1988.

[48] A. Neubauer. An a posteriori parameter choice for Tikhonov regularization in the presence of modelling error. Appl. Numer. Math., 4:507–519, 1988.

[49] S. Pereverzev and E. Schock. On the adaptive selection of the parameter in the regularization of ill-posed problems. SIAM J. Numer. Anal., 43:2060–2076, 2005.

[50] D. L. Phillips. A technique for the numerical solution of certain integral equations of the first kind. J. Assoc. Comput. Mach., 9:84–97, 1962.

[51] T. Raasch. Adaptive wavelet and frame schemes for elliptic and parabolic equations. PhD thesis, University of Marburg, 2007.

[52] R. Ramlau. Modifizierte Landweber-Iterationen zur Lösung von schlecht gestellten inversen Problemen. Dissertation, 1997.

[53] R. Ramlau. A modified Landweber method for inverse problems. Journal for Numerical Functional Analysis and Optimization, 20(1&2):79–98, 1999.

[54] R. Ramlau and G. Teschke. A projection iteration for nonlinear operator equations with sparsity constraints. Numerische Mathematik, 104:177–203, 2006.

[55] T. Runst and W. Sickel. Sobolev Spaces of Fractional Order, Nemytskij Operators and Nonlinear Partial Differential Equations. de Gruyter, Berlin, 1996.

[56] R. Schneider. Multiskalen- und Wavelet-Matrixkompression: Analysisbasierte Methoden zur effizienten Lösung großer vollbesetzter Gleichungssysteme. Habilitationsschrift, TH Darmstadt, 1995.

[57] E. Schock. Approximate solution of ill-posed equations: arbitrary slow convergence vs. superconvergence. In: G. Hämmerlin and K. H. Hoffmann, eds., Constructive Methods for the Practical Treatment of Integral Equations, 1985.

[58] R. Stevenson. Adaptive solution of operator equations using wavelet frames. SIAM J. Numer. Anal., 41:1074–1100, 2003.

[59] R. Stevenson. On the compressibility of operators in wavelet coordinates. SIAM J. Numer. Anal., 35:1110–1132, 2004.

[60] U. Tautenhahn. Error estimates for regularization methods in Hilbert scales. SIAM J. Numer. Anal., 33(6):2120–2130, 1996.

[61] G. Teschke. Multi-frame representations in linear inverse problems with mixed multi-constraints. Applied and Computational Harmonic Analysis, 22:43–60, 2007.

[62] A. N. Tikhonov. Regularization of incorrectly posed problems. Soviet Math. Dokl., 4:1624–1627, 1963.

[63] A. N. Tikhonov. Solution of incorrectly formulated problems and the regularization method. Soviet Math. Dokl., 4:1035–1038, 1963.

[64] A. N. Tikhonov and V. B. Glasko. Approximate solution of Fredholm integral equations of the first kind. Zh. Vych. Mat. i Mat. Fiz., 4(3):564–571, 1964.

[65] H. Triebel. Interpolation Theory, Function Spaces, Differential Operators. Verlag der Wissenschaften, Berlin, 1978.

[66] H. Triebel. Theory of Function Spaces. Birkhäuser, Basel–Boston–Berlin, 1983.

[67] H. Triebel. Theory of Function Spaces II. Birkhäuser, Basel–Boston–Berlin, 1992.

[68] L. F. Villemoes. Sobolev regularity of wavelets and stability of iterated filter banks. Progress in Wavelet Analysis and Applications, pages 243–251, 1993.