
Numerical and Statistical Aspects

of Tensor Decompositions

vorgelegt von

Paul Breiding, M.Sc. Mathematik
geb. in Witzenhausen

von der Fakultät II - Mathematik und Naturwissenschaften

der Technischen Universität Berlin

zur Erlangung des akademischen Grades

Doktor der Naturwissenschaften
- Dr. rer. nat. -

genehmigte Dissertation

Promotionsausschuss

Vorsitzender: Prof. Dr. Reinhold Schneider
Gutachter: Prof. Dr. Peter Bürgisser
Gutachter: Prof. Dr. Felipe Cucker

Tag der wissenschaftlichen Aussprache: 25. Juli 2017

Berlin, 2017


Hereby I declare that I wrote this thesis myself with the help of no more than the mentioned literature and auxiliary means.

Berlin, 09.08.2017

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .(Paul Breiding)


Acknowledgements

First and foremost, I wish to express my gratitude to my advisor Prof. Dr. Peter Bürgisser for many helpful discussions and advice, for his patience in reading over my preprint as carefully as he did, and for his support and encouragement. I would like to thank him for recommending me to go to various workshops and conferences.

Furthermore, I thank Prof. Dr. Felipe Cucker who agreed to act as a second supervisor for this work.

I also want to thank Nick Vannieuwenhoven very much for our very productive collaboration, without which the first part of this work wouldn't have been possible. In addition to that, I thank my colleagues Carlos Améndola, Josue Tonelli Cueto, Jesko Hüttenhain, Kathlen Kohn, Pierre Lairez and Antonio Lerario for many fruitful discussions we had.

I thank the Simons Institute at UC Berkeley for providing the stimulating environment for the program "Algorithms and Complexity in Algebraic Geometry" in 2014 and for the reunion meeting one year later. Furthermore, I want to thank Peter Bürgisser, Joseph Landsberg, Ketan Mulmuley and Bernd Sturmfels for organizing that program and inviting me to give a presentation at the reunion event. In particular, I thank Peter Bürgisser for supporting me in attending the program.

I moreover wish to thank the organizers of the workshop "Tensor Decompositions and Applications 2016" in Leuven, Belgium, where the author first met with Nick Vannieuwenhoven.

Finally, I want to put emphasis on the fact that this work would not have been possible without the generous financial support (DFG-grant BU 1371/2-2) of the Deutsche Forschungsgemeinschaft.


Abstract

In this work we study numerical and statistical properties of tensor decompositions, namely the canonical-polyadic decomposition—commonly known as tensor-rank decomposition—and the computation of tensor eigenpairs.

After a preliminary section, in which we consider tensors and their properties, explain the use of condition numbers in numerical analysis and give a short introduction to random tensors, this work is divided into three parts.

In the first part we define a condition number for the tensor-rank decomposition, give an algorithm to compute tensor-rank decompositions and analyze this algorithm by means of the aforementioned condition number. Furthermore, we give an interpretation of the condition number for the tensor-rank decomposition as an inverse distance to ill-posedness.

In the second part we give an introduction to eigenpairs of tensors. Thereafter, we compute the density of an eigenvalue that is chosen uniformly at random from all the eigenvalues of a tensor whose entries are i.i.d. complex Gaussian random variables. Furthermore, we construct an efficient (average polynomial-time) homotopy method to solve for tensor eigenpairs.

Finally, in the third part we investigate the expected number of real eigenpairs for a random real tensor. We consider two random tensor models: the first is the generalization of the real Ginibre ensemble from matrices to tensors; the second is the generalization of the Gaussian Orthogonal Ensemble from symmetric matrices to symmetric tensors.


Zusammenfassung

In dieser Arbeit werden numerische und statistische Eigenschaften von Tensor Zerlegungen untersucht. Diese sind die kanonisch-polyadische Zerlegung—auch bekannt als Tensor-Rang Zerlegung—und die Berechnung von Tensor Eigenpaaren.

Zunächst stellen wir in einem einleitenden Abschnitt Tensoren und ihre Eigenschaften vor, erklären den Nutzen von Konditionszahlen in der numerischen Analyse und geben eine kurze Einleitung in zufällige Tensoren. Der weitere Teil der Arbeit ist in drei Abschnitte gegliedert.

Im ersten Abschnitt definieren wir die Konditionszahl der Tensor-Rang Zerlegung, beschreiben einen Algorithmus um jene zu berechnen und analysieren diesen Algorithmus mit Hilfe der zuvor genannten Konditionszahl. Zudem interpretieren wir die Konditionszahl als inversen Abstand zur "ill-posedness".

Darauf folgend, im zweiten Teil, geben wir eine Einführung in Tensor Eigenpaare. Im Anschluss berechnen wir die Dichte eines Eigenwertes, der uniform aus allen Eigenwerten eines zufälligen komplexen Tensors gezogen wird. Dabei sind die Einträge des Tensors unabhängig und identisch verteilte komplex Gauss'sche Zufallsvariablen. Weiterhin beschreiben wir ein effizientes (im Mittel Polynomialzeit) Homotopie-Verfahren um Tensor Eigenpaare zu berechnen.

Im dritten und letzten Abschnitt untersuchen wir die erwartete Anzahl reeller Eigenpaare eines reellen zufälligen Tensors. Dabei betrachten wir zwei Modelle eines zufälligen Tensors: Das Erste ist die Verallgemeinerung des reellen Ginibre Ensembles von Matrizen zu Tensoren; das zweite ist die Verallgemeinerung des Gauss'schen Orthogonal Ensembles von symmetrischen Matrizen zu symmetrischen Tensoren.


Contents

1. Introduction
   1.1. First Example: Contingency Tensors
   1.2. Second Example: Blind Source Separation
   1.3. Third Example: Diffusion Tensor Imaging
   1.4. Fourth Example: Higher Order Markov Chains
   1.5. Outline
   1.6. Results

2. Preliminaries
   2.1. Notation
   2.2. The space of tensors
   2.3. Condition numbers in numerical analysis
   2.4. Real and complex random variables

I. Numerical analysis of tensor rank decompositions

3. The condition number of join decompositions
   3.1. A geometric condition number
   3.2. The condition number as distance to ill-posedness
   3.3. The condition number of open boundary points

4. Riemannian optimization for least-squares problems on joins
   4.1. The Riemannian Gauss–Newton method
   4.2. Convergence analysis

5. Application: The tensor rank decomposition
   5.1. The condition number of tensor rank decomposition
   5.2. Condition number theorem for the tensor rank decomposition

II. Numerical analysis of solving for eigenpairs of tensors

6. Eigenpairs of tensors
   6.1. From rank-one approximation of symmetric tensors to eigenpairs
   6.2. The number of eigenpairs of a tensor
   6.3. Geometric framework I
   6.4. Gaussian tensors and Gaussian polynomial systems
   6.5. Integration on the solution manifold
   6.6. Eigenpairs and h-eigenpairs

7. The general homotopy method is not a good choice for the eigenpair problem
   7.1. Distribution of the eigenvalues of a complex Gaussian tensor
   7.2. Solving for eigenpairs is ill-posed when the measure is the classical condition number

8. A homotopy method to solve for eigenpairs of complex tensors
   8.1. Condition of solving for h-eigenpairs
   8.2. The adaptive linear homotopy method
   8.3. Auxiliary results
   8.4. Complexity analysis

III. Statistics of real eigenpairs of real tensors

9. The expected number of real eigenpairs of a real Gaussian tensor
   9.1. Motivation
   9.2. Geometric framework II
   9.3. General tensors
   9.4. Symmetric tensors

Bibliography

List of symbols

List of algorithms

List of figures

Appendix

A. Operator norms
   A.1. Matrix norms
   A.2. Norms of multilinear operators

B. Differential Geometry
   B.1. Differentiable manifolds
   B.2. Riemannian manifolds
   B.3. Hermitian manifolds
   B.4. Integration on manifolds

C. Higher transcendental functions
   C.1. The error function and the complementary error function
   C.2. The (incomplete) Gamma function
   C.3. The (incomplete) Beta function
   C.4. Hypergeometric functions

D. Hermite polynomials
   D.1. Definition and intrarelationships
   D.2. Orthogonality relations of the Hermite polynomials
   D.3. The expectation of Hermite polynomials

E. Expected absolute value of random determinants
   E.1. Complex Ginibre ensemble
   E.2. Real Ginibre ensemble
   E.3. Gaussian Orthogonal Ensemble

F. Source code
   F.1. R script to generate Gaussian tensors
   F.2. SAGE scripts to compute the expected number of real eigenpairs


1. Introduction

Tensors are a natural generalization of matrices and can be thought of as tables of higher dimensions. For instance, a 2 × 2 matrix is a table that contains 4 numbers aligned in two dimensions, and each dimension has length 2. A 2 × 2 × 2 tensor, on the other hand, is a table with 8 numbers aligned in three dimensions, each of which has length 2.

Figure 1.0.1.: A 2 × 2 matrix A and a 2 × 2 × 2 tensor B.

We will give a rigorous definition of the mathematical object a tensor is in Subsection 2.2.1 below. It is nevertheless convenient to think of tensors in the way described above.

Definition 1.0.1 (Informal definition of tensors). Let K be a field. For positive integers $n_1, \ldots, n_p$ the set of $n_1 \times \cdots \times n_p$-tensors over K is defined as the set of p-dimensional tables
$$ K^{n_1} \otimes \cdots \otimes K^{n_p} := \{\, (a_{i_1,\ldots,i_p})_{1 \le i_1 \le n_1, \ldots, 1 \le i_p \le n_p} \mid a_{i_1,\ldots,i_p} \in K \,\}. $$
If $n_1, \ldots, n_p$ and K are clear from the context, we abbreviate $T := K^{n_1} \otimes \cdots \otimes K^{n_p}$. The number p is called the order or power of the tensor. The tuple $(n_1, \ldots, n_p)$ is called the format of the tensor.

As with matrices, we may add tensors and multiply them by scalars, making T a vector space.
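To make the informal definition concrete, the following minimal NumPy sketch (not part of the thesis; the array contents are made up) builds a 2 × 2 × 2 tensor as a three-dimensional table and uses the vector space operations just mentioned.

```python
import numpy as np

# A 2x2x2 tensor: 8 numbers arranged along three dimensions of length 2.
B = np.arange(1, 9, dtype=float).reshape(2, 2, 2)

print(B.shape)      # (2, 2, 2) -- the format of the tensor
print(B[0, 1, 1])   # the entry a_{1,2,2} (NumPy uses 0-based indexing)

# Tensors of the same format form a vector space: addition and scaling.
C = 2.0 * B + B
```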

For motivating and illustrating the main topic of this work—tensor decompositions—we want to give four examples. Ten explicit questions, labeled Q1–Q10, should serve as signposts to guide the reader to the problems this work aims at solving.


1.1. First Example: Contingency Tensors

Suppose that in an experiment we measure a number n of attributes. The n attributes are divided into two groups $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$. The question is whether the two groups of attributes are statistically independent of each other or not. The tool at hand in this situation is the contingency table, whose name is due to Karl Pearson [Pea04]. The table corresponding to our experiment is a two-dimensional table A (a matrix), such that the (i, j)-entry of A is the (relative) number of measurements of $X_i$ and $Y_j$ appearing simultaneously.

Mathematically we model this as two random variables X, Y taking values in the sets $\{1, \ldots, n_1\}$ and $\{1, \ldots, n_2\}$, respectively. The question whether X and Y are independent can be translated to the question whether the matrix A has rank one. Indeed, the independence of X and Y is equivalent to saying that $A = pq^T$, where $p_i = \mathrm{Prob}(X = i)$ and $q_j = \mathrm{Prob}(Y = j)$. If, on the other hand, A is a convex combination of r rank-one tables, i.e., a matrix of rank¹ r, then the distribution displayed in A is called mixed.
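As a small illustration (not from [Pea04] or the thesis; the probability vectors below are invented), the following NumPy snippet shows how independence produces a rank-one contingency table and how a mixture of two independent regimes produces a rank-two table.

```python
import numpy as np

# Marginal distributions of X (3 values) and Y (4 values); values are made up.
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.1, 0.4, 0.4, 0.1])

# If X and Y are independent, the contingency table is the rank-one matrix p q^T.
A_indep = np.outer(p, q)
print(np.linalg.matrix_rank(A_indep))   # 1

# A mixture of two independent regimes gives (generically) a rank-2 table.
p2, q2 = np.array([0.7, 0.2, 0.1]), np.array([0.25, 0.25, 0.25, 0.25])
A_mixed = 0.6 * np.outer(p, q) + 0.4 * np.outer(p2, q2)
print(np.linalg.matrix_rank(A_mixed))   # 2
```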

If we allow grouping into more than two groups, the table that we obtain is of higher dimension; i.e., a contingency tensor. Pearson [Pea04, p. 22-23] notes the following.

"Suppose instead of a single correlation table we have a multiple correlation system. Such a system is well illustrated by the cabinet at Scotland Yard, which contains the measurements of habitual criminals on the old system of body measurements, now discarded in favour of a finger-print index. We have in this case a division of the cabinet into 3 compartments, which mark a threefold division of long, medium, and short head lengths. Each of these vertical divisions is then sub-divided horizontally into three divisions giving the corresponding divisions for head breadth; each of these head-breadth divisions has three drawers for large, moderate, and small face breadths. Each drawer is sub-divided into three sections for three finger groups, and these again into compartments for cubit groups, and so on. If this be carried out for the seven characters dealt with, we should have ultimately 3⁷ sub-groups forming a multiple correlation system of the 7th order. We may ask what is the mean square contingency of such a system and to what extent does it diverge from an independent probability system? Of course, for an ideal anthropometric index system the divergence should be very slight."

¹Since contingency tables are positive matrices, strictly speaking one has to work with the nonnegative rank [AB94, Section 11.3] of A. As this section is meant to illustrate the meaning of tensor rank, we will stick with the "usual" rank of matrices.


Figure 1.1.1.: This is Table VII. in [Pea04]. It shows a contingency table with the groups Occupation of Son and Occupation of Father, both subdivided into 14 attributes. The entries of the table are the absolute number of measurements. On the sides of the table the marginal counts are displayed.

In our words, the Scotland Yard table is a 3 × 3 × 3 × 3 × 3 × 3 × 3-tensor and Pearson asks whether the 7 variables are independent or if their joint distribution is mixed. This gives rise to the following definition.

Definition 1.1.1 (Informal definition of tensor rank). Let $A \in K^{n_1} \otimes \cdots \otimes K^{n_p}$. We say that $A = (a_{i_1,\ldots,i_p})$ has rank one if there exist vectors $v^{(i)} \in K^{n_i}$, $1 \le i \le p$, such that $a_{i_1,\ldots,i_p} = v^{(1)}_{i_1} \cdots v^{(p)}_{i_p}$ for all tuples $(i_1, \ldots, i_p)$. We say that A has rank r if A is a sum of r rank-one tensors and A is not a sum of s rank-one tensors for s < r. The rank is also called canonical polyadic rank—or CP-rank².
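The following small NumPy sketch (purely illustrative, not part of the thesis) builds a rank-one tensor as an outer product of vectors and a tensor of rank at most two as a sum of two such terms, mirroring Definition 1.1.1.

```python
import numpy as np

# Build a rank-one 3x4x5 tensor as an outer product of three vectors,
# and a (generically) rank-2 tensor as a sum of two such terms.
rng = np.random.default_rng(0)
u1, u2, u3 = rng.standard_normal(3), rng.standard_normal(4), rng.standard_normal(5)
v1, v2, v3 = rng.standard_normal(3), rng.standard_normal(4), rng.standard_normal(5)

rank_one = np.einsum('i,j,k->ijk', u1, u2, u3)            # entries u1[i]*u2[j]*u3[k]
rank_two = rank_one + np.einsum('i,j,k->ijk', v1, v2, v3)
print(rank_one.shape)   # (3, 4, 5)
```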

Of course, the table from Scotland Yard can never contain the true distribution. By nature, its entries can only be approximations of the real data.

²The term "polyadic rank" was supposedly first mentioned by Hitchcock [Hit27].


These approximations may be arbitrarily close, but what if even for the slightest perturbation from the real data the rank decomposition of the perturbed data differs significantly from the rank decomposition of the real data? Any information obtained in such a way would be rendered useless. But how can we predict such phenomena? This question is one of the main topics of this work:

Q1 Suppose that A is a tensor of rank r and let A′ be a slight perturbation of A under the constraint that A′ has rank r. How much does the rank-r decomposition of A differ from the one of A′?

We will specify in Section 5.1 what we mean by slight and how much. In fact, we will give a quantitative answer to the above question by defining what is called the condition number for the tensor rank decomposition.

The reader may ask why this question is interesting at all—the table from Scotland Yard was discarded anyway in favor of a fingerprint index. Asking for the validity of the information seems to be of academic nature, but not of further importance. To refute this let us consider another example, Blind Source Separation.

1.2. Second Example: Blind Source Separation

A short and precise summary of Blind Source Separation³ (BSS) is given by Landsberg in [Lan12, Section 1.3.3]. We will recall it here.

Suppose that we have placed n antennae $y = (y_1, \ldots, y_n)$ that measure signals from sources $x = (x_1, \ldots, x_r)$. We assume that we have placed sufficiently many antennae, so that n ≥ r. Moreover, due to noise we assume x to be a random variable and we are interested in its center of mass $\mathbb{E}\,x$. The task is to recover $\mathbb{E}\,x$ using only the signals measured. It is an assumption that the signals are mixed by an n × r matrix $A = (a_{i,j}) \in \mathbb{R}^{n \times r}$, such that y = Ax. Under this assumption, the n × n × n rank-one tensor $Y = (y_i y_j y_k)$ has the form
$$ y_i y_j y_k = \sum_{1 \le s,t,u \le r} a_{i,s} a_{j,t} a_{k,u}\, x_s x_t x_u. \qquad (1.2.1) $$

To actually recover x we need some more information. A good approximation of the higher moments $m_{i_1,\ldots,i_p} := \mathbb{E}[y_{i_1} \cdots y_{i_p}]$ is the empirical mean obtained from measuring y over time. From this we compute an approximation of the third-order cumulant
$$ K_{i,j,k}(y) := m_{i,j,k} - (m_i m_{j,k} + m_j m_{i,k} + m_k m_{i,j}) + 2\, m_i m_j m_k. $$

³See https://research.ics.aalto.fi/ica/cocktail/cocktail_en.cgi for a demo of the Cocktail Party Problem.


Figure 1.2.1.: Two pictures of the river Tajo in Toledo, Spain, are mixed. Note that the number of antennae (= the pixels in the mixed image) is less than the number of sources (= the number of pixels of the sources). The human brain uses additional information (= experience) to decompose the mixed image. The pictures are private property.

We similarly define $K_{i,j,k}(x)$, so that by (1.2.1) and the linearity of the expected value we have
$$ K_{i,j,k}(y) = \sum_{1 \le s,t,u \le r} a_{i,s} a_{j,t} a_{k,u}\, K_{s,t,u}(x). \qquad (1.2.2) $$
But other than for $K_{i,j,k}(y)$, there is no data available to compute $K_{i,j,k}(x)$. Nevertheless, assuming statistical independence of the $x_i$ we have $K_{i,j,k}(x) = 0$ whenever i, j, k are not all equal. This simplifies the equations (1.2.2) dramatically. We have $K_{i,j,k}(y) = \sum_{s=1}^{r} a_{i,s} a_{j,s} a_{k,s}\, K_{s,s,s}(x)$; that is, the tensor $K(y) := (K_{i,j,k}(y))$ is of rank (at most) r and decomposes as
$$ K(y) = \sum_{s=1}^{r} K_{s,s,s}(x)\, a_s \otimes a_s \otimes a_s, \qquad (1.2.3) $$

where $a_s$ denotes the s-th column of A. A tensor rank decomposition of K(y) lets us recover A. The desired $\mathbb{E}\,x$ (up to scaling) is obtained by solving $\mathbb{E}\,y = A\, \mathbb{E}\,x$ using linear algebra. It is obvious that the result of this computation is only an approximation of $\mathbb{E}\,x$ and that reliance on that approximation is crucial in applications.
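To illustrate the cumulant computation, here is a minimal NumPy sketch; it is not the thesis' method, and the helper third_cumulant, the exponential sources and the mixing matrix are assumptions made only for this example.

```python
import numpy as np

def third_cumulant(Y):
    """Empirical third-order cumulant tensor of samples Y (shape: samples x n),
    following K_{ijk} = m_{ijk} - (m_i m_{jk} + m_j m_{ik} + m_k m_{ij}) + 2 m_i m_j m_k."""
    m1 = Y.mean(axis=0)                                  # first moments m_i
    m2 = np.einsum('ti,tj->ij', Y, Y) / len(Y)           # second moments m_{ij}
    m3 = np.einsum('ti,tj,tk->ijk', Y, Y, Y) / len(Y)    # third moments m_{ijk}
    return (m3
            - np.einsum('i,jk->ijk', m1, m2)
            - np.einsum('j,ik->ijk', m1, m2)
            - np.einsum('k,ij->ijk', m1, m2)
            + 2 * np.einsum('i,j,k->ijk', m1, m1, m1))

# Mix r = 2 independent (non-Gaussian) sources with a 3 x 2 matrix A: y = A x.
rng = np.random.default_rng(1)
X = rng.exponential(size=(100_000, 2))                   # independent sources
A = rng.standard_normal((3, 2))
Y = X @ A.T
K = third_cumulant(Y)    # approximately a rank-2 symmetric tensor, as in (1.2.3)
```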

Remark 1.2.1. The attentive reader may ask why in the foregoing example one takes the step to third-order tensors instead of working with second-order cumulants and rank decomposition of matrices. The reason for this is that for higher-order tensors and for almost all formats the tensor rank decomposition is unique—a property clearly desirable and failing for matrices; see Subsection 2.2.5.

Note that apart from perturbations within rank-r decompositions we cannot even be sure that the tensor in (1.2.3) actually has rank r. In fact, it might only be an approximation of some rank-r tensor close to it. Of course, we can precondition equation (1.2.3) by first computing a rank-r approximation and thereafter proceed


by computing the tensor rank decomposition. But here two more problems arise:

Q2 How to compute the rank-r approximation that is closest to a tensor?

Q3 Does the condition number for tensor rank decomposition influence numerical algorithms computing best rank-r approximations?

In Chapter 4 we present an algorithm, the Riemannian Gauss–Newton method, giving an answer to question Q2. Thereafter, we analyze this algorithm to answer question Q3. In fact, we will express the complexity of the Riemannian Gauss–Newton method in terms of the aforementioned condition number of the tensor rank decomposition. Prior to this work, several other algorithms have been proposed to attack question Q2—see, e.g., [EAM11, HH82, AHPC13, Ose06] and in particular [VDS+16]—and for some of those algorithms the performance under perturbed data [ZKP11] and the complexity [AHPC] have been investigated. But the present text, together with the related [BV16, Van16], is the first to address a perturbation theory for the tensor rank decomposition itself.

Another interesting question would be "How many best rank-r approximations does a tensor have?" However, if we consider the distance function from a fixed tensor A to the set of rank-r tensors, there might be many local minima, called critical rank-r approximations, making the search for the globally best one hard. The much more important question is

Q4 How many critical rank-r approximations does a tensor have?

This question is a difficult one even for r = 1. We postpone further discussion to the end of the next section.

Remark 1.2.2. There are various other examples for tensor rank decomposition in applications: psychometrics [Kro08], chemical sciences [SBG04], theoretical computer science [BCS97], signal processing [Com94, CJ10, SDF+16], statistics [McC87, AMR09] and machine learning [AGH+14, SDF+16], to name a few.

1.3. Third Example: Diffusion Tensor Imaging

The tensors appearing in the foregoing example are of a special class called symmetric tensors.

Definition 1.3.1 (Informal definition of symmetric tensors). Suppose that A is a tensor in $K^n \otimes \cdots \otimes K^n$. We say that $A = (a_{i_1,\ldots,i_p})$ is symmetric if for all permutations π on p elements we have $a_{i_{\pi(1)},\ldots,i_{\pi(p)}} = a_{i_1,\ldots,i_p}$. We call the format (n, ..., n) a cube format.


Rank-one approximation of symmetric tensors finds application in diffusion tensor imaging (DTI). Diffusion tensor imaging is an extension of classical magnetic resonance imaging (MRI), in which the diffusion of water in a piece of tissue is measured and this data is used to generate contrast in images. In DTI not only the intensity but also the direction of the diffusion is measured, making it possible to create detailed 3-D images of the tissue. The following is a quote from the introduction of [AA07].

"The broad spectrum of MR contrast mechanisms makes MRI one of the most powerful and flexible imaging tool for diagnosis in the CNS [central nervous system]. Measurement of the signal attenuation from water diffusion is one of the most important contrast mechanisms. In particular, diffusion tensor imaging (DTI) may be used to map and characterize the three-dimensional diffusion of water as a function of spatial location. [...] Many developmental, aging and pathologic processes of the CNS influence the microstructural composition and architecture of the affected tissues. The diffusion of water within the tissues will be altered by changes in the tissue microstructure and organization; consequently, diffusion-weighted (DW) MRI methods including DTI are potentially powerful probes for characterizing the effects of disease and aging on microstructure. Indeed, the applications of DTI are rapidly increasing because the technique is highly sensitive to changes at the cellular and microstructural level."

In DTI the tissue is divided into a finite number of voxels and for each voxel the signal strength s is modeled via the Stejskal–Tanner equation [SM14, Eq. (3.18)]
$$ s(x) = s_0 \exp(-b D_x). $$
Here x is the spatial direction of diffusion, $s_0$ and b are constants and $D_x$ is the diffusion coefficient that indicates the diffusion intensity in the direction of x. The functional relation between the diffusion coefficient and the spatial direction is modeled by a homogeneous polynomial Q, so that $Q(x) = D_x$ [EO03, Eq. (11)]. For each voxel one measures x and s(x) and computes Q from these observations. Due to physical assumptions Q must be positive semi-definite, and the critical points of Q on the unit sphere yield the principal directions of the diffusion in that voxel. This information is used to generate 3-D images. In classical DTI, Q is modeled with degree 2 [SM14, Section 5.1], but in [TD99] it was pointed out that in certain situations this model is not capable of capturing the diffusion behavior correctly. This is why in [EO03] the diffusion coefficient is modeled using polynomials of higher degree.

The connection to tensors is established via the identification of symmetric tensors with homogeneous polynomials. Let $H_{\mathbb{R}}$ denote the space of real homogeneous


Figure 1.3.1.: MRI scan of the author’s right knee. The picture is private property.

polynomials of degree p in the n variables $X_1, \ldots, X_n$. There is a natural bijection from the set of symmetric tensors in $\mathbb{R}^n \otimes \cdots \otimes \mathbb{R}^n$ to $H_{\mathbb{R}}$ via the map
$$ (a_{i_1,\ldots,i_p}) \mapsto Q_A(X) := \sum_{1 \le i_1,\ldots,i_p \le n} a_{i_1,\ldots,i_p}\, X_{i_1} \cdots X_{i_p}; $$

see the discussion in Subsection 2.2.2. For a symmetric tensor A let us denote by $Q_A$ the polynomial so obtained. Conversely, given $Q \in H_{\mathbb{R}}$, let us denote by $A_Q$ the corresponding symmetric tensor. It is well known that, if one wants to optimize a real polynomial $Q_A \in H_{\mathbb{R}}$ on the unit sphere, one has to consider the Lagrangian $L_A(x, \ell) := Q_A(x) - \ell(x^T x - 1)$ and solve for the points where the gradient vanishes:
$$ \nabla_{(x,\ell)} L_A = \begin{bmatrix} \nabla_x Q_A - 2\ell x \\ x^T x - 1 \end{bmatrix} = 0. \qquad (1.3.1) $$

It is remarkable that all solutions (x, ℓ) of equation (1.3.1) yield a symmetric tensor $(x_{i_1} \cdots x_{i_p})_{1 \le i_1,\ldots,i_p \le n} \in \mathbb{R}^n \otimes \cdots \otimes \mathbb{R}^n$ that is a multiple of some critical symmetric rank-one approximation of $A_Q$; i.e., a point in the set of symmetric rank-one tensors locally minimizing the distance to $A_Q$. This phenomenon is discussed in Section 6.1.
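A small numerical check of (1.3.1) (a sketch under illustrative assumptions, not taken from the thesis): for the symmetric tensor A = v ⊗ v ⊗ v with a unit vector v, the point x = v is a critical point of Q_A on the sphere, and the Lagrange multiplier is ℓ = (p/2) Q_A(x) by Euler's relation for homogeneous polynomials of degree p = 3.

```python
import numpy as np

# Symmetric 2x2x2 tensor built from a single unit vector: A = v (x) v (x) v.
v = np.array([3.0, 4.0]) / 5.0
A = np.einsum('i,j,k->ijk', v, v, v)

def Q(x):
    """Homogeneous cubic Q_A(x) = sum_{ijk} a_{ijk} x_i x_j x_k."""
    return np.einsum('ijk,i,j,k->', A, x, x, x)

def grad_Q(x, h=1e-6):
    """Finite-difference gradient of Q_A, for checking (1.3.1) numerically."""
    return np.array([(Q(x + h*e) - Q(x - h*e)) / (2*h) for e in np.eye(2)])

# At the critical point x = v, the Lagrange condition grad Q_A(x) = 2*l*x holds
# with l = (3/2) * Q_A(x).
l = 1.5 * Q(v)
print(np.allclose(grad_Q(v), 2 * l * v, atol=1e-4))   # True
```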

We started with DTI and the problem of maximizing a polynomial function, and have identified this problem with the problem of rank-one tensor approximation.


In the context of the present work the most important aspect of this observation is that for polynomial equation solving there exists an extensive perturbation theory and much about condition numbers is known [BC13]. This leads to the following question.

Q5 Can we design an efficient algorithm that solves the equation (1.3.1) and guarantees that the computation result is not too far from the true solution?

We will not answer question Q5 directly, but rather answer a more general question (question Q8) that we will motivate in the next example.

Another important question is

Q6 How many solutions does the system (1.3.1) have?

Note that an answer for question Q6 yields an answer for Q4 for r = 1. The answer to Q6 is known when one also counts complex solutions, and this count is constant [CS13]. Unlike this, the number of real solutions of (1.3.1) is constant only on some semialgebraic sets contained in $\mathbb{R}^n \otimes \cdots \otimes \mathbb{R}^n$ and changes when crossing a codimension-one variety⁴; see the discussion in [DH16]. This is our motivation to pose the following question.

Q7 If the entries of A are real random variables, what is the expected number of real solutions of (1.3.1)?

In the case of centered Gaussian entries of A with a certain variance we answer question Q7 for symmetric tensors in Part III. Returning to the DTI example, this is important insofar as the number of real solutions of the diffusion coefficient polynomial Q determines the resolution of the contrast mechanism. But to control the resolution, knowing the exact number of real solutions is less important than knowing the overall magnitude of the number of real solutions, and for this the expected value is a good benchmark⁵.

1.4. Fourth Example: Higher Order Markov Chains

Solutions of (1.3.1) are called eigenpairs of the corresponding symmetric tensor [Qi07, Qi05, Lim06], generalizing the concept of eigenpairs from matrices to higher order. It is worth emphasizing that eigenpairs can also be defined for ordinary tensors. To give this definition, we need to rewrite equation (1.3.1) using the following lemma.

⁴This variety is called the eigendiscriminant variety, cf. Proposition 6.2.6.
⁵The experiments shown in Figure 9.1.1 suggest that the variance of the number of real eigenpairs is reasonably small, so that the expected value indeed is a good proxy for the magnitude of the number of real solutions.


Lemma 1.4.1. Let $A = (a_{i_1,\ldots,i_p}) \in \mathbb{R}^n \otimes \cdots \otimes \mathbb{R}^n$ be symmetric. Then, we have
$$ \nabla_x Q_A(X) = p \begin{bmatrix} \sum_{1 \le i_1,\ldots,i_{p-1} \le n} a_{i_1,\ldots,i_{p-1},1}\, X_{i_1} \cdots X_{i_{p-1}} \\ \vdots \\ \sum_{1 \le i_1,\ldots,i_{p-1} \le n} a_{i_1,\ldots,i_{p-1},n}\, X_{i_1} \cdots X_{i_{p-1}} \end{bmatrix}. $$

Proof. Write $Q_A(X) = \sum_{\alpha_1 + \ldots + \alpha_n = p} \lambda_{(\alpha_1,\ldots,\alpha_n)}\, X_1^{\alpha_1} \cdots X_n^{\alpha_n}$. We show the assertion for the partial derivative
$$ \frac{dQ_A}{dX_1} = \sum_{\alpha_1 \neq 0} \lambda_{(\alpha_1,\ldots,\alpha_n)}\, \alpha_1\, X_1^{\alpha_1 - 1} \cdots X_n^{\alpha_n}. $$
The other entries of $\nabla_x Q_A(X)$ are proven accordingly. Fix $(\alpha_1, \ldots, \alpha_n)$ with $\alpha_1 \neq 0$ and let $(i_1, \ldots, i_p)$ be a tuple such that $X_{i_1} \cdots X_{i_p} = X_1^{\alpha_1} \cdots X_n^{\alpha_n}$. Then, we have $\lambda_{(\alpha_1,\ldots,\alpha_n)} = \frac{p!}{\alpha_1! \cdots \alpha_n!}\, a_{i_1,\ldots,i_p}$. Since $\alpha_1 \neq 0$ and by symmetry of the $a_{i_1,\ldots,i_p}$ we can fix the p-th index as $i_p = 1$. For the partial derivative this yields
$$ \frac{dQ_A}{dX_1} = p \sum_{\alpha_1 \neq 0} \frac{(p-1)!}{(\alpha_1 - 1)! \cdot \alpha_2! \cdots \alpha_n!}\, a_{i_1,\ldots,i_{p-1},1}\, X_1^{\alpha_1 - 1} \cdots X_n^{\alpha_n}. $$
The number of tuples $(i_1, \ldots, i_{p-1})$ with $X_{i_1} \cdots X_{i_{p-1}} = X_1^{\alpha_1 - 1} X_2^{\alpha_2} \cdots X_n^{\alpha_n}$ is exactly $\frac{(p-1)!}{(\alpha_1 - 1)! \cdot \alpha_2! \cdots \alpha_n!}$, which proves the assertion on $\frac{dQ_A}{dX_1}$ and finishes the proof.

The crucial observation is that we can define the polynomial system from the right-hand side in Lemma 1.4.1 for any tensor A. If we liberate the system from the relations among the coefficients $a_{i_1,\ldots,i_p}$ imposed by a symmetric tensor, we are free⁶ to define for a general tensor $A \in K^n \otimes \cdots \otimes K^n$
$$ A x^{p-1} := \begin{bmatrix} \sum_{1 \le i_1,\ldots,i_{p-1} \le n} a_{i_1,\ldots,i_{p-1},1}\, x_{i_1} \cdots x_{i_{p-1}} \\ \vdots \\ \sum_{1 \le i_1,\ldots,i_{p-1} \le n} a_{i_1,\ldots,i_{p-1},n}\, x_{i_1} \cdots x_{i_{p-1}} \end{bmatrix}. \qquad (1.4.1) $$

Note that, for general square matrices $A \in K^n \otimes K^n$, equation (1.4.1) is the usual matrix-vector product Ax. The notation from (1.4.1) has another beautiful benefit: if A is real symmetric and we write $A x^p := Q_A(x)$, then $\nabla_x A x^p = p\, A x^{p-1}$. In Part II we will often write $f_A(X) := A x^{p-1}$.

⁶We still have to keep the cube format of the tensor, though.
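As a small illustration of the notation (1.4.1) for p = 3 (a sketch with made-up inputs, not from the thesis): the operation agrees with the matrix-vector product for symmetric matrices, and for the rank-one symmetric tensor v ⊗ v ⊗ v with ‖v‖ = 1 the pair (v, 1) already satisfies the eigenpair equation A v^{p-1} = λ v defined just below.

```python
import numpy as np

def apply_power(A, x):
    """A x^{p-1} for an order-3 tensor A, following (1.4.1):
    the j-th entry is sum_{i1,i2} a_{i1,i2,j} x_{i1} x_{i2}."""
    return np.einsum('ijk,i,j->k', A, x, x)

# For p = 2 and a symmetric matrix this is the usual matrix-vector product.
M = np.array([[1.0, 2.0], [2.0, 4.0]])
x = np.array([1.0, -1.0])
print(np.allclose(np.einsum('ij,i->j', M, x), M @ x))     # True

# For A = v (x) v (x) v with ||v|| = 1 we get A v^2 = (v.v)^2 v = v,
# so (v, 1) is an eigenpair.
v = np.array([3.0, 4.0]) / 5.0
A = np.einsum('i,j,k->ijk', v, v, v)
print(np.allclose(apply_power(A, v), 1.0 * v))            # True
```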


Inspired by equation (1.3.1) we define eigenpairs of a general tensor as follows: the pair $(v, \lambda) \in (K^n \setminus \{0\}) \times K$ is called an eigenpair of $A \in K^n \otimes \cdots \otimes K^n$ if it satisfies the equation $A v^{p-1} = \lambda v$. What first seems artificial in the sense "We define it because we can" is actually a useful definition. We illustrate this with the following example, which we found in [LNQ13]: Higher Order Markov Chains.

Recall [Doo90, Chapter 5, Section 1]: a Markov chain is a stochastic process of finite random variables $x^{(1)}, x^{(2)}, \ldots$, called states (cf. Section 2.4). In this example we assume that all random variables $x^{(i)}$ take values in the set $\{1, \ldots, n\}$. The Markov hypothesis is that the distribution of the current state only depends on the very last state. This means that for all k and all $1 \le i_1, \ldots, i_k \le n$ we have
$$ \mathrm{Prob}\{x^{(k)} = i_k \mid x^{(k-1)} = i_{k-1},\, x^{(k-2)} = i_{k-2}, \ldots, x^{(1)} = i_1\} = \mathrm{Prob}\{x^{(k)} = i_k \mid x^{(k-1)} = i_{k-1}\}. $$

We assume that the transition probabilities $\mathrm{Prob}\{x^{(k)} = i \mid x^{(k-1)} = j\}$ are independent of k, in which case one speaks of stationary transition probabilities, and write $p_{i,j} := \mathrm{Prob}\{x^{(k)} = i \mid x^{(k-1)} = j\}$. Slightly abusing notation, we identify a random variable x with its distribution vector $x = (\mathrm{Prob}\{x = i\})_{i=1}^{n}$. Then we can describe the distribution of $x^{(k)}$ as
$$ x^{(k)} = P x^{(k-1)}, \qquad (1.4.2) $$

where $P := (p_{i,j})$ is called the transition matrix of the process.

Consider the example of the random variable today's weather taking values in the set {sunny, rainy}. As "[...] two meteorological observations separated by a relatively short interval of time tend to be either similar or highly correlated" [MKK02], "a Markov chain model can be fitted to daily rainfall" [PG76]. Modeling today's weather with a Markov chain like (1.4.2) would assume that today's weather only depends on one past event, for instance yesterday's weather. However, to make weather prediction as precise as possible, meteorologists take into account as much data as they can, involving not only the data of yesterday's weather, but of as many of the past days as possible⁷.

The foregoing discussion motivates the following modification of the Markov hypothesis: we assume that the current state $x^{(k)}$ not only depends on the last state, but on the last d states:

$$ \mathrm{Prob}\{x^{(k)} = i_k \mid x^{(k-1)} = i_{k-1},\, x^{(k-2)} = i_{k-2}, \ldots, x^{(1)} = i_1\} = \mathrm{Prob}\{x^{(k)} = i_k \mid x^{(k-1)} = i_{k-1}, \ldots, x^{(k-d)} = i_{k-d}\}. $$

⁷The restriction is the computational complexity of the system. In fact, weather prediction is computationally challenging: the first numerical weather prediction that could be computed within 24 hours was made in April 1950 on ENIAC (Electronic Numerical Integrator and Calculator) by von Neumann's Meteorology project [Ste78, p. 145].


Similarly as we defined the transition matrix, we define the transition tensor as $P := (p_{i_1,\ldots,i_d,j})$, where
$$ p_{i_1,\ldots,i_d,j} = \mathrm{Prob}\{x^{(k)} = j \mid x^{(k-1)} = i_1, \ldots, x^{(k-d)} = i_d\}. $$

Then, the distribution of the k-th state is given by
$$ x^{(k)} = \Big( \sum_{1 \le i_1,\ldots,i_d \le n} p_{i_1,\ldots,i_d,j}\, (x^{(k-1)})_{i_1} \cdots (x^{(k-d)})_{i_d} \Big)_{j=1}^{n}. \qquad (1.4.3) $$

An important question in the study of dynamical systems like (1.4.3) is whether the process converges to a steady state. If $x^\star$ is a steady state of the process, then, by (1.4.3), it must satisfy the equation
$$ x^\star = \Big( \sum_{1 \le i_1,\ldots,i_d \le n} p_{i_1,\ldots,i_d,j}\, (x^\star)_{i_1} \cdots (x^\star)_{i_d} \Big)_{j=1}^{n} = P (x^\star)^d. $$
In other words, $(x^\star, 1)$ is an eigenpair of the transition tensor P.
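The following NumPy sketch illustrates this for a second-order chain on two states (the transition probabilities are invented for illustration, this is not part of the thesis, and the fixed-point iteration has no general convergence guarantee): it iterates the map (1.4.3) and checks that the limit satisfies $P(x^\star)^d = x^\star$.

```python
import numpy as np

# A second-order (d = 2) Markov chain on n = 2 states: transition tensor P with
# entries P[i1, i2, j] = Prob{next state = j | last two states were (i1, i2)}.
P = np.empty((2, 2, 2))
P[:, :, 0] = [[0.7, 0.4], [0.5, 0.2]]
P[:, :, 1] = 1.0 - P[:, :, 0]          # probabilities over j sum to one

def step(P, x):
    """One application x -> P x^d of (1.4.3), with all past states distributed as x."""
    return np.einsum('abj,a,b->j', P, x, x)

x = np.array([0.5, 0.5])
for _ in range(100):
    x = step(P, x)
print(x, np.allclose(step(P, x), x))   # steady state x* and True: P (x*)^d = x*
```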

It is clear that in applications the data at hand is only an approximation of the true data and that reliance on the results of computations is crucial. This is why we pose the following questions.

In Q5 we asked whether one can design an efficient algorithm that solves the equation (1.3.1) and guarantees that the computation result is not too far from the true solution. We extend this question as follows:

Q8 Can we design an efficient algorithm that solves $A x^{p-1} = \lambda x$ for a general tensor and guarantees that the computation result is not too far from the true solution?

We furthermore pose the related question.

Q9 What is the sensitivity of the system $A x^{p-1} = \lambda x$ with respect to perturbations in A, where A is a complex general tensor?

Questions Q8 and Q9 are the main topic of Part II of this work. We will answer them in Part II, in particular Chapter 8.

Finally, we pose question Q7 for general tensors as well:

Q10 If the entries of A are real random variables, what is the expected number of real eigenpairs of A?

As for symmetric tensors we answer question Q10 in Part III.


1.5. Outline

Buckley [Buc67, p. 47] notes the following.

"Information is [...] a relationship between sets or ensembles of structured variety."

In the context of this definition it is fair to say that information ultimately is a tensor. Despite this fundamental importance of tensors, and despite the fact that researchers have been considering tensors for more than a century, there does not exist a thorough tensor theory yet. This is unlike the situation for matrices, where "linear algebra", "numerical linear algebra" and "random matrix theory" are well-established theories. One reason for the difference in theories between matrices and higher-order tensors clearly is that most tensor problems are NP-hard [Has90, HL13]. Nevertheless, mathematicians have recently begun to work towards a complete tensor theory and to connect various results involving tensors from a diverse set of disciplines. This spirit can be well observed in the laudation on the 2017 Smale Prize⁸ winner Lek-Heng Lim.

"[...] addressing the challenges of high-dimensionality by tensor methods opens very promising perspectives. [...] Multilinear algebra has now become a thriving research area [...]."

The author understands the present work as a contribution to this program. By investigating tensors and tensor rank from numerical and statistical points of view, he intends to give new impulses to the tensor community. He hopes to contribute to establishing the theories of "numerical multilinear algebra" and "random tensor theory".

1.6. Results

The results presented in this work can be divided into three main categories—and this is why we chose to partition this work into three parts. Before we come to those three parts, though, in Chapter 2 we give a variety of preliminaries needed throughout this work.

The main subject of Part I is question Q1 and the condition number of tensor rank decomposition. In fact, in Chapter 3 we will consider the more general framework of joins of manifolds: for a collection of manifolds $M_1, \ldots, M_r$, all embedded in some $K^N$, their join is defined as $\mathcal{J}(M_1, \ldots, M_r) = \{\, x_1 + \ldots + x_r \mid x_i \in M_i \,\}$. The set of tensors of rank at most r is obtained from this by letting the $M_i$

⁸http://focm-society.org/Lim.php


all be equal to the set of rank-one tensors (the fact that this set is a manifold is explained in Subsection 2.2.4). The problem of tensor rank decomposition is then a special case of the join decomposition problem (JDP). We define a condition number for join decompositions and prove that the condition number at $(x_1, \ldots, x_r) \in M_1 \times \cdots \times M_r$ equals the distance of the tuple of tangent spaces $(T_{x_i} M_i)_{i=1}^{r}$ to some ill-posed set in a product of Grassmann manifolds.

The problem of tensor rank approximation arising from questions Q2 and Q3 will be formulated as the more general join approximation problem (JAP). More specifically, in Chapter 4 we give an algorithm, called Riemannian Gauss–Newton, that takes as input a point $q \in K^N$ and computes the decomposition of the point on the join that is closest to q. We moreover analyze the complexity of this algorithm in terms of the condition number from Chapter 3. Riemannian optimization methods for the tensor rank decomposition have been proposed prior to this work [EAM11, HH82, AHPC13, Ose06, ZKP11, AHPC, VDS+16], but the first approach in the context of condition numbers is made in [Van16, BV16] and in this text. Thereafter, in Chapter 5 we return to the problem of tensor rank decomposition; i.e., we let all the $M_i$ be the set of rank-one tensors. We describe the condition number of join decompositions explicitly in this case, thus obtaining the condition number of tensor rank decomposition. Furthermore, we relate the condition number of tensor rank decomposition to the distance of the decomposition to some ill-posed set—this distance is in the data space, contrary to the distance term in a product of Grassmannians mentioned above.

In Part II we turn to eigenpairs of tensors. In the case of real symmetric tensors, solving for eigenpairs is equivalent to computing (locally) best rank-one approximations. In Chapter 6 we use this relation to motivate the definition of eigenpairs (similarly as we did when passing from the DTI example to the Markov chains example in the introduction). In the rest of the part, however, we will investigate eigenpairs of general tensors independently of the relation to rank-one approximation. We reformulate the problem of finding tensor eigenpairs as the problem of finding eigenpairs of polynomial systems. The overall goal of this part is to give a homotopy method algorithm that solves for eigenpairs of general polynomial systems. There are other algorithms for the eigenpair problem, such as the Power Method [KM11], but the advantage of the homotopy method is that it is extraordinarily stable and that we can provide a complexity analysis for it. We explain in Chapter 7 why existing homotopy methods are not a good choice for the eigenproblem. One component of the argument is the distribution of the eigenvalues of a random tensor, which we compute. In Chapter 8 we give a homotopy method designed specifically for the tensor-eigenpair/polynomial-systems-eigenpair problem. The step size of the homotopy method is governed by the condition number of solving for eigenpairs, which we define and analyze, giving an answer to question Q9. The algorithm itself is randomized. We make an analysis of this algorithm


and show that on the average it performs well (in polynomial time). This answers question Q8 in the affirmative.

Finally, in Part III we answer Q7 and Q10 by computing the expected number of real eigenpairs of a real Gaussian tensor and of a Gaussian symmetric tensor; see Definition 2.4.10. The computation of the latter requires a formula for the expected absolute characteristic polynomial of a matrix from the Gaussian Orthogonal Ensemble. We prove this formula, which is new to the best of our knowledge, in Appendix E.3.


2. Preliminaries

2.1. Notation

Throughout this work R always denotes the real numbers and C denotes the complex numbers. If $z \in \mathbb{C}$, we denote by $\Re z$ the real part of z and by $\Im z$ the imaginary part of z. The symbol K means a field that is either R or C (because the statements involving K hold for both choices). The symbol $|\cdot|$ denotes the common norm on either R or C. The symbol T serves as abbreviation for the space of tensors $T = K^{n_1} \otimes \cdots \otimes K^{n_d}$.

The symbol $^T$ means transposition. The symbol $^*$ denotes the conjugate transpose; i.e., when $A = (a_{i,j}) \in K^{m \times n}$ is an m × n matrix, then $A^*$ is the matrix with entries $b_{i,j} = \overline{a_{j,i}}$. The symbol $\langle\,\cdot\,,\cdot\,\rangle$ denotes the Hermitian inner product on $K^n$; that is, for $x, y \in K^n$
$$ \langle x, y \rangle := x^* y = \overline{x}^T y. $$
The norm on $K^n$ is $\|x\| := \sqrt{\langle x, x\rangle}$ and the orthogonal complement of $x \neq 0$ in $K^n$ is denoted
$$ x^\perp := \{\, y \in K^n \mid \langle x, y \rangle = 0 \,\}. $$

The sphere in $K^n$ is denoted $S(K^n) := \{x \in K^n \mid \|x\| = 1\}$ and by $\mathbb{P}(K^n)$ we denote the projective space over $K^n$. The class of $x \in K^n$ in $\mathbb{P}(K^n)$ is denoted [x]. We define the angle between $x, y \in S(K^n)$ as
$$ d_{\mathbb{P}}(x, y) := \arccos \frac{|\langle x, y\rangle|}{\|x\|\, \|y\|}. \qquad (2.1.1) $$
The reason for the symbol $d_{\mathbb{P}}$ is that the angle is the metric on $\mathbb{P}(K^n)$ induced by the Fubini–Study inner product; cf. Lemma B.3.3. We define the spherical distance between x and y as (cf. Lemma B.3.2)
$$ d_{S}(x, y) := \arccos \frac{\langle x, y\rangle}{\|x\|\, \|y\|}. \qquad (2.1.2) $$

Remark 2.1.1. For K = R the difference between $d_{\mathbb{P}}$ and $d_S$ is that $d_S$ measures the angle between the rays $\mathbb{R}_{>0}x$ and $\mathbb{R}_{>0}y$, whereas $d_{\mathbb{P}}$ measures the angle between the lines $\mathbb{R}x$ and $\mathbb{R}y$; cf. Figure 2.1.1.


Figure 2.1.1.: Difference between $d_{\mathbb{P}}(x, y)$ (length of the blue arc) and $d_S(x, y)$ (length of the red arc).
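For the real case, the two distances can be computed directly from (2.1.1) and (2.1.2); the following NumPy sketch (vectors chosen for illustration, not from the thesis) shows how they differ for vectors spanning an angle of 135 degrees.

```python
import numpy as np

def d_S(x, y):
    """Spherical distance (2.1.2): angle between the rays through x and y."""
    return np.arccos(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def d_P(x, y):
    """Projective distance (2.1.1): angle between the lines through x and y."""
    return np.arccos(abs(np.dot(x, y)) / (np.linalg.norm(x) * np.linalg.norm(y)))

x = np.array([1.0, 0.0])
y = np.array([-1.0, 1.0]) / np.sqrt(2)   # 135 degrees away from x
print(d_S(x, y))   # ~2.356 (3*pi/4): the rays differ by 135 degrees
print(d_P(x, y))   # ~0.785 (pi/4): the lines differ by only 45 degrees
```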

We call a matrix in $K^{n \times n}$ orthonormal if its columns are pairwise orthogonal with respect to $\langle\,\cdot\,,\cdot\,\rangle$ and are of unit norm. In the special case K = C, we call such a matrix unitary. We denote the space of orthonormal matrices in $K^{n \times n}$ by U(n).

For a differentiable map between manifolds $F : M \to N$ we denote by $D_x F$ the derivative of F at x. We use the symbol $D_x F$ also for the Jacobi matrix of F at x. The symbol $\frac{dF}{dz}$ denotes the partial derivative of F with respect to z.

2.2. The space of tensors

Throughout this subsection let $K \in \{\mathbb{R}, \mathbb{C}\}$ and let $V, W, U, V_1, \ldots, V_p$ be finite-dimensional K-vector spaces. We denote $n = \dim V$, $n_i = \dim V_i$ and $m = \dim W$. In this section we make Definition 1.0.1, Definition 1.1.1 and Definition 1.3.1 more precise and put them into a rigorous mathematical framework.

2.2.1. Tensor products

The following is a summary of [War83, p. 54-55]. Let F(V, W) be the free vector space over K whose generators are the points of V × W. Let R(V, W) be the subspace of F(V, W) that is generated by all the elements of the following form:
$$ (v_1 + v_2, w) - (v_1, w) - (v_2, w), \qquad (v, w_1 + w_2) - (v, w_1) - (v, w_2), $$
$$ (av, w) - a(v, w), \qquad (v, aw) - a(v, w), $$
where $v, v_1, v_2 \in V$, $w, w_1, w_2 \in W$, $a \in K$.


Definition 2.2.1. The quotient vector space
$$ V \otimes W := F(V, W) / R(V, W) $$
is called the tensor product of V and W. The class of (v, w) in V ⊗ W is denoted by v ⊗ w.

Remark 2.2.2. In [War83] only tensor products over real vector spaces are described. The construction, however, holds for any field K (even for modules over rings); see [Eis95, Subsection A 2.2].

It follows that we have the following identities in V ⊗ W:
$$ (v_1 + v_2) \otimes w = v_1 \otimes w + v_2 \otimes w, \qquad v \otimes (w_1 + w_2) = v \otimes w_1 + v \otimes w_2, $$
$$ (av) \otimes w = a\,(v \otimes w), \qquad v \otimes (aw) = a\,(v \otimes w). $$

The tensor product has the following properties.

Proposition 2.2.3. The following holds.

1. (Universal mapping property.) Let σ denote the bilinear map $(v, w) \mapsto v \otimes w$. Then, whenever U is a K-vector space and $b : V \times W \to U$ is a bilinear map, there exists a unique linear map $\phi : V \otimes W \to U$ such that $\phi \circ \sigma = b$; that is, the triangle formed by σ, b and φ commutes.

2. $V \otimes (W \otimes U)$ is canonically isomorphic to $(V \otimes W) \otimes U$ (this justifies the notation $V \otimes W \otimes U$).

3. The bilinear map $V^* \times W \to \mathrm{Hom}(V, W)$ defined by $(f, v) \mapsto (w \mapsto f(w)v)$ determines uniquely a linear isomorphism $\alpha : V^* \otimes W \to \mathrm{Hom}(V, W)$.

4. $\dim V \otimes W = (\dim V)(\dim W)$.

5. Let $\{e_i \mid i = 1, \ldots, n\}$ be a basis for V and $\{f_j \mid j = 1, \ldots, m\}$ be a basis for W. Then $\{e_i \otimes f_j \mid 1 \le i \le n,\ 1 \le j \le m\}$ is a basis for $V \otimes W$.

This justifies the informal Definition 1.0.1: because of Proposition 2.2.3 (2) the symbol $T = K^{n_1} \otimes \cdots \otimes K^{n_p}$ is well defined, and the set of all $(a_{i_1,\ldots,i_p})_{1 \le i_1 \le n_1, \ldots, 1 \le i_p \le n_p}$ is the set of coordinate vectors of T for a choice of basis for each of the $K^{n_i}$. In


this work we usually choose the standard basis $\{ e^{(i)}_j \mid 1 \le j \le n_i \}$ on each $K^{n_i}$, and call
$$ B = \{\, e^{(1)}_{j_1} \otimes \cdots \otimes e^{(p)}_{j_p} \mid 1 \le j_1 \le n_1, \ldots, 1 \le j_p \le n_p \,\} \qquad (2.2.1) $$
the standard basis for T. This shows that
$$ \dim\big( K^{n_1} \otimes \cdots \otimes K^{n_p} \big) = \prod_{i=1}^{p} n_i. \qquad (2.2.2) $$
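A quick NumPy illustration of (2.2.1) and (2.2.2) for $K^2 \otimes K^3$, identified with 2 × 3 coordinate arrays (an assumption made only for this sketch; not part of the thesis): the tensors $e_i \otimes f_j$ form a basis of the 6-dimensional space.

```python
import numpy as np
from itertools import product

# Standard basis of K^2 (x) K^3 realized as outer products e_i (x) f_j,
# illustrating (2.2.1) and the dimension count (2.2.2): 2 * 3 = 6.
E = np.eye(2)
F = np.eye(3)
basis = [np.einsum('i,j->ij', E[i], F[j]) for i, j in product(range(2), range(3))]

# The 6 basis tensors, flattened, are linearly independent.
M = np.stack([b.ravel() for b in basis])
print(len(basis), np.linalg.matrix_rank(M))   # 6 6
```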

2.2.2. Symmetric tensors

In this subsection we want to recall [Eis95, Subsection A2.3], where the p-th symmetric power of a K-vector space V is defined. For all p ≥ 0 we denote
$$ V^{\otimes p} := \underbrace{V \otimes \cdots \otimes V}_{p \text{ times}}. $$
Then, the direct sum
$$ T(V) := \bigoplus_{p=0}^{\infty} V^{\otimes p} $$
is a graded K-algebra with multiplication induced by
$$ (v_1 \otimes \cdots \otimes v_{p_1}) \cdot (w_1 \otimes \cdots \otimes w_{p_2}) := v_1 \otimes \cdots \otimes v_{p_1} \otimes w_1 \otimes \cdots \otimes w_{p_2}. $$

The symmetric algebra S(V) is obtained from T(V) by factoring out the ideal generated by all $v \otimes w - w \otimes v$ for $v, w \in V$. Note that the order of the $v_i$ in the class of $v_1 \otimes \cdots \otimes v_p$ in S(V) is irrelevant. This observation justifies the notation $v_1 \cdots v_p$; the dot puts emphasis on the commutativity of multiplication in S(V). Let $e_1, \ldots, e_n$ be a basis for V. Then we have the isomorphism of K-algebras
$$ S(V) \cong K[X_1, \ldots, X_n], \qquad (2.2.3) $$
via the map $e_i \mapsto X_i$.

Definition 2.2.4. Let p ≥ 0. The p-th symmetric power of V is defined as the image of $V^{\otimes p} \subset T(V)$ in S(V) and is denoted by $S^p(V)$.

The space $S^p(V)$ can naturally be identified with a subspace of $V^{\otimes p}$. Consider the linear map $s : V^{\otimes p} \to V^{\otimes p}$ that is induced by
$$ s(v_1 \otimes \cdots \otimes v_p) = \frac{1}{p!} \sum_{\pi \in \mathfrak{S}_p} v_{\pi(1)} \otimes \cdots \otimes v_{\pi(p)}, \qquad (2.2.4) $$


where the sum ranges over all permutations π on p elements. One can show that the image of s is naturally isomorphic to $S^p(V)$, which allows us to interpret $S^p(V)$ as a subspace of $V^{\otimes p}$; cf. [Lan12, Subsection 2.6.3]. This observation together with Definition 2.2.4 complements the informal Definition 1.3.1. Note that, by (2.2.3), $S^p(V)$ is isomorphic to the space of homogeneous polynomials of degree p over K. This implies
$$ \dim S^p(V) = \binom{n - 1 + p}{p}; \qquad (2.2.5) $$
see [Eis95, Corollary A 2.3 (c)].
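A minimal NumPy sketch of the symmetrization map (2.2.4) and the dimension count (2.2.5); the random tensor and the symmetry check are illustrative assumptions, not part of the thesis.

```python
import numpy as np
from itertools import permutations
from math import factorial, comb

def symmetrize(T):
    """The map s from (2.2.4): average of T over all permutations of its p indices."""
    p = T.ndim
    return sum(np.transpose(T, perm) for perm in permutations(range(p))) / factorial(p)

rng = np.random.default_rng(4)
n, p = 3, 3
S = symmetrize(rng.standard_normal((n,) * p))
print(np.allclose(S, np.transpose(S, (1, 0, 2))))   # True: S is symmetric

# Dimension count (2.2.5): the number of independent entries of a symmetric
# n x ... x n tensor of order p is binom(n - 1 + p, p).
print(comb(n - 1 + p, p))                           # 10 for n = p = 3
```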

2.2.3. Antisymmetric tensors

By contrast with the symmetric algebra, the antisymmetric algebra Λ(V) is the quotient of T(V) by the ideal generated by all v ⊗ v for v ∈ V. The name "antisymmetric" is used because in Λ(V) the element $v \otimes w + w \otimes v$ equals 0.

Definition 2.2.5. The p-th antisymmetric power of V is the image of $V^{\otimes p} \subset T(V)$ in Λ(V) and is denoted by $\Lambda^p(V)$. Furthermore, we denote the image of $v_1 \otimes \cdots \otimes v_p$ in $\Lambda^p(V)$ by $v_1 \wedge \ldots \wedge v_p$.

If $e_1, \ldots, e_n$ is a basis of V, a basis of $\Lambda^p(V)$ is given by
$$ B_\wedge = \{\, e_{i_1} \wedge \ldots \wedge e_{i_p} \mid 1 \le i_1 < \ldots < i_p \le n \,\}; \qquad (2.2.6) $$
see [Eis95, Corollary A2.3 (p)]. In particular, $\dim \Lambda^p(V) = \binom{n}{p}$ for $1 \le p \le n$ and $\Lambda^p(V) = 0$ for all p > n.

2.2.4. Tensors of rank-one and their secants

The map σ from Proposition 2.2.3 (1) is called the Segre map. Using Proposition 2.2.3 (2) we extend its definition to several factors, defining it as a multilinear map
$$ \sigma : V_1 \times \cdots \times V_p \to V_1 \otimes \cdots \otimes V_p, \quad (v_1, \ldots, v_p) \mapsto v_1 \otimes \cdots \otimes v_p. \qquad (2.2.7) $$

Definition 2.2.6. The image of σ without the origin¹ is called the Segre variety, denoted
$$ \mathscr{S} := \mathscr{S}(V_1, \ldots, V_p) := \{\, v_1 \otimes \cdots \otimes v_p \mid \forall\, 1 \le i \le p : v_i \in V_i,\ v_i \neq 0 \,\}. $$

The name "Segre variety" is due to the fact that the image of $\mathscr{S}$ under the projection map $(V_1 \otimes \cdots \otimes V_p) \setminus \{0\} \to \mathbb{P}(V_1 \otimes \cdots \otimes V_p)$ is a smooth projective variety

1The origin is a singularity in the image of σ. This is why we exlude it from the definition.

21

Page 36: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

2. Preliminaries

of dimension∑p

i=1(ni−1) [Lan12, Section 4.3.5], denoted PS . The projective Segremap is a diffeomorphism [Lan12, Section 4.3.4.] denoted by

σP : P(V1)× · · · × P(Vp)→ PS , (2.2.8)

([v1], . . . , [vp]) 7→ [v1 ⊗ · · · ⊗ vp].

It follows that S is a smooth manifold and we have

dim(S (V1, . . . , Vp)) = 1− p+

p∑i=1

ni. (2.2.9)

A special case of the Segre map appears when all the vi are equal. In this case itis called the Veronese map, denoted

ν : V × · · · × V → V ⊗p, (v, . . . , v) 7→ v ⊗ · · · ⊗ v. (2.2.10)

Definition 2.2.7. The set

V p(V ) := v ⊗ · · · ⊗ v︸ ︷︷ ︸p times

| v ∈ V, v 6= 0

is called the Veronese variety. If the vector space V is clear from the context, weabbreviate V p := V p(V ). If furthermore the power p is clear, we use the symbol V .

The image of V under the projection V ⊗p\ 0 → PV ⊗p is a projective variety[Lan12, Section 4.3.6] of dimension n − 1. Hence, V is a smooth manifold ofdimension

dim(V ) = n. (2.2.11)

The sets S and V are sometimes called the sets of (symmetric) pure tensors ortensors of rank-one; and this makes Definition 1.1.1 (1) precise. Recall that inDefinition 1.1.1 (2) we defined the rank of a tensor. In the present framework weformulate this as follows. Consider the addition map

Φ : S r =

r times︷ ︸︸ ︷S × · · · ×S → V1 ⊗ · · · ⊗ Vp, (2.2.12)

(x1, . . . , xr) 7→ x1 + . . .+ xr.

The image of Φ is called the r-th secant set to S , denoted σr(S ). The name”secant set” is due to the fact that for r points x1, . . . , xr ∈ S the r-secant plane∑r

i=1 tixi |∑r

i=1 ti = 1 is contained in σr(S ). We define σr(V ) similarly.

Definition 2.2.8. Let A ∈ V ⊗p.1. We define the rank of A as rk K(A) := min r | A ∈ σr(S ).2. We define the symmetric rank of A as symrkK(A) := min r | A ∈ σr(V ).

22

Page 37: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

2.2. The space of tensors

The subscripts in Definition 2.2.8 should indicate that the rank depends onthe field chosen. Indeed there exist tensors whose complex rank does not equalthe real rank, see the example in [QCL16, Eq. (5.1)]. Is is clear that we havethe rk C(A) ≤ rk R(A) and symrkC(A) ≤ symrkR(A). But in many cases we haveequality, see [QCL16, Corollary 5.13, Corollary 5.17].

Considering σr(S ) as a collection of secant planes suggests that σr(S ) is notclosed in the euclidean topology, because tangent planes are limits of secant planes.For instance, for p > 2 the 2nd secant set σ2(S ) in the Euclidean topology containspoints that lie on the tangential variety of S but not in σ2(S ); see [BL14, Sec-tion 2.4]. Considering the Zariski closure of σr(S ) defines the secant variety[Lan12, Chapter 5]. Note that for p = 2 (the matrix case) σr(S ) is always Zariskiclosed.

2.2.5. Defectiveness and generic identifiability

In this subsection we put K = C.Recall from (2.2.12) the addition map and note that, if A = Φ(x1, . . . , xr),

we also have A = Φ(xτ(1), . . . , xτ(r)) for any permutation τ on r elements. Thequestion whether these are all the preimages of A under Φ motivates the discussionin this subsection. Let σr(S ) denote the Zariski closure. A necessary condition

for Φ to have finite fibers is that σr(S ) has the expected dimension. This meansthat for generic p ∈ S r it should hold that

dimσr(S ) = r dim(S ) = r(

1− p+

p∑i=1

ni

).

Definition 2.2.9. 1. If dimσr(S ) = r dim(S ), the secant set σr(S ) is callednon-defective. Otherwise, it is called defective.

2. If σr(S ) is non-defective and for a generic tensor A ∈ σr(S ) it holds that#Φ−1(A) = r!, then σr(S ) is called generically identifiable.

Similar to Definition 2.2.9 we define defectiveness and generic identifiability forthe r-the secant set of the Veronese variety σr(V ).

Definition 2.2.10. 1. If dimσr(V ) = r dim(V ), the secant set σr(V ) is callednon-defective. Otherwise, it is called defective.

2. If σr(V ) is non-defective and for a generic tensor A ∈ σr(V ) it holds that#Φ−1(A) = r!, then σr(V ) is called generically identifiable.

The property of σr(S ) being generically identifiable implies that the decom-position of a tensor of rank r is essentially unique (this means unique up to theorder of the summands). Matrices are never generically identifiable and this canbe shown using the Singular Value Decomposition; see, e.g., [SS90, Theorem 4.1].

23

Page 38: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

2. Preliminaries

Theorem 2.2.11 (Singular Value Decomposition). Let A ∈ Cm×n have rank r.There exist unitary matrices U and V such that

A = U

[D 00 0

]V ∗,

where D = diag(ς1, . . . , ςr) with ς1 ≥ . . . ≥ ςr > 0 and V ∗ denotes the hermitiantranspose of V . The ςi are called the singular values of A.

Suppose that A = U [D 00 0 ]V ∗ ∈ Cm×n is the singular value decomposition of a

matrix A. Then a rank-r decomposition of A is given by A =∑r

i=1 ςiuiv∗i , where

ui, vi are the columns of U and V , respectively. Let M ∈ Cr×r be any invertiblematrix and put M ′ = M−1D. Then, another rank-r decomposition of A is givenby

A = U

[M 00 0

] [I 00 0

] [M ′ 00 0

]V ∗.

Since M was arbitrary, we have in fact a whole family of rank-r decompositions.For higher order tensors of subgeneric rank and higher order symmetric tensors

of subgeneric symmetric rank, however, a unique decomposition is rather common.The word ’subgeneric’ renders that, by a dimension count, in Cn1 ⊗ · · · ⊗Cnp oneexpects a generic tensor to have rank⌈

dim (Cn1 ⊗ · · · ⊗ Cnp)

dim S (Cn1 , . . . ,Cnp)

⌉=

⌈ ∏pi=1 ni

1− p+∑p

i=1 ni

⌉and a generic tensors in Sp(Cm) to have symmetric rank

dimSp(Cn)

dim V p(Cn)=

1

n

(n− 1 + p

p

).

Any rank smaller than generic is called subgeneric. The following theorem is[COV15, Theorem 1.1].

Theorem 2.2.12. Let p ≥ 3. The secant set σr(V ) ⊂ Sp(Cn) is genericallyidentifiable for all r < n−1

(n−1+p

p

), unless it is one of the following exceptions.

1. p = 6, n = 3, r = 9.

2. p = 4, n = 4, r = 8.

3. p = 3, n = 6, r = 9.

In these exceptional cases one has #Φ−1(A) = 2r! for generic A ∈ σr(V ); i.e., Ahas two essentially unique decompositions.

An algorithm for deciding generic identifiability of σr(S ) is given in [COV14];see also Theorem 1.1 of that reference. Moreover, generic identifiability over C

24

Page 39: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

2.2. The space of tensors

often implies generic identifiability over R; see the discussion in [QCL16, Section 5],in particular see Theorem 5.12, Corollary 5.13 of that reference.

Remark 2.2.13. In the Blind Source Separation example from Section 1.2 genericidentifiability was the reason for us to choose the cumulant tensor in (1.2.3) overa cumulant matrix.

2.2.6. Inner products of rank-one-tensors

Recall from (2.2.1) the standard basis B in Kn1 ⊗ · · · ⊗Knp . We define the innerproduct on 〈 , 〉 to be the euclidean inner product with respect to the basis B,such that B becomes an orthonormal basis. By identification of Sp(Kn) with thesubspace of symmetric tensors in (Kn)⊗p via the map in (2.2.4) this also yields aninner product on Sp(Kn).

Definition 2.2.14. Let r ≥ 1.

1. Let A ∈ Kn1 ⊗ · · · ⊗ Knp . A local minimum of σr(S ) → R, B 7→ ‖A−B‖ iscalled a critical rank-r approximation. A global minimum of that function iscalled a best rank-r approximation.

2. Let A ∈ Sp(Kn). A local minimum of the function σr(V ) → R, B 7→ ‖A−B‖is called a critical symmetric rank-r approximation and a global minimum iscalled a best symmetric rank-r approximation.

Note that a best rank-r approximation not always exists, because the secantset σr(S ) is not always closed in the euclidean topology; see the discussion at theend of Subsection 2.2.42.

Recall further from (2.2.6) the standard basis B∧ for ΛpKn. We define the innerproduct on 〈 , 〉 to be the euclidean inner product with respect to the basis B∧making B∧ an orthonormal basis. The following lemma gives formulas for innerproducts of rank-one tensors.

Lemma 2.2.15 (Inner products of rank-one-tensors). We have

1. 〈x1 ⊗ · · · ⊗ xp, y1 ⊗ · · · ⊗ yp〉 =∏p

j=1〈xj, yj〉.2. 〈x1 ∧ · · · ∧ xp, y1 ∧ . . . ∧ yp〉 = det(〈xi, yj〉)pi,j=1

Proof. Let us denote the i-th entries of xj, yj by x(i)j and y

(i)j . Then, we have

〈x1 ⊗ · · · ⊗ xp, y1 ⊗ · · · ⊗ yp〉 =∑

1≤ij≤nj1≤j≤p

x1(i1) · · · . . . · xp(ip) · y(i1)

1 · · · . . . · y(ip)p

2In Section 3.3 we treat this phenonemon under the aspect of condition.

25

Page 40: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

2. Preliminaries

Pairing x(i)j y

(i)j for each (i, j) we get

∑1≤ij≤nj1≤j≤p

x1(i1) · · · . . . · xp(ip) · y(i1)

1 · · · . . . · y(ip)p =

p∏j=1

∑1≤i≤nj

xj(i)y

(i)j =

p∏j=1

〈xj, yj〉

proving the first assertion.For the second assertion let x1, . . . , xp ∈ Kn (antisymmetric tensors are only

defined for cube formats (n, . . . , n)). If p > n, there is a linear relation among thexi implying x1 ∧ · · · ∧ xp = 0. On the other hand, a linear relation among the xiimplies det(〈xi, yj〉) = 0. Thus, the formula holds in this case. If p ≤ n, we have

x1 ∧ · · · ∧ xp =∑

1≤i1<...<ip≤n

det(X i1,...,ip) ei1 ∧ · · · ∧ eip ,

where X i1,...,ip is the submatrix of[x1 . . . xp

]with rows i1, . . . , ip. For the yi we

similarly define Y i1,...,ip . Then,

〈x1 ∧ · · · ∧ xp, y1 ∧ . . . ∧ yp〉 =∑

1≤i1<...<ip≤n

det(X i1,...,ip) det(Y i1,...,ip) = det(〈xi, yj〉),

the last equality by the Cauchy-Binet formula [HJ92, Sec. 0.8.7].

2.3. Condition numbers in numerical analysis

Let us start this section with a quote by Demmel [Dem96, p. 4].

”The answers produced by numerical algorithms are seldom exactlycorrect. There are two sources of error. First, there may be errors inthe input data to the algorithm, caused by prior calculations or perhapsmeasurement errors. Second, there are errors caused by the algorithmitself, due to approximations made within the algorithm. In order toestimate the errors in the computed answers from both these sources,we need to understand how much the solution of a problem is changed,if the input data is slightly perturbed.”

The situation that Demmel describes is an algorithm that computes a solution toa problem and how perturbation in the data effects the result of that computation.Both question, that Demmel poses, are closely tied to each other. But while thesecond specifically concerns properties of algorithms, the first question deals withproperties of the problem solely—properties that affect any algorithm used to solvethe problem. Classically, such a perturbation theory is established by the meansof condition numbers.

26

Page 41: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

2.3. Condition numbers in numerical analysis

2.3.1. The classical definition of condition

We can view a problem as a function

f : I → O, (2.3.1)

where I is a normed vector space of inputs and O is a normed vector space ofoutputs. A slight perturbation of a data point x ∈ I is a point x+∆x, where ‖∆x‖is small. The question of ”how much the solution of a prolem is changed” isquantified by the condition number of f . The following is a summary of thediscussion in [TB97, p. 90].

The absolute condition number of the problem f at the point x is defined as

κ(x) = limδ→0

sup‖∆x‖≤δ

‖f(x+ ∆x)− f(x)‖‖∆x‖

. (2.3.2)

Note that, if f is differentiable, we have κ(x) = ‖Dxf ‖, where ‖ ‖ is the spectralnorm Definition A.1.1. One subtlety here is that ”slight” and ”how much” arequantities that may depend on the magnitude of x. For instance, if ‖x‖ = 104

and ‖∆x‖ = 102, is then ‖∆x‖ small or not? Compared to 104 the number 102 issmall, but is 102 a small number at all? This motivates the following definition.The relative condition number of f at x is defined as

κrel(x) = limδ→0

sup‖∆x‖≤δ

‖f(x+ ∆x)− f(x)‖‖f(x)‖

‖x‖‖∆x‖

. (2.3.3)

If f is differentiable, we have κrel(x) = ‖Dxf ‖ ‖x‖‖f(x)‖ . We close this subsection

with a quote by Trefethen and Bau [TB97, p. 91].

”Both absolute and relative condition numbers have their uses, butthe latter are more important in numerical analysis. This is ultimatelybecause the floating point arithmetic used by computers introducesrelative errors rather than absolute ones [...].”

Depending on which type of condition number one uses, one calls data points xwith small κ(x) or κrel(x) well-conditioned and data points x with large κ(x) orκrel(x) ill-conditioned.

Backward and forward error

Let us return to the introductory quote by Demmel. The approximations madewithin the algorithm are measured in terms of the forward and backward errors.We once more quote Demmel [Dem96, Sec. 1.3.3].

27

Page 42: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

2. Preliminaries

”If alg(x) is our algorithm for f(x), including the effects of roundoff,we call alg(x) a backward stable algorithm for f(x) if for all x there isa small ∆x such that alg(x) = f(x + ∆x). ∆x is called the backwarderror. Informally, we say that we get the exact answer f(x+ ∆x) for aslightly wrong problem x+ ∆x.”

The error in the output data |alg(x)− f(x)| is called the forward error. Note thata small perturbation in the input data for an ill-conditioned problem may causecomparably large forward errors, because

forward error = |f(x+ ∆x)− f(x)| ≈ |Dxf ||∆x| = κ(x) · |∆x|.

Small backward and forward errors are reasonable quantities to measure how”good” an approximation of a solution to a problem is.

Remark 2.3.1. When the problem consists of solving an equation f(x) = 0, thereis a third notion of good approximation called approximate zero. We will introducethis concept at the end of this section

2.3.2. Condition of problems defined on manifolds

In the above definition of a condition number the space of inputs and the space ofoutputs are restricted to be linear spaces. However, many problems are defined as amap between manifolds. An example is the addition map Φ from (2.2.12) definedon a product of Segre varieties (i.e., a product manifold). It it obvious that athorough perturbation theory should include problems defined on manifolds.

When the problem is stated as a map

Like in Subsection 2.3.1 we can model the problem as a differentiable map

f : I → O, (2.3.4)

but in constrast to (2.3.1) we allow I and O to be riemannian manifolds. Thenthe condition number of f at x ∈ I is defined as

κ(x) := ‖Dxf ‖ ,

and the norms on the respective tangent spaces are the norms that are induced bythe Riemannian inner product on I and O, respectively.

When the problem is stated implicitly

A problem occurs when the problem is not stated in terms of a map, but onlyimplicitly as a manifold of solutions. Let F : I × O → R be a differentiable map

28

Page 43: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

2.3. Condition numbers in numerical analysis

that defines the solution manifold

V = (x, y) ∈ I ×O | F (x, y) = 0 . (2.3.5)

Let π1 : V → I and π2 : V → O denote the projection of V onto the first andsecond factor, respectively. If dimV > dimX, the fibers of π1 are genericallypositive dimensional, which means that the generic input x ∈ I serves for a pos-itive dimensional set of outputs. On the contrary, if dimV < dimX,the π1 isnot surjective, which means that not all x ∈ I are inputs. In both scenarios itdoes not make sense to define a condition number. Henceforth, we assume thatdimensions of V and X are equal: dimV = dimX; that is, we assume some sortof local uniqueness hypothesis. The idea of [BCSS98, Section 12.3] is to define thecondition number in this situation as follows. Under the above assumption, for all(x, y), where D(x,y)π1 is of full rank, we can locally invert π1 defining the (local)solution map S := π2 π−1

1 . Setting

Σ′ :=

(x, y) ∈ V | D(x,y)π1 does not have full rank

andΣ := π1(Σ′) (2.3.6)

the condition number at x ∈ I is defined as

κ(x) :=

‖DxS ‖ , if x 6∈ Σ.

∞, if x ∈ Σ.(2.3.7)

The set Σ is called the set of ill-posed inputs (compare the terminology fromSubsection 2.3.1). Note that this approach generalizes the approach from (2.3.4).

Approximate zeros

A common problem, where a solution manifold as in (2.3.5) appears, is the prob-lem of solving polynomial equations f(x) = 0. We announced in Remark 2.3.1that, next to small forward and backward error, in this case there is a third notionof good approximation called approximate zero. The idea behind this is explainedin [Dem96, Sec. 2.5]3.

Algorithm 1: Newton’s method

1 Input: A starting point x0.2 for k = 0, 1, 2, . . . do3 xk+1 = −(Dxkf )−1f(xk);4 end

3In the mentioned reference Newton’s method is explained for linear systems. We extend itsdefinition to polynomials of higher degree.

29

Page 44: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

2. Preliminaries

Suppose that after having applied some algorithm alg(f) we are given an ap-proximate solution x0 of f(x) = 0. Suppose further, that we know that the forwarderror is not too large, but we do not know how large. The idea to improve thequality of the approximation is to use Newton’s method (Algorithm 1). If the ini-tial point x0 is close to an actual solution x?, Newton’s method converges towardsthis point4. This leads to [BCSS98, Sec. 8.1, Definition 1].

Definition 2.3.2. We say that x0 is an approximate zero of the polynomial f , ifthere exists a zero x? of f such that the sequence x0, x1, . . . , produced by Newton’smethod satisfies

|x? − xi| ≤ 2−(2i−1) |x? − x0|.

In this case x? is called the associated zero to x0.

Approximate zeros are excellent approximations, because Newton’s method al-lows to bound the forward error as small as one wants it to. On the other hand,a small forward error in the computation does not automatically imply that onehas computed an approximate zero. After all, the vague notion of ”small” is notneeded for approximate zeros—for them ”small” is ”as small as we want”.

2.4. Real and complex random variables

A random variable is a measurable function from a probability space (Ω,F ,Prob)to the real numbers

X : Ω→ R.

The space Ω is called the space of outcomes and F is a Borel algebra in Ω[Chu78, Appendix of Chapter 4]. The probability of an event E ∈ F is denotedby Prob(E). In the following we assume that Ω = R.

Definition 2.4.1. Let φ : R → R be a measurable function and let Prob de-note a probability measure on R. The probability measure φ∗(Prob) defined byφ∗(Prob)(E) := Prob(φ−1(E)) is called the push-forward measure of Prob under φ.

If there exists a measurable function f : R → R with the properties thatf(x) ≥ 0 for all x ∈ R and

Prob(E) =

∫E

f(x)dx (2.4.1)

for all events E ∈ F , one calls f the density of X [Chu78, Section 4.5]. Conversely,if a non-negative measurable function f satisfies (2.4.1) for all E ∈ F , it defines adistribution with probability measure Prob(E).

4We will observe this behavior later in Algorithm 2 and Algorithm 3.

30

Page 45: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

2.4. Real and complex random variables

If two non-negative measurable functions f and g satisfy (2.4.1) for all E ∈ F ,then f = g almost everywhere. This shows that densities don’t have to be unique.We will still call both f and g the density of X, because the distribution that theydefine is unique.

Definition 2.4.2. Let the random variable X have density f . The expectation ofX is defined as

EX :=

∫RX(x)f(x)dx

and the variance of X is defined as Var(X) := E(X − EX)2.

The most relevant random variables in this work are gaussian random variables.

Definition 2.4.3. A random variable X : R→ R is called gaussian with mean µand variance σ2, denoted X ∼ N(µ, σ2), if the density function is given by

f(x) =1√

2πσ2exp

(−(x− µ)2

2σ2

).

We write X ∼ N(µ, σ2). In the special case that µ = 0 and σ2 = 1 we call standardnormal.

Gaussian random variables have the property that

EX∼N(µ,σ2)

X = µ, and Var(X) = EX∼N(µ,σ2)

X2 = σ2. (2.4.2)

The following lemma is [Chu78, Eq. (7.4.6) and Theorem 7].

Lemma 2.4.4. Let z ∼ N(0, σ2) and zi ∼ N(0, σ2i ), i = 1, 2.

1. Let t ∈ R\ 0. Then tz ∼ N(0, t2σ2).

2. If z1, z2 are independent, then z1 + z2 ∼ N(0, σ21 + σ2

2).

We generalize standard normal random variables to finite dimensional vectorspaces as follows. If E is a finite dimensional real vector space with inner product,we define the standard normal density on the space E as

ϕE(z) :=1

√2π

dim(E)· exp

(−‖z‖

2

2

). (2.4.3)

If it is clear from the context which space is meant, we sometimes omit the sub-script E in ϕE. If z ∈ E is a random variable with density ϕE, we write z ∼ N(E).

Letting E be the 2-dimension real vector space C defines complex random vari-ables. Expectation and variance of a complex random variable z are defined as

E z := E<(z) + iE=(z) and Var(z) := E |z − E z|2 = E |z|2 − |E z|2 .

31

Page 46: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

2. Preliminaries

Note that, if <(z) and =(z) are both N(0, 1), then by Lemma 2.4.4, we haveVar(z) = 2. This is the motivation to make the following normalization for complexrandom variables. We say that a random variable z on C is standard normaldistributed if both real and imaginary part of z are i.i.d centered normal distributedrandom variables with variance σ2 = 1

2. The corresponding density is

ϕ(z) :=1

πexp

(−|z|2

), (2.4.4)

and we write z ∼ NC(0, 1) for this distribution. We have

Ez∼NC(0,1)

z = 0, and Var(z) = Ez∼NC(0,1)

|z|2 = 1; (2.4.5)

For a hermitian vector space E, we write NC(E) for the corresponding multivari-ate distribution. One checks that the density of a Gaussian random variable isinvariant under unitary transformations.

Lemma 2.4.5. 1. Let x ∼ N(0, 1) and O be an orthogonal matrix. Then Ox ∼ x.

2. Let z ∼ NC(0, 1) and U be an unitary matrix. Then Ux ∼ x.

The following estimate will be useful later.

Lemma 2.4.6. Let a, b ∈ Cm be fixed and z ∼ NC(Cm). Then

1. If ‖a‖ ≥ ‖b‖, we have Prob ‖z − a‖ ≤ ‖z‖ ≤ Prob ‖z − b‖ ≤ ‖z‖ .2. For all a:

Prob ‖z − a‖ ≤ ‖z‖ ≥ 1√π

2

‖a‖+√‖a‖2 + 8

exp

(−‖a‖

2

2

).

Proof. We have ‖z − a‖ ≤ ‖z‖, if and only if ‖a‖2 ≤ 2<〈z, a〉. By Lemma 2.4.5 wemay assume that a = (‖a‖ , 0, . . . , 0) ∈ Cm. Hence <〈z, a〉 = ‖a‖<〈z, e1〉. Observethat <〈z, e1〉 is a real gaussian random variable with mean 0 and variance 1

2. This

shows that

Probz∼NC(Cm)

‖z − a‖ ≤ ‖z‖ =1√π

∫ ∞‖a‖2

exp(−x2)dx.

From this equation we can deduce the first assertion. For the second assertion weuse [Abr72, Eq. (7.1.13)]:∫

t≥xexp(−t2)dt ≥ (x+

√x2 + 2)−1 exp(−x2)

for x ≥ 0. This finishes the proof.

32

Page 47: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

2.4. Real and complex random variables

2.4.1. Random matrices and random tensors

The complex Ginibre ensemble [Gin65] is a random complex matrix, whose entriesare all i.i.d. complex standard normal random variables; i.e, NC(0, 1)-distributedrandom variables.

Definition 2.4.7. Let A ∈ Cn×n be a random matrix, such that ai,jiid∼ NC(0, 1).

Then we call A a Ginibre matrix or a complex Gaussian matrix. We denote thisby A ∼ NC(Cn×n). The density of a complex Gaussian matrix is given by

ϕCn×n(A) := π−n2

exp(−‖A‖2F ),

where ‖ ‖F is the Frobenius norm; see Definition A.1.1.

The real Ginibre ensemble is defined similar as the complex one, but one requiresthe matrix A to have entries that are real standard normal random variables.

Definition 2.4.8. Let A ∈ Rn×n with ai,jiid∼ N(0, 1). We call A a real Ginibre

matrix or a real Gaussian matrix. We denote this by A ∼ N(Rn×n). The densityof a real Gaussian matrix is given by

ϕRn×n(A) := (2π)−n22 exp(−1

2‖A‖2

F ).

The Gaussian Orthogonal Ensemble is also a random matrix, whose entriesare Gaussian random variables, but under the constraint of being symmetric; cf.[Meh91, Sec. 2.3], [Tao12, Sec. 2.2.6]

Definition 2.4.9. Let A ∈ Rn×n with independent entries ai,j ∼ N(0, 12), if i < j,

and ai,i ∼ N(0, 1). Moreover, put ai,j := aj,i for j < i. We call A a matrixfrom the Gaussian Orthogonal Ensemble or a GOE matrix. We denote this byA ∼ GOE(n). The density of a GOE matrix is given by

2−n2 π

−n(n+1)4 exp(−1

2Trace (ATA)).

We extend the definition of the Ginibre ensemble and the Gaussian orthogonalensemble from matrices to tensors.

Definition 2.4.10. Let A = (ai1,...,ip) be an order-p tensor of format (n, . . . , n).

1. If ai1,...,ipiid∼ NC(0, 1), we call A a complex Gaussian tensor.

2. If ai1,...,ipiid∼ N(0, 1), we call A a real Gaussian tensor.

3. If A ∈ Sp(Rn) is a symmetric tensor and ai1,...,ipiid∼ N(0, σ2), where σ2 = α1!···αn!

p!

with αj := number of js appearing in (i1, . . . , ip), then we call A a real symmetricGaussian tensor.

33

Page 48: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen
Page 49: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

Part I.

Numerical analysis of tensor rankdecompositions

35

Page 50: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen
Page 51: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3. The condition number of joindecompositions

All of the results in this part were developed in close collaboration with N. Van-nieuwenhoven1 from KU Leuven [BV16].

Throughout this part we will have K ∈ R,C.This goal of this section is to give answers to question Q1 from the introduction.

To this end, we define and analyze a condition number for the tensor-rank decom-position. The framework we want to adapt is the one from Section 2.3. Recallfrom (2.2.12) the definition of the addition map

S × . . .×S︸ ︷︷ ︸r times

→ Kn1 ⊗ · · · ⊗Knp , (x1, . . . , xr) 7→ x1 + . . .+ xr,

where S is the Segre variety in Kn1 ⊗ · · · ⊗ Knp . We generalize this setting asfollows. LetM1, . . . ,Mr be smooth manifolds embedded in KN ; see Appendix B.1.Then we define

Φ :M1 × . . .×Mr −→ KN , (x1, . . . , xr) 7−→ x1 + · · ·+ xr. (3.0.1)

The image of Φ is called the join of the Mi, denoted

J (M1, . . . ,Mr) := Φ(M1 × . . .×Mr).

In this notation the r-th secant set of the Segre variety is the join of r copiesof S ; i.e., σr(S ) = J (S , . . . ,S ), and the r-th secant set of the Veronesevariety is the join σr(V ) = J (V , . . . ,V ). Within this context the problemof tensor decomposition is stated as the more general join decomposition prob-lem (JDP). For smooth manifolds M1, . . . ,Mr ⊂ KN it is formulated as follows.

(JDP) Given a point y ∈ J (M1, . . . ,Mr), find x ∈ M1 × . . .×Mr

such that Φ(x) = y.

Furthermore, the approximation of a tensor by a rank-r tensor is stated as themore general join approximation problem (JAP).

1https://people.cs.kuleuven.be/~nick.vannieuwenhoven/

37

Page 52: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3. The condition number of join decompositions

(JAP) Given q ∈ KN , find argminx∈M1×...×Mr

12‖Φ(x)− q‖2 .

In what follows the manifolds M1, . . . ,Mr, if not stated otherwise, are fixedand we abbreviate J := J (M1, . . . ,Mr).

Remark 3.0.1. An alternative to the formulation (JAP) is the formulation of theoptimization problem as argminy∈J

12‖y − q‖2 . This approach, however, steps

into two disadvantages of the join set. The first is that in general J is not amanifold2 and—at least for the case when J is the r-th secant set of the Segre orthe Veronese variety—testing smoothness of points in (the Zariski closure of) Jis a hard task; see e.g. [COV15, Section 5.2], [IK99, Theorems 4.5A and 4.10A],[Lan12, Theorem 7.3.3.3] and [LO13, Section 4]. The second is that, if y is asolution to the optimization problem formulated above, we are generally interestedin the decomposition of y and not only in y as a point on J . This is why weprefer the formulation (JAP). However, one must mention that a disadvantage ofparametrizing the join set via the involved manifolds is that J is not always closedin the euclidean topology, see the discussion at the end of Subsection 2.2.4. Thismeans that there exists sequences x in M1 × . . . ×Mr not converging but forwhich Φ(x) nevertheless converges. Luckily, the condition number we design candetect such behavior in the case when the Mi are closed cones (this includes thecase when the Mi are (cones of) projective algebraic varieties such as the Segreand the Veronese variety); see the discussion in Section 3.3.

3.1. A geometric condition number

Recall that the join decomposition consists of the problem of been given y ∈ Jand finding x ∈M1× . . .×Mr such that Φ(x) = y. We state this problem in thewords of Section 2.3.

Given a point on the join y ∈ J (M1, . . . ,Mr), compute a local inverseΦ−1 : J →M1 × . . .×Mr at y, if it exists.

Two problems arise when trying to formulate the (JDP) in this way to definea condition number. The first is that it is not clear what is meant by ”localinverse”. The second problem is that, even if we had such an inverse, we could notunreservedly apply the framework from Section 2.3, because J is in general not amanifold as it may have singularities; see the discussion in Remark 3.0.1.

Our solution out of this dilemma is the observation that Φ as a map fromM1 × . . .×Mr to KN is a map between smooth manifolds.

2Consider, e.g., a matrix of rank r − 1. It is a singular point in the variety of rank-r matrices.

38

Page 53: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3.1. A geometric condition number

Definition 3.1.1. Let x = (x1, . . . , xr) ∈M1 × . . .×Mr. The condition numberof the (JDP) at x is defined as

κ(x) :=1

ςmin(DxΦ )=

1

ςmin([U1 . . . Ur

]),

where ςmin( ) denotes the smallest singular value of the argument and the Ui arematrices whose columns form an orthonormal basis for TxiMi.

Note that the second equality in the definition of κ(x) above is given by Corol-lary A.1.4. In the following we want to justify this definition.

Another solution of the above mentioned problems would be to restrict to pointsin the smooth locus J sm of J , i.e., the subset J sm ⊂ J where J is a smooth sub-manifold of KN . This approach, though, prohibits us from defining a conditionnumber at singular points of J—a deficiency not present in Definition 3.1.1. How-soever, similar to (2.3.5), we could define

V := (x, y) | x ∈M1 × . . .×Mr, y ∈ J sm, so that y = Φ(x) ,

and, moreover, define W := (x, y) ∈ V | DxΦ is injective . For all (x, y) ∈ W theinverse function theorem for manifolds [Lee13] implies the existence of the localinverse function Φ−1

x , so that we could make a definition as in (2.3.7).

K(x, y) :=

‖DyΦ

−1x , ‖, if Φ is locally invertible at y

∞, otherwise.(3.1.1)

Here ‖·‖ is the spectral norm on the respective tangent spaces, which is inducedby the norms on the respective ambient spaces.

A good definition of a condition number for the (JDP) should coincide on thesmooth locus J sm with definition given in (3.1.1). The definition proposed inDefinition 3.1.1 has this property, as we show in the following proposition.

Proposition 3.1.2. For all (x, y) ∈ V we have K(x, y) = κ(x).

Proof. Suppose that (x, y) ∈ W . The inverse function theorem for manifolds[Lee13] implies the existence of the local inverse function Φ−1

x , which is smooth withderivative DΦ(x)Φ

−1x = (DxΦ )−1. Thus, K(x, y) = ‖DyΦ

−1x ‖ = ‖(DxΦ )−1‖ and

hence, by the characterization of the smallest singular value from Lemma A.1.2 (4),we have K(x, y) = (ςmin(DxΦ ))−1 = κ(x). If, on the other hand, (x, y) ∈ V \W ,then DxΦ is not injective and, from the definition of the condition number for the(JDP), we see that κ(x) = ∞. If Φ were locally invertible on a neighborhood Uof x we would have Φ−1

x Φ = idU and hence DyΦ−1x DxΦ = idTxM1×...×Mr . This

contradicts DxΦ being not injective so that we also have K(x, y) =∞.

39

Page 54: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3. The condition number of join decompositions

Proposition 3.1.2 provides the justification to choose κ(x) as the condition num-ber of the (JDP). It moreover shows that Definition 3.1.1 is a generalization of theapproach to define a condition number from Subsection 2.3.2.

To finish the discussion in this section it remains to define a set of ill-posedinputs, similar to the definition made in (2.3.6). However, instead of defining theset of ill-posed inputs as y ∈ J | ∃x : (x, y) ∈ V \W we define

Σ := x ∈M1 × . . .×Mr | κ(x) =∞, (3.1.2)

which we call the set of ill-posed solutions. This puts the focus on local solutionsof the (JDP)—the algorithm presented in Chapter 4 computes local solutions.

Remark 3.1.3. It is possible that Σ = M1 × . . . × Mr, in which case we callJ defective in analogy with defective secant varieties; see Definition 2.2.9. Bydefinition, the condition number for all elements of defective join sets is ∞.

3.2. The condition number as distance toill-posedness

In this section we provide a characterization of the condition number as an in-verse distance to ill-posedness. Such a characterization is very common—see e.g.[BC13, pages 10, 16, 125, 204]—, but usually it is understood as a distance of theinput data to some set of ill-posed inputs. Here the situation appears differently.The distance term in which we describe the present condition number is definedin an auxiliary space, that is a product of Grassmann manifolds. Before we statethe main result of this section, Theorem 3.2.2, we need to define this distance.

The Grassmann manifold is defined as the set of linear spaces in KN that areof fixed dimension.

Gr(n,KN) :=L ⊂ KN | L is a linear space and dimL = n

.

The name ”manifold” is justified as Gr(n,KN) is indeed a manifold of dimensionn(N − n) [Won67]. The principal angles [BG73] θ1, . . . , θn between two spacesW,W ′ ∈ Gr(n,KN) are defined to be the arccosines of the singular values of U∗U ′,where U and U ′ are matrices, whose columns form an orthogonal basis for Wand W ′, respectively. By [Won67, Theorem 3] θ1, . . . , θn determine the relativeposition of W and W ′ completely. The chordal distance [DD09, Sec. 12.3] betweenthe spaces W,W ′ ∈ Gr(n,KN) is defined as

distchordal(W,W′) :=

√√√√ n∑j=1

(sin θj)2. (3.2.1)

40

Page 55: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3.2. The condition number as distance to ill-posedness

For a tuple of positive integers n = (n1, . . . , nr) we define

Gr(n,KN) := Gr(n1,KN)× . . .×Gr(nr,KN).

We extend the definition of chordal distance to Gr(n,KN).

Definition 3.2.1 (Chordal distance in Gr(n,KN)). The chordal distance betweenW = (W1, . . . ,Wr) and W ′ = (W ′

1, . . . ,W′r) in Gr(n,KN) is defined as

distchordal(W,W′) :=

√√√√ r∑i=1

distchordal(Wi,W ′i )

2, (3.2.2)

where the chordal distance between the respective linear spaces Wi,W′i is defined

as in (3.2.1).

Note that for xi ∈Mi we have TxiMi ∈ Gr(dim(Mi),KN). We state the maintheorem of this section.

Theorem 3.2.2. Let x = (x1, . . . , xr) ∈M1 × . . .×Mr, ni := dimMi. Then

1

κ(x)= distchordal((Tx1M1, . . . ,TxrMr),ΣGr),

where the set ΣGr is defined as

ΣGr = (W1, . . . ,Wr) ∈ Gr(n,Kn) | dim(W1 + . . .+Wr) < n ,

with n := (n1, . . . , nr) and n := n1 + . . .+ nr.

The proof of Theorem 3.2.2 works as follows. We first work out the distance ofa fixed point W ∈ Gr(n,KN) to what is called a Schubert variety in Lemma 3.2.3below. In a similar fashion this was already done by Burgisser and Amelunxenin [AB12] (but only for K = R). The proof involves what is called the projectiondistance. Thereafter, we extend this result to r > 1, which will then lead us to aproof of Theorem 3.2.2 at the end of this section.

The projection distance on Gr(n,KN) is defined as

distp(W,W′) := ‖ΠW − ΠW ′‖ , (3.2.3)

where ΠW is the orthogonal projection onto W [GvL96, Sec. 2.6]. The projectiondistance equals the sine of the largest principal angle [YL14, Table 2], i.e.,

distp(W,W′) = sin

(max1≤i≤n

θi), (3.2.4)

where θ1, . . . , θn are the principal angles between W and W ′.

41

Page 56: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3. The condition number of join decompositions

The Schubert variety [GH78, Chap. 1, Sec. 5] associated to z ∈ KN , z 6= 0, is

Ez :=W ′ ∈ Gr(n,KN) | z ∈ W ′ (3.2.5)

Lemma 3.2.3. Let z ∈ S(KN). For all W ∈ Gr(n,KN):

1. distchordal(W, Ez) = distp(W, Ez),

2. distp(W, Ez) = miny∈S(W ) sin dP(z, y).

Here dP(z, y) denotes the angle between z and y; see (2.1.1).

Proof. The following proof is taken from [AB12].Let W ∈ Gr(n,KN). We construct a linear space W ′ ∈ Ez that minimizes the

distance from W to Ez in both the chordal and the projection distance proving1. and 2. Let α := ΠW z denote the orthogonal projection of z onto W andput a := α

‖α‖ , if α 6= 0. Otherwise, let a be some arbitrary but fixed point in

S(W ). Let a, a2, . . . , an be an orthonormal basis of W . We claim that the spaceW ′ = span z, a2, . . . , an has the desired properties.

For 1. observe that z, a2, . . . , an is an orthonormal basis for W ′. The principalangles between W and W ′ are given as the arccosines of the singular values of

[a a2 . . . an

]∗ [z a2 . . . an

]=

[〈a, z〉 0

0 In−1

].

Note that 〈a, z〉 = cos dP(a, z). By (3.2.4),

distp(W,W′) = sin dP(a, z) = distchordal(W,W

′). (3.2.6)

Furthermore, for W ′′ ∈ Gr(n,KN) with z ∈ W ′′ we have

distp(W,W′′) = ‖ΠW − ΠW ′′‖ ≥ ‖(ΠW − ΠW ′′)z‖ = ‖ΠW z − z‖

and, by (3.2.6),‖ΠW z − z‖ = sin dP(a, z) = distp(W,W

′).

This shows that we have distp(W,W′′) ≥ distp(W,W

′). Altogether this implies 1.For the second assertion we write y = µa+

∑ri=2 µiai ∈ S(W ), so that

dP(z, y) = arccos |〈z, y〉| = arccos |µ| |〈z, a〉| .

This implies, using (3.2.6),

miny∈S(W )

dP(z, y) = arccos |〈z, a〉| = distp(W,W′),

which finishes the proof

42

Page 57: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3.2. The condition number as distance to ill-posedness

Next, we extend Lemma 3.2.3 to Gr(n,KN). To this end, we first we extend thedefinition of projection distance (3.2.3) from Gr(n,KN) to the product Gr(n,KN).The projection distance between W = (W1, . . . ,Wr) and W ′ = (W ′

1, . . . ,W′r) is

defined as

distp(W,W′) :=

√√√√ r∑i=1

distp(Wi,W ′i )

2, (3.2.7)

In the remainder of this section we let

SN×r := S(RN)r =[z1 . . . zr

]| ‖z1‖ = . . . = ‖zr‖ = 1

denote the subset of N × r matrices whose columns are of unit norm and

SN×r<r :=Z ∈ S N×r | rkZ < r

.

Furthermore, similar to (3.2.5), for Z =[z1 . . . zr

]∈ KN×r we define the gen-

eralized Schubert variety.

EZ :=

(W ′1, . . . ,W

′r) ∈ Gr(n,KN) | ∀1 ≤ i ≤ r : zi ∈ W ′

i

.

The following lemma is a direct consequence of Lemma 3.2.3.

Lemma 3.2.4. Let Z ∈ SN×r and W = (W1, . . . ,Wr) ∈ Gr(n,KN). Then wehave

1. distchordal(W, EZ) = distp(W, EZ).

2. distp(W, EZ) = minyi∈S(Wi),1≤i≤r(∑r

i=1 (sin dP(zi, yi))2) 1

2 .

Next to Lemma 3.2.4 we need a characterization of ΣGr.

Lemma 3.2.5. Let n = n1 + · · ·+ nr. The following holds.

1. W ∈ ΣGr if and only if there exists Z ∈ SN×r<r with W ∈ EZ.

2. For all Z ∈ SN×r<r we have EZ ⊂ ΣGr.

3. ΣGr equals the union of EZ over all Z ∈ SN×r<r .

4. ΣGr is an algebraic subvariety of Gr(n,KN).

5. For K = C the codimension of ΣGr in Gr(n,CN) is N − n+ 1.

Proof. The first two assertions are clear and the third is a direct consequenceof those two. To prove the fourth assertion we make a minor generalization of[Har92, Example 8.30]. The Plucker embedding [GKZ94, Chap. 3.1.C] identifies

43

Page 58: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3. The condition number of join decompositions

Gr(n,KN) with a subvariety of P(∧mKN). Under this point of view ΣGr equals([w1,1 ∧ · · · ∧ w1,n1 ], . . . , [wr,1 ∧ · · · ∧ wr,nr ]) |

r∧i=1

ni∧j=1

wi,j = 0

Hence, ΣGr is a projective subvariety of Gr(n,KN) cut out by the foregoing alge-braic equation.

It remains to prove 5. Let S denote the image of SN×r<r under the projectiononto P(KN) × . . .P(Kn) (r times). Observe that EZ depends only on the class ofZ in P(KN)r, so we may define EZ for Z ∈ P(KN)r, as well. Define the algebraicvariety

Ω :=

(Z,W ) ∈ S ×Gr(n,CN) | W ∈ EZ

Let π1, π2 be the projections from Ω onto the first and onto the second coordi-nate, respectively. Then π2(Ω) = ΣGr. By the fiber dimension theorem—see, e.g,[Sha13, Theorem 1.25]—we have

dimS + dim π−11 (Z) = dim Ω = dim ΣGr + dim π−1

2 (W ),

where W ∈ Gr(n,CN) and Z ∈ S are general. Hence,

dim ΣGr = dimS + dim π−11 (Z)− dim π−1

2 (W ).

We will compute all three quantities on the right hand side. We start with

dimS = dimZ ∈ CN×r | rkZ < r

− r = (r − 1)(N + 1)− r;

for the last equality see, e.g., [BC13, Proposition A.5]. The fiber of the matrixZ = [z1, . . . , zr] ∈ S is π−1

1 (Z) = Z × EZ and, by definition,

EZ =W ∈ Gr(n1,CN) | z1 ∈ W

× . . .×

W ∈ Gr(nr,CN) | zr ∈ W

,

so that

dimπ−1(Z) =r∑i=1

dimW ′ ∈ Gr(ni,CN) | zi ∈ W ′

=r∑i=1

(ni(N − ni)−N − ni)

=r∑i=1

(dim Gr(ni,CN)−N − ni)

= dim Gr(n,CN)− rN + n;

44

Page 59: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3.2. The condition number as distance to ill-posedness

the second equality is [Won68, Theorem 2.1 (a)] and the third equality is due todim Gr(ni,CN) = ni(N−ni) [Won67]. Finally, write W = (W1, . . . ,Wr) and for all1 ≤ i ≤ r let Ui be a matrix whose columns form an orthonormal basis for Wi. PutU := [U1, . . . , Ur]. Then W ∈ EZ is equivalent to the existence of yi ∈ S(CdimMi),1 ≤ i ≤ r, such that Z can be written as Z =

[U1y1 . . . Uryr

]. Moreover, if

Z ∈ S, there exists a linear combination

0 =r∑i=1

λiUiyi = U

λ1y1...

λryr

A general W yields an U , whose kernel has dimension 1, so that π1(π−1

2 (W ))consists of a unique matrix Z ∈ S. This implies dimπ−1

2 (W ) = 0. Altogether:

dim ΣGr = (r − 1)(N + 1)− r + dim Gr(n,CN)− rN + n

= dim Gr(n,CN)− (N − n+ 1).

This finishes the proof.

An important consequence of Lemma 3.2.5 (4) is that ΣGr is closed in theEuclidean topology. Hence, for all W ∈ Gr(n,KN) there exists some W ′ ∈ ΣGr

with dist∗(W,W′) = dist∗(W,ΣGr) for both ∗ = chordal and ∗ = p. The following

proposition is a crucial step in the proof.

Proposition 3.2.6. Let W = (W1, . . . ,Wr) ∈ Gr(n,KN). Let Ui be a matrixwhose columns form an orthonormal basis for Wi, and set U =

[U1 · · · Ur

].

Then,distp(W,ΣGr) = distchordal(W,ΣGr) = ςmin(U).

Proof. Let W ′ ∈ ΣGr with distp(W,ΣGr) = distp(W,W′) (such a W ′ exists by the

foregoing discussion). According to Lemma 3.2.5 (1) there exists some Z ∈ SN×r<r

so that W ′ ∈ EZ . We get

distp(W, EZ) ≤ distp(W,W′) = distp(W,ΣGr) ≤ distp(W, EZ),

where the last inequality is because of EZ ⊂ ΣGr by Lemma 3.2.5 (2). Hence,distp(W,ΣGr) = distp(W, EZ), which shows

distp(W,ΣGr) =distp(W, EZ) = distchordal(W, EZ) ≤ distchordal(W,ΣGr).

We obtain distp(W,ΣGr) ≤ distchordal(W,ΣGr). In what follows we will prove that

distchordal(W,ΣGr) ≤ ςmin(U) ≤ distp(W,ΣGr), (3.2.8)

which would complete the proof.

45

Page 60: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3. The condition number of join decompositions

Using Lemma A.1.3 we conclude that ςmin(U) equals

minz∈S(Rr)

‖Uz‖ = minwi∈S(Wi),

1≤i≤r

minz∈S(Rr)

‖z1w1 + · · ·+ zrwr‖ (3.2.9)

= minwi∈S(Wi),

1≤i≤r

ςmin

([w1 · · · wr

]). (3.2.10)

Let Y =[y1 · · · yr

]with yi ∈ S(Wi) be a minimizer of the above expression, so

that ςmin(U) = ςmin(Y ). Lemma A.1.5 tells us that

ςmin(Y ) = minZ∈SN×r<r

( r∑i=1

(sin dP(zi, yi))2) 1

2. (3.2.11)

Let Z ∈ SN×r<r be a minimizer of the right-hand side of (3.2.11). We obtain

ςmin(U) =( r∑i=1

(sin dP(zi, yi))2) 1

2

≥ minwi∈S(Wi),

1≤i≤r

( r∑i=1

(sin dP(zi, wi))2) 1

2

= distchordal(W, EZ),

where the last step is by Lemma 3.2.4. Since rkZ < r, by Lemma 3.2.5 (2)we have EZ ⊂ ΣGr, and thus distchordal(W, EZ) ≥ distchordal(W,ΣGr). This yieldsthe left-hand inequality in (3.2.8). For proving the other inequality in (3.2.8), letW ′ = (W ′

1, . . . ,W′r) ∈ ΣGr with distp(W,ΣGr) = distp(W,W

′). By Lemma 3.2.5 (1)there exists Z ∈ SN×r<r , such that W ′ ∈ EZ . By the definition made in (3.2.7), wehave

distp(W,W′) =

( r∑i=1

∥∥ΠWi− ΠW ′i

∥∥2) 1

2 ≥( r∑i=1

∥∥(ΠWi− ΠW ′i

)zi∥∥2) 1

2.

For 1 ≤ i ≤ r, let wi = ΠWizi and yi := wi

‖wi‖ if wi 6= 0, or, if wi = 0,

let yi denote some arbitrary but fixed point in S(Wi). Then for each i, we have∥∥(ΠWi− ΠW ′i

)zi∥∥ = ‖wi − zi‖ = sin dP(zi, yi), and, hence,

distp(W,W′) ≥

( r∑i=1

(sin dP(zi, yi))2) 1

2 ≥ ςn([y1 · · · yr

])≥ ςn(U),

where the second step is due to Lemma A.1.5 and the last step because of (3.2.9).This finishes the proof.

46

Page 61: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3.2. The condition number as distance to ill-posedness

We are now ready to wrap up the proof of Theorem 3.2.2.

Proof of Theorem 3.2.2. Let x = (x1, . . . , xr) ∈ M1 × · · · ×Mr. By linearity ofthe derivative we have

DxΦ : Tx1M1 × · · · × TxrMr → TyKN , (•x1, . . . ,

•xr) 7→

•x1 + · · ·+ •

xr.

Recall that we have put ni := dimMi. For 1 ≤ i ≤ r, let Ui ∈ KN×ni denote a ma-trix that contains an orthonormal basis for TxiMi and set U =

[U1 · · · Ur

]. By

Proposition 3.2.6 we have ςmin(U) = distchordal((Tx1M1, . . . ,TxrMr),ΣGr). Thisconcludes our proof of Theorem 3.2.2.

3.2.1. Example: The join of two copies of the unit circle

Let S denote the unit circle in R2. For illustrating the foregoing we investigate thecondition number κ(x) in the case r = 2 and M1 =M2 = S.

x1

x2

(W,W )

x1

x2

(W,W )

Figure 3.2.1.: This picture sketches why in Subsection 3.2.1 the relative position of the doubleline (W,W ) ∈ ΣGr to x1 and x2 depends on dP(x1, x2).

Observe that for n = (dimS, dimS) = (1, 1) we have

ΣGr =

(W,W ′) ∈ Gr((1, 1),R2) | dim(W +W ′) < 2

= (W,W ′) ∈ Gr(2, 1)×Gr(2, 1) | W = W ′ .

It is easy to see that 0 ∈ J (S,S) and that ( 20 ) ∈ J (S,S). By continuity and

symmetry we deduce that

J (S,S) =x ∈ R2 | ‖x‖ ≤ 2

.

Let x = x1 + x2 ∈ J (S,S) and W ∈ Gr(2, 1) be fixed. For i ∈ 1, 2, let θi denotethe angle between W and TxiS. By definition of the chordal distance in (3.2.2)

47

Page 62: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3. The condition number of join decompositions

and by Theorem 3.2.2 we have

κ(x1, x2)−1 = distchordal((Tx1S,Tx2S), (W,W )) =((sin θ1)2 + (sin θ2)2

) 12 .

This expression is minimized when θ1 = θ2. That is, if the angle between Tx1Sand Tx2S is less than π

2, then W is the line that halves the angle between Tx1S

and Tx2S and if this angle is larger than π2, then W is the line that is orthogonal

to the line that halves the angle between Tx1S and Tx2S—see Figure 3.2.1. Theangle between Tx2S and Tx2S equals dP(x1, x2), so that

κ(x1, x2)−1 =

√2 sin

(12dP(x1, x2)

), if 0 ≤ dP(x1, x2) < π

2√2 cos

(12dP(x1, x2)

), if π

2≤ dP(x1, x2) < π.

By the law of cosines we have ‖x1 + x2‖2 = 2 + 2 cos dP(x1, x2), which shows thatthe condition number of x1 + x2 only depends on ‖x1 + x2‖; i.e.

κ(x1, x2)−1 =

2 sin(

12

arccos(r2

2− 1))

, if√

2 ≤ r ≤ 2

√2 cos

(12

arccos(r2

2− 1))

, if 0 ≤ r ≤√

2,

where r = ‖x1 + x2‖; compare Subsection 3.2.1. The dependence of κ(x1, x2) on rreflects the fact that the generic point J (S,S) has two preimages in S×S, whereaspoints in with r = 2 have only one preimage and the origin has infinitely many.

0.00

0.25

0.50

0.75

1.00

0.0 0.5 1.0 1.5 2.0r

κ(p 1

,p2)

−1

Figure 3.2.2.: The plot shows the dependence of κ(x1, x2) on r = ‖x1 + x2‖ in the example fromSubsection 3.2.1. The plot was created using R [R C15].

48

Page 63: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3.3. The condition number of open boundary points

3.3. The condition number of open boundary points

Contrary to the set of matrices of rank at most r, i.e., the r-th secant set of rank-one matrices, r-secant sets of dth order Segre and p-th order Veronese varietieswith p ≥ 3 are not closed in the Euclidean topology. The reason is that in theeuclidean closure there are points from the tangential variety that are not containedin the secant set. We already discussed this in Subsection 2.2.4.

In [dSL08, Theorem 1.1] de Silva and Lim gave an example for the 2nd secant setof the Segre variety S := S (Rn1 ,Rn2 ,Rn3), with n1, n2, n3 ≥ 2. Let a1, b1 ∈ Rn1 ,a2, b2 ∈ Rn2 , a3, b3 ∈ Rn3 be a pairs of independent vectors. Then the sequence ofrank-2 tensors

An := n(a1 +

1

nb1

)⊗(a2 +

1

nb2

)⊗(a3 +

1

nb3

)− n a1 ⊗ a2 ⊗ a3

for n→∞ converges to

A? := a1 ⊗ a2 ⊗ b3 + a1 ⊗ b2 ⊗ a3 + b1 ⊗ a2 ⊗ a3,

which is a tensor of rank three.As already discussed in Remark 3.0.1 this phenonemon is undesirable when one

wants to find the closest rank-r tensors to a given tensor, because such a bestapproximation may not exist after all.

Definition 3.3.1. A subset S ⊂ KN\ 0 is called a cone, if for every x ∈ S wehave tx ∈ S for all t ∈ K×.3

When the constituent manifoldsM1, . . . ,Mr of the join are cones and further-more J (M1, . . . ,Mr) = J (M1, . . . ,Mr), we show that points that come close tosuch open boundary points admit an unbounded condition number marking thosepoints as ill-posed (here denotes euclidean closure). The algorithm that wepropose in Chapter 4 does not converge to such points. Note that all nonsingularprojective varieties such as the Grassmann, Segre and the Veronese varieties satisfythe aforementioned assumptions.

Remark 3.3.2. A sequence of points admitting an unbounded sequence of conditionnumbers does not imply that the sequence converges to an open boundary point.

Moreover, de Silva and Lim explained that when a sequence of rank-2 tensorsy(1), y(2), . . .

⊂ σ2(S ) converges to some y? 6∈ σ2(S ), some of the summands

in x(i) = (x(i)1 , . . . , x

(i)r ) with y(i) = Φ(x(i)) become of unbounded norm while

Φ(x(i)) → y?. We also extend this result to join sets that belonging to the abovecategory. The main result of this section is stated next.

3Some author’s require that 0 ∈ S, but we omit this requirement, since adding 0 to the Segreor the Veronese variety would create a singularity. See also the discussion in Subsection 2.2.4

49

Page 64: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3. The condition number of join decompositions

Theorem 3.3.3. Let M1, . . . ,Mr ⊂ KN\ 0 be smooth manifold. Assume thatJ (M1, . . . ,Mr) = J (M1, . . . ,Mr), where denotes euclidean closure, and thatall of theMi are cones. Let xi(t) be a smooth curve inMi defined for all t ∈ (0, 1).If

y? = limt→0

(x1(t) + . . .+ xr(t)) 6∈ J (M1, . . . ,Mr),

thenlimt→0

κ(x1(t), . . . , xr(t))→∞,

and furthermore at least two components diverge:

∃1 ≤ i? < j? ≤ r : limt→0‖xi?(t)‖ → ∞ and lim

t→0‖xj?(t)‖ → ∞.

In the special case r = 2 and M :=M1 =M2 we even have

distchordal(Tx1(t)M,Tx2(t)M)→ 0.

The key property that is implicitly exploited in the proof of Theorem 3.3.3 isthe fact that the condition number of the (JDP) κ(x1, . . . , xr) is invariant underscaling of the individual factors when the Mi’s are cones. The reason for this isthat the tangent space to a point of a cone is invariant under scaling of that point.By Theorem 3.2.2 the condition number only depends on the relative position ofthe tangent spaces.

Proposition 3.3.4. Let M1, . . . ,Mr ⊂ KN\ 0 be smooth manifold embeddedand assume that all of the Mi are cones. Let (x1, . . . , xr) ∈M1 × . . .×Mr, thenfor all ti ∈ R, ti 6= 0, i = 1, 2, . . . , r, we have

κ(x1, x2, . . . , xr) = κ(t1x1, t2x2, . . . , trxr)

We can now present the proof of Theorem 3.3.3.

Proof of Theorem 3.3.3. Let x(t) := (x1(t), . . . , xr(t)) be a curve inM1× . . .×Mr

such thatlimt→0

Φ(x(t)) = y? 6∈ J .

Assume that the curve x(t) converges to x ∈M1 × · · · ×Mr. As Φ is continuousand J (M1, . . . ,Mr) = J (M1, . . . ,Mr), it follows that y? ∈ J (M1, . . . ,Mr),which is a contradiction.

The foregoing shows that x(t) does not converge inM1×· · ·×Mr, which impliesthat x(t) becomes unbounded, i.e., limt→0 ‖x(t)‖ → ∞. The assumed metric onM1 × . . . ×Mr is the product metric and thus ‖x(t)‖2 =

∑ri=1 ‖xi(t)‖2. This

implies that there exists 1 ≤ i? ≤ r such that ‖xi?(t)‖ → ∞. Without restrictionwe may assume that i? = 1. The sequence ‖x1(t) + . . .+ xr(t)‖ is assumed to

50

Page 65: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3.3. The condition number of open boundary points

converge in K, so it is a bounded sequence for t near 0. Thus there must existanother 2 ≤ j? ≤ r so that ‖xj?(t)‖ → ∞.

We now show κ(x(t)) → ∞. For this purpose, let y(t) := Φ(x(t)). By Defini-tion 3.1.1 we have κ(x(t))−1 = ςn(Dx(t)Φ ), where, by linearity of the derivative,

Dx(t)Φ : Tx1(t)M1 × · · · × Txr(t)Mr → Ty(t)KN , (•x1, . . . ,

•xr) 7→

•x1 + · · ·+ •

xr,

To show that κ(x(t)) → ∞ we construct p(t) ∈ Tx1(t)M1 × · · · × Txr(t)Mr

satisfying both Dx(t)Φ p(t) → 0 and ‖p(t)‖ ≥ 1, which together would implythat ςn(Dx(t)Φ )→ 0. To this end let q(t), q(t) ⊂ S(KN) be two normalized curvesdefined by

α(t)q(t) = x1(t), and β(t)q(t) =∑j>1

xj(t). (3.3.1)

We can assume that α(t) and β(t) only admit positive real values, which makesthe choice of q(t), q(t) unique. By assumption, the sequence α(t)q(t) + β(t)q(t)converges as t→ 0, so there exists some constant c > 0 with

c ≥ ‖α(t)q(t) + β(t)q(t)‖2 = α(t)2 + 2α(t)β(t)< 〈q(t), q(t)〉+ β(t)2. (3.3.2)

Since α(t) is unbounded, β(t) must be unbounded, too. Write β(t) = γ(t)α(t). Byconstruction γ(t) admits only positive real values. Equation (3.3.2) becomes

cα(t)−2 ≥ 1 + 2γ(t)< 〈q(t), q(t)〉+ γ(t)2. (3.3.3)

Real solutions to this quadratic equation exist only if (< 〈q(t), q(t)〉)2 ≥ 1−cα(t)−2.Since |< 〈q(t), q(t)〉| ≤ |〈q(t), q(t)〉| and, by Cauchy-Schwartz, |〈q(t), q(t)〉|2 ≤ 1,the only solution consistent with (3.3.3) satisfies

limt→0< 〈q(t), q(t)〉 = −1. (3.3.4)

Since S(KN) ⊂ KN is a real riemannian manifold with induced inner product < 〈 , 〉(see Appendix B.3) equation (3.3.4) entails that we have q(t) + q(t)→ 0 as t→ 0.Moreover, because of homogeneity, xi(t) ∈ Txi(t)Mi for 1 ≤ i ≤ r. Therefore, forall t, we have the following tangent vector:

p(t) := (q(t), β(t)−1x2(t), . . . , β(t)−1xr(t)) ∈ Tx1(t)Mi × . . .× Txr(t)Mi.

We get

Dx(t)Φ p(t) = q(t) + β(t)−1

r∑j=2

xj(t) = q(t) + q(t)t→0−→ 0,

while ‖p(t)‖ ≥ ‖q(t)‖ = 1 as desired.

51

Page 66: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3. The condition number of join decompositions

In the special case r = 2 and M1 =M2 =:M, similar to equation (3.3.1), wewrite α(t)q(t) = x1(t), α(t) > 0, and further β(t)q(t) = x2(t), β(t) > 0. The mapx 7→ TxM is called the Gauss map of the manifold M. It is well known that theGauss map of a smooth manifold is continuous. Since q(t) + q(t)→ 0 as t→ 0 forsmall t the points q(t) and q(t) are close to being opposite points of the sphere.We therefore have

Tx1(t)M = Tq(t)Mt→0−→ T−q(t)M = Tx2(t)M.

All the equalities are by homogeneity. This concludes the proof.

It is natural to wonder whether the assumptions on the manifolds Mi are atall necessary in Theorem 3.3.3. The example below shows a join set that has anopen boundary point where the condition number neither converges nor divergestowards ∞. The example thus proves that it is necessary to impose some addi-tional requirement on the manifolds Mi to be able to predict the behavior of thecondition number near boundary points. We do not know which are the minimalrequirements on the Mi to exclude such phenomenons as described below.

Consider the embedded smooth manifolds

M1 :=

(−t, 0, 0) ∈ R3 | 1 < t <∞, and

M2 :=

(t,

cos(t)

t,sin(t2)

t

)∈ R3 | 1 < t <∞

.

For 1 ≤ t ≤ ∞ let x1(t) := (−t, 0, 0), and x2(t) :=(t, cos(t)

t, sin(t2)

t

). Then we

havelimt→∞

x1(t) + x2(t) = (0, 0, 0) 6∈ J (M1,M2).

Although x1(t) + x2(t) converges the condition number κ(x1(t), x2(t)) does notconverge, as we will show. We have

Tx1(t)M1 = span(1, 0, 0) and

Tx2(t)M2 = span

(1,− sin(t)t− cos(t)

t2, 2 cos(t2)− sin(t2)

t2

).

As in the example in Subsection 3.2.1 we deduce that

κ((x1(t), x2(t)))−1 = distchordal((Tx1(t)M1,Tx2(t)M2),ΣGr) =√

2 sin

(θ(t)

2

),

52

Page 67: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

3.3. The condition number of open boundary points

Figure 3.3.1.: The plot shows M1 and M2 for 1 ≤ t ≤ 30. We used Matlab R2015b [MAT15] tocreate this picture.

where θ(t) is the angle between Tx1(t)M1 and Tx2(t)M2; that is,

cos θ(t) =(

1 +((− sin(t)t− cos(t)t2

)2

+(

2 cos(t2)− sin(t2)

t2

)2)− 12;

For large t we have

θ(t) ∼ arccos((1 + 4 cos(t2))−

12

),

from which we see that κ((x1(t), x2(t))) does not converge.

Remark 3.3.5. This situation is particularly troublesome because there exist bothconvergent subsequences and divergent subsequences. Letting tk =

√2kπ and

sk =√kπ/2 yields

limk→∞

κ((x1(tk), x2(tk))) =

√1−

√1

5, and lim

k→∞κ((x1(sk), x2(sk))) =∞.

It is a theoretical possibility that an iterative algorithm for solving the associatedjoin approximation problem with x = 0 ∈ J \ J as input yields only iterates forwhich the condition number is bounded by some small constant. In this case, thecondition number cannot detect the convergence towards an open boundary point.

53

Page 68: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen
Page 69: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

4. Riemannian optimization forleast-squares problems on joins

In this section we answers the questions Q2 and Q3. We will use the frame-work from [AMS08] to design and analyse an algorithm that solves the (JAP).In particular, [AMS08, Section 8.4.1] provides a Gauss–Newton method to solveoptimization problems like the (JAP) on Riemannian manifolds.

Note that the (JAP) is indeed defined on a Riemannian manifold. This isbecause, sinceM1, . . . ,Mr ⊂ KN are all smooth manifolds,M1× . . .×Mr ⊂ KrN

is a smooth manifold. The inner product on KrN induces an inner product onM⊂ KrN making it a Riemannian manifold, if K = R, or a Hermitian manifold,if K = C. In any case,M is a Riemannian manifold, so that [AMS08, Section 8.4.1]can be applied (see also Appendix B.3).

Recall from (3.0.1) the definition of the addition map Φ. The least-squares costfunction to be minimized is then

M1 × . . .×Mr → R, x 7→ 1

2‖Φ(x)− q‖2, for q ∈ RN . (4.0.1)

However, in what follows we consider the more general setting, where M is aRiemannian manifold (not necessarily the product from above) embedded in RM

for some M . We consider the least-squares cost function

f :M→ R, x 7→ 1

2‖F (x)‖2, (4.0.2)

with F :M→ RK being a smooth function fromM into some real vector space RK

with K ≥ dimM. Note that (4.0.1) is a special case of (4.0.2) for F (x) = Φ(x)−q.In the next subsection we design the Riemannian Gauss–Newton (RGN) method

for the minimization of a least-squares cost function of type (4.0.2). Thereafter, inRemark 4.2.1 we show that the condition number of the (JDP) appears naturallyin the bounds of both the convergence speed and the radius of attraction of thatRGN method applied to the (JAP).

55

Page 70: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

4. Riemannian optimization for least-squares problems on joins

4.1. The Riemannian Gauss–Newton method

In this section we summarize the RGN method for a Riemannian manifold M asdescribed in [AMS08, Section 8.4.1].

Newton’s method for minimizing f (4.0.2) on M consists of computing theupdate direction at xk ∈M as the solution ηk of the linear system(

∇2xkf)ηk = −∇xkf,

where ∇xkf is the Riemannian gradient and ∇2xkf is the Riemannian Hessian of f

at xk; for details see, e.g., [AMS08, Chapter 6]. The next iterate is then given by

xk+1 = Rxk(ηk) ∈M,

where R is some retraction onM; see Appendix B.2.2. The Hessian matrix is notcomputed explicitly but rather it is approximated as

∇2xkf ≈ (DxkF )T (DxkF ),

where DxF : TxM → RK is the derivative of F at x. Since the gradient of f isgiven explicitly by

∇xk12〈F (x), F (x)〉 = 〈F (xk),DxkF 〉 = (DxkF )TF (xk), (4.1.1)

the RGN algorithm determines the update direction as the solution of((DxkF )T (DxkF )

)ηk = −(DxkF )TF (xk).

If DxkF is injective, the solution of this system is given explicitly by

ηk = −((DxkF )T (DxkF )

)−1(DxkF )TF (xk) = −(DxkF )†F (xk),

where A† is the Moore–Penrose pseudoinverse of the linear operator A; see Defini-tion A.1.6. In summary, the basic RGN method consists of the following algorithm:

Algorithm 2: The Riemannian Gauss–Newton method

1 Input: A starting point x0 ∈M.2 for k = 0, 1, 2, . . . do3 ηk = −(DxkF )†F (xk);4 xk+1 = Rxk(ηk);

5 end

56

Page 71: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

4.2. Convergence analysis

4.2. Convergence analysis

Before we state the convergence analysis of the RGN method we want to make thefollowing remark.

Remark 4.2.1. The least squares problem (4.0.1) is given by F (x) = Φ(x)− q andwe have ςmin(DxF ) = ςmin(DxΦ ) = κ(x)−1. Note how ςmin(DxF ) is of influence inTheorem 4.2.2 below. From this we obtain a quantitive description to what extentthe condition number of the (JDP) governs the convergence behavior of the RGNmethod applied to solve the (JAP).

The following is the main theorem of this section.

Theorem 4.2.2 (Convergence of the RGN method). Let x? ∈M be a local min-imum of the smooth objective function

f :M→ R, x 7→ 1

2‖F (x)‖2,

such that Dx?F is injective. Let κ := ςmin(Dx?F )−1. There exists some ε′ > 0such that for all 0 < α < 1 there exists a constant c > 0 depending on ε′, F , x?,M and the chosen retraction operator R so that the following holds.

1. Linear convergence: For all x ∈M with ‖x− x?‖ < ε, where

ε := min(1− α)

cκ,

αε′

(1 + α + cκ2‖F (x?)‖)

,

one step of the RGN method, starting with x, generates a point y that satisfies

‖x? − y‖ ≤cκ2 ‖F (x?)‖

α‖x? − x‖+O(‖x? − x‖2).

2. Quadratic convergence: If x? is a zero of the objective function f , for all x ∈Mwith ‖x− x?‖ < ε, where

ε := min(1− α)

cκ,αε′

1 + α

,

one step of the RGN method, starting with x, generates a point y that satisfies

‖x? − y‖ ≤c(κ+ 1)

α‖x? − x‖2 +O(‖x? − x‖3).

Remark 4.2.3. The use of the constant c and the big-O-notation in the theoremis intended: The theorem is meant to display the quality of how the conditionnumber enters the RGN method. A quantitative convergence analysis is yet to beworked out.

57

Page 72: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

4. Riemannian optimization for least-squares problems on joins

Before we go over to prove Theorem 4.2.2, we will have to consider the followinguseful lemma.

Lemma 4.2.4. Let x ∈ M. There exist constants rF > 0 and γF ≥ 0 such thatfor all points y ∈M with ‖x− y‖ < rF we have

F (y) = F (x) + (DxF )P (y − x) + vx,y, where ‖vx,y‖ ≤ γF‖y − x‖2,

and P denotes the orthogonal projection onto TxM.

Proof. Denote the projection onto the manifold M by

Π : RM →M, z 7→ argminy∈M

‖z − y.‖

The projection Π is a smooth function in the neighborhood of x; see [AM12, Lemma4]. By [AM12, Proposition 5], the map

Rx : RM →M, η 7→ Π(x+ η)

restricted to TxM is the projective retraction, which is a smooth and well-definedretraction for all ξ ∈ TxM in a neighborhood of 0x. Let y ∈M, τ = ‖y − x‖ andfix η = τ−1(y − x). Consider then the smooth function

G : R→ RM , t 7→ F (Rx(tη)).

By smoothness it admits a Taylor series approximation

G(τ) = G(0) + D0G · τ +O(τ 2),

which is well defined in a neighborhood τ | |τ | < rF of 0, where rF is a constantdepending on F . We have

G(0) = F (Rx(0)) = F (x) and G(τ) = F (Rx(‖x− y‖)) = F (Π|M (y)) = F (y).

As Dx(Π|M) = P by [AM12, Lemma 4], it follows from the chain rule that

D0G = DRx(0)F D0Rx η = DxF (DxP ) η = DxF P η.

Here η also denotes the map t 7→ tη. This concludes the proof.

We can now prove Theorem 4.2.2

Proof of Theorem 4.2.2. First, we make some general considerations:In Lemma B.2.7 we choose δ small enough such that it applies to all x within

a small radius ‖x? − x‖ < δ.

58

Page 73: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

4.2. Convergence analysis

Let 0 < ε′ ≤ δ. Then, there exists a constant γR depending on the retractionoperator R, such that for all ‖x− x?‖ < ε′ we have

‖Rx(η)− (x+ η)‖ ≤ γR‖η‖2 for every η ∈ TxM, ‖η‖ < δ. (4.2.1)

By applying Lemma 4.2.4 to the smooth functions F and idM respectively andusing the smoothness of (the derivative of) F , we see that there exists constantsγR, γF , γI > 0, such that for all x ∈M with ‖x− x?‖ < ε′ we have

F (x?)− F (x)− (DxF )P (x? − x) = v with ‖v‖ ≤ γF‖x? − x‖2, (4.2.2)

x? − x− P (x? − x) = w with ‖w‖ ≤ γI‖x? − x‖2. (4.2.3)

Moreover, we define the constant

C := maxx∈M

‖x−x?‖<ε′

‖Dx?F −DxF ‖‖x? − x‖

and put

ε′′ := min 1

κγF,1− αCκ

,(

1 +1 +√

2Cκ2‖F (x?)‖α

)−1

ε′. (4.2.4)

We choose a constant c, depending on R,F and ε′, that satisfies

ε := min(1− α)

cκ,

αε′

(1 + α + cκ2‖F (x?)‖)

≤ ε′′.

andc ≥ max

√2C, γF , γR, γI

.

From now on let x ∈ M with ‖x− x?‖ < ε be fixed. First we show that DxF isinjective, from which it follows that the update direction

η = −(DxF )†F (x)

of the RGN method with starting point x is defined. By assumption, Dx?F isinjective and thus we have

κ−1 = ςmin(Dx?F ) > 0.

To abbreviate, in the following we let J := DxF and J? := Dx?F . Note thatκ =

∥∥J†?∥∥ . Then, we have

‖J? − J‖ ≤ C‖x? − x‖ ≤ Cε ≤ (1− α)κ−1 = (1− α)ςmin(J?), (4.2.5)

59

Page 74: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

4. Riemannian optimization for least-squares problems on joins

It follows from Weyl’s perturbation lemma that

|ςmin(J?)− ςmin(J)| ≤ ‖J? − J‖ ≤ (1− α)ςmin(J?).

We obtain ςmin(J) > αςmin(J?) > 0, where the last inequality is by the assumptionα > 0. It follows that

‖J†‖ = ςmin(J)−1 <κ

α(4.2.6)

and that J is injective. This shows that the RGN update direction η is well defined.Let y := Rx(−J†F (x)) be the point that one step of the RGN method start-

ing at x produces. It remains to prove the bound on ‖x? − y‖ in each of theassertions 1. and 2.

We first cover assertion 1. If ‖η‖ = ‖J†F (x)‖ ≤ ε′, then the retraction satisfies(4.2.1), because of ε′ < δ. Furthermore, x? is a local minimum of the least squaresproblem (4.0.2), so that from (4.1.1) we obtain JT? F (x?) = 0. By the characteriza-tion of the pseudo inverse from Definition A.1.6 and our assumption of J? beinginjective we have that J†? = (JT? J?)

−1JT? . We conclude

J†?F (x?) = 0.

Moreover, from (4.2.2) we obtain

J†F (x) = J†F (x?)− P (x? − x)− J†v, (4.2.7)

so that

‖η‖ = ‖ − (J† − J†?)F (x?) + P (x? − x) + J†v‖≤ ‖J† − J†?‖‖F (x?)‖+ ‖P‖‖x? − x‖+ ‖J†‖‖v‖. (4.2.8)

From Theorem A.1.7 we obtain

‖J† − J†?‖ ≤√

2 ‖J†‖‖J†?‖‖J − J?‖ ≤√

2Cκ2

α‖x? − x‖, (4.2.9)

where the last step is because of ‖J? − J‖ ≤ C‖x? − x‖ as shown in (4.2.5)and by (4.2.6). Using ‖P‖ = 1 for any orthogonal projector, the assumption‖x? − x‖ ≤ (κγF )−1, and (4.2.6), (4.2.9), and the bound on ‖v‖ in (4.2.2), itfollows from (4.2.8) that

‖η‖ ≤(

1 +κγF ‖x? − x‖+

√2Cκ2‖F (x?)‖

α

)‖x? − x‖. (4.2.10)

60

Page 75: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

4.2. Convergence analysis

By the definition of ε (4.2.4) and our assumption on x that ‖x? − x‖ < ε, we have

‖x? − x‖ < ε <1

κγF, (4.2.11)

so that, by (4.2.10),

‖η‖ ≤(

1 +1 +√

2Cκ2‖F (x?)‖α

)‖x? − x‖. (4.2.12)

Similar to (4.2.11), but using the third bound on ε in (4.2.4), we have

‖x? − x‖ < ε <(

1 +1 +√

2Cκ2‖F (x?)‖α

)−1

ε′,

which when plugged into (4.2.12) yields ‖η‖ < ε′. We conclude that (4.2.1) appliesto y = Rx(η) = Rx(−J†F (x)), so that

‖y − x?‖ = ‖Rx(−J†F (x))− x?‖ ≤ ‖x− J†F (x)− x?‖︸ ︷︷ ︸=:ζ

+γR‖η‖2. (4.2.13)

We use J†?F (x?) = 0 and the formula from (4.2.7) to derive that

ζ = ‖x− x? − (J†F (x)− J†?F (x?))‖= ‖x− x? − (J† − J†?)F (x?) + P (x? − x) + J†v‖ (4.2.14)

= ‖ − (J† − J†?)F (x?)− w + J†v‖ (4.2.15)

≤ ‖J† − J†?‖‖F (x?)‖+ γI‖x? − x‖2 + ‖J†‖γF‖x? − x‖2, (4.2.16)

where the second-to-last equality is due to (4.2.3), and in the last line we have usedthe triangle inequality and the bounds on ‖v‖ and ‖w‖ from (4.2.2) and (4.2.3).Combining this with (4.2.9) yields

ζ ≤√

2Cκ2 ‖F (x?)‖α

‖x? − x‖+ γI‖x? − x‖2 + ‖J†‖γF‖x? − x‖2,

for which we use (4.2.6) to derive that

ζ ≤√

2Cκ2 ‖F (x?)‖α

‖x? − x‖+(γI +

γFκ

α

)‖x? − x‖2, (4.2.17)

Plugging (4.2.17) and (4.2.12) into (4.2.13) yields

‖y − x?‖ ≤√

2Cκ2 ‖F (x?)‖α

‖x? − x‖+O(‖x? − x‖2). (4.2.18)

61

Page 76: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

4. Riemannian optimization for least-squares problems on joins

We have chosen the constant c large enough, so that√

2C < c. Pluggin this into(4.2.18) gives the first assertion.

For the second assertion we have the additional assumption that x? is a zero ofthe objective function f(x) = 1

2‖F (x)‖2. From (4.2.17) we obtain

ζ ≤(κγFα

+ γI)‖x? − x‖2.

From (4.2.8) we get

‖η‖2 = ‖P (x? − x) + J†v‖2

≤ ‖P (x? − x)‖2 + 2|〈P (x? − x), J†v〉|+ ‖J†v‖2

≤ ‖x? − x‖2 + 2γF‖J†‖‖x? − x‖3 + γ2F‖J†‖2‖x? − x‖4,

where the last step is by the Cauchy–Schwartz inequality and the fact that ‖P‖ = 1for orthogonal projectors. Again, plugging these bounds for ζ and ‖η‖ into (4.2.13)we get

‖x? − y‖ ≤(κγFα

+ γI + γR)‖x? − x‖2 +O(‖x? − x‖3). (4.2.19)

We have chosen the constant c so that c ≥ max γF , γI , γR. Using this inequalityand α < 1 in (4.2.19) yields the second assertion.

62

Page 77: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

5. Application: The tensor rankdecomposition

5.1. The condition number of tensor rankdecomposition

In this section we want to return to the initial motivation of this section—the con-dition number of tensors rank decomposition. In particular, we want to investigatethe previously defined condition number of the (JDP) in the case when the join inquestion is the r-secant set of the Segre variety S = S (Kn1 ⊗ · · · ⊗Knp); i.e.,

σr(S ) = J (S , . . . ,S︸ ︷︷ ︸r times

),

see Definition 2.2.6. This condition number yields the answer to question Q1 fromthe introduction.

In the remainder of this section, let (x1, . . . , xr) ∈ S r. Recall from Defini-tion 3.1.1 that

κ(x1, . . . , xr) = ςmin([U1 . . . Ur

])−1, (5.1.1)

where Ui is a matrix whose columns form an orthonormal basis for TxiS . Nextwe give an explicit formula for each Ui in terms of the xi.

Proposition 5.1.1. For 1 ≤ i ≤ r let xi = a(1)i ⊗ · · · ⊗ a

(p)i . For each pair (i, k)

write α(k)i := ‖a(k)

i ‖−1 a(k)i and let Q

(k)i ∈ Knk×(nk−1) be a matrix whose columns

form an orthonormal basis for the orthogonal complement of a(k)i in Knk ; i.e., an

orthonormal basis for (a(k)i )⊥ := x ∈ Knk | 〈x, a(k)

i 〉 = 0. Then the matrices Uiin (5.1.1) can be taken as

Ui :=[a

(k)i ⊗ · · · ⊗ a

(p)i Q

(1)i ⊗ a

(2)i ⊗ · · · ⊗ a

(p)i · · · a

(1)i ⊗ · · · ⊗ a

(p−1)i ⊗Q(p)

i

].

Proof. By Terracini’s Lemma [Lan12, Section 4.6.2] the columns of the proposedUi form a basis for TxiS . We use Lemma 2.2.15 (1) to verify that all the columnsare of unit norm and are pairwise orthogonal.

When the tensor rank decomposition is generically identifiable (Definition 2.2.9)we refer to κ(x1, . . . , xr) as the condition number of A = x1 + . . .+ xr.

63

Page 78: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

5. Application: The tensor rank decomposition

Using essentially the same arguments as in [Van16, Section 8] we prove thefollowing property of the condition number,

Proposition 5.1.2. The geometric condition number is scale and orthogonallyinvariant: for every A = Φ(x1, . . . , xr) ∈ σr(S ), all t1, . . . , tr ∈ R, ti 6= 0, and allQ = Q1 ⊗ · · · ⊗Qd with Qk ∈ U(nk) it holds that

κ(x1, . . . , xr) = κ(t1Qx1, . . . , trQxr).

Proof. Invariance with respect to scaling has been discussed in Proposition 3.3.4,and is also implied by Proposition 5.1.1. Orthogonal invariance follows by notingthat the columns of QUi, where Ui is given in (5.1.1), form an orthonormal basisof TQxiS . The result follows by the orthogonal invariance of singular values.

Proposition 5.1.2 implies that the condition number is constant on the secantplane

t1x1 + . . .+ trxr | t1, . . . , tr ∈ K\ 0 .

So we can regard the condition number of the tensor rank decomposition also asa condition number of secant planes.

We describe a collection of well-posed tensor-rank decompositions. Recall from[BDHR15, CS09, Kol01, ZG01] that a tensor A is orthogonally decomposable ofrank r, if A has a decomposition as

A =r∑i=1

a(1)i ⊗ · · · ⊗ a

(p)i , (5.1.2)

such that for each k the vectors a(k)1 , . . . , a

(k)r are pairwise orthogonal. A generaliza-

tion of orthogonally decomposable tensors are weak 3-orthogonal tensors of rank r.The tensor A is called weak 3-orthogonal, if in (5.1.2) for every 1 ≤ i < j ≤ r thereexist 1 ≤ k1 < k2 < k3 ≤ p, depending on i and j, such that there is orthogonality

in the corresponding factors: 〈a(k1)i , a

(k1)j 〉 = 〈a(k2)

i , a(k2)j 〉 = 〈a(k3)

i , a(k3)j 〉 = 0.

Proposition 5.1.3. The following holds.

1. In the special case r = 1 let x ∈ S . Then κ(x) = 1.

2. Let x ∈ S r. If Φ(x) is a 3-weak-orthogonal tensor of rank r, then κ(x) = 1.

I.e.; tensors of rank one and weak 3-orthogonal tensors are well-posed.

Proof. Claim 1. is implied by the fact that matrices with orthonormal columshave singular values all equal to 1. Since weak 3-orthogonality implies that all thecolumns of the matrix

[U1 . . . Ur

]are pairwise orthogonal, claim 2. is proved by

the same argument.

64

Page 79: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

5.2. Condition number theorem for the tensor rank decomposition

5.1.1. Empirical distribution of the condition number

We make the following experiment. For i ∈ 1, 2 we sample the entries of the

three vectors a(1)i , a

(2)i , a

(3)i ∈ R3 all independently with distribution N(0, 1). Then

we put xi = a(1)i ⊗a

(1)i ⊗a

(3)i . We compute the condition number κ(x1, x2) with the

formula of Proposition 5.1.1. An implementation in MATLAB [MAT16] is givennext.

U=ze ro s ( 2 7 , 1 4 ) ;f o r r =1:2

%%% Sample the a i ˆ( j ) and compute orthonormal b a s i s e sQ1=eye ( 3 ) ; Q1( : ,1 )= normrnd ( 0 , 1 , [ 1 3 ] ) ; [ Q1 , R]= qr (Q1 ) ;Q2=eye ( 3 ) ; Q2( : ,1 )= normrnd ( 0 , 1 , [ 1 3 ] ) ; [ Q2 , R]= qr (Q2 ) ;Q3=eye ( 3 ) ; Q3( : ,1 )= normrnd ( 0 , 1 , [ 1 3 ] ) ; [ Q3 , R]= qr (Q3 ) ;

%%% Extract the f i r s t column o f Q i (= s c a l e d a i ˆ( j ) )a1=Q1 ( : , 1 ) ; a2=Q2 ( : , 1 ) ; a3=Q3 ( : , 1 ) ;

%%% Def ine the matr i ce s Q iQ1=Q1 ( : , 2 : 3 ) ; Q2=Q2 ( : , 2 : 3 ) ; Q3=Q3 ( : , 2 : 3 ) ;

%%% Compute the matrix U iU( : , ( r−1)∗7+1: r ∗7)=[ kron ( kron ( a1 , a2 ) , a3 ) , . . .

kron ( kron (Q1, a2 ) , a3 ) , . . .kron ( kron ( a1 ,Q2) , a3 ) , . . .kron ( kron ( a1 , a2 ) ,Q3 ) ] ;

end

%%% Compute the cond i t i on number with a s i n g u l a r va lue decompos it ions=svd (U) ;kappa=1/(min ( s ) ) ;

We executed this experiment 10000 times. The outcome is shown in Fig-ure 5.1.1.

5.2. Condition number theorem for the tensor rankdecomposition

Let r > 1 and x1, . . . , xr ∈ S := S (Kn1 , . . . ,Knp). We put T := Kn1⊗ · · · ⊗ Knp .For technical reasons we assume that dimT > 4.

In Section 3.2 we described the condition number of the tensor rank decompo-sition at (x1, . . . , xr) as the inverse distance of (Tx1S , . . . ,TxrS ) to ill-posedness:Recall from Theorem 3.2.2 that

1

κ(x1, . . . , xr)= distchordal((Tx1S , . . . ,TxrS ),ΣGr), (5.2.1)

65

Page 80: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

5. Application: The tensor rank decomposition

0.0

0.1

0.2

0.3

0 10 20 30 40 50

κ(x1,x2)

rela

tive

freq

uenc

yDistribution of the condition number.

Figure 5.1.1.: Empirical distribution of the condition number obtained in the experiment fromSubsection 5.1.1 (127 data points with κ(x1, x2) > 50). The plot was created using R [R C15].

This characterization is indeed useful, but conceals the metric situation on T. Inother words, the characterization (5.2.1) can only give a qualitative answer to thequestion

If x = (x1, . . . , xr) is close to an ill-posed tuple, what is κ(x)? (5.2.2)

Of course, the assignment x → κ(x) is continuous, and hence κ(x) → ∞ as xapproaches an ill-posed tuple. But this does not yield a quantitative descriptionof how fast the condition number grows. In this section a first advance is made togive a quantitative answer to question (5.2.2).

Note that, by Proposition 5.1.2, the condition number is invariant under scalingof the xi. Hence, any attempt to connect the condition number in terms of inversedistance to ill-posedness on S r must involve some sort of angular distance. Thismeans that we have to consider the product PS × · · · × PS of projective Segrevarieties. The main theorem of this section is as follows.

66

Page 81: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

5.2. Condition number theorem for the tensor rank decomposition

Theorem 5.2.1 (A condition number theorem for the tensor rank decomposition).Let (x1, . . . , xr) ∈ S r. Then

1

κ(x1, . . . , xr)≤ distweighted(([x1], . . . , [xr]),ΣP)

whereΣP = ([x1], . . . , [xr]) ∈ (PS )r | κ(x1, . . . , xr) =∞

(compare (3.1.2)) and the distance distweighted is defined in Definition 5.2.2 below.

We prove Theorem 5.2.1 at the end of this section. To explain the weighteddistance distweighted put

n := dim S

and consider the projective Segre map (2.2.8)

σP : P(Kn1)× · · · × P(Knp)→ PS , ([v1], . . . , [vp]) 7→ [v1 ⊗ · · · ⊗ vp], (5.2.3)

which by [Lan12, Section 4.3.4.] is a diffeomorphism. Each of the P(Kni) is aRiemannian manifold endowed with the Fubini-Study inner product 〈 , 〉; seeLemma B.3.3. We use those inner products to define the weighted inner product〈 , 〉weighted on the tangent spaces of PS as follows. For a ∈ P(Kn1)×· · ·×P(Knp)

and all•a,

b ∈ Ta(P(Kn1)× · · · × P(Knp)) we put

〈 •a,•

b〉weighted = 〈( •a1, . . . ,•ap), (

b1, . . . ,•

bp)〉weighted :=

p∑i=1

(n− ni)〈•ai,

bi〉. (5.2.4)

Definition 5.2.2. We define the weighted inner product 〈 , 〉weighted on PS tobe the inner product that makes σP a Riemannian isometry (see Definition B.2.2).The Riemannian metric on (PS )r is the product metric. The induced distanceon (PS )r is denoted distweighted.

In what follows the Riemannian metric on PS is the one defined in Defini-tion 5.2.2.

Remark 5.2.3. The distance distweighted can be seen as the angular distance onthe product of a collections of r · p spheres S(1,1) × · · · × S(p,r), where for each1 ≤ i ≤ p the spheres S(i,1), . . . ,S(i,r) ⊂ Kni are of radius

√n− ni. This shows

that for n1 < n2 relative errors in the factor P(Kn1) weigh more than relative errorsin the factor P(Kn2); c.f. Figure 5.2.1.

Recall from (3.2.2) the chordal distance defined on Gr(n,T). If W,W ′ ⊂ T aresubspaces of dimension n = dim S , the chordal distance between them is definedas distchordal(W,W

′) = (∑n

i=1(sin θi)2)

12 , where the θi are the principal angles be-

tween W and W ′. The chordal distance, however, does not come from Riemannian

67

Page 82: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

5. Application: The tensor rank decomposition

a1

∆a1

φ

a2

∆a2

φ

tanφ = ‖∆a2‖‖a1‖ = ‖∆a2‖

‖a2‖

Figure 5.2.1.: The picture depicts relative errors in the weighted distance.The relative errors ofthe tangent directions ∆a1 and ∆a2 are both equal to tanφ, but the contribution to the weighteddistance marked in red is larger for the large circle. The error for the large circle ”weighs” more.

metric on Gr(n,T). There is a unique orthogonally invariant Riemannian metricon Gr(n,T) (see [Lei61]1) and the associated distance is given by

dist(W,W ′) =

√√√√ n∑i=1

θ2i .

In what follows Gr(n,T) will be fixed to have this orthogonally invariant Rie-mannian structure. Since, for all −π

2< θ < π

2we have sin(θ) ≤ θ, for all

W,W ′ ∈ Gr(n,T), we have

dist(W,W ′) ≥ distchordal(W,W′) (5.2.5)

The proof of Theorem 5.2.1 uses the following important result.

Proposition 5.2.4. The map

G : PS → Gr(n,T), [v1 ⊗ · · · ⊗ vp] 7→ Tv1⊗···⊗vpS

is a Riemannian immersion.

Remark 5.2.5. Note that G is not the Gauss map of PS , which is why results onthe classification of homothetical Gauss maps like [N90, Theorem 1] do not ap-ply here.1There is a unique orthogonally invariant metric on Gr(n,KN ) except when n = 2, N = 4. But

since we assume dimT > 4, this case does not appear here.

68

Page 83: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

5.2. Condition number theorem for the tensor rank decomposition

Remark 5.2.6. In [Zak93, Section 2] the map G is called the n-th Gauss map ofthe projective Segre variety PS (the classical Gauss map of PS would be the(n− 1)-th Gauss map in that reference).

Proposition 5.2.4 is at the heart of this section. We postpone its proof to theend of the next section, because we first derive from it a proof for Theorem 5.2.1.

Proof of Theorem 5.2.1. Since G is a Riemannian immersion the r-fold product

Gr : (PS )r → Gr(n,T)r, ([x1], . . . , [xr]) 7→ (Tx1S , . . . ,Tx1S )

is a Riemannian immersion, where Gr(n,T)r is endowed with the product metric(see (B.2.1)). We denote the associated distance on Gr(n,T)r by dist. Recall fromTheorem 3.2.2 the definition of ΣGr and note that, by construction, Gr(ΣP) ⊂ ΣGr.By Lemma B.2.4 (1) this implies that

distweighted(([x1], . . . , [xr]),ΣP) ≥ dist((Tx1S , . . . ,Tx1S ),ΣGr).

The inequality (5.2.5) also holds for the respective distances in Gr(n,T)r so that

distweighted(([x1], . . . , [xr]),ΣP) ≥ distchordal((Tx1S , . . . ,Tx1S ),ΣGr).

By Theorem 3.2.2 the latter is equal to κ(x1, . . . , xr)−1, which proves the assertion.

It remains to prove Proposition 5.2.4. Consider the commutative diagram:

P(Kn1)× · · · × P(Knp)σP //

GσP ))

PS

G

Gr(n,T)

(5.2.6)

To prove that G is a Riemannian immersion, we will actually prove that G := GσPis a Riemannian immersion. Since σP is a Riemannian isometry, this will yield theassertion on G.

Remark 5.2.7. If K = C, by [Zak93, Proposition 2.10], G : PS → Gr(n,T) is abirational isomorphism ( means Zariski closure).

5.2.1. Orthonormal frames

We need some auxiliary results about orthonormal frames.

Definition 5.2.8 (Orthonormal frames). An orthonormal frame in Kn is an or-dered orthonormal basis of Kn. We write orthonormal frames as ordered tu-ples (u(0), u(1), . . . , u(n)).

69

Page 84: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

5. Application: The tensor rank decomposition

The following proposition will be useful

Proposition 5.2.9. Let γ : [−ε, ε]→ S(Rn) be a curve with

a := γ(0), and•a :=

dtγ(0) 6= 0.

Then there exists a curve (u(1)(t), . . . , u(n−1)(t), a(t)) in the set of orthonormalframes with the following properties holding at t = 0:

1.•u

(1)= −

∥∥ •a∥∥ a

2. u(1) =∥∥ •a∥∥−1 •

a.

3. For all 1 ≤ j ≤ n− 1: 〈 •u(1), u(j)〉 = 0.

4. For all 2 ≤ j ≤ n− 1: 〈 •a, u(j)〉 = 0.

5. For all 2 ≤ j ≤ n− 1:•u

(j)= 0.

Here we have put u(j) = u(j)(0) and•u

(j):= du(j)(t)

dt(0), 1 ≤ j ≤ n− 1.

Proof. We construct u(1)(t), . . . , u(n−1)(t) explicitly. Since TaS(Rn) = a⊥, we have

〈a, •a〉 = 0. Put u(1) :=∥∥ •a∥∥−1 •

a. We can completea, u(1)

to an orthonormal

basisa, u(1), u(2), . . . , u(n−1)

of Kn. Let U denote a orthogonal transformation

that rotates a to u(1), u(1) to −a and leaves spanu(2), . . . , u(n−1)

fixed.

Put a(t) := γ(t), u(1)(t) := U a(t) and for 2 ≤ j ≤ n − 1 let u(j)(t) = u(j)

be constant (for a sketch of this construction see Figure 3.2.1). By construction,conditions 2., 4. and 5. hold. Moreover,

•u

(1)=

du(1)(t)

dt(0) = U

da(t)

dt(0) = U

•a =

∥∥ •a∥∥Uu(1) = −

∥∥ •a∥∥ a,

which implies 1. and 3. This finishes the proof.

Figure 5.2.2.: A sketch of the construction made in the proof of Proposition 5.2.9.

70

Page 85: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

5.2. Condition number theorem for the tensor rank decomposition

5.2.2. Proof of Proposition 5.2.4

We now prove Proposition 5.2.4. In what follows we abbreviate Pn−1 := P(Kn).Recall from (5.2.6) the diagram.

Pn1−1 × · · · × Pnp−1 σP //

GσP ((

PS

G

Gr(n,T)

Since, σP is a Riemannian isometry, by Lemma B.2.3 (1), it suffices to show thatG := G σP is a Riemannian immersion. Recall from [GKZ94, Chap. 3.1.] thePlucker embedding ι : Gr(n,T) → P(ΛnT) and consider the following diagram:

Pn1−1 × · · · × Pnp−1

G

((G:=ιG

P (ΛnT) Gr(n,T)ι

oo

The set P := ι(Gr(n,T)) ⊂ P (ΛnT) is a smooth variety and the Fubini-Study metric (see (B.3.1)) on P (ΛnT) makes P a Riemannian manifold. In[Bur17, Lemma 4.1] it is shown that for K = C the Plucker embedding is anisometry. By essentially the same arguments, one shows that this is also truefor K = R. Therefore, by Lemma B.2.3 (2), to prove that G is a Riemannianimmersion it suffices to show that G := ι G is a Riemannian immersion.

According to Definition B.2.2 we have to prove that for all a ∈ Pn1−1×· · ·×Pnp−1

and for all•a,

b ∈ Ta(Pn1−1 × · · · × Pnp−1) we have

〈 •a,•

b〉weighted = 〈DaG•a,DaG

b〉.

However, by Lemma B.2.5 it suffices to prove that∥∥ •a∥∥ =

∥∥DaG•a∥∥ . (5.2.7)

Let a := (a1, . . . , ap) ∈ Pn1−1 × · · · × Pnp−1 be fixed. For each i we also denoteby ai ∈ S(Kni) a representative for ai ∈ Pni−1. We have

DaG : Ta1Pn1−1 × · · · × Ta1Pnp−1 → TG(a)P (ΛnT).

In order to describe DaG we compute G(a). Let a⊥i be the orthogonal complementof ai in Kni ; i.e., a⊥i = x ∈ Kni | 〈x, ai〉 = 0 . By [BC13, Lemma 14.11] we canidentify a⊥i = TaiPni−1 and the inner product on TaiPni−1 is given by the standardinner product on a⊥i .

71

Page 86: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

5. Application: The tensor rank decomposition

Let u(1)i , . . . , u

(ni−1)i denote an orthonormal basis of the orthogonal comple-

ment a⊥i . By [Lan12, Section 4.6.2], a basis for TaS is given by

f ∪f(i,j) | 1 ≤ i ≤ p, 1 ≤ j ≤ ni

,

where

f := a1 ⊗ · · · ⊗ ap, (5.2.8)

f(i,j) := a1 ⊗ · · · ⊗ ai−1 ⊗ u(j)i ⊗ ai+1 ⊗ · · · ⊗ ap. (5.2.9)

If we let π denote the canonical projection π : ΛnT→ P (ΛnT), we therefore have

G(a) = G(a1, . . . , ap) = π(f ∧

( p∧i=1

ni−1∧j=1

f(i,j)

)); (5.2.10)

see [GKZ94, Chap. 3.1.C]. (In particular, the following arguments are independent

of the choice of the ni-frames (u(1)i , . . . , u

(ni−1)i , ai).)

From this point on the proof works as follows. We take the derivative of G(a)using the expression (5.2.10). Then we prove equality (5.2.7). To do the latter weidentify TG(a)P (ΛnT) with the orthogonal complement g⊥, where

g = g(a) := f ∧( p∧i=1

ni−1∧j=1

f(i,j)

);

However, we have to take into account that the inner product on g⊥ is then given interms of the standard inner product 〈 , 〉 on g⊥ ⊂ ΛnT as ‖g‖−1 〈 , 〉; see (B.3.1).Hence, we furthermore have to compute ‖g‖.

Among the three tasks the first we complete is to compute ‖g‖. Using thecomputation rules for inner products from Lemma 2.2.15 (1) we get

〈f, f〉 =

p∏i=1

〈ai, ai〉 = 1, (5.2.11)

〈f, f(i,j)〉 = 〈ai, u(j)i 〉∏k 6=i

〈ak, ak〉 = 0. (5.2.12)

〈f(i,j), f(k,`)〉 =

1, if (i, j) = (k, `)

0, else.(5.2.13)

Let ∗ denote conjugate transpose and I the identity matrix. Then the above canbe expressed as [f, f(1,1), . . . , f(p,np)]

∗[f, f(1,1), . . . , f(p,np)] = I. By Lemma 2.2.15 (2),

this gives ‖g‖2 = det(I) = 1.

72

Page 87: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

5.2. Condition number theorem for the tensor rank decomposition

Note that DaG•a = Dg(a)π Dag

•a. Thus, in order to show

∥∥DaG•a∥∥ =

∥∥ •a∥∥

weigthed

it suffices to show that

Dg(a)π Dag•a = Dag

•a and (5.2.14)∥∥Dag

•a∥∥ =

∥∥ •a∥∥

weighted. (5.2.15)

Remark 5.2.10. For (5.2.14) we have to show that Dag•a ∈ g⊥. If K = R, this

is is already given by ‖g‖ = 1. If K = C, however, from ‖g‖ = 1 we only getthat Dag ∈ g⊥ ⊕ Rig; see Lemma B.1.3.

From now on fix

•a := (

•a1, . . . ,

•ap) ∈ Ta

(Pn1−1 × · · · × Pnp−1

)= Ta1Pn1−1 × · · · × TapPnp−1.

The product rule of differentiation yields

Dag•a =

p∑r=1

dg

dar

•ar =

p∑r=1

Fr +

p∑r=1

p∑i=1

ni−1∑j=1

F(i,j),r, (5.2.16)

( ddar

denotes the partial derivative in direction ar) where

Fr :=df

dar

•ar ∧ f(1,1) ∧ f(1,2) ∧ . . . ∧ f(p,np−1), (5.2.17)

F(i,j),r := f ∧ f(1,1) ∧ . . . ∧ f(i,j−1) ∧df(i,j)

dar

•ar ∧ f(i,j+1) ∧ . . . ∧ f(p,np−1). (5.2.18)

To show (5.2.14) and (5.2.15) we will have to prove the following claim:

Claim. For all i we can choose the ni-frame (u(1)i , . . . , u

(ni−1)i , ai), such that

1. For all r: Fr = 0.

2. If i = r: F(i,j),r = 0.

3. For all (i, j) and r:⟨g, F(i,j),r

⟩= 0

4.⟨F(i,j),r, F(k,`),s

⟩=

∥∥ •ar∥∥2, if (i, j) = (k, `) and r = s 6= i,

0, else.

The claim together with (5.2.16) implies that

Dag•a =

p∑r=1

p∑i=1,i 6=r

ni−1∑j=1

F(i,j),r. (5.2.19)

By the third assertion of the claim and (5.2.19) we have 〈g,Dag•a〉 = 0, which is

equivalent to Dag•a ∈ g⊥ and hence Dg(a)π Dag

•a = Dag

•a verifying (5.2.14).

73

Page 88: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

5. Application: The tensor rank decomposition

From (5.2.19) we see that

∥∥Dag•a∥∥2

=

p∑r,s=1

p∑i=1,i 6=r

p∑k=1,k 6=s

ni−1∑j=1

nk−1∑`=1

⟨F(i,j),r, F(k,`),s

⟩The fourth assertion of the claim implies that

∥∥Dag•a∥∥2

=∑p

s=1 λs∥∥ •as∥∥2, where

λs =

p∑k=1k 6=s

(nk − 1) = −ns + 1− p+

p∑k=1

nk = n− ns,

the last equality by the formula for dim S from (2.2.9). Hence,

∥∥Dag•a∥∥2

=

p∑s=1

(n− ns)∥∥ •ar∥∥2

=∥∥ •a∥∥2

weighted,

which proves (5.2.15), and, consequently, proves Proposition 5.2.4.

Proof of the claim. Put•u

(j)i =

du(j)i

dai

•ai. According to Proposition 5.2.9, we can

choose the orthonormal ni-frame (u(1)i , . . . , u

(ni−1)i , ai) in Rni , such that

•u

(1)i = −

∥∥ •ai∥∥ ai. (5.2.20)

u(1)i =

∥∥ •ai∥∥−1 •

ai. (5.2.21)•u

(2)i = . . . =

•u

(ni)i = 0. (5.2.22)

For 1 ≤ j ≤ ni − 1 : 〈 •u(1)i , u

(j)i 〉 = 0. (5.2.23)

For 2 ≤ j ≤ ni − 1 : 〈 •ai, u(j)i 〉 = 0. (5.2.24)

By (5.2.8) we have

df

dar

•ar = a1 ⊗ · · · ⊗ ar−1 ⊗

•ar ⊗ ar+1 ⊗ · · · ⊗ ap, (5.2.25)

which by the previous choice of the orthonormal frames becomes

df

dar

•ar =

∥∥ •ar∥∥ a1 ⊗ · · · ⊗ ar−1 ⊗ u(1)

i ⊗ ar+1 ⊗ · · · ⊗ ap =∥∥ •ar∥∥ f(r,1).

We plug this into (5.2.17) to see that

Fr =∥∥ •ar∥∥ f(r,1) ∧ f(1,1) ∧ f(1,2) ∧ . . . ∧ f(p,np−1) = 0 (5.2.26)

proving the first assertion.

74

Page 89: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

5.2. Condition number theorem for the tensor rank decomposition

Furthermore, by (5.2.9), if r 6= i, we have

df(i,j)

dar

•ar = a1⊗· · ·⊗ ai−1⊗u(j)

i ⊗ ai+1⊗· · ·⊗ ar−1⊗•ar⊗ ar+1⊗· · ·⊗ ap (5.2.27)

and, if r = i,

df(i,j)

dai

•ai =

a1 ⊗ · · · ⊗ ai−1 ⊗

•u

(1)i ⊗ ai+1 ⊗ · · · ⊗ ap, if j = 1

0, if j > 1.(5.2.28)

The second equality shows that for r = i and j > 1, we have

F(i,j),i = f ∧ f(1,1) ∧ . . . ∧ f(i,j−1) ∧df(i,j)

dai

•ai ∧ f(i,j+1) ∧ . . . ∧ f(p,np−1) = 0.

Note that, by choice of the orthonormal frame, in particular by (5.2.20), for j = 1we have

df(i,1)

dai

•ai = −

∥∥ •ai∥∥ a1 ⊗ · · · ⊗ ai−1 ⊗ ai ⊗ ai+1 ⊗ · · · ⊗ ap = −

∥∥ •ai∥∥ f.

This shows that for i = r and j = 1 we have

F(i,1),i = −∥∥ •ai∥∥ f ∧ f(1,1) ∧ . . . ∧ f ∧ f(i,2) ∧ . . . ∧ f(p,np−1) = 0.

Summarizing, we have shown that F(i,j),i = 0 for all i and j proving the secondassertion of the claim.

We proceed by proving the third and fourth assertion of the claim. Using (5.2.8),(5.2.20)–(5.2.24) and (5.2.28) and that for 1 ≤ i ≤ p we have 〈 •ai, ai〉 = 0, we getthe following identities for inner products.⟨

f,df

dar

•ar

⟩= 0 (5.2.29)⟨

f(i,j),df(k,`)

dar

•ar

⟩= 0 (5.2.30)⟨

df

dar

•ar,

df

das

•as

⟩=

∥∥ •ar∥∥2, if r = s

0, else(5.2.31)

⟨df(i,j)

dar

•ar,

df(i,j)

das

•as

⟩=

∥∥ •ar∥∥2, if r = s 6= i

0, else(5.2.32)

(Some inner products are missing in this list, but in the following (5.2.29)–(5.2.32)suffice.)

Lemma 2.2.15 (2) yields a recipe to compute the inner products of the claim:

75

Page 90: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

5. Application: The tensor rank decomposition

We have to consider the matrices

B :=[f, f(1,1), . . . , f(p,np)

]and

B(i,j),r :=

[f, f(1,1), . . . ,

df(i,j)

dar

•ar, . . . , f(p,np)

].

In the following we call the column f(i,j) the (i, j)-th column. For the third assertionof the claim we have

〈g, F(i,j),r〉 = detB ∗B(i,j),r.

By (5.2.12), (5.2.13) and (5.2.30) The (i, j)-th row of B ∗B(i,j),rr is zero. Hence,we have 〈g, F(i,j),r〉 = 0 showing the third assertion

It remains to prove the fourth assertion of the claim. We have⟨F(i,j),r, F(k,`),s

⟩= detB ∗(i,j),rB(k,`),s.

We distinguish two cases. If (i, j) 6= (k, `), we use (5.2.12), (5.2.13) and (5.2.30)to deduce that that the (k, `)-th row of B ∗(i,j),rB(k,`),s is a zero row, which implies

that⟨F(i,j),r, F(k,`),s

⟩= 0. If (i, j) = (k, `) and r 6= i, we have

B ∗(i,j),rB(i,j),s =

〈f, f〉 · · · 〈f, df(i,j)

das

•as〉 . . . 0

.... . .

... 0

〈df(i,j)dar

•ar, f〉 · · · 〈

df(i,j)dar

•ar,

df(i,j)das

•as〉 · · · 0

......

. . . 00 · · · 0 · · · 1

Hence, by (5.2.32)

⟨F(i,j),r, F(i,j),s

⟩=

∥∥ •ar∥∥2, if r = s 6= i

0, else.

This proves the claim and finishes the proof.

76

Page 91: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

5.2. Condition number theorem for the tensor rank decomposition

This page is intentionally left blank.

77

Page 92: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen
Page 93: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

Part II.

Numerical analysis of solving foreigenpairs of tensors

79

Page 94: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen
Page 95: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

6. Eigenpairs of tensors

6.1. From rank-one approximation of symmetrictensors to eigenpairs

In the preceeding part we have investigated the behavior of the rank-r decom-position of a tensor under perturbation within the set of rank-r tensors σr(S )(or σr(V ) for symmetric tensors). We defined the condition number for the (JDP)and connected this condition number to the (JAP) in Chapter 4, were we gave analgorithm—called RGN method—that iteratively computes a (locally) best rank-rapproximation to a tensor.

One vast limitation of that algorithm, though, is that, in order to converge, thefirst iterate must itself be close to a rank-r tensor. When we find ourselves in asituation, where we know that the tensor A, whose rank decomposition we seek,is of a certain rank (due to some physical reasons, for instance) and the data Awe are given is just some natural perturbation of that tensor, this limitation isnot harmful. In this case A may serve as the first iterate for the RGN method tocompute the decomposition of A. In other situations, however, when we do nota priori know the rank of A, the RGN method is of no help. Consider, for instance,the case when A ∈ Sp(Kn) is symmetric. The dimension of σr(V ) is neglectablewhen compared to the dimension of the ambient space (K⊗p); see (2.2.9). Thus,when we do not know r and without any further information, there is no chanceto design a RGN method (which includes specifying the rank) with first iterate A.

Nevertheless, in the special case r = 1 and K = R there is further information.We describe this for symmetric tensors.

Remark 6.1.1. The following deduction will eventually lead to the definition ofeigenpairs of tensors. A similar deduction, but for general tensors, leads to thenotion of singular values of tensors; see [FMPS13,Lim06].

Recall from (2.2.10) the Veronese map ν : Rn → (Rn)⊗p, x 7→ x⊗· · ·⊗x =: x⊗p.The Veronese map parametrizes the Veronese variety as V = ν (Rn\ 0) and wemay use this parametrization to pose the following optimization problem for asymmetric tensor A ∈ Sp(Rn):

argminx∈Rn\0

1

2

∥∥A− x⊗p∥∥2. (6.1.1)

It is important that the parametrization via ν is a differentiable parametrization

81

Page 96: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

6. Eigenpairs of tensors

[Lan12, Section 4.3.4.]. This is what makes the case r = 1 differ from r > 1 as itenables us to solve (6.1.1) by taking derivatives. We have

1

2

∥∥A− x⊗p∥∥2=

1

2‖A‖2 − 〈A, x⊗p〉+

1

2

∥∥x⊗p∥∥2. (6.1.2)

Note that, because V is unbounded, critical points of (6.1.2) are always localminima. A local minimum of (6.1.1) is hence given by a solution of

Dx〈A, x⊗p〉 −1

2Dx ‖x‖2p = 0. (6.1.3)

(we have used Lemma 2.2.15 (1) to write ‖x⊗p‖2= ‖x‖2p.) Note that 〈A, x⊗p〉 is a

homogeneous polynomial of degree p in x. Letting A = (ai1,...,ip) we can write thepolynomial as

〈A, x⊗p〉 =∑

1≤i1,...,ip≤n

ai1,...,ip xi1 · . . . · xip .

In the DTI example in Section 1.3 of the introduction we denoted this polynomialby QA(x), and, by Lemma 1.4.1, we have

Dx〈A, x⊗p〉 = DxQA = p

1≤i1,...,ip−1≤nai1,...,ip−1,1 xi1 · . . . · xip−1

...∑1≤i1,...,ip−1≤n

ai1,...,ip−1,n xi1 · . . . · xip−1

T

.

As in the introduction we want to emphasize that the system on the right is definedfor general tensors A ∈ (Cn)⊗p—not only for real symmetric tensors. This leadsto the following definition.

Definition 6.1.2. Let A ∈ (Cn)⊗p be general. We denote

Axp−1 :=

1≤i1,...,ip−1≤nai1,...,ip−1,1 xi1 · . . . · xip−1

...∑1≤i1,...,ip−1≤n

ai1,...,ip−1,n xi1 · . . . · xip−1

.In the notation of Definition 6.1.2 equation (6.1.3) becomes

Axp−1 = ‖x‖2(p−1) x. (6.1.4)

Note that, though we defined the symbol Axp−1 for general tensors, the deductionof (6.1.4) was restricted to involve only real symmetric tensors. However, we

82

Page 97: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

6.1. From rank-one approximation of symmetric tensors to eigenpairs

are free to pose (6.1.4) for general tensors, as well. This motivates the followingdefinition, that we had informally already given in Section 1.4:

Definition 6.1.3. Let A ∈ (Cn)⊗p. If the pair (v, λ) ∈ (Cn\ 0)× C satisfies

Avp−1 = λv,

it is called an eigenpair of the tensor A. The vector v is called an eigenvector andthe number λ is called an eigenvalue of A.

Note that, if x satisfies (6.1.4), then it is easily seen that (x, ‖x‖2(p−1)) is aneigenpair of A. If, on the other hand, (v, λ) is an eigenpair of A, then x := tv

with tp = λ ‖v‖2(1−p) and t ∈ R satisfies (6.1.4).We have illustrated in the Higher Order Markov Chain-example from the in-

troduction that eigenpairs, although motivated with rank-one approximation ofreal symmetric tensors, find applications for general tensors. This is why we stateDefinition 6.1.3 in this generality. In what follows we will consider eigenpairs oftensors independent of rank-one approximation. Still, the reader should keep inmind the relation between them and that a lot of the following results can beapplied to rank-one approximation, too.

Remark 6.1.4. From another point of view the definition of eigenpairs of tensorsis as follows. Any tensor A = (ai1,...,ip) ∈ (Cn)⊗p can be seen as a multilinear mapA : Cn × · · · × Cn → C, defined by A(ei1 , . . . , eip) = ai1,...,ip , where e1, . . . , ep isthe standard basis in Cn. Then, the pair (v, λ) is an eigenpair of A, if and only ifA(v, . . . , v, ei) = vi, for all 1 ≤ i ≤ n.

Remark 6.1.5. For p = 2 (i.e., the matrix case), the definition of eigenpairs coin-cides with the well-known definition of eigenpairs from linear algebra. Attemptsto generalize this to p > 2 were motivated by tensor analysis [Qi05,Qi07], spectralhypergraph theory [LQY13] or optimization [Lim06]. An overview on publicationsin the subject can be found in [LNQ13], where the authors use the term ”spectraltheory of tensors”.

We have now transferred the problem of finding critical rank-one approximationsof a symmetric tensor to the problem of finding eigenvalues of that tensor. In fact,solving for eigenpairs of a tensor (symmetric or general) means solving a particularpolynomial system: For A ∈ (Cn)⊗p we have

(v, λ) is an eigenpair of A if and only if Avp−1 − λv = 0, (6.1.5)

and AXp−1 − λX is a system of n polynomials in n+ 1 variables.The overall goal of this part is to design and analyse an algorithm to solve

systems of the form Avp−1 − λv = 0. The algorithm we aim at is a homotopymethod; see, e.g., [BC13, Section 15.1]. Before we turn to this algorithm, however,we will first have to investigate the geometry of the eigenpair equation. Moreover,

83

Page 98: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

6. Eigenpairs of tensors

we argue why existing homotopy methods perform poorly when applied to theeigenpair problem. A of lot the arguments we give are on the average. Thismeans that we consider the expected performance of the algorithm when the tensorA ∈ (Cn)⊗p is complex Gaussian (see Definition 2.4.10). In the progress of thischapter we will give answers to questions Q5, Q8 and Q9 from the introduction.

6.2. The number of eigenpairs of a tensor

Recall from Definition 6.1.3 that the pair (v, λ) is an eigenvalue of the tensor A,if Avp−1 = λv. If we want to design an algorithm to solve this equation, we mustnow how many solutions possibly exist. A first oberservation is that, if (v, λ) is aneigenpair of A, then (tv, tp−2λ) is an eigenpair of A for all t ∈ C×. Thus, countingall eigenpairs of A is not meaningful. Following Cartwright and Sturmfels [CS13]we make the following definition.

Definition 6.2.1. Let A ∈ (Cn)⊗p and let (v, λ), (w, η) be eigenpairs of A. We callthe pairs equivalent, if there exists some t ∈ C× such that v = tw and λ = tp−2η.

The idea from Qi [Qi07] is to consider normalized eigenpairs from each equiva-lence class of eigenpairs.

Definition 6.2.2 ([Qi07, Section 3]). Let A ∈ (Cn)⊗p and (v, λ) ∈ (Cn\ 0)×Cbe an eigenpair of A. The number λ is called E-eigenvalue of A, if vTv = 1.

Note that we can normalize any eigenpair (v, λ) provided that vTv 6= 0, andthat vTv = 0 poses an algebraic equation on the eigenvector, which for a generictensor A is not satisfied. Moreover, observe that (tv)T (tv) = vTv if and only ift = ±1. Hence, each equivalence class of eigenpairs yields exactly one E-eigenvalue,if p is even, and exactly two E-eigenvalues, if p is odd.

The number of E-eigenvalues of a generic tensor is given as the degree of theE-characteristic polynomial.

Definition 6.2.3 ([Qi07, Section 4]). Let A ∈ (Cn)⊗p. The E-characteristic poly-nomial ΨA is defined as follows. If p = 2h is even, define ΨA(λ) to be the mul-tivariate resultant of G(x) = Axp−1 − λ (xTx)h−1 x. If p is odd, ΨA(λ) is themultivariate resultant of G(x0, x) = [Axp−1 − λxp−2

0 x, x20 − xTx].

The following is [Qi07, Theorem 4].

Theorem 6.2.4. If λ is an E-eigenvalue of A ∈ (Cn)⊗p, then it holds thatΨA(λ) = 0. If moreover [Axp−1 = 0, xTx = 0] has no solution, this ”if” isan ”if and only if”.

In [GQWW07] the degree of Ψ is computed for even p ≥ 4. The most thoroughresult, however, is the one from Cartwright and Sturmfels [CS13].

84

Page 99: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

6.3. Geometric framework I

Theorem 6.2.5 ([CS13, Theorem 1.2 and Theorem 5.5]). The number of equiv-alence classes for a generic tensors A ∈ (Cn)⊗p and the number of equivalenceclasses of eigenpairs of a generic symmetric tensor A ∈ Sp(Cn) both are equal to

n−1∑i=0

(p− 1)i =(p− 1)n − 1

p− 2.

Note that Theorem 6.2.5 answers question Q7 when counting complex solutions.In [ASS15] Abo, Seigal and Sturmfels specify the ”generic” from that theorem.

Proposition 6.2.6 ([ASS15, Section 4]). The set of tensors A ∈ (Cn)⊗p, whosenumber of eigenpairs is not equal to the count from Theorem 6.2.5, is an algebraicvariety defined by a single homogeneous polynomial of degree n(n − 1)(p − 1)n−1.This polynomial is called the eigendisriminant and the variety is called the eigendis-criminant variety.

6.3. Geometric framework I

Recall from (6.1.5) that we have identified eigenpairs a tensor A as the solutionsof the polynomial system AXp−1− `X = 0. In what follows it makes the formulaseasier to grasp, if we write d := p− 1 for the degrees.

6.3.1. Eigenpairs of homogeneous polynomial systems

Let us denote by Hn,d the vector space of homogeneous polynomials of degree d inthe variables X := (X1, . . . , Xn) over the complex numbers C of degree d = p− 1.We make the following important definition.

Definition 6.3.1. Let A ∈ (Cn)⊗p and put d := p− 1. We define

fA(X) := AXp−1 ∈ (Hn,d)n.

It is easy to see that (v, λ) is a an eigenpair of A, if and only if fA(v) = λv.Note that the map

f : (Cn)⊗p → (Hn,d)n, A 7→ fA(X)

is a surjective, but not an injective linear map. Henceforth, to avoid redundancywe make the following definition.

Definition 6.3.2. Let f ∈ (Hn,d)n. We call the pair (v, λ) ∈ (Cn\ 0) × C an

eigenpair of the homogeneous polynomial system, if f(v) = λv. Moreover, we calltwo eigenpairs (v, λ), (w, η) of f equivalent, if (v, λ) = (tw, td−1η) for some t ∈ C×.

85

Page 100: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

6. Eigenpairs of tensors

Furthermore, it will be convenient to work with the following notation.

Definition 6.3.3. Let ` be an auxiliary variable. We define the map

F : (Hn,d)n → (C[X, `])n , f(X) 7→ f(X)− `X.

We write Ff (X, `) := F (f)(X, `).

6.3.2. The Bombieri-Weyl basis

The Bombieri-Weyl basis is a basis for (Hn,d)n with preferable properties. If the

inner product on (Hn,d)n is defined to be the standard inner product with respect

to that basis, this inner product is invariant under unitary transformations.

Definition 6.3.4 (The Bombieri-Weyl basis and inner product). For all α ∈ Nn

with |α| := α1 + . . .+ αn = d we define

eα :=(dα

) 12Xα1

1 · . . . ·Xαnn , where

(dα

):=

d !

α1! · . . . · · ·αn!.

It is common to use the multiindex notation Xα := Xα11 · . . . · Xαn

n . The seteα | |α| = d is called the Bombieri-Weyl basis for Hn,d. We define an innerproduct on Hn,d via ⟨∑

α

aαeα,∑α

bαeα

⟩:=∑α

aαbα.

The inner product extends to (Hn,d)n in the following way. For all f = (f1, . . . , fn)

and g = (g1, . . . , gn) ∈ (Hn,d)n we define

〈f, g〉 :=n∑i=1

〈fi, gi〉.

Moreover, for f ∈ (Hn,d)n we set ‖f‖ :=

√〈f, f〉.

Remark 6.3.5. Suppose that f = (f1, . . . , fn) ∈ (Hn,d)n and fi =

∑α ai,αeα, 1 ≤

i ≤ n. Let k := dim(Hn,d)n and put A := (ai,α) ∈ Cn×k. Then ‖f‖ = ‖A‖F , where

‖ ‖F is the Frobenius norm (see Definition A.1.1).

In the following, (Hn,d)n will always be endowed with the Bombieri-Weyl inner

product. Moreover, it will be convenient to abbreviate

H := Hn,d,

as long as it does not cause confusion.

86

Page 101: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

6.3. Geometric framework I

Proposition 6.3.6 ([BC13, Prop. 16.16]). Let x ∈ (Cn\ 0). We define thesubspaces Z(x) := f ∈ H | f(x) = 0, R(x) := f ∈ H | f(x) = 0,Dxf = 0,L(x) := R(x)⊥ ∩ Z(x) and C(x) := Z(x)⊥. Then, H decomposes orthogonallyas

H = C(x)⊕ L(x)⊕R(x). (6.3.1)

Moreover, the map defined by x⊥ → L(x), a 7→√d〈X, x〉daTX, where x⊥ is the

orthogonal complement of x in Cn, is an isometry.

6.3.3. The solution manifold

Recall from Definition 6.3.3 that we have put

Ff (X, `) = f(X)− `X

and observe that this equation consists of two parts, one homogeneous of degree dand one homogeneous of degree 2. The derivative of Ff at (v, λ) has the followingmatrix representation:

D(v,λ)Ff =[Dvf − λIn, −v

], (6.3.2)

where In denotes the n×n-identity matrix. Next, we define the solution manifold(cf. [BC13, Sec. 16.2]).

Definition 6.3.7. The set

V := (f, v, λ) ∈ Hn × S(Cn)× C | Ff (v, λ) = 0 ,

is called the solution manifold and its subset

W :=

(f, v, λ) ∈ V | rk D(v,λ)Ff = n

is the manifold of well-posed triples.

The situation is illustrated by the following diagram, where π1, π2 denote theprojections on the respective factors.

Vπ2

%%

π1

~~Hn S(Cn)× C

(6.3.3)

Note that for all t ∈ C×, we have

(f, v, λ) ∈ V , if and only if (tf, v, tλ) ∈ V. (6.3.4)

87

Page 102: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

6. Eigenpairs of tensors

The tangent space to (f, v, λ) ∈ V is described in the following lemma.

Lemma 6.3.8. The following holds:

1. V is a connected and smooth submanifold of Hn×S(Cn)×C of dimension equalto dimR V = dimRHn + 1.

2. The tangent space of V at (f, v, λ)inV equals

T(f,v,λ)V =

(•

f,•w,

λ) ∈ Hn × TvS(Cn)× C |•

f(v) + D(v,λ)Ff (•w,

λ) = 0.

3. The tangent space of V at (f, v, λ) ∈ W is given by

T(f,v,λ)V =

(•

f,•v+riv,

λ) ∈ Hn×(v⊥⊕Riv)×C | ( •v,•

λ) = −D(v,λ)Ff |−1v⊥×C

f(v).

Proof. The map G : Hn × S(Cn)×C→ Cn, (f, v, λ) 7→ Ff (v, λ) has V as its fiberover 0. The derivative of G,

D(f,v,λ)G : Hn × TvS(Cn)× C→ Cn, (•

f,•v,

λ) 7→•

f(v) + D(v,λ)Ff (•v,

λ),

is surjective. Therefore 0 ∈ Cn is a regular value of G and Theorem A.9 in [BC13]implies the assertions 1. and 2.

For 3. let (f, v, λ) ∈ W and let (•

f,•w,

λ) ∈ T(f,v,λ)V . By 1. we have that•w ∈ TvS(Cn) and by Lemma B.1.3 we have TvS(Cn) = v⊥ ⊕ Riv. Hence,we can uniquely write

•w =

•v + riv for some

•v ∈ v⊥ and r ∈ R. Note that,

since rk D(v,λ)Ff = n, we have

ker D(v,λ)Ff = C(v, λ) = R(v, λ)⊕ Ri(v, λ).

Hence, (TvS(Cn) × C) ∩ ker D(v,λ)Ff = Ri(v, λ). In particular, D(v,λ)Ff |v⊥×C is

invertible and D(v,λ)Ff (•w,

λ) = D(v,λ)Ff (•v,

λ), which we can use to write T(f,v,λ)Vin the desired form.

We close this section by noting that the group U(n) of unitary linear transfor-mations Cn → Cn acts on Hn and V , respectively, via

U.f := U f U−1 and U.(f, v, λ) := (U.f, Uv, λ). (6.3.5)

Proposition 6.3.9. 1. The set W is invariant under the group action.

2. U(n) acts by isometries on Hn.

Proof. For the first assertion consider (compare (6.3.2))

D(Uv,λ)FU.f =[DUvU.f − λIn, −Uv

]= U D(v,λ)Ff

[U−1 0

0 1

],

88

Page 103: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

6.4. Gaussian tensors and Gaussian polynomial systems

which implies rk D(Uv,λ)FU.f = rk D(v,λ)Ff . The second claim can be found in, e.g.,[BC13, Theorem 16.3].

The following lemma will be useful later.

Lemma 6.3.10. Let e1 := (1, 0, . . . , 0) ∈ S(Rn) and (f, v, λ) ∈ V . We denote theevaluation map by evalv(f) := f(v). Then det(evalvevalTv ) = det(evale1evalTe1).

Proof. Let U be a unitary map with Uv = e1. We have evalv = U−1evale1 U . ByProposition 6.3.9 the map f 7→ U.f is an isometry, which implies the assertion.

6.3.4. The eigendiscriminant variety

We define the set of ill-posed triples.

Definition 6.3.11. The set Σ′ := V \W is called the set of ill-posed triples in V .The set of ill-posed problems is Σ := π1(Σ′) (compare (2.3.6)).

Note that we have (f, v, λ) ∈ Σ′ if and only if (v, λ) is not a simple root of thepolynomial Ff . Thus, f ∈ π1(Σ′) if and only if Ff has a double root or f hasinfinitely many roots. From this we see that Σ is in fact the eigendiscriminantfrom Proposition 6.2.6 in disguise. We put

D(n, d) :=n−1∑i=0

di. (6.3.6)

Then, Theorem 6.2.5 and Proposition 6.2.6 yield the following.

Proposition 6.3.12. Let f ∈ Hn.

1. If f 6∈ Σ, the number of distinct equivalence classes of f equals D(n, d).

2. The set Σ is a closed hypersurface of Hn of degree at most n(n− 1)dn−1.

6.4. Gaussian tensors and Gaussian polynomialsystems

Recall from Definition 2.4.10 the notion of complex and real Gaussian tensors. Inthis subsection we compute the distribution of fA (see Definition 6.3.1) for thoserandom tensor models. Futher, we compute the distribution of QA (the polynomialassiociated to A; see (2.2.3)) for a real symmetric Gaussian tensor A.

Definition 6.4.1. Let f ∈ Hn. We say that f is standard normal, if the coefficientsof f in the the Bombieri-Weyl basis (see Definition 6.3.4) are i.i.d NC(0, 1)-randomvariables. We write f ∼ NC(Hn). The density of f is given by π− dimHn exp(−‖f‖2).

89

Page 104: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

6. Eigenpairs of tensors

Definition 6.4.2. We say that f ∈ Hn is real standard normal, if the cofficientsof f in the Bombieri-Weyl basis are i.i.d. N(0, 1), random variables. We write

f ∼ N(Hn). The density of f is given by√

2π− dimHn

exp(−1

2‖f‖2).

If f ∼ NC(Hn), by Proposition 6.3.9, we have that for all unitary transforma-tions U ∈ U(n)

U.f ∼ f. (6.4.1)

The same holds true for f ∼ N(Hn) and orthgonal transformations. The followingrelates Gaussian tensors to Gaussian polynomial systems.

Lemma 6.4.3. Let d = p− 1.

1. Let A ∈ (Cn)⊗p be complex Gaussian. Then fA ∼ NC((Hn,d)n).

2. Let A ∈ (Rn)⊗p be real Gaussian. Then fA ∼ N((Hn,d)n).

3. Let A ∈ (Rn)⊗p be real symmetric Gaussian. Then QA ∼ N(Hn,p).

Proof. We first show 1. Write fA(X) = (f1(X), . . . , fn(X)). We have to show that

every coefficient of each fi in the Bombieri-Weyl basis eα =√(

)Xα is i.i.d.

NC(0, 1)-distributed. Fix 1 ≤ j ≤ n and suppose that

fj(X) =∑

α1+...+αn=d

λα

√(dα

)Xα. (6.4.2)

By definition of fA we have

fj(X) =∑

1≤i1,...,id≤n

ai1,...,id,j Xi1 . . . Xid . (6.4.3)

Comparing (6.4.2) and (6.4.3) reveals that for each α we have

λα

√(dα

)=

∑(i1,...,id):

Xα=Xi1 ...Xid

ai1,...,id,j. (6.4.4)

Note that all the λα, by construction, are independent. Applying the rule ofsummation of normal distributed random variables from Lemma 2.4.4 (2) to thisequation shows that

λα

√(dα

)∼ NC(0, σ2),

whereσ2 = # (i1, . . . , id) | Xα = Xi1 . . . Xid =

(dα

).

Hence, λα ∼ NC(0, 1), by Lemma 2.4.4 (1). The same proof can be applied toprove 2.

90

Page 105: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

6.5. Integration on the solution manifold

For 3. we only have one single equation, but we can nevertheless proceed until(6.4.4), so that

QA(X) =∑

α1+...+αn=p

λα

√(pα

)Xα,

where

λα

√(pα

)=

∑(i1,...,ip):

Xα=Xi1 ...Xip

ai1,...,ip . (6.4.5)

Now we have to take into account the dependencies among the variables of A. Notethat any group of dependent variables ai1,...,ip appears in only one of the summandsλα, which shows that the λα are independent. Lemma 2.4.4 (2) in combination

with (6.4.5) tells us that λα

√(pα

)∼ N(0,

(pα

)σ2), where the variance of ai1,...,ip is

σ2 =(pα

)−1. Hence, λα ∼ N(0, 1). This finishes the proof.

6.5. Integration on the solution manifold

Recall from (6.3.3) the projections π1, π2. For a measurable map Ξ : V → R weput

Ξ av(f) :=1

2πD(n, d)

∫(f,v,λ)∈π−1(f)

Ξ(f, v, λ) d(f, v, λ). (6.5.1)

The meaning of Ξ av is explained in the following lemma.

Lemma 6.5.1. Let ρ denote the density of the probability distribution that isobtained by choosing f ∼ NC(Hn) and then choosing (v, λ) uniformly at randomfrom the eigenpairs of f . Then

E(f,v,λ)∼ρ

Ξ(f, v, λ) = Ef∼NC(Hn)

Ξ av(f).

Proof. First we want to show that the procedure indeed defines a probabilitydistribution on V . According to Proposition 6.3.12, the fiber

π−11 (f) = (f, v, λ) ∈ Hn × S(Cn)× C | (f, v, λ) ∈ V

over f 6∈ Σ consists of D(n, d) disjoint circles, each of them having volume 2π.Hence the density of the uniform distribution on V (f) equals (2πD(n, d))−1. Theclaim follows then, since NC(Hn) is the forward distribution (see Definition 2.4.1)of the distribution given by ρ.

The following technical result will be needed in various occasions throughoutthe paper.

91

Page 106: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

6. Eigenpairs of tensors

Lemma 6.5.2. Suppose that Ξ : V → R is a measurable and unitarily invariantfunction; that is, for all U ∈ U(n) we have Ξ(U.(f, v, λ)) = Ξ(f, v, λ). Let In−1

denote the (n− 1)× (n− 1) identity matrix. Then

Ef∼NC(Hn)

Ξav(f) =vol S(Cn)

2πn+1 D(n, d)

∫λ∈C

E(λ)e−|λ|2

dλ,

where

E(λ) := EA,a,h

Ξ(f, e1, λ)∣∣∣det(

√dA− λIn−1)

∣∣∣2 ,with A ∼ NC(C(n−1)×(n−1)), a ∼ NC(Cn−1) and h ∼ NC(R(e1)n) (R(e1) beingdefined as in Proposition 6.3.6) are independent and f is given as

f = λXd1e1 +

√dXd−1

1

[aT

A

](X2, . . . , Xn)T + h.

Proof. LetD := D(n, d) =∑n−1

i=0 di. Using the formula from (B.4.2), Remark B.4.4

and the description of the tangent space of V from Lemma 6.3.8 we get

Ef∼NC(Hn)

Ξav(f) =

∫f∈Hn

Ξav(f)ϕHn(f)df =

∫(v,λ)∈S(Cn)×C

K(v, λ) d(v, λ), (6.5.2)

where

K(v, λ) :=1

2πD

∫f∈π1(π−1

2 (v,λ))

Ξ(f, v, λ) |det(φφ∗)| ϕHn(f) df,

with φ = D(v,λ)Ff∣∣−1

v⊥×C evalv. Let U be a unitary map with Uv = e1. Then, by

orthogonal invariance, (U(f U−1), e1, λ) ∈ V , and we have (compare the proof ofProposition 6.3.9)

D(v,λ)Ff =[Dvf − λI, −v

]= U

[De1f − λI, −e1

] [U−1 00 1

].

Using Lemma 6.3.10, this implies

det([

D(v,λ)Ff |−1v⊥×Revalv

][D(v,λ)Ff |−1

v⊥×Revalv]T )

= det(D(e1,λ)Ff |−1

e⊥1 ×RD(e1,λ)Ff |−Te⊥1 ×R

)det(evale1(f) evale1(f)T

)Write f ∈ π1(π−1

2 (e1, λ)) in the Bombieri-Weyl basis as f = cXd1 + terms in Xd−1

1

where c ∈ Rn. Then evale1(f) = c, which shows that evale1 is an orthogonal

92

Page 107: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

6.5. Integration on the solution manifold

projection, implying evale1(f) evale1(f)T = idRn . Hence,

K(v, λ) :=1

2πD

∫f∈π1(π−1

2 (v,λ))

Ξ(f, v, λ)∣∣ det(D(e1,λ)Ff

∣∣e⊥1 ×C

)∣∣2 ϕHn(f) df. (6.5.3)

Claim. For (v, λ) ∈ S(Cn)× C we have K(v, λ) = (2πn+1D)−1 E(λ) e−|λ|2

.

Observe that the proposed K(v, λ) is independent of v. When we plug into(6.5.2), we may therefore integrate over v ∈ S(Cn) to obtain

Ef∼NC(Hn)

Ξav(f) =vol S(Cn)

2πn+1 D

∫λ∈C

E(λ)e−|λ|2

dλ,

which shows the assertion.Proof of the claim. By assumption Ξ is unitarily invariant. This shows that

K(v, λ) = K(e1, λ). Let R := h ∈ Hn | h(e1) = 0,De1h = 0 be the space of sys-tems vanishing to higher order at e1. By Proposition 6.3.6, for any f ∈ π−1

2 (e1, λ),there exist uniquely determined h ∈ R and M ∈ Cn×(n−1) such that we can or-thogonally decompose f as

f = λXd1e1 +Xd−1

1

√dM(X2, . . . , Xn)T + h, (6.5.4)

This implies

ϕH(f) =1

(2π)dimHn exp(−‖f‖2)

= ϕ(λe1)ϕ(M)ϕ(h)

= π−n exp(− |λ|2)ϕ(M)ϕ(h)

Let a ∈ Cn−1 denote the first row of M and let A ∈ C(n−1)×(n−1) be thematrix obtained from M by removing a. Note that a ∼ NC(Cn−1) and thatA ∼ NC(C(n−1)×(n−1)). From (6.5.4) we get

De1f =

[dλ√d aT

0√dA

],

and therefore, by (6.3.2),

D(e1,λ)Ff =

[(d− 1)λ

√d aT −1

0√dA− λIn−1 0

]. (6.5.5)

93

Page 108: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

6. Eigenpairs of tensors

This shows that ∣∣∣det(

D(e1,λ)Ff∣∣e⊥1 ×C

)∣∣∣2 =∣∣∣det(

√dA− λIn−1)

∣∣∣2 . (6.5.6)

Plugging this into (6.5.3) we see that K(v, λ) equals

1

2πn+1D

∫λ

[E

A,a,hΞ(f, e1, λ)

∣∣∣det(√dA− λIn−1)

∣∣∣2] e−|λ|2dλ,where A ∼ NC(C(n−1)×(n−1)), a ∼ NC(Cn−1), h ∼ NC(Rn) and f is as in (6.5.4).Taking into account the definition of E(λ) shows the claim.

6.6. Eigenpairs and h-eigenpairs

In [CS13] it is mentioned that one can view equivalence classes of eigenpairs aselements in the weighted projective space P(1, . . . , 1, d − 1). Since we lack a thor-ough complex analysis of this space, we construct an algorithm that computesh-eigenpairs.

Definition 6.6.1. Write

P := P(Cn × C)\ [0 : . . . : 0 : 1]

for the punctured projective space. Let f ∈ Hn. The pair (v, η) ∈ P is called anh-eigenpair of f , if

f(v) = ηd−1v.

(h-eigenpair abbreviates ”homogeneous equation eigenpair”.)

Observe that the equation defining h-eigenpairs is the former eigenpair equation,which is homogenized by replacing λ by ηd−1. In accordance with Definition 6.3.3we denote

Ff (X, `) := f(X)− `d−1X. (6.6.1)

Notation 6.6.2. For (f, (v, η)) ∈ H × P we write (f, v, η) := (f, (v, η)).

Similar to Definition 6.3.7 we define

V := (f, v, η) ∈ Hn × P | Ff (v, η) = 0 , (6.6.2)

W :=

(f, v, η) ∈ V | rk D(v,η)Ff = n. (6.6.3)

We wish to compare the sets V and V and therefore we need an appropriate set ofrepresentatives for V . A common way of representing P as a subset of P(Cn×C) isto choose representatives in S(Cn+1). Here, however, we choose instead S(Cn)×Cas the set of representatives. This makes the comparison to V possible and reflects

94

Page 109: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

6.6. Eigenpairs and h-eigenpairs

the distinction between eigenvector and eigenvalue. Note this choice is well-defined,because [0 : . . . 0 : 1] 6∈ P . Henceforth, we define

V := (f, v, η) ∈ Hn × S(Cn)× C | Ff (v, η) = 0

W :=

(f, v, η) ∈ V | rk D(v,η)Ff = n,

Similar to (6.3.4) for t ∈ C× we have

(f, v, η) ∈ V , if and only if (td−1f, v, tη) ∈ V . (6.6.4)

The sets V and V are connected by the surjective differentiable map

ψ : V → V, (f, v, η) 7→ (f, v, ηd−1), (6.6.5)

The following lemma is straight-forward to prove.

Lemma 6.6.3. Let (f, v, λ) = ψ(f, v, η).

1. The derivative of ψ at (f, v, η) is given by

D(f,v,η)ψ (•

f,•v,

•η) = (

f,•v, (d− 1)ηd−2 •

η).

2. We have D(f,v,η)Ff = D(f,v,ηd−1)Ff D(f,v,η)ψ .

Combining Lemma 6.6.3 with (6.3.2) immediately yields the following.

D(v,η)Ff =[Dvf − ηd−1In, −(d− 1)ηd−2v

]. (6.6.6)

In the following let π1, π2 be the projections marked in the following diagram.

Vπ1

$$

π2

ψ

Hn S(Cn)× C

V

π2

::

π1

``

,

Similar to (6.3.6) we putD(n, d) := dn − 1 (6.6.7)

The last item that we have to define for h-eigenpairs is an analogue of the eigendis-criminant variety Σ from Definition 6.3.11. The variety

E := π1(V\W) ⊂ Hn

95

Page 110: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

6. Eigenpairs of tensors

is called extended eigendiscriminant variety for the following reason.

Proposition 6.6.4. 1. We have E = Σ ∪ f ∈ Hn | ∃v : f(v) = 0.2. The set E is a proper subvariety of H.

3. If f 6∈ E, f has D(n, d) many h-eigenpairs.

Proof. Use Lemma 6.6.3 and Proposition 6.3.12.

Lemma 6.6.3 makes it easy to prove the following.

Proposition 6.6.5. 1. The tangent space of V at (f, v, η) ∈ W equals(•

f,•v + riv,

•η) ∈ Hn × (v⊥ ⊕ Riv)× C | ( •v, •η) = −D(v,η)Ff |−1

v⊥×C

f(v).

2. The tangent space of V at (f, v, η) ∈ W equals(•

f,•v,

•η) ∈ Hn × (v, η)⊥ | ( •v, •η) = −D(v,η)Ff |−1

(v,η)⊥

f(v).

Proof. Let (f, v, λ) = ψ(f, v, η). Then we have

T(f,v,λ)V = D(f,v,η)ψT(f,v,η)V .

Note that, unless η = 0, by Lemma 6.6.3 (1) the linear map D(f,v,η)ψ is invertible.The first assertion then follows from combining Lemma 6.3.8 with Lemma 6.6.3 (2).

To show the second assertion let Π : V → V denote the canonical projection.Recall from Appendix B.3.1 that the tangent space T(v,η)P = T(v,η)P(Cn × C) canbe identified with the orthogonal complement (v, η)⊥. Hence, the derivative of Πis the linear map idHn × p, where

p : S(Cn)× C→ (v, η)⊥

is the orthogonal projection. Using the description of the tangent space in 1. showsthat the tangent space in 2. has the declared form.

96

Page 111: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

7. The general homotopy method isnot a good choice for theeigenpair problem

In Section 6.6 we reformulated the eigenpair problem as the h-eigenpair problem,which is the problem of solving a system of n homogeneous polynomials of degree din the n+ 1 variables X = (X1, . . . , Xn) and `; i.e.,

Ff (X, `) = f(X)− `d−1X = 0.

An algorithm to solve exactly those kind of polynomial systems is the adaptive lin-ear homotopy algorithm—abbreviated by ALH—introduced in [BC13, Section 17].

Algorithm ALH roughly works as follows. Let G ∈ (Hn+1,d)n be a system of

which a zero is known. If F ∈ (Hn+1,d)n is another system that one wants to

solve, one connects G with F by a continuous path. This path is discretized andthe zero of G is continued along that discretized path using Newton’s method (seeAlgorithm 1). The fineness of the discretization is determined by the conditionnumber for polynomial equation solving.

Algorithm ALH, however, is designed to solve general polynomial systems, notonly systems of the special form, which Ff (X, `) is of. It is clear that this algorithmmust ignore the geometric situation of the h-eigenpair problem and this is whatcauses algorithm ALH to perform poorly when applied to the eigenpair problem.After all, this observation motivates the upcoming chapter, where a homotopymethod specifically for the eigenpair problem is designed.

Let us be more precise. The complexity of algorithm ALH is measured interms of the number of iterations it takes in (Hn+1,d)

n to get from one polynomialsystem to another. The length of each iteration step is proportional to the inverseof the condition number for polynomial equation solving—the larger the conditionnumber, the smaller the step size; see [BC13, Algorithm 17.1]. The conditionnumber is in fact defined with respect to the Bombieri-Weyl product on (Hn+1,d)

n

(cf. [BC13, Equation (16.6)]).However, the inner product we choose for our structured set of polynomials

F((Hn,d)n) = Ff | f ∈ (Hn,d)

n ⊂ (Hn+1,d)n.

is the inner product induced by the Bombieri-Weyl product on (Hn,d)n; see Def-

97

Page 112: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

7. The general homotopy method is not a good choice for the eigenpair problem

F0

0

F(Hnn,d)

Hnn+1,d

α

β

Figure 7.0.1.: The plot shows a schematic picture of the affine space F((Hn,d)n) within theambient space (Hn+1,d)

n. Although the angle α between the two points in F((Hn,d)n) is smallthe corresponding angle β with respect to (Hn+1,d)

n can be large.

inition 6.3.4. Note that F((Hn,d)n) is an affine linear subspace within (Hn+1,d)

n.This leads to situations where small angles in F((Hn,d)

n) may lead to large an-gles in (Hn+1,d)

n, c.f. Figure 7.0.1. Although two systems f and g are close withrespect to the angular measure in F((Hn,d)

n), the angular measure in (Hn+1,d)n

detects them as being far from other. As a consequence, small perturbations in fcause large perturbations in the zeros of Ff , which causes the condition numberfor polynomial equation solving to be large and ALH to choose an unnecessarilysmall step size.

In Section 7.2 we make this explicit by showing the following. If f ∈ (Hn,d)n

is Gaussian and (v, η) is chosen uniformly from the D(n, d) = dn − 1 many h-eigenpairs of f , then the probability that the condition number for polynomial

equation solving at (Ff , (v, η)) is greater than d−1√

2d−3

is at least 36%.Nevertheless, small perturbations in f do not cause large perturbations in the

eigenvectors of f . In Section 8.1 we define a new condition number that is used inthe aforementioned homotopy method for the eigenpair problem.

7.1. Distribution of the eigenvalues of a complexGaussian tensor

The results in this section have been developed in close collaboration with PeterBurgisser1; see [BB16].

1http://www.math.tu-berlin.de/fachgebiete_ag_diskalg/fachgebiet_

algorithmische_algebra/v_menue/members/prof_dr_peter_buergisser/

98

Page 113: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

7.1. Distribution of the eigenvalues of a complex Gaussian tensor

In this section we compute the distribution of the eigenvalues of a complexGaussian tensor defined by the following algorithm.

1. Choose a Gaussian polynomial system f ∈ Hn (Proposition 6.3.12 implies thatf almost surely has D(n, d) many equivalence classes of eigenpairs).

2. Choose an eigenpair (v, λ) ∈ S(Cn)× C uniformly at random.

3. Apply the projection (v, λ) 7→ λ.

This distribution of λ is of interest, because algorithm ALH has trouble handlinglarge ratios |λ|‖v‖ . We want to know how likely it is that |λ| is large when compared

to ‖v‖. As before we abbreviate H := Hn,d.

Remark 7.1.1. The so obtained distribution for the eigenvalue λ is the push-forwarddistribution (see Definition 2.4.1) of the distribution from Lemma 6.5.1 under theprojection (f, v, λ) 7→ λ.

Remark 7.1.2. In [Gin65] Ginibre computes the distribution of an eigenvalue λ,that is chosen uniformly at random from the n eigenvalues of a complex Gaussianmatrix A. This section generalizes Ginibre’s results to tensors of order greaterthan 2; i.e., to polynomial systems of degree d > 1.

Let us denote by φn,d(λ) the density of the probability distribution that isderived from the above recipe. The unitary invariance of Gaussian polynomialsystems (see (6.4.1)) implies that φn,d(λ) only depends on |λ|, but not on theargument of λ. This is why we want to investigate

R := 2 |λ|2 .

For d = 1 we re-obtain the result from Ginibre, that the random variable R followsa distribution, that is mixed from χ2-distributions with weights from the uniformdistribution Unif(1, . . . , n) on n items. Furthermore, if d > 1, we will prove thatR follows a distributions that is mixed from χ2-distributions with weights fromthe geometric distribution Geo(p) truncated at n (see (7.1.3)). Here is the maintheorem. We will prove it at the end of the section.

Theorem 7.1.3. Assume that λ ∼ φn,d and let φn,dR denote the density of the realrandom variable R = 2|λ|2. Then we have

φn,dR (R) =

n∑k=1

ProbX∼Unif(1,...,n)

X = k χ22k(R). if d = 1.

n∑k=1

ProbX∼Geo(1− 1

d)X = k | X ≤ n χ2

2k(R), if d > 1.

Here χ22k(y) := (e−

y2 yk−1)/(2k(k−1)!) is the the density of a chi-square distributed

random variable with 2k degrees of freedom.

99

Page 114: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

7. The general homotopy method is not a good choice for the eigenpair problem

We can write the densities by using the formulas for the uniform and the trun-cated distribution as follows.

φn,dR (R) =

1n

n∑k=1

χ22k(R), if d = 1.

d−1dn−1

n∑k=1

dn−k χ22k(R), if d > 1.

In particular,

φn,dR (R) =1

D(n, d)

n∑k=1

dn−k χ22k(R),

for all d.

7.1.1. How to sample eigenvalues

As mentioned before the distribution of the eigenvalues φn,d is invariant underunitary transformations. Hence, to sample an eigenvalue λ one can sample theargument uniformly at random in [0, 2π) and then sample R = |λ|2 using thefollowing algorithm.

Proposition 7.1.4. The following algorithm samples from φn,dR .

1. If d = 1, choose k ∈ 1, . . . , n uniformly at random.

2. If d > 1, make Bernoulli trials with success probability 1 − 1d

until the firstsuccess. Let l be the number of the last trial and k the remainder of l whendivided by n.

3. Choose x1, . . . , x2kiid∼ N(0, 1

2).

4. Output: R := 22k∑i=1

x2i .

Proof. The only point that requires proof is that 2. indeed yields a Geo(1 − 1d)

distributed random variable. By definition of Geo(p)—e.g. [Chu78, Section 4.4]—ProbX∼Geo(p) X = k is the probability that the first success in a sequence ofindependent Bernoulli trials, each with success probability p, is achieved in thek-th trial. Moreover, for 1 ≤ k ≤ n and 0 ≤ q < 1 we have

∞∑t=0

ProbX∼Geo(1−q)

X = k + tn = qk−1(1− q)∞∑t=0

qtn

= ProbX∼Geo(1−q)

X = k | X ≤ n ;

for the last equality see (7.1.3) below.

100

Page 115: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

7.1. Distribution of the eigenvalues of a complex Gaussian tensor

1

2

3

4

5

6

1 2 3 4 5 6 7 8 9 10n

expe

ctat

ion

of |λ

|2

dd=1d=2d=3d=5

Figure 7.1.1.: The plot shows n 7→ Eλ∼φn,d |λ|2 for d ∈ 1, 2, 3, 5. The plot was created using R[R C15].

7.1.2. Expectation and asymptotics of the modulus of theeigenvalue

We compute the expectation of the random variable |λ|2. The proof of the followingproposition is postponed to the end of the section.

Proposition 7.1.5. 1. If d = 1, then Eλ∼φn,1|λ|2 = n+12. If d > 1, then

Eλ∼φn,d

|λ|2 =n− (n+ 1)d+ dn+1

(dn − 1)(d− 1).

2. We have limd→∞ Eλ∼φn,d|λ|2 = 1 and, if d > 1,

limn→∞

Eλ∼φn,d

|λ|2 =d

d− 1.

Moreover, for fixed n, the function d 7→ Eλ∼φn,d|λ|2 is strictly decreasing. Forfixed d, the function n 7→ Eλ∼φn,d|λ|2 is strictly increasing.

101

Page 116: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

7. The general homotopy method is not a good choice for the eigenpair problem

2

4

6

2.5 5.0 7.5 10.0

d

expe

ctat

ion

of |λ

|2

nn=2n=3n=5n=10

Figure 7.1.2.: The plot shows d 7→ Eλ∼φn,d |λ|2 for n ∈ 2, 3, 5, 10. The picture was created usingR [R C15].

In order to investigate φn,dR for large n, we normalize R = 2|λ|2 by dividing itby twice its expectation. So we put

τ :=|λ|2

2 Eλ∼φn,d

|λ|2. (7.1.1)

The reason for the factor 2 is that in [Gin65] Ginibre normalizes the density φn,1

by dividing it by n—which is asymptotically 2 times the expectation of |λ|2—andwe would like to compare the cases d > 1 and d = 1.

Making a change of variables from R to τ yields the normalized density, denotedby φn,dnorm. In [Gin65] Ginibre notes that in the case d = 1 we have

limn→∞

φn,1norm(τ) = 1[0,1](τ) :=

1, if 0 ≤ τ ≤ 1

0, else(7.1.2)

This means that the distribution of the normalized eigenvalue n−12 λ converges

towards the uniform distribution on the unit ball x ∈ C | |x| ≤ 1. For thecase d > 1 we have the following.

102

Page 117: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

7.1. Distribution of the eigenvalues of a complex Gaussian tensor

Theorem 7.1.6. Let d > 1 be fixed. For any τ ≥ 0 we have

limn→∞

φn,dnorm(τ) = 2e−2τ .

Hence, as n → ∞, the normalized density φn,dnorm(τ) converges towards the expo-nential distribution with parameter 2.

7.1.3. Proofs

Before we prove Theorem 7.1.3, Proposition 7.1.5 and Theorem 7.1.6, we will haveto consider the truncated geometric distribution.

Definition 7.1.7. The geometric distribution with parameter p truncated at n ≥ 1is defined to be the distribution of a geometrically distributed random variable X[Chu78, Section 4.4] with parameter p under the condition that X ≤ n. Its densityis given by

ProbX∼Geo(p)

X = k | X ≤ n =

ProbX∼Geo(p)

X = k

ProbX∼Geo(p)

X ≤ n=qk−1(1− q)

1− qn, (7.1.3)

where q := 1− p and k ∈ 1, . . . , n.

The expectation of the truncated geometric distribution is as follows.

Lemma 7.1.8. Let n ≥ 1 and 0 ≤ q < 1. Then

EX∼Geo(1−q)

[X | X ≤ n] =nqn+1 − (n+ 1)qn + 1

(1− qn)(1− q).

Proof. We have

ProbX∼Geo(1−q)

X = k | X ≤ n =qk−1(1− q)

1− qn, k ∈ 1, . . . , n .

This implies

EX∼Geo(1−q)

[X | X ≤ n] =1− q1− qn

n∑k=1

kqk−1.

Observe, that∑n

k=1 kqk−1 is the derivative of 1−xn+1

1−x at x = q and that

d

dx

(1− xn+1

1− x

)=nxn+1 − (n+ 1)xn + 1

(1− x)2

This shows the claim.

103

Page 118: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

7. The general homotopy method is not a good choice for the eigenpair problem

Proof of Theorem 7.1.3

We are now ready to prove Theorem 7.1.3. We first compute the density φn,d(λ).Recall from Definition 6.3.7 the solution manifold

V = (f, v, λ) ∈ Hn × S(Cn)× C | f(v) = λv

and from (6.3.3) the diagram

Vπ2

%%

π1

H S(Cn)× C

By Remark 7.1.1 the density φn,d is the push-forward density of the density ρ de-scribed in Lemma 6.5.1 under the projection (f, v, λ) 7→ λ. On the other hand, thedistribution NC(Hn) is given by the push-forward density of ρ under the projec-tion (f, v, λ) 7→ f . Fix a measurable subset S ⊂ C, denote by 1S its characteristicfunctions and let

Ξ(f, v, λ) :=

1, if 1S(λ) = 1

0, if 1S(λ) = 0.

Then, on the one hand, we have∫C

1S φn,d(λ)dλ =

∫V

Ξ(f, v, λ)ρ(f, v, λ)d(f, v, λ),

and on the other hand, we have∫V

Ξ(f, v, λ)ρ(f, v, λ)d(f, v, λ) =

∫Hn

Ξav(f)ϕHn(f)df,

where Ξav(f) is defined as in (6.5.1). Note that, by construction, Ξ is invariantthe under unitary transformation defined in (6.3.5). Lemma 6.5.2 reveals that∫

C1S φ

n,d(λ)dλ =vol S(Cn)

2πn+1D(n, d)

∫λ∈C

E(λ)e−|λ|2

dλ,

where

E(λ) := EA,a,h

Ξ(f, e1, λ)∣∣∣det(

√dA− λIn−1)

∣∣∣2 ,with independent A ∼ NC(C(n−1)×(n−1)), a ∼ NC(Cn−1), h ∼ NC(R(e1)n) and fbeing a function of λ,A, a and h (see the formula in Lemma 6.5.2 for details).

104

Page 119: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

7.1. Distribution of the eigenvalues of a complex Gaussian tensor

Following the discussion in Section 2.4 we have that

φn,d(λ) =vol S(Cn)

2πn+1D(n, d)e−|λ|

2

EA|det(

√dA− λIn−1)|2

almost everywhere (note that the random variables a and h from Lemma 6.5.2 donot appear in the last expectation, so we may omit them). We furthermore havevol S(Cn) = 2πnΓ(n)−1 and

EA|det(

√dA− λIn−1)|2 = dn−1 E

A|det(A− λ√

dIn−1)|2

= dn−1 (n− 1)!n−1∑k=0

1

k!

( |λ|2d

)k;

the last line by Proposition E.1.2. Using that that Γ(n) = (n − 1)! (C.2.4) Thisshows that

φn,d(λ) =dn−1 e−|λ|

2

πD(n, d)

n−1∑k=0

1

k!

( |λ|2d

)k;

Making a change of variables r := |λ|, we obtain the density

2dn−1 e−r2

D(n, d)

n−1∑k=0

1

k!

(r2

d

)kMaking another change of variables R := 2r2 finally yields the desired

φn,dR (R) =dn−1

2D(n, d)e−

R2

n−1∑k=0

1

k!

(R

2d

)k. (7.1.4)

If d = 1, we have D(n, d) = n, and hence (7.1.4) becomes

φn,1R (R) =1

n

n−1∑k=0

e−R2Rk

2k+1k!=

1

n

n∑k=1

e−R2Rk−1

2k(k − 1)!.

For any k we have that

χ22k(R) = e−

R2

Rk−1

2k(k − 1)!

is the density of a chi-square distributed random variable with 2k degrees of free-dom; see, e.g., [BC13, Section 2.2.3]. This proves the assertion in this case. If d > 1,put q := 1

d, such that

D(n, d) =1− qn

qn−1(1− q).

105

Page 120: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

7. The general homotopy method is not a good choice for the eigenpair problem

Then (7.1.4) becomes

φn,dR (R) =n−1∑k=0

e−R2 Rk

2k+1k!

(1− q)qk

1− qn=

n∑k=1

e−R2 Rk−1

2k(k − 1)!

(1− q)qk−1

1− qn.

Using that ProbX∼Geo(1−q) X = k | X ≤ n = qk−1(1−q)1−qn , see (7.1.3), finishes the

proof.

Proof of Proposition 7.1.5

To prove Proposition 7.1.5 we will need the following lemma.

Lemma 7.1.9. Let n ≥ 1.

1. If d = 1, then Eλ∼φn,1 |λ|2 = EX∼Unif(1,...,n)[X],

2. If d > 1, then Eλ∼φn,d|λ|2 = EX∼Geo(1− 1d

)[X | X ≤ n].

Proof. We prove the claim for d > 1 (the case d = 1 is proven similarly). FromTheorem 7.1.3 we get

2E |λ|2 = ER

=

∫ ∞R=0

Rφn,dR (R)dR

=n∑k=1

ProbX∼Geo(1− 1

d)X = k | X ≤ n

∫ n

R=0

Rχ22k(R)dR.

=n∑k=1

ProbX∼Geo(1− 1

d)X = k | X ≤ n 2k = 2 E

X∼Geo(1− 1d

)[X | X ≤ n],

where we have used that a χ22k-distributed random variable with 2k degrees of

freedom has the expectation 2k.

Proof of Proposition 7.1.5. If d = 1, we get Eλ∼φn,1|λ|2 = n+12

from Lemma 7.1.9.If d > 1, by Lemma 7.1.9, we have that

Eλ∼φn,d

|λ|2 = EX∼Geo(1− 1

d)[X | X ≤ n].

Therefore, Lemma 7.1.8 with q := 1d

implies

Eλ∼φn,d

|λ|2 =n− (n+ 1)d+ dn+1

(dn − 1)(d− 1)

106

Page 121: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

7.1. Distribution of the eigenvalues of a complex Gaussian tensor

as claimed. Using de l’Hopital’s rule twice, for fixed n we find

limd→1

n− (n+ 1)d+ dn+1

(dn − 1)(d− 1)=n+ 1

2

Therefore, the map

R≥1 → R, d 7→

n+1

2, if d = 1

n−(n+1)d+dn+1

(dn−1) (d−1), if d > 1

.

is continuous and differentiable on R>1. One checks that its derivative on R>1 isnegative. Hence, for fixed n, we see that d 7→ Eλ∼φn,d [|λ|2] is strictly decreasing.In the same way we can prove that, if d is fixed, n 7→ Eλ∼φn,d [|λ|2] is strictlyincreasing. Further,

limd→∞

n− (n+ 1)d+ dn+1

(dn − 1) d= lim

q→0

nqn+1 − (n+ 1)qn + 1

(1− qn) (1− q)= 1

If d > 1, we have

limn→∞

Eλ∼φn,d

[|λ|2] = limn→∞

nqn+1 − (n+ 1)qn + 1

(1− qn) (1− q)=

1

1− q,

where again q = 1d.

Proof of Theorem 7.1.6

Let d > 1. Recall from (7.1.1) that for λ ∼ φn,d have put

τ :=|λ|2

2 Eλ∼φn,d

|λ|2.

Let us write Rn,d := R = 2|λ|2 to emphasize that the distribution of R dependson n and d. Then, we have

τ =Rn,d

2ERn,d

.

Making a change of variables we see that the density of τ , that we denoted φn,dnorm,

is given by φn,dnorm(τ) = 2E[Rn,d]φn,dR (2τ ERn,d). Using (7.1.4) we get

φn,dnorm(τ) =dn−1

D(n, d)e−τ ERn,d

n−1∑k=0

1

k!

(τ ERn,d

d

)k.

107

Page 122: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

7. The general homotopy method is not a good choice for the eigenpair problem

Again putting q = 1d

we obtain

φn,dnorm(τ) =1

1− qne−τ ERn,d

n−1∑k=0

1

k!(qτ ERn,d)

k .

By Proposition 7.1.5 we have that ERn,d for n→∞ converges to 2dd−1

= 21−q and,

morever, that the sequence an := ERn,d is strictly increasing. By the monotoneconvergence theorem, we have

limn→∞

n−1∑k=0

1

k!(qτ ERn,d)

k = e2qτ1−q .

Furthermore, since 0 < q < 1, we have limn→∞ qn = 0. Summarizing, for fixed τ ,

the sequence bn := φn,dnorm(τ) is a product of three convergent sequences convergingto

limn→∞

φn,dnorm(τ) = e−2τ1−q e

2qτ1−q = 2e−2τ ,

which finishes the proof.

7.2. Solving for eigenpairs is ill-posed when themeasure is the classical condition number

Now we return to the discussion from the introduction to this chapter, where wehave announced the following result.

Proposition 7.2.1. Let f ∈ Hnn,d and (v, η) ∈ P be an h-eigenpair of f . Denote

by µ(Ff , (v, η)) the condition number for polynomial equation solving at (Ff , (v, η))(see [BC13, Equation (16.6)]). Suppose that f is Gaussian and (v, η) is chosenuniformly at random from the D(n, d) = dn−1 many h-eigenpairs of f in P. Then

Prob

µ(Ff , (v, η)) >

√2d−3

d

> 0.36.

In the words of Section 2.3, solving for eigenpairs is ill-posed when the problemis defined to be solving general polynomial equations. Note that the bound of 36%in the proof is obtained making very rough estimates, so the actual probabilityshould be even larger. We need the following auxiliary result.

Lemma 7.2.2. For (f, v, η) ∈ V we have

1 ≤ d(‖f‖+ max

|η|d−1 , |η|d−2

)∥∥∥(D(v,η)Ff∣∣(v,η)⊥

)−1∥∥∥ .

108

Page 123: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

7.2. Solving for eigenpairs is ill-posed when the measure is the classical conditionnumber

Proof. If (f, v, η) 6∈ W , then∥∥∥(D(v,η)Ff∣∣(v,η)⊥

)−1∥∥∥ =∞

and the claim is trivially true. Assume that If (f, v, η) ∈ W . From the submulti-plicativity of the spectral norm (see (A.1.1)) we get

1 ≤∥∥∥(D(v,η)Ff

∣∣(v,η)⊥

)−1∥∥∥∥∥D(v,η)Ff

∥∥ .Using (6.6.6) we see that∥∥D(v,η)Ff

∥∥ =∥∥[Dvf − ηd−1In, −(d− 1)ηd−2v

]∥∥≤ ‖Dvf ‖+ |η|d−1 + (d− 1) |η|d−2 ;

for the last line we have used the triangle inequality. From [BC13, Lemma 16.46]we get ‖Dvf ‖ ≤ d ‖f‖. Using that d ≥ 1 shows the assertion.

We can now prove Proposition 7.2.1.

Proof of Proposition 7.2.1. By [BC13, Proposition 16.10] we have

µ(Ff , (v, η)) = ‖Ff‖ ‖(v, η)‖d−1∥∥∥( D(v,η)Ff

∣∣(v,η)⊥

)−1∥∥∥ .

In Lemma 7.2.2 we have shown that∥∥∥( D(v,η)Ff∣∣z⊥t

)−1∥∥∥ ≥ 1

d(‖f‖+ max

|η|d−1 , |η|d−2

)so that

µ(Ff , (v, η)) ≥ ‖Ff‖ ‖(v, η)‖d−1

d(‖f‖+ max

|η|d−1 , |η|d−2

) . (7.2.1)

Let us write

‖Ff‖2 =∥∥f(X)− `d−1X

∥∥2= ‖f‖2 +

∥∥`d−1X∥∥2

= ‖f‖2 +n

d.

Moreover, since f(v) = ηd−1v we have

‖η‖2(d−1) ‖v‖2 = ‖f(v)‖2 ≤ ‖f‖2 ‖v‖2d ; (7.2.2)

the last equality by [BC13, Lemma 16.5]. Because ‖v‖ = 1, we therefore have

‖η‖d−1 ≤ ‖f‖ (compare the proof of Lemma 8.0.4, where we give a similar argu-

109

Page 124: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

7. The general homotopy method is not a good choice for the eigenpair problem

ment). This implies

‖f‖2 + nd(

‖f‖+ max|η|d−1 , |η|d−2

)2 ≥‖f‖2(

‖f‖+ max|η|d−1 , |η|d−2

)2 ≥1

4(7.2.3)

and hence‖Ff‖

d(‖f‖+ max

|η|d−1 , |η|d−2

) ≥ 1

2d.

Plugging into (7.2.1) yields

µ(Ff , (v, η)) ≥ ‖(v, η)‖d−1

2d. (7.2.4)

By assumption, (v, η) is chosen uniformly at random from the D(n, d) many h-eigenpairs of f . This implies that λ := ηd−1 is an eigenvalue of f having the densityφn,d as defined in Section 7.1. Therefore, we have

Prob|η|2 > 1

= Prob

|η|2(d−1) > 1

= Prob

|λ|2 > 1

=

d

dn − 1

n∑k=1

dn−k ProbX∼χ2

2k

X > 2 ,

the last line by Theorem 7.1.3. Using

ProbX∼χ2

2k

X > 2 ≥ ProbX∼χ2

2

X > 2 = e−1

we obtain Prob|η|2 ≥ 1

> e−1 and, consequently,

Prob‖(v, η)‖2(d−1) > 2d−1

= Prob

(1 + |η|2)d−1 > 2d−1

= Prob

|η|2 > 1

> e−1.

Using (7.2.4) this shows that

Probµ(Ff , (v, η)) ≥ d−1

√2d−3> e−1 ≈ 0.3678794. (7.2.5)

This finishes the proof.

110

Page 125: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve foreigenpairs of complex tensors

Now that in the preceeding chapter we have argued why there is need to design ahomotopy method specifically for the h-eigenpair problem. We will turn towardssuch an algorithm in this chapter. Theorem 8.0.1 below gives a complete answerto questions Q8 and Q9 from the introduction.

Previously, there have been proposed other algorithms, such as the PowerMethod [KM11], to compute eigenpairs of tensors. The advantage of the algo-rithm we present here is that its output are approximate zeros in the sense ofDefinition 2.3.2, making it highly stable and, secondly, that we can provide a com-plexity analysis. This analysis will be on the average, meaning that we analyzethe expected complexity of a randomized algorithm.

Note that our algorithm can take also real and real symmetric tensors as input.This in particular yields an algorithm to compute rank-one approximations ofreal symmetric tensors (compare the discussion in Section 6.1). Nevertheless, thecomplexity analysis we give is for complex general tensors.

The basis of this section is the recent paper [ABB+15], in which the authorsquote Demmel [Dem96, p. 22].

”So the problem of devising an algorithm [for the eigenvalue problem]that is numerically stable and globally (and quickly!) convergent re-mains open.”

The algorithm with which the problem is solved was given before by Armentanoin [Arm14]. In that paper a condition number for the matrix eigenpair problemis defined and, by using this definition, in [ABB+15] Armentano et al. provide asmoothed analysis for Armentano’s algorithm. We extend their methods to generaltensors A ∈ (Cn)⊗p, p > 2, solving Demmel’s problem also for higher degrees. Ofcourse, defining and analyzing condition numbers will be of great importance.

As before, we fix n and d and abbreviate H := Hn,d. For technical reasons wewill assume that n ≥ 2. Furthermore, recall from Definition 6.6.1 that we have put

P := P(Cn × C)\ [0 : 1] .

In [BCSS98, Sec. 14.1, Definition 1] one finds the notion of an approximatezero of a system F = (F1, . . . ,Fn) of n homogeneous polynomials of degree d

111

Page 126: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

in n+ 1 variables. This notion is a direct generalization from Definition 2.3.2 frompolynomials to polynomial systems. We transfer this concept to h-eigenpairs bysaying that (w, ξ) ∈ P is an approximate eigenpair of f , if (w, ξ) is an approximatezero of Ff and the associated zero (v, η) ∈ P is an h-eigenpair of f . In this case wecall (v, η) the associated eigenpair of (w, ξ) (see also Definition 8.2.1). Note thatthe only difference to approximate zeros is that we exclude points in P(Cn × C)having the trivial solution [0 : 1] as associated zero. Trivial, because it yields thetrivial and undesired solution f(0) = 1 · 0.

Our model of complexity counts arithmetic operations, where taking squareroots and drawing from a gaussian distribution are included. The main result,that we will prove at the end of Section 8.2, is as follows.

Theorem 8.0.1. There is a randomized algorithm that on input f ∈ Hn almostsurely returns an approximate eigenpair of f . Its average number of arithmeticoperations is O(n3 + dn

52N2), where N = dimCHn = n

(n+d−1n−1

).

Remark 8.0.2. In Remark 8.2.6 we discuss that the algorithm of Theorem 8.0.1not only approximates h-eigenpairs, but also approximates eigenvectors.

Notation 8.0.3. We denote h-eigenpairs with symbols (v, η) and approximate eigen-pairs with (w, ξ). Moreover, we will often use the same symbols for elements in Pand their representatives in (Cn\ 0)× C.

The main problem in our situation is that Newton’s method (Algorithm 1),which sits at the core of homotopy algorithms, does not distinguish between eigen-vectors and eigenvalues and simply sees eigenpairs as points in projective space.We need to make sure a priori that we do not get too close to [0 : 1] ∈ P(Cn×C).

The idea to circumvent this problem is restricting the input space. Insteadof having an algorithm that takes as input any f ∈ Hn, we only permit inputsfrom the unit sphere S(Hn) := f ∈ Hn | ‖f‖ = 1. Here, as before, ‖·‖ denotesthe Bombieri-Weyl norm from Definition 6.3.4. The following simple lemma is ofgreat importance for this section.

Lemma 8.0.4. Let f ∈ S(Hn) have the h-eigenpair (v, η) and let (v, η) ∈ Cn ×Calso denote a representative. Then |η| ≤ ‖v‖.

Proof. Suppose f = (f1, . . . , fn). For all i we have |fi(v)| = |η|d−1 |vi| and, by

[BC13, Lemma 16.5], |fi(v)| ≤ ‖fi‖·‖v‖d. Using |η|2(d−1) ‖v‖2 =∑n

i=1 |η|2(d−1) |vi|2,

this implies |η|2(d−1) ≤∑n

i=1 ‖fi‖2 ‖v‖2(d−1) = ‖v‖2(d−1) .

Lemma 8.0.4 shows that all h-eigenpairs of any f ∈ S(Hn) are sufficiently faraway from [0 : 1]. Moreover, if we approximate carefully, all approximations of allh-eigenpairs of such an f are sufficiently far away from [0 : 1]; cf. Figure 8.0.1. Thisrestriction enables us to adapt the ideas from [BC13,ABB+15,Arm14,BP11,BC11]to describe a homotopy method for the h-eigenpair problem.

112

Page 127: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.1. Condition of solving for h-eigenpairs

v

η ‖w‖ = ‖v‖ = 1︸ ︷︷ ︸

(v, η)

(w, ξ)[0 : 1]

Figure 8.0.1.: Schematic description of why one needs to bound the norm of f : If ‖f‖ = 1, all of

its h-eigenpairs are in the grey area. If (w, ξ) is an approximation of (v, η) then the ratio |ξ|‖w‖ is

bounded, so (w, ξ) doesn’t approximate the trivial solution [0 : 1].

8.1. Condition of solving for h-eigenpairs

The problem of solving for h-eigenpairs comes with two notions of condition, de-pending on how one defines the output. Recall that in Subsection 2.3.2 the def-inition of the condition number for an implicitly stated problem—such as theh-eigenpair problem—required the provision of an output space. But in our situa-tion there are two possible output spaces: P and S(Cn)×C. Defining the outputas elements in P respects the architecture of Newton’s method (because Newton’smethod moves points in projective space), while defining the output as elementsin S(Cn)× C respects the geometry of the h-eigenpair problem.

8.1.1. Two condition numbers

We derive the condition numbers for both outputs with the recipe that lead tothe definition of the condition number in (2.3.7). After all, both definitions ofcondition numbers are crucial in the sections to come. The situation appears asfollows.

Vπ1

π2

Vπ1

$$

π2

Hn P Hn S(Cn)× C

.

113

Page 128: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

Here, π1, π2, π1, π2 should all denote the respective projections. If D(f,v,η)π1 hasfull rank, there exists a local solution map S(f,v,η) := π2 π−1

1 . Note that D(f,v,η)π1

has full rank, if and only if (f, v, η) ∈ W as defined in (6.6.3). Then, the conditionnumber of solving for h-eigenpairs with output in P at (f, v, η) is defined as

κ(f, v, η) :=

∥∥DfS(f,v,η)

∥∥ , if (f, v, η) ∈ W .

∞, else.

Similarly, if (f, v, η) ∈ W , there exist a local solution map S(f,v,η) := π2 π−11 . The

condition number of solving for h-eigenpairs with output in S(Cn)× C at (f, v, η)is defined as

κ(f, v, η) :=

∥∥∥Df S(f,v,η)

∥∥∥ , if (f, v, η) ∈ W .

∞, else.

Recall from (6.6.1) the definition of the polynomial F(v, η). The following propo-sition characterizes the two condition numbers.

Proposition 8.1.1. We have

for (f, v, η) ∈ W : κ(f, v, η) =‖v‖d

‖v, η‖

∥∥∥(D(v,η)Ff∣∣(v,η)⊥

)−1∥∥∥ ,

for (f, v, η) ∈ W : κ(f, v, η) =∥∥∥( D(v,η)Ff

∣∣v⊥×C

)−1∥∥∥ .

Proof. Let (v, η) ∈ (Cn\ 0) × C denote a representative for (v, η) ∈ P . FromProposition 6.6.5 (2) we know that the tangent space of W at (f, v, η) equals

(•

f,•v,

•η) ∈ Hn × (v, η)⊥ | ( •v, •η) = −

(D(v,η)Ff |(v,η)⊥

)−1 •

f(v).

This shows that DfS(f,v,η) maps•

f to −(D(v,η)Ff |(v,η)⊥

)−1 •

f(v). By Lemma B.3.3

the norm on (v, η)⊥ is ‖w‖(v,η) = ‖w‖‖(v,η)‖ . Hence, we have

κ(f, v, η) =∥∥DfS(f,v,η)

∥∥ = ‖v, η‖−1 max‖•f‖=1

∥∥∥(D(v,η)Ff |(v,η)⊥)−1 •

f(v)∥∥∥ .

By [BC13, Lemma 16.6] the map Hn → Cn,•

f 7→•

f(v) maps the unit ball in Hn

onto the unit ball with radius ‖v‖d in Cn. This yields the first assertion. For thesecond assertion we proceed in the exact same way, but using the identification ofthe tangent space from Proposition 6.6.5 (1).

Since the algorithm we aim at computes approximate solutions rather thanactual solutions, we must establish a notion of condition number for all points

114

Page 129: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.1. Condition of solving for h-eigenpairs

Hn × P , detached from the necessity of being points in the solution manifold V .Thus, in what follows we will not work with the two condition numbers κ and κ,but use the following modified condition numbers. Their definition is inspired bythe relative condition number defined in [BC13, Eq. (16.6)]. We also take over thenotation µ from [BC13] for our definition.

Remark 8.1.2. In the following definition for (f, v, η) ∈ V we ”almost” define

µ(f, v, η) as κ(f, v, η). The only difference is the scaling with ‖v‖d−1 instead of

‖v‖d ‖(v, η)‖−1. The reason for our definition of µ(f, v, η) is that it is easier towork with when proving the Lipschitz estimates in Subsection 8.3.4.

Definition 8.1.3. For (f, v, η) ∈ Hn × P we define

µ(f, v, η) :=

‖v‖d−1

∥∥∥( D(v,η)Ff∣∣(v,η)⊥

)−1∥∥∥ , if D(v,η)Ff

∣∣(v,η)⊥

is invertible

∞, else.

For (f, v, η) ∈ Hn × S(Cn)× C we define

µ(f, v, η) :=

∥∥∥( D(v,η)Ff∣∣v⊥×C

)−1∥∥∥ , if D(v,η)Ff

∣∣v⊥×C is invertible

∞, else.

Remark 8.1.4. For all t ∈ C× we have µ(f, v, η) = µ(f, tv, tη). This shows that thedefinition of µ(f, v, η) is independent of the choice of a representative of (v, η) ∈ Pand hence it is well defined on Hn × P .

In the words of Section 2.3 µ(f, v, η) and µ(f, v, η) are absolute condition num-bers. It turns out that under the restriction f ∈ S(Hn) the condition numberµ(f, v, η) can be used to describe a homotopy method for the eigenpair problem,see Theorem 8.2.5. But µ(f, v, η) does not scale properly with f , see Remark 8.1.8for details (this is why we define µ(f, v, η) as an absolute condition number). Bycontrast, we can modify µ(f, v, η) by defining

µrel(f, v, η) :=∥∥∥[ ‖f‖In 0

0 ‖f‖d−2d−1

] (D(v,η)Ff

∣∣v⊥×C

)−1∥∥∥ , (8.1.1)

if D(v,η)Ff∣∣v⊥×C is invertible and µrel(f, v, η) :=∞, otherwise (here In denotes the

n× n identity matrix). In Lemma 8.1.7 (5) we show that µrel(f, w, ξ) is invariantunder scaling of f . We will make use of this property when giving a probabilisticanalysis for our algorithm in Proposition 8.3.1.

Summarizing, we need the condition number µ(f, v, η) to construct the algo-rithm and the condition number µ(f, v, η) to analyze this algorithm.

It is clear the interplay between the two condition numbers can only workproperly, if they are of the overall same magnitude and if this magnitude does not

115

Page 130: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

differ much from the magnitude of κ(f, v, η) and κ(f, v, η). This is why we needthe following proposition

Proposition 8.1.5. Let f ∈ S(Hn).

1. If (f, v, η) ∈ W, we have κ(f, v, η) ≤ µ(f, v, η) ≤√

2κ(f, v, η).

2. If (f, v, η) ∈ W, we have κ(f, v, η) = µ(f, v, η).

3. Let Π : S(Cn)× C→ P denote the canonical projection. Then

1√2µ(f, v, η) ≤ µ(f,Π(v, η)) ≤ µ(f, v, η).

The requirement that ‖f‖ is bounded is crucial in Proposition 8.1.5, because

only then the ratio ‖v‖‖η‖ is bounded. We discussed this already in the introduction—

Figure 8.0.1 and Lemma 8.0.4 are to be highlighted here—and one can say thatProposition 8.1.5 is the quantification of that discussion.

Proof of Proposition 8.1.5. The first claim follows from Lemma 8.0.4. The secondclaim is given by the respective definitions. It remains to prove the third assertion.Abusing notation we denote (v, η) := Π(v, η). We have

µ(f, v, η) =∥∥∥(D(v,η)Ff

∣∣(v,η)⊥

)−1∥∥∥ ≤ ∥∥∥D(v,η)Ff

∣∣−1

v⊥×C

∥∥∥ = µ(f, v, η),

the inequality by Lemma A.1.8(3). Note that we have v⊥ × C = (v, 0)⊥. Then

µ(f, v, η) =∥∥∥(D(v,η)Ff

∣∣v⊥×C

)−1∥∥∥ ≤ 1

cos δ

∥∥∥(D(v,η)Ff∣∣(v,η)⊥

)−1∥∥∥ =

µ(f, v, η)

cos δ,

by Lemma A.1.8, where

cos δ = cos dP((v, 0), (v, η)) =|〈(v, 0), (v, η)〉|‖v, 0‖ ‖v, η‖

=‖v‖2

‖v‖√‖v‖2 + |η|2

.

We have ‖v‖ = 1 and, since ‖f‖ = 1, by Lemma 8.0.4, we also have |η| ≤ 1. Thisshows cos δ ≥ 1√

2.

It should be clear at this point that bounding the norm ‖f‖ = 1 is crucial. Thisleads to the following definition.

Definition 8.1.6. We define

VS := (f, v, η) ∈ V | ‖f‖ = 1

and WS, VS and WS accordingly.

116

Page 131: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.1. Condition of solving for h-eigenpairs

8.1.2. Properties of the condition numbers

We finish this subsection by giving some useful properties of the condition numbersµ(f, v, η) and µ(f, v, η).

Lemma 8.1.7. 1. For (f, v, η) ∈ WS we have 1 ≤ 2dµ(f, v, η).

2. If t→ (ft, vt, ηt) is a curve in WS, then∥∥ •vt,

•ηt∥∥ ≤ µ(f, v, η)

∥∥∥ •

ft

∥∥∥.

3. Let U ∈ U(n) . For all (f, v, η) ∈ W we have µ(U.(f, v, η)) = µ(f, v, η) and for

all (f, v, η) ∈ W we have µ(U.(f, v, η)) = µ(f, v, η)

4. For all (f, v, η) ∈ V and all s ∈ C× we have µrel(sd−1f, v, sη) = µrel(f, v, η).

Proof. Throughout the proof we choose a representative (v, η) ∈ S(Cn)× C. Thefirst claim follows from Lemma 7.2.2 by using from Lemma 8.0.4 that we have|η| ≤ ‖v‖ = 1 for ‖f‖ = 1.

For 2. let S be the solution map from the beginning of Subsection 8.1.1. We take

derivatives on both sides of S(ft,vt,ηt)(ft) = (vt, ηt) to get DftS(ft,vt,ηt) (•

ft) = (•vt,

•ηt),

such that∥∥ •vt,

•ηt∥∥ =

∥∥∥DftS(ft,vt,ηt) (•

ft)∥∥∥ ≤ ∥∥DftS(ft,vt,ηt)

∥∥ ∥∥∥ •

ft

∥∥∥ = κ(ft, vt, ηt)∥∥∥ •

ft

∥∥∥ .Using Proposition 8.1.5 (1) shows 2.

A straightforward calculations shows the third assertion so that it remains toprove 4. First observe that µrel(f, v, η) < ∞ if and only if µrel(s

d−1f, v, sη) < ∞.From (6.6.6) we have

D(v,sη)Fsd−1f =[sd−1Dvf − sd−1ηd−1In, −(d− 1)sd−2ηd−2w

]= D(v,η)Ff

[sd−1In 0

0 sd−2

]. (8.1.2)

Using that

D(v,η)Ff[sd−1In 0

0 sd−2

]∣∣∣∣v⊥×C

= D(v,η)Ff∣∣v⊥×C

[sd−1In 0

0 sd−2

]∣∣∣∣v⊥×C

(8.1.3)

we obtain with (8.1.2)

µrel(sd−1f, v, sη) =

∥∥∥∥∥[‖s‖d−1 ‖f‖ In 0

0 ‖s‖d−2 ‖f‖d−2d−1

]D(ws,η)Fsd−1f

∣∣−1

v⊥×C

∥∥∥∥∥=

∥∥∥∥[‖f‖ In 0

0 ‖f‖d−2d−1

]D(v,η)Ff

∣∣−1

v⊥×C

∥∥∥∥= µrel(f, v, η).

117

Page 132: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

Remark 8.1.8. The proof above shows why the condition number µ(f, v, η) doesnot scale properly with f . In fact, we cannot adapt equation (8.1.3) to µ(f, v, η),because in general we have

D(v,η)Ff[sd−1In 0

0 sd−2

]∣∣∣∣(v,η)⊥

6= D(v,η)Ff∣∣(v,η)⊥

[sd−1In 0

0 sd−2

]∣∣∣∣(v,η)⊥

.

8.1.3. Lifting of paths

There is one subtlety that we have left out so far. We will explain this in a moment.The idea behind homotopy method algorithms is that when one connects poly-

nomial systems by a continuous path, almost always this path can be lifted to apath in the cartesian product polynomial systems × zeros. The tool to trackthat lifted path is Newton’s method.

For the h-eigenpair this appears as follows. We are given some f ∈ Hn andare provided with (g, v, η) ∈ V . To obtain an h-eigenpair of f—that is, a zero ofFf (X, `)—we connect f and g by a continuous path Ef,g. If Ef,g does not crossthe extended eigendiscriminant variety E (see Proposition 6.6.4), then the solutionmap described in Subsection 8.1.1 is defined everywhere on Ef,g. In this case thereexists a unique lifted path Lf,g ⊂ Hn × P(Cn × C) connecting (g, v, η) with some(f, w, ξ), where Ff (w, ξ) = 0. The subtely that we mentioned earlier is that (w, ξ)could be the trivial solution w = 0, which we would like to avoid. If the path Ef,gis contained in the unit sphere S(Hn), Lemma 8.0.4 can be used to conclude thatthis can not happen. We next show, that the same holds for general paths, as longas they don’t cut the extended eigendiscriminant variety E .

Proposition 8.1.9. Let f ∈ Hn, (g, v, η) ∈ V and Ef,g be a continuous pathconnecting f and g. If Ef,g ∩ E = ∅ and Lf,g is the lifted path as described above,

then Lf,g ⊂ V. The same holds when one replaces V by V.

Proof. We prove the assertion for V . The proof for V is similar. Since Ef,g ∩E = ∅each zero of Fg is connected to a zero of Ff and this assignment is bijective. Weshow that the trivial solution [0 : 1] remains fixed. Let h ∈ Ef,g. Using (6.3.2) weget D(0,1)Fh = [−In, 0], which is a matrix of full rank. Hence, in a neighborhoodaround (h, 0, 1) the projection (V ∪(f, 0, 1) | f ∈ Hn)→ Hn is invertible. Let Sdenote the composition of the local inverse with the projection onto P . DerivatingFf (S(h)) = 0 at h we obtain dFh

dh(0, 1) + [−In, 0]DhS = 0. Observe that we have

dFhdh

(0, 1) = 0. This shows that [−In, 0]DhS = 0. The linear map DhS is a mapfrom Hn → (0, 1)⊥. Since (0, 1)⊥ ∩ (0 × C) = 0, we get DhS = 0. Thisshows that S is constant on Ef,g, which implies that the trivial solution along Ef,gremains fixed.

118

Page 133: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.2. The adaptive linear homotopy method

8.2. The adaptive linear homotopy method

We have now introduced the condition numbers of the h-eigenpair problem. It istime to turn to the algorithm.

8.2.1. Newton method and approximate eigenpairs

Let f ∈ Hn. The Newton operator for the h-eigenpair problem with respect to fis the usual Newton operator associated to Ff : Cn × C→ Cn, that is

Newtonf : P → P

(v, η) 7→

(v, η)−

(D(v,η)Ff |(v,η)⊥

)−1Ff (v, η), if D(v,η)Ff |(v,η)⊥ invertible

(v, η), else.

It is an easy exercise to show that Newtonf is well defined rational map and thatthe image of Newtonf is contained in P . The following definition was alreadymade in the introduction to this chapter and is [BCSS98, Sec. 14.1, Definition 1]specialized to our scenario.

Definition 8.2.1. Let (f, v, η) ∈ V . We say that (w, ξ) ∈ P is an approximateeigenpair of f with associated h-eigenpair (v, η), if for all i ≥ 0, we have

∀i ≥ 0 : dP((wi, ξi), (v, η)) ≤ 1

22i−1dP((w, ξ), (v, η)),

where (w0, ξ0) = (w, ξ) and (wi, ξi) = Newtonf (wi−1, ξi−1), i ≥ 1.

The following theorem is [BC13, Theorem 16.38] specialized to our scenario.

Theorem 8.2.2. Let (f, v, η) ∈ W and (w, ξ) ∈ P be such that there exists some2π≤ r < 1 with

dP((v, η), (w, ξ)) ≤ δ(r) and dP((v, η), (w, ξ)) · γ(f, v, η) ≤ u(r),

where functions δ(r) and u(r) are defined as below and

γ(f, v, η) := ‖v, η‖ maxk≥2

∥∥∥∥ 1

k!

(D(v,η)Ff |(v,η)⊥

)−1Dk

(v,η)Ff∥∥∥∥ 1k−1

(the norm is the norm for multilinear operators restricted to the diagonal, as de-fined in (A.2.2)). Then (w, ξ) is an approximate eigenpair of (v, η).

In Theorem 8.2.2 the functions δ(r) and u(r) indicate the size of basins ofattraction of the Newton method. They are defined as in [BC13, p. 316] as

δ(r) := minδ>0sin δ = rδ and u(r) := min

u>0

2u = rΨδ(r)(u)

, (8.2.1)

119

Page 134: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

whereΨδ(u) := (1 + cos δ)(1− u)2 − 1, u, δ ∈ R (8.2.2)

Note that the γ from Theorem 8.2.2 is defined as in [BC13, Definition 16.35],the only difference being the naming γ(f, v, η), which in that reference would bedenoted γ(Ff , (v, η)).

It turns out that, if ‖f‖ = 1, then γ(f, v, η) can be bounded from above bythe condition number, that we introduced in Definition 8.1.3. The following is aversion of the higher derivative estimate, compare [BC13, Theorem 16.1]. Its proofis postponed to Subsection 8.3.3.

Theorem 8.2.3. For (f, v, η) ∈ VS we have

γ(f, v, η) ≤ µ(f, v, η) d2√

2n.

At this point it should not be surprising anymore that the requirement ‖f‖ = 1is crucial here.

8.2.2. The algorithm EALH

We can now state the algorithm EALH (eigenpair adaptive linear homotopy). Thisalgorithm lays the basis for two other subsequent algorithms that are randomizedversions of EALH. Because EALH needs an initial solution (g, v, η) as input andwe want to get rid of this requirement, we will later choose the initial solutionrandomly. Let

Θ(ε) := 1− (1− ε)−2 + cos( ε4)− ε. (8.2.3)

The algorithm EALH is given as follows.

Algorithm 3: Adaptive linear homotopy method for eigenpairs (EALH)

1 Input: f ∈ S(H)n and (g, v, η) ∈ VS, g 6= ±f .2 Output: An approximate eigenpair (w, ξ) ∈ P of f .3 Initialize: τ ← 0;4 Initialize: q ← g, (w, ξ)← (v, η);5 Initialize: ε← 0.04, χ← 2Θ(ε)− 1, α← dS(f, g);6 while τ < 1 do7 ∆ τ ← ε(1− ε)4χΘ(ε)/ [4 d2

√n µ2(q, w, ξ) α];

8 τ ← min 1, τ + ∆ τ;9 q ← point on the geodesic path in S(Hn) from f to g with dS(q, g)← τα;

10 (w, ξ)← Newtonq(w, ξ);

11 end12 Postcondition The algorithm halts, if Ef,g does not intersect E ;

120

Page 135: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.2. The adaptive linear homotopy method

Remark 8.2.4. Proposition 6.6.4 tells us that E is of real codimension at least 2.This shows that for almost all f and (g, v, η) the postconditions of are fulfilled.

We denote by K(f, (g, v, η)) the number of iterations that EALH performs oninputs f ∈ Hn, (g, v, η) ∈ VS. The proof of the following analysis is postponed toSubsection 8.4.1.

Theorem 8.2.5 (Analysis of EALH). If Ef,g ∩ E = ∅, then algorithm EALHterminates, the output (w, ξ) is an approximate eigenpair of f and we have

K(f, (g, v, η)) ≤ 246 d2√n dS(f, g)

∫ 1

0

µ(qτ , vτ , ητ )2dτ,

where (qτ , vτ , ητ ) | 0 ≤ τ ≤ 1 denotes the lifting of Ef,g at (g, v, η) (see Subsec-tion 8.1.3).

In the following we denote

C(f, (g, v, η)) :=

∫ 1

0

µ(qτ , vτ , ητ )2dτ, (8.2.4)

where (qτ , vτ , ητ ) is defined as in Theorem 8.2.5.

Remark 8.2.6. Algorithm EALH not only finds approximations of h-eigenpairs,but also approximations of eigenvectors. To see this, let f ∈ S(Hn) be the inputof algorithm EALH and suppose that we have found (w, ξ) ∈ P(Cn × C) with

δ := dP((v, η), (w, ξ)) <π

16

for some h-eigenpair (v, η) of f . Let (v, η), (w, ξ) ∈ S(Cn) × C also denote repre-sentatives with δ = dS((v, η), (w, ξ)). We have |η| ≤ 1 by Lemma 8.0.4. Proposi-

tion 8.3.4 below implies δ ≥√

8−1 ‖v − w, η − ξ‖ ≥

√8−1 ‖v − w‖ . Since v, w are

points in S(Cn), we have ‖v − w‖ ≥ sin dS(v, w) ≥ sin dP(v, w). Using that for all0 ≤ φ ≤ π

2we have sinφ ≥ 2

πφ, we obtain

dP((v, η), (w, ξ)) = δ ≥ 2

π√

8dP(v, w).

In other words, if (w, ξ) approximates (v, η), then w approximates v.

8.2.3. The algorithm LVEALH

We now want to modify the algorithm EALH, such that the initial system togetherwith one of its h-eigenpairs is chosen randomly. For this we need to define aprobability distribution on VS. We proceed similar as in Section 7.1:

121

Page 136: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

1. Choose g ∈ S(Hn) uniformly at random.

2. Choose (v, η) uniformly from the D(n, d) many eigenpairs of g.

This distribution is similar to the distribution with density ρ from Lemma 6.5.1,the only differences being that one chooses h-eigenpairs instead of eigenpairs andf ∼ Unif S(Hn) instead of being Gaussian (Unif S(Hn) denotes the uniform dis-tribution on S(Hn)). There is no harm in denoting the present distribution by ρ,too. Similar to (6.5.1) we then put

Ξav(f) =1

2πD(n, d)

∫(v,η)∈S(Cn)×C:

f(v)=ηd−1v

Ξ(f, v, η) d(v, η) (8.2.5)

for a measurable function Ξ(f, v, η) on V . Using the same arguments as we usedto prove Lemma 6.5.1 we get

E(f,v,η)∼ρ

Ξ(f, v, η) = Ef∼Unif S(Hn)

Ξav(f). (8.2.6)

The algorithm Las Vegas EALH (LVEALH) is as follows.

Algorithm 4: Las Vegas EALH (LVEALH)

1 Input: f ∈ S(Cn).2 Output: An approximate eigenpair (w, ξ) ∈ P of f .3 Choose (g, v, η) ∈ V with density ρ;4 Start algorithm EALH with inputs (g, v, η) and f .;5 Set (w, ξ)← output of EALH.;6 Postcondition The algorithm halts, if Ef,g does not intersect E ;

A little care should be taken here. We have not given an algorithm to drawfrom ρ. And, unfortunately, we are not able to provide such an algorithm. Thereason why we still describe LVEALH is that we can provide an average analysisfor it and this will be relevant for the analysis of the subsequent true randomizedalgorithm LVEALHWS, see Proposition 8.2.9.

The proof of the following theorem is postponed to Subsection 8.4.2.

Theorem 8.2.7 (Average analysis of LVEALH). We have

Ef∼Unif S(Hn)

E(g,v,η)∼ρ

C(f, (g, v, η)) <40π nN

d

where C is as in (8.2.4).

122

Page 137: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.2. The adaptive linear homotopy method

8.2.4. The algorithm LVEALHWS

We now describe an algorithm that defines another probability density functionon VS, which we denote by ρ∗. The idea for this algorithm is an adaption of themethods in [ABB+15, Sec. 10.1] and roughly works as follows. Beltran and Pardo[BP11] gave an algorithm to sample a system of n− 1 polynomials of degree d inn variables f and a zero uniform at random from the dn−1 many zeros of f. I.e.;Beltran-Pardo randomization yields f ∼ NC(Hn−1) and v ∈ P(Cn) with f(v) = 0.The idea is to sample one further single polynomial f0 ∈ H and some η withf0(v) = ηd−1, so that, by construction, we get[

f0(v)f(v)

]=

[ηd−1

0

]= ηd−1e1.

If we sample U ∈ U(n) under the constraint that Ue1 = v, where e1 := (1, 0, . . . , 0)and put f := U [f0, f], we have

f(v) := U

[f0(v)f(v)

]= ηd−1Ue1 = ηd−1v

as desired. The discussion in Chapter 7 made us realize how one must sample theeigenvalue η: First, draw r > 0 with density e−r and draw φ ∈ [0, 2π) uniform

at random. Then, put η := r1

2(d−1) exp(iφ). We give a precise description of thedensity of η in Lemma 8.4.3 below. The explicit algorithm is as follows.

Algorithm 5: Draw-from-ρ∗.

1 Input: -.2 Output: (f, v, η) ∈ V .3 Draw f ∼ NC(Hn−1) and v ∈ P(Cn) from the dn−1 many zeros of f;4 Draw a ∼ NC(v⊥) and h ∼ NC(R(v)), where R(v) is as in Proposition 6.3.6;5 Draw U ∈ U(n), such that Ue1 = v, uniformly at random;6 Draw r with density e−r 1r≥0(r) and choose φ ∈ [0, 2π) uniformly;

7 Put η := r1

2(d−1) exp(iφ);8 Define I|v⊥ as the restriction of the identity to v⊥;

9 if∥∥Dvf + ηd−1 I|v⊥

∥∥F≥ ‖Dvf ‖F ;

10 then11 Delete η, φ and R and go to 3;12 else

13 Put f0 := ηd−1 〈X, v〉d +√d 〈X, v〉d−1aTX + h;

14 end

15 Put f := U ·[f0, f

]T;

16 Return (f, v, η) ∈ V ;

123

Page 138: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

The advantage of ρ∗ when compared to ρ is that we can implement drawingfrom ρ∗ efficiently using a random number generator that can draw from the stan-dard normal distribution NC(0, 1). The proof of the following is postponed toSubsection 8.4.3.

Proposition 8.2.8. We assume to have a random number generator that candraw von NC(0, 1) in O(1). We count draws like arithmetic operations. Thenthe algorithm Draw-from-ρ∗ can be implemented such that its expected number ofarithmetic operations is O(n3 + dnN), where N := dimC(Hn).

Generating a starting system for the algorithm EALH with Draw-from-ρ∗ yieldsthe algorithm we call LVEALH with sampling, or LVEALHWS.

Algorithm 6: Las Vegas EALH with sampling (LVEALHWS).

1 Input: f ∈ S(Cn).2 Output: An approximate eigenpair (w, ξ) ∈ P of f .3 Run Draw-from-ρ∗;4 Set (g, v, η)← output of Draw-from-ρ∗;

5 Start algorithm EALH with inputs (‖g‖−1 g, v, ‖g‖−1d−1 η) and f ;

6 Set (w, ξ)← output of EALH;7 Postcondition The algorithm halts, if Ef,g does not intersect E . In this case

(w, ξ) is an approximate eigenpair of f .

We can use the average analysis of algorithm LVEALH given in Theorem 8.2.7to analyze LVEALHWS. The proof of Proposition 8.2.9 below is postponed toSubsection 8.4.3.

Proposition 8.2.9. We have

Ef∼Unif S(Hn)

(g,v,η)∼ρ∗

C(‖g‖−1 g, v, ‖g‖−1d−1 η) ≤ 10

√π n E

f∼Unif S(Hn)(g,v,η)∼ρ

C(f, (g, v, η)).

Now we have gathered all the ingredients to prove Theorem 8.0.1.

Proof Theorem 8.0.1. We prove that algorithm LVEALHWS has the propertiesstated. First, note that from Remark 8.2.4 we get that algorithm LVEALHWSterminates almost surely. Proposition 8.2.8 states that expected number of arith-metic operations of algorithm Draw-from-ρ∗ is O(n3 + dnN).

The cost of every iteration in algorithm EALH is dominated by the costs ofevaluating Newtonf (v, η), which by [BC13, Proposition 16.32] isO(N). Combining

124

Page 139: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.3. Auxiliary results

Proposition 8.2.9 with Theorem 8.2.7 we see that

Ef∼Unif S(Hn)

E(g,v,η)∼ρ∗

C(‖g‖−1 g, v, ‖g‖−1d−1 η) ≤ 400

√π

3n2N

d.

From Theorem 8.2.5 we get that the expected number of iterations algorithmEALH is at most

Ef∼Unif S(Hn)

E(g,v,η)∼ρ∗

K(‖g‖−1 g, v, ‖g‖−1d−1 η)

≤ 246 d2√n dS(f, g) E

f∼Unif S(Hn)E

(g,v,η)∼ρ∗C(‖g‖−1 g, v, ‖g‖

−1d−1 η)

≤ 98400d√π

3n

52 N dS(f, g)

Using that dS(f, g) ≤ π2

we see that the expected number of iterations of EALH

is O(dn52N) and hence the expected number of arithmetic operations of algorithm

EALH is O(dn52N2). Altogether we have shown that the expected number of

arithmetic operations of LVEALHWS is O(n3 + dn52N2).

8.3. Auxiliary results

8.3.1. The condition number has small expectation

This section is crucial for the proof of Theorem 8.2.7 presented in Subsection 8.4.2.Theorem 8.2.5 shows that the average number of steps of algorithm LVEALH is

closely connected to the expectation of the condition number µ(f, v, η). By Propo-sition 8.1.5 the expectation of µ(f, v, η) is bounded by the expectation of µ(f, v, η).We prefer working with µ(f, v, η) because of the possibility of making a relativecondition number out of it(see (8.1.1)). Let f ∈ Hn. We write

(µ2)av(f) :=1

2πD(n, d)

∫(v,η)∈S(Cn)×C:

f(v)=ηd−1v

µ(f, v, η)2d(v, η) (8.3.1)

as in (8.2.5). By (8.2.6), we have

E(f,v,η)∼ρ

µ(f, v, η) = Ef∼Unif S(Hn)

(µ2)av(f).

Proposition 8.3.1. We have that

Ef∼Unif S(Hn)

(µ2)av(f) < 80nNd−1.

125

Page 140: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

The first task on the way to prove Proposition 8.3.1 is to use [BC13, Cor. 2.23]to replace Unif S(Hn) by the standard normal distribution NC(Hn) (see Defini-tion 6.4.1). The advantage of the latter is the independence of the coefficientsof f ∼ NC(Hn). Recall from (8.1.1) that we have put

µrel(f, v, η) :=∥∥∥[ ‖f‖In 0

0 ‖f‖d−2d−1

] (D(v,η)Ff

∣∣v⊥×C

)−1∥∥∥ , (8.3.2)

if D(v,η)Ff∣∣v⊥×C is invertible and µrel(f, v, η) := ∞, otherwise. By construction,

for ‖f‖ = 1 the relative and the absolute condition number coincide. This implies

Eq∼Unif S(Hn)

(µ2)av(q) = Eq∼Unif S(Hn)

(µ2rel)av(q) (8.3.3)

For general f we have the following inequality.

Lemma 8.3.2. For all f ∈ Hn we have

(µ2rel)av(f) ≤ max

‖f‖2 , ‖f‖

2(d−2)d−1

(µ2

rel)av(f).

Proof. Use the description (8.3.2) the submultiplicativity of the spectral norm(A.1.1).

From Lemma 8.1.7 (4) we see that (µ2rel)av(f) is scale invariant. We further get

from Lemma 8.1.7(3), that (µ2rel)av(f) is unitarily invariant. By [BC13, Cor. 2.23]

we can writeE

q∼Unif S(Hn)(µ2

rel)av(q) = Eq∼N<

√2N

C (Hn)

(µ2rel)av(q), (8.3.4)

where N<√

2NC (Hn) is the truncated normal distribution [BC13, eq. (17.25)]. Using

Lemma 8.3.2 we deduce that

Eq∼N√2N (Hn)

(µ2rel)av(q) ≤ 2N E

q∼N<√2N

C (Hn)

(µ2)av(q) ≤ 4N Eq∼NC(Hn)

(µ2)av(q); (8.3.5)

the second inequality by [BC13, Lemma 17.25]. Proposition 8.3.1 is now obtainedby combining (8.3.3), (8.3.4) and (8.3.5) with the following result.

Proposition 8.3.3. We have

Eq∼NC(Hn)

(µ2)av(q) ≤ 20nd−1.

Proof. In the following we ease notation by writing

µ2av(q) := (µ2)av(q)

for the average condition number.

126

Page 141: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.3. Auxiliary results

Recall from (6.6.5) the map ψ : V → V, (f, v, η) 7→ (f, v, ηd−1). Combining thiswith Lemma 6.5.2 we obtain

Eq∼NC(Hn)

µ2av(q) =

vol S(Cn)

2πn+1D(n, d)

∫η∈C

(d− 1)2 E(η) |η|2(d−2) e−|η|2(d−1)

dη, (8.3.6)

where

E(η) := E(ηd−1) = EA,a,h

µ(q, e1, η)2∣∣∣det(

√dA− ηd−1In−1)

∣∣∣2 (8.3.7)

and where A ∼ NC(C(n−1)×(n−1)), a ∼ NC(Cn−1), h ∼ NC(R(e1)n) are independent,and where

q = ηd−1Xd1e1 +

√dXd−1

1

[aT

A

](X2, . . . , Xn)T + h.

Note that, by (6.6.6),

D(e1,η)Fq =[De1q − ηd−1In, −(d− 1)ηd−2v

]=

[(d− 2)ηd−1 aT −(d− 1)ηd−2

0 A− ηd−1In−1 0

],

so that we have

µ(q, e1, η) =

∥∥∥∥∥[(d− 2)ηd−1

√daT −(d− 1)ηd−2

0√dA− ηd−1In−1 0

]∣∣∣∣−1

e⊥1 ×C

∥∥∥∥∥2

; (8.3.8)

see Definition 8.1.3.

Claim. We have

E(η)(d− 1)2 |η|2(d−2) ≤ dn−1n(n− 1)!(1 + (d− 1) |η|2(d−2) ) n−1∑

k=0

1

k!

( |η|d−1

d

)2k

.

Suppose that the claim is true. Then E µ2av(q) is less than or equal to

(n− 1)!ndn−1vol S(Cn)

2πn+1D

∫η∈C

(1 + (d− 1) |η|2(d−2))e−|η|2(d−1)

n−1∑k=0

|η|2(d−1)k

k!dkdη,

where D := D(n, d) = dn − 1. Observe that the integrand is independent of theargument of η. Substituting r := |η|, so that dη = r dr dθ, shows

Eqµ2

av(q) ≤ (n− 1)!ndn−1volS(Cn)

πnD

∫r≥0

(r + (d− 1)r2(d−2)+1)e−r2(d−1)

n−1∑k=0

r2(d−1)k

k!dkdr.

127

Page 142: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

Making the change of variables t := r2(d−1), interchanging summation and integra-tion and using the fact that vol S(Cn) = 2πn/(n− 1)! yields

Eq∼NC(0, 1

2I)µ2

av(q) ≤ ndn−1

D(d− 1)

n−1∑k=0

∫t≥0

(t

1d−1−1 + (d− 1)

) tke−tk!dk

dt

=ndn−1

D(d− 1)

n−1∑k=0

Γ(k + 1

d−1

)+ (d− 1) Γ

(k + 1

)dkk!

=ndn−1

D(d− 1)

(n−1∑k=0

Γ(k + 1

d−1

)dkk!

+ (d− 1)n−1∑k=0

1

dk

)(8.3.9)

For 0 < ε < 1 and k ≥ 2 we have Γ(k + ε) ≤ Γ(k + 1) = k! and hence

n−1∑k=0

Γ(k + 1

d−1

)dkk!

≤ Γ

(1

d− 1

)+

Γ(1 + 1

d−1

)d

+n∑k=2

1

dk

≤ Γ

(1

d− 1

)+

Γ(1 + 1

d−1

)d

+n∑k=2

1

2k, (because d ≥ 2)

≤ Γ

(1

d− 1

)+

Γ(1 + 1

d−1

)d

+1

2

≤ (d− 1)

√π

2+

√π

2d+

1

2≤ 3d.; (8.3.10)

the last inequality by Lemma C.2.1. Using that d ≥ 2 we also have

(d− 1)n−1∑k=0

1

dk≤ d

n−1∑k=0

1

2k≤ 2d.

Combining this with (8.3.10) and plugging into (8.3.9) we obtain

Eq∼NC(0, 1

2I)µ2

av(q) ≤ 5ndn

D(d− 1)= 5n

dn

(dn − 1)(d− 1)

d≥2

≤ 20n

d.

This shows the assertion of Proposition 8.3.3. It remains to prove the claim.Proof of the claim. The following proof follows ideas that we have found in

[ABB+15, Sec. 7]. Recall from (8.3.7) and (8.3.8) that E(η) is equal to

EA,a,h

∥∥∥∥∥[(d− 2)ηd−1

√daT −(d− 1)ηd−2

0√dA− ηd−1In−1 0

]∣∣∣∣−1

e⊥1 ×C

∥∥∥∥∥2 ∣∣∣det(

√dA− ηd−1I)

∣∣∣2

128

Page 143: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.3. Auxiliary results

Let us put b :=√da and B :=

√dA− ηd−1I and

S :=

[ √daT −(d− 1)ηd−2

√dA− ηd−1I 0

]=

[bT −(d− 1)ηd−2

B 0

]. (8.3.11)

Note that det(S) = −(d− 1)ηd−2 det(B), so that

E(η) (d− 1)2 |η|2(d−2) = EA,a,h

∥∥S−1∥∥2 |detS|2 .

We use from Lemma A.1.2 (3) that ‖S−1‖2 ≤ ‖S−1‖2F , where ‖·‖2

F is the Frobeniusnorm. Moreover, S is independent of h so we may omit it in the expectation. Thisyields

E(η) (d− 1)2 |η|2(d−2) ≤ EA,a

∥∥S−1∥∥2

F|detS|2 . (8.3.12)

We enumerate the entries of S with indices 0 ≤ i, j ≤ n − 1, the entries of B areenumerated with indices 1 ≤ i, j ≤ n− 1. Let Si,j ∈ C(n−1)×(n−1), 0 ≤ i, j ≤ n− 1,denote the matrix that is obtained by removing from S the i-th row and the j-thcolumn and define Bi,j, 1 ≤ i, j ≤ n − 1, correspondingly. By Cramer’s rule wehave ∥∥S−1

∥∥2

F|detS|2 =

∑0≤i,j≤n−1

∣∣detSi,j∣∣2 .

We deduce from (8.3.11) that

∣∣detSi,j∣∣2 =

0, if i = 0, j < n− 1

|detB|2 , if i = 0, j = n− 1

(d− 1)2η2(d−2) |detBi,j|2 , if i > 0, j < n− 1

|detB(i : b)|2 , if i > 0, j = n− 1

, (8.3.13)

where B(i : b) is the matrix that is obtained from B by replacing the i-th row ofB with b. We have Var(Bi,j) = d Var(Ai,j) = d for 1 ≤ i, j ≤ n− 1, so that

d

n−1∑j=1

E∣∣detBi,j

∣∣2 =n−1∑j=1

Var(Bi,j)E∣∣detBi,j

∣∣2 .From Proposition E.0.1 we get for fixed 1 ≤ i ≤ n− 1

n−1∑j=1

Var(Bi,j)E∣∣detBi,j

∣∣2 ≤ E |detB|2 . (8.3.14)

We moreover have E b = 0 and for 1 ≤ j ≤ n − 1 we have Var(bj) = d. Using

129

Page 144: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

Proposition E.0.1 again, we obtain for 1 ≤ i ≤ n− 1

E |detB(i : b)|2 = E |detB(i : 0)|2 +n−1∑j=1

Var(bj)E∣∣detBi,j

∣∣2= d

n−1∑j=1

E∣∣detBi,j

∣∣2 (8.3.15)

≤ E |detB|2 , (8.3.16)

the last inequality by (8.3.14). Combining (8.3.13) with (8.3.14) and (8.3.15) weget

E∥∥S−1

∥∥2

F|detS|2

= E |detB|2 + (d− 1)2 |η|2(d−2)∑

1≤i,j<n−1

E∣∣detBi,j

∣∣2 +n−1∑i=1

E |detB(i : b)|2

≤ E |detB|2 +(n− 1)(d− 1)2 |η|2(d−2)

dE |detB|2 + (n− 1)E |detB|2

≤ n(1 + (d− 1)η2(d−2)

)E |detB|2 . (8.3.17)

We have that

E |detB|2 = E∣∣∣det(

√dA− ηd−1In−1)

∣∣∣2= dn−1 E

∣∣∣∣det

(A− ηd−1

√dIn−1

)∣∣∣∣2 (8.3.18)

and by Proposition E.1.2 we have

E∣∣∣∣det

(A− ηd−1

√dIn−1

)∣∣∣∣2 = (n− 1)!n−1∑k=0

1

k!

(|η|d−1

d

)2k

. (8.3.19)

Combining (8.3.19) with (8.3.18) and (8.3.12) shows the claim.

8.3.2. Trigonometry

The following proposition will be useful later. Basically, it says that for pairs ofpoints (v1, η2), (v2, η2) in the grey area of Figure 8.0.1 that are close to each otherthe points v1, v2 are also close to each other.

130

Page 145: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.3. Auxiliary results

Proposition 8.3.4. Let m = m1 + m2, and x := (x1, x2) ∈ (Rm1\ 0) × Rm2,such that ‖x2‖ ≤ ‖x1‖. Let y := (y1, y2) ∈ (Rm1 ×Rm2)\ 0 and put δ := dS(x, y)

1. If δ < π/4, then y1 6= 0.

2. Suppose that δ < π16

and x1, y1 ∈ S(Rm1). Then ‖x− y‖ ≤ δ√

8.

Proof. Let us first prove the first part. Without restriction we may assume that‖x‖ = ‖y‖ = 1. Consider the map

Ψ : S(Rm1)× S(Rm2)× [0, π/2]→ S(Rm1+m2), (u,w, φ) 7→ (cos(φ)u, sin(φ)v)

One easily shows that Ψ is surjective. Suppose that we have Ψ(ux, wx, φx) = xand Ψ(uy, wy, φy) = y. Then,

cos(δ) = cos(φx) cos(φy)〈ux, uy〉+ sin(φx) sin(φy)〈wx, wy〉≤ cos(φx) cos(φy) + sin(φx) sin(φy) = cos(φx − φy)

and hence δ ≥ |φx − φy|. From our assumption ‖x2‖ ≤ ‖x1‖ follows 0 ≤ φx ≤ π/4,which implies 0 ≤ φy ≤ δ + φx < π/2. This implies ‖y1‖ = cos(φy) 6= 0. The firstassertion follows from this.

We now prove the second assertion. We can not without restrictions assumethat ‖x‖ = ‖y‖ = 1. Therefore we write Ψ(ux, wx, φx) = ‖x‖−1 x and we writeΨ(uy, wy, φy) = ‖y‖−1 y. By assumption, we have cos(φx) 6= 0 and, by the firstpart, we have cos(φy) 6= 0. Since, by assumption, ‖x1‖ = ‖y1‖ = 1, we have that

‖x‖2 = 1 + tan2(φx), and ‖y‖2 = 1 + tan2(φy). The law of cosines yields

‖x− y‖2 = ‖x‖2 + ‖y‖2 − 2 ‖x‖ ‖y‖ cos(δ).

From this we obtain

‖x− y‖2 = 2 + tan2 φx + tan2 φy − 2√

(1 + tan2 φx)(1 + tan2 φy) cos(δ).

Writing φy = φx+δ we obtain ‖x− y‖2 as a function of φx and δ, which we denoteby

L : [0,π

4]× [− π

16,π

16]→ R≥0, (φx, δ) 7→ ‖x− y‖2 .

A computer based calculation reveals that dLdφx

> 0. This implies

‖x− y‖2 ≤ L(π/4, δ) = 3 + tan(π/4 + δ)2 − 2√

2(1 + tan(π/4 + δ)2) cos(δ).

Another computer based calculation shows L(π4, 0) = 0 and that 0 is a global

minimum of 8δ2 − L(π4, δ).

131

Page 146: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

8.3.3. Higher derivative estimate

In this section we prove Theorem 8.2.3. We will need the following lemma.

Lemma 8.3.5. For d ≥ 2 and 2 ≤ k ≤ d we have that

(d− 1)!

k!(d− k)!≤(d

2

)k−1

.

Proof. Combine k! ≥ 2k−1 with (d−1)!(d−k)!

≤ dk−1.

The following is similar to the proof of [BC13, Theorem 16.1].

Proof of Theorem 8.2.3. We choose a representative (v, η) ∈ S(Cn) × C, so that,

by assumption, (f, v, η) ∈ VS. From Lemma 8.0.4 we get ‖v, η‖ ≤√

2. Moreover,for any 2 ≤ k ≤ d, using the submultiplicativity of the spectral norm (A.2.4), wehave∥∥∥∥ 1

k!

(D(v,η)Ff

∣∣(v,η)⊥

)−1 ·Dk(v,η)Ff

∥∥∥∥ ≤ 1

k!

∥∥∥(D(v,η)Ff∣∣(v,η)⊥

)−1∥∥∥∥∥Dk

(v,η)Ff∥∥

=1

k!µ(f, v, η)

∥∥Dk(v,η)Ff

∥∥ . (8.3.20)

By the triangle inequality we have∥∥Dk(v,η)Ff

∥∥ ≤ ∥∥Dkvf∥∥+

∥∥Dk(v,η)`

d−1X∥∥ . (8.3.21)

Using ‖f‖ = ‖v‖ = 1 we get from [BC13, Prop. 16.48]∥∥Dkvf∥∥ ≤ d!

(d− k)!(8.3.22)

One easily sees that

∥∥Dk(v,η)`

d−1X∥∥ ≤ ( n∑

i=1

∥∥Dk(v,η)`

d−1X1

∥∥2) 1

2.

For w1, . . . , wk ∈ Cn let wji be the j-th entry of wi. Recall that we denote thepartial derivative by d

dX. Then

∥∥Dk(v,η)`

d−1Xi

∥∥ = maxw1,...,wk∈S(Cn)

∣∣∣∣∣ ∑1≤i1,...,ik≤n

dk(`d−1Xi)

dXi1 . . . dXik

(v, η)k∏j=1

wijj

∣∣∣∣∣≤

∑1≤i1,...,ik≤n

∣∣∣∣ dk(`d−1Xi)

dXi1 . . . dXik

(v, η)

∣∣∣∣132

Page 147: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.3. Auxiliary results

Oberve, that the only k-th order derivatives of `d−1Xi, that are non-zero, are`d−1−kXi

∏k−1i=0 (d− 1− i) and `d−k

∏k−2i=0 (d− 1− i). This implies

∥∥Dk(v,η)`

d−1X∥∥ ≤ (d− 1)!

(d− 1− k)!|η|d−1−k |vi|+ k

(d− 1)!

(d− k)!|η|d−k

≤ (d− 1)!

(d− k)!(d− k + k)

=d!

(d− k)!;

where we have used from Lemma 8.0.4 that |η| ≤ ‖v‖ = 1. This shows that we

have∥∥∥Dk

(v,η)`d−1X

∥∥∥ ≤ √n d!(d−k)!

. Plugging this and (8.3.22) into (8.3.21) we get

∥∥Dk(v,η)Ff

∥∥ ≤ (√n+ 1)

d!

(d− k)!.

Using (8.3.20) we obtain

‖v, η‖k−1

∥∥∥∥∥(D(v,η)Ff |(v,η)⊥)−1 Dk

(v,η)Ffk!

∥∥∥∥∥ ≤ √2k−1

µ(f, v, η)(√n+ 1) d!

(d− k)! k!

≤√

2nk−1

µ(f, v, η)2 d!

(d− k)!k!

=√

2nk−1

2d µ(f, v, η)(d− 1)!

(d− k)!k!.

By Lemma 8.3.5 we have (d−1)!k!(d−k)!

≤(d2

)k−1. Furthermore, by Lemma 8.1.7 we have

1 ≤ 2d µ(f, v, η), so that 2d µ(f, v, η) ≤ (2d µ(f, v, η))k−1. Hence

‖v, η‖∥∥∥∥ 1

k!

(D(v,η)Ff |(v,η)⊥

)−1Dk

(v,η)Ff∥∥∥∥ 1k−1

≤ µ(f, v, η) d2√

2n,

from which the claim follows.

8.3.4. A Lipschitz estimate for the condition number

We will have to quantify the behavior of the condition number µ(f, v, η) undersmall pertubations of both f and (v, η). In what follows we define the functionsϑ(ε) := 1− (1− ε)−2 + cos( ε

4) and, as in (8.2.3), Θ(ε) = ϑ(ε)− ε. Observe that

∀ 0 ≤ ε ≤ 1

5: 0 < Θ(ε) ≤ 1 and ε < ϑ(ε). (8.3.23)

133

Page 148: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

The rest of the section is similar to [BC13, Sec. 16.8].

Proposition 8.3.6. 1. Let f, g ∈ S(Hn) and (v, η) ∈ P. If

d µ(f, v, η) dS(f, g) ≤ ε < 1,

then

µ(g, v, η) ≤ 1

1− εµ(f, v, η).

2. Let (f, v, η) ∈ WS and (w, ξ) ∈ P. If

4 d2√n µ(f, v, η) dP((w, ξ), (v, η)) ≤ ε ≤ 1

4,

then

(1− ε)2µ(f, v, η) ≤ µ(f, w, ξ) ≤ 1

ϑ(ε)µ(f, v, η).

We can combine the statements in Proposition 8.3.6.

Corollary 8.3.7. Let (f1, v1, η1), (f2, v2, η2) ∈ WS. If

d µ(f1, v1, η1) maxdS(f1, f2), 4 d

√n dP((v1, η1), (v2, η2))

≤ ε ≤ 1

5,

then

Θ(ε) µ(f1, v1, η1) ≤ µ(f2, v2, η2) ≤ 1

Θ(ε)µ(f1, v1, η1).

Proof. We may apply Proposition 8.3.6(2) to obtain

µ(f1, v2, η2) ≤ 1

ϑ(ε)µ(f1, v1, η1). (8.3.24)

This yields d µ(f1, v2, η2) dS(f, g) ≤ ε/ϑ(ε) =: ε′ < 1, by (8.3.23). ApplyingProposition 8.3.6(1) yields

µ(f2, v2, η2) ≤ 1

1− ε′µ(f1, v2, η2) (8.3.25)

Combining (8.3.24) and (8.3.25) gives

µ(f2, v2, η2) ≤ 1

(1− ε′)ϑ(ε)µ(f1, v1, η1) =

1

ϑ(ε)− εµ(f1, v1, η1)

This yields the right inequality. If µ(f1, v1, η1) ≤ µ(f2, v2, η2), the left inequality is

134

Page 149: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.3. Auxiliary results

trivial. If µ(f2, v2, η2) ≤ µ(f1, v1, η1), then

d µ(f2, v2, η2) maxdS(f1, f2), 4 d

√n dP((v1, η1), (v2, η2))

≤ d µ(f1, v1, η1) max

dS(f1, f2), 4 d

√n dP((v1, η1), (v2, η2))

≤ ε.

and we may interchange the roles of (f1, v1, η1) and (f2, v2, η2) to conclude.

In order to prove Proposition 8.3.6 we will need the following lemma, similiarto [BC13, Lemma 16.41]. Recall from Theorem 8.2.2 the definition

γ(f, v, η) := ‖v, η‖ maxk≥2

∥∥∥∥ 1

k!

(D(v,η)Ff |(v,η)⊥

)−1Dk

(v,η)Ff∥∥∥∥ 1k−1

.

and from (8.2.2) the definition Ψδ(u) = (1 + cos δ)(1− u)2 − 1, u, δ ∈ R.

Lemma 8.3.8. Let (f, v, η) ∈ WS and let (w, ξ) ∈ S(Cn)× C. Put

δ := dS((w, ξ), (v, η)), and u := δγ(f, v, η)√

8.

If δ < π16

and u < 1, then∥∥∥(D(v,η)Ff |(v,η)⊥)−1

D(w,ξ)Ff |(w,ξ)⊥∥∥∥ ≤ 1

(1− u)2.

If further Ψδ(u) > 0, then D(w,ξ)Ff |(w,ξ)⊥ is invertible and we have∥∥∥D(w,ξ)Ff |−1(w,ξ)⊥

D(v,η)Ff |(v,η)⊥

∥∥∥ ≤ (1− u)2

Ψδ(u).

Proof. Recall that Ff is a map Cn × C → Cn. We identify Cn+1 ∼= Cn × C andwe denote by L(Cn+1,Cn) the vector space of C-linear maps Cn+1 → Cn, compareAppendix A.2 and [BC13, eq. (15.2), Lemma 15.9]. For any f ∈ H we define

DFf : Cn+1 → L(Cn+1,Cn), (w, ξ) 7→ D(w,ξ)Ff ,

and writeD2

(w,ξ)Ff := D(w,ξ)(DFf ) ∈ L(Cn+1,L(Cn+1,Cn)).

Inductively we define Dk(w,ξ)Ff . By construction, Dk

(w,ξ)Ff is a multilinear map

(Cn+1)×(k−1) → Cn. Similar to Definition 6.1.2 we define

Dk(w,ξ)Ff xk−1 := Dk

(w,ξ)Ff (x, . . . , x).

135

Page 150: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

The Taylor expansion of D(w,ξ)Ff |(w,ξ)⊥ around (v, η) can then be expressed as

D(w,ξ)Ff |(w,ξ)⊥ =d∑

k=1

1

(k − 1)!Dk

(v,η)Ff |(w,ξ)⊥(w − v, ξ − η)k−1,

see also [BC13, Lemma 16.41]. Then

(D(v,η)Ff |(v,η)⊥)−1 D(w,ξ)Ff |(w,ξ)⊥ = P +B,

where,

P =(D(v,η)Ff |(v,η)⊥

)−1D(v,η)Ff |(w,ξ)⊥ (8.3.26)

B =d∑

k=2

1

(k − 1)!

(D(v,η)Ff |(v,η)⊥

)−1Dk

(v,η)Ff |(w,ξ)⊥(w − v, ξ − η)k−1. (8.3.27)

By Lemma A.1.8 we have that ‖P‖ ≤ 1. If in Proposition 8.3.4 we put x =(v, η), y = (w, ξ), we obtain ‖w − v, ξ − η‖ ≤ δ

√8. Combining with (8.3.27) we

obtain

‖B‖ ≤d∑

k=1

k

∥∥∥∥ 1

k!

(D(v,η)Ff |(v,η)⊥

)−1Dk

(v,η)Ff (w − v, ξ − η)k−1

∥∥∥∥≤

d∑k=1

k

∥∥∥∥ 1

k!

(D(v,η)Ff |(v,η)⊥

)−1Dk

(v,η)Ff∥∥∥∥ ‖w − v, ξ − η‖k−1

≤d∑

k=1

k ‖v, η‖1

k−1

∥∥∥∥ 1

k!

(D(v,η)Ff |(v,η)⊥

)−1Dk

(v,η)Ff∥∥∥∥ ‖w − v, ξ − η‖k−1

≤d∑

k=2

k(γ(f, v, η) δ

√8)k−1

=d∑

k=2

kuk−1 ≤ (1− u)−2 − 1;

the first inequality by the triangle inequality, the second by (A.2.3) and the thirdbecause ‖v‖ = 1. This implies∥∥∥(D(v,η)Ff |(v,η)⊥

)−1D(w,ξ)Ff |(w,ξ)⊥

∥∥∥ = ‖P +B‖ ≤ ‖P‖+ ‖B‖ ≤ (1− u)−2,

which is the first assertion.It remains prove the second assertion. Lemma A.1.8 tells us that P is invertible

and that ‖P−1‖ ≤ (cos δ)−1. By our assumption Ψδ(u) > 0 we have∥∥P−1∥∥ ‖B‖ ≤ (cos δ)−1

((1− u)−2 − 1

)< 1.

136

Page 151: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.3. Auxiliary results

From [BC13, Lemma 15.7] we get that P +B is invertible and that

∥∥(P +B)−1∥∥ ≤ ‖P−1‖

1− ‖B‖ ‖P−1‖≤ (cos δ)−1

1− (cos δ)−1 ((1− u)−2 − 1).

One shows that the right hand side equals (1 − u)2/Ψδ(u). This finishes theproof.

Proof Proposition 8.3.6. We prove the first assertion. Let (v, η) ∈ S(Cn)×C be arepresentative for (v, η) ∈ P . By assumption we have

d µ(f, v, η) dS(f, g) ≤ ε < 1.

This implies that µ(f, v, η) < ∞, which means that D(v,η)Ff |(v,η)⊥ is invertible.Let A := D(wv,η)Ff |(v,η)⊥ and hence µ(f, v, η) = ‖A−1‖. Futher, write

∆ := [Dw(g − f) , 0]|(v,η)⊥ ,

so A+ ∆ = D(v,η)g |(v,η)⊥ . We have

‖∆‖ ≤ ‖[Dv(g − f) , 0]‖ = ‖Dv(g − f) ‖ ≤ d ‖g − f‖ ;

the last inequality by [BC13, Lemma 16.46]. Since ‖f‖ = ‖g‖ = 1, we have that‖f − g‖ ≤ dS(f, g). Therefore,∥∥A−1

∥∥ ‖∆‖ ≤ d∥∥A−1

∥∥ ‖g − f‖ ≤ d µ(f, v, η) dS(f, g) ≤ ε < 1.

We can apply [BC13, Lemma 15.7] to deduce that A+ ∆ is invertible and that

µ(g, v, η) =∥∥(A+ ∆)−1

∥∥ ≤ ‖A−1‖1− ‖∆‖ ‖A−1‖

≤ 1

1− εµ(f, v, η).

This proves the first assertion.To prove the second assertion for (v, η), (w, ξ) ∈ P we choose representatives

(v, η), (w, ξ) ∈ S(Cn)× C, such that

δ := dP((v, η), (ξ, η)) = dS((v, η), (ξ, η)).

Put u := δγ(f, v, η)√

8. By Theorem 8.2.3 and by our assumption we have

u ≤ 4 δ d2√n µ(f, v, η) ≤ ε ≤ 1

4< 1.

Using that d ≥ 2 and from Lemma 8.1.7(1) that 2d µ(f, v, η) ≥ 1 we obtainδ ≤ ε/4 < π

16. Recall that Ψδ(u)/(1− u)2 = 1 + cos δ − (1− u)−2. The right hand

137

Page 152: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

side is decreasing in δ and in u. This implies

Ψδ(u)

(1− u)2= 1 + cos δ − (1− u)−2 ≥ 1 + cos(ε/4)− (1− ε)−2 = ϑ(ε).

We getΨδ(u)

(1− u)2≥ ϑ(ε) > 0,

by (8.3.23). Summarizing, we have δ < π16

, u < 1 and Ψδ(u) > 0 and can thereforeapply Lemma 8.3.8 to deduce that µ(f, w, ξ) < ∞ and, using the submultiplica-tivity of the spectral norm (A.1.1), that

µ(f, w, ξ) =∥∥∥D(w,ξ)Ff −1

(w,ξ)⊥D(v,η)Ff (v,η)⊥ D(v,η)Ff −1

(v,η)⊥

∥∥∥≤∥∥∥D(w,ξ)Ff −1

(w,ξ)⊥D(v,η)Ff (v,η)⊥

∥∥∥ ∥∥∥D(v,η)Ff −1(v,η)⊥

∥∥∥≤ (1− u)2

Ψδ(u)µ(f, v, η)

≤ 1

ϑ(ε)µ(f, v, η);

the second-to-last inequality by Lemma 8.3.8. This proves the first inequality.Similiary we obtain

µ(f, v, η) =∥∥∥D(v,η)Ff −1

(v,η)⊥D(w,ξ)Ff (w,ξ)⊥ D(w,ξ)Ff −1

(w,ξ)⊥

∥∥∥≤ 1

(1− u)2µ(f, w, ξ)

≤ 1

(1− ε)2µ(f, w, ξ).

This finishes the proof.

8.4. Complexity analysis

In this section we prove the previously stated complexity analyses for the fouralgorithms EALH, LVEALH and LVEALHWS and Draw-from-ρ∗; that is, we willprove Theorem 8.2.5, Theorem 8.2.7, Proposition 8.2.8 and Proposition 8.2.9.

8.4.1. Complexity of algorithm E-ALH

Recall from (8.2.3) that we have put Θ(ε) = 1− (1− ε)−2 + cos( ε4)− ε. We need

the following lemma to prove Theorem 8.2.5.

138

Page 153: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.4. Complexity analysis

Lemma 8.4.1. Let (q, v, η), (q′, w, ξ) ∈ WS, q 6= −q′, such that the geodesic pathE ⊂ S(Hn) connecting q and q′ does not intersect E. Let ε < 1

5and χ < 1. Then

dS(q, q′) ≤ εχΘ(ε)

4d2√nµ2(q, v, η)

implies that

dP((v, η), (w, ξ)) ≤ εχ

4d2√nµ(q, v, η)

.

Proof. Let E = qτ | 0 ≤ τ ≤ 1 . If E does not cross the extended eigendiscrim-inant variety E (see Proposition 6.6.4), then, following the discussion in Subsec-

tion 8.1.3, there exists a unique lifted path L = (qτ , vτ , ητ ) | 0 ≤ τ ≤ 1 ⊂ V .Suppose the assertion were false. Then there exists some 0 ≤ τ ∗ < 1, such that

dP((v, η), (vτ∗ , ητ∗)) =

∫ τ∗

0

∥∥ •vτ ,

•ητ∥∥ dτ =

εχ

4d2√nµ(q, v, η)

.

Let 0 ≤ τ ≤ τ ∗. From Lemma 8.1.7(2) we obtain∥∥ •vτ ,

•ητ∥∥ ≤ µ(qτ , vτ , ητ )

∥∥ •qτ∥∥ andhence

εχ

4d2√nµ(q, v, η)

≤∫ τ∗

0

µ(qτ , vτ , ητ )∥∥ •qτ∥∥ dτ.

Using

dP((v, η), (vτ , ητ )) ≤ dP((v, η), (vτ∗ , ητ∗)), and

dS(q, qτ ) ≤ dS(q, q′),

from Lemma 8.1.7(1) that 2dµ(q, v, η) ≥ 1 and from (8.3.23) that Θ(ε) ≤ 1 wehave by assumption

d µ(q, v, η) maxdS(q, qτ ), 4 d

√n dP((v, η), (vτ , ητ ))

≤ ε ≤ 1

5.

Corollary 8.3.7 implies that µ(qτ , vτ , ητ ) ≤ Θ(ε)−1µ(q, v, η). Hence,∫ τ∗

0

∥∥(•vτ ,

•ητ )∥∥ dτ ≤ µ(q, v, η)

Θ(ε)

∫ τ∗

0

∥∥ •qτ∥∥ dτ <µ(q, v, η)

Θ(ε)dS(q, q

′),

contradicting our assumption.

Proof of Theorem 8.2.5. The proof is adapted from [BC13, p. 335–338]. Let E :=Ef,g = qτ | 0 ≤ τ ≤ 1 ⊂ S(Hn) be the geodesic path connecting f and g and let

L := Lf,g = (qτ , vτ , ητ ) | 0 ≤ τ ≤ 1

139

Page 154: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

be the lifting of E at (g, v, η). We first note that Proposition 8.1.9 implies that,if E ∩ E = ∅, then L ⊂ V (this means the lifted path does not contain the trivialsolution [0 : 1]). Let

K := K(f, (g, v, η))

denote the number of iterations of algorithm EALH and let

0 = τ0 < τ1 < . . . < τK = 1, (w, ξ) = (w0, ξ0), . . . , (wK , ξK).

be the sequences of τ ’s and (w, ξ)’s generated by EALH, respectively. Further, for0 ≤ i ≤ K we denote qi := qτi and (vi, ηi) := (vτi , ητi). We put

ε := 0.04, Θ := Θ(ε) ≈ 0.87488, χ := 2Θ− 1 ≈ 0.74976 < 1.

Claim. For all i ∈ 0, . . . , K the following holds.

4 d2√n µ(qi, vi, ηi) dP((wi, ξi), (vi, ηi)) ≤ ε (8.4.1)

Proof of the Claim We will proceed by induction. By construction (8.4.1) holdsfor i = 0. Let us assume that (8.4.1) holds for all i ≤ j. From Proposition 8.3.6(2)we get that

(1− ε)2µ(qj, vj, ηj) ≤ µ(qj, wj, ξj). (8.4.2)

The stepsize of the algorithm EALH is defined such that

dS(qj, qj+1) ≤ ε(1− ε)4χΘ

4 d2√n µ2(qj, wj, ξj)

(8.4.2)

≤ εχΘ

4 d2√n µ2(qj, vj, ηj)

.

Lemma 8.4.1 implies that

dP((vj, ηj), (vj+1, ηj+1)) ≤ εχ

4d2√nµ(qj, vj, ηj)

. (8.4.3)

Using that 2d µ(qj, vj, ηj) ≥ 1, d ≥ 2, Θ < 1 and χ < 1 we get

d µ(qj, vj, ηj) maxdS(qj, qj+1), 4 d

√n dP((vj, ηj), (vj+1, ηj+1))

≤ ε,

so that Corollary 8.3.7 implies

Θ µ(qj, vj, ηj) ≤ µ(qj+1, vj+1, ηj+1) ≤ 1

Θµ(qj, vj, ηj). (8.4.4)

We use the triangle inequality to get

dP((wj, ξj), (vj+1, ηj+1)) ≤ dP((wj, ξj), (vj, ηj)) + dP((vj, ηj), (vj+1, ηj+1)).

140

Page 155: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.4. Complexity analysis

Applying (8.4.3) and our assumption, that (8.4.1) holds for i, to this inequalityyields

dP((wj, ξj), (vj+1, ηj+1)) ≤ ε(1 + χ)

4 d2√n µ(qj, vj, ηj)

=εΘ

2 d2√n µ(qj, vj, ηj)

, (8.4.5)

so that

d2√

2nµ(qj+1, vj+1, ηj+1)dP((wj, ξj), (vj+1, ηj+1)) ≤ εΘ√2

µ(qj+1, vj+1, ηj+1)

µ(qj, vj, ηj)(8.4.4)

≤ ε√2≈ 0.02828;

Together with Theorem 8.2.3 this implies γ(qj+1, vj+1, ηj+1) < 0.1. Using thatd ≥ 2, from Lemma 8.1.7 (1) that 2d µ(qj, vj, ηj) ≥ 1 and Θ < 1 we further get

dP((wj, ξj), (vj+1, ηj+1)) ≤ ε

2= 0.02

from (8.4.5). Recall from (8.2.1) the definitions of u(r) and δ(r). If we putr = 0.999933..., we have u(r) > 0.1 and δ(r) = 0.02. Using Definition 8.2.1and Theorem 8.2.2 we see that (wj, ξj) is an approximate eigenpair of qj+1 withassociated eigenpair (vj+1, ηj+1). Applying Newton’s method with respect to qj+1

to (wj, ξj) halves the distance to (vj+1, ηj+1), so

dP((wj+1, ξj+1), (vj+1, ηj+1))Theorem 8.2.2

≤ 1

2dP((wj, ξj), (vj+1, ηj+1))

(8.4.5)

≤ εΘ

4 d2√nµ(qj, vj, ηj)

(8.4.4)

≤ ε

4 d2√nµ(qj+1, vj+1, ηj+1)

which is (8.4.1) for i = j + 1. The correctness of EALH also follows from this.It remains to estimate K, the number of iterations. In the same way we deduced

(8.4.4), we can deduce that for any 0 ≤ i ≤ K − 1 and τi ≤ τ ≤ τi+1 we haveΘ µ(qi, vi, ηi) ≤ µ(qτ , vτ , ητ ). Hence∫ τi+1

τi

µ(qτ , vτ , ητ )2dτ ≥

∫ τi+1

τi

Θ2µ2(qi, vi, ηi)dτ = Θ2 µ2(qi, vi, ηi) (τi+1 − τi)

As the stepsize we have set

(τi+1 − τi) =ε(1− ε)4χ

4 d2√n µ2(qi, wi, ξi) dS(f, g)

141

Page 156: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

Furthermore, (8.4.1) and Proposition 8.3.6 (2) imply that for any 0 ≤ i ≤ K wehave

ϑ(ε) µ(qi, wi, ξi) ≤ µ(qi, vi, ηi),

where ϑ(ε) := 1− (1− ε)−2 + cos( ε4). Thus∫ τi+1

τi

µ(qτ , vτ , ητ )2dτ ≥ Θ2(1− ε)4εϑ(ε)2χ

4 d2√n dS(f, g)

,

which implies ∫ 1

0

µ(qτ , vτ , ητ )2dτ ≥ K

Θ2(1− ε)4εϑ(ε)2χ

4 d2√n dS(f, g)

,

For ε = 0.04 we have1

4Θ2(1− ε)4εϑ(ε)2χ ≥ 1

246

and therefore

K ≤ 246 d2√n dS(f, g)

∫ 1

0

µ(qτ , vτ , ητ )2dτ

as claimed.

8.4.2. Average analysis of algorithm LVEALH

In this section we prove Theorem 8.2.7. The proof will be rather short as mostof the work has already been done in previous sections. Recall from (8.2.4) thenotation

C(f, (g, v, η)) =

∫ 1

0

µ(qτ , vτ , ητ )2dτ,

LetC(f) := E

(g,v,η)∼ρC(f, (g, v, η))

and define

(µ2)av(f) :=1

D(n, d)

∑(v,η)∈P:f(v)=ηd−1v

µ(f, v, η)2;

compare (8.3.1). Since the density ρ is defined such that g ∼ NC(Hn) and (v, η)is chosen uniformly at random from the D(n, d) many h-eigenpairs of g, we have

Ef∼Unif S(Hn)

C(f) = Ef∼Unif S(Hn)

Eg∼NC(Hn)

∫q∈E

µ2av(q) dq,

where E ⊂ S(Hn) denotes the geodesic path between f and ‖g‖−1 g. Clearly, theintegral is independent of the norm of g, so we may as well take the expectation

142

Page 157: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.4. Complexity analysis

over g ∼ Unif S(Hn). Furthermore, Proposition 8.1.5 (3) implies that for allq ∈ Hn we have µ2

av(q) ≤ µ2av(q) and hence

Ef∼Unif S(Hn)

C(f) ≤ Ef,g∼Unif S(Hn)

∫q∈E

µ2av(q) dq. (8.4.6)

The following is [BP11, Eq. (4.3)] specialized to our scenario.

Lemma 8.4.2. Let S := S(Hn). For any measurable function φ : S→ R we have

Ef,g∼Unif(S)

∫h∈E

φ(h) dq =π

2E

h∼Unif(S)φ(h),

where E ⊂ S(Hn) denotes the geodesic path between f and g.

Combining (8.4.6) with Lemma 8.4.2 we get that

Ef∼Unif S(Hn)

C(f) ≤ π

2E

q∼Unif S(Hn)µ2

av(q) ≤ 40π nN

d,

the last inequality by Proposition 8.3.1. Thus, Theorem 8.2.7 is proved.

8.4.3. Analysis of the sampling method

In this section we prove Proposition 8.2.8. We show that Draw-from-ρ∗ on the aver-age needs O(n3 + dnN) arithmetic operations (including drawing from NC(Hn)),where N := dimC(Hn). ”Average”, because the success of the if -statement inline 9 of the algorithm is a random variable.

Step 3. of algorithm Draw-from-ρ∗ can be done via Beltran-Pardo randomiza-tion [BC13, Sec. 17.6]. By [BC13, Prop. 17.21] this can be implemented withO(N) draws from NC(0, 1) and O(dnN) arithmetic operations.

Given v ∈ P(Cn) we can draw a ∼ NC(v⊥) by drawing a′ ∼ NC(Cn−1) and thenput a = U(0, a′)T . This requires O(n2) arithmetic operations. An implementationfor drawing h ∼ NC(R(v)) is given in [BC13, algorithm 17.7]. Its number ofarithmetic operations is O(N). Hence, step 4. can be implemented using O(N)operations.

For Step 5. draw B ∼ NC(C(n−1)×(n−1)). Let Q be the Q-factor in the QR-decomposition of B. Then we put U ′ = diag(rii/ |rii|)Q, where the rii are thediagonal elements of the R factor in the QR decomposition of B. This algorithmyields U ′ ∼ U(n− 1) uniformly at random (see, e.g., [ABB+15, Sec. 10.1]). Com-pute U ′′ ∈ U(n) with U ′′e1 = v, for instance via Gram-Schmidt algorithm. Putting

U =

[v, U ′′

[0U ′

]]

143

Page 158: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

we have U ∈ U(n), such that Ue1 = v, uniformly at random. The dominatingcomplexity is the QR decomposition, which requires O(n3) arithmetic operations.Summarizing we can implement Step 5. with O(n3) arithmetic operations.

For drawing r with density e−r 1r≥0(r), one can draw z ∼ NC(0, 1) and then

put r = ‖z‖2.Altogether this shows that steps 3.-7. require O(n3 + dnN) arithmetic opera-

tions.The number of times we have to execute steps 9.-14. is imposed by the if-

statement in step 9. Let E denote the expected number of iterations and let

C := Prob(f,v),η

∥∥Dvf + ηd−1 I|v⊥∥∥F≤ ‖Dvf ‖F

(8.4.7)

be the probability of the if-statement being true. For any k ≥ 1 we have

Prob The if-statement is first true in the k-th iteration = (1− C)k−1C,

which implies E =∑∞

k=1 k(1 − C)k−1C = C−1. In Lemma 8.4.4 below we showthat C ≥ 1/(5

√π n). Observe that every iteration requires O(1) many draws

from NC(Cn) and the computation of a derivative and its Frobenius norm, whichcan be done with O(N) many arithmetic operations.

Summarizing, our implemention of algorithm Draw-from-ρ∗ requires an ex-pected number of O(n3 + dnN) arithmetic operations.

The probability of the if-statement being true.

Let Ω ⊂ Hn−1 × P(Cn)× C× U(n)× Cn ×H be the probability space

Ω :=

(f, v, η, U, a, h) | f(v) = 0, Ue1 = v, a ∈ v⊥, h ∈ R(v),

where the random variables f, v, η, U, a, h have the distribution that is induced bythe algorith Draw-from-ρ∗; that is,

(f, v) ∼ ρBP, U ∼ Unif U ∈ U(n) | Ue1 = v , a ∼ NC(v⊥), h ∼ NC(R(v)), η ∼ β.

Here ρBP is the distribution from Beltran Pardon randomization [BP11] and β isthe distribution of η as described in Lemma 8.4.3 below. Consider the subset

Ω∗ =

(f, v, η, U, a, h) ∈ Ω |∥∥Dvf |v⊥ + ηd−1 I|v⊥

∥∥F≤ ‖Dvf |v⊥‖F

with I|v⊥ being restriction of the identity to v⊥. The C from (8.4.7) is then givenas C = Prob(Ω∗). The plan is now to compute the density of η first and to computeProb(Ω∗) afterwards.

144

Page 159: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.4. Complexity analysis

Lemma 8.4.3. If we choose r with density e−r 1r≥0(r) and φ ∈ [0, 2π) uniformly

at random, putting η := r1

2(d−1) exp(iφ) defines a random variable with density

β(η) :=(d− 1)

π|η|2(d−2) exp(− |η|2(d−1)).

Proof. Put s := r1

2(d−1) , such that we have 2(d− 1)s2(d−2)+1ds = dr and

exp(−r) 1r≥0(r)dr = 2(d− 1)s2(d−2)+1 exp(−s2(d−1)) 1s≥0(s)ds.

We have defined the argument of the random variable η as being uniform dis-tributed in [0, 2π). Changing from polar to euclidean coordinates η := s exp(iφ)we get that the density of η is given by

1

π(d− 1) |η|2(d−2) exp(− |η|2(d−1)) dη,

which shows the assertion.

We can now prove the asserted bound on C.

Lemma 8.4.4. We have C = Prob(Ω∗) ≥ (5√π n)−1.

Proof. In the proof of [BC13, Lemma 17.18] it is shown that the density ρPB

is invariant under unitary transformations. Moreover, by [BC13, Lemma 17.19]and [BC13, Lemma 17.18 for linear systems] the push-forward distribution of ρPB

under the map (f, v) 7→√d−1

Dvf ∈ C(n−1)×n is the complex standard normaldistribution on C(n−1)×n. Put

A := ηd−1 I|e⊥1 .

Then, we have that

Prob(Ω∗) = Eη

Prob(f,v)

∥∥Dvf |v⊥ + ηd−1 I|v⊥∥∥F≤ ‖Dvf |v⊥‖F

= E

ηProb(f,v)

∥∥∥De1f |e⊥1 + A∥∥∥F≤∥∥∥De1f |e⊥1

∥∥∥F

= E

ηProb

Z∼NC(C(n−1)×(n−1))

∥∥∥Z −√d−1A∥∥∥F≤ ‖Z‖F

,

because restricting a matrix to e⊥1 = 0 × Cn−1 yields the same matrix without

the first column. Note that ‖A‖F ≥ ‖√d−1A‖F . From Lemma 2.4.6 (1) get

Prob(Ω∗) ≥ Eη

ProbZ‖Z + A‖F ≤ ‖Z‖F .

145

Page 160: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

Then, using that ‖A‖F =√n− 1 |η|d−1 and Lemma 2.4.6 (2) we get

Prob(Ω∗) ≥ Eη

1√π

2 exp(−(n−1)|η|2(d−1)

2

)√n− 1 |η|d−1 +

√(n− 1) |η|2(d−1) + 8

.

Put r := |η|2(d−1). By definition r has density e−r 1r≥0(r). This shows

Prob(Ω∗) ≥ 2√π

∫r≥0

1√(n− 1)r +

√(n− 1)r + 8

exp(− r( (n−1)

2+ 1))

dr.

Making a change of variables t := (n− 1)r the latter expression becomes

2√π (n− 1)

∫t≥0

1√t+√t+ 8

exp(− t(1

2+ 1

n−1))

dt

Using n ≥ 2 we have 12

+ 1n−1≤ 3

2, so that

Prob(Ω∗) ≥ 2√π(n− 1)

∫t≥0

1√t+√t+ 8

exp(− 3

2t)

dt.

A computer based calculation shows∫t≥0

1√t+√t+8

exp(−3

2t)

dt ≥ 110

. This finishes

the proof.

8.4.4. Average analysis of algorithm LVEALHWS

In this section we prove Proposition 8.2.9. To this end, let

φ : V → R, (g, v, η) 7→ Ef∼Unif S(Hn)

C(f, (‖g‖−1 g, v, ‖g‖

−1d−1 η)

).

By Lemma 8.1.7 (3), φ is unitarily invariant, that is, for all U ∈ U(n) we haveφ(g, v, η) = φ(U.(g, v, η)). Further, by definition φ is scale invariant, that is, for alls ∈ C\ 0: φ(g, v, η) = φ(sd−1g, v, sη). Proposition 8.2.9 is now a corollary fromthe following proposition, whose proof is adapted from [ABB+15, Sec. 10.2].

Proposition 8.4.5. Let Ξ : V → R be a measurable, unitarily invariant, scaleinvariant nonnegative function. Then

E(f,v,η)∼ρ∗

Ξ(f, v, η) ≤ 10√π n E

(f,v,η)∼ρΞ(f, v, η).

Proof. While the proof itself is very technical, its underlying idea is explained

146

Page 161: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.4. Complexity analysis

quickly: Recall from Subsection 8.4.3 the definition of Ω∗. The idea is to showthat on Ω∗ we have the inequality ρ∗ ≤ ρ

Prob(Ω∗), so that we can have the following

bound for a random variable X on Ω:

EX∼ρ∗

[X] ≤∫

Ω∗X

ρ

Prob(Ω∗)dΩ∗ ≤ 1

Prob(Ω∗)

∫Ω

X ρ dΩ =EX∼ρ

[X]

Prob(Ω∗).

Proposition 8.4.5 follows then using Lemma 8.4.4. Let

1Ω∗ := 1Ω∗(f, v, η)

denote the characteristic function of Ω∗. We put C := Prob(Ω∗), as before. Byconstruction, the density ρ∗ is the density of the conditional distribution on Ω∗,associated to the probability measure

Prob∗(Y ) =1

CE

(f,v),a,h,U,η[1Y ], Y ⊂ Ω∗ measurable.

We therefore have

E(f,v,η)∼ρ∗

Ξ(f, v, η) =1

CE

(f,v),a,h,U,ηΞ(f, v, η) 1Ω∗ , (8.4.8)

where (recall that R(v) := h ∈ Hn,d | h(v) = 0,Dvh = 0)

(f, v) ∼ ρBP, U ∼ Unif U ∈ U(n) | Ue1 = v , a ∼ NC(v⊥), h ∼ NC(R(v)), η ∼ β.

By assumption Ξ is unitarily invariant and it is easily seen that 1Ω∗ is unitarilyinvariant, too. Since Ξ is nonnegative and by Tonelli’s theorem the expectation in(8.4.8) is independent of the order in which we integrate, so that

E(f,v,η)∼ρ∗

Ξ(f, v, η) =1

CE

U,a,h,ηE

(f,v)∼ρPB

Ξ(f, v, η) 1Ω∗ , (8.4.9)

For a moment we consider the inner expectation. Let

Z(v) :=A ∈ C(n−1)×n | Av = 0

.

We put M :=√d−1

Dvf . By Proposition 6.3.6, f is determined by M, v and thechoice of k ∈ R(v)n−1. By [BC13, Algorithm 17.6] we have k ∼ NC(R(v)n−1).Moreover, by [BC13, Lemma 17.19] the pair (M, v) follows the so called standarddistribution on the linear solution manifold

W =

(M, v) ∈ C(n−1)×n × P(Cn) |M ∈ Z(v),

147

Page 162: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

which is given by the density (see the equation before [BC13, Equation (17.20)])

ρst(M, v) = ϕ(M) NJp1(M, v).

Here ϕ(M) is the density of the normal distribution NC(C(n−1)×n), p1 : W → Mis the projection onto the first factor and NJp1(M, v) is the normal jacobian of p1

at (M, v); see (B.4.1). By the foregoing, for fixed U, a, h, η we have

E(f,v)∼ρBP

Ξ(f, v, η) 1Ω∗ =

∫(M,v)∈W

Ek

Ξ(f, v, η) 1Ω∗ ϕ(M) NJp1(M, v) d(M, v),

where k ∼ NC(R(v)n−1). We use the coarea formula (Theorem B.4.2) on theprojection p2 : W → P(Cn) to deduce that E(f,v) Ξ(f, v, η) 1Ω∗ is equal to∫

v∈P(Cn)

(∫(M,v)∈p−1

2 (v)

Ek

Ξ(f, v, η) 1Ω∗ ϕ(M)NJp1(M, v)

NJp2(M, v)dp−1

2 (v)

)dv.

We use the characterization from [BC13, Lemma 17.14]:

NJp1(M, v)

NJp2(M, v)= det(MM∗).

(Compare also Lemma B.4.3 and Remark B.4.4.) Note that (M, v) ∈ p−12 (v) is

equivalent toM ∈ Z(v), and that forM ∈ Z(v) we have det(MM∗) = det(M |v⊥)2.Interchanging integration over k and M we see that E(f,v) Ξ(f, v, η) 1Ω∗ is equal to∫

v∈S(Cn)

Ek

Ξ(f, v, η) 1Ω∗

(∫M∈Z(v)

det(M |v⊥)2ϕ(M)dM)

dv.

Taking expectation over all the random variables, by (8.4.8), we therefore have

E(f,v,η)∼ρ∗

Ξ(f, v, η)

=1

CE

U,a,h,η

∫v∈S(Cn)

Ek

Ξ(f, v, η) 1Ω∗

(∫M∈Z(v)

det(M |v⊥)2ϕ(M)dM)

dv.

By Tonelli’s theorem:

E(f,v,η)∼ρ∗

Ξ(f, v, η) (8.4.10)

=1

C

∫v∈S(Cn)

EU,a,h,η,k

Ξ(f, v, η) 1Ω∗

(∫M∈Z(v)

det(M |v⊥)2ϕ(M)dM)

dv.

148

Page 163: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.4. Complexity analysis

To keep track, recall that we have

f = U

[f0

f

], with

[f0

f

]= ηd−1Xd

1e1 +Xd−11

√d

[aT

M

](X2, . . . , Xn)T +

[hk

].

By unitary invariance, the integrand in (8.4.10) is invariant of v and U . In thatintegral we may replace v by e1 and integrate over U and v so that

E(f,v,η)∼ρ∗

Ξ(f, v, η)

=volP(Cn)

CE

a,h,η,kΞ(f, e1, η) 1Ω∗

(∫M∈Z(e1)

det(M |v⊥)2ϕ(M)dM),

where a ∼ NC(0 × Cn−1), h ∼ R(e1), k ∼ R(e1)n−1 and η ∼ β. Observe thatZ(e1) =

[0,M ] |M ∈ C(n−1)×(n−1)

and that [0,M ]|e⊥1 = M . Moreover, note

that on Z(e1) we have

ϕ([0,M ]) = ϕCn−1(0)ϕC(n−1)×(n−1)(M) = π−(n−1)ϕC(n−1)×(n−1)(M).

Altogether, this shows that∫M∈Z(e1)

det(M |v⊥)2ϕ(M)dM =1

πn−1E

M∼NC(C(n−1)×(n−1))det(M)2,

so that

E(f,v,η)∼ρ∗

Ξ(f, v, η) =volP(Cn)

πn−1 CE

a,h,η,M,k(detM)2 Ξ(f, e1, η)1Ω∗ , (8.4.11)

where

k ∼ NC(R(e1)n−1),M ∼ NC(C(n−1)×(n−1)), a ∼ NC(Cn−1), h ∼ NC(R(e1)), η ∼ β

Let us fix all the random variables but M now and write the expectation over Mas the integral

EM

(detM)2 Ξ(f, e1, η)1Ω∗ =1

π(n−1)2

∫M

(detM)2 Ξ(f, e1, η)1Ω∗ϕ(M)dM,

where ϕ is the density of the normal distribution on C(n−1)×(n−1). Recall that

[0,M ] =√d−1

De1f and that

1Ω∗(f, e1, η) = 1, if and only if∥∥∥De1f |e⊥1 + ηd−1 I|e⊥1

∥∥∥ ≤ ∥∥∥De1f |e⊥1∥∥∥ .

149

Page 164: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8. A homotopy method to solve for eigenpairs of complex tensors

Note that I|e⊥1 is the (n−1)× (n−1) identity matrix I(n−1)×(n−1), which—abusing

notation—we also abbreviate by I. Setting A := M +√d−1ηd−1I we get

‖A‖2F =

∥∥∥M +√d−1ηd−1I

∥∥∥2

F

and

1Ω∗(M, η) = 1by definition⇐⇒

∥∥∥M +√d−1ηd−1I

∥∥∥2

F≤ ‖M‖2

F

⇐⇒ exp(−‖M‖2F ) ≤ exp(−‖A‖2

F ). (8.4.12)

Using that 1Ω∗(M, η) ≤ 1 we therefore get

EM

(detM)2Ξ(f, e1, η)1Ω∗ ≤ EA

(det(A−√d−1ηd−1I))2Ξ(f, e1, η), (8.4.13)

where A ∼ NC(C(n−1)×(n−1)). Plugging into (8.4.11) yields

E(f,v,η)∼ρ∗

Ξ(f, v, η)

≤ volP(Cn)

πn−1CE

a,h,η,A(det(A−

√d−1ηd−1I))2 Ξ(f, e1, η) (8.4.14)

≤ volS(Cn)

2πndn−1CE

a,h,η,k,A(det(

√dA− ηd−1I))2 Ξ(f, e1, η), (8.4.15)

where we have used 2π volP(Cn) = vol S(Cn). We use Lemma 8.4.3 to write theintegral over η explicitly, so that (8.4.15) becomes

E(f,v,η)∼ρ∗

Ξ(f, v, η)

≤ volS(Cn)

2πndn−1C

∫η∈C

E(η)(d− 1)

π|η|2(d−2) e−|η|

2(d−1)

dη; (8.4.16)

whereE(η) := E

a,h,k,A(det(

√dA− ηd−1I))2 Ξ(f, e1, η).

By construction, we have

A ∼ NC(C(n−1)×(n−1)), a ∼ NC(Cn−1),

[hk

]∼ NC(R(e1)n).

We combine the map ψ(f, v, η) = (f, v, ηd−1) from (6.6.5) with Lemma 6.5.2 toconclude that

vol S(Cn)(d− 1)

2πn+1 D(n, d)

∫η∈C

E(η) |η|2(d−2) e−|η|2(d−1)

dη = Ef∼NC(Hn)

Ξav(f) (8.4.17)

150

Page 165: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

8.4. Complexity analysis

where

Ξav(f) =

∫(v,η)∈S(Cn)×Cf(v)=ηd−1η

Ξ(f, v, η)d(v, η).

(we have made a very similar argument for (8.3.6)). Comparing (8.4.16) and(8.4.17) we see that

E(f,v,η)∼ρ∗

Ξ(f, v, η) ≤ D(n, d)

dn−1CE

f∼NC(Hn)Ξav(f).

Note that, since d ≥ 2, we have

D(n, d)

dn−1=

dn − 1

dn−1(d− 1)=

n−1∑i=0

1

di≤

∞∑i=0

1

2i= 2.

Moreover, C ≥ (5√π n)−1, by Lemma 8.4.4. Since Ξ(f, v, η) is invariant under

scaling of f , also Ξav is scale invariant. By [BC13, Cor. 2.23] we have

Ef∼NC(Hn)

Ξav(f) = Ef∼Unif (Hn)

Ξav(f)

But Unif (Hn) is the push-forward distribution of ρ under the projection V → Hn,and hence, by (8.2.6), we have

Ef∼Unif (Hn)

Ξav(f) = E(f,v,η)∼ρ

Ξ(f, v, η),

which finally yields

E(f,v,η)∼ρ∗

Ξ(f, v, η) ≤ 10√π n E

(f,v,η)∼ρΞ(f, v, η).

This finishes the proof.

151

Page 166: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen
Page 167: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

Part III.

Statistics of real eigenpairs of realtensors

153

Page 168: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen
Page 169: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9. The expected number of realeigenpairs of a real Gaussiantensor

9.1. Motivation

In Part II we cared about complex eigenpairs of complex tensors, investigatedthem probabilistically and gave an efficient and stable algorithm to compute them.Nevertheless, in almost all applications the eigenpairs that are of interest are realeigenpairs. A key property that we exploited in all the arguments so far was thatgeneric complex tensors admit a constant number of eigenpairs (cf. Theorem 6.2.5),a property failing for real tensors and real eigenpairs; cf. Figure 9.1.1.

For A ∈ (Rn)⊗p let us denote

#R(A) := number of equivalence classes of A that

contain a real eigenpair (v, λ) ∈ (Rn\ 0)× R.

Although the function #R(A) is constant on some open semi-algebraic subsets of(Rn)⊗p, it ”jumps” when crossing the eigendiscriminant variety (Proposition 6.2.6),that has real codimension one in both (Rn)⊗p and Sp(Rn) (in the complex worldit has real codimension two!). This motivates the probabilistic study of real eigen-pairs. In the words of Ginibre [Gin65]:

”In the absence of any precise knowledge [...], one assumes a reasonableprobability distribution [...], from which one deduces statistical prop-erties [...]. [...] Apart from the intrinsic interest of the problem, onemay hope that the methods and results will provide further insight inthe cases of physical interest or suggest as yet lacking applications.”

The word ”reasonable” in Ginibre’s quote is somewhat ambiguous. In our situ-ation we interpret this insofar as that the probability distribution of our choiceshould be invariant under the orthogonal transformations defined in (9.2.5) below.The random variables Gaussian real tensor and Gaussian symmetric tensor (cf.Definition 2.4.10) satisfy this property.

155

Page 170: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9. The expected number of real eigenpairs of a real Gaussian tensor

0.0

0.1

0.2

0.3

0.00

0.05

0.10

0.15

0.20

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31

number of equivalence classes of real eigenpairs

number of equivalence classes of real eigenpairs

rela

tive

freq

uenc

yre

lativ

e fr

eque

ncy

Sample of 2000 general 5 × 5 × 5 tensors

Sample of 2000 symmetric 5 × 5 × 5 tensors

Figure 9.1.1.: The plots show samples of 2000 5×5×5 Gaussian tensors and Gaussian symmetrictensors. The R [R C15] script that generated the samples is appended in Appendix F.1. The

corresponding complex count is 25−12−1 = 31 (eigenpairs of real tensors are invariant under complex

conjugation; so the number of real eigenpairs has the same parity as the complex count).

The experiments in Figure 9.1.1 suggest that the variance of #R(A) for bothGaussian real tensors and Gaussian symmetric tensors is considerably small, sothat the expected number of real eigenpairs is a meaningful quantity. Note thatin the introduction we observed that the expected number of real eigenpairs is ofpotential interest in DTI (see Section 1.3), because the expected value is a goodbenchmark for the overall magnitude of real solutions, which is the basis of decidingthe degree of the diffusion constant polynomial Q(x). Following Ginibre, not onlyin this specific example, but in general the expected number of real eigenvalues—or typical number of eigenvalues—provides further insight into the structure ofthe real eigenpair problem. This is the motivation for the main problem of thischapter:

What is the expected number of eigenvalues of a Gaussian real tensorand of a Gaussian symmetric real tensor?

156

Page 171: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9.1. Motivation

In what follows we will compute this expectation answering questions Q7 and Q10from the introduction. Let us write

E(n, p) := EA∈(Rn)⊗p

A Gaussian

#R(A) (9.1.1)

andEsym(n, p) := E

A∈Sp(Rn)A Gaussian

#R(A). (9.1.2)

(Note that E(1, p) = Esym(1, p) = 1 for all p.) We give formulas for E(n, p) inTheorem 9.3.1 and for Esym(n, p) in Theorem 9.4.1 below and use those results tocompute the following identities for small n.

n E(n, p) = c(p) (1 + h(p−1)), n fixed, h ∈ R[X]. c(p) =

2√p p

12

3(p( p−1

p)32 +1)(p−1)

32

p32 ( p−1

p)32

p

4 2 p3−2 p2+p+1

2 p32

p32

5(2 p2( p−1

p)52 +2 p( p−1

p)52 +( p−1

p)52 +2)(p−1)

52

2 p52 ( p−1

p)52

p2

6 8 p6−16 p5+12 p4+3 p2+6 p−5

8 p72

p52

7(8 p4( p−1

p)72 +8 p3( p−1

p)72 +12 p2( p−1

p)72 +12 p( p−1

p)72 +7 ( p−1

p)72 +8 p)(p−1)

72

8 p92 ( p−1

p)72

p3

n Esym(n, p) = csym(p) (1 + hsym(p−1)), n fixed, hsym ∈ R[X]. csym(p) ≈

2√

3 p− 2 1.732 p12

3 1 + 4 (p−1)32√

3 p−22.309 p

4 29 p3−63 p2+48 p−12

2 (3 p−2)32

2.791 p32

5 1 + 2 (5 p−2)2(p−1)52

(3 p−2)52

3.208 p2

6 1339 p6−5946 p5+11175 p4−11240 p3+6360 p2−1920 p+240

8 (3 p−2)72

3.579 p52

7 1 + (1099 p4−2296 p3+2184 p2−992 p+176)(p−1)72

2 (3 p−2)92

3.917 p3

Table 9.1.1.: Formulas for E(n, p) and Esym(n, p) for 2 ≤ n ≤ 7. See Appendix F.2 for SAGE[S+16] scripts to compute identities for E(n, p) and Esym(n, p) for any n.

157

Page 172: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9. The expected number of real eigenpairs of a real Gaussian tensor

A strinking feature of Table 9.1.1 is that E(n, p) = p(n−1)

2 (1 + O(p−1)) for all

n displayed. This is no coincidence: Let us denote by EC(n, p) := (p−1)n−1p−2

the

number of complex eigenpairs of a general tensor in (Cn)⊗p (cf. Theorem 6.2.5).We show in Theorem 9.3.1 below that

limn→∞

E(n, p)√EC(n, p)

= limp→∞

E(n, p)√EC(n, p)

= 1

Thus, as a rule of thumb we can say

The number of real eigenpairs of a general real tensor typically is thesquare-root of the corresponding complex count.

For symmetric tensors we quote [AA13, Theorem 2.17], where it is stated that forn→∞ we have

Esym(n, p) ∼√

8n√π

√EC(n, p)(1 + o(1)),

but an asymptotic law for Esym(n, p), where p→∞, is yet to be proven. However,it is fair to say

Typically a real symmetric tensor has (about a factor of√

8n√π

) more real

eigenvalues than a real general tensor.

Compare also Figure 9.1.2 and Table 9.1.1.

Remark 9.1.1. If (v, λ) is a real eigenpair of A, Qi [Qi05, Qi07, LQY13] calls thenumber λ a Z-eigenvalue of A, if vTv = 1. We have the equality

#R(A) =

12

# Z-eigenvalues of A , if d is even and 0 is no eigenvalue.12

(# Z-eigenvalues of A+ 1) , if d is even and 0 is an eigenvalue.

# Z-eigenvalues of A , if d is odd.

9.2. Geometric framework II

Before we go over to compute E(n, p) and Esym(n, p) we will have to establish thegeometric framework for the real eigenpair problem. We denote by HR

n,d the vectorspace of real homogeneous polynomials of degree d in the n variables X1, . . . , Xn

The map from Definition 6.3.1 restricted to the reals is

f : (Rn)⊗p → (HRn,p−1)n, A 7→ fA(X) = AXp−1

Moreover, we denote the image of Sp(Rn) in (HRn,p−1)n by

∇n,p := fA | A ∈ Sp(Rn) ⊂ (HRn,p−1)n, (9.2.1)

158

Page 173: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9.2. Geometric framework II

1e+02

1e+04

1e+06

0 10 20 30 40p

f(p)

fcomplex countE(5,p)E_sym(5,p)

Figure 9.1.2.: The plot shows f(p) ∈ EC(5, p), E(5, p), Esym(5, p) for 2 ≤ p ≤ 40 in the loga-rithmic scale. We used R [R C15] to create this plot.

which is the linear space of polynomial systems that are gradients of some polyno-mial. Indeed, if QA(X) denotes the polynomial associated to A ∈ Sp(Rn) (2.2.3),then—following the discussion in Section 6.1—we have fA(X) = 1

p∇XQA . In

other words, the following diagram commutes.

Sp(Rn)Q //

f

((

HRn,p

p−1∇X( )

(HRn,p−1)n

The space HRn,p−1 and its subspace ∇n,p are endowed with the Bombieri-Weyl

inner product; that is, the restriction of Definition 6.3.4 to the real numbers. Thefollowing proposition is Proposition 6.3.6 specialized to the real numbers.

Proposition 9.2.1. Let x ∈ (Rn\ 0) and define Z(x) :=f ∈ HR

n,d | f(x) = 0

,

R(x) :=f ∈ HR

n,d | f(x) = 0,Dxf = 0

, L(x) := R(x)⊥ ∩ Z(x), C(x) := Z(x)⊥.Then,

HRn,d = C(x)⊕ L(x)⊕R(x). (9.2.2)

is an orthogonal decomposition of HRn,d. Moreover, the map given by x⊥ → L(x),

a 7→√d 〈X, x〉daTX is an isometry. Here x⊥ denotes the orthogonal complement

of x in Rn.

159

Page 174: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9. The expected number of real eigenpairs of a real Gaussian tensor

In the following we put d = p − 1 and, as long as it does not cause confusion,abbreviate

HR := HRn,d and ∇ := ∇n,p

Futhermore, we define the real analogue of the solution manifold introduced inSubsection 6.3.3. Let ` be an auxiliary variable and consider

F : (HR)n → R[X, `]n, f 7→ f(X)− `X,

which is the restriction of the map from Definition 6.3.3 to the real numbers. Asbefore we set Ff (X, `) := F (f)(X, `) = f(X) − `X. The real solution manifoldand the manifold of well-posed real triples are defined as

V R :=

(f, v, λ) ∈ (HR)n × S(Rn)× R | f(v) = λv,

WR :=

(f, v, λ) ∈ V R | rk D(v,λ)Ff = n,

and in a similar fashion we define

V ∇ := V R ∩ (∇× S(Rn)× R),

W∇ := WR ∩ (∇× S(Rn)× R).

As in (6.3.3) we define the projections

π1 : V R → (HR)n, (f, v, λ) 7→ f, (9.2.3)

π2 : V R → S(Rn)× R, (f, v, λ) 7→ (v, λ). (9.2.4)

The orthogonal group O(n) acts on V R via

U.(f, v, λ) := (U(f U−1), Uv, λ), U ∈ O(n). (9.2.5)

By the same arguments as for Proposition 6.3.9, the set WR is invariant under thisgroup action and O(n) acts by isometries. Let us write f ∈ ∇ as f = ∇XQ . Thenwe have ∇X(Q U−1) = U∇U−1XQ , by the chain rule. Hence, U(f U−1) ∈ ∇,which shows that V ∇ (and W∇) are also invariant under the group action.

Lemma 9.2.2. 1. V R is a Riemannian manifold of dimension dim(HR)n. Thetangent space of V R at (f, v, λ) ∈ WR is given by

T(f,v,λ)VR

(•

f,•v,

λ) ∈ (HR)n × v⊥ × R | ( •v,•

λ) = −D(v,λ)Ff |−1v⊥×R

f(v).

2. V ∇ is a Riemannian manifold of dimension dim∇. For (f, v, λ) ∈ W∇:

T(f,v,λ)V∇ =

(•

f,•v,

λ) ∈ ∇× v⊥ × R | ( •v,•

λ) = −D(v,λ)Ff |−1v⊥×R

f(v).

160

Page 175: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9.2. Geometric framework II

Proof. We take the proof of Lemma 6.3.8, replace the complex numbers by thereals and take into account that the tangent space to the real sphere is TvS(Rn) =v⊥.

We later also need the following lemma, similar to Lemma 6.3.10.

Lemma 9.2.3. Let e1 := (1, 0, . . . , 0) ∈ S(Rn).

1. Let (f, v, λ) ∈ V R and denote evalv : (HR)n → R, f 7→ f(v). Then

det([

D(v,λ)Ff |−1v⊥×Revalv

][D(v,λ)Ff |−1

v⊥×Revalv]T )

= det(D(e1,λ)Ff |−2

e⊥1 ×R

).

2. Let (f, v, λ) ∈ V ∇ and denote eval∇v : ∇ → R, f 7→ f(v). Then

det([

D(v,λ)Ff |−1v⊥×Reval∇v

][D(v,λ)Ff |−1

v⊥×Reval∇v]T )

= 1pn−1 det

(D(e1,λ)Ff |−2

e⊥1 ×R

).

Proof. We first make some general considerations. Let U be an orthogonal mapwith Uv = e1. Then, by orthogonal invariance, (U(f U−1), e1, λ) ∈ V R,

D(v,λ)Ff = U[De1f − λI, −e1

] [U−1 00 1

]= UD(e1,λ)Ff

[U−1 0

0 1

].

By Lemma 6.3.10, we have det(evalvevalTv ) = det(evale1evalTe1). This implies

det([

D(v,λ)Ff |−1v⊥×Revalv

][D(v,λ)Ff |−1

v⊥×Revalv]T )

= det(D(e1,λ)Ff |−1

e⊥1 ×RD(e1,λ)Ff |−Te⊥1 ×R

)det(evale1evalTe1

)(9.2.6)

Now we prove 1. Write f in the Bombieri-Weyl basis (see Proposition 9.2.1).

f = cXd1 + terms in Xd−1

1 ,

where c ∈ Rn. Then evale1(f) = c, which shows that evale1 is an orthogonalprojection, implying evale1 evalTe1 = idRn . Using (9.2.6) proves 1.

Since V ∇ is invariant under orthogonal transformation and since eval∇v is justthe restriction of evalv to ∇, we may use equation (9.2.6) for 2., as well. Let f ∈ ∇and write f = 1

p∇XQ . Expanding Q in the Bombieri-Weyl basis yields

Q = cXp1 +√pXp−1

1 aT (X2, . . . , Xn) + terms in Xp−21 ,

with c ∈ R and a ∈ Rn−1. This shows that eval∇e1(f) = [c, 1√pa]T , which implies

det(eval∇e1(eval∇e1)T ) = p−(n−1). Combining with (9.2.6) shows the second assertion.

161

Page 176: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9. The expected number of real eigenpairs of a real Gaussian tensor

9.3. General tensors

We now turn to the computation of E(n, p) as defined in (9.1.1). For p = 2, thematrix case, the expected value of #R(A) for a real Gaussian matrix was computedin [EKS94] and the ideas in this article have inspired the methods of this sectionsignificantly. For the sake of completeness we will include the results from [EKS94]in our main result, Theorem 9.3.1.

Theorem 9.3.1. 1. For n > 1 the number E(n, p) equals

2n−1√p− 1

nΓ(n− 1

2)

√π pn−

12 Γ(n)

[2(n− 1)F

(1, n− 1

2, 3

2, p−2

p

)+ F

(1, n− 1

2, n+1

2, 1p

)],

where F (a, b, c, x) is Gauss’ hypergeometric function; see (C.4.3).

2. The generating function of the E(n, p) for fixed p is

∞∑n=1

E(n, p) zn =z(

1− z√p− 1 + z

√p− 1− 2z

√p− 1 + 1

)(1− z2)(1− z

√p− 1)

.

3. If n = 2k > 1 is even, E(n, p) is equal to

1√π

Γ(n− 12

)

Γ(n−1)

( √dn

√d+1

n−2∑j=0

(n−2j

)(−d−1d+1

)jj+ 1

2

+ 2n−2

k−1∑j=0

(−1)j(k−1j

)( 1d+1)

j+k−12

j+k− 12

).

If n = 2k + 1 > 1 is odd, E(n, p) is equal to

1√π

Γ(n− 12

)

Γ(n−1)

( √dn

√d+1

n−2∑j=0

(n−2j

)(−d−1d+1

)jj+ 1

2

+ 2n−2

k−1∑j=0

(−1)j(k−1j

)1−(

dd+1

)j+k+12

j+k+ 12

).

4. Let EC(n, p) denote the complex count of eigenpairs (Theorem 6.2.5). Theasymptotic behaviour of the E(n, p) for fixed p and large n is

E(n, p)√EC(n, p)

n→∞−→

√2π, if p = 2.

1, if p > 2.

5. The asymptotic behaviour of the E(n, p) for fixed n and large p is

E(n, p)√EC(n, p)

p→∞−→ 1, n > 1.

162

Page 177: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9.3. General tensors

Remark 9.3.2. For the matrix case (p = 2) the first assertion is in [EKS94, Sec-tion 5], the third assertion is [EKS94, Corollary 5.2] and the fifth assertion is[EKS94, Theorem 5.1].

9.3.1. Proof of Theorem 9.3.1

By construction, for any A ∈ (Hn)⊗p the number of equivalence classes of realeigenpairs of fA equals #R(A). Hence, there is no harm in using the notation#R(f) for this number, as well:

#R(f) := number of equivalence classes of f that

contain a real eigenpair (v, λ) ∈ (Rn\ 0)× R.

In Lemma 6.4.3 (2) we have shown that for Gaussian A ∈ (Rn)⊗p we have f ∼N((HR

n,p−1)n). This implies

E(n, p) = Ef∼N((HR

n,d)n)#R(f), where d = p− 1. (9.3.1)

In what follows we will compute the quantity on the right-hand side of that equa-tion. Recall that Γ(n, x) and γ(n, x) are the upper and lower incomplete gammafunctions, see (C.2.5). For λ ∈ R we put K0,d(λ) = 1 and define

Kn,d(λ) :=

√dn

Γ(n)

(eλ2

2d Γ

(n,λ2

d

)+ 2n−1

(λ2

2d

)n2

γ

(n

2,λ2

2d

)), n ≥ 1. (9.3.2)

The main step on the way to prove Theorem 9.3.1 is the following proposition.

Proposition 9.3.3. We have Ef∼N((HR)n) #R(f) = Eλ∼N(0,1)Kn−1,d(λ).

Proof. For n = 1 we have #R(f) = 1, so this case is immediate. The prooffor n > 1 is similar to the proof for Lemma 6.5.21. Recall the definition of theprojections π1, π2 from (9.2.3), (9.2.4). For any f we have #R(f) = 1

2|π−1

1 (f)|(because, if (v, λ) is an eigenpair of f , also (−v, (−1)d−1λ) is an eigenpair of f).Using the the integral formula from Corollary B.4.5 together with Lemma B.4.3,Lemma 9.2.2 (1) and Lemma 9.2.3 (1) we have that Ef∼N((HR)n) #R(f) equals

1

2

∫(v,λ)∈S(Rn)×R

( ∫f∈π1(π−1

2 (v,λ))

∣∣∣det D(e1,λ)Ff |e⊥1 ×R∣∣∣ ϕ(f) df

)d(v, λ);

where ϕ denotes the density of the standard normal distribution.

1Note that the variances of the real and imaginary part of complex variables in Lemma 6.5.2was defined to be σ2 = 1

2 , while here the variance is σ2 = 1.

163

Page 178: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9. The expected number of real eigenpairs of a real Gaussian tensor

Note that the inner integral in the above expression is independent of v and canbe written as

E(λ) :=

∫f∈π1(π−1

2 (e1,λ))

∣∣∣det D(e1,λ)Ff |e⊥1 ×R∣∣∣ ϕ(f) df, (9.3.3)

so that

Ef∼N((HR)n)

#R(f) =

√πn

Γ(n2)

∫λ∈R

E(λ) dλ, (9.3.4)

where we have used that volS(Rn) = 2√πn

Γ(n2

). The assertion is implied using the

following.

Claim. For any λ ∈ R we have

E(λ) =

√dn−1

Γ(n2)

√πn

Γ(n− 1)

(eλ2

2d Γ(n− 1, λ2

d) + 2n−2

(λ2

2d

)n−12γ(n−1

2, λ

2

2d

))ϕ(λ),

where ϕ denotes the density function of the standard normal distribution.

Proof of the claim. Note that f ∈ π1(π−12 (e1, λ)), if and only if f(e1) = λe1. Fix

such an f and let

R := h ∈ HR | h(e1) = 0,De1h = 0.

By Proposition 9.2.1, there exist uniquely determined h ∈ Rn and M ∈ Rn×(n−1)

such that we can orthogonally decompose f as

f = Xd1λe1 +Xd−1

1

√dM (X2, . . . , Xn)T + h, (9.3.5)

so that (cf. (6.3.2))

D(e1,λ)Ff =

[(d− 1)λ

√d · a −1

0√dA− λIn−1 0

]∈ Rn×(n+1),

where a ∈ R1×(n−1) is the first row of M and A ∈ R(n−1)×(n−1) is the matrix thatis obtained by removing the first row of M . Hence,

det D(e1,λ)Ff |e⊥1 ×R = − det (√dA− λIn−1). (9.3.6)

The summands in (9.3.5) are pairwise orthogonal, which implies that

ϕ(f) = ϕRn(λe1)ϕR(n−1)×(n−1)(A)ϕRn−1(a)ϕR(h) (9.3.7)

=1

√2π

n−1 · ϕ(λ)ϕR(n−1)×(n−1)(A)ϕRn−1(a)ϕR(h).

164

Page 179: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9.3. General tensors

By (9.3.6), det D(e1,λ)Ff |e⊥1 ×R is independent of a and h. Thus,

E(λ)

=ϕ(λ)√

2πn−1

∫A,a,h

∣∣∣det (√dA− λIn−1)

∣∣∣ϕR(n−1)×(n−1)(A)ϕRn−1(a)ϕR(h) d(A, a, h)

=ϕ(λ)√

2πn−1 E

A∼N(R(n−1)×(n−1))

∣∣∣det(√

dA− λIn−1

)∣∣∣=

√dn−1

√2π

n−1 ϕ(λ) EA∼N(R(n−1)×(n−1))

∣∣∣∣det

(A− λ√

dIn−1

)∣∣∣∣=

√dn−1

√πn ϕ(λ)

Γ(n2)

Γ(n− 1)

(eλ2

2d Γ

(n− 1,

λ2

d

)+ 2n−2

(λ2

2d

)n−12

γ

(n− 1

2,λ2

2d

));

the last line by Theorem E.2.1.

Proof of Theorem 9.3.1 (1)

The case n = 1 is trivial. Let n > 1. From Proposition 9.3.3 we get that

Ef∼N((HR

n,d)n)#R(f) =

√dn−1

√2π Γ(n− 1)

∫λ∈R

e−λ2

2(1−d−1) Γ

(n− 1, λ

2

d

)dλ

+

√2n−3

√2π Γ(n− 1)

∫λ∈R

e−λ2

2 |λ|n−1 γ(n−1

2, λ

2

2d

)dλ.

Making the substitution of variables x := λ2, such that dλ = dx2√x, we obtain∫

λ∈R

e−λ2

2(1−d−1) Γ

(n− 1, λ

2

d

)dλ =

∫x>0

x−12 e−

x2

(1−d−1) Γ(n− 1, x

d

)dx

=2n+1/2Γ(n− 1

2)

dn−1(1 + d−1)n−1/2F(1, n− 1

2, 3

2, d−1d+1

),

the last line by Proposition C.2.2. In the same way, but using Proposition C.2.2 (2),we get∫λ∈R

e−λ2

2 |λ|n−1γ(n−1

2, λ

2

2d

)dλ =

∫x>0

e−x2 x

n−22 γ

(n−1

2, x

2d

)dλ

=

√2n+2

Γ(n− 12)

√dn−1

(n− 1)(1 + d−1)n−12

F(1, n− 1

2, n+1

2, 1d+1

).

165

Page 180: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9. The expected number of real eigenpairs of a real Gaussian tensor

Hence,

Ef∼N((HR

n,d)n)#R(f)

=

√dn−1

√2π Γ(n− 1)

2n+1/2 Γ(n− 12)

dn−1(1 + d−1)n−1/2F(1, n− 1

2, 3

2, d−1d+1

)+

√2n−3

√2π Γ(n− 1)

√2n+2

Γ(n− 12)

√dn−1

(n− 1) (1 + d−1)n−1/2F(1, n− 1

2, n+1

2, 1d+1

)=

2n−1√dn

√π(d+ 1)n−

12

Γ(n− 12)

Γ(n)

[2(n− 1)F

(1, n− 1

2; 3

2; d−1d+1

)+ F

(1, n− 1

2, n+1

2, 1d+1

)];

for the last line we have used that (n − 1)Γ(n − 1) = Γ(n). By (9.3.1) we haveE(n, p) = Ef∼N((HR

n,d)n) #R(f), for d = p− 1. Hence, substituting d by p− 1 in the

above formula we obtain the desired expression for E(n, p).

Proof of Theorem 9.3.1 (2)

We proceed as in the proof for [EKS94, Theorem 5.1]. Recall from Proposition 9.3.3that

E(n, p) = Ef∼N((HR)n)

#R(f) = Eλ∼N(0,1)

Kn−1,d(λ), (9.3.8)

where d = p− 1. We first compute the generating function of the Kn,d(λ). Notethat, since K1,d = 1, we have

∞∑n=0

Kn,dzn = 1 +

∞∑n=1

Kn,dzn.

Fix d ≥ 1, λ ∈ R and − 1√d< z < 1√

d. In the definition (9.3.2) of Kn,d(λ)

replace the gamma functions by the respective integrals from (C.2.5). Since theintegrands in the gamma function are all positive, we may apply Fubini’s theoremto interchange summation and integration. This yields

∞∑n=1

Kn,d(λ) zn

=eλ2

2d

∫ ∞t=λ2

d

e−t

(∞∑n=1

√dn

Γ(n)tn−1zn

)dt+

∫ λ2

2d

t=0

e−t

(∞∑n=1

√dn

Γ(n)2n−1

(λ2

2d

)n2tn2−1zn

)dt

=eλ2

2d z√d

∫ ∞t=λ2

d

e−t(1−z√d) dt+

1√2z |λ|

∫ λ2

2d

t=0

e−t+z√

2t |λ|√t

dt

166

Page 181: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9.3. General tensors

In the right hand integral we now substitute s2 = t, so that

∞∑n=1

Kn,d(λ) zn

=eλ2

2d z√de−

λ2

d(1−z

√d)

1− z√d

+

√π√2z |λ| e

z2λ2

2

(2√π

∫ |λ|√2d

s=0

e−(s− z |λ|√

2

)2ds

)

=z√d

1− z√de−λ2 (1−2z

√d)

2d +

√π√2z |λ| e

z2λ2

2

(erf(z |λ|√

2

)+ erf

((1−z

√d) |λ|√

2d

)),

where

erf(x) :=2√π

∫ x

0

e−t2

dt

denotes the error function (see (C.1.1)). By (9.3.8) we have

∞∑n=1

E(n, p) zn = z Eλ∼N(0,1)

∞∑n=0

Kn,d(λ) zn

and hence∞∑n=1

E(n, p) zn = z (1 + s1 + s2),

where

s1 =z√d

1− z√d

Eλ∼N(0,1)

e−λ2 (1−2z

√d)

2d =zd

(1− z√d)√

1 + d− 2z√d

(for the this we used Eλ∼N(0,1) e−aλ2

2 = 1√a+1

), and

s2 =

√π√2z Eλ∼N(0,1)

|λ| ez2λ2

2

[erf

(z |λ|√

2

)+ erf

((1− z

√d) |λ|√

2d

)]calc=

z2

1− z2+

z(1− z√d)

(1− z2)√

1 + d− 2z√d.

Thus

s1 + s2 + s3calc=

1− z√d+ z

√d− 2z

√d+ 1

(1− z2)(1− z√d)

.

Substituting d = p− 1 finishes the proof.

Remark 9.3.4. The symbolcalc= indicates we have used MAPLE to compute the

respective equality. [Map14].

167

Page 182: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9. The expected number of real eigenpairs of a real Gaussian tensor

Proof of Theorem 9.3.1 (3)

Recall from Theorem 9.3.1 (1) that E(n, p) equals

2n−1√p− 1

nΓ(n− 1

2)

√π pn−

12 Γ(n)

[2(n− 1)F

(1, n− 1

2, 3

2, p−2

p

)+ F

(1, n− 1

2, n+1

2, 1p

)].

By Proposition C.4.2 (4) we can express the first hypergeometric function as

F(

1, n− 12, 3

2, p−2

p

)=pn−1

2n

n−2∑j=0

(n−2j

) 1

j + 12

(−p−2

p

)j.

For the second hypergeometric function we now distinguish two cases. If n = 2kis even, by Proposition C.4.2 (5) we have

F(

1, n− 12, n+1

2, 1p

)=

(n− 1)pn2

2(p− 1)n2

k−1∑j=0

(k−1j

) 1

j + k − 12

(−1p

)j, (9.3.9)

which shows that E(n, p) equals

1√π

Γ(n− 12)

Γ(n− 1)

(√p− 1

n

√p

n−2∑j=0

(n−2j

)(− p−2p )

j

j+ 12

+ 2n−2

k−1∑j=0

(−1)j(k−1j

)( 1p)j+k− 1

2

j+k− 12

).

If n = 2k + 1 is odd, then by Proposition C.4.2 (6) we have

F(

1, n− 12; n+1

2; 1p

)=

(n− 1)pn−12

2(p− 1)n2

k−1∑j=0

(−1)j(k−1j

)1−(p−1p

)j+k+ 12

j + k + 12

and therefore in this case E(n, p) equals

√πΓ(n− 1

2)

Γ(n− 1)

(√p− 1

n

√p

n−2∑j=0

(n−2j

)(− p−2p )

j

j+ 12

+ 2n−2

k−1∑j=0

(−1)j(k−1j

)1−( p−1p )

j+k+12

j+k+ 12

),

which finishes the proof (recall p = d+ 1).

Proof of Theorem 9.3.1 (4)

The case p = 2 was proven in [EKS94, Corollary 5.2]. For p > 2 and for large nwe have

EC(n, p) =(p− 1)n − 1

p− 2∼ (p− 1)n

p− 2.

168

Page 183: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9.3. General tensors

Using the expression from Theorem 9.3.1 (1) see that

limn→∞

E(n, p)√EC(n, p)

= limn→∞

2n−1√p− 2Γ(n− 1

2)

√πpn−

12 Γ(n)

[2(n− 1)F

(1, n− 1

2, 3

2, p−2

p

)+ F

(1, n− 1

2, n+1

2, 1p

)].

From description (C.4.3) of F (a, b, c, x) we see that

F(

1, n− 12, n+1

2, 1p

)=∞∑k=0

(n− 12)k

(n+12

)k

1

pk≤

∞∑k=0

(2

p

)k=

p

p− 2. (9.3.10)

By [SOJ00, 43:6:12] we have

Γ(n) ∼ Γ(n− 1

2)√n

for large n. Together with (9.3.10) and p > 2 this shows that

2n−1√p− 2 Γ(n− 1

2)

√πpn−

12 Γ(n)

F(

1, n− 12, n+1

2, 1p

)n→∞−→ 0.

and hence

limn→∞

E(n, p)√EC(n, p)

= limn→∞

2n−1√p− 2 Γ(n− 1

2)

√πpn−

12 Γ(n)

2(n− 1)F(

1, n− 12, 3

2, p−2

p

)= lim

n→∞

2n√p− 2 Γ(n− 1

2)

√πpn−

12 Γ(n− 1)

F(

1, n− 12, 3

2, p−2

p

).

Using Proposition C.4.2 (1) we replace F (1, n− 12, 3

2, p−2

p) with the incomplete beta

function:

limn→∞

E(n, p)√EC(n, p)

= limn→∞

2n√p− 2 Γ(n− 1

2)

√πpn−

12 Γ(n− 1)

pn−1

2n

√p

√p− 2

B(

12, n− 1, p−2

p

)= lim

n→∞

Γ(n− 12)

√π Γ(n− 1)

B(

12, n− 1, p−2

p

)Proposition C.3.2 tells us that for large n we have

B

(1

2, n− 1,

p− 2

p

)∼√π Γ(n− 1)

Γ(n− 12)

(1

2erfc (−ω) + b

),

169

Page 184: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9. The expected number of real eigenpairs of a real Gaussian tensor

where erfc(z) is the complementary error function and

ω =(− 1

2ln (2n− 1)− 1

2ln(p− 2

p

)+ (n− 1) ln

(n−1n− 1

2

)− (n− 1) ln

(2p

)) 12

and

b =(

p−22πp(n−1)

) 12(

2(n− 12

)

p(n−1)

)n−1 (n−1

12−(n−1

2)p−2p

+√

2n−1ω

)(1 +O(n−1)).

We have −ω = Θ(n) and hence ωn→∞−→ −∞ and b

n→∞−→ 0, which shows that

limn→∞

E(n, p)√EC(n, p)

= limn→∞

(erfc(−ω)

2+ b

)= 1.

This finishes the proof.

Proof of Theorem 9.3.1 (5)

From Theorem 9.3.1 (2) we obtain

limp→∞

E(n, p)√EC(n, p)

= limp→∞

1√(p−1)n−1

p−2

1√π

Γ(n− 12)

Γ(n− 1)

√p− 1

n

√p

n−2∑j=0

(n−2j

)(−p−2p

)jj + 1

2

For large p we have

(p− 1)n − 1

p− 2∼ (p− 1)n−1 and

p− 2

p∼ 1,

which shows that

limp→∞

E(n, p)√EC(n, p)

=1√π

Γ(n− 12)

Γ(n− 1)

n−2∑j=0

(n−2j

)(−1)j

j + 12

=1√π

Γ(n− 12)

Γ(n− 1)B(

12, n− 1, 1

);

the last equality by [SOJ00, 58:4:3]. By [SOJ00, 58:1:1] we have

B

(1

2, n− 1, 1

)=√π

Γ(n− 1)

Γ(n− 12),

from which the claim follows.

170

Page 185: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9.4. Symmetric tensors

9.4. Symmetric tensors

Next we compute the expectation for symmetric tensors Esym(n, p), see (9.1.2).

Theorem 9.4.1. Denote by F (a, b, c, x) Gauss’ hypergeometric function (C.4.3).For all integers n ≥ 2 and p ≥ 2 the following holds.

1. If n = 2m+ 1, we have

Esym(n, p) = 1 +

√π√p− 1

n−2√3p− 2∏n

i=1 Γ(i2

) ∑1≤i,j≤m

αi,j φi,j(p),

where αi,j := det([Γ(r + s− 1

2

)]1≤r≤m,r 6=i,1≤s≤m,s 6=j) and

φi,j(p) :=Γ(i+ j − 1

2

)3−2i−2j1−2i+2j

(− 3p−24(p−1)

)i+j−1F(

2− 2i, 1− 2j, 52− i− j, 3p−2

4(p−1)

).

2. If n = 2m, we have

Esym(n, p) =

√p− 1

n−2√3p− 2∏n

i=1 Γ(i2

) ∑0≤i,j≤m−1

βi,j ψi,j(p),

where βi,j := det([Γ(r + s+ 1

2

)]0≤r≤m−1,r 6=i,0≤s≤m−1,s 6=j) and

ψ0,j(p) :=

√π(2j + 1)!

(−1)j22j j!

(p− 2)jp

(p− 1)j(3p− 2)F(−j, 1

2, 3

2, −p2

(3p−2)(p−2)

)−

Γ(j + 1

2

)2(− 3p−2

4(p−1))j+1

and, for i > 0,

ψi,j(p) :=Γ(i+ j + 1

2

)(1−2i−2j)(1−2i+2j)

(− 3p−24(p−1)

)i+jF(−2j,−2i+ 1, 3

2− i− j, 3p−2

4(p−1)

)

Remark 9.4.2. 1. Note that symmetrices (p = 2) are fully real, meaning that anyn × n real symmetric matrices always have n real eigenvalues—thus it is notsurprising that the formulas in Theorem 9.4.1 yield Esym(n, 2) = n (althoughthe verification of this is not trivial).

2. For all k ∈ N we have Γ(k + 12) = q

√π for some q ∈ Q; see [SOJ00, 43:4:3].

A simple count reveals that that in both formulas of Esym(n, p) the exponentof√π in the numerator equals the exponent of

√π in the denominator. This

implies that

Esym(2m+ 1, p) ∈ Q(√

(p− 1)(3p− 2))

and Esym(2m, p) ∈ Q(√

3p− 2).

171

Page 186: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9. The expected number of real eigenpairs of a real Gaussian tensor

9.4.1. Proof of Theorem 9.4.1

By Lemma 6.4.3 (3), for Gaussian A ∈ Sp(Rn) we have QA ∼ N(HRn,p). However,

since eigenpairs are defined for the gradient of Q, we need the following definition.

Definition 9.4.3. We define N(∇n,p) to be the push-forward distribution (seeDefinition 2.4.1) of N(HR

n,p) on the linear space ∇n,p (9.2.1) that is induced by

HRn,p → ∇n,p, QA 7→ 1

p∇XQA .

Following the same arguments as in Subsection 9.3.1 we have

#R(A) = #R(1p∇XQA ).

We see that #R(A) is constant on the fibers of the gradient map and, consequently,

EA∈Sp(Rn) Gaussian

#R(A) = Ef∼N(∇n,p)

#R(f).

The analogue of Proposition 9.3.3 for symmetric tensors is the following proposition(compare [DH16, Proposition 4.3]).

Proposition 9.4.4. Recall from Definition 2.4.9 the definition of the GaussianOrthogonal Ensemble. We have

Ef∼N(∇n,p)

#R(f) =

√π

√2n−1

Γ(n2)

EM∼GOE(n−1)λ∼N(0,1)

∣∣ det(√

2(p− 1)M −√p λIn−1

)|.

Proof. Let us denote the density of f ∈ ∇n,p by ϕ∇(f). We proceed as in the proofof Proposition 9.3.3 until (9.3.4) to see that

Ef∼N(∇n,p)

#R(f) =

√πn

Γ(n2)

∫λ∈R

E∇(λ) dλ,

where

E∇(λ) :=√pn−1

∫f∈π1(π−1

2 (e1,λ))

∣∣∣det D(e1,λ)Ff |e⊥1 ×R∣∣∣ ϕ∇(f) df.

(the additional√pn−1 is due to Lemma 9.2.3 (2)). The assertion is implied using

the following.

Claim. For any λ ∈ R we have

E∇(λ) =ϕ(λ)√

2πn−1 E

M∼GOE(n−1)|det

(√2(p− 1)M −√p λ In−1

)|,

where ϕ denotes the density function of the standard normal distribution.

172

Page 187: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9.4. Symmetric tensors

Proof of the claim. We have f ∈ π1(π−12 (e1, λ)), if and only if f(e1) = λe1. Fix

such an f and let Q ∈ HRn,p with f = p−1∇XQ . By Proposition 9.2.1, there exist

uniquely determined h ∈ R = h ∈ HRn,p | h(e1) = 0,De1h = 0 and a ∈ Rn−1

such that we can orthogonally decompose Q as

Q = cXp1 +√pXp−1

1 aT (X2, . . . , Xn)T + h,

so that, by Definition 9.4.3, ϕ∇(f) = ϕ(c)ϕRn−1(a)ϕR(h). Moreover,

f = 1p∇XQ =

cXp−1

1 + p−1√pXp−2

1 aT (X2, . . . , Xn)T + 1p

dhdX1

1√pXp−1

1 a1 + 1p

dhdX2

...1√pXp−1

1 an−1 + 1p

dhdXn

Since f(e1) = λe1, this equation simplifies to

f = 1p∇XQ =

λXp−1

1 + 1p

dhdX1

1p

dhdX2

...1p

dhdXn

.

Note that

∀f ∈ π1(π−12 (e1, λ)) : ϕ∇(f) = ϕ(λ)ϕRn−1(0)ϕR(h). (9.4.1)

We make an orthogonal decomposition of R. Let R1 := h ∈ R | He1h = 0, whereHe1h = De1DXh is the Hessian of h at e1. Furthermore, let R2 denote the orthog-onal complement of R1 in R. Write h = h1 + h2 according to that decomposition.Then

De1f =

(p− 1)λ 0 . . . 0...

...0 0 . . . 0

+ 1p

He1h2. (9.4.2)

Let us write h2 in the Bombieri-Weyl basis.

h2 =

√p(p−1)

2Xp−2

1

n∑i=2

mi,iX2i +

√p(p− 1)Xp−2

1

∑1<i<j≤n

m′i,jXiXj.

Since Q ∼ N(Hn,p), we have mi,i ∼ N(0, 1) and m′i,j ∼ N(0, 1). Therefore, we

have mi,j := 1√2m′i,j ∼ N(0, 1

2), meaning that (see Definition 2.4.9)

M := (mi,j) ∼ GOE(n) (9.4.3)

173

Page 188: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9. The expected number of real eigenpairs of a real Gaussian tensor

Since h2 =√

p(p−1)2

Xp−21 (X2, . . . , Xn)M(X2, . . . , Xn)T , we have

He1h2 =

√p(p−1)

2

[0 00 2M

]and thus, by (9.4.2),

De1f =

[(p− 1)λ 0

0√

2(p−1)p

M

].

so that (cf. (6.3.2))

D(e1,λ)Ff =

[(p− 2)λ 0 −1

0√

2(p−1)p

M − λIn−1 0

]∈ Rn×(n+1),

Hence,

det D(e1,λ)Ff |e⊥1 ×R = − det(√

2(p−1)p

M − λIn−1

).

Using (9.4.1) and (9.4.3) this implies

E∇(λ) =√pn−1ϕ(λ)ϕRn−1(0) E

M∼GOE(n−1)|det

(√2(p−1)p

M − λIn−1

)|

=ϕ(λ)√

2πn−1 E

M∼GOE(n−1)|det

(√2(p− 1)M −√p λ In−1

)|

This proves the claim and finishes the proof.

Proof of Theorem 9.4.1 (1)

In this case we have n = 2m+ 1. By Proposition 9.4.4 we have

Esym(n, p) =

√π

√2n−1

Γ(n2)

Eλ∼N(0,1)

EM∼GOE(n−1)

∣∣ det(√

2(p− 1)M −√p λ In−1

)∣∣and by setting

σ2 =p

2(p− 1)

and u := σλ this equation becomes

Esym(n, p) =

√π√p− 1

n−1

Γ(n2)

Eu∼N(0,σ2)

EM∼GOE(2m)

∣∣ det(M − uI)∣∣ (9.4.4)

174

Page 189: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9.4. Symmetric tensors

Letαi,j := det

[Γ(r + s− 1

2

)]1≤r≤m,r 6=i1≤s≤m,s 6=j

and Pi(x) = Hei(x) is the i-th probabilist’s Hermite polynomial (D.1.2). We knowfrom Theorem E.3.2 (1) that

EM∼GOE(2m)

∣∣ det(M − uI)∣∣ (9.4.5)

= EM∼GOE(2m)

det(M − uI) +

√2πe−

u2

2∏n−1i=1 Γ

(i2

) ∑1≤i,j≤m

αi,j det

[P2i−1(u) P2j(u)P2i−2(u) P2j−1(u)

].

The goals is to compute the expectation over u of the two summands in (9.4.5).We need the following lemma.

Lemma 9.4.5. For all m ≥ 1 we have

1.√π (2(m− 1))!(2m− 1) = 22m−1 Γ

(2m+1

2

)Γ(m).

2.√πm+1 [∏m−1

i=1 (2i)!]

(2m)! = m!2m(m+1)∏2m+1

i=1 Γ(i2

).

Proof. Throughout the proof we will have to use the identities

Γ(12) =√π, Γ(3

2) =

√π

2, Γ(x+ 1) = xΓ(x)

(see (C.2.2) and (C.2.3)). We prove both claims using an induction argument. For(1) and m = 1 we have

√π (2(m− 1))!(2m− 1)

22m−1 Γ(

2m+12

)Γ(m)

=

√π√π

= 1.

For m > 1, using the induction hypothesis, we have

√π (2(m− 1))!(2m− 1)

22m−1 Γ(

2m+12

)Γ(m)

=(2m− 2)(2m− 3)

4

2m− 1

2m− 3

Γ(

2m−12

)Γ(

2m+12

) Γ(m− 1)

Γ(m)

=(2m− 2)(2m− 1)

4

2

2m− 1

1

m− 1= 1

For (2) and m = 1 we have

√πm+1

[m−1∏i=1

(2i)!

](2m)! = 2π = m!2m(m+1)

2m+1∏i=1

Γ(i2

)

175

Page 190: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9. The expected number of real eigenpairs of a real Gaussian tensor

For m > 1, using the induction hypothesis, we have

√πm+1 [∏m−1

i=1 (2i)!]

(2m)!

m!2m(m+1)∏2m+1

i=1 Γ(i2

) =

√π (2(m− 1))!2m(2m− 1)

m22m Γ(

2m+12

)Γ(m)

=

√π (2(m− 1))!(2m− 1)

22m−1 Γ(

2m+12

)Γ(m)

= 1,

the last equality because of (1). This finishes the proof.

Using Lemma 9.4.5 we can prove the following.

Lemma 9.4.6. We have

√π√p− 1

n−1

Γ(n2)

Eu∼N(0,σ2)

EM∼GOE(2m)

det(M − uI) = 1

Proof. We use Theorem E.3.1 (1) to get

EM∼GOE(2m)

det(M − uI) =

√πm [∏m−1

i=1 (2i)!]

2m(m+1)∏2m

i=1 Γ(i2

) H2m(u),

where H2m(x) is the 2m-the physicist’s Hermite polynomial (D.1.1). We have, by

Lemma D.3.1, that Eu∼N(0,σ2) H2m(u) = (2m)!m!

(2σ2 − 1)m. Plugging in σ2 = p2(p−1)

yields

Eu∼N(0,σ2)

H2m(u) =(2m)!

m!(p− 1)m

Thus

Eu∼N(0,σ2)

EM∼GOE(2m)

det(M − uI) =

√πm [∏m−1

i=1 (2i)!]

2m(m+1)∏2m

i=1 Γ(i2

) (2m)!

m!(p− 1)m

and, hence,

√π√p− 1

n−1

Γ(n2)

Eu∼N(0,σ2)

EM∼GOE(2m)

det(M − uI)

=

√π√p− 1

n−1

Γ(n2)

√πm [∏m−1

i=1 (2i)!]

2m(m+1)∏2m

i=1 Γ(i2

) (2m)!

m!(p− 1)m

=

√πm+1 [∏m−1

i=1 (2i)!]

2m(m+1)∏n

i=1 Γ(i2

) (2m)!

m!

=1

176

Page 191: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9.4. Symmetric tensors

the last equality by Lemma 9.4.5 (2).

Lemma 9.4.6 in combination with (9.4.4) and (9.4.5) shows that

Esym(n, p)

=1 +

√2π√p− 1

n−1∏ni=1 Γ

(i2

) ∑1≤i,j≤m

αi,j Eu∼N(0,σ2)

e−u2

2 det

[P2i−1(u) P2j(u)P2i−2(u) P2j−1(u)

].

Applying Lemma 9.4.7 below to this equation proves Theorem 9.4.1 (1).

Lemma 9.4.7. For any 1 ≤ i, j ≤ m we have

Eu∼N(0,σ2)

e−u2

2 det

[P2i−1(u) P2j(u)P2i−2(u) P2j−1(u)

]=

1√2π

√3p− 2

p− 1φi,j(p),

where

φi,j(p) =Γ(i+ j − 1

2

)3−2i−2j1−2i+2j

(− 3p−24(p−1)

)i+j−1F(

2− 2i, 1− 2j, 52− i− j, 3p−2

4(p−1)

).

Proof. Recall that σ2 = p2(p−1)

. From Lemma D.3.2 (1) we get for all 1 ≤ i ≤ m

that

Eu∼N(0,σ2)

P2i−1(u)P2j−1(u) e−u2

2 (9.4.6)

=(−1)i+j−1 2i+j−1 Γ

(i+ j − 1 + 1

2

)√π (σ2 + 1)i+j−1+

12

F(1− 2i, 1− 2j, 1

2− i− j + 1, 3p−2

4(p−1)

)=

(−1)i+j−14i+j Γ(i+ j − 1

2

)2√

(p− 1

3p− 2

)i+j−12

F(1− 2i, 1− 2j, 3

2− i− j, 3p−2

4(p−1)

),

and

Eu∼N(0,σ2)

P2i(u)P2j−2(u) e−u2

2 (9.4.7)

=(−1)i+j−1 2i+j−1 Γ

(i+ j − 1 + 1

2

)√π (σ2 + 1)i+j−1+

12

F(− 2i, 2− 2j, 1

2− i− j + 1, 3p−2

4(p−1)

)=

(−1)i+j−14i+j Γ(i+ j − 1

2

)2√

(p− 1

3p− 2

)i+j−12

F(− 2i, 2− 2j, 3

2− i− j, 3p−2

4(p−1)

).

177

Page 192: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9. The expected number of real eigenpairs of a real Gaussian tensor

Thus

Eu∼N(0,σ2)

e−u2

2 det

[P2i−1(u) P2j(u)P2i−2(u) P2j−1(u)

]= E

u∼N(0,σ2)e−

u2

2 (P2j−1(u)P2i−1(u)− P2i−2(u)P2j(u))

=(−1)i+j−14i+j Γ

(i+ j − 1

2

)2√

(p− 1

3p− 2

)i+j−12

[F(1− 2i, 1− 2j, 3

2− i− j, 3p−2

4(p−1)

)− F

(2− 2i,−2j, 3

2− i− j, 3p−2

4(p−1)

)].

By Lemma C.4.1 we have

F(1− 2i, 1− 2j, 3

2− i− j, x

)− F

(2− 2i,−2j, 3

2− i− j, x

)=2x

1− 2i+ 2j

3− 2i− 2jF(2− 2i, 1− 2j, 5

2− i− j, x

).

This shows that

Eu∼N(0,σ2)

e−u2

2 det

[P2i−1(u) P2j(u)P2i−2(u) P2j−1(u)

]=

1√2π

√3p− 2

p− 1φi,j(p),

which finishes the proof.

Proof of Theorem 9.4.1 (2)

In this case we have n = 2m and thus n− 1 = 2m− 1. Similar to (9.4.4) we have

Esym(n, p) =

√π√p− 1

n−1

Γ(n2)

Eu∼N(0,σ2)

EM∼GOE(n−1)

∣∣ det(M − uI)∣∣

=

√π√p− 1

n−1

Γ(n2)

Eu∼N(0,σ2)

EM∼GOE(2m−1)

∣∣ det(M − uI)∣∣ (9.4.8)

withσ2 =

p

2(p− 1).

Let

Pk(x) :=

Hek(x), if k = 0, 1, 2, . . .

−√

2π ex2

2 Φ(x), if k = −1,

where Hek(x) is the k-the physicist’s Hermite polynomial (D.1.2) and Φ(x) is thecumulative distribution function of the normal distribution (C.1.3). By Theo-

178

Page 193: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9.4. Symmetric tensors

rem E.3.2, the inner expectation of the right-hand side of (9.4.8) is given by

EM∼GOE(2m−1)

det(M − uI) +

√2e−

u2

2∏n−1i=1 Γ

(i2

) ∑0≤i,j≤m−1

βi,j det

[P2i(u) P2j+1(u)P2i−1(u) P2j(u)

]where

βi,j = det[Γ(r + s+ 1

2

)]0≤r≤m−1,r 6=i0≤s≤m−1,s 6=j

.

Since the normal distribution is symmetric around the origin we have

Eu∼N(0,σ2)

EA∼GOE(2m−1)

det(A− uI2m−1)

= Eu∼N(0,σ2)

EA∼GOE(2m−1)

det(−A+ uI2m−1)

=(−1)2m−1 Eu∼N(0,σ2)

EA∼GOE(2m−1)

det(A− uI2m−1)

and henceE

u∼N(0,σ2)E

A∼GOE(2m−1)det(A− uI2m−1) = 0.

This shows that

Esym(n, p) =

√2π√p− 1

n−1∏ni=1 Γ

(i2

) ∑0≤i,j≤m−1

βi,j Eu∼N(0,σ2)

e−u22 det

[P2i(u) P2j+1(u)P2i−1(u) P2j(u)

]Applying Lemma 9.4.8 below to this equation proves Theorem 9.4.1 (2).

Lemma 9.4.8. We have

Eu∼N(0,σ2)

e−u22 det

[P2i(u) P2j+1(u)P2i−1(u) P2j(u)

]=

1√2π

√3p− 2

p− 1ψi,j(p),

where

ψ0,j(p) =

√π(2j + 1)!

(−1)j22j j!

(p− 2)jp

(p− 1)j(3p− 2)F(−j, 1

2, 3

2, −p2

(3p−2)(p−2)

)−

Γ(j + 1

2

)2(− 3p−2

4(p−1))j+1

and, for i > 0,

ψi,j(p) =Γ(i+ j + 1

2

)(1−2i−2j)(1−2i+2j)

(− 3p−24(p−1)

)i+jF(−2j,−2i+ 1, 3

2− i− j, 3p−2

4(p−1)

)Proof. We first prove the case i > 0. Fix 0 < i ≤ m and 0 ≤ j ≤ m. By

179

Page 194: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9. The expected number of real eigenpairs of a real Gaussian tensor

Lemma D.3.2 (1) and similiar to (9.4.7) we have

Eu∼N(0,σ2)

P2i(u)P2j(u)e−u2

2 (9.4.9)

=(−1)i+j4i+j+1 Γ

(i+ j + 1

2

)2√

(p− 1

3p− 2

)i+j+ 12

F(− 2i,−2j, 1

2− i− j, 3p−2

4(p−1)

).

and, similar to (9.4.6),

Eu∼N(0,σ2)

P2i−1(u)P2j+1(u)e−u2

2 (9.4.10)

=(−1)i+j4i+j+1 Γ

(i+ j + 1

2

)2√

(p− 1

3p− 2

)i+j+ 12

F(1− 2i, 1− 2j, 1

2− i− j, 3p−2

4(p−1)

).

Using

det

[P2i(u) P2j+1(u)P2i−1(u) P2j(u)

]= P2i(u)P2j(u)− P2i−1(u)P2j+1(u)

we get from (9.4.9) and (9.4.10) that

Eu∼N(0,σ2)

e−u22 det

[P2i(u) P2j+1(u)P2i−1(u) P2j(u)

]=

4

2√

2πΓ

(i+ j +

1

2

) (−4(p− 1)

3p− 2

)i+j√p− 1

3p− 2[F(−2i,−2j, 1

2− i− j, 3p−2

4(p−1)

)− F

(1− 2i, 1− 2j, 1

2− i− j, 3p−2

4(p−1)

) ]Using Lemma C.4.1 we get

F(−2i,−2j, 1

2− i− j, x

)− F

(1− 2i, 1− 2j, 1

2− i− j, x

)=2x

(1− 2i+ 2j)

(1− 2i− 2j)F(−2j,−2i+ 1, 3

2− i− j, x

),

so that

Eu∼N(0,σ2)

e−u22 det

[P2i(u) P2j+1(u)P2i−1(u) P2j(u)

]=

Γ(i+ j + 1

2

)√

2π (1−2i−2j)(1−2i+2j)

(− 3p−2

4(p−1)

)i+j √3p− 2

p− 1F(− 2j,−2i+ 1, 3

2− i− j, 3p−2

4(p−1)

),

180

Page 195: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

9.4. Symmetric tensors

which implies

Eu∼N(0,σ2)

e−u22 det

[P2i(u) P2j+1(u)P2i−1(u) P2j(u)

]=

1√2π

√3p− 2

p− 1ψi,j(p),

as asserted.Now we prove the case i = 0. By Lemma D.3.2 (2) we have

Eu∼N(0,σ2)

P−1(u)P2j+1(u)e−u2

2

=(−1)j+1(2j + 1)!

2j j!

(1− σ2)jσ2

√1 + σ2

F(− j, 1

2, 3

2, σ4

σ4−1

)=

(−1)j+1(2j + 1)!

2j j!

( p−22(p−1)

)j( p2(p−1)

)√3p−2

2(p−1)

F(− j, 1

2, 3

2, −p2

(3p−2)(p−2)

)=

(−1)j+1(2j + 1)!

22j√

2 j!

(p− 2)jp

(p− 1)j(3p− 2)

√3p− 2

p− 1F(− j, 1

2, 3

2, −p2

(3p−2)(p−2)

). (9.4.11)

Moreover, by (9.4.9) we have

Eu∼N(0,σ2)

P0(u)P2j(u)e−u2

2 =(−1)j4j+1Γ

(j + 1

2

) (p−13p−2

)j+ 12

2√

2π. (9.4.12)

Combining (9.4.12) and (9.4.11) we see that

Eu∼N(0,σ2)

e−u22 det

[P0(u) P2j+1(u)P−1(u) P2j(u)

]=

1√2π

√3p− 2

p− 1ψ0,j(p),

where

ψ0,j(p) =

√π(2j + 1)!

(−1)j22jj!

(p− 2)jp

(p− 1)j(3p− 2)F(− j, 1

2, 3

2, −p2

(3p−2)(p−2)

)−

Γ(j + 1

2

)2(− 3p−2

4(p−1)

)j+1 ,

which proves the claim for i = 0.

181

Page 196: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen
Page 197: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

Bibliography

[AA07] Andrew Alexander, Jee Eun Lee, Mariana Lazar AaronS. Field: Diffusion Tensor Imaging of the Brain. Neurotherapeutics,4(3):316–329, 2007.

[AA13] Antonio Auffinger, Gerard Ben Arous, Jiri Cerny: Ran-dom Matrices and Complexity of Spin Glasses. Communications onPure and Applied Mathematics, 66(2):165–201, 2013.

[AB94] Abraham Berman, Robert J. Plemmons: Nonnegative Matricesin the Mathematical Sciences. SIAM, 1994.

[AB12] Amelunxen, D. and P. Burgisser: A Coordinate-Free ConditionNumber for Convex Programming. SIAM J. Optim., 22(3):1029–1041,2012.

[ABB+15] Armentano, D., C. Beltran, P. Burgisser, F. Cucker andM. Shub: A stable, polynomial-time algorithm for the eigenpair prob-lem. ArXiv e-print 1505.03290, May 2015.

[Abr72] Abramowitz, M, Stegun I. A.: Handbook of mathematical func-tions, volume 55 of Applied Mathematics Series. United States De-partment of Commerce, 1972.

[ADM+02] Adler, R. L., J.-P. Dedieu, J. Y. Margulies, M. Martens andM. Shub: Newton’s method on Riemannian manifolds and a geomet-ric model for the human spine. IMA J. Numer. Anal., 22(3):359–390,2002.

[AGH+14] Anandkumar, A., R. Ge, D. Hsu, S. M. Kakade and M. Tel-garsky: Tensor decompositions for learning latent variable models.J. Mach. Learn. Res., 15:2773–2832, 2014.

[AHPC] A.-H. Phan, P. Tichavsky and A. Cichocki: Low complex-ity damped Gauss-Newton algorithms for CANDECOMP/PARAFAC.pages 126–147.

183

Page 198: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

Bibliography

[AHPC13] A.-H. Phan, P. Tichavsky and A. Cichocki: Fast alternatingLS algorithms for high order CANDECOMP/PARAFAC tensor fac-torizations. pages 4834–4846, 2013.

[AM12] Absil, P.-A. and J. Malick: Projection-like retractions on matrixmanifolds. SIAM J. Optim., 22(1):135–158, 2012.

[AMR09] Allman, E. S., C. Matias and J. A. Rhodes: Identifiability ofparameters in latent structure models with many observed variables.Ann. Statist., 37(6A):3099–3132, 2009.

[AMS08] Absil, P.-A., R. Mahony and R. Sepulchre: Optimization Al-gorithms on Matrix Manifolds. Princeton University Press, 2008.

[Arm14] Armentano, Diego: Complexity of path-following methods for theeigenvalue problem. Found. Comput. Math., 14(2):185–236, 2014.

[AS72] Abramowitz, Milton and Irene Stegun: Handbook of Math-ematical Functions, With Formulas, Graphs, and Mathematical Ta-bles,. U.S. Government Printing Office, 1972.

[ASS15] Abo, H., A. Seigal and B. Sturmfels: Eigenconfigurations ofTensors. ArXiv e-print 1505.05729, May 2015.

[Bat54] Bateman, Harry: Tables of integral transforms, vol. 2. McGraw-Hill Book Company, Inc., 1954.

[BB16] Breiding, Paul and Peter Burgisser: Distribution of the eigen-values of a random system of homogeneous polynomials. Linear Al-gebra and its Applications, 497:88–107, 2016.

[BC11] Burgisser, Peter and Felipe Cucker: On a problem posed bySteve Smale. Ann. of Math., 174(3):1785–1836, 2011.

[BC13] Burgisser, P. and F. Cucker: Condition: The Geometry of Nu-merical Algorithms, volume 349 of Grundlehren der mathematischenWissenschaften. Springer, Heidelberg, 2013.

[BCS97] Burgisser, P., M. Clausen and M. A. Shokrollahi: AlgebraicComplexity Theory, volume 315 of Grundlehren der mathematischenWissenshaften. Springer, Berlin, Germany, 1997.

[BCSS98] Blum, Lenore, Felipe Cucker, Michael Shub and SteveSmale: Complexity and real computation. Springer-Verlag, NewYork, 1998.

184

Page 199: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

Bibliography

[BDHR15] Boralevi, A., J. Draisma, E. Horobet and E. Robeva: Or-thogonal and unitary tensor decomposition from an algebraic perspec-tive. arXiv:1512.08031, 2015.

[BG73] Bjorck, A. and Gene H. Golub: Numerical methods for Comput-ing Angles between Linear Subspaces. Math. Comp., 27(123):579–594,1973.

[BHSW] Bates, Daniel J., Jonathan D. Hauenstein, Andrew J.Sommese and Charles W. Wampler: Bertini: Software for Nu-merical Algebraic Geometry.

[BL14] Buczynski, J. and J.M. Landsberg: On the third secant variety.J. Algebraic Combin., 40(2):475–502, 2014.

[BP11] Beltran, Carlos and Miguel Pardo: Fast Linear Homotopy tofind Approximate Zeros of Polynomial Systems. Found. Comp. Math.,11:95–129, 2011.

[Buc67] Buckley, Walter: Sociology and modern systems theory. Oxford,England: Prentice-Hall, 1967.

[BV16] Breiding, P. and N. Vannieuwenhoven: The condition numberof join decompositions. ArXiv e-prints 1611.08117, 2016.

[Bur17] Burgisser, Peter: Condition of Intersecting a Projective Varietywith a Varying Linear Subspace. SIAM Journal on Applied Algebraand Geometry, 1(1):111–125, 2017.

[Chu78] Chung, Kai Lai: Elementare Wahrscheinlichkeitstheorie undstochastische Prozesse. Springer, 1978.

[CJ10] Comon, P. and C. Jutten: Handbook of Blind Source Separation:Independent Component Analysis and Applications. Elsevier, 2010.

[Com94] Comon, P.: Independent component analysis, a new concept? SignalProc., 36(3):287–314, 1994.

[COV14] Chiantini, L., G. Ottaviani and N. Vannieuwenhoven: Analgorithm for generic and low-rank specific identifiability of complextensors. SIAM J. Matrix Anal. Appl., 35(4):1265–1287, 2014.

[COV15] Chiantini, L., G. Ottaviani and N. Vannieuwenhoven: Ongeneric identifiability of symmetric tensors of subgeneric rank. Trans.Amer. Math. Soc., 2015.

185

Page 200: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

Bibliography

[CS09] Chen, J. and Y. Saad: On the Tensor SVD and the Optimal LowRank Orthogonal Approximation of Tensors. SIAM J. Matrix Anal.Appl., 30(4):1709–1734, 2009.

[CS13] Cartwright, Dustin and Bernd Sturmfels: The number ofeigenvalues of a tensor. Linear Algebra Appl., 438(2):942–952, 2013.

[dC93] Carmo, M. do: Riemannian Geometry. Birhauser, 1993.

[DD09] Deza, M. M. and E. Deza: Encyclopedia of Distances. Springer,2009.

[Dem96] Demmel, James W.: Applied Numerical Linear Algebra. SIAM,1996.

[DH16] Draisma, J. and E. Horobet: The average number of critical rank-one approximations to a tensor. Linear and Multilinear Algebra,64(12):2498–2518, 2016.

[Doo90] Doob, L.: Stochastic processes. Wiley Classics Library, 1990.

[dSL08] Silva, V. de and L.-H. Lim: Tensor Rank and the Ill-Posedness ofthe Best Low-Rank Approximation Problem. SIAM J. Matrix Anal.Appl., 30(3):1084–1127, 2008.

[EAM11] E. Acar, D.M. Dunlavy, T.G. Kolda and M. Mørup: Scalabletensor factorizations for incomplete data. Chemometr. Intell. Lab.,(106):41–56, 2011.

[Eis95] Eisenbud, David: Commutative Algebra with a View Toward Al-gebraic Geometry. Number 150 in Graduate Texts in Mathematics.Springer, 1995.

[EKS94] Edelman, Alan, Eric Kostlan and Michael Shub: How manyeigenvalues of a random matrix are real? J. Amer. Math., Soc. 7(1994), no. 1:247–267, 1994.

[EO03] Even Ozarslan, Thomas H. Mareci: Generalized diffusiontensor imaging and analytical relationships between diffusion tensorimaging and high angular resolution diffusion imaging. Magnetic Res-onance in Medicine, 2003.

[FMPS13] Friedland, S., V. Mehrmann, R. Pajarola and S.K. Suter:On best rank one approximation of tensors. Numerical Linear Algebrawith Applications, 20(6):942–955, 2013.

186

Page 201: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

Bibliography

[GH78] Griffiths, Phillip and Joseph Harris: Principles of algebraicgeometry. John Wiley & Sons, New York, 1978. Pure and AppliedMathematics.

[Gin65] Ginibre, Jean: Statistical ensembles of complex, quaternion, andreal matrices. J. Mathematical Phys., 6:440–449, 1965.

[GKZ94] Gelfand, I.M., M. M. Kapranov and A. V. Zelevinsky: Dis-criminants, Resultants and Multidimensional Determinants. ModernBirkhauser Classics. Birkhauser, 1994.

[GQWW07] Guyan, Ni, Liqun Qi, Fei Wang and Yiju Wang: The degreeof the E-characteristic polynomial of an even order tensor. J. Math.Anal. Appl. 329, 329:1218–1229, 2007.

[GR15] Gradshteyn, I. S. and I. M. Ryzhik: Table of integrals, series,and products. Elsevier/Academic Press, Amsterdam, Eighth edition,2015.

[GvL96] Golub, G. H. and C. van Loan: Matrix Computations. The JohnHopkins University Press, 1996.

[Har92] Harris, J.: Algebraic Geometry, A First Course, volume 133 ofGraduate Text in Mathematics. Springer-Verlag, 1992.

[Has90] Hastad, J: Tensor rank is NP-complete. Journal of Algorithms,11(4):644–654, 1990.

[HH82] Hayashi, C. and F. Hayashi: A new algorithm to solve PARAFAC-model. Behaviormetrika, (11):49–60, 1982.

[Hit27] Hitchcock, F. L.: The expression of a tensor or a polyadic as asum of products. Journal of Mathematics and Physics, 6:164–189,1927.

[HJ92] Horn, R.A. and C.R. Johnson: Matrix analysis, volume 349.Cambridge University Press, Cambridge, 1992.

[HL13] Hillar, Christopher J. and Lek-Heng Lim: Most tensor prob-lems are NP-hard. J. ACM, 60(6):Art. 45, 39, 2013.

[IK99] Iarrobino, A. and V. Kanev: Power Sums, Gorenstein Algebras,and Determinantal Loci, volume 1721 of Lecture Notes in Mathemat-ics. Springer, 1999.

187

Page 202: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

Bibliography

[KM11] Kolda, Tamara G. and Jackson R. Mayo: Shifted PowerMethod for Computing Tensor Eigenpairs. SIAM Journal on MatrixAnalysis and Applications, 32(4):1095–1124, 2011.

[Kol01] Kolda, T. G.: Orthogonal Tensor Decompositions. SIAM J. MatrixAnal. Appl., 23(1):243–255, 2001.

[Kro08] Kroonenberg, P. M.: Applied Multiway Data Analysis. Wileyseries in probability and statistics. John Wiley & Sons, Hoboken,New Jersey, 2008.

[KSV14] Kressner, D., M. Steinlechner and B. Vandereycken: Low-rank tensor completion by Riemannian optimization. BIT Numer.Math., 54(2):447–468, 2014.

[Lan12] Landsberg, J. M.: Tensors: Geometry and Applications, volume128 of Graduate Studies in Mathematics. AMS, Providence, RhodeIsland, 2012.

[Lee13] Lee, J. M.: Introduction to Smooth Manifolds, volume 218 of Gradu-ate Texts in Mathematics. Springer, New York, USA, second edition,2013.

[Lei61] Leichtweiss, Kurt: Zur Riemannschen Geometrie in Grassman-nschen Mannigfaltigkeiten. Mathematische Zeitschrift, 76:334–366,1961.

[Lim06] Lim, L.-H.: Singular Values and Eigenvalues of Tensors: A Varia-tional Approach. Proceedings of the IEEE International Workshopon Computational Advances in Multi-Sensor Adaptive Processing(CAMSAP ’05), pages 129–132, 2006.

[LNQ13] Lim, Lek-Heng, Michael K. Ng and Liqun Qi: The spectraltheory of tensors and its applications. Numer. Linear Algebra Appl.,20(6):889–890, 2013.

[LO13] Landsberg, J. M. and G. Ottaviani: Equations for secant va-rieties of Veronese and other varieties. Ann. Mat. Pura Appl. (4),192(4):596–606, 2013.

[LQY13] Li, Guoyin, Liqun Qi and Gaohang Yu: The Z-eigenvalues ofa symmetric tensor and its application to spectral hypergraph theory.Numer. Linear Algebra Appl., 20(6):1001–1029, 2013.

[Map14] Maple: 18. Waterloo Maple Inc., 2014.

188

Page 203: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

Bibliography

[MAT15] MATLAB: R2015b. Natick, Massachusetts, 2015.

[MAT16] MATLAB: R2016b. Natick, Massachusetts, 2016.

[McC87] McCullagh, P.: Tensor Methods in Statistics. Monographs onstatistics and applied probability. Chapman and Hall, New York,1987.

[Meh91] Mehta, Madan Lal: Random matrices. Academic Press, Boston,New York, San Diego, 1991.

[MKK02] M. K. Kulkarni, S. S. Kandalgaonkar, M. I. R. TinmakerA. Nath: Markov Chain Models for Pre-Monsoon Season Thunder-storms over Pune. International Journal of Climatology, 22, 2002.

[Mui82] Muirhead, R.J.: Aspects of Multivariate Statistical Theory, volume131. John Wiley & Sons, NY, 1982.

[N90] Nolker, Stefan: Isometric Immersions with homothetical Gaussmap. Geometriae Dedicata, (34), 1990.

[Ose06] Oseledets, I., Savostyanov D.: Minimization methods for ap-proximating tensors and their comparison. 46:1641–1650, 2006.

[Pea04] Pearson, Karl: On the Theory of Contingency and its Relationto Association and Normal Correlation. Drapers’ Company ResearchMemoirs Biometric Series I. Dulau and Co., 1904.

[PG76] P. Gates, H. Tong: On Markov Chain Modeling to Some WeatherData. American Meteorological Society, 1976.

[QCL16] Qi, Y., P. Comon and L.-H. Lim: Semialgebraic Geometry of Non-negative Tensor Rank. SIAM J. Matrix Anal. Appl., 37(4):1556–1580,2016.

[Qi05] Qi, Liqun: Eigenvalues of a real supersymmetric tensor. J. SymbolicComput., 40(6):1302–1324, 2005.

[Qi07] Qi, Liqun: Eigenvalues and invariants of tensors. J. Math. Anal.Appl., 325(2):1363–1377, 2007.

[R C15] R Core Team: R: A Language and Environment for StatisticalComputing. R Foundation for Statistical Computing, Vienna, Aus-tria, 2015.

189

Page 204: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

Bibliography

[S+16] Stein, W. A. et al.: Sage Mathematics Software (Version 7.3). TheSage Development Team, 2016. http://www.sagemath.org.

[SBG04] Smilde, A., R. Bro and P. Geladi: Multi-way Analysis: Applica-tions in the Chemical Sciences. John Wiley & Sons, Hoboken, NewJersey, 2004.

[Sch68] Schwartz, Jacob T: Differential Geometry and Topology. Gordonand Breach, 1968.

[SDF+16] Sidiropoulos, N. D., L. De Lathauwer, X. Fu, K. Huang,E. E. Papalexakis and Ch. Faloutsos: Tensor Decompositionfor Signal Processing and Machine Learning. arXiv:1607.01668, 2016.

[Sha13] Shafarevich, I. R.: Basic Algebraic Geometry 1, Varieties in Pro-jective Space. Springer-Verlag, 3 edition, 2013.

[SM14] Susumu Mori, J-Donal Tournier: Introduction to diffusion ten-sor imaging. Elsevier, Second edition, 2014.

[SOJ00] Spanier, Jerome, Keith B. Oldham and Myland Jan: An atlasof functions. Springer, Second edition, 2000.

[SS90] Stewart, G. W. and J.-G. Sun: Matrix Perturbation Theory, vol-ume 33 of Computer Science and Scientific Computing. AcademicPress, 1990.

[Ste78] Steele, Lynn Arthur (editor): Mathematics Today: Twelve In-formal Essays. Springer, 1978.

[Tao12] Tao, Terence: Topics in Random Matrix Theory. Graduate Stud-ies in Mathematics, vol. 132. 2012.

[TB97] Trefethen, Lloyd N. and David Bau: Numerical Linear Alge-bra. SIAM, 1997.

[TD99] Tuch DS, Weisskoff RM, Belliveau JW Wedeen VJ: Highangular resolution diffusion imaging of the human brain. Proceedingsof the 7th Annual Meeting of ISMRM, 1999.

[Tem75] Temme, Nico: Uniform Asymptotic Expansion of the IncompleteGamma Functions and the Incomplete Beta Function. Mathematicsof Computation, 29(132):1109 – 1109, 1975.

[Van16] Vannieuwenhoven, N.: A condition number for the tensor rankdecomposition. arXiv:1604.00052, 2016.

190

Page 205: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

Bibliography

[VDS+16] Vervliet, N., O. Debals, L. Sorber, M. Van Barel and L. DeLathauwer: Tensorlab v3.0, March 2016.

[War83] Warner, Frank W.: Foundations of Differentiable Manifolds aldLie Groups. Number 94 in Graduate Texts in Mathematics. Springer,1983.

[Won67] Wong, Y. C.: Differential Geometry of Grassmann manifolds. Proc.Natl. Acad. Sci. U.S.A., 57(3):589–594, 1967.

[Won68] Wong, Yung-Chow: Conjugate loci in Grassmann manifolds. Bull.Amer. Math. Soc., 74(2):240–245, 03 1968.

[YL14] Ye, K. and L.-H. Lim: Schubert varieties and distances betweensubspaces of different dimensions. arXiv:1407.0900, July 2014.

[Zak93] Zak, F. L.: Tangent and Secants of Algebraic Varieties. Central Eco-nomics Mathematical Institute of the Russian Academy of Sciences,1993.

[ZG01] Zhang, T. and G. H. Golub: Rank-One Approximation to HighOrder Tensors. SIAM J. Matrix Anal. Appl., 23(2):534–550, 2001.

[ZKP11] Z. Koldovsk´y, P. Tichavsk´y and A.-H. Phan: Stability anal-ysis and fast damped Gauss-Newton algorithm for INDSCAL tensordecomposition. Proc. IEEE Workshop Statist. Signal Process., pages581–584, 2011.

191

Page 206: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen
Page 207: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

List of symbols

Avp−1 = λv: eigenpair equation for the order-p tensor A. . . . . . . . . . . . . . . . . . . . . .83D(n, d) :=

∑n−1i=0 d

i. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .89E(n, p): expected number of real eigenpairs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157Esym(n, p): expected number of real eigenpairs of a symmetric tensor. . . . . . . . 157Ff (X, `) = f(X)− `X. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86N(0, 1): standard Gaussian distribution over the real numbers. . . . . . . . . . . . . . . 31NC(0, 1): standard Gaussian distribution over the complex numbers. . . . . . . . . . 32QA(X) is the polynomial associated to the symmetric tensor A. . . . . . . . . . . . . . . . 8Sp(V ): p-th symmetric power of the vector space V . . . . . . . . . . . . . . . . . . . . . . . . . . 20V : solution manifold for the eigenpair problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87W : manifold of well-posed triples for the eigenpair problem. . . . . . . . . . . . . . . . . . 87K: denotes either R or C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1T is an abbreviation for the tensor product Kn1 ⊗ · · · ⊗Knp . . . . . . . . . . . . . . . . . . . 1Λp(V ): p-th antisymmetric power of the vector space V . . . . . . . . . . . . . . . . . . . . . . 21Φ: the addition map. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22D(n, d) := dn − 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95Hn,d: space of homogeneous polynomials of degree d in n variables. . . . . . . . . . . 86J : join of a collection of manifolds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .37DxF : the derivative of F at x. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18distweighted: weighted distance on the Segre variety. . . . . . . . . . . . . . . . . . . . . . . . . . . . 67distchordal: the chordal distance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40d

dx: partial derivative in direction x. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18V : solution manifold for the h-eigenpair problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94W : manifold of well-posed triples for the h-eigenpair problem. . . . . . . . . . . . . . . . 94κ(x): symbol for condition numbers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27GOE(n): Gaussian Orthogonal Ensemble in Rn×n. . . . . . . . . . . . . . . . . . . . . . . . . . . . 33µ(f, v, η), µ(f, v, η): the two condition numbers for the h-eigenpair problem. .115S : Segre variety. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21V : Veronese variety. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22dP(x, y): the angular distance between x and y. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17dS(x, y): the spherical distance between x and y. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17fA(X) = AXp−1: polynomial system associated to the order-p tensor A. . . . . . .85x⊥ = y ∈ Kn | 〈x, y〉 = 0: the orthogonal complement of x. . . . . . . . . . . . . . . . . .17

193

Page 208: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

List of Algorithms

1. Newton’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2. The Riemannian Gauss–Newton method . . . . . . . . . . . . . . . . 56

3. Adaptive linear homotopy method for eigenpairs (EALH) . . . . . . 1204. Las Vegas EALH (LVEALH) . . . . . . . . . . . . . . . . . . . . . . 1225. Sampling method for the starting system for EALH . . . . . . . . . . 1236. Las Vegas EALH with sampling (LVEALHWS). . . . . . . . . . . . . 124

Page 209: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

List of Figures

1.0.1.A 2× 2 matrix and a 2× 2× 2 tensor. . . . . . . . . . . . . . . . . 11.1.1.Pearson’s contigency table VII. . . . . . . . . . . . . . . . . . . . . 31.2.1.Two pictures of the river Tajo in Toledo, Spain, are mixed. . . . . . 51.3.1.MRI scan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.1.1.Difference between dP and dS. . . . . . . . . . . . . . . . . . . . . . 18

3.2.1.Join of two copies of the unit circle—Distance to ill-posedness . . . 473.2.2.Join of two copies of the unit circle—Condition number. . . . . . . 483.3.1.Example of a non-converging condition number. . . . . . . . . . . . 53

5.1.1.Distribution of the condition number . . . . . . . . . . . . . . . . . 665.2.1.Relative error in the weighted distance . . . . . . . . . . . . . . . . 685.2.2.A sketch of the construction made in the proof of Proposition 5.2.9. 70

7.0.1.Angles in an affine hyperplane vs. angles in the ambient space. . . . 987.1.1.Expectation of the modulus of the eigenvalue I . . . . . . . . . . . . 1017.1.2.Expectation of the modulus of the eigenvalue II . . . . . . . . . . . 102

8.0.1.Why one needs to bound the norm of the input for algorithm EALH.113

9.1.1.Sample of 2000 Gaussian tensors . . . . . . . . . . . . . . . . . . . 1569.1.2.Expected number of real eigenpairs of Gaussian tensors in (R5)⊗p. . 159

195

Page 210: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen
Page 211: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

Appendix

197

Page 212: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen
Page 213: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

The following chapters contain auxiliary results that are needed throughout thework. We mostly cite the literature, except for Appendix E.3, where we give anew formula for the expected absolute value of the characteristic polynomial of aGOE matrix. In Appendix F we have put the source code of the scripts used forFigure 9.1.1 and Table 9.1.1.

199

Page 214: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen
Page 215: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

A. Operator norms

A.1. Matrix norms

Let K ∈ R,C. The two most important matrix norms in this work are thespectral norm and the Frobenius norm.

Definition A.1.1. Let A ∈ Km×n.

1. The spectral norm of A is defined as ‖A‖ := maxx∈Kn,‖x‖=1 ‖Ax‖ .

2. The Frobenius norm of A = (ai,j) is defined as ‖A‖F :=√∑m

i=1

∑nj=1 |ai,j|

2.

To prove the following on uses Theorem 2.2.11.

Lemma A.1.2. Let A ∈ Km×n and let ς1, . . . , ςminm,n denote the singular valuesof A. Then we have

1. ‖A‖ = maxς1, . . . , ςminm,n

.

2. ‖A‖F =√ς21 + . . .+ ς2

minm,n.

3. ‖A‖ ≤ ‖A‖F .

4. minx∈Kn,‖x‖=1 ‖Ax‖ = minς1, . . . , ςminm,n

.

If A is invertible, we moreover have

5. ‖A−1‖ =(min

ς1, . . . , ςminm,n

)−1.

The spectral norm is submultiplivative; that is,

‖BA‖ ≤ ‖B‖ ‖A‖ . (A.1.1)

The following lemma is immediate.

Lemma A.1.3. Let W1, . . . ,Wr ⊂ KN be linear spaces and n =∑r

i=1 dim(Wi).For each i let Ui be a matrix, whose columns form an orthonormal basis of Wi.Let U =

[U1 · · · Ur

]∈ KN×n. Then,

Ux | x ∈ S(Kn)=w1 + · · ·+ wr | ‖w1‖2 + · · ·+ ‖wr‖2 = 1 and wi ∈ Wi, i = 1, . . . , r

=x1w1 + · · ·+ xrwr | x2

1 + · · ·+ x2r = 1, wi ∈ S(Wi), i = 1, . . . , r

.

201

Page 216: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

A. Operator norms

Lemma A.1.3 yields the following.

Corollary A.1.4. Let W1, . . . ,Wr ⊂ KN be linear spaces and n =∑r

i=1 dim(Wi).Assume that n > N . Let

φ : W1 × . . .×Wr −→ KN , (w1, . . . , wr) 7−→ w1 + . . .+ wr.

Then, ςmin(φ) = ςmin(U), where the matrix U is defined as in Lemma A.1.3.

Proof. By Lemma A.1.2 (3) we have ςmin(φ) = minx∈S(Kn)

‖φ(x)‖ and, by Lemma A.1.3,

minx∈S(Rn)

‖φ(x)‖ = ςmin

([U1 · · · Ur

])= ςmin(U).

This finishes the proof.

The following is a version of the Eckart-Young characterization of the smallestsingular value—e.g. [BC13, Corollary 1.19 and Remark 1.20]—for matrices havingunit norm columns.

Lemma A.1.5. Let r ≤ N , and let SN×r ⊂ KN×r denote the matrices with unitnorm columns. Let Y ∈ SN×r. Then,

ςmin(Y ) = minX∈SN×r

is of rank <r

√√√√ r∑i=1

(sin dP(xi, yi))2,

where xi, yi denote the columns of X, Y , respectively, and dP(x, y) denotes the anglebetween x and y, see (2.1.1).

Proof. The Eckart-Young characterization of the smallest singular value tells us

ςmin(Y ) = minX∈KN×r

is of rank <r

√√√√ r∑i=1

‖xi − yi‖2. (A.1.2)

We show the assertion by proving

1. minX∈SN×r

is of rank <r

√∑ri=1 (sin dP(xi, yi))

2 ≤ minX∈KN×r

is of rank <r

√∑ri=1 ‖xi − yi‖

2,

2. minX∈SN×r

is of rank <r

√∑ri=1 (sin dP(xi, yi))

2 ≥ minX∈KN×r

is of rank <r

√∑ri=1 ‖xi − yi‖

2.

Suppose that X is a matrix that minimizes the right-hand side of (A.1.2). Toshow 1. we construct an explicitX ′ ∈ SN×r verifying the equation 1. For 1 ≤ i ≤ r,

202

Page 217: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

A.1. Matrix norms

we distinguish between two cases. If we have xi 6= 0, let x′i = αxi denote the orthog-onal projection of yi onto xi. By definition, ‖xi − yi‖ ≥ ‖x′i − yi‖ = sin dP(xi, yi).Otherwise, if xi = 0, we choose x′i as any nonzero vector in span x1, . . . , xr∩(yi)

⊥.Then, ‖xi − yi‖ = 1 = sin dP(x′i, yi) by construction. In both cases, the col-umn span of X does not change if we replace the i-th column by x′i. Hence,X ′ =

[x′1 · · · x′r

]is of rank < r. This proves 1.

To show 2. suppose that X is a matrix of rank < r minimizing the left-handexpression of equation 2. Again, we construct an explicit X ′ ∈ KN×r verify-ing the inequality. For 1 ≤ i ≤ n let x′i denote the orthogonal projection of yionto xi. Then X ′ =

[x′1 · · · x′r

]∈ KN×r is a matrix of rank < r and we have∑r

i=1 (sin dP(xi, yi))2 =

∑ri=1 ‖x′i − yi‖

2 , which proves 2.

We have seen in in Lemma A.1.2 that the inverse of the smallest singular valueof a quadratic matrix equals the spectral norm of the inverse matrix. If the matrixis not quadratic, a similar statement holds, but in terms of the pseudo inverse[SS90, Chapt. 3, Sec. 1].

Definition A.1.6 (The Pseudo-Inverse). Let A ∈ Km×n have full rank. Thepseudo inverse A† is defined as

A† =

(A∗A)−1A∗, if rkA = n

A∗(AA∗)−1, if rkA = m.

By [SS90, Chap. 3, Sec. 1, Exercise 7], we have∥∥A†∥∥ = ςmin(A)−1. The

following is [SS90, Chapt. 3, Theorem 3.9].

Theorem A.1.7 (Wedin’s theorem). Let A,B ∈ Km×n, n ≤ m, and supposerk (A) = rk (B) = n. Then

∥∥A† −B†∥∥ ≤ √2∥∥A†∥∥∥∥B†∥∥ ‖A−B‖ .

If A ∈ Kn×(n+k) is a matrix of full rank, the pseudo-inverse of A can be nicelycharacterized as A† = (A|(kerA)⊥)−1. The following lemma deals with compositionsof such restricted linear maps.

Lemma A.1.8. 1. Let u, v ∈ Kn\ 0 and P : u⊥ → v⊥ be the orthogonal projec-tion from u⊥ to v⊥. We have ‖P‖ ≤ 1. If dS(u, v) < π

2, then P is invertible

and ‖P−1‖ = (cos dS(u, v))−1.

2. Let A ∈ K(n+1)×n and u, v ∈ Kn+1\ 0, such that kerA = Kv. Then the mapA|−1

v⊥ A|u⊥ : u⊥ → v⊥ is the orthogonal projection u⊥ → v⊥.

3. Let A ∈ K(n+1)×n and u, v ∈ Kn+1\ 0, such that kerA = Kv. Then we have∥∥A|−1v⊥

∥∥ ≤ ∥∥A|−1u⊥

∥∥ and∥∥A|−1

u⊥

∥∥ ≤ (cos dS(u, v))−1∥∥A|−1

v⊥

∥∥.

203

Page 218: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

A. Operator norms

Proof. For (1) see [BC13, Lemma 16.40]. For (2) let x ∈ u⊥ and y ∈ v⊥, such thatA|−1

v⊥ A|u⊥ x = y. Then Ax = Ay, which implies that x− y ∈ kerA = Kv.

To prove (3), write A|−1v⊥ = A|−1

v⊥ A|u⊥ A|−1u⊥ . Using the submultiplicativity of

the spectral norm we obtain∥∥A|−1

v⊥

∥∥ ≤ ∥∥A|−1v⊥ A|u⊥

∥∥∥∥A|−1u⊥

∥∥ . From (1) and (2) we

get that∥∥A|−1

v⊥ A|u⊥∥∥ ≤ 1. The second claim from 3. is proven similarly.

A.2. Norms of multilinear operators

The spectral norm as defined in Definition A.1.1 generalizes naturally from lin-ear maps to multilinear maps (i.e., from matrices to tensors). For k, l ∈ N let

Lk(Cl,Cn) := Cn ⊗((Cl)∗

)⊗k(where ∗ denotes the dual) denote the vector space

of multilinear maps (Cl)k → Cn. The spectral norm of A ∈ Lk(Cl,Cn) is definedas

‖A‖ := supv1,...,vk∈S(Cl)

‖A(v1, . . . , vk)‖ . (A.2.1)

An alternative definition of a norm for multilinear operators would be the restric-tion of (A.2.1) to the diagonal:

‖A‖ := supv∈S(Cl)

‖A(v, . . . , v)‖ . (A.2.2)

It is easily seen that

‖A(v1, . . . , vk)‖ ≤ ‖A‖ ‖v1‖ · . . . · ‖vk‖ (A.2.3)

and that‖BA‖ ≤ ‖B‖ ‖A‖ (A.2.4)

for A ∈ Lk(Cl,Cn), B ∈ L1(Cn,Cn′).

204

Page 219: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

B. Differential Geometry

B.1. Differentiable manifolds

Manifolds play an important role througout this work. This section is meant tobriefly explain and summarize the main notions of the subject.

We begin by recalling the definition of differentiable manifolds from [Sch68].

Definition B.1.1. An n-dimensional manifold M is a separable metric space witha system of open subsets Uα satisfying the following properties.

1. M ⊆⋃α Uα.

2. For each α there is a map hα : Uα → Rn such that hα is a homeomorphism ofUα with an open ball in Rn.

3. For all α, β the map hα h−1β defined on hβ(Uα ∩ Uβ) is smooth.

The collection (Uα, hα) is sometimes called an atlas for M and a pair (U, h) iscalled a chart. Different atlantes may yield different structures on M . This is whyone also speaks of an atlas equivalence class when defining manifolds.

We next define the tangent space to M at a point.

Definition B.1.2. Let C(M) denote the ring of real-valued smooth functions onM and let C∗ denote its dual. The tangent space to p ∈ M is defined to be thelinear space

TpM = τ ∈ C∗ | ∀f, g ∈ C(M) : τ(fg) = f(p)τ(g) + g(p)τ(f) .

If in Definition B.1.1 one replaces Rn by Cn and requires the transition mapshα h−1

β to be holomorphic, one obtains the notion of a complex manifold. Insimilar fashion one defines the tangent space to a complex manifold, which is acomplex linear space; see [GH78, Chap. 2] for more details.

Lemma B.1.3. TvS(Cn) =a ∈ Cn | <〈a, v〉 = 0

= v⊥ ⊕ Riv.

Proof. See [BC13, Equation (14.11)] and [BC13, Lemma 14.9].

205

Page 220: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

B. Differential Geometry

B.2. Riemannian manifolds

The following is [dC93, Def. 2.1].

Definition B.2.1. A Riemannian manifold (M, 〈 , 〉) is a differentiable manifoldM, where for each p ∈ M the tangent space TpM is equipped with an innerproduct 〈 , 〉p, which varies smoothly with p.

We define ‖ξ‖p := 〈ξ, ξ〉p. When it doesn’t cause confusion, we also omit theindex with p in both 〈 , 〉p and ‖ ‖p.

Their product of two Riemmanian manifoldsM×N is a Riemannian manifoldwith tangent space T(p,q)M ×N = TpM × TqN where the Riemannian metric isdefined as

〈(ξ1, ζ1), (ξ2, ζ2)〉(p,q) := 〈ξ1, ξ2〉p + 〈ζ1, ζ2〉q . (B.2.1)

B.2.1. Paths and distances in Riemannian manifolds

The riemannian metric allows to compute the length of curve segments in M .Suppose that

γ : [0, 1]→M

is a smooth curve segment in M . Its length is defined as

l(γ) :=

∫ 1

0

∥∥∥∥∂γ∂t∥∥∥∥ 1

2

dt. (B.2.2)

The length of piecewise differentiable curves is defined as the sum of the lengthsof its differentiable parts. The riemannian distance between two points p, q ∈ Mis defined as

distM(p, q) = inf l(γ) | γ(0) = p, γ(1) = q (B.2.3)

The distance distM makes M a metric space, see [dC93, Prop. 2.5].

Definition B.2.2. A differentiable map f : M → N is called isometric immer-sion, if for all p ∈M and u, v ∈ TpM it holds that

〈u, v〉 = 〈Dpf (u),Dpf (v)〉 .

We also say that f is isometric. If in addition f is a diffeomorphism, it is calledan isometry

Note that, if f : M → N is an isometric immersion, one necessarily hasdimM ≤ dimN and for all p ∈ M the derivative Dpf is injective (hence thename ”immersion”). If f is an isometry, then it must hold that dimM = dimN .The following is straight-forward to prove.

206

Page 221: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

B.2. Riemannian manifolds

Lemma B.2.3. Let M,N,P be Riemannian manifolds and f : M → N andg : N → P be differentiable maps.

1. Assume that f is an isometry. Then, g f is isometric, if and only if g isisometric.

2. Assume that g is an isometry. Then, g f is isometric, if and only if f isisometric.

Proof. Let p ∈M . By the chain rule we have Dp(g f) = Df(p)g Dpf . Hence, forall u, v ∈ TpM we have

〈Dp(g f) u,Dp(g f) v〉 = 〈Df(p)g Dpf u,Df(p)g Dpf v〉.

We prove 1. If g is isometric, we have

〈Dp(g f) u,Dp(g f) v〉 = 〈Dpf u,Dpf v〉 = 〈u, v〉

and hence g f is isometric. If g f is isometric, by the foregoing argument,g = g f f−1 is isometric. The case 2. is proven similarly.

Isometries between manifolds are distance preserving while isometric immer-sions are path-length preserving. We make this precise in the following lemma.

Lemma B.2.4. Let f : M → N be a differentiable map.

1. If f is an isometric immersion, for each curve γ : [0, 1] → M , we have thatl(γ) = length(f γ). In particular, for all p, q ∈M we have

distM(p, q) ≥ distN(f(p), f(q)).

2. If f is an isometry, for all p, q ∈M we have distM(p, q) = distN(f(p), f(q)).

Proof. Let γ : [0, 1]→M be a curve. Then, by (B.2.2)

l(γ) =

∫ 1

0

∥∥∥∥∂γ∂t∥∥∥∥ 1

2

dt =

∫ 1

0

∥∥∥∥Dpf∂γ

∂t

∥∥∥∥ 12

dt =

∫ 1

0

∥∥∥∥∂(f γ)

∂t

∥∥∥∥ 12

dt = l(f γ);

Now suppose that that γ(0) = p, γ(1) = q and distM(p, q) = l(γ). Then f γ isa curve in N with start and end points (f γ)(0) = f(p) and (f γ)(1) = f(q).Hence,

distM(p, q) = l(γ) = l(f γ) ≥ distN(f(p), f(q)).

If f is an isometry, f−1 is isometric, too. Hence, distM(p, q) ≤ distN(f(p), f(q)).This finishes the proof.

207

Page 222: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

B. Differential Geometry

We close this subsection with a lemma that is very useful when it comes toproving isometric properties of linear maps.

Lemma B.2.5. Let 〈 , 〉 : Rn ×Rn → R be a bilinear form and U : Rn → Rn be alinear map. Then the following holds:

∀u, v ∈ Rn : 〈Au,Av〉 = 〈u, v〉 ⇐⇒ ∀u ∈ Rn : 〈Au,Au〉.

Proof. The claim follows from 〈u, v〉 = 12

(〈u− v, u− v〉 − 〈u, u〉 − 〈v, v〉) .

B.2.2. Retractions

The tangent bundle of a manifold M is the smooth vector bundle

TM := (p, v) | p ∈M, v ∈ TpM .

A retraction is a map taking a tangent vector ξp ∈ TpM to a manifoldM at pto the manifold itself [ADM+02,AMS08,KSV14].

Definition B.2.6. Let M ⊂ RN be an embedded submanifold. A retraction Ris a map from an open subset TM ⊃ U → M that satisfies all of the followingproperties for every p ∈M:

1. R(p, 0p) = p;

2. U contains a neighborhood N of (p, 0p) such that the restriction R|N is smooth;

3. R satisfies the local rigidity condition D0xR(x, ·) = idTxM for all (x, 0x) ∈ N .

We let Rp(·) := R(p, ·) be the retraction R with foot at p.

The exponential map is a retraction [AMS08], which shows that every manifoldhas at least one retraction. A retraction can be interpreted as an operator thatapproximates the action of the exponential map to first order [AMS08]. Thefollowing observation is well known.

Lemma B.2.7. Let M ⊂ RN be an m-dimensional embedded submanifold. LetR be a retraction. Then for all x ∈ M there exists some δ > 0 (depending on x)such that for all η ∈ TxM with ‖η‖ < δ one has Rx(η) = x+ η +O(‖η‖2).

Proof. Fix a tangent direction η ∈ TxM and put v := η‖η‖ . If δ is sufficiently small,

by Definition B.2.6(2), the map ψ : (−δ, δ) → RN , t 7→ Rx(tv) is a smooth curvein t, admitting a standard Taylor series expression. By exploiting property (3) ofDefinition B.2.6, we find that D0ψ = D0Rx v = v. This concludes the proof.

The next well-known result states that on a product Riemannian manifold, theproduct of the individual retractions specifies a retraction.

208

Page 223: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

B.3. Hermitian manifolds

Lemma B.2.8. For 1 ≤ i ≤ r, let Mi be an embedded submanifold of M. LetRi : TMi →Mi be a retraction. If P = M1 × · · · ×Mr denotes the topologicalproduct, then

R : T P → P ,((p1, ξp1), . . . , (pr, ξpr)

)7→ (R1(p1, ξp1), . . . , Rr(pr, ξpr))

is a retraction.

Proof. All properties in Definition B.2.6 can be verified readily by exploiting theproduct structure.

B.3. Hermitian manifolds

The counterpart of a Riemannian manifold for complex manifolds is the notion ofHermitian manifold [GH78].

Definition B.3.1. A Hermitian manifold (M, 〈 , 〉) is a complex manifold M,where for each p ∈M the tangent space TpM is equipped with an hermitian innerproduct 〈 , 〉p, which varies smoothly with p.

Every n-dimensional Hermitian manifold (M, 〈 , 〉) can be turned into a 2n-dimensional Riemannian manifold as follows. For each p ∈ M the tangent spaceis a 2n-dimensional real vector space. As inner product on that space we choose<〈 , 〉. Note that for v ∈ TpM we have 〈v, v〉 = <〈v, v〉. This implies that, if onewants to define lenghts of curves on M as in (B.2.3), it suffices to regard M asRiemannian manifold in the above described way.

B.3.1. The Fubini-Study metric

Particular examples of Riemannian manifolds are S(Rn), P(Rn) and P(Cn). Notethat, as Riemannian manifolds we have S(Cn) ∼= S(R2n), so the case of the complexsphere is also covered in this subsection.

The sphere S(Rn) naturally is endowed with the Riemannian metric that isinherited from the euclidean inner product on Rn. The following can be found in,e.g., [BC13, Eq. (14.12)].

Lemma B.3.2. The Riemannian metric on S(Rn) is dS(v, w) := arccos〈v, w〉.

The metric dS is called the angular distance on S(Rn). By [BC13, Lemma 14.11]we can identify the tangent space to P(Cn) at v with the subspace of Cn given byv⊥ := w ∈ Cn | 〈v, w〉 = 0. Here 〈v, w〉 is the Hermitian inner product on Cn

and v⊥ is called the orthogonal complement of v (a little care should be taken here,since one has to take a representative for v for the definition of v⊥). The followingis [BC13, Proposition 14.12].

209

Page 224: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

B. Differential Geometry

Lemma B.3.3 (Fubini-Study metric). Let the Hermitian metric on v⊥ be

〈x, y〉v :=〈x, y〉‖v‖

, (B.3.1)

where the inner product on the right is the usual Hermitian inner product on Cn.Then the Hermitian metric on P(Cn) is given by

dP(v, w) := arccos|〈v, w〉|‖v‖ ‖w‖

,

where on the right-hand side v, w ∈ Cn denote representatives. The same holdstrue for P(Rn) endowed with the inner product induced by the euclidean innerproduct on Rn.

B.4. Integration on manifolds

The following is a wrap-up of [BC13, Section A.2.3] and [BCSS98, Sec. 13.2].Let M be an oriented n-dimensional manifold and let (U, h) be a chart of M .

Let ω be a function that smoothly assigns to each p ∈ M a multilinear formω(p) ∈ Λn (TpM)∗. Then we have the n-form

ω′(x) := ω (h−1(x)) Dxh−1 ∈ Λn((TxRn)).

Since, TxRn ∼= Rn and dim Λn(Rn) = 1 we see that ω′(x) is a multiple of thedeterminant det. Let ρ be the function that satisfies ω′(x) = ρ(x) det.

Definition B.4.1. If the function ρ is smooth for all charts (U, h), the form ω iscalled an integration form on M .

A choice of an integration form allows us to define integrals on M as follows.Let (U, h) be a chart of M . For a continuous function f : U → R we define theintegral of f with respect to ω as∫

U

fω :=

∫Rnf(h−1(x)) ρ(x) dx.

To define the integral of functions that are defined on all of M one needs a partitionof unity [dC93, Chapt. 0, Sec. 5]. A partition of unity is a finite collection ofsmooth functions χ1, . . . , χr on M with values in [0, 1] such that

∑ri=1 χi = 1, and

such that each supp(χi) contained in some chart of M . Then, for a continuousfunction f : M → R we define∫

M

fω :=r∑i=1

∫M

f χi ω.

210

Page 225: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

B.4. Integration on manifolds

A partition of unity exists, if and only if every connected component of M isHausdorff and has a countable basis [dC93, Chapt. 0, Theorem 5.6].

Now let M and N be Riemannian manifolds and F : M → N be a differentiablemap. The normal jacobian at a point x ∈ M of F is the determinant of thelinear map that is given by the derivative of F at x restricted to the orthogonalcomplement of its kernel. We denote it by

NJ(F )(x) := det(

DxF |(ker DxF )⊥

)(B.4.1)

The definition of the normal jacobian is used in the coarea formula.

Theorem B.4.2 (Coarea formula). Suppose that M,N are Riemannian manifoldsof dimensions m,n, respectively. Let F : M → N be a surjective smooth map.Then we have for any function χ : M → R that is integrable with respect to thevolume measure of M that∫

M

χ(x)dx =

∫y∈N

[∫F−1(y)

χ(x)

NJ(F )(x)dx

]dy.

B.4.1. An integral formula

We need the coarea formula in the following special situation. Let V ⊂ M × Nbe a submanifold with dimV = dimM and let π1 : V → M , π2 : V → N be theprojections onto the first and second coordinate, respectively.

Suppose that π2 is regular, that is every y ∈ N is a regular value of π2. Weassume that for all regular values x ∈ M of π1 the fiber π−1

1 (x) is finite. By theimplicit function theorem, the projection π1 is locally invertible in a neighborhoodW of x. Then, by applying the coarea formula twice, we have for an integrablefunction χ : M → R:∫

x∈Wχ(x) dx =

∫y∈N

[∫x∈π1(π−1

2 (y))

χ(x)NJ(π1)(x)

NJ(π2)(x)dx

]dy, (B.4.2)

If the tangent space to V at (x, y) is the graph of a linear map, the quotient ofthe normal jacobians can be desribed explicity. The following result is Lemma 3in [BCSS98, sec. 13.2],

Lemma B.4.3. In the above situation suppose that

T(x,y)V =

(•x,

•y) ∈ TxM × TyN |

•y = φ(

•x),

where φ is linear. Then

NJ(π1)(x)

NJ(π2)(x)= |det(φφT )|−

12 .

211

Page 226: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

B. Differential Geometry

Remark B.4.4. If M,N are hermitian manifolds, one must replace |det(φφT )|− 12

by |det(φφ∗)|−1; see the comment in Theorem 5 in [BCSS98, Sec. 13.2].

The following is immediate.

Corollary B.4.5. In the situation of Lemma B.4.3 we have∫x∈M|π−1(x)| dx =

∫y∈N

[∫x∈π1(π−1

2 (y))

|det(φφT )|−12 dx

]dy; (B.4.3)

provided that∫M

1 dM <∞.

212

Page 227: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

C. Higher transcendental functions

C.1. The error function and the complementaryerror function

The error function [SOJ00, 40:3:2] and the complementary error function are re-spectively defined as

erf(x) :=2√π

x∫0

exp(−t2) dt, and erfc(x) :=2√π

∞∫x

exp(−t2) dt. (C.1.1)

They satisfy the equations [SOJ00, 40:5:1, 40:5:3]

erf(−x) = −erf(x), and erfc(−x) = 2− erfc(x) (C.1.2)

The cumulative distribution function of the normal distribution [SOJ00, 40:14:2]is defined as

Φ(x) :=1√2π

∞∫−∞

e−t2

2 dt. (C.1.3)

The error function and Φ(x) are related by the following equation [SOJ00, 40:14:2]

2Φ(x) = 1 + erf

(x√2

). (C.1.4)

C.2. The (incomplete) Gamma function

For x ≥ 0 the gamma function [SOJ00, Sec. 43] is defined as

Γ(x) :=

∫ ∞0

tx−1e−tdt. (C.2.1)

For all x > 0 the Gamma functions satisfies [SOJ00, 43:5:3]

Γ(x+ 1) = xΓ(x). (C.2.2)

213

Page 228: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

C. Higher transcendental functions

Special values are

Γ(1) = 1, Γ(12) =√π, Γ(3

2) =

√π

2; (C.2.3)

the second equality by [SOJ00, 43:4:2] and the third by (C.2.2). If n is a positiveinteger, (C.2.3) and (C.2.2) combined show that

Γ(n) = (n− 1)!. (C.2.4)

The upper and lower incomplete Gamma function are denoted

Γ(n, x) :=

∫ ∞t=x

tn−1e−tdt, γ(n, x) :=

∫ x

t=0

tn−1e−tdt, (C.2.5)

where x ≥ 0.

Lemma C.2.1. Let n ≥ 2. We have

1. Γ(n−1) ≤√π

2n.

2. Γ(1 + n−1) ≤√π

2.

Proof. We have Γ(n−1) = nΓ(1 + n−1). The Gamma function is convex, which

shows that for n ≥ 2 we have Γ(1 + n−1) ≤ max

Γ(1),Γ(32)

=√π

2.

The following is [GR15, eq. 6.455].

Proposition C.2.2. For α, β, µ, ν > 0 we have

1.

∫ ∞0

xµ−1e−βxΓ(ν, αx)dx =ανΓ(µ+ ν)

µ(α + β)µ+νF

(1, µ+ ν, µ+ 1,

β

α + β

).

2.

∫ ∞0

xµ−1e−βxγ(ν, αx)dx =ανΓ(µ+ ν)

ν(α + β)µ+νF

(1, µ+ ν, ν + 1,

α

α + β

),

where F (a, b, c, x) is Gauss’ hypergeometric functions as defined in Appendix C.4.2.

C.3. The (incomplete) Beta function

For p, q > 0 the Beta function [SOJ00, 43:13] is defined as

B(p, q) :=

∫ 1

0

tp−1(1− t)q−1dt, (C.3.1)

whereas, for 0 ≤ x ≤ 1 the incomplete Beta function [SOJ00, 58:3:1] is defined as

B(p, q, x) :=

∫ x

0

tp−1(1− t)q−1dt. (C.3.2)

214

Page 229: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

C.4. Hypergeometric functions

If q is a positive integer or if both p, q are positive half-integers, we can write theincomplete beta function B(p, q, x) as a polynomial in x. The following propositiongathers [SOJ00, 58:4:3, 58:4:7 and 58:4:8].

Proposition C.3.1. Let ν ∈ R and 0 < x < 1. The following holds.

1. If m is a positive integer, then

B(ν,m, x) = xνm−1∑j=0

(m− 1

j

)(−x)j

j + ν.

2. If m,n are non-negative integers, then

B

(n+

1

2,m+ 1, x

)= xn+ 1

2

m∑j=0

(m

j

)(−x)j

j + n+ 12

, and

B

(n+ 1,m+

1

2, x

)=

n∑j=0

(−1)j(n

j

)1− (1− x)j+m+ 1

2

j +m+ 12

.

The following proposition is a combination of [Tem75, Eq. (3.9), (3.10)] andthe unlabeled equation after [Tem75, Eq. (3.14)].

Proposition C.3.2. Let erfc(·) denote the complementary error function. Forall p > 0, 0 ≤ x ≤ 1 and q →∞ we have

B(p, q, x) ∼ Γ(p) Γ(q)

Γ(p+ q)

(1

2erfc (−ω) + bx(p, q)

),

where

ω :=(p ln

(pp+q

)− p ln(x) + q ln

(qp+q

)− q ln(1− x)

) 12

and

bx(p, q) :=(

p2πq(p+q)

) 12(x(p+q)p

)p ((1−x)(p+q)

q

)q (q

p−(p+q)x+ ω−1

√p+qp

)(1 +O(q−1)).

C.4. Hypergeometric functions

For positive integers the Pochhammer polynomials [SOJ00, 18:3:1] are defined by

(x)n := x(x+ 1) . . . (x+ n− 1),

and (x)0 := 1.

215

Page 230: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

C. Higher transcendental functions

Hypergeometric functions [SOJ00, Sec. 18:14] are functions indexed by a tupleof positive integers (p, q) that are given by

pFq(a1, . . . , ap; b1, . . . , bq;x) :=∞∑k=0

(a1)k · . . . · (ap)k(b1)k · . . . · (bq)k

xk

k!.

Hypergeometric functions can be used to describe a wide variety of functions; see[SOJ00, Table 18.1–18.9]. But in this work we will only need the two special cases(p, q) = (1, 1) and (p, q) = (2, 1).

C.4.1. Kummer’s hypergeometric function

Kummer’s confluent hypergeometric function [SOJ00, Sec. 47] is given by

M(a, c, x) := 1F1(a; c;x) =∞∑k=0

(a)k(c)k

xk

k!, (C.4.1)

The error function and Kummer’s hypergeometric function are related by

erf(x) =2x√πM(

12, 3

2,−x2

); (C.4.2)

see [AS72, 13.6.19].

C.4.2. Gauss’ hypergeometric function

Gauss’ hypergeometric function [SOJ00, Sec. 60] is defined as

F (a, b, c, x) := 2F1(a, b; c;x) =∞∑k=0

(a)k (b)k(c)k

xk

k!, (C.4.3)

where a, b, c ∈ R, c 6= 0,−1,−2, . . .. The following lemma is used in the proof ofTheorem 9.4.1

Lemma C.4.1. Let a, b be non-positive integers and c 6= 0,−1,−2, . . .. Then

F (a, b+ 1, c, x)− F (a+ 1, b, c, x) =(a− b)x

cF (a+ 1, b+ 1, c+ 1, x)

Proof. Since a and b are non-negative integers, F (a, b+ 1, c, x) and F (a+ 1, b, c, x)are polynomials, whose constant term is equal to 1. Therefore,

F (a, b+ 1, c, x)− F (a+ 1, b, c, x) =∞∑k=1

(a)k(b+ 1)k − (a+ 1)k(b)k(c)k

xk

k!. (C.4.4)

216

Page 231: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

C.4. Hypergeometric functions

By [SOJ00, 18:5:6] we have (x+ 1)n = (1 + nx)(x)n and hence

(a)k(b+ 1)k − (b)k(a+ 1)k = (a)k(b)k(1 + kb)− (a)k(b)k(1 + k

a)

= (a)k(b)kka− bab

.

According to [SOJ00, 18:5:7] the latter is equal to (a+1)k−1(b+1)k−1k(a− b) and,moreover, (c)k = c(c+1)k−1. The claim follows when plugging this into (C.4.4).

Furthemore, we need the following lemma for Gauss hypergeometric functions.

Proposition C.4.2. Let b, c, ν ∈ R and 0 < x < 1.

1. If c− 1 > 0, b− c+ 1 > 0, then

F (1, b, c, x) = (c− 1)(1− x)c−b−1x1−cB(c− 1, b− c+ 1, x).

2. For all n we have

F

(1, n− 1

2,3

2, x

)=

1

2(1− x)n−1

n−2∑j=0

(n− 2

j

)(−x)j

j + 12

.

3. If n = 2k is even, then

F

(1, n− 1

2,n+ 1

2, x

)=

n− 1

2(1− x)k

k−1∑j=0

(k − 1

j

)(−x)j

j + k − 12

.

4. If n = 2k + 1 is odd, then

F

(1, n− 1

2,n+ 1

2, x

)=

n− 1

2(1− x)k+ 12xk

k−1∑j=0

(−1)j(k − 1

j

)1− (1− x)j+k+ 1

2

j + k + 12

Proof. Part (1) can be found in the first table in [SOJ00, Sec. 60:4]. For (2) putb = n− 1

2, c = 3

2and combine (1) with Proposition C.3.1 (1). Finally, to deduce (3)

and (4) put b = n− 12, c = n+1

2. and combine (1) with Proposition C.3.1 (2).

217

Page 232: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen
Page 233: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

D. Hermite polynomials

D.1. Definition and intrarelationships

The physicists’ Hermite polynomials [AS72, Sec. 22.2] are defined by

Hk(x) := (−1)kex2 dk

dxke−x

2

, k = 0, 1, 2, . . . (D.1.1)

and the probabilist’s Hermite polynomials [AS72, Sec. 22.2] are defined by

Hek(x) := (−1)kex2

2dk

dxke−

x2

2 , k = 0, 1, 2, . . . . (D.1.2)

The two definitions are related by the following equality [SOJ00, 24:1:1]

Hek(x) =1√

2kHk

(x√2

). (D.1.3)

That the functions Hk(x) and Hek(x) are indeed polynomial functions is given byfollowing lemma.

Lemma D.1.1. Hk(x) is a polynomial of degree k with leading coefficient 2k andHek(x) is a polynomial of degree k with leading coefficient 1.

Proof. See, e.g., [SOJ00, 24:6:2].

By [SOJ00, 24:5:1] we have

Hk(−z) = (−1)kHk(z) and Hek(−z) = (−1)kHek(z) (D.1.4)

Hermite polynomials can be expressed in terms of Kummer’s confluent hypergeo-metric function from (C.4.1):

H2k+1(x) = (−1)k(2k + 1)! 2x

k!M(−k, 3

2, x2) (D.1.5)

H2k(x) = (−1)k(2k)!

k!M(−k, 1

2, x2); (D.1.6)

see [AS72, 13.6.17, 13.6.18].

219

Page 234: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

D. Hermite polynomials

D.2. Orthogonality relations of the Hermitepolynomials

Let ω : R→ R≥0 be a measurable nonnegative function, called the weight function.We define

L2(ω) :=

f : R→ R |

∫ ∞−∞

f(x)2ω(x) dx

.

and call

〈 , 〉 : L2(ω)× L2(ω)→ R, (f, g) 7→∫ ∞−∞

f(x)g(x)ω(x) dx (D.2.1)

the inner product with weight ω(x). Note that for all (f, g) ∈ L2(ω) × L2(ω) wehave 〈f, g〉 <∞ due to the Cauchy-Schwartz inequality.

Two important weights in the context of Hermite polynomials are ω(x) = e−x2

2

and ω(x) = e−x2. In fact, the probabilist’s Hermite polynomials Hek(x) are orthog-

onal with respect to the first and the physicists’ Hermite polynomials Hk(x) areorthogonal with respect to latter; i.e., [Abr72, 22.2.14, 22.2.15]∫ ∞

−∞Hk(x)H`(x)e−x

2

dx =√π 2k k! δk,` (D.2.2)∫ ∞

−∞Hek(x)He`(x)e−

x2

2 dx =√

2π k! δk,` (D.2.3)

where δk,` is the Kronecker-symbol. In Lemma D.2.1 below we further give the

inner products of the Hek(x) with respect to the weight e−x2.

Lemma D.2.1. Let Γ(x) be the Gamma function from (C.2.1) and F (a, b, c, x) beGauss’ hypergeometric function as defined in (C.4.3).

1. We have∫ ∞−∞

Hem(x)Hen(x)e−x2

dx =

(−1)b

m2c+bn

2c Γ(m+n+1

2

), if m+ n is even

0, if m+ n is odd.,

2. More generally, if m+ n is even, we have for α > 0, α2 6= 12

that

∞∫−∞

Hem(x)Hen(x)e−α2x2dx =

(1− 2α2)m+n

2 Γ(m+n+1

2

)αm+n+1

F(−m−n, 1−m−n

2, α2

2α2−1

)Proof. The first is [GR15, 7.374.2] and the second is [Bat54, p. 289, Eq. (12)].

220

Page 235: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

D.3. The expectation of Hermite polynomials

D.3. The expectation of Hermite polynomials

In this section we will compute the expected value of the Hermite polynomialswhen the argument follows a normal distribution.

Lemma D.3.1. For σ2 > 0 we have

Eu∼N(0,σ2)

H2k(u) =(2k)!

k!(2σ2 − 1)k.

Proof. Write

Eu∼N(0,σ2)

H2k(u) =1√

2πσ2

∞∫u=−∞

H2k(u) e−u2

2σ2 du =1√π

∞∫w=−∞

H2k(√

2σ2w) e−w2

dw,

where the second equality is due to the change of variables w := u√2σ2

. Applying

[GR15, 7.373.2] we get

1√π

∫ ∞w=−∞

H2k(√

2σ2w)e−w2

dw =(2k)! (2σ2 − 1)k

k!

This finishes the proof.

Lemma D.3.2. Let σ2 > 0 and define

Pk(x) :=

Hek(x), if k = 0, 1, 2, . . .

−√

2π ex2

2 Φ(x), if k = −1,

where Φ is the cumulative distribution function of the standard normal distribution;cf. (C.1.3). Then, we have the following

1. If k, ` > 0 and k + ` is even, we have

Eu∼N(0,σ2)

Pk(u)P`(u)e−u2

2 =(−1)

k+`2

√2k+`

Γ(k+`+1

2

)√π√σ2 + 1

k+`+1F(− k,−`, 1−k−`

2, σ

2+12

).

2. For all k we have

Eu∼N(0,σ2)

P−1(u)P2k+1(u)e−u2

2 =(−1)k+1(2k + 1)!

2k k!

(1− σ2)kσ2

√1 + σ2

F(−k, 1

2, 3

2, σ4

σ4−1

).

221

Page 236: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

D. Hermite polynomials

Proof. To prove 1. we write

Eu∼N(0,σ2)

Pk(u)P`(u)e−u2

2 = Eu∼N(0,σ2)

Hek(u)He`(u)e−u2

2

=1√

2πσ2

∞∫u=−∞

Hek(u)He`(u)e−u

2

2

(1+

1σ2

)du.

Put α2 := 12

(1 + 1

σ2

)and observe that α2 6= 1. By Lemma D.2.1 (2) we have

1√2πσ2

∞∫u=−∞

Hek(u)He`(u)e−u

2

2

(1+

1σ2

)du

=(1− 2α2)

k+`2 Γ

(k+`+1

2

)√

2πσ2 αk+`+1F

(−k,−`, 1− k − `

2,

α2

2α2 − 1

)=

(−1)k+`2

√2k+`

Γ(k+`+1

2

)√π√σ2 + 1

k+`+1F

(−k,−`, 1− k − `

2,σ2 + 1

2

)This proves 1. For 2. we have

Eu∼N(0,σ2)

P−1(u)P2k+1(u)e−u2

2 = −√

2π Eu∼N(0,σ2)

Φ(u)He2k+1(u)

=−1

σ

∞∫u=−∞

Φ(u)He2k+1(u) e−

u2

2σ2 du.

Making a change of variables x := u√2

the right-hand integral becomes

−1√2σ

∞∫u=−∞

Φ(√

2x)He2k+1(√

2x) e−x2

σ2 dx

=−1

2k+1 σ

∞∫x=−∞

(1 + erf(x))H2k+1(x) e−x2

σ2 dx

the equality due to (C.1.4) and (D.1.3). We know from (D.1.4) that H2k+1(x) isan odd function, which implies that∫ ∞

w=−∞H2k+1(x)e−

x2

σ2 dx = 0.

222

Page 237: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

D.3. The expectation of Hermite polynomials

Moreover, by (D.1.5) we have H2k+1(x) = (−1)k (2k+1)! 2xk!

M(−k, 32, x2) and by

(C.4.2) we have erf(x) = 2x√πM(

12, 3

2,−x2

). All this shows that

Eu∼N(0,σ2)

P−1(u)P2k+1(u)e−u2

2

=(−1)k+1(2k + 1)!

2k−1√π σ k!

∞∫x=−∞

x2M(

12, 3

2,−x2

)M(−k, 3

2, x2) e−

x2

σ2 dx

=(−1)k+1(2k + 1)!

2k−2√π σ k!

∞∫x=0

x2M(

12, 3

2,−x2

)M(−k, 3

2, x2) e−

x2

σ2 dx. (D.3.1)

where for the second equality we used that the integrand is an even function.Making a change of variables t := x2 we see that

∞∫x=0

x2M(

12, 3

2,−x2

)M(−k, 3

2, x2) e−

x2

σ2 dx

=1

2

∞∫t=0

√t M

(12, 3

2,−t)M(−k, 3

2, t) e−

tσ2 dt. (D.3.2)

By [GR15, 7.622.1] we have

∞∫t=0

√t M

(12, 3

2,−t)M(−k, 3

2, t) e−

tσ2 dt = Γ

(3

2

)(1− σ2)kσ3

√1 + σ2

F(−k, 1

2, 3

2, σ4

σ4−1

)Plugging this into (D.3.2) and the result into (D.3.1) we obtain

Eu∼N(0,σ2)

P−1(u)P2k+1(u)e−u2

2

=(−1)k+1(2k + 1)!

2k−1√π σ k!

Γ(

32

) (1− σ2)kσ3

√1 + σ2

F(−k, 1

2, 3

2, σ4

σ4−1

)=

(−1)k+1(2k + 1)!

2k k!

(1− σ2)kσ2

√1 + σ2

F(−k, 1

2, 3

2, σ4

σ4−1

).

For the second equality we have used that Γ(

32

)=√π

2, see (C.2.3). This finishes

the proof.

223

Page 238: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen
Page 239: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

E. Expected absolute value ofrandom determinants

Let K ∈ R,C and A ∈ Km×m. We denote by ai,j the (i, j)-entry of A and by Ai,j

the matrix that is obtained from A by removing the i-th row and the j-th column.The following proposition is [ABB+15, Eq. (39)].

Proposition E.0.1. Let A be a random complex matrix with independent entriesall of them having finite first and second moment. Let 1 ≤ i ≤ m and A′ be thematrix that is obtained by replacing the i-th row of A by the i-th row of EA. Then,we have E |detA|2 = E |detA′|2 +

∑mj=1 Var(ai,j)E |detAi,j|2 .

Proof. We have detA =∑m

j=1(−1)j+iai,j · detAi,j by Laplace’s expansion. Thus,

|detA|2 = detA · detA =m∑

j,j′=1

(−1)j+j′ai,jai,j′ · detAi,j detA

i,j′

.

From this we see that

E |detA|2 =m∑

j,j′=1

(−1)j+j′ E[ai,jai,j′ ] · E[detAi,j detA

i,j′

], and

E |detA′|2 =m∑

j,j′=1

(−1)j+j′ E ai,j E ai,j′ · E[detAi,j detA

i,j′

]

The independence of the ai,j implies that

E ai,jai,j′ detAi,j detAi,j′

= E[ai,jai,j′ ] · E[detAi,j detAi,j′

].

We have

E[ai,jai,j′ ] =

E ai,j E ai,j′ + Var(ai,j), if j = j′.

E ai,j E ai,j′ , if j 6= j′

Using

E |detA|2 =m∑

j,j′=1

(−1)j+j′ E[ai,jai,j′ ] · E[detAi,j detA

i,j′

]

225

Page 240: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

E. Expected absolute value of random determinants

We conclude that

E |detA|2 =m∑

j,j′=1

(−1)j+j′ E ai,j EAi,j′ · E[detAi,j detA

i,j′

]

+m∑j=1

Var(ai,j)E[detAi,j detAi,j

]

=E |detA′|2 +m∑j=1

Var(ai,j)E∣∣detAi,j

∣∣2 ,which shows the assertion.

E.1. Complex Ginibre ensemble

Recall from Definition 2.4.7 the definition of the complex Ginibre ensemble. Thefollowing is an auxiliary lemma.

Lemma E.1.1.

1. We have EA∼NC(Cn×n)

|det(A)|2 = n!.

2. For S ⊂ [n] := 1, . . . , n we write |S| for the cardinality of S and defineAS ∈ C|S|×|S| to be the submatrix of A ∈ Cn×n indexed by S. Then for anyu ∈ C we have that

det (A+ uIn) =∑S⊂[n]

un−|S| detAS,

where In is the n× n unit matrix.

Proof. The first assertion is [BC13, Lemma 4.12], and the second assertion can befound in, e.g., [HJ92, Theorem 1.2.12].

Proposition E.1.2. We have for A ∈ Cn×n and u ∈ C

EA∼NC(Cn×n)

|det(A− uIn)|2 = n!n∑k=0

1

k!|u|2k.

Proof. By Lemma E.1.1 (3), det(A− uIn×n) =∑

S⊂[n](−u)n−|S| detAS, hence

|det(A− uIn×n)|2 =∑S1,S2

(−u)n−|S1| (−u)n−|S2| detAS1 detAS2 .

226

Page 241: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

E.2. Real Ginibre ensemble

Due to Lemma E.1.1 (2) and since we deal with centered distributions we have

E [detAS1 detAS2 ] =

|S1|!, if S1 = S2

0, else..

Hence,

E |det(A− uIn×n)|2 =n∑k=0

(n

k

)k! |u|2(n−k) = n!

n∑k=0

1

(n− k!)|u|2(n−k).

This finishes the proof.

E.2. Real Ginibre ensemble

Recall from Definition 2.4.8 the definition of the real Ginibre ensemble N(Rn×n).

Theorem E.2.1. Let In denote the n× n-identity matrix. We have for standardgaussian A ∈ Rn×n and fixed u ∈ R

EA∼N(Rn×n)

|det(A−uIn)| =√

2n

√π

Γ(n+1

2

)Γ(n)

(eu2

2 Γ(n, u2) + 2n−1

(u2

2

)n2

γ

(n

2,u2

2

)),

where Γ(n, z) and γ(n, z) are the upper and lower incomplete gamma functions(C.2.5), respectively.

Proof. Put B := (A − uIn)T (A − uIn). Then B is said to have the noncentralWishart distribution; see [Mui82, Definition 10.3.1, p. 441]. We have

|det(A− uIn)| = det(B)12

and the expectation of det(B)12 is given in [Mui82, Theorem 10.3.7, p. 447]. Com-

bining this with [EKS94, Theorem 4.1] yields the claim.

E.3. Gaussian Orthogonal Ensemble

Recall from Definition 2.4.9 the definition of the Gaussian Orthogonal EnsembleGOE(n). In this section we deduce a new formula for the expected absolute valueof the characteristic polynomial of a GOE matrix.

In what follows let us denote

In(u) := EA∼GOE(n)

|det(A− uI)|, and Jn(u) := EA∼GOE(n)

det(A− uI). (E.3.1)

227

Page 242: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

E. Expected absolute value of random determinants

We remark that |Jn(u)| ≤ In(u) by the triangle inequality. A computationof Jn(u) is given in [Meh91, Sec. 22]:

Theorem E.3.1 (The expected characteristic polynomial of a matrix from theGaussian Orthogonal Ensemble). Let Hk(x) denote the physicist’s Hermite poly-nomial (D.1.1). Then

1. J2m(u) =

√πm ∏m−1

i=1 (2i)!

2m(m+1)∏2m

i=1 Γ(i2

) H2m(u).

2. J2m+1(u) =

√πm+1∏m

i=0(2i)!

2m(m+1)∏2m+1

i=1 Γ( i2)

m∑i=0

1

22i i!(xH2i+1(x)−H ′2i+1(x)).

The ideas in this section are inspired by the ideas in [Meh91, Sec. 22]. Mod-ifying the arguments from that reference we prove a formula for In(u). In fact,In(u) is expressend in terms of Jn(u) and a collection of the probabilist’s Hermitepolynomials (D.1.2).

Theorem E.3.2 (The expected absolute value of the characteristic polynomial ofa matrix from the Gaussian Orthogonal Ensemble). Let u ∈ R be fixed. Define thefunctions P−1(x), P0(x), P1, (x), P2(x), . . . as

Pk(x) :=

Hek(x), if k = 0, 1, 2, . . .

−√

2π ex2

2 Φ(x), if k = −1,

where Hek(x) is the k-th (physicist’s) Hermite polynomial (D.1.2) and Φ is thecumulative distribution function of the standard normal distribution (C.1.3). Thefollowing holds.

1. If n = 2m is even, we have

In(u) = Jn(u) +

√2π e−

u2

2∏ni=1 Γ

(i2

) ∑1≤i,j≤m

det(Γi,j1 ) det

[P2i−1(u) P2j(u)P2i−2(u) P2j−1(u)

],

where Γi,j1 :=[Γ(r + s− 1

2

)]1≤r≤m,r 6=i1≤s≤m,s 6=j

.

2. If n = 2m− 1 is odd we have

In(u) = Jn(u) +

√2 e−

u2

2∏ni=1 Γ

(i2

) ∑0≤i,j≤m−1

det(Γi,j2 ) det

[P2i(u) P2j+1(u)P2i−1(u) P2j(u)

],

where Γi,j2 =[Γ(r + s+ 1

2

)]0≤r≤m−1,r 6=i0≤s≤m−1,s 6=j

.

228

Page 243: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

E.3. Gaussian Orthogonal Ensemble

E.3.1. Preparation of some integrals

To prove Theorem E.3.2 we are required to compute the integrals in (E.3.24)and (E.3.28) below. This we will do in the present section. As in Theorem E.3.2we abbreviate

Pk(x) :=

Hek(x), if k = 0, 1, 2, . . .

−√

2π ex2

2 Φ(x), if k = −1.(E.3.2)

and, moreover, put

Gk(x) :=

x∫−∞

Pk(y) e−y2

2 dy, k = 0, 1, 2, . . . (E.3.3)

We can express the functions Gk(x) in terms of the Pk(x).

Lemma E.3.3. We have

1. For all k: Gk(x) = −e−x2

2 Pk−1(x).

2. Gk(∞) =

√2π, if k = 0

0, if k ≥ 1

Proof. Note that (2) is a direct consequence of (1). For (1) let k ≥ 0 and write

Gk(x) =

x∫y=−∞

Hek(y)e−y2

2 dy

=

x∫y=−∞

(−1)kdk

dyke−

y2

2 dy, (by (D.1.2)).

Thus Gk(x) = (−1)k dk−1

dxk−1 e−x

2

2 = −e−x2

2 Pk−1(x) as desired.

We now fix the following notation: If two functions f : R → R, g : R → Rsatisfy

∫R f(x)e−x

2dx <∞ and

∫R g(x)e−x

2dx <∞, we define

〈f(x), g(x)〉 :=

∫Rf(x)g(x)e−

x2

2 dx, (E.3.4)

which is finite due to the Cauchy-Schwartz inequality. In the terms of defini-

tion (D.2.1) this is the inner product with weight e−x2

2 . The functions Pk(x)and Gk(x) satisfy the following orthogonality relations with respect to 〈 , 〉.

229

Page 244: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

E. Expected absolute value of random determinants

Lemma E.3.4. For all k, ` ≥ 0 we have

1. 〈Gk(x), P`(x)〉 = −〈G`(x), Pk(x)〉.

2. 〈Gk(x), P`(x)〉 =

(−1)i+jΓ

(i+ j − 1

2

), if k = 2i− 1 and ` = 2j

0, if k + ` is even

Proof. For (1) we have

〈Gk(x), P`(x)〉 =

∫RGk(x)P`(x)e−

x2

2 dx

=

∫R

(∫ x

−∞Pk(y)e−

y2

2 dy

)P`(x)e−

x2

2 dx

=

∫R

(∫ ∞y

P`(x)e−x2

2 dx

)Pk(y)e−

y2

2 dy

= (−1)`∫R

(∫ −y−∞

P`(x)e−x2

2 dx

)Pk(y)e−

y2

2 dy

= (−1)k+`

∫R

(∫ y

−∞P`(x)e−

x2

2 dx

)Pk(y)e−

y2

2 dy

= (−1)k+`〈G`(x), Pk(x)〉,

where the fourth equality is due to the transformation x 7→ −x and equation(D.1.4) and the fifth equality is obtained using the transformation y 7→ −y. Thisshows (1) for the case k+ ` odd. Further, for k+ ` even we get 〈P`, Pk〉 = 0, whichproves (1) and (2) for this case. Now assume that k = 2i− 1, ` = 2j, in particulark 6= 0. We use Lemma E.3.3 to write

〈Gk(x), P`(x)〉 = −∫RPk−1(x)P`(x)e−x

2

dx

= −∫RP2i−2(x)P2j(x)e−x

2

dx

= −∫RHe2i−2

(x)He2j(x)e−x2

dx. (E.3.5)

By Lemma D.2.1 (1) we have∫RHem(x)Hen(x)e−x

2

dx =

(−1)b

m2c+bn

2c Γ(m+n+1

2

), if m+ n is even

0, if m+ n is odd.

Applying this to (E.3.5) we see that 〈Gk(x), P`(x)〉 = (−1)i+jΓ(i+ j − 1

2

). This

finishes the proof.

230

Page 245: Numerical and Statistical Aspects of Tensor Decompositions...Numerical and Statistical Aspects of Tensor Decompositions vorgelegt von Paul Breiding, M.Sc. Mathematik geb. in Witzenhausen

E.3. Gaussian Orthogonal Ensemble

We proceed with a general lemma that will ease the computation of equations (E.3.24) and (E.3.27) significantly.

Lemma E.3.5. Let $f\colon\mathbb R^m\times\mathbb R\to\mathbb R$, $(x,u)\mapsto f(x,u)$, be a measurable function such that for all $u\in\mathbb R$ we have $\int_{x\in\mathbb R^m} f(x,u)\,dx<\infty$. Write $x=(x_1,\dots,x_m)$ and assume that $f$ is invariant under any permutation of the $x_i$. Then
\[
\sum_{j=0}^m \binom{m}{j}\int_{\substack{x_1,\dots,x_j\le u\\ u\le x_{j+1},\dots,x_m}} f(x,u)\,dx \;=\; \int_{x\in\mathbb R^m} f(x,u)\,dx
\]
for all $u\in\mathbb R$.

Proof. We prove the statement by induction. For $m=1$ we have
\[
\int_{x_1\le u} f(x_1,u)\,dx_1 + \int_{u\le x_1} f(x_1,u)\,dx_1 = \int_{x_1\in\mathbb R} f(x_1,u)\,dx_1.
\]
For $m>1$ we write
\[
g_j(u) := \int_{\substack{x_1,\dots,x_j\le u\\ u\le x_{j+1},\dots,x_m}} f(x_1,\dots,x_m,u)\,dx, \qquad j=0,\dots,m.
\]
Using $\binom{m}{j}=\binom{m-1}{j}+\binom{m-1}{j-1}$ [SOJ00, 6:5:3] we have
\[
\sum_{j=0}^m \binom{m}{j}\, g_j(u) = \sum_{j=0}^{m-1}\binom{m-1}{j}\bigl(g_j(u)+g_{j+1}(u)\bigr), \tag{E.3.6}
\]
where
\[
g_j(u)+g_{j+1}(u) = \int_{\substack{x_1,\dots,x_j\le u\\ u\le x_{j+2},\dots,x_m}}\;\int_{x_{j+1}=-\infty}^{\infty} f(x_1,\dots,x_m,u)\,dx.
\]
By assumption $f$ is invariant under any permutation of the $x_i$. Thus, making a change of variables that interchanges $x_{j+1}$ and $x_m$, we see that
\[
g_j(u)+g_{j+1}(u) = \int_{x_m=-\infty}^{\infty}\left(\;\int_{\substack{x_1,\dots,x_j\le u\\ u\le x_{j+1},\dots,x_{m-1}}} f(x_1,\dots,x_{m-1},x_m,u)\,dx_1\cdots dx_{m-1}\right) dx_m.
\]
Plugging this into (E.3.6) and interchanging summation and integration we obtain
\[
\sum_{j=0}^m \binom{m}{j}\, g_j(u) = \int_{x_m=-\infty}^{\infty}\left(\;\sum_{j=0}^{m-1}\binom{m-1}{j}\int_{\substack{x_1,\dots,x_j\le u\\ u\le x_{j+1},\dots,x_{m-1}}} f(x_1,\dots,x_{m-1},x_m,u)\,dx_1\cdots dx_{m-1}\right) dx_m.
\]
We can now apply the induction hypothesis to the inner integral and conclude the proof.
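As a quick illustration (not taken from the thesis), consider $m=2$ and the permutation-invariant choice $f(x,u)=e^{-(x_1^2+x_2^2)/2}$. Writing $\Phi$ for the standard normal distribution function, Lemma E.3.5 then reduces to
\[
\int_{x_1,x_2\le u} f\,dx + 2\int_{\substack{x_1\le u\\ u\le x_2}} f\,dx + \int_{u\le x_1,x_2} f\,dx
= 2\pi\Bigl(\Phi(u)^2 + 2\,\Phi(u)\bigl(1-\Phi(u)\bigr) + \bigl(1-\Phi(u)\bigr)^2\Bigr) = 2\pi = \int_{\mathbb R^2} f\,dx,
\]
i.e. to the binomial identity.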

The following lemma is the key to prove Proposition E.3.7 and Proposition E.3.8 below, which themselves give formulas for the integrals appearing in the two cases of Section E.3.2 ($n$ even and $n$ odd), respectively.

Lemma E.3.6. Let $x=(x_1,\dots,x_{m-1})\in\mathbb R^{m-1}$ and let $A$ denote the $(2m+1)\times 2(m-1)$ matrix
\[
A=\begin{bmatrix}
G_0(x_1) & P_0(x_1) & \dots & G_0(x_{m-1}) & P_0(x_{m-1})\\
\vdots & \vdots & & \vdots & \vdots\\
G_{2m}(x_1) & P_{2m}(x_1) & \dots & G_{2m}(x_{m-1}) & P_{2m}(x_{m-1})
\end{bmatrix}.
\]
Moreover, put
\[
\Gamma_1^{a,b} := \Bigl[\Gamma\bigl(r+s-\tfrac12\bigr)\Bigr]_{\substack{1\le r\le m,\ r\neq a\\ 1\le s\le m,\ s\neq b}}
\qquad\text{and}\qquad
\Gamma_2^{a,b} := \Bigl[\Gamma\bigl(r+s+\tfrac12\bigr)\Bigr]_{\substack{0\le r\le m-1,\ r\neq a\\ 0\le s\le m-1,\ s\neq b}}
\]
(compare the definitions in Theorem E.3.2).

1. Let $S\subset\{1,\dots,2m\}$ be a subset with $2$ elements and let $A_S\in\mathbb R^{2(m-1)\times 2(m-1)}$ be the matrix that is obtained by removing from $A$ all the rows indexed by $S\cup\{0\}$. Then
\[
\int_{x\in\mathbb R^{m-1}} \det(A_S)\; e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\,dx
=\begin{cases}
(m-1)!\,2^{m-1}\det(\Gamma_1^{a,b}), & \text{if } S=\{2a-1,2b\} \text{ and } a\le b,\\
-(m-1)!\,2^{m-1}\det(\Gamma_1^{a,b}), & \text{if } S=\{2a-1,2b\} \text{ and } a> b,\\
0, & \text{if } S \text{ has any other form.}
\end{cases}
\]

2. Let $S\subset\{0,\dots,2m-1\}$ be a subset with $2$ elements and let $A_S\in\mathbb R^{2(m-1)\times 2(m-1)}$ be the matrix that is obtained by removing from $A$ all the rows indexed by $S\cup\{2m\}$. Then
\[
\int_{x\in\mathbb R^{m-1}} \det(A_S)\; e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\,dx
=\begin{cases}
(m-1)!\,2^{m-1}\det(\Gamma_2^{a,b}), & \text{if } S=\{2a,2b+1\} \text{ and } a\le b,\\
-(m-1)!\,2^{m-1}\det(\Gamma_2^{a,b}), & \text{if } S=\{2a,2b+1\} \text{ and } a> b,\\
0, & \text{if } S \text{ has any other form.}
\end{cases}
\]

Proof. We first prove 1. Fix $S\subset\{1,\dots,2m\}$. Then
\[
A_S=\bigl[\,G_i(x_1)\;\; P_i(x_1)\;\;\dots\;\; G_i(x_{m-1})\;\; P_i(x_{m-1})\,\bigr]_{1\le i\le 2m,\ i\notin S}. \tag{E.3.7}
\]
Let us denote the quantity that we want to compute by $\Xi$:
\[
\Xi := \int_{x\in\mathbb R^{m-1}} \det(A_S)\; e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\,dx. \tag{E.3.8}
\]
To ease notation put $\mu:=m-1$. Furthermore, let us denote the elements of $\{1,\dots,2m\}\setminus S$ in ascending order by $s_1<\dots<s_{2\mu}$ and let $\Sigma_{2\mu}$ denote the group of permutations on $\{1,\dots,2\mu\}$. Expanding the determinant of $A_S$ yields
\[
\det(A_S)=\sum_{\pi\in\Sigma_{2\mu}} \operatorname{sgn}(\pi)\prod_{i=1}^{\mu} G_{s_{\pi(2i-1)}}(x_i)\,P_{s_{\pi(2i)}}(x_i). \tag{E.3.9}
\]
Recall from (E.3.4) the definition of $\langle\,,\,\rangle$. Plugging (E.3.9) into (E.3.8) and integrating over all the $x_i$ we see that
\[
\Xi=\sum_{\pi\in\Sigma_{2\mu}} \operatorname{sgn}(\pi)\prod_{i=1}^{\mu}\bigl\langle G_{s_{\pi(2i-1)}}(x), P_{s_{\pi(2i)}}(x)\bigr\rangle. \tag{E.3.10}
\]
From Lemma E.3.4 we know that $\langle G_k(x),P_\ell(x)\rangle=0$ whenever $k+\ell$ is even. This already proves that $\Xi=0$ if $S$ is not of the form $S=\{2a-1,2b\}$, because in this case we cannot make a partition of $\{1,\dots,2m\}\setminus S$ into pairs of numbers where one number is even and the other is odd. If, on the other hand, $S=\{2a-1,2b\}$ does contain one odd and one even element, in (E.3.10) we may as well sum over the subset
\[
\Sigma'_{2\mu}:=\bigl\{\pi\in\Sigma_{2\mu}\;\big|\;\forall i\in\{1,\dots,\mu\}:\ s_{\pi(2i-1)}+s_{\pi(2i)}\ \text{is odd}\bigr\}.
\]
Let $T\subset\Sigma_{2\mu}$ be the subgroup generated by the set of neighbor transpositions $\{(1\,2),(3\,4),\dots,((2\mu-1)\;2\mu)\}$. We define an equivalence relation on $\Sigma'_{2\mu}$ via
\[
\forall \pi,\sigma\in\Sigma'_{2\mu}:\quad \pi\sim\sigma\ :\Leftrightarrow\ \exists\tau\in T:\ \pi=\sigma\tau.
\]
Note that the multiplication with $\tau$ from the right is crucial. Let $C:=\Sigma'_{2\mu}/\!\sim$ denote the set of equivalence classes of $T$ in $\Sigma'_{2\mu}$. A set of representatives for $C$ is
\[
R:=\bigl\{\pi\in\Sigma_{2\mu}\;\big|\;\forall i\in\{1,\dots,\mu\}:\ s_{\pi(2i-1)}\ \text{is odd and}\ s_{\pi(2i)}\ \text{is even}\bigr\}.
\]
Making a partition of $\Sigma'_{2\mu}$ into the equivalence classes of $\sim$ in (E.3.10) we get
\[
\Xi=\sum_{\pi\in R}\sum_{\tau\in T}\operatorname{sgn}(\pi\tau)\prod_{i=1}^{\mu}\bigl\langle G_{s_{\pi\tau(2i-1)}}(x),P_{s_{\pi\tau(2i)}}(x)\bigr\rangle.
\]
For a fixed $\pi\in R$ and all $\tau\in T$, by Lemma E.3.4 (1) we have
\[
\prod_{i=1}^{\mu}\bigl\langle G_{s_{\pi\tau(2i-1)}}(x),P_{s_{\pi\tau(2i)}}(x)\bigr\rangle=\operatorname{sgn}(\tau)\prod_{i=1}^{\mu}\bigl\langle G_{s_{\pi(2i-1)}}(x),P_{s_{\pi(2i)}}(x)\bigr\rangle,
\]
so that
\[
\Xi=2^{\mu}\sum_{\pi\in R}\operatorname{sgn}(\pi)\prod_{i=1}^{\mu}\bigl\langle G_{s_{\pi(2i-1)}}(x),P_{s_{\pi(2i)}}(x)\bigr\rangle.
\]
Let us investigate $R$ further. We denote the group of permutations on $\{1,\dots,\mu\}$ by $\Sigma_\mu$. The group $\Sigma_\mu\times\Sigma_\mu$ acts transitively and faithfully on $R$ via
\[
\forall i:\quad \bigl((\sigma_1,\sigma_2).\pi\bigr)(2i-1):=\pi(2\sigma_1(i)-1)\qquad\text{and}\qquad \bigl((\sigma_1,\sigma_2).\pi\bigr)(2i):=\pi(2\sigma_2(i)).
\]
This shows that we have a bijection $\Sigma_\mu\times\Sigma_\mu\to R$, $(\sigma_1,\sigma_2)\mapsto(\sigma_1,\sigma_2).\pi_\star$, where $\pi_\star\in R$ is fixed. Moreover,
\[
\forall(\sigma_1,\sigma_2)\in\Sigma_\mu\times\Sigma_\mu:\quad \operatorname{sgn}\bigl((\sigma_1,\sigma_2).\pi_\star\bigr)=\operatorname{sgn}(\sigma_1)\operatorname{sgn}(\sigma_2)\operatorname{sgn}(\pi_\star).
\]
Let us denote $2k_i-1=s_{\pi_\star(2i-1)}$ and $2\ell_i=s_{\pi_\star(2i)}$. We choose $\pi_\star$ uniquely by requiring $k_1<k_2<\dots<k_\mu$ and $\ell_1<\ell_2<\dots<\ell_\mu$. By doing so we get
\[
\begin{aligned}
\Xi&=2^{\mu}\operatorname{sgn}(\pi_\star)\sum_{(\sigma_1,\sigma_2)\in\Sigma_\mu\times\Sigma_\mu}\operatorname{sgn}(\sigma_1)\operatorname{sgn}(\sigma_2)\prod_{i=1}^{\mu}\bigl\langle G_{2k_{\sigma_1(i)}-1}(x),P_{2\ell_{\sigma_2(i)}}(x)\bigr\rangle\\
&=2^{\mu}\mu!\,\operatorname{sgn}(\pi_\star)\sum_{\sigma\in\Sigma_\mu}\operatorname{sgn}(\sigma)\prod_{i=1}^{\mu}\bigl\langle G_{2k_{\sigma(i)}-1}(x),P_{2\ell_i}(x)\bigr\rangle,
\end{aligned}
\]
so that, by Lemma E.3.4 (2),
\[
\Xi=2^{\mu}\mu!\,\operatorname{sgn}(\pi_\star)\sum_{\sigma\in\Sigma_\mu}\operatorname{sgn}(\sigma)\prod_{i=1}^{\mu}(-1)^{k_{\sigma(i)}+\ell_i}\,\Gamma\bigl(k_{\sigma(i)}+\ell_i-\tfrac12\bigr). \tag{E.3.11}
\]
By construction we have
\[
\bigcup_{i=1}^{\mu}\{2k_i-1,\,2\ell_i\}=\{1,\dots,2m\}\setminus S,
\]
so that $\{k_1,\dots,k_\mu\}=\{1,\dots,m\}\setminus\{a\}$ and $\{\ell_1,\dots,\ell_\mu\}=\{1,\dots,m\}\setminus\{b\}$. This shows that for all $\sigma\in\Sigma_\mu$ we have
\[
\prod_{i=1}^{\mu}(-1)^{k_{\sigma(i)}+\ell_i}=(-1)^{m(m+1)-a-b}=(-1)^{a+b} \tag{E.3.12}
\]
and, furthermore,
\[
\sum_{\sigma\in\Sigma_\mu}\operatorname{sgn}(\sigma)\prod_{i=1}^{\mu}\Gamma\bigl(k_{\sigma(i)}+\ell_i-\tfrac12\bigr)=\det\Bigl(\bigl[\Gamma\bigl(k+\ell-\tfrac12\bigr)\bigr]_{\substack{1\le k\le m,\ k\neq a\\ 1\le\ell\le m,\ \ell\neq b}}\Bigr). \tag{E.3.13}
\]
Putting together (E.3.11), (E.3.12) and (E.3.13) shows that
\[
\Xi=2^{\mu}\mu!\,\operatorname{sgn}(\pi_\star)\,(-1)^{a+b}\det(\Gamma_1^{a,b}).
\]
Thus, to prove 1. it remains to prove the following.

Claim. $\operatorname{sgn}(\pi_\star)=\begin{cases} (-1)^{a+b}, & \text{if } a\le b,\\ (-1)^{a+b-1}, & \text{if } a> b.\end{cases}$

Proof of the claim. Let $\{2a-1,2b\}=\{x,y\}$ with $x<y$. Then $\pi_\star(z)=z$ for all $z<x$ and all $z>y$. Furthermore, $\pi_\star(x+1)=x+2$, $\pi_\star(x+2)=x+1$, and so on for all pairs of consecutive numbers in the interval $[x,y]$. This implies that $\operatorname{sgn}(\pi_\star)=(-1)^{\frac{y-x-1}{2}}$, which proves the claim.

We now prove 2. Fix $S\subset\{0,\dots,2m-1\}$. Similar to (E.3.7) we have

\[
A_S=\bigl[\,G_i(x_1)\;\; P_i(x_1)\;\;\dots\;\; G_i(x_{m-1})\;\; P_i(x_{m-1})\,\bigr]_{0\le i\le 2m-1,\ i\notin S}.
\]
Put $\widetilde G_i(x):=G_{i-1}(x)$ and $\widetilde P_i(x):=P_{i-1}(x)$, so that
\[
A_S=\bigl[\,\widetilde G_i(x_1)\;\; \widetilde P_i(x_1)\;\;\dots\;\; \widetilde G_i(x_{m-1})\;\; \widetilde P_i(x_{m-1})\,\bigr]_{1\le i\le 2m,\ i\notin \widetilde S},
\]
where $\widetilde S\subset\{1,\dots,2m\}$ is the set that is obtained from $S$ by adding $1$ to both elements of $S$. Observe that
\[
\bigl\langle \widetilde G_k(x),\widetilde P_\ell(x)\bigr\rangle=\bigl\langle G_{k-1}(x),P_{\ell-1}(x)\bigr\rangle=-\bigl\langle G_{\ell-1}(x),P_{k-1}(x)\bigr\rangle
\]
by Lemma E.3.4 (1), and hence, by Lemma E.3.4 (2),
\[
\bigl\langle \widetilde G_k(x),\widetilde P_\ell(x)\bigr\rangle
=\begin{cases}
(-1)^{i+j+1}\,\Gamma\bigl(i+j-\tfrac12\bigr), & \text{if } k=2j+1,\ \ell=2i,\\
0, & \text{if } k+\ell \text{ is even}
\end{cases}
=\begin{cases}
(-1)^{i+j}\,\Gamma\bigl(i+j-\tfrac32\bigr), & \text{if } k=2j-1,\ \ell=2i,\\
0, & \text{if } k+\ell \text{ is even.}
\end{cases}
\]
We may now proceed as in the proof of 1. until (E.3.11), and conclude that
\[
\int_{x_1,\dots,x_{m-1}\in\mathbb R}\det(A_S)\; e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\,dx_1\cdots dx_{m-1}
=\begin{cases}
(m-1)!\,2^{m-1}\det\bigl(\widetilde\Gamma_2^{\,a',b'}\bigr), & \text{if } \widetilde S=\{2a'-1,2b'\} \text{ and } a'\le b',\\
-(m-1)!\,2^{m-1}\det\bigl(\widetilde\Gamma_2^{\,a',b'}\bigr), & \text{if } \widetilde S=\{2a'-1,2b'\} \text{ and } a'> b',\\
0, & \text{if } \widetilde S \text{ has any other form,}
\end{cases}
\]
where
\[
\widetilde\Gamma_2^{\,a',b'}:=\Bigl[\Gamma\bigl(k+\ell-\tfrac32\bigr)\Bigr]_{\substack{1\le k\le m,\ k\neq a'\\ 1\le\ell\le m,\ \ell\neq b'}}.
\]
Note that
\[
\widetilde\Gamma_2^{\,a',b'}=\Bigl[\Gamma\bigl(k+\ell+\tfrac12\bigr)\Bigr]_{\substack{0\le k\le m-1,\ k\neq a'-1\\ 0\le\ell\le m-1,\ \ell\neq b'-1}}=\Gamma_2^{a'-1,b'-1}.
\]
If $\widetilde S=\{2a'-1,2b'\}$ then, by definition, we have $S=\{2a,2b+1\}$, where $a=a'-1$ and $b=b'-1$. Hence,
\[
\int_{x_1,\dots,x_{m-1}\in\mathbb R}\det(A_S)\; e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\,dx_1\cdots dx_{m-1}
=\begin{cases}
(m-1)!\,2^{m-1}\det(\Gamma_2^{a,b}), & \text{if } S=\{2a,2b+1\} \text{ and } a\le b,\\
-(m-1)!\,2^{m-1}\det(\Gamma_2^{a,b}), & \text{if } S=\{2a,2b+1\} \text{ and } a> b,\\
0, & \text{if } S \text{ has any other form.}
\end{cases}
\]
This finishes the proof.
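The simplest instance of part 1 can also be checked numerically. The sketch below is an illustration only, with the same assumptions as before ($P_k=\mathrm{He}_k$ monic and $G_k(x)=-e^{-x^2/2}\,\mathrm{He}_{k-1}(x)$ for $k\ge1$); it takes $m=2$ and $S=\{1,4\}$, i.e. $a=1$, $b=2$, for which $\Gamma_1^{1,2}$ is the $1\times1$ matrix $\bigl[\Gamma(5/2)\bigr]$.

import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, hermitenorm

def G(k, x):
    return -np.exp(-x**2 / 2) * hermitenorm(k - 1)(x)   # Lemma E.3.3, k >= 1

def P(k, x):
    return hermitenorm(k)(x)

# m = 2, S = {1, 4}: A_S keeps the rows i = 2, 3, so det(A_S) = G_2 P_3 - G_3 P_2.
integrand = lambda x: (G(2, x) * P(3, x) - G(3, x) * P(2, x)) * np.exp(-x**2 / 2)
lhs, _ = quad(integrand, -np.inf, np.inf)
rhs = 1 * 2 * gamma(2.5)   # (m-1)! * 2^(m-1) * det(Gamma_1^{1,2}), here a = 1 <= b = 2
print(lhs, rhs)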


Proposition E.3.7. Let $x=(x_1,\dots,x_{m-1})\in\mathbb R^{m-1}$ and let $M$ denote the matrix
\[
M=\begin{bmatrix}
P_0(u) & G_0(x_1) & P_0(x_1) & \dots & G_0(x_{m-1}) & P_0(x_{m-1}) & G_0(u) & G_0(\infty)\\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots\\
P_{2m}(u) & G_{2m}(x_1) & P_{2m}(x_1) & \dots & G_{2m}(x_{m-1}) & P_{2m}(x_{m-1}) & G_{2m}(u) & G_{2m}(\infty)
\end{bmatrix}.
\]
We have
\[
\int_{x\in\mathbb R^{m-1}}\det(M)\; e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\,dx
=\sqrt{2\pi}\,(m-1)!\,2^{m-1}\, e^{-\frac{u^2}{2}}\sum_{1\le i,j\le m}\det(\Gamma_1^{i,j})\,\det\begin{bmatrix} P_{2j}(u) & P_{2i-1}(u)\\ P_{2j-1}(u) & P_{2i-2}(u)\end{bmatrix},
\]
where $\Gamma_1^{i,j}:=\bigl[\Gamma\bigl(r+s-\tfrac12\bigr)\bigr]_{\substack{1\le r\le m,\ r\neq i\\ 1\le s\le m,\ s\neq j}}$.

Proof. Let us denote the quantity that we want to compute by $\Theta$:
\[
\Theta:=\int_{x\in\mathbb R^{m-1}}\det(M)\; e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\,dx.
\]
A permutation with negative sign of the columns of $M$ yields
\[
\det(M)=-\det\Bigl[\,G_i(\infty)\;\; G_i(u)\;\; P_i(u)\;\;\bigl[\,G_i(x_j)\;\; P_i(x_j)\,\bigr]_{j=1,\dots,m-1}\,\Bigr]_{i=0,\dots,2m}.
\]
By Lemma E.3.3 we have $G_i(\infty)=0$ for $i\ge 1$ and $G_0(\infty)=\sqrt{2\pi}$. Expanding the determinant with Laplace expansion we get
\[
\det(M)=-\sqrt{2\pi}\sum_{1\le k<\ell\le 2m}(-1)^{k+\ell-1}\bigl(G_k(u)P_\ell(u)-G_\ell(u)P_k(u)\bigr)\det(A^{k,\ell}),
\]
where $A^{k,\ell}:=\bigl[\,G_i(x_j)\;\; P_i(x_j)\,\bigr]_{\substack{1\le i\le 2m,\ i\notin\{k,\ell\}\\ 1\le j\le m-1}}$. Hence,
\[
\Theta=\sqrt{2\pi}\sum_{1\le k<\ell\le 2m}(-1)^{k+\ell}\bigl(G_k(u)P_\ell(u)-G_\ell(u)P_k(u)\bigr)\int_{x_1,\dots,x_{m-1}\in\mathbb R}\det(A^{k,\ell})\, e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\,dx_1\cdots dx_{m-1}. \tag{E.3.14}
\]
In the notation of Lemma E.3.6 we have $A^{k,\ell}=A_{\{k,\ell\}}$. Applying Lemma E.3.6 yields
\[
\int_{x_1,\dots,x_{m-1}\in\mathbb R}\det(A^{k,\ell})\, e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\,dx_1\cdots dx_{m-1}
=\begin{cases}
(m-1)!\,2^{m-1}\det(\Gamma_1^{i,j}), & \text{if } \{k,\ell\}=\{2i-1,2j\},\ i\le j,\\
-(m-1)!\,2^{m-1}\det(\Gamma_1^{i,j}), & \text{if } \{k,\ell\}=\{2i-1,2j\},\ i> j,\\
0, & \text{else,}
\end{cases}
\]
where $\Gamma_1^{i,j}=\bigl[\Gamma\bigl(r+s-\tfrac12\bigr)\bigr]_{\substack{1\le r\le m,\ r\neq i\\ 1\le s\le m,\ s\neq j}}$. When we want to plug this into (E.3.14) we have to incorporate that if $k=2i-1$, $\ell=2j$ and $k<\ell$, then $i\le j$, whereas if $k=2j$, $\ell=2i-1$ and $k<\ell$, then $i> j$. From this we get
\[
\Theta=\sqrt{2\pi}\,(m-1)!\,2^{m-1}\Bigl[\sum_{1\le i\le j\le m}\det(\Gamma_1^{i,j})\bigl(G_{2j}(u)P_{2i-1}(u)-G_{2i-1}(u)P_{2j}(u)\bigr)
-\sum_{1\le j<i\le m}\det(\Gamma_1^{i,j})\bigl(G_{2i-1}(u)P_{2j}(u)-G_{2j}(u)P_{2i-1}(u)\bigr)\Bigr].
\]
By Lemma E.3.3 we have $G_k(u)=-e^{-\frac{u^2}{2}}P_{k-1}(u)$ for $k\ge 1$, which we can plug into the expression above to obtain
\[
\Theta=\sqrt{2\pi}\,(m-1)!\,2^{m-1}\, e^{-\frac{u^2}{2}}\Bigl[\sum_{1\le i\le j\le m}\det(\Gamma_1^{i,j})\bigl(P_{2i-2}(u)P_{2j}(u)-P_{2j-1}(u)P_{2i-1}(u)\bigr)
-\sum_{1\le j<i\le m}\det(\Gamma_1^{i,j})\bigl(P_{2j-1}(u)P_{2i-1}(u)-P_{2i-2}(u)P_{2j}(u)\bigr)\Bigr],
\]
which shows that
\[
\begin{aligned}
\Theta&=\sqrt{2\pi}\,(m-1)!\,2^{m-1}\, e^{-\frac{u^2}{2}}\sum_{1\le i,j\le m}\det(\Gamma_1^{i,j})\bigl(P_{2i-2}(u)P_{2j}(u)-P_{2j-1}(u)P_{2i-1}(u)\bigr)\\
&=\sqrt{2\pi}\,(m-1)!\,2^{m-1}\, e^{-\frac{u^2}{2}}\sum_{1\le i,j\le m}\det(\Gamma_1^{i,j})\,\det\begin{bmatrix} P_{2j}(u) & P_{2i-1}(u)\\ P_{2j-1}(u) & P_{2i-2}(u)\end{bmatrix}.
\end{aligned}
\]
This finishes the proof.


Proposition E.3.8. Let $x=(x_1,\dots,x_{m-1})\in\mathbb R^{m-1}$ and let $M$ denote the matrix
\[
M=\begin{bmatrix}
P_0(u) & G_0(x_1) & P_0(x_1) & \dots & G_0(x_{m-1}) & P_0(x_{m-1}) & G_0(u)\\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots\\
P_{2m-1}(u) & G_{2m-1}(x_1) & P_{2m-1}(x_1) & \dots & G_{2m-1}(x_{m-1}) & P_{2m-1}(x_{m-1}) & G_{2m-1}(u)
\end{bmatrix}.
\]
Then
\[
\int_{x\in\mathbb R^{m-1}}\det(M)\; e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\,dx
=(m-1)!\,2^{m-1}\, e^{-\frac{u^2}{2}}\sum_{0\le i,j\le m-1}\det(\Gamma_2^{i,j})\,\det\begin{bmatrix} P_{2j+1}(u) & P_{2i}(u)\\ P_{2j}(u) & P_{2i-1}(u)\end{bmatrix},
\]
where $\Gamma_2^{i,j}=\bigl[\Gamma\bigl(r+s+\tfrac12\bigr)\bigr]_{\substack{0\le r\le m-1,\ r\neq i\\ 0\le s\le m-1,\ s\neq j}}$.

Proof. The proof works similar to the proof of Proposition E.3.7. Again, we denote by $\Theta$ the quantity that we want to compute:
\[
\Theta:=\int_{x\in\mathbb R^{m-1}}\det(M)\; e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\,dx.
\]
We have
\[
\det(M)=-\det\Bigl[\,G_i(u)\;\; P_i(u)\;\;\bigl[\,G_i(x_j)\;\; P_i(x_j)\,\bigr]_{j=1,\dots,m-1}\,\Bigr]_{i=0,\dots,2m-1}.
\]
Expanding the determinant with Laplace expansion we get
\[
\det(M)=-\sum_{0\le k<\ell\le 2m-1}(-1)^{k+\ell-1}\bigl(G_k(u)P_\ell(u)-G_\ell(u)P_k(u)\bigr)\det(A^{k,\ell}),
\]
where
\[
A^{k,\ell}=\bigl[\,G_i(x_1)\;\; P_i(x_1)\;\;\dots\;\; G_i(x_{m-1})\;\; P_i(x_{m-1})\,\bigr]_{i=0,\dots,2m-1,\ i\notin\{k,\ell\}}.
\]
Hence,
\[
\Theta=\sum_{0\le k<\ell\le 2m-1}(-1)^{k+\ell}\bigl(G_k(u)P_\ell(u)-G_\ell(u)P_k(u)\bigr)\int_{x_1,\dots,x_{m-1}\in\mathbb R}\det(A^{k,\ell})\, e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\,dx_1\cdots dx_{m-1}. \tag{E.3.15}
\]
By Lemma E.3.6 (2) we have
\[
\int_{x_1,\dots,x_{m-1}\in\mathbb R}\det(A^{k,\ell})\, e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\,dx_1\cdots dx_{m-1}
=\begin{cases}
(m-1)!\,2^{m-1}\det(\Gamma_2^{i,j}), & \text{if } \{k,\ell\}=\{2i,2j+1\} \text{ and } i\le j,\\
-(m-1)!\,2^{m-1}\det(\Gamma_2^{i,j}), & \text{if } \{k,\ell\}=\{2i,2j+1\} \text{ and } i> j,\\
0, & \text{else,}
\end{cases}
\]
where $\Gamma_2^{i,j}=\bigl[\Gamma\bigl(r+s+\tfrac12\bigr)\bigr]_{\substack{0\le r\le m-1,\ r\neq i\\ 0\le s\le m-1,\ s\neq j}}$. When plugging this into (E.3.15) we must take into account that if $k=2i$, $\ell=2j+1$ and $k<\ell$, then $i\le j$, whereas if $k=2j+1$, $\ell=2i$ and $k<\ell$, then $i> j$. This yields
\[
\begin{aligned}
\Theta&=(m-1)!\,2^{m-1}\Bigl[\sum_{0\le j<i\le m-1}\det(\Gamma_2^{i,j})\bigl(G_{2j+1}(u)P_{2i}(u)-G_{2i}(u)P_{2j+1}(u)\bigr)
-\sum_{0\le i\le j\le m-1}\det(\Gamma_2^{i,j})\bigl(G_{2i}(u)P_{2j+1}(u)-G_{2j+1}(u)P_{2i}(u)\bigr)\Bigr]\\
&=(m-1)!\,2^{m-1}\sum_{0\le i,j\le m-1}\det(\Gamma_2^{i,j})\bigl(P_{2i}(u)G_{2j+1}(u)-P_{2j+1}(u)G_{2i}(u)\bigr).
\end{aligned}
\]
Using from Lemma E.3.3 that $G_k(u)=-e^{-\frac{u^2}{2}}P_{k-1}(u)$, we finally obtain
\[
\Theta=(m-1)!\,2^{m-1}\, e^{-\frac{u^2}{2}}\sum_{0\le i,j\le m-1}\det(\Gamma_2^{i,j})\,\det\begin{bmatrix} P_{2j+1}(u) & P_{2i}(u)\\ P_{2j}(u) & P_{2i-1}(u)\end{bmatrix}.
\]
This finishes the proof.

E.3.2. Proof of Theorem E.3.2

We now prove Theorem E.3.2. Recall from (E.3.1) that we have put
\[
I_n(u):=\operatorname*{\mathbb E}_{A\sim\mathrm{GOE}(n)}\bigl|\det(A-uI)\bigr| \qquad\text{and}\qquad J_n(u):=\operatorname*{\mathbb E}_{A\sim\mathrm{GOE}(n)}\det(A-uI).
\]
The proof of Theorem E.3.2 is based on the idea to decompose
\[
I_n(u)=\bigl(I_n(u)+J_n(u)\bigr)-J_n(u)
\]
and to compute the summands separately. Note that Theorem E.3.1 yields a formula for $J_n(u)$. By definition of the Gaussian Orthogonal Ensemble (Definition 2.4.9),
\[
I_n(u)=\frac{1}{\sqrt{2}^{\,n}\,\sqrt{\pi}^{\,n(n+1)/2}}\int_{\substack{A\in\mathbb R^{n\times n}\\ \text{symmetric}}}\bigl|\det(A-uI_n)\bigr|\; e^{-\frac12\operatorname{Trace}(A^TA)}\,dA.
\]
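Before continuing, here is a minimal Monte Carlo sanity check of these two quantities (an illustration only, not part of the thesis code). It assumes the normalization above, i.e. a symmetric matrix with independent $N(0,1)$ entries on the diagonal and $N(0,\tfrac12)$ entries off the diagonal, and relies on NumPy.

import numpy as np

def sample_goe(n, rng):
    # Symmetric matrix with density proportional to exp(-Trace(A^2)/2):
    # diagonal entries N(0,1), off-diagonal entries N(0,1/2).
    B = rng.standard_normal((n, n))
    return (B + B.T) / 2.0

def monte_carlo_I_J(n, u, samples=100000, seed=0):
    rng = np.random.default_rng(seed)
    dets = np.array([np.linalg.det(sample_goe(n, rng) - u * np.eye(n))
                     for _ in range(samples)])
    return np.abs(dets).mean(), dets.mean()   # estimates of I_n(u) and J_n(u)

# For n = 2 a direct computation gives J_2(u) = u^2 - 1/2; the even case of the
# theorem proved below should give I_2(u) = J_2(u) + sqrt(2)*exp(-u^2/2).
print(monte_carlo_I_J(2, 0.5))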

By [Mui82, Theorem 3.2.17], the density of the eigenvalues $\lambda=(\lambda_1,\dots,\lambda_n)$ of $A\sim\mathrm{GOE}(n)$ is given by
\[
\frac{1}{\sqrt{2}^{\,n}\prod_{i=1}^n\Gamma\bigl(\tfrac i2\bigr)}\;\Delta(\lambda)\; e^{-\sum_{i=1}^n\frac{\lambda_i^2}{2}}\;\mathbf 1_{\lambda_1\le\dots\le\lambda_n},
\]
where
\[
\Delta(\lambda):=\prod_{1\le i<j\le n}(\lambda_j-\lambda_i)
\]
and $\mathbf 1_{\lambda_1\le\dots\le\lambda_n}$ is the characteristic function of the set $\{\lambda_1\le\dots\le\lambda_n\}$. Hence,
\[
I_n(u)=\frac{1}{\sqrt{2}^{\,n}\prod_{i=1}^n\Gamma\bigl(\tfrac i2\bigr)}\int_{\lambda_1\le\dots\le\lambda_n}\Delta(\lambda)\; e^{-\sum_{i=1}^n\frac{\lambda_i^2}{2}}\prod_{i=1}^n|\lambda_i-u|\;d\lambda. \tag{E.3.16}
\]
Similarly,
\[
J_n(u)=\frac{1}{\sqrt{2}^{\,n}\prod_{i=1}^n\Gamma\bigl(\tfrac i2\bigr)}\int_{\lambda_1\le\dots\le\lambda_n}\Delta(\lambda)\; e^{-\sum_{i=1}^n\frac{\lambda_i^2}{2}}\prod_{i=1}^n(\lambda_i-u)\;d\lambda. \tag{E.3.17}
\]
In the remainder of the section we put
\[
C:=\frac{1}{\sqrt{2}^{\,n}\prod_{i=1}^n\Gamma\bigl(\tfrac i2\bigr)} \tag{E.3.18}
\]
and $\lambda_0:=-\infty$. We can write (E.3.16) as
\[
I_n(u)=C\sum_{j=0}^n(-1)^j\int_{\substack{\lambda_0\le\lambda_1\le\dots\le\lambda_j\le u\\ u\le\lambda_{j+1}\le\dots\le\lambda_n}}\Delta(\lambda)\; e^{-\sum_{i=1}^n\frac{\lambda_i^2}{2}}\prod_{i=1}^n(\lambda_i-u)\;d\lambda
\]
and (E.3.17) as
\[
J_n(u)=C\sum_{j=0}^n\int_{\substack{\lambda_0\le\lambda_1\le\dots\le\lambda_j\le u\\ u\le\lambda_{j+1}\le\dots\le\lambda_n}}\Delta(\lambda)\; e^{-\sum_{i=1}^n\frac{\lambda_i^2}{2}}\prod_{i=1}^n(\lambda_i-u)\;d\lambda. \tag{E.3.19}
\]
Hence,
\[
I_n(u)+J_n(u)=2C\sum_{j=0}^{\lfloor\frac n2\rfloor}\int_{\substack{\lambda_0\le\lambda_1\le\dots\le\lambda_{2j}\le u\\ u\le\lambda_{2j+1}\le\dots\le\lambda_n}}\Delta(\lambda)\; e^{-\sum_{i=1}^n\frac{\lambda_i^2}{2}}\prod_{i=1}^n(\lambda_i-u)\;d\lambda. \tag{E.3.20}
\]
We write $\Delta(\lambda)\prod_{i=1}^n(\lambda_i-u)$ as a Vandermonde determinant:
\[
\Delta(\lambda)\prod_{i=1}^n(\lambda_i-u)=\prod_{i=1}^n(\lambda_i-u)\prod_{1\le i<j\le n}(\lambda_j-\lambda_i)=\det\bigl[\,u^k\;\;\lambda_1^k\;\;\dots\;\;\lambda_n^k\,\bigr]_{k=0,\dots,n}.
\]
Since we may add arbitrary multiples of rows to other rows of a matrix without changing its determinant, we have
\[
\Delta(\lambda)\prod_{i=1}^n(\lambda_i-u)=\det\bigl[\,P_k(u)\;\;P_k(\lambda_1)\;\;\dots\;\;P_k(\lambda_n)\,\bigr]_{k=0,\dots,n}, \tag{E.3.21}
\]
where the $P_k(x)$, $k=0,1,\dots,n$, are the Hermite polynomials from (E.3.2). Note that it is crucial that the $P_k$ are monic polynomials, as implied by Lemma D.1.1. Plugging this into (E.3.20) shows that $I_n(u)+J_n(u)$ equals
\[
2C\sum_{j=0}^{\lfloor\frac n2\rfloor}\int_{\substack{\lambda_0\le\lambda_1\le\dots\le\lambda_{2j}\le u\\ u\le\lambda_{2j+1}\le\dots\le\lambda_n}}\det\bigl[\,P_k(u)\;\;P_k(\lambda_1)\;\;\dots\;\;P_k(\lambda_n)\,\bigr]_{k=0,\dots,n}\; e^{-\sum_{i=1}^n\frac{\lambda_i^2}{2}}\;d\lambda. \tag{E.3.22}
\]
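To illustrate the row operations behind (E.3.21), take $n=2$ and assume that the monic Hermite polynomials from (E.3.2) begin with $P_0=1$, $P_1(x)=x$, $P_2(x)=x^2-1$. Subtracting the first row of the Vandermonde-type matrix from its third row gives
\[
\det\begin{bmatrix}1 & 1 & 1\\ u & \lambda_1 & \lambda_2\\ u^2 & \lambda_1^2 & \lambda_2^2\end{bmatrix}
=\det\begin{bmatrix}P_0(u) & P_0(\lambda_1) & P_0(\lambda_2)\\ P_1(u) & P_1(\lambda_1) & P_1(\lambda_2)\\ P_2(u) & P_2(\lambda_1) & P_2(\lambda_2)\end{bmatrix}
=(\lambda_1-u)(\lambda_2-u)(\lambda_2-\lambda_1),
\]
in accordance with (E.3.21); such row operations only ever change the lower-order coefficients, which is why monicity of the $P_k$ is essential.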

We now distinguish the cases n even and n odd.

The case when n is even

Recall that we have put $n=2m$, so that $\lfloor\frac n2\rfloor=m$. Moreover, recall from (E.3.3) that we have put
\[
G_k(x)=\int_{-\infty}^{x}P_k(y)\, e^{-\frac{y^2}{2}}\,dy.
\]
Observe that each $\lambda_i$ appears in exactly one column on the right hand side of (E.3.21). Integrating over $\lambda_1,\lambda_3,\lambda_5,\dots$ in (E.3.22) therefore yields
\[
I_n(u)+J_n(u)=2C\sum_{j=0}^{m}\int_{\substack{\lambda_2\le\lambda_4\le\dots\le\lambda_{2j}\le u\\ u\le\lambda_{2j+2}\le\dots\le\lambda_{2m}}}\det(N_j)\; e^{-\sum_{i=1}^m\frac{\lambda_{2i}^2}{2}}\;d\lambda_2\cdots d\lambda_{2m}, \tag{E.3.23}
\]
where $N_j$ is the matrix
\[
\Bigl[\,P_k(u)\;\;\bigl[\,G_k(\lambda_{2i})-G_k(\lambda_{2i-2})\;\; P_k(\lambda_{2i})\,\bigr]_{i=1,\dots,j}\;\;\bigl[\,G_k(\lambda_{2j+2})-G_k(u)\;\; P_k(\lambda_{2j+2})\,\bigr]\;\;\bigl[\,G_k(\lambda_{2i})-G_k(\lambda_{2i-2})\;\; P_k(\lambda_{2i})\,\bigr]_{i=j+2,\dots,m}\,\Bigr]_{k=0,\dots,n}.
\]
Successively adding each $G$-column of $N_j$ to the next $G$-column (separately within the blocks $i=1,\dots,j$ and $i=j+1,\dots,m$) does not change the value of the determinant. Hence, we have $\det(N_j)=\det(M_j)$, where the matrix $M_j$ is given by
\[
\Bigl[\,P_k(u)\;\;\bigl[\,G_k(\lambda_{2i})\;\; P_k(\lambda_{2i})\,\bigr]_{i=1,\dots,j}\;\;\bigl[\,G_k(\lambda_{2i})-G_k(u)\;\; P_k(\lambda_{2i})\,\bigr]_{i=j+1,\dots,m}\,\Bigr]_{k=0,\dots,n}.
\]
Observe that each $\lambda_{2i}$ appears in exactly two columns of $M_j$. Hence, making a change of variables by interchanging $\lambda_{2i}$ and $\lambda_{2i'}$ for any two $i,i'$ does not change the value of the determinant of $M_j$. Writing $x_i:=\lambda_{2i}$ for $1\le i\le m$, we therefore have
\[
I_n(u)+J_n(u)=\frac{2C}{m!}\sum_{j=0}^m\binom mj\int_{\substack{x_1,\dots,x_j\le u\\ u\le x_{j+1},\dots,x_m}}\det(M_j)\; e^{-\sum_{i=1}^m\frac{x_i^2}{2}}\;dx_1\cdots dx_m. \tag{E.3.24}
\]
Using the multilinearity of the determinant we can write $\det(M_j)$ as a sum of determinants of matrices whose double columns equal either $\bigl[\,G_k(x_i)\;\;P_k(x_i)\,\bigr]$ or $\bigl[\,-G_k(u)\;\;P_k(x_i)\,\bigr]$. Observe that whenever the column $-G_k(u)$ appears twice in such a matrix the corresponding determinant equals zero. Moreover, we may interchange the double columns as we wish without changing the value of the determinant. All this yields
\[
\det(M_j)=\det(K)-(m-j)\det(L), \tag{E.3.25}
\]
where
\[
K=\Bigl[\,P_k(u)\;\;\bigl[\,G_k(x_i)\;\; P_k(x_i)\,\bigr]_{i=1,\dots,m}\,\Bigr]_{k=0,\dots,2m},\qquad
L=\Bigl[\,P_k(u)\;\;\bigl[\,G_k(x_i)\;\; P_k(x_i)\,\bigr]_{i=1,\dots,m-1}\;\;G_k(u)\;\;P_k(x_m)\,\Bigr]_{k=0,\dots,2m}.
\]
We first observe that, by Lemma E.3.5, we have
\[
\frac{C}{m!}\sum_{j=0}^m\binom mj\int_{\substack{x_1,\dots,x_j\le u\\ u\le x_{j+1},\dots,x_m}}\det(K)\; e^{-\sum_{i=1}^m\frac{x_i^2}{2}}\;dx_1\cdots dx_m
=\frac{C}{m!}\int_{x_1,\dots,x_m\in\mathbb R}\det(K)\; e^{-\sum_{i=1}^m\frac{x_i^2}{2}}\;dx_1\cdots dx_m.
\]
In [Meh91, Eq. 22.2.25] it is shown that the integral on the right is equal to $J_n(u)$. Combining this with (E.3.24) and (E.3.25) we see that
\[
\begin{aligned}
I_n(u)&=\bigl(I_n(u)+J_n(u)\bigr)-J_n(u)\\
&=J_n(u)-\frac{2C}{m!}\sum_{j=0}^m\binom mj\int_{\substack{x_1,\dots,x_j\le u\\ u\le x_{j+1},\dots,x_m}}(m-j)\det(L)\; e^{-\sum_{i=1}^m\frac{x_i^2}{2}}\;dx_1\cdots dx_m\\
&=J_n(u)-\frac{2C}{(m-1)!}\sum_{j=0}^{m-1}\binom{m-1}{j}\int_{\substack{x_1,\dots,x_j\le u\\ u\le x_{j+1},\dots,x_m}}\det(L)\; e^{-\sum_{i=1}^m\frac{x_i^2}{2}}\;dx_1\cdots dx_m.
\end{aligned}
\]
Since $\det(L)\, e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}$ is invariant under permuting $x_1,\dots,x_{m-1}$ (excluding $x_m$!) we may apply Lemma E.3.5 to obtain
\[
I_n(u)=J_n(u)-\frac{2C}{(m-1)!}\int_{x_1,\dots,x_{m-1}}\left(\;\int_{x_m=u}^{\infty}\det(L)\, e^{-\frac{x_m^2}{2}}\,dx_m\right) e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\;dx, \tag{E.3.26}
\]
where $dx:=dx_1\cdots dx_{m-1}$. Observe that $x_m$ appears in one single column of $L$. Integrating over $x_m$ in (E.3.26) therefore gives
\[
\begin{aligned}
\int_{x_m=u}^{\infty}\det(L)\, e^{-\frac{x_m^2}{2}}\,dx_m
&=\det\Bigl[\,P_k(u)\;\;\bigl[\,G_k(x_i)\;\;P_k(x_i)\,\bigr]_{i=1,\dots,m-1}\;\;G_k(u)\;\;G_k(\infty)-G_k(u)\,\Bigr]_{k=0,\dots,2m}\\
&=\det\underbrace{\Bigl[\,P_k(u)\;\;\bigl[\,G_k(x_i)\;\;P_k(x_i)\,\bigr]_{i=1,\dots,m-1}\;\;G_k(u)\;\;G_k(\infty)\,\Bigr]_{k=0,\dots,2m}}_{=:M}.
\end{aligned}
\]
From Proposition E.3.7 we get
\[
\int_{x\in\mathbb R^{m-1}}\det(M)\, e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\,dx
=\sqrt{2\pi}\,(m-1)!\,2^{m-1}\, e^{-\frac{u^2}{2}}\sum_{1\le i,j\le m}\det(\Gamma_1^{i,j})\,\det\begin{bmatrix}P_{2j}(u) & P_{2i-1}(u)\\ P_{2j-1}(u) & P_{2i-2}(u)\end{bmatrix},
\]
where $\Gamma_1^{i,j}:=\bigl[\Gamma\bigl(r+s-\tfrac12\bigr)\bigr]_{\substack{1\le r\le m,\ r\neq i\\ 1\le s\le m,\ s\neq j}}$. Hence, by (E.3.26):
\[
I_n(u)=J_n(u)-C\sqrt{2\pi}\,2^{m}\, e^{-\frac{u^2}{2}}\sum_{1\le i,j\le m}\det(\Gamma_1^{i,j})\,\det\begin{bmatrix}P_{2j}(u) & P_{2i-1}(u)\\ P_{2j-1}(u) & P_{2i-2}(u)\end{bmatrix}.
\]
Finally, we substitute $C=\bigl(\sqrt2^{\,n}\prod_{i=1}^n\Gamma\bigl(\tfrac i2\bigr)\bigr)^{-1}$ (see (E.3.18)) and put the minus into the determinant to obtain
\[
I_n(u)=J_n(u)+\frac{\sqrt{2\pi}\, e^{-\frac{u^2}{2}}}{\prod_{i=1}^n\Gamma\bigl(\tfrac i2\bigr)}\sum_{1\le i,j\le m}\det(\Gamma_1^{i,j})\,\det\begin{bmatrix}P_{2i-1}(u) & P_{2j}(u)\\ P_{2i-2}(u) & P_{2j-1}(u)\end{bmatrix}.
\]
This finishes the proof.
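As a sanity check of the even case (not part of the original text), take $n=2$, i.e. $m=1$, and assume $P_0=1$, $P_1(u)=u$, $P_2(u)=u^2-1$ together with the usual convention that the empty $0\times0$ matrix $\Gamma_1^{1,1}$ has determinant $1$. Then the sum consists of the single term
\[
\det(\Gamma_1^{1,1})\,\det\begin{bmatrix}P_1(u) & P_2(u)\\ P_0(u) & P_1(u)\end{bmatrix}=u^2-(u^2-1)=1,
\]
and since $\Gamma(\tfrac12)\Gamma(1)=\sqrt\pi$, the formula reduces to $I_2(u)=J_2(u)+\sqrt2\, e^{-u^2/2}$, in agreement with the Monte Carlo sketch given earlier in this section.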

The case when n is odd

Here we have $n=2m-1$ and hence $\lfloor\frac n2\rfloor=m-1$. We proceed as in the preceding section and can therefore be brief in our explanations. In (E.3.22) we integrate over all the $\lambda_i$ with $i$ odd to obtain
\[
I_n(u)+J_n(u)=2C\sum_{j=0}^{m-1}\int_{\substack{x_1\le x_2\le\dots\le x_j\le u\\ u\le x_{j+1}\le\dots\le x_{m-1}}}\det(N_j)\; e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\;dx, \tag{E.3.27}
\]
where $x_i:=\lambda_{2i}$, $1\le i\le m-1$, and $N_j$ is the matrix
\[
N_j=\Bigl[\,P_k(u)\;\;\bigl[\,G_k(x_i)-G_k(x_{i-1})\;\; P_k(x_i)\,\bigr]_{i=1,\dots,j}\;\;\bigl[\,G_k(x_{j+1})-G_k(u)\;\; P_k(x_{j+1})\,\bigr]\;\;\bigl[\,G_k(x_i)-G_k(x_{i-1})\;\; P_k(x_i)\,\bigr]_{i=j+2,\dots,m-1}\;\;G_k(\infty)-G_k(x_{m-1})\,\Bigr]_{k=0,\dots,n}.
\]
We have $\det(N_j)=\det(M_j)$, where
\[
M_j=\Bigl[\,P_k(u)\;\;\bigl[\,G_k(x_i)\;\; P_k(x_i)\,\bigr]_{i=1,\dots,j}\;\;\bigl[\,G_k(x_i)-G_k(u)\;\; P_k(x_i)\,\bigr]_{i=j+1,\dots,m-1}\;\;G_k(\infty)-G_k(u)\,\Bigr]_{k=0,\dots,n}.
\]
Permuting $x_1,\dots,x_j$ or permuting $x_{j+1},\dots,x_{m-1}$ does not change the value of $\det(M_j)$, so that
\[
I_n(u)+J_n(u)=\frac{2C}{(m-1)!}\sum_{j=0}^{m-1}\binom{m-1}{j}\int_{\substack{x_1,\dots,x_j\le u\\ u\le x_{j+1},\dots,x_{m-1}}}\det(M_j)\, e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\;dx, \tag{E.3.28}
\]
where $dx:=dx_1\cdots dx_{m-1}$. Using the multilinearity of the determinant we have
\[
\det(M_j)=\det(K)-\det(M)-(m-1-j)\det(L),
\]
where
\[
K=\Bigl[\,P_k(u)\;\;\bigl[\,G_k(x_i)\;\; P_k(x_i)\,\bigr]_{i=1,\dots,m-1}\;\;G_k(\infty)\,\Bigr]_{k=0,\dots,2m-1},\qquad
M=\Bigl[\,P_k(u)\;\;\bigl[\,G_k(x_i)\;\; P_k(x_i)\,\bigr]_{i=1,\dots,m-1}\;\;G_k(u)\,\Bigr]_{k=0,\dots,2m-1},
\]
and
\[
L=\Bigl[\,P_k(u)\;\;\bigl[\,G_k(x_i)\;\; P_k(x_i)\,\bigr]_{i=1,\dots,m-2}\;\;G_k(u)\;\;P_k(x_{m-1})\;\;G_k(\infty)-G_k(u)\,\Bigr]_{k=0,\dots,2m-1}.
\]
Integrating $\int_{x_{m-1}>u}\det(L)\, e^{-\frac{x_{m-1}^2}{2}}\,dx_{m-1}$ yields a matrix similar to $L$, but where the $P_k(x_{m-1})$ in $L$ is replaced by $G_k(\infty)-G_k(u)$. Since the resulting matrix has two equal columns,
\[
\int_{x_{m-1}=u}^{\infty}\det(L)\, e^{-\frac{x_{m-1}^2}{2}}\,dx_{m-1}=0,
\]
and thus
\[
\frac{2C}{(m-1)!}\sum_{j=0}^{m-1}\binom{m-1}{j}\int_{\substack{x_1,\dots,x_j\le u\\ u\le x_{j+1},\dots,x_{m-1}}}(m-1-j)\det(L)\, e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\;dx=0.
\]
Using this, (E.3.28) becomes
\[
I_n(u)+J_n(u)=\frac{2C}{(m-1)!}\sum_{j=0}^{m-1}\binom{m-1}{j}\int_{\substack{x_1,\dots,x_j\le u\\ u\le x_{j+1},\dots,x_{m-1}}}\bigl(\det(K)-\det(M)\bigr)\, e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\;dx.
\]
By construction, both $\det(K)$ and $\det(M)$ are invariant under any permutation of the $x_i$. We may apply Lemma E.3.5 to get
\[
I_n(u)+J_n(u)=\frac{2C}{(m-1)!}\int_{x\in\mathbb R^{m-1}}\bigl(\det(K)-\det(M)\bigr)\, e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\;dx.
\]
In the same way as above we show that
\[
J_n(u)=\frac{C}{(m-1)!}\int_{x\in\mathbb R^{m-1}}\det(K)\, e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\;dx.
\]
Using this we deduce that
\[
I_n(u)=\bigl(I_n(u)+J_n(u)\bigr)-J_n(u) \tag{E.3.29}
\]
\[
\phantom{I_n(u)}=J_n(u)-\frac{2C}{(m-1)!}\int_{x\in\mathbb R^{m-1}}\det(M)\, e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\;dx. \tag{E.3.30}
\]
By Proposition E.3.8 we have
\[
\int_{x\in\mathbb R^{m-1}}\det(M)\, e^{-\sum_{i=1}^{m-1}\frac{x_i^2}{2}}\;dx
=(m-1)!\,2^{m-1}\, e^{-\frac{u^2}{2}}\sum_{0\le i,j\le m-1}\det(\Gamma_2^{i,j})\,\det\begin{bmatrix}P_{2j+1}(u) & P_{2i}(u)\\ P_{2j}(u) & P_{2i-1}(u)\end{bmatrix},
\]
where $\Gamma_2^{i,j}=\bigl[\Gamma\bigl(r+s+\tfrac12\bigr)\bigr]_{\substack{0\le r\le m-1,\ r\neq i\\ 0\le s\le m-1,\ s\neq j}}$. Combining this with (E.3.30), substituting $C=\bigl(\sqrt2^{\,n}\prod_{i=1}^n\Gamma\bigl(\tfrac i2\bigr)\bigr)^{-1}$, see (E.3.18), and putting the minus into the determinant we get
\[
I_n(u)=J_n(u)+\frac{\sqrt2\, e^{-\frac{u^2}{2}}}{\prod_{i=1}^n\Gamma\bigl(\tfrac i2\bigr)}\sum_{0\le i,j\le m-1}\det(\Gamma_2^{i,j})\,\det\begin{bmatrix}P_{2i}(u) & P_{2j+1}(u)\\ P_{2i-1}(u) & P_{2j}(u)\end{bmatrix}.
\]
This finishes the proof.


F. Source code

F.1. R script to generate Gaussian tensors

The following R [R C15] script was used to generate the histograms in Figure 9.1.1. The script invokes a system call to BERTINI [BHSW], which must be installed before executing the script.

#############################################
# Initialize the vectors h, h_sym.
# They should store the numbers of real
# eigenpairs from the 2000 experiments.
#############################################
h=rep(0,2000);
h_sym=rep(0,2000);
require(gtools)

#############################################
# Loop for Gaussian tensors
# Repeat the experiment 2000 times
#############################################
q=1;
while(q<=2000){
  print(q)
  A<-array(0,dim=c(5,5,5));
  # Generate Gaussian general tensor
  for(i in 1:5){
    for(j in 1:5){
      for(k in 1:5){
        A[i,j,k]=rnorm(1, mean=0, sd=1);
      }
    }
  }
  ###########################################
  # Generate the polynomial system f_A
  ###########################################
  X<-array(0,dim=c(5,5));
  var=list("u","v","x","y","z");
  string=c("f=","g=","h=","k=","l=");
  for(i in 1:5){
    for(j in 1:5){
      X[i,j]=sprintf("*%s*%s",var[i],var[j]);
      for(k in 1:5){
        string[k]=paste0(string[k],"(",A[i,j,k],")",X[i,j],"+");
      }
    }
  }
  ###########################################
  # Prepare the Bertini input for general tensors
  ###########################################
  string[1]=paste0(string[1],"(-1)*u;");
  string[2]=paste0(string[2],"(-1)*v;");
  string[3]=paste0(string[3],"(-1)*x;");
  string[4]=paste0(string[4],"(-1)*y;");
  string[5]=paste0(string[5],"(-1)*z;");
  bertini<-file("input");
  writeLines(c("variable_group u,v,x,y,z;",
               "function f,g,h,k,l;",
               string[1], string[2], string[3], string[4], string[5],
               "END"), bertini);
  close(bertini)
  # Execute Bertini
  system("bertini input")
  realsol<-scan("real_finite_solutions")
  h[q]=realsol[1]-1;
  # Bertini sometimes falsely computes double roots.
  # Only proceed the experiment when the parity is right.
  if(h[q]%%2==1){
    q=q+1;
  }
}

#############################################
# Loop for Gaussian symmetric tensors
#############################################
q=1;
while(q<=2000){
  B<-array(0,dim=c(5,5,5));
  # Generate Gaussian symmetric tensor
  P<-permutations(n=3,r=3);
  for(i in 1:5){
    for(j in 1:i){
      for(k in 1:j){
        alpha_1=sum(c(i,j,k)==1);
        alpha_2=sum(c(i,j,k)==2);
        alpha_3=sum(c(i,j,k)==3);
        alpha_4=sum(c(i,j,k)==4);
        alpha_5=sum(c(i,j,k)==5);
        s=(gamma(alpha_1+1)*gamma(alpha_2+1)*gamma(alpha_3+1)*gamma(alpha_4+1)*gamma(alpha_5+1))/6;
        s=sqrt(s);
        y=rnorm(1, mean=0, sd=s);
        for(t in 1:6){
          x=c(P[t,1],P[t,2],P[t,3]);
          B[c(i,j,k)[x[1]],c(i,j,k)[x[2]],c(i,j,k)[x[3]]]=y;
        }
      }
    }
  }
  ###########################################
  # Generate the polynomial system f_A
  ###########################################
  X<-array(0,dim=c(5,5));
  var=list("u","v","x","y","z");
  string_sym=c("f=","g=","h=","k=","l=");
  for(i in 1:5){
    for(j in 1:5){
      X[i,j]=sprintf("*%s*%s",var[i],var[j]);
      for(k in 1:5){
        string_sym[k]=paste0(string_sym[k],"(",B[i,j,k],")",X[i,j],"+");
      }
    }
  }
  ###########################################
  # Prepare the Bertini input for symmetric tensors
  ###########################################
  string_sym2=string_sym;
  string_sym[1]=paste0(string_sym[1],"(-1)*u;");
  string_sym[2]=paste0(string_sym[2],"(-1)*v;");
  string_sym[3]=paste0(string_sym[3],"(-1)*x;");
  string_sym[4]=paste0(string_sym[4],"(-1)*y;");
  string_sym[5]=paste0(string_sym[5],"(-1)*z;");
  bertini<-file("input");
  writeLines(c("variable_group u,v,x,y,z;",
               "function f,g,h,k,l;",
               string_sym[1], string_sym[2], string_sym[3], string_sym[4], string_sym[5],
               "END"), bertini);
  close(bertini)
  # Execute Bertini
  system("bertini input")
  realsol<-scan("real_finite_solutions")
  h_sym[q]=realsol[1]-1;
  # Bertini sometimes falsely computes double roots.
  # Only proceed the experiment when the parity is right.
  if(h_sym[q]%%2==1){
    q=q+1;
  }
  # Clean
  file.remove("input")
}

#############################################
# Now the vector h contains the 2000 entries with
# the numbers of real eigenpairs of a Gaussian tensor,
# and h_sym contains the same for Gaussian
# symmetric tensors.
#############################################

F.2. SAGE scripts to compute the expected number of real eigenpairs

The following SAGE scripts were used to compute the identities in Table 9.1.1.

F.2.1. Script for E(n, p)

#############################################
# Compute A, A_1 and A_2, such that
# E(n,p) = A*(A_1 + A_2)
#############################################
# define the variables
#############################################
p,m,n,i,j,x = var('p,m,n,i,j,x');
#############################################
# Set the value of n
#############################################
n = 2;
#############################################
# compute A, A_1, A_2
#############################################
A(p) = 2^(n-1)*sqrt(p-1)^n*gamma(n-1/2)/(sqrt(pi)*p^(n-1/2)*gamma(n));
A_1(p) = 2*(n-1)*hypergeometric([1,n-1/2],[3/2],(p-2)/p);
A_1(p) = A_1(p).simplify_hypergeometric();
A_2(p) = hypergeometric([1,n-1/2],[(n+1)/2],1/p);
A_2(p) = A_2(p).simplify_hypergeometric();
#############################################
# compute E(n,p) = A*(A_1 + A_2)
#############################################
E(p) = A(p)*(A_2(p)+A_1(p));
E(p) = E(p).expand();
E(p) = E(p).simplify();
E(p) = E(p).factor();
#############################################
# prints the formula
# (wrap 'latex()' around the command to get tex code)
#############################################
print(E(p))

F.2.2. Script for E_sym(2m+1, p)

#############################################
# The case n = 2m+1 is odd: Compute A and A_1,
# so that E_sym(n,p) = 1 + A*A_1
#############################################
# define the variables
#############################################
p,m,n,i,j,x = var('p,m,n,i,j,x');
#############################################
# Set the value of m
#############################################
m = 2;
n = 2*m+1;
#############################################
# compute A
#############################################
A(p) = sqrt(pi)*sqrt(p-1)^(n-2)*sqrt(3*p-2)/(prod(gamma(i/2) for i in (1..n)));
#############################################
# compute the determinants of the Gamma matrices
#############################################
Gamma_1 = matrix(m, lambda i,j: gamma(i+j+3/2));
G_1 = matrix(m, lambda i,j: det(Gamma_1.matrix_from_rows_and_columns([0..i-1,i+1..m-1],[0..j-1,j+1..m-1])));
#############################################
# compute A_1
#############################################
from sage.misc.mrange import cantor_product
L = list(cartesian_product_iterator([[1..m],[1..m]]));
A_1(p) = sum(G_1[i-1,j-1] * gamma(i+j-1/2) * ((1-2*i+2*j)/(3-2*i-2*j)) * (-(3*p-2)/(4*(p-1)))^(1-i-j) * hypergeometric([2-2*i,1-2*j],[5/2-i-j],(3*p-2)/(4*(p-1))) for (i,j) in L);
A_1(p) = A_1(p).simplify_hypergeometric();
#############################################
# compute E_sym(n,p)
#############################################
E_sym_odd(p) = A(p)*A_1(p);
E_sym_odd(p) = E_sym_odd(p).factor()
E_sym_odd(p) = E_sym_odd(p)+1
#############################################
# prints the formula
# (wrap 'latex()' around the command to get tex code)
#############################################
print(E_sym_odd(p))

F.2.3. Script for E_sym(2m, p)

#############################################
# The case n = 2m is even: Compute B, B_1, B_2
# and B_3, so that E_sym(n,p) = B*(B_1 - B_2 + B_3)
#############################################
# define the variables
#############################################
p,m,n,i,j,x = var('p,m,n,i,j,x');
#############################################
# Set the value of m
#############################################
m = 2; n = 2*m;
#############################################
# compute B
#############################################
B(p) = sqrt(p-1)^(n-2)*sqrt(3*p-2)/(prod(gamma(i/2) for i in (1..n)));
#############################################
# compute the determinants of the Gamma matrices
#############################################
G_2 = matrix(m, lambda i,j: gamma(i+j+1/2));
G_2 = matrix(m, lambda i,j: det(G_2.matrix_from_rows_and_columns([0..i-1,i+1..m-1],[0..j-1,j+1..m-1])));
#############################################
# compute B_1
#############################################
B_1(p) = sum(sqrt(pi) * G_2[0,j] * (gamma(2*j+2)/((-1)^j*4^j*gamma(j+1))) * ((p-2)^j*p)/((p-1)^j*(3*p-2)) + hypergeometric([-j,1/2],[3/2],-p^2/((3*p-2)*(p-2))) for j in [0..m-1]);
B_1(p) = B_1(p).simplify_hypergeometric();
#############################################
# compute B_2
#############################################
B_2(p) = sum(G_2[0,j] * gamma(j+1/2) * (-4*(p-1)/(3*p-2))^(j+1)/2 for j in [0..m-1]);
#############################################
# compute B_3
#############################################
from sage.misc.mrange import cantor_product
L = list(cartesian_product_iterator([[1..m-1],[0..m-1]]));
B_3(p) = sum(G_2[i,j] * gamma(i+j+1/2) * ((1-2*i+2*j)/(1-2*i-2*j)) * (-4*(p-1)/(3*p-2))^(i+j) * hypergeometric([-2*j,-2*i+1],[3/2-i-j],(3*p-2)/(4*(p-1))) for (i,j) in L);
B_3(p) = B_3(p).simplify_hypergeometric();
#############################################
# compute E_sym(n,p)
#############################################
E_sym_even(p) = B(p)*(B_1(p)-B_2(p)+B_3(p));
E_sym_even(p) = E_sym_even(p).factor()
#############################################
# prints the formula
# (wrap 'latex()' around the command to get tex code)
#############################################
print(E_sym_even(p))
