
  • Dissertation

    Wireline Multiple-Input / Multiple-Output Systems

    A dissertation submitted in fulfilment of the requirements for the degree of Doktor der technischen Wissenschaften (Doctor of Technical Sciences)

    at the Technische Universität Wien, Fakultät für Elektrotechnik und Informationstechnik (Faculty of Electrical Engineering and Information Technology)

    by

    Dipl.-Ing. Georg Tauböck

    Vienna, November 2005

  • Supervisor

    Prof. Johannes Huber, Lehrstuhl für Informationsübertragung,
    Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany

    Examiner

    Prof. Johann Weinrichter, Institut für Nachrichtentechnik und Hochfrequenztechnik,
    Technische Universität Wien, Austria

  • ftw. Dissertation Series

    Georg Tauböck

    Wireline Multiple-Input / Multiple-Output Systems

    Telecommunications Research Center Vienna

  • This work was carried out with funding from Kplus in the ftw. projects B0/I0/C4.

    This thesis has been prepared using LaTeX.

    September 2006

    1st edition (1. Auflage)

    All rights reserved (Alle Rechte vorbehalten)

    Copyright © 2006 Georg Tauböck
    Publisher (Herausgeber): Forschungszentrum Telekommunikation Wien

    Printed in Austria

    ISBN 3-902477-06-7

  • KURZFASSUNG

    In recent years, the idea arose to use conventional telephone lines for high-rate data transmission. It turned out that the existing telephone lines support fast data transmission, provided that they are not too long. In the relevant literature, the corresponding digital transmission methods are grouped under the term xDSL (Digital Subscriber Line) transmission.

    The reason that the use of existing telephone lines is restricted to shorter cable lengths is that, in most practically used topologies, the individual twisted pairs are grouped into cable bundles, so that two pairs lying close to each other disturb one another through their electromagnetic fields. In the corresponding literature, this is referred to as crosstalk. Depending on where the crosstalk originates, it is subdivided into Near-End Crosstalk (NEXT) and Far-End Crosstalk (FEXT). In the present work, we restrict ourselves to FEXT.

    We assume that data transmission over the individual cables is performed by means of Discrete Multitone modulation (DMT), the modulation scheme also used in ADSL and VDSL. Since we consider simultaneous transmission over several cables of a cable bundle, the complete transmission system can be regarded as a Multiple-Input / Multiple-Output (MIMO) system, and we refer to the overall modulation scheme as MIMO DMT. The present work treats MIMO DMT in great detail.

    We develop a theory for complex-valued random vectors that also covers rotationally variant random vectors (those with a non-vanishing pseudo-covariance matrix). This theory is of great importance for us, since the random vectors occurring in DMT and MIMO DMT generally possess this property (as is also shown in this work). We prove a Generalized Maximum Entropy Theorem and obtain several capacity results whose expressions include the pseudo-covariance matrix. We show that a non-vanishing pseudo-covariance matrix (of the noise) increases capacity, and we calculate the capacity loss incurred if the pseudo-covariance matrix is erroneously assumed to be the zero matrix.

    We carry out a detailed noise analysis for a DMT system and show that the noise at the input of the decision device is in general rotationally variant. We calculate the corresponding covariance and pseudo-covariance matrices, which are then specialized in a further step to obtain the variances of the real and imaginary parts of the noise and the correlations between them at fixed frequencies / subcarriers. By means of eigenvalue decompositions it is possible to determine the eccentricities and rotations of the noise ellipses. It turns out that the rotation angles are independent of the noise characteristics at the receiver input; they depend exclusively on the index of the considered subcarrier (subcarrier frequency). Furthermore, it is shown that unequal noise variances and correlations do not occur when the noise (at the receiver input) is white. For colored noise, however, they do occur, and one would actually have to use rotated rectangular signal constellations instead of the usual (square) QAM constellations. Otherwise, a capacity loss and an increased symbol error probability must be accepted. We calculate both quantities and show how existing bit-loading algorithms must be modified to obtain the optimum constellation parameters.

    We likewise perform a detailed interference analysis for a DMT system. We consider the case in which the impulse response exceeds the cyclic prefix on both sides, which leads to precursors and postcursors from both neighboring DMT symbols (intersymbol interference) and to intercarrier interference. We derive closed-form expressions for both contributions and study their statistical properties. It turns out that both interference contributions are complex-valued random vectors with identical first and second moments and a non-vanishing pseudo-covariance matrix.

    We also show how the obtained noise and interference results can be used for the design of time domain equalizers.

    In a second step, we generalize the noise and interference results from DMT to MIMO DMT. Again it is possible to obtain closed-form solutions, even under very general assumptions regarding the correlations between the different twisted pairs of a cable bundle.

    We present the general form of a transmission scheme that is matched to the MIMO DMT channel and is based on so-called Joint Processing Functions. It allows the use of Single-Input / Single-Output (SISO) codes.

    Furthermore, we deal with transmission schemes whose Joint Processing Functions are determined by means of the singular value decomposition of the channel matrix. We show that the optimum Joint Processing Functions can be obtained in this way, and we study variants of lower complexity. To obtain quantitative results, we perform simulations with realistic (practically used) parameters and compare the different methods.

    Finally, we present the UP MIMO scheme (UP stands for Unitary Parametrization) and treat various aspects of it.

  • ABSTRACT

    In recent years, the idea of using existing telephone lines for high data rate transmission took shape. It turned out that existing telephone lines can support high data rates, provided that the cables are not too long. In the corresponding literature, these transmission methods are referred to as xDSL (Digital Subscriber Line) transmission.

    The reason that the use of existing telephone lines is limited to shorter loop lengths is that the various twisted pairs are bundled together in cable bundles in most practical topologies. It is quite clear that two loops that are close to each other disturb each other because of the induced electromagnetic field. This effect is called crosstalk in the literature. Hence, there is recent research on reducing the performance degradation introduced by crosstalk. There are several issues regarding crosstalk and, therefore, different approaches to this problem. Crosstalk is subdivided into two types, i.e., Near-End Crosstalk (NEXT) and Far-End Crosstalk (FEXT), depending on the location the crosstalk originates from. Throughout this manuscript, we consider only crosstalk that stems from the far-end side (FEXT).

    We will assume that transmission over the individual loops is performed via Discrete Multitone modulation (DMT), which is the modulation scheme used in ADSL and VDSL. Since we are considering simultaneous transmission over several loops in a cable bundle, the whole transmission system can be regarded as a Multiple-Input / Multiple-Output (MIMO) system, and we will refer to the overall modulation scheme as the MIMO DMT modulation scheme.

    The present work gives a comprehensive treatment of Multiple-Input / Multiple-Output Discrete Multitone (MIMO DMT) transmission. We show that such a transmission scenario can be modeled by a complex vector channel, i.e., by a deterministic complex matrix and a complex noise vector.

    We develop a theory for complex random vectors that takes into account rotationally variant random vectors and is therefore of great importance for our purpose, since complex random vectors (in DMT and MIMO DMT) have a non-vanishing pseudo-covariance matrix in general (as we will also show in this manuscript). We prove a Generalized Maximum Entropy Theorem that includes the pseudo-covariance matrix in its entropy inequality and therefore tightens the upper bound for rotationally variant random vectors. We show that the additional correction term is independent of the specific probability distribution of the considered random vector. Furthermore, we obtain several capacity results for the complex vector channel considered that take into account the pseudo-covariance matrix. We show that a non-vanishing pseudo-covariance matrix (of the noise) increases capacity and calculate the capacity loss if it is erroneously assumed that the pseudo-covariance matrix is the zero matrix. We also derive a criterion for a matrix to be a pseudo-covariance matrix. This generalizes the well-known criterion that a matrix is a covariance matrix of a certain random vector if and only if it is symmetric / Hermitian and non-negative definite.
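    The distinction can be made concrete with a few lines of NumPy. The sketch below is an illustration (not taken from the thesis): it estimates the covariance E[z z*] and the pseudo-covariance E[z z] of a scalar complex random variable whose real and imaginary parts have unequal variances; the non-vanishing pseudo-covariance identifies it as rotationally variant.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Zero-mean complex samples with var(Re z) = 4 and var(Im z) = 1.
z = rng.normal(0.0, 2.0, size=n) + 1j * rng.normal(0.0, 1.0, size=n)

# Scalar covariance and pseudo-covariance (zero-mean case):
#   C = E[z conj(z)],  P = E[z z]
C = np.mean(z * np.conj(z))
P = np.mean(z * z)

# For a rotationally invariant variable P vanishes; here
# Re(P) ~= var(Re z) - var(Im z) = 3, so z is rotationally variant.
```

    A rotationally invariant (proper) variable with the same total power would give C ≈ 5 but P ≈ 0.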

    We perform a detailed noise analysis for a DMT system and show that the noise vector at the input of the Decision Device is rotationally variant in general. We calculate the corresponding covariance matrix and pseudo-covariance matrix, which are then specialized in order to obtain the noise variances of the real and imaginary parts and the correlations between the real and imaginary parts for a fixed frequency / subcarrier. Via eigenvalue decompositions, we are able to determine the eccentricities and the rotations of the noise ellipses. It turns out that the rotation angles are independent of the actual noise characteristics; they depend only on the number of the considered subcarrier. Furthermore, it is shown that different noise variances and correlations of the real and imaginary parts do not occur in the presence of white noise (at the input of the receiver). For colored noise, they do occur, and one has to use rotated rectangular constellations instead of the common (square) QAM constellations. Otherwise, one has to accept a capacity loss and an increased symbol error probability. We calculate both quantities. Furthermore, we show how to modify existing bit-loading algorithms in order to obtain the optimum constellation parameters.
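    The eigendecomposition step can be illustrated with a small hypothetical helper (not the thesis' own derivation): for one subcarrier, the 2×2 covariance matrix of real and imaginary part determines the noise ellipse, with the rotation angle given by the leading eigenvector and the semi-axes by the square roots of the eigenvalues.

```python
import numpy as np

def noise_ellipse(var_re, var_im, corr):
    """Rotation angle and semi-axes of the noise ellipse from the
    eigendecomposition of the 2x2 covariance of [Re, Im]."""
    K = np.array([[var_re, corr],
                  [corr, var_im]])
    eigvals, eigvecs = np.linalg.eigh(K)     # eigenvalues in ascending order
    major = eigvecs[:, 1]                    # direction of largest variance
    angle = np.arctan2(major[1], major[0])   # rotation of the ellipse
    a, b = np.sqrt(eigvals[1]), np.sqrt(eigvals[0])
    return angle, a, b

# White noise: equal variances, no correlation -> the ellipse is a circle.
_, a_w, b_w = noise_ellipse(1.0, 1.0, 0.0)

# Colored-noise example: unequal variances and correlation rotate and
# stretch the ellipse (a > b, i.e., non-zero eccentricity).
ang_c, a_c, b_c = noise_ellipse(2.0, 1.0, 0.5)
```

    The white-noise case degenerates to a circle (equal axes), matching the statement that unequal variances and correlations only arise for colored noise.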

    We also perform a detailed interference analysis for a DMT system. We consider the case when the channel impulse response exceeds the Cyclic Prefix on both sides, which yields precursors and postcursors from both neighboring DMT symbols (intersymbol interference) and also intercarrier interference. We derive closed-form formulas for both contributions and consider their statistical properties as well. It turns out that both interference contributions are complex random vectors with equal first and second order moments and a non-vanishing pseudo-covariance matrix.
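    The mechanism can be reproduced numerically. In the toy simulation below (illustrative parameters, with plain random real-valued blocks standing in for full DMT symbols), a channel that fits within the cyclic prefix is removed exactly by one-tap frequency domain equalization, whereas a channel exceeding the prefix leaves residual intersymbol / intercarrier interference:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 64, 4                        # FFT length and cyclic prefix length

def residual_interference(h):
    """Send three random blocks over FIR channel h with cyclic prefixes;
    return the maximum residual error on the middle block after prefix
    removal, FFT, and ideal one-tap equalization."""
    X = rng.standard_normal((3, N))
    tx = np.concatenate([np.r_[x[-p:], x] for x in X])     # add prefixes
    rx = np.convolve(tx, h)
    blk = rx[(N + p) + p : (N + p) + p + N]                # middle block
    Xhat = np.fft.fft(blk) / np.fft.fft(h, N)              # one-tap FEQ
    return np.max(np.abs(Xhat - np.fft.fft(X[1])))

h_short = np.array([1.0, 0.5, 0.25])                     # fits the prefix
h_long = np.array([1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.3])   # exceeds it
```

    For `h_short` the residual is at the level of floating-point noise; for `h_long` the tap beyond the prefix produces substantial interference.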

    We also show how the noise and interference results obtained can be utilized for the design of Time Domain Equalizers.

    In a second step, we generalize the noise and interference results from DMT to the MIMO DMT case. Again, it is possible to obtain closed-form solutions, even for very general assumptions with respect to correlations across the various loops of the cable bundle.

    We present the general form of a transmission scheme that is suited to the MIMO DMT channel and is based on so-called joint processing functions. It allows the use of Single-Input / Single-Output (SISO) codes, and we introduce the (sum) capacity as a performance measure.

    We deal with transmission schemes whose joint processing functions are based on the Singular Value Decomposition (SVD) of the channel matrix. We show that the optimum joint processing function can be obtained by means of the SVD. Furthermore, we study low(er)-complexity variations and discuss their performance. To obtain quantitative results, we perform simulations with realistic (practically used) parameters and compare the various methods.

    Finally, we present the UP MIMO scheme (UP stands for Unitary Parametrization), a scheme that was originally designed by the author for wireless transmission and also has applications to wireline transmission. Specifically, it can be used to reduce the computational complexity at the transmitter side (but not at the receiver side). We treat various aspects of this scheme.


  • ACKNOWLEDGEMENT

    I would like to thank Prof. Johannes Huber and Prof. Johann Weinrichter for their support that goes far beyond what I would have expected.

    I am grateful to all my colleagues at the Telecommunications Research Center Vienna (ftw.), especially to Jossy Sayir for his continuous (in fact, continuous and not piecewise continuous) assistance and encouragement, to Werner Henkel for his support in every way, and to Driton Statovci for our fruitful discussions concerning the practical aspects of my work. The collaboration with them was a constant source of new ideas and entertaining hours. The professional, inspiring, and open work environment at ftw., shaped by Markus Kommenda and Horst Rode, provided the basis for the work on this thesis.

    I would like to thank my wife, my family, and my friends for their continuous sympathy during my research adventure, and especially my mother for being such an enthusiastic grandmother with helping hands whenever the father is busy with writing his thesis.

    I would like to dedicate this work to my daughter Maria Shirin, who is the smiling sun in my life.


  • CONTENTS

    1. Introduction

    2. The Subscriber-Line Network
       2.1 Discrete Multitone Modulation
       2.2 Discrete Multitone Modulation on a Cable Bundle
       2.3 The Channel Model

    3. Entropy and Capacity
       3.1 The Maximum Entropy Theorem for Complex Random Vectors
           3.1.1 Preliminaries
           3.1.2 Differential Entropy of Complex Random Vectors
           3.1.3 The Euclidean Matrix Norm
       3.2 Capacity
           3.2.1 Rotationally Invariant Noise
           3.2.2 Rotationally Variant Noise
           3.2.3 Capacity Loss

    4. Noise and Interference Analysis of DMT
       4.1 Derivation of the Noise Characteristics
           4.1.1 Introduction
           4.1.2 First and Second Moments
           4.1.3 Frequency Dependent Noise Analysis
           4.1.4 Powers and Statistical Dependencies of Real and Imaginary Part of the Noise
           4.1.5 Consequences and Asymptotic Analysis of the Noise Characteristics
       4.2 Capacity Loss
       4.3 Symbol Error Probability and Optimized Bit-Loading
       4.4 Intersymbol Interference (ISI) and Intercarrier Interference (ICI)
           4.4.1 Deterministic Interference Analysis
           4.4.2 Interference Statistics
           4.4.3 Time Domain Equalizer

    5. Multiple-Input / Multiple-Output Discrete Multitone
       5.1 Noise and Interference
           5.1.1 Noise
           5.1.2 Interference
           5.1.3 MIMO Time Domain Equalizer
       5.2 The Transmission Scenario
       5.3 New Design Methods based on the Singular Value Decomposition
           5.3.1 Full Diagonalization
           5.3.2 Approximate Diagonalization
           5.3.3 Diagonalization of Subsets
           5.3.4 Simulation Results
       5.4 The UP MIMO Scheme
           5.4.1 The Receiver Side
           5.4.2 Parametrization of Unitary (Orthonormal) Matrices
           5.4.3 The Transmitter Side
           5.4.4 Comments

    6. Conclusions and Outlook

    Appendix

    A. Notation and Abbreviations
       A.1 Mathematical Notation
       A.2 Frequently Used Symbols
       A.3 Abbreviations

    B. Simulation Scenarios
       B.1 Scenario 1
           B.1.1 Transmission Medium
           B.1.2 DMT Parameters
           B.1.3 Noise Model
       B.2 Scenario 2
           B.2.1 Transmission Medium
           B.2.2 DMT Parameters
           B.2.3 Noise Model
       B.3 Scenario 3
           B.3.1 Transmission Medium
           B.3.2 DMT Parameters
           B.3.3 Noise Model

  • 1. INTRODUCTION

    We are currently living in an age called the information age, characterized by a dramatic increase in the exchange of information. Technically, this information exchange is supported by the linking and networking of various systems, so that the growing demand for (cross-) communication can be served. However, this boosts the overall information flow, and we have to think of techniques that enable us to transport this enormous amount of data.

    In principle, the technologies that support transmission at extremely high data rates already exist, but their technical implementation is expensive. Optical data transmission is one example that provides us with the necessary data rates, but that is very costly. Hence, economic aspects also play a role in the decision of which technology to use. There is no doubt that optical data transmission will be the preferred solution in the long term, but for the near future, people will try to develop cheaper methods that accommodate the desired demand for high data rate transmission.

    In recent years, the idea of using existing telephone lines for high data rate transmission took shape. This solution is relatively inexpensive, so that, from an economic point of view, there would be no objections against it. It turns out that existing telephone lines can support high data rates, provided that the cables are not too long. In the corresponding literature, these transmission methods are referred to as xDSL (Digital Subscriber Line) transmission.

    However, the supported data rates are limited for longer loop lengths. In such situations, the application of an xDSL transmission system is limited.

    So, one might ask, what are the (physical) reasons that xDSL transmission is not effective for longer loop lengths? After analyzing this question, it was found that the limiting mechanisms are due to the fact that the various twisted pairs are bundled together in cable bundles in most practical topologies. It is quite clear, even for people who are not electrical engineers, that two loops that are close to each other disturb each other because of the induced electromagnetic field. This effect is called crosstalk in the literature.

    So, is it possible to overcome these limitations? One trivial solution would be not to bundle the different loops into cable bundles. On the other hand, this would foil the economic advantages of xDSL transmission.

    Hence, there is recent research on reducing the performance degradation introduced by crosstalk. There are several issues regarding crosstalk and, therefore, different approaches to this problem. Crosstalk is subdivided into two types, i.e., Near-End Crosstalk (NEXT) and Far-End Crosstalk (FEXT), depending on the location the crosstalk originates from. Throughout this manuscript, we will assume that we know how to cope with NEXT and consider only the crosstalk that stems from the far-end side (FEXT). We want to emphasize that NEXT is still a topic of ongoing research and that our assumption is introduced to simplify matters by considering only one crosstalk source.

    There is one way to deal with this problem that is based on the viewpoint that considers crosstalk not as a disturbance but, instead, as part of the channel. In the present work, we will pursue this idea and develop several methods for communication over channels with crosstalk. We also want to emphasize that the resulting performance gains are in line with Information Theory, which has led to the fundamental result that reliable communication is only possible below a certain data rate threshold (the capacity) for a given (physical) channel. If crosstalk is considered as part of the channel instead of as a disturbance source, this corresponds to a change of the channel. Thus, Information Theory provides us with a new data rate threshold.

    In order to minimize costs, we will assume that transmission over the individual loops is performed via Discrete Multitone modulation (DMT), which is the modulation scheme used in ADSL and VDSL. Note that existing technologies are usually cheaper than new technologies. Hence, a main part of this manuscript deals with DMT, working out aspects not known before. We will show that there can be situations (not only in theory but also in practice) where these aspects gain enormous influence on the transmission performance and should be taken into account.

    The present work is structured as follows:

    1. The first chapter contains this introduction.

    2. The second chapter considers the Subscriber-Line Network and presents the fundamentals of Discrete Multitone modulation (DMT). Furthermore, we consider transmission over cable bundles and show how the crosstalk can be modeled in order to reflect the properties of the real world. If each loop is equipped with DMT transmission, the crosstalk is transformed according to the DMT modulation scheme. We will show that DMT transmission over cable bundles can be described as a complex valued vector channel, i.e., as a complex matrix-vector product plus an additive complex valued noise vector. Since a vector channel has several inputs and outputs (the elements of the vectors), we refer to such a scenario as a Multiple-Input / Multiple-Output (MIMO) system.

    3. Since we are dealing with complex valued vectors, Chapter 3 presents a theory about complex random vectors. Note that most literature about complex random vectors deals only with a sub-class, the so-called rotationally invariant complex random vectors. We will show in Chapter 4 that the complex random vectors we are looking at are not elements of this sub-class. Instead, they are rotationally variant, so that it makes sense to develop this theory. Specifically, we generalize the Maximum Entropy Theorem to rotationally variant complex random vectors and then use it to obtain certain capacity results. With the introduction of a rotationally invariant analog of a complex random vector, we are able to show that the additional correction term in the Generalized Maximum Entropy Theorem is independent of the actual distribution of the considered complex random vector. We want to emphasize that it is not widely known that the complex random vectors are rotationally variant in general, and we will calculate the capacity loss that must be accepted if this fact is neglected (using very general assumptions, without making use of any specific DMT properties).

    4. As already mentioned, it is shown in Chapter 4 that we have to deal with rotationally variant random vectors. To be more precise, we will show in this chapter that the noise vector (at the input of the Decision Device) has this property (in general), and we will look at the implications resulting from this observation. It turns out that colored noise at the input of the DMT receiver makes the noise vector at the input of the Decision Device rotationally variant. In order to cope with rotationally variant noise, one should apply rotated rectangular constellations instead of the common quadratic QAM constellations, and we will derive analytical formulas for the (optimum) parameters / shape of these constellations. If this is not done, one must accept a capacity loss and a higher symbol error probability. Both quantities will be calculated.

    Finally in this chapter, we consider the problem of determining the Time Domain Equalizer (TDE) coefficients. If the TDE coefficients are not suitably adapted, there will be intersymbol and intercarrier interference. We will obtain closed-form solutions for both interference contributions and study their statistical properties. Again, it turns out that the interference is represented by a rotationally variant complex random vector, which is a second argument for the use of rotated rectangular signal constellations.

    5. The first part of Chapter 5 generalizes the results of Chapter 4 about noise and interference to the MIMO case, i.e., instead of looking at one single loop equipped with DMT transmission, we look at the whole cable bundle. This is of great importance, since these results are fundamental for the design of low-complexity MIMO Time Domain Equalizers. This design is much more difficult than for the single loop case.

    The second part presents the general form of a transmission scheme that can cope with crosstalk. It is based on so-called joint processing functions and allows the use of conventional Single-Input / Single-Output (SISO) codes, so that we can again resort to existing technologies for this part of the transmission scheme.

    The third section deals with transmission schemes whose joint processingfunctions are based on the Singular Value Decomposition (SVD) [19] ofthe channel matrix. We will show that we can obtain the optimum joint

  • 4 1. Introduction

    processing functions by means of the SVD. Furthermore, we study low(er)-complexity transmission variants and discuss their performance. To obtainquantitative results, we perform simulations of these transmission schemeswith realistic (practically used) parameters and compare the various meth-ods.
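    The core of the SVD idea can be sketched in a few lines of NumPy (a generic illustration under assumed parameters, not the thesis' actual design): writing the per-subcarrier channel matrix as H = U S V^H, precoding the transmit vector with V, and post-processing the received vector with U^H turns the coupled MIMO channel into independent SISO channels scaled by the singular values.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 4                               # number of loops in the cable bundle

# A random complex channel matrix for one subcarrier (illustrative).
H = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))

U, s, Vh = np.linalg.svd(H)         # H = U @ diag(s) @ Vh

x = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # data symbols

# Precode with V at the transmitter, multiply by U^H at the receiver.
y = U.conj().T @ (H @ (Vh.conj().T @ x))

# y equals s * x elementwise: M decoupled SISO channels.
```

    This full diagonalization is what makes conventional SISO coding applicable per channel; the lower-complexity variants trade off how completely the off-diagonal coupling is removed.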

    The final section of this chapter presents the UP MIMO scheme (the name comes from the terminology unitary parametrization), a scheme that was originally designed by the author for wireless transmission and that also has applications in wireline transmission. Specifically, it can be used to reduce the computational complexity at the transmitter side (but not at the receiver side). We will treat various aspects of this scheme.

    6. The last chapter presents the overall conclusions and gives an outlook on possible further developments.

  • 2. THE SUBSCRIBER-LINE NETWORK

    In this chapter we will first review one of the most important modulation schemes used in the subscriber-line network. This modulation scheme is called Discrete Multitone modulation (DMT) and is currently used in the ADSL [25, 26, 28–30] and VDSL [9–12, 27] standards. It allows an efficient implementation and exhibits performance near capacity. Our main focus in this chapter is to find a compact mathematical description for the relationship between the data to be transmitted and the received data.

    Secondly, we will extend these considerations to DMT transmission over cable bundles. It is well known that, if several modems transmit over a cable bundle simultaneously, each modem disturbs the others, so that we will encounter severe performance degradations. In the literature, cf. [48], these interference mechanisms are called Near-End Crosstalk (NEXT) and Far-End Crosstalk (FEXT). In this work we will mainly focus on FEXT, since there exist several techniques [48] to compensate for NEXT, whereas FEXT cancellation methods are still a topic of research and development. Again, it is our goal to find a compact model for the input-output behavior.

    It will turn out that both scenarios can be described essentially in the same way, which puts us in a position to analyze and optimize both systems using the same framework.

    2.1 Discrete Multitone Modulation

    Discrete Multitone modulation is a modulation scheme based on existing efficient algorithms for the calculation of (inverse) discrete Fourier transforms (Fast Fourier Transform, FFT; Inverse Fast Fourier Transform, IFFT; [59]). Adding and removing the so-called Cyclic Prefix (also called Guard Interval), cf. [13], in the transmitter and receiver, respectively, transforms the linear and time-invariant distortions introduced by the channel (the copper wire) into cyclic distortions, which can be completely removed by means of (inverse) discrete Fourier transforms. Alternatively, one can also say that DMT is an intelligent way to cope with intersymbol interference (ISI).
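    The role of the cyclic prefix can be verified in a few lines (an illustrative NumPy sketch with assumed block sizes): prepending the last p samples of a block makes the linear convolution with a channel of at most p + 1 taps coincide, once the prefix part is discarded, with a circular convolution of the block, which the DFT diagonalizes.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 16, 4
x = rng.standard_normal(N)              # one time-domain block
h = rng.standard_normal(p + 1) * 0.5    # channel fitting the prefix

# Linear convolution of the cyclically extended block ...
rx = np.convolve(np.r_[x[-p:], x], h)

# ... matches the circular convolution of x and h after the prefix part
# is dropped; circular convolution is diagonalized by the DFT.
circ = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, N)))
```

    Because the channel then acts as a circular convolution, each DFT bin is only scaled by one complex coefficient, which is what the one-tap Frequency Domain Equalizer inverts.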

    A complete (uncoded) DMT transmission system is depicted in Figure 2.1. For a detailed discussion, we also refer to [13].

    (There is no doubt that present and future research should treat the various aspects of NEXT as well.)


    Fig. 2.1: DMT transmission system. Transmitter chain: Source → Mapping → Conjugate Complex Extension → IFFT (length N) → Add Cyclic Prefix (length p) → Parallel/Serial → D/A Converter → Transmit Filter / Driver Stage → Channel (Twisted Pair, with additive Noise). Receiver chain: Receive Filter / Lowpass → A/D Converter → Time Domain Equalizer (adaptive FIR filter) → Serial/Parallel → Remove Cyclic Prefix (length p) → FFT (length N) → Remove Conjugate Complex Extension → Frequency Domain Equalizer → Decision Device → Inverse Mapping → Sink. (Rates: bit rate 1/Tb, DMT symbol rate 1/Ts, channel symbol rate 1/T.)


The Source emits a sequence of bits that are converted in the block Mapping into a sequence of complex valued symbol vectors of dimension N/2 + 1, the so-called DMT symbols. Note that there is the additional constraint that the first and last elements of the vectors (the DMT symbols) have to be real valued. Each symbol element of the vector is chosen according to optimized discrete symbol constellations. The optimization is usually carried out in a pre-transmission phase in which channel and noise characteristics are measured, so that high performance is guaranteed during transmission and the required (signal power) constraints (cf. [9-12, 25-30]) are fulfilled. In the next two blocks, Conjugate Complex Extension and IFFT, each complex vector of dimension N/2 + 1 is extended to a complex vector of dimension N, with the property that one half of the vector is a conjugated version of the other half, and is passed through an inverse discrete Fourier transform of length N. Note that the vectors at the output of the inverse discrete Fourier transform, which is implemented using fast Fourier algorithms, are now real valued. The block Add Cyclic Prefix takes the last p elements of each vector and produces a new (N + p)-dimensional vector, which is obtained by stacking these p elements and the original N-dimensional vector. In the next step (block Parallel→Serial), all elements of these vectors are put together into a sequence of numbers, which are then (block D/A Converter) transformed into the analog domain, passed through a transmit filter, and transmitted over the twisted pair (block Transmit Filter, Driver Stage).

The received signal is passed through a Receive Filter (Lowpass) and converted back to the digital domain (block A/D Converter). It goes through an adaptive filter called Time Domain Equalizer, and the samples are then stacked into a sequence of (N + p)-dimensional vectors (block Serial→Parallel). The first p elements of each vector are removed (block Remove Cyclic Prefix, which is naturally implemented together with the block Serial→Parallel in practice), and for the resulting N-dimensional vectors a Fast Fourier Transform (FFT) is computed. The first N/2 + 1 elements of each vector (block Remove Conjugate Complex Extension) are multiplied with N/2 + 1 complex scalars² (block Frequency Domain Equalizer), which are calculated in the pre-transmission phase, so that the distortions introduced by the channel are compensated. The Decision Device maps back onto the signal constellations, and the block Inverse Mapping in turn produces a bitstream, which is (hopefully) equal to the bitstream generated by the Source.
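The transmitter and receiver chain described above can be illustrated with a short NumPy sketch; the parameter values (N = 8, p = 3), the impulse response h, and the symbol vector e are hypothetical toy values, the mapping and decision stages are omitted, and the channel is taken to be noiseless with an already shortened impulse response:

```python
import numpy as np

N, p = 8, 3                            # toy IFFT length and cyclic prefix length
h = np.array([1.0, 0.5, -0.2, 0.1])    # shortened overall impulse response, length p + 1

# DMT symbol on subcarriers 0 .. N/2 (first and last entries must be real)
e = np.array([1.0, 1 + 1j, -1 + 2j, 2 - 1j, -1.0])

# Conjugate complex extension -> Hermitian symmetric vector of length N
c = np.concatenate([e, np.conj(e[-2:0:-1])])

# IFFT (unitary scaling) yields a real valued block
a = np.fft.ifft(c, norm="ortho")
assert np.allclose(a.imag, 0)
a = a.real

# Add cyclic prefix: the last p samples are repeated in front
t = np.concatenate([a[-p:], a])

# Linear convolution with the channel; keep the N samples after the prefix
r = np.convolve(t, h)
b = r[p:p + N]

# FFT and removal of the conjugate complex extension
d = np.fft.fft(b, norm="ortho")
f = d[:N // 2 + 1]

# One-tap FDE: divide by H(e^{j 2 pi n / N}), the DFT of the zero-padded h
H = np.fft.fft(h, N)
m = f / H[:N // 2 + 1]

print(np.round(m, 6))                  # recovers the transmitted DMT symbol e
```

Because the cyclic prefix turns the linear convolution into a cyclic one, the division by a single complex scalar per subcarrier suffices here; no matrix equalization is needed.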

Observe the relation between the DMT symbol rate 1/Ts and the channel symbol rate 1/T, cf. also Figure 2.1, given by

$$\frac{1}{T} = \frac{N + p}{T_s}. \tag{2.1}$$

In order to analyze and optimize this system, we will build up a mathematical model for the input-output behavior. We start with the part of the transmission system that models the channel, as shown in Figure 2.2.

    All involved signals are real valued, discrete-time signals and we assume that

    2 The first and last scalars are real valued.

[Figure 2.2 block diagram: transmit signal t → D/A Converter → Transmit Filter / Driver Stage → Channel (Twisted Pair) → additive Noise s → Receive Filter (Lowpass) → A/D Converter → received signal r.]

Fig. 2.2: DMT: The Channel.

the input-output behavior can be approximated by a linear and time-invariant convolution of the input signal $t = [t(n)]_{n=-\infty,\ldots,\infty}$ with a real valued impulse response $g = [g(n)]_{n=-\infty,\ldots,\infty}$ plus an additive noise term $s = [s(n)]_{n=-\infty,\ldots,\infty}$, i.e.,

$$r = g * t + s, \qquad r(n) = \sum_{k=-\infty}^{\infty} g(k)\, t(n-k) + s(n). \tag{2.2}$$

Next, we will include the so-called Time Domain Equalizer (TDE) in our model, cf. Figure 2.3. Again, we model its behavior by a convolution with a real valued, time-discrete impulse response $e = [e(n)]_{n=-\infty,\ldots,\infty}$, i.e.,

$$u = e * r, \qquad u(n) = \sum_{k=-\infty}^{\infty} e(k)\, r(n-k). \tag{2.3}$$

Let $h = [h(n)]_{n=-\infty,\ldots,\infty}$ denote the convolution of e and g, $h = e * g$, and let $z = [z(n)]_{n=-\infty,\ldots,\infty}$ denote the convolution of e and s, $z = e * s$. Then, we obtain an input-output relationship as

$$u = h * t + z, \qquad u(n) = \sum_{k=-\infty}^{\infty} h(k)\, t(n-k) + z(n). \tag{2.4}$$

It can be seen from Figure 2.3 that the TDE is an adaptive filter. Its coefficients are determined in the pre-transmission phase, so that the overall channel impulse

[Figure 2.3 block diagram: t → D/A Converter → Transmit Filter / Driver Stage → Channel (Twisted Pair) → additive Noise s → Receive Filter (Lowpass) → A/D Converter → Time Domain Equalizer (Adaptive FIR Filter) → u.]

Fig. 2.3: DMT: The Channel and the Time Domain Equalizer.

response h has a length shorter than or equal to p + 1, p being the length of the Cyclic Prefix, cf. Figure 2.1, or, to be more precise, they are determined such that

$$h(n) = 0, \quad n < 0 \ \text{ or } \ n > p. \tag{2.5}$$

Note that for practically occurring channel impulse responses g, the resulting filter e will always be non-causal and therefore not implementable. Since a simple delay in the receiver solves this problem, assumption (2.5) will be maintained for simplicity.

We will address the issue of designing the coefficients of the TDE in more detail (including an analysis of the case that (2.5) holds only approximately) later in Section 4.4.

Due to assumption (2.5), the infinite sum in (2.4) is replaced by a finite sum, so that (2.4) simplifies to

$$u(n) = \sum_{k=0}^{p} h(k)\, t(n-k) + z(n). \tag{2.6}$$

We can now reformulate this representation using a matrix-vector notation. With

$$\mathbf{b}_{n_0} = \begin{bmatrix} u(n_0) \\ u(n_0+1) \\ \vdots \\ u(n_0+N-1) \end{bmatrix} \in \mathbb{R}^N, \qquad \mathbf{v}_{n_0} = \begin{bmatrix} z(n_0) \\ z(n_0+1) \\ \vdots \\ z(n_0+N-1) \end{bmatrix} \in \mathbb{R}^N,$$

$$\mathbf{q}_{n_0} = \begin{bmatrix} t(n_0-p) \\ \vdots \\ t(n_0-1) \\ t(n_0) \\ t(n_0+1) \\ \vdots \\ t(n_0+N-1) \end{bmatrix} \in \mathbb{R}^{N+p},$$

and

$$\mathbf{H} = \begin{bmatrix} h(p) & h(p-1) & \cdots & h(0) & & & 0 \\ & h(p) & h(p-1) & \cdots & h(0) & & \\ & & \ddots & \ddots & & \ddots & \\ 0 & & & h(p) & h(p-1) & \cdots & h(0) \end{bmatrix} \in \mathbb{R}^{N \times (N+p)},$$

we have

$$\mathbf{b}_{n_0} = \mathbf{H}\mathbf{q}_{n_0} + \mathbf{v}_{n_0}. \tag{2.7}$$

From Figure 2.4, it can be seen that (2.7) describes the input-output behavior between the input of the Parallel→Serial block and the output of the Remove Cyclic Prefix block.
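The structure of H can be checked numerically; the sketch below (hypothetical toy sizes N = 6, p = 2) builds the band Toeplitz matrix row by row and verifies that H applied to the stacked samples reproduces the finite convolution (2.6) for zero noise:

```python
import numpy as np

N, p = 6, 2
rng = np.random.default_rng(0)
h = rng.standard_normal(p + 1)          # shortened impulse response h(0), ..., h(p)

# H[i, j] = h(p + i - j) for 0 <= p + i - j <= p, zero otherwise
H = np.zeros((N, N + p))
for i in range(N):
    H[i, i:i + p + 1] = h[::-1]         # row i carries h(p), ..., h(0)

q = rng.standard_normal(N + p)          # samples t(n0 - p), ..., t(n0 + N - 1)
b = H @ q

# direct evaluation of u(n0 + n) = sum_k h(k) t(n0 + n - k), n = 0, ..., N - 1
u = np.array([sum(h[k] * q[p + n - k] for k in range(p + 1)) for n in range(N)])
assert np.allclose(b, u)
print("b = H q matches the finite convolution (2.6)")
```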

Let $\mathbf{a}_{n_0}$ denote the N-dimensional input vector of the Add Cyclic Prefix block, cf. also Figure 2.4. In this block,³ the last p elements of the vector are stacked together with all N elements of the vector and are output as an (N + p)-dimensional vector. We can write this operation as a matrix-vector product, i.e., $\mathbf{q}_{n_0} = \mathbf{R}\mathbf{a}_{n_0}$, with

$$\mathbf{R} = \begin{bmatrix} \mathbf{0} & \mathbf{I}_p \\ \multicolumn{2}{c}{\mathbf{I}_N} \end{bmatrix} \in \mathbb{R}^{(N+p)\times N}, \tag{2.8}$$

$\mathbf{I}_p$ and $\mathbf{I}_N$ being identity matrices of dimension p and N, respectively. With $\mathbf{H}_{\mathrm{cyc}} = \mathbf{H}\mathbf{R}$, we obtain the relation

$$\mathbf{b}_{n_0} = \mathbf{H}_{\mathrm{cyc}}\mathbf{a}_{n_0} + \mathbf{v}_{n_0}. \tag{2.9}$$

    3 We assume N > p.
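A small numerical sketch (again with assumed toy sizes) illustrates the cyclic prefix matrix R and confirms that H_cyc = H R is indeed a cyclic (circulant) matrix built from the zero-padded impulse response:

```python
import numpy as np

N, p = 6, 2
rng = np.random.default_rng(1)
h = rng.standard_normal(p + 1)

# band Toeplitz channel matrix H as before
H = np.zeros((N, N + p))
for i in range(N):
    H[i, i:i + p + 1] = h[::-1]

# cyclic prefix matrix R: [0 I_p] stacked on top of I_N, so q = R a repeats
# the last p entries of a in front
R = np.vstack([np.hstack([np.zeros((p, N - p)), np.eye(p)]), np.eye(N)])
H_cyc = H @ R

# H_cyc should have entries h((i - j) mod N) for the zero-padded h
h_pad = np.concatenate([h, np.zeros(N - p - 1)])
C = np.array([[h_pad[(i - j) % N] for j in range(N)] for i in range(N)])
assert np.allclose(H_cyc, C)
print("H_cyc = H R is circulant:", np.allclose(H_cyc, C))
```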

[Figure 2.4 block diagram: input vectors …, a_{n0}, a_{n0+N+p}, … → Add Cyclic Prefix (length p) → …, q_{n0}, q_{n0+N+p}, … → Parallel→Serial → D/A Converter → Transmit Filter / Driver Stage → Channel (Twisted Pair) with additive Noise → Receive Filter (Lowpass) → A/D Converter → Time Domain Equalizer (Adaptive FIR Filter) → Serial→Parallel → Remove Cyclic Prefix (length p) → output vectors …, b_{n0}, b_{n0+N+p}, ….]

Fig. 2.4: DMT: The Channel, the Time Domain Equalizer, the Parallel / Serial Conversion, and the Cyclic Prefix.


A straightforward calculation yields

$$\mathbf{H}_{\mathrm{cyc}}^{\mathrm{T}} = \begin{bmatrix} h(0) & h(1) & \cdots & h(p) & & & 0 \\ & \ddots & \ddots & & \ddots & & \\ & & \ddots & \ddots & & \ddots & \\ & & & \ddots & \ddots & & h(p) \\ h(p) & & & & \ddots & \ddots & \vdots \\ \vdots & \ddots & & & & \ddots & h(1) \\ h(1) & \cdots & h(p) & & & & h(0) \end{bmatrix} \in \mathbb{R}^{N\times N},$$

which shows that the transposed matrix $\mathbf{H}_{\mathrm{cyc}}^{\mathrm{T}}$ is a cyclic matrix: each row can be obtained by cyclic permutations of the other rows. It is well known [13, 59] that cyclic matrices can be diagonalized by means of the discrete Fourier transform (DFT) and the inverse discrete Fourier transform (IDFT). Let

$$\mathbf{F} = \Big[\tfrac{1}{\sqrt{N}}\, e^{-j\frac{2\pi}{N}kl}\Big]_{k,l=0,\ldots,N-1} \quad\text{and}\quad \mathbf{F}^{-1} = \Big[\tfrac{1}{\sqrt{N}}\, e^{j\frac{2\pi}{N}kl}\Big]_{k,l=0,\ldots,N-1} \tag{2.10}$$

denote the DFT and the IDFT matrix, respectively, and let

$$H(z) = \sum_{n=-\infty}^{\infty} h(n)\, z^{-n} = \sum_{n=0}^{p} h(n)\, z^{-n} \tag{2.11}$$

denote the Z-transform [13, 59] of h. Then,

$$\mathbf{F}^{-1}\mathbf{H}_{\mathrm{cyc}}^{\mathrm{T}}\mathbf{F} = \begin{bmatrix} H\big(e^{j\frac{2\pi}{N}0}\big) & & & \mathbf{0} \\ & H\big(e^{j\frac{2\pi}{N}1}\big) & & \\ & & \ddots & \\ \mathbf{0} & & & H\big(e^{j\frac{2\pi}{N}(N-1)}\big) \end{bmatrix},$$

and transposing this equation (using $\mathbf{F} = \mathbf{F}^{\mathrm{T}}$ and $\mathbf{F}^{-1} = (\mathbf{F}^{-1})^{\mathrm{T}}$),

$$\mathbf{F}\mathbf{H}_{\mathrm{cyc}}\mathbf{F}^{-1} = \begin{bmatrix} H\big(e^{j\frac{2\pi}{N}0}\big) & & & \mathbf{0} \\ & H\big(e^{j\frac{2\pi}{N}1}\big) & & \\ & & \ddots & \\ \mathbf{0} & & & H\big(e^{j\frac{2\pi}{N}(N-1)}\big) \end{bmatrix}. \tag{2.12}$$

Note that the block IFFT in the transmitter performs a multiplication with $\mathbf{F}^{-1}$ and the block FFT in the receiver performs a multiplication with $\mathbf{F}$ using efficient fast Fourier algorithms, cf. Figure 2.5, such that $\mathbf{H}_{\mathrm{cyc}}$ is diagonalized and hence ISI is avoided.
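The diagonalization (2.12) can be verified numerically; the following sketch (hypothetical toy values N = 8, p = 3) forms the unitary DFT matrix F and checks that F H_cyc F^{-1} is diagonal with the DFT of the zero-padded impulse response on its diagonal:

```python
import numpy as np

N, p = 8, 3
rng = np.random.default_rng(2)
h = rng.standard_normal(p + 1)

# circulant H_cyc with entries h((i - j) mod N)
h_pad = np.concatenate([h, np.zeros(N - p - 1)])
H_cyc = np.array([[h_pad[(i - j) % N] for j in range(N)] for i in range(N)])

# unitary DFT matrix F; since F is symmetric and unitary, F^{-1} = conj(F)
k = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)
F_inv = np.conj(F)

D = F @ H_cyc @ F_inv
# the off-diagonal entries vanish; the diagonal carries H(e^{j 2 pi n / N})
assert np.allclose(D, np.diag(np.fft.fft(h, N)))
print("F H_cyc F^{-1} is diagonal with the DFT of h on the diagonal")
```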

[Figure 2.5 block diagram: …, e_{n0}, … → Conjugate Complex Extension → …, c_{n0}, … → IFFT (length N) → Add Cyclic Prefix (length p) → Parallel→Serial → D/A Converter → Transmit Filter / Driver Stage → Channel (Twisted Pair) with additive Noise → Receive Filter (Lowpass) → A/D Converter → Time Domain Equalizer (Adaptive FIR Filter) → Serial→Parallel → Remove Cyclic Prefix (length p) → FFT (length N) → …, d_{n0}, … → Remove Conjugate Complex Extension → …, f_{n0}, … → Frequency Domain Equalizer → …, m_{n0}, ….]

Fig. 2.5: DMT: The Channel, the Time Domain Equalizer, the Parallel / Serial Conversion, the Cyclic Prefix, the (I)FFT, the Conjugate Complex Extension, and the Frequency Domain Equalizer.


The blocks Conjugate Complex Extension and Remove Conjugate Complex Extension have the task of achieving a real valued transmit signal. This happens if the output vector $\mathbf{a}_{n_0}$ of the IDFT is real valued. In order to guarantee this constraint, the input vector of the IDFT has to fulfill a Hermitian symmetry condition, i.e., the vector (for even N)

$$\mathbf{c}_{n_0} = \begin{bmatrix} c_{n_0}(0) \\ \vdots \\ c_{n_0}(\tfrac{N}{2}) \\ \vdots \\ c_{n_0}(N-1) \end{bmatrix} \quad\text{with}\quad \mathbf{a}_{n_0} = \mathbf{F}^{-1}\mathbf{c}_{n_0}$$

has to satisfy [13, 59]

$$c_{n_0}(n) = c_{n_0}^{*}(N-n), \quad n = 1, \ldots, \tfrac{N}{2}-1, \qquad\text{and}\qquad \Im\{c_{n_0}(0)\} = \Im\{c_{n_0}(\tfrac{N}{2})\} = 0, \tag{2.13}$$

where the superscript * denotes complex conjugation. The block Conjugate Complex Extension takes its (N/2 + 1)-dimensional complex input vector,

$$\mathbf{e}_{n_0} = \begin{bmatrix} e_{n_0}(0) \\ \vdots \\ e_{n_0}(\tfrac{N}{2}) \end{bmatrix},$$

for which the first and last elements are assumed to have vanishing imaginary parts, $\Im\{e_{n_0}(0)\} = \Im\{e_{n_0}(\tfrac{N}{2})\} = 0$, and extends it to the N-dimensional complex vector

$$\mathbf{c}_{n_0} = \begin{bmatrix} e_{n_0}(0) \\ \vdots \\ e_{n_0}(\tfrac{N}{2}) \\ e_{n_0}^{*}(\tfrac{N}{2}-1) \\ \vdots \\ e_{n_0}^{*}(1) \end{bmatrix}. \tag{2.14}$$
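The extension (2.14) and the symmetry condition (2.13) can be illustrated in a few lines; the symbol vector e below is a hypothetical example:

```python
import numpy as np

N = 8
# hypothetical DMT symbol on subcarriers 0, ..., N/2 (first and last entries real)
e = np.array([1.0, 2 - 1j, -1 + 0.5j, 0.3 + 2j, -2.0])

# conjugate complex extension (2.14): append e*(N/2 - 1), ..., e*(1)
c = np.concatenate([e, np.conj(e[N // 2 - 1:0:-1])])

# Hermitian symmetry (2.13): c(n) = c*(N - n)
assert all(np.isclose(c[n], np.conj(c[N - n])) for n in range(1, N // 2))

# hence the IDFT output a = F^{-1} c is real valued
a = np.fft.ifft(c, norm="ortho")
assert np.allclose(a.imag, 0)
print("IDFT output is real:", np.allclose(a.imag, 0))
```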

On the other hand, the block Remove Conjugate Complex Extension inverts the operation of the block Conjugate Complex Extension. Mathematically, this can be written as a matrix-vector product, i.e., $\mathbf{f}_{n_0} = \mathbf{E}\mathbf{d}_{n_0}$, with

$$\mathbf{E} = \begin{bmatrix} \mathbf{I}_{\frac{N}{2}+1} & \mathbf{0} \end{bmatrix} \in \mathbb{R}^{(\frac{N}{2}+1)\times N} \tag{2.15}$$

and $\mathbf{d}_{n_0}$ being the output vector of the DFT, $\mathbf{d}_{n_0} = \mathbf{F}\mathbf{b}_{n_0}$. For the nomenclature of the previously described input / output vectors, we also refer to Figure 2.5. Finally,


we obtain an input-output relation as

$$\begin{aligned} \mathbf{f}_{n_0} &= \mathbf{E}\mathbf{d}_{n_0} \\ &= \mathbf{E}\mathbf{F}\mathbf{b}_{n_0} \\ &= \mathbf{E}\mathbf{F}\big(\mathbf{H}_{\mathrm{cyc}}\mathbf{a}_{n_0} + \mathbf{v}_{n_0}\big) \\ &= \mathbf{E}\mathbf{F}\mathbf{H}_{\mathrm{cyc}}\mathbf{a}_{n_0} + \mathbf{E}\mathbf{w}_{n_0} \\ &= \mathbf{E}\mathbf{F}\mathbf{H}_{\mathrm{cyc}}\mathbf{F}^{-1}\mathbf{c}_{n_0} + \mathbf{E}\mathbf{w}_{n_0} \\ &= \mathbf{D}\mathbf{e}_{n_0} + \mathbf{E}\mathbf{w}_{n_0} \end{aligned} \tag{2.16}$$

with the abbreviations

$$\mathbf{w}_{n_0} = \mathbf{F}\mathbf{v}_{n_0}$$

and

$$\mathbf{D} = \begin{bmatrix} H\big(e^{j\frac{2\pi}{N}0}\big) & & & \mathbf{0} \\ & H\big(e^{j\frac{2\pi}{N}1}\big) & & \\ & & \ddots & \\ \mathbf{0} & & & H\big(e^{j\frac{2\pi}{N}\frac{N}{2}}\big) \end{bmatrix}.$$

Let $(\cdot)^{+}$ be defined as

$$(\cdot)^{+} : \mathbb{C} \to \mathbb{C}, \quad z \mapsto z^{+} = \begin{cases} \frac{1}{z}, & z \neq 0 \\ 0, & z = 0 \end{cases}$$

and let $\mathbf{D}^{+}$ denote the (Moore-Penrose) pseudo inverse [19] of $\mathbf{D}$, i.e.,

$$\mathbf{D}^{+} = \begin{bmatrix} H\big(e^{j\frac{2\pi}{N}0}\big)^{+} & & & \mathbf{0} \\ & H\big(e^{j\frac{2\pi}{N}1}\big)^{+} & & \\ & & \ddots & \\ \mathbf{0} & & & H\big(e^{j\frac{2\pi}{N}\frac{N}{2}}\big)^{+} \end{bmatrix}.$$

The Frequency Domain Equalizer (FDE), cf. also Figure 2.5, performs a multiplication of its input vector with the matrix $\mathbf{D}^{+}$, i.e., $\mathbf{m}_{n_0} = \mathbf{D}^{+}\mathbf{f}_{n_0}$, so that we finally obtain

$$\mathbf{m}_{n_0} = \mathbf{e}_{n_0} + \mathbf{D}^{+}\mathbf{E}\mathbf{w}_{n_0}, \tag{2.17}$$

with the additional requirement that $e_{n_0}(n) = 0$ if $H\big(e^{j\frac{2\pi}{N}n}\big) = 0$.
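A minimal sketch of the FDE step, assuming a toy impulse response whose frequency response has no zeros on the used subcarriers; the zero-aware inversion mirrors the pseudo inverse of the diagonal matrix D:

```python
import numpy as np

N, p = 8, 3
h = np.array([1.0, -0.5, 0.25, 0.1])

# diagonal channel entries H(e^{j 2 pi n / N}), n = 0, ..., N/2
Hf = np.fft.fft(h, N)[:N // 2 + 1]

def pinv_entry(z):
    """Zero-aware inversion: 1/z for z != 0 and 0 for z = 0."""
    return 0.0 if z == 0 else 1.0 / z

D_pinv = np.diag([pinv_entry(z) for z in Hf])   # pseudo inverse of D = diag(Hf)

e = np.array([1 + 1j, -1.0, 0.5j, 2 - 1j, 0.25])  # hypothetical DMT symbol
f = np.diag(Hf) @ e                                # noiseless channel output (2.16)
m = D_pinv @ f                                     # FDE output (2.17)
assert np.allclose(m, e)
print("FDE recovers e on all subcarriers with nonzero H")
```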

The block Mapping, see Figure 2.1, maps the bitstream onto complex symbols according to specific constellations (usually QAM constellations). In principle, we are free to design these constellations, except that the constellations corresponding to subcarriers 0 and N/2 have to be real valued (they correspond to $e_{n_0}(0)$ and $e_{n_0}(\tfrac{N}{2})$), and that $e_{n_0}(n) = 0$ if $H\big(e^{j\frac{2\pi}{N}n}\big) = 0$. However, there is another requirement that we have to take care of. We cannot transmit with arbitrarily high


power, as this would violate the standards [9-12, 25-30] and is also impossible from an engineering point of view. In order to model this power constraint, we will assume that the average sum power⁴ of the vector $\mathbf{e}_{n_0}$ is smaller than or equal to a certain number S_DMT. This power constraint now translates into a constraint on the possible signal constellations. We also want to emphasize that this model is only an approximation of the power constraint required by the standards [9-12, 25-30] for a real system, but it allows an analysis of the system on the one hand and is also accurate to some extent on the other hand. Note that in practical systems, there is an additional constraint on the power spectral density of the transmit signal.

There remain two blocks to explain. The Decision Device maps (rounds) the noisy elements of the vectors $\mathbf{m}_{n_0}$ back onto the constellations, whereas the block Inverse Mapping is an exact inversion of the block Mapping and produces a bitstream, which is hopefully equal to the bitstream emitted by the Source.

    2.2 Discrete Multitone Modulation on a Cable Bundle

In real transmission scenarios, the various loops that are used for transmission are bundled together into a Cable Bundle. For example, consider the transmission between the Central Office and the Cabinet [48]. Since many customers have to be provided with high data rate links, the telecom providers try to use the same cable duct for the different loops for as far as possible and to split the loops at the furthest possible position for economical reasons. Usually this split position is the Cabinet. A similar application for cable bundles might be data transmission from and to a mobile base station, where high data rates are required and one loop does not have enough capacity. Of course, optical fibers would have sufficient capacity, but they are expensive⁵ and cable bundles could therefore be a good compromise between cost and data rate. On the other hand, it is intuitively clear that loops that are close to each other disturb each other and limit the performance. This is illustrated in Figure 2.6. The k-th receiver (here shown for the 2nd receiver) receives not only the (distorted) signal from transmitter k plus (background) noise. It also receives signals from all other transmitters, no matter in which direction they transmit. We classify these additional receive signals by means of their origin: if the source of the signal is at the same location as the considered receiver, we call it Near-End Crosstalk (NEXT) [48], whereas if the source of the signal is at the other end of the transmission line, we call it Far-End Crosstalk (FEXT) [48]. Note that it is, at least conceptually, easier to cope with NEXT, since the NEXT producing signals are also known at receiver k (they are at the same location) and can therefore be subtracted [3]. This approach is called NEXT cancellation [48]. Another approach is to use different frequency bands for different transmit directions. Therefore, we will assume in the subsequent sections and

4 To be more precise: the vector $\mathbf{e}_{n_0}$ is modeled as a random vector having a correlation matrix [34] with a trace smaller than or equal to S_DMT.

5 In fact, it is not the fiber that is expensive but the equipment for modulating optical signals and, of course, the laying of the fiber. Fiber is cheaper than copper.

[Figure 2.6: cable bundle with transmitters 1, …, K on one side and receivers on the other; NEXT and FEXT coupling paths and additive noise are indicated.]

Fig. 2.6: DMT: Transmission over a Cable Bundle.

chapters that there is no Near-End Crosstalk, without (severe) loss of generality. The more challenging disturbance is Far-End Crosstalk. In principle, there are two ways of dealing with these signals. One is to consider them as additional noise. The other, more intelligent, way is to develop algorithms that utilize the additional signals, which will result in better performance (higher capacity), but will require more computational effort. The design of such algorithms that achieve superior performance and are efficiently implementable is still an open problem. We will mainly treat this issue. To make things easier, from an analytical and economical point of view, we will assume that each loop is equipped with Discrete Multitone modulation (DMT). Note that DMT is an existing, widely used and therefore inexpensive technology that is amenable to analysis, cf. Section 2.1.

The transmission system we are going to deal with is depicted in Figure 2.7 with a block definition as shown in Figure 2.8. Since it has several inputs and outputs, it can be considered as a Multiple-Input / Multiple-Output (MIMO) system. We assume that all transmitters and receivers are synchronized, i.e., all transmitters use the same clock and sampling rate and, similarly, all receivers use the same clock and sampling rate. Furthermore, all DMT modulators and demodulators are assumed to operate with the same parameters: the (I)DFT lengths are all the same, as well as the lengths of the Cyclic Prefixes.

[Figure 2.7 block diagram: K DMT Modulators → Transmit Units → cable bundle with FEXT and Noise → Receive Units → Time Domain Equalizers 1, …, K → DMT Demodulators; input vector sequences …, e⟨k⟩_{n0}, e⟨k⟩_{n0+N+p}, … and output vector sequences …, f⟨k⟩_{n0}, f⟨k⟩_{n0+N+p}, ….]

Fig. 2.7: Multiple-Input / Multiple-Output DMT transmission system.

[Figure 2.8 block definitions: DMT Modulator = Conjugate Complex Extension → IFFT (length N) → Add Cyclic Prefix (length p) → Parallel→Serial; Transmit Unit = D/A Converter → Transmit Filter / Driver Stage; DMT Demodulator = Serial→Parallel → Remove Cyclic Prefix (length p) → FFT (length N) → Remove Conjugate Complex Extension; Receive Unit = Receive Filter (Lowpass) → A/D Converter.]

Fig. 2.8: Block definition.

[Figure 2.9 block diagram: transmit signals t⟨1⟩, …, t⟨K⟩ → Transmit Units → cable bundle with FEXT and noise s⟨1⟩, …, s⟨K⟩ → Receive Units → Time Domain Equalizers 1, …, K → output signals u⟨1⟩, …, u⟨K⟩.]

Fig. 2.9: MIMO DMT: The Channel and the Time Domain Equalizers.

Like in the single loop case, we start the analysis of this system with the part that models the channel, cf. Figure 2.9. Again, all involved signals are real valued, discrete-time signals and are denoted by t⟨k⟩, s⟨k⟩, r⟨k⟩, and u⟨k⟩, respectively, where the superscript bracket notation ⟨k⟩, k = 1, …, K, is chosen to avoid confusion between a signal on the k-th loop and the k-th power of a signal. The input-output behavior is approximated by linear and time-invariant convolutions of the input signals $t^{\langle l\rangle} = [t^{\langle l\rangle}(n)]_{n=-\infty,\ldots,\infty}$ with real valued impulse responses $g^{\langle kl\rangle} = [g^{\langle kl\rangle}(n)]_{n=-\infty,\ldots,\infty}$ plus additive noise terms $s^{\langle k\rangle} = [s^{\langle k\rangle}(n)]_{n=-\infty,\ldots,\infty}$. Note that k, l = 1, …, K. In contrast to the single loop scenario, the k-th received signal $r^{\langle k\rangle} = [r^{\langle k\rangle}(n)]_{n=-\infty,\ldots,\infty}$ does not only depend on the k-th transmit signal t⟨k⟩. It also depends (linearly) on all other transmit signals t⟨l⟩, l ≠ k, i.e.,

$$r^{\langle k\rangle} = \sum_{l=1}^{K} g^{\langle kl\rangle} * t^{\langle l\rangle} + s^{\langle k\rangle}, \qquad r^{\langle k\rangle}(n) = \sum_{l=1}^{K}\,\sum_{m=-\infty}^{\infty} g^{\langle kl\rangle}(m)\, t^{\langle l\rangle}(n-m) + s^{\langle k\rangle}(n). \tag{2.18}$$

The Time Domain Equalizers (TDEs) have a similar function as in the single loop case. Again, it is their task to shorten the impulse responses, and their behavior is modeled by convolutions with real valued, time-discrete impulse responses $e^{\langle k\rangle} = [e^{\langle k\rangle}(n)]_{n=-\infty,\ldots,\infty}$, i.e.,

$$u^{\langle k\rangle} = e^{\langle k\rangle} * r^{\langle k\rangle}, \qquad u^{\langle k\rangle}(n) = \sum_{m=-\infty}^{\infty} e^{\langle k\rangle}(m)\, r^{\langle k\rangle}(n-m). \tag{2.19}$$

For complexity reasons, we do not allow the k-th TDE to use input signals other than the k-th received signal r⟨k⟩. Let $h^{\langle kl\rangle} = [h^{\langle kl\rangle}(n)]_{n=-\infty,\ldots,\infty}$ denote the convolution of e⟨k⟩ and g⟨kl⟩, $h^{\langle kl\rangle} = e^{\langle k\rangle} * g^{\langle kl\rangle}$, and let $z^{\langle k\rangle} = [z^{\langle k\rangle}(n)]_{n=-\infty,\ldots,\infty}$ denote the convolution of e⟨k⟩ and s⟨k⟩, $z^{\langle k\rangle} = e^{\langle k\rangle} * s^{\langle k\rangle}$. Then, we obtain an input-output relation as

$$u^{\langle k\rangle} = \sum_{l=1}^{K} h^{\langle kl\rangle} * t^{\langle l\rangle} + z^{\langle k\rangle}, \qquad u^{\langle k\rangle}(n) = \sum_{l=1}^{K}\,\sum_{m=-\infty}^{\infty} h^{\langle kl\rangle}(m)\, t^{\langle l\rangle}(n-m) + z^{\langle k\rangle}(n). \tag{2.20}$$

The coefficients of the TDEs are determined in the pre-transmission phase, such that all overall channel impulse responses h⟨kl⟩ have lengths shorter than or equal to p + 1, p being the length of the Cyclic Prefixes, or, to be more precise, they are determined such that

$$h^{\langle kl\rangle}(n) = 0, \quad n < 0 \ \text{ or } \ n > p. \tag{2.21}$$

Note that the calculation of the TDE coefficients is a nontrivial problem, since the impulse response of the k-th TDE, e⟨k⟩, has to shorten all g⟨kl⟩, l = 1, …, K,


simultaneously, and can therefore pose an over-determined problem, which can only be solved in an approximate sense. We will address the issue of designing the coefficients of the TDEs in more detail later in Section 5.1.

The MIMO DMT transmission system, as depicted in Figure 2.7, has DMT demodulators at each TDE output. As seen in the previous section, the DMT demodulator performs a linear operation on its input signal

$$u^{\langle k\rangle} = \sum_{l=1}^{K} h^{\langle kl\rangle} * t^{\langle l\rangle} + z^{\langle k\rangle}. \tag{2.22}$$

We can therefore interchange the sum in (2.22) with the operation of the DMT demodulator and apply the analysis of single loop DMT (Section 2.1) to each element of the sum. The input-output behavior of the whole MIMO DMT system is then obtained by summation. Let

$$\ldots, \mathbf{e}^{\langle k\rangle}_{n_0}, \mathbf{e}^{\langle k\rangle}_{n_0+N+p}, \ldots \in \mathbb{C}^{\frac{N}{2}+1} \quad\text{and}\quad \ldots, \mathbf{f}^{\langle k\rangle}_{n_0}, \mathbf{f}^{\langle k\rangle}_{n_0+N+p}, \ldots \in \mathbb{C}^{\frac{N}{2}+1}$$

denote the input vector sequence of the k-th modulator and the output vector sequence of the k-th demodulator, respectively, and define

$$\mathbf{v}^{\langle k\rangle}_{n_0} = \begin{bmatrix} z^{\langle k\rangle}(n_0) \\ z^{\langle k\rangle}(n_0+1) \\ \vdots \\ z^{\langle k\rangle}(n_0+N-1) \end{bmatrix} \in \mathbb{R}^N.$$

Then the MIMO DMT input-output behavior can be immediately derived from equation (2.16) as

$$\mathbf{f}^{\langle k\rangle}_{n_0} = \sum_{l=1}^{K} \mathbf{D}^{\langle kl\rangle}\,\mathbf{e}^{\langle l\rangle}_{n_0} + \mathbf{E}\mathbf{w}^{\langle k\rangle}_{n_0} \tag{2.23}$$

with the abbreviations

$$\mathbf{w}^{\langle k\rangle}_{n_0} = \mathbf{F}\mathbf{v}^{\langle k\rangle}_{n_0}$$

and

$$\mathbf{D}^{\langle kl\rangle} = \begin{bmatrix} H^{\langle kl\rangle}\big(e^{j\frac{2\pi}{N}0}\big) & & & \mathbf{0} \\ & H^{\langle kl\rangle}\big(e^{j\frac{2\pi}{N}1}\big) & & \\ & & \ddots & \\ \mathbf{0} & & & H^{\langle kl\rangle}\big(e^{j\frac{2\pi}{N}\frac{N}{2}}\big) \end{bmatrix},$$

where

$$H^{\langle kl\rangle}(z) = \sum_{n=0}^{p} h^{\langle kl\rangle}(n)\, z^{-n} \tag{2.24}$$


denotes the Z-transform of h⟨kl⟩, and E and F are defined in (2.15) and (2.10), respectively. With equation (2.23), we have found an expression that, in principle, can be used as a starting point for further analysis and optimization. Unfortunately, we have to deal with a set of K equations, since k takes values from 1 to K. For simplicity (and also for notational reasons) we will rewrite equation (2.23) in the following way. Let

$$\mathbf{x}_{n_0} = \begin{bmatrix} e^{\langle 1\rangle}_{n_0}(0) \\ \vdots \\ e^{\langle K\rangle}_{n_0}(0) \\ e^{\langle 1\rangle}_{n_0}(1) \\ \vdots \\ e^{\langle K\rangle}_{n_0}(1) \\ \vdots \\ e^{\langle 1\rangle}_{n_0}(\tfrac{N}{2}) \\ \vdots \\ e^{\langle K\rangle}_{n_0}(\tfrac{N}{2}) \end{bmatrix}, \quad \mathbf{y}_{n_0} = \begin{bmatrix} f^{\langle 1\rangle}_{n_0}(0) \\ \vdots \\ f^{\langle K\rangle}_{n_0}(0) \\ f^{\langle 1\rangle}_{n_0}(1) \\ \vdots \\ f^{\langle K\rangle}_{n_0}(1) \\ \vdots \\ f^{\langle 1\rangle}_{n_0}(\tfrac{N}{2}) \\ \vdots \\ f^{\langle K\rangle}_{n_0}(\tfrac{N}{2}) \end{bmatrix}, \quad\text{and}\quad \mathbf{n}_{n_0} = \begin{bmatrix} w^{\langle 1\rangle}_{n_0}(0) \\ \vdots \\ w^{\langle K\rangle}_{n_0}(0) \\ w^{\langle 1\rangle}_{n_0}(1) \\ \vdots \\ w^{\langle K\rangle}_{n_0}(1) \\ \vdots \\ w^{\langle 1\rangle}_{n_0}(\tfrac{N}{2}) \\ \vdots \\ w^{\langle K\rangle}_{n_0}(\tfrac{N}{2}) \end{bmatrix} \in \mathbb{C}^{(\frac{N}{2}+1)K}$$

denote stacked versions of the input, output, and noise vectors, respectively, and let

$$\mathbf{H}_{\text{MIMO DMT}} = \begin{bmatrix} \mathbf{H}(0) & & & \mathbf{0} \\ & \mathbf{H}(1) & & \\ & & \ddots & \\ \mathbf{0} & & & \mathbf{H}(\tfrac{N}{2}) \end{bmatrix} \in \mathbb{C}^{(\frac{N}{2}+1)K \times (\frac{N}{2}+1)K}$$

with

$$\mathbf{H}(n) = \begin{bmatrix} H^{\langle 11\rangle}\big(e^{j\frac{2\pi}{N}n}\big) & \cdots & H^{\langle 1K\rangle}\big(e^{j\frac{2\pi}{N}n}\big) \\ \vdots & \ddots & \vdots \\ H^{\langle K1\rangle}\big(e^{j\frac{2\pi}{N}n}\big) & \cdots & H^{\langle KK\rangle}\big(e^{j\frac{2\pi}{N}n}\big) \end{bmatrix} \in \mathbb{C}^{K\times K}$$

denote a block diagonal matrix. Then, equation (2.23) translates to

$$\mathbf{y}_{n_0} = \mathbf{H}_{\text{MIMO DMT}}\,\mathbf{x}_{n_0} + \mathbf{n}_{n_0}, \tag{2.25}$$

and we have found a compact mathematical description of the input-output behavior for synchronized DMT transmission over cable bundles. Like in the single loop case, we have to impose a power constraint. Instead of having power constraints


for each loop, we will assume that the average sum power⁶ of the vector $\mathbf{x}_{n_0}$ is smaller than or equal to a certain number S_MIMO DMT. Mathematically, this is easier to handle, and if we set S_MIMO DMT = K S_DMT, it can be shown by simulation that in real systems, this relaxed power constraint is sufficient to guarantee a correct power distribution on the various loops.
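The block diagonal structure of (2.25) can be illustrated as follows; the bundle size K, the DFT length N, and the per-tone matrices H(n) are hypothetical random values, and the noise is omitted for the check:

```python
import numpy as np

K, N = 3, 8                          # hypothetical bundle size and DFT length
n_tones = N // 2 + 1
rng = np.random.default_rng(3)

# per-tone K x K channel matrices H(n): diagonal entries model the direct
# paths, off-diagonal entries model FEXT between the loops
H_tone = rng.standard_normal((n_tones, K, K)) + 1j * rng.standard_normal((n_tones, K, K))

# stacked block diagonal matrix H_MIMO_DMT of size (N/2+1)K x (N/2+1)K
H_mimo = np.zeros((n_tones * K, n_tones * K), dtype=complex)
for n in range(n_tones):
    H_mimo[n * K:(n + 1) * K, n * K:(n + 1) * K] = H_tone[n]

# stacked input vector x_{n0} (noise omitted for this check)
x = rng.standard_normal(n_tones * K) + 1j * rng.standard_normal(n_tones * K)
y = H_mimo @ x

# the stacked model (2.25) reproduces the per-tone relations from (2.23)
for n in range(n_tones):
    assert np.allclose(y[n * K:(n + 1) * K], H_tone[n] @ x[n * K:(n + 1) * K])
print("block diagonal model consistent on all", n_tones, "tones")
```

The block diagonal structure is what later allows tone-by-tone processing: each K x K block can be equalized independently.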

    2.3 The Channel Model

In the previous two sections, we built up a mathematical model for the behavior of DMT transmission over one single loop and DMT transmission over a cable bundle. An inspection of the (final) equations (2.16) and (2.25) shows that both models have the same structure, i.e., the output vector is obtained by a multiplication of the input vector with a certain matrix plus an additive noise vector. As a consequence, both scenarios are special cases of the following general channel model,

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}, \tag{2.26}$$

where $\mathbf{y} \in \mathbb{C}^r$ and $\mathbf{x} \in \mathbb{C}^t$ denote the received and transmitted vectors, respectively. $\mathbf{A}$ is a deterministic $r \times t$ complex matrix, the channel matrix, and $\mathbf{n} \in \mathbb{C}^r$ is the noise vector. The transmitter is constrained in its total power⁷ to S,

$$\mathrm{E}\{\mathbf{x}^{\mathrm{H}}\mathbf{x}\} \le S, \tag{2.27}$$

or, equivalently, since $\mathbf{x}^{\mathrm{H}}\mathbf{x} = \mathrm{tr}\big(\mathbf{x}\mathbf{x}^{\mathrm{H}}\big)$, and expectation and trace commute,

$$\mathrm{tr}\big(\mathrm{E}\{\mathbf{x}\mathbf{x}^{\mathrm{H}}\}\big) \le S. \tag{2.28}$$
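The equivalence of (2.27) and (2.28) rests on the identity $\mathbf{x}^{\mathrm{H}}\mathbf{x} = \mathrm{tr}(\mathbf{x}\mathbf{x}^{\mathrm{H}})$, which holds for every realization of x, not only in expectation; a one-line check with a hypothetical dimension:

```python
import numpy as np

rng = np.random.default_rng(4)
t = 5                                            # transmit dimension (toy value)
x = rng.standard_normal(t) + 1j * rng.standard_normal(t)   # one realization of x

power = np.vdot(x, x).real                       # x^H x
trace = np.trace(np.outer(x, np.conj(x))).real   # tr(x x^H)
assert np.isclose(power, trace)
print("x^H x equals tr(x x^H) for this realization")
```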

Observe that all occurring vectors are modeled as random vectors, so that the expectation operation E{·} in equations (2.27) and (2.28) is well defined. We will discuss related definitions and properties of complex random vectors in Section 3.1.

Furthermore, we want to emphasize that the channel matrix as well as the noise characteristics are assumed to be known at the receiver and the transmitter. This means that a real system has to determine (estimate) these parameters in a pre-transmission phase before the effective information transmission starts.

If we compare the results obtained for DMT transmission over one single loop, cf. equations (2.16) and (2.17), with this channel model, we conclude that A is a diagonal matrix and we have r = t = N/2 + 1 in this case. The Frequency Domain Equalizer (FDE) even yields an identity channel matrix.

In the MIMO DMT case, cf. equation (2.25), the channel matrix is no longer a diagonal matrix. It is a block diagonal matrix. All sub-matrices on the diagonal have dimension K × K, whereas the whole matrix has dimensions r = t = (N/2 + 1)K. Observe that there is no FEXT if and only if A is a diagonal matrix.

6 To be more precise: the vector $\mathbf{x}_{n_0}$ is modeled as a random vector having a correlation matrix [34] with a trace smaller than or equal to S_MIMO DMT.

    7 The superscript H denotes Hermitian transposition, i.e., transposition and complex conjugation.


In the following chapters, we will try to analyze and optimize transmission over channels of the form (2.26), (2.27) and (2.28). We will develop algorithms that are adapted to this transmission scenario and that can be efficiently implemented. We will show that in the special case of a diagonal or block diagonal channel matrix, the computational effort can be remarkably reduced. We also want to emphasize that the complex channel as defined in equations (2.26), (2.27) and (2.28) is equivalent to a real channel of doubled dimension. Let


random vector is not sufficient for a complete statistical description. Only for a subclass, the so-called rotationally invariant random vectors, does the covariance matrix fully specify the p.d.f. We will treat the theory of complex random vectors in Section 3.1.

3. ENTROPY AND CAPACITY

In this chapter, it is our goal to calculate the capacity [6] for channels of the form (2.26), (2.27) and (2.28). Since this is the complex channel representation, we have to deal with complex random vectors. As already mentioned in Section 2.3, for rotationally variant Gaussian complex random vectors, mean and covariance matrix are not sufficient for a statistical description. The knowledge of the so-called pseudo-covariance matrix¹ is also mandatory [41]. Therefore, the first section (Section 3.1) of this chapter is dedicated to the theory of complex random vectors and presents the main definitions and theorems. Furthermore, the concept of Entropy is extended from the real to the complex case, and a Generalized Maximum Entropy Theorem is proved, cf. also [57]. In contrast to previous work [41, 58], this theorem also takes into account the pseudo-covariance matrix and strengthens the results known before. With the introduction of a rotationally invariant analogon of a complex random vector we are able to show that the main statement is independent of the probability distribution of the considered complex random vector.

The reason why we give such a detailed introduction to complex random vectors is that we will show in Section 4.1 that the noise vector $\mathbf{E}\mathbf{w}_{n_0}$ in (2.16), and in turn the noise vector $\mathbf{n}_{n_0}$ in (2.25), are rotationally variant in general, and therefore the pseudo-covariance matrices of these noise vectors should be taken into account. In practical systems, as well as in the literature, this fact is simply neglected, resulting in an unnecessary performance loss.

We will calculate the capacity for three cases. First, for rotationally invariant noise, and second, for rotationally variant noise, where the pseudo-covariance matrix is taken into account. For the third case, see also [55], we assume that the noise is rotationally variant but it is erroneously believed to be rotationally invariant, so that the pseudo-covariance matrix is neglected. This results in a decreased capacity and we will calculate the capacity loss. For related literature we also refer to [33, 39, 62].

1 Note that the add-on pseudo does not tell us anything about the definition of this matrix. This is probably one of the reasons why other appellations, such as complementary covariance matrix (see [34]), are circulating in the literature as well. None of these appellations is really satisfactory in our opinion. In fact, this matrix is the cross-covariance matrix [34] of the considered random vector and its complex conjugate. However, in the subsequent sections we will maintain the widely used terminology pseudo-covariance matrix.


    3.1 The Maximum Entropy Theorem for Complex Random Vectors

    3.1.1 Preliminaries

A complex random vector $\mathbf{x} \in \mathbb{C}^n$ is defined as a random vector of the form $\mathbf{x} = \mathbf{x}_{\mathrm{R}} + j\,\mathbf{x}_{\mathrm{I}}$, where $\mathbf{x}_{\mathrm{R}}, \mathbf{x}_{\mathrm{I}} \in \mathbb{R}^n$ are real valued random vectors.


Proposition 3.1 Let $\mathbf{x} \in \mathbb{C}^n$ denote a complex random vector with expectation vector $\boldsymbol{\mu}_x$, covariance matrix $\mathbf{C}_x$ and pseudo-covariance matrix $\mathbf{P}_x$. Let $\mathbf{A} \in \mathbb{C}^{m\times n}$ and $\mathbf{b} \in \mathbb{C}^m$ denote a deterministic matrix and a deterministic vector, respectively, and consider the complex random vector $\mathbf{y} \in \mathbb{C}^m$, obtained by the affine transformation $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b}$. Then,

$$\boldsymbol{\mu}_y = \mathbf{A}\boldsymbol{\mu}_x + \mathbf{b}, \qquad \mathbf{C}_y = \mathbf{A}\mathbf{C}_x\mathbf{A}^{\mathrm{H}} \qquad\text{and}\qquad \mathbf{P}_y = \mathbf{A}\mathbf{P}_x\mathbf{A}^{\mathrm{T}}.$$

Proof. $\mathbf{y} - \boldsymbol{\mu}_y = \mathbf{A}\mathbf{x} + \mathbf{b} - \mathbf{A}\boldsymbol{\mu}_x - \mathbf{b} = \mathbf{A}(\mathbf{x} - \boldsymbol{\mu}_x)$, and therefore $\mathbf{C}_y = \mathrm{E}\{(\mathbf{y} - \boldsymbol{\mu}_y)(\mathbf{y} - \boldsymbol{\mu}_y)^{\mathrm{H}}\} = \mathrm{E}\{\mathbf{A}(\mathbf{x} - \boldsymbol{\mu}_x)(\mathbf{x} - \boldsymbol{\mu}_x)^{\mathrm{H}}\mathbf{A}^{\mathrm{H}}\} = \mathbf{A}\mathbf{C}_x\mathbf{A}^{\mathrm{H}}$ and $\mathbf{P}_y = \mathrm{E}\{(\mathbf{y} - \boldsymbol{\mu}_y)(\mathbf{y} - \boldsymbol{\mu}_y)^{\mathrm{T}}\} = \mathrm{E}\{\mathbf{A}(\mathbf{x} - \boldsymbol{\mu}_x)(\mathbf{x} - \boldsymbol{\mu}_x)^{\mathrm{T}}\mathbf{A}^{\mathrm{T}}\} = \mathbf{A}\mathbf{P}_x\mathbf{A}^{\mathrm{T}}$.
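Proposition 3.1 can be checked with sample moments, which transform under an affine map exactly as stated (not merely asymptotically), since the sample mean and the sample (pseudo-)covariance have precisely the forms used in the proof:

```python
import numpy as np

rng = np.random.default_rng(5)
n_dim, m_dim, L = 3, 2, 500
# L draws of a complex random vector x (columns of X)
X = rng.standard_normal((n_dim, L)) + 1j * rng.standard_normal((n_dim, L))
A = rng.standard_normal((m_dim, n_dim)) + 1j * rng.standard_normal((m_dim, n_dim))
b = rng.standard_normal(m_dim) + 1j * rng.standard_normal(m_dim)
Y = A @ X + b[:, None]                       # affine transformation y = A x + b

def moments(Z):
    """Sample mean, covariance and pseudo-covariance of column samples."""
    mu = Z.mean(axis=1, keepdims=True)
    Zc = Z - mu
    C = Zc @ Zc.conj().T / Z.shape[1]
    P = Zc @ Zc.T / Z.shape[1]
    return mu[:, 0], C, P

mu_x, C_x, P_x = moments(X)
mu_y, C_y, P_y = moments(Y)

assert np.allclose(mu_y, A @ mu_x + b)       # mu_y = A mu_x + b
assert np.allclose(C_y, A @ C_x @ A.conj().T)  # C_y = A C_x A^H
assert np.allclose(P_y, A @ P_x @ A.T)       # P_y = A P_x A^T
print("sample moments satisfy Proposition 3.1")
```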

Definition 3.2 A complex random vector $\mathbf{x} \in \mathbb{C}^n$ is called rotationally invariant (cf. [32]) or proper (cf. [41]) or circularly symmetric (cf. [58]) if its pseudo-covariance matrix $\mathbf{P}_x$ vanishes.

Corollary 3.3 Rotational invariance is invariant under affine transformations, i.e.,

    P_x = 0 ⟹ P_{Ax+b} = 0.

Proof. Apply Proposition 3.1 to Definition 3.2.

Decomposition (3.5) suggests the following definition² of the mappings x̄, Ā, and Â (note that x̄ and Ā can easily be distinguished by considering their domains):

Definition 3.4

    x ↦ x̄ = [Re{x}; Im{x}]  (mapping C^n → R^{2n}),

    A ↦ Ā = [Re{A}, −Im{A}; Im{A}, Re{A}]  (mapping C^{m×n} → R^{2m×2n}),

    A ↦ Â = [Re{A}, Im{A}; Im{A}, −Re{A}]  (mapping C^{m×n} → R^{2m×2n}).


Lemma 3.5

1. C = AB ⟹ C̄ = Ā B̄
2. C = A + B ⟹ C̄ = Ā + B̄
3. C = A^H ⟹ C̄ = Ā^T
4. C = A^{-1} ⟹ C̄ = Ā^{-1}
5. det(Ā) = |det(A)|² = det(AA^H), A ∈ C^{n×n}
6. z = x + y ⟹ z̄ = x̄ + ȳ
7. y = Ax ⟹ ȳ = Ā x̄
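The statements of Lemma 3.5 are straightforward to check numerically. The following Python/NumPy sketch (our own illustration, not part of the thesis) implements the bar mappings for vectors and matrices and verifies several of the identities for random complex data:

```python
import numpy as np

rng = np.random.default_rng(0)

def bar_vec(x):
    # x-bar: stack real and imaginary parts, C^n -> R^{2n}
    return np.concatenate([x.real, x.imag])

def bar_mat(A):
    # A-bar: real 2m x 2n representation, so that bar(Ax) = bar(A) @ bar(x)
    return np.block([[A.real, -A.imag], [A.imag, A.real]])

n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

assert np.allclose(bar_mat(A @ B), bar_mat(A) @ bar_mat(B))              # statement 1
assert np.allclose(bar_mat(A + B), bar_mat(A) + bar_mat(B))              # statement 2
assert np.allclose(bar_mat(A.conj().T), bar_mat(A).T)                    # statement 3
assert np.allclose(bar_mat(np.linalg.inv(A)), np.linalg.inv(bar_mat(A)))  # statement 4
assert np.isclose(np.linalg.det(bar_mat(A)), abs(np.linalg.det(A)) ** 2)  # statement 5
assert np.allclose(bar_vec(A @ x), bar_mat(A) @ bar_vec(x))              # statement 7
```

All assertions hold up to floating-point roundoff; the random test matrices are non-singular almost surely, which is needed for statement 4.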


Lemma 3.6

1. C = A + B ⟹ Ĉ = Â + B̂
2. C = A^T ⟹ Ĉ = Â^T
3. C = AB ⟹ Ĉ = Ā B̂


and furthermore (λ is real), applying Lemma 3.5,

    Â Â − λ I_{2n} = \overline{AA^H − λ I_n},

and consequently,

    det(Â Â − λ I_{2n}) = |det(AA^H − λ I_n)|².

The eigenvalues of Â Â are the squared eigenvalues of Â and are equal to the eigenvalues of AA^H, which are the squared singular values of A.

The previous result gives us knowledge about the eigenvalues of Â for a symmetric matrix A. But what about its eigenvectors? Can we obtain an analogous result? This question is mainly of theoretical interest, since Theorem 3.8 together with Proposition 3.7 is sufficient to prove the subsequent results. However, in order to get a deeper understanding of the underlying structure, we will present another proof of Theorem 3.8, which gives insight into the eigenvector problem as well. The proof is strongly related to the not so well known Takagi factorization [24] that applies to symmetric complex matrices:

Theorem 3.9 (Takagi factorization) Let A ∈ C^{n×n} denote a symmetric matrix, i.e., A^T = A. Then there exist a unitary matrix Q ∈ C^{n×n} and a real-valued, diagonal matrix Σ ∈ R^{n×n} with non-negative entries, i.e.,

    Σ = diag(σ_1, ..., σ_n),  σ_i ≥ 0,  i = 1, ..., n,

such that

    A = Q Σ Q^T. (3.7)

The columns of Q are an orthonormal set of eigenvectors of AA^H, and the corresponding diagonal entries of Σ are the non-negative square roots of the corresponding eigenvalues of AA^H, i.e., the singular values [19] of A.

Proof. [24].

We want to emphasize that factorization (3.7) represents at the same time the Singular Value Decomposition (SVD) [19] of A.
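Takagi's factorization can be obtained numerically from an ordinary SVD. The following Python/NumPy sketch (our own illustration, not part of the thesis) relies on SVD uniqueness: for a symmetric A = UΣV^H with distinct singular values, conj(V) = UD for a diagonal unitary D, and Q = U D^{1/2} is a Takagi factor. Distinct singular values hold almost surely for the random test matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

def takagi(A):
    # Takagi factorization A = Q @ diag(s) @ Q.T of a complex symmetric A,
    # assuming distinct singular values: from A = U S V^H and A = A^T, SVD
    # uniqueness gives conj(V) = U D with a diagonal unitary D, and
    # Q = U sqrt(D) works, since Q S Q^T = U (D S) U^T = conj(V) S U^T = A.
    U, s, Vh = np.linalg.svd(A)
    D = np.diag(U.conj().T @ Vh.T)      # diagonal unitary (up to roundoff)
    Q = U @ np.diag(np.sqrt(D))
    return Q, s

n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = M + M.T                             # complex symmetric test matrix
Q, s = takagi(A)

assert np.allclose(Q @ np.diag(s) @ Q.T, A)                 # factorization (3.7)
assert np.allclose(Q.conj().T @ Q, np.eye(n))               # Q unitary
assert np.allclose(s, np.linalg.svd(A, compute_uv=False))   # s = singular values
```

Either branch of the diagonal square root works, since sqrt(D) Σ sqrt(D) = DΣ for diagonal matrices.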

As a corollary we (re-)obtain a stronger version of Theorem 3.8 and Proposition 3.7 (for a symmetric A):

Corollary 3.10 Let A ∈ C^{n×n} denote a symmetric matrix, i.e., A^T = A. Then there exist a unitary matrix Q ∈ C^{n×n} and a real-valued, diagonal matrix Σ ∈ R^{n×n} with non-negative entries, such that

    Â = Q̂ Σ̂ Q̂^T. (3.8)

Q and Σ are as in Theorem 3.9; in particular, the diagonal entries of Σ are the singular values of A.


Proof.

    Â = \widehat{Q Σ Q^T}
      = Q̄ \widehat{Σ Q^T} (3.9)
      = Q̄ Σ̄ Q̂^T (3.10)
      = Q̂ Σ̂ Q̂^T, (3.11)

where (3.9) and (3.10) are consequences of statements 3 and 2 of Lemma 3.6, respectively, and (3.11) follows from the third statement of Lemma 3.5.

We want to emphasize that factorization (3.8) represents the eigenvalue decomposition of Â. Note that the diagonal eigenvalue matrix Σ̂ has the structure

    Σ̂ = diag(σ_1, ..., σ_n, −σ_1, ..., −σ_n),

σ_i being the singular values of A. It is very remarkable that the orthonormal eigenvector matrix is of the form Q̂ for a unitary matrix Q.
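The eigenstructure of Â claimed above is easy to confirm numerically. The following NumPy sketch (our own illustration) builds Â for a random complex symmetric A and checks that its eigenvalues are plus/minus the singular values of A:

```python
import numpy as np

rng = np.random.default_rng(2)

def hat_mat(A):
    # A-hat: the real 2n x 2n representation [[Re A, Im A], [Im A, -Re A]]
    return np.block([[A.real, A.imag], [A.imag, -A.real]])

n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = M + M.T                                # complex symmetric
sv = np.linalg.svd(A, compute_uv=False)    # singular values of A
eig = np.linalg.eigvalsh(hat_mat(A))       # hat_mat(A) is real symmetric here

# eigenvalues of A-hat are plus/minus the singular values of A
assert np.allclose(np.sort(eig), np.sort(np.concatenate([sv, -sv])))
```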

Corollary 3.11 Let P_x = E{(x − μ_x)(x − μ_x)^T} denote the pseudo-covariance matrix of a complex random vector x ∈ C^n. Then there exist a unitary matrix Q_x ∈ C^{n×n} and a real-valued, diagonal matrix Σ_x ∈ R^{n×n} with non-negative entries, such that

    P_x = Q_x Σ_x Q_x^T. (3.12)

The diagonal entries of Σ_x are the singular values of P_x.

Proof. P_x ∈ C^{n×n} and P_x^T = P_x.

Until now, we were able to express the eigenvalues of P̂_x in terms of P_x. We will see later that it is very important to find a similar relation for the determinant of C_x̄. Due to decomposition (3.5) we can expect that det(C_x̄) depends on the covariance matrix C_x and the pseudo-covariance matrix P_x. In order to solve this problem, we need the following

Definition 3.12 A matrix B ∈ C^{n×n} is called a generalized Cholesky factor of a positive definite Hermitian matrix A ∈ C^{n×n} if it satisfies the equation

    A = BB^H.


Since det A = |det B|², a generalized Cholesky factor is always a non-singular matrix. Note that the conventional Cholesky decomposition (cf. [19]), A = LL^H, where L is lower-triangular, yields a generalized Cholesky factor L. There are other ways of constructing a generalized Cholesky factor as well. Let A = UDU^H be the eigenvalue decomposition of A. Then U is a unitary matrix consisting of the normalized eigenvectors of A, and D is a diagonal matrix having the real and positive eigenvalues of A as diagonal entries. For any matrix T which satisfies D = TT^H, B = UT is a generalized Cholesky factor. The next theorem tells us that if we know one generalized Cholesky factor, we know them all:

Theorem 3.13 Suppose B is a generalized Cholesky factor of A. Then, for any unitary matrix U, C = BU is also a generalized Cholesky factor. Conversely, if B and C are generalized Cholesky factors, there exists a unitary matrix U such that C = BU.

Proof. The direct part follows from

    CC^H = BUU^H B^H = BB^H = A,

where U^H = U^{-1} was used. For the converse part, we have

    BB^H = CC^H,

and therefore

    C^{-1}B = C^H B^{-H},

which is equivalent to

    (B^{-1}C)^{-1} = (B^{-1}C)^H.

This shows that U = B^{-1}C is unitary.
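The two constructions mentioned above, and Theorem 3.13 itself, can be illustrated numerically. The following NumPy sketch (ours) builds one generalized Cholesky factor via the conventional Cholesky decomposition and another via the eigendecomposition, and checks that they differ by a unitary factor:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = M @ M.conj().T + n * np.eye(n)        # Hermitian positive definite

# Factor 1: conventional (lower-triangular) Cholesky factor L
B = np.linalg.cholesky(A)

# Factor 2: eigendecomposition-based factor U sqrt(D)
w, V = np.linalg.eigh(A)
C = V @ np.diag(np.sqrt(w))

assert np.allclose(B @ B.conj().T, A)
assert np.allclose(C @ C.conj().T, A)

# Theorem 3.13: the two factors differ by a unitary matrix U = B^{-1} C
U = np.linalg.solve(B, C)
assert np.allclose(U @ U.conj().T, np.eye(n))
```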

Lemma 3.14 Suppose the complex random vector x ∈ C^n has a non-singular covariance matrix C_x and a pseudo-covariance matrix P_x. Let B_x be a generalized Cholesky factor of C_x and let σ_i denote the singular values of P_{B_x^{-1}x} = B_x^{-1} P_x B_x^{-T}, the pseudo-covariance matrix of the complex random vector B_x^{-1}x. Then det(C_x̄) can be expressed as

    det(C_x̄) = 2^{-2n} (det C_x)² ∏_{σ_i>0} (1 − σ_i²).

Proof. Consider the complex random vector y = B_x^{-1}x. Obviously, y has a covariance matrix C_y = I_n and a pseudo-covariance matrix P_y = P_{B_x^{-1}x}. This yields (x = B_x y, according to Lemma 3.5)

    det(C_x̄) = det(C_{\overline{B_x y}})
             = det(B̄_x C_ȳ B̄_x^T)
             = det(C_ȳ) det(B̄_x B̄_x^T)
             = det(C_ȳ) det(C̄_x)
             = det(C_ȳ) (det C_x)²
             = 2^{-2n} (det C_x)² det(2C_ȳ)
             = 2^{-2n} (det C_x)² det(I_{2n} + P̂_y),

and, applying Corollary 3.11,

    det(C_x̄) = 2^{-2n} (det C_x)² det(I_{2n} + Σ̂_y)
             = 2^{-2n} (det C_x)² ∏_{σ_i>0} (1 − σ_i²),

where the product is over all positive eigenvalues (counted with multiplicity) of P_{B_x^{-1}x} or, equivalently (Corollary 3.11), over all non-zero singular values of P_{B_x^{-1}x}.
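Lemma 3.14 is easy to check numerically. The NumPy sketch below (our own illustration) starts from an arbitrary positive definite composite covariance matrix S = C_x̄, extracts the corresponding complex covariance and pseudo-covariance matrices, and compares both sides of the determinant formula:

```python
import numpy as np

rng = np.random.default_rng(4)

n = 3
# random covariance matrix S of the composite real vector xbar = [Re x; Im x]
M = rng.standard_normal((2 * n, 2 * n))
S = M @ M.T + 2 * n * np.eye(2 * n)

# covariance and pseudo-covariance of the corresponding complex vector x
Suu, Suv, Svu, Svv = S[:n, :n], S[:n, n:], S[n:, :n], S[n:, n:]
Cx = (Suu + Svv) + 1j * (Svu - Suv)
Px = (Suu - Svv) + 1j * (Suv + Svu)

# singular values of Bx^{-1} Px Bx^{-T} for a generalized Cholesky factor Bx
Binv = np.linalg.inv(np.linalg.cholesky(Cx))
sig = np.linalg.svd(Binv @ Px @ Binv.T, compute_uv=False)

lhs = np.linalg.det(S)                                    # det(C_xbar)
rhs = 2.0 ** (-2 * n) * np.linalg.det(Cx).real ** 2 * np.prod(1 - sig ** 2)
assert np.isclose(lhs, rhs)
```

The block-wise extraction of C_x and P_x from S follows directly from the definitions C_x = E{xx^H} and P_x = E{xx^T} for x = u + jv.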

Furthermore, the concept of generalized Cholesky factors enables us to formulate a criterion for a matrix to be a pseudo-covariance matrix:

Theorem 3.15 Let C ∈ C^{n×n} be an Hermitian positive definite matrix and B a generalized Cholesky factor of C. C and a matrix P ∈ C^{n×n} are covariance matrix and pseudo-covariance matrix of a complex random vector, respectively, if and only if P is symmetric and the singular values of B^{-1}PB^{-T} are smaller than or equal to 1.

Proof. We will first prove the necessary condition, i.e., that P is symmetric and the singular values of B^{-1}PB^{-T} are smaller than or equal to 1 if C and P are covariance and pseudo-covariance matrix of a complex random vector, respectively. Let us take a look at the linearly transformed random vector y = B^{-1}x, where x denotes the complex random vector with covariance matrix C and pseudo-covariance matrix P. Obviously, y has a covariance matrix C_y = I_n and a pseudo-covariance matrix P_y = P_{B^{-1}x} = B^{-1}PB^{-T}. From

    C_ȳ = ½ (C̄_y + P̂_y) = ½ (I_{2n} + \widehat{B^{-1}PB^{-T}}),

and the fact that C_ȳ is non-negative definite, we conclude, using Corollary 3.11, that the singular values of B^{-1}PB^{-T} are smaller than or equal to 1. Note that a pseudo-covariance matrix is symmetric by definition.

In order to prove that P being symmetric and the singular values of B^{-1}PB^{-T} being smaller than or equal to 1 is also a sufficient condition for C and P to be covariance and pseudo-covariance matrix of a complex random vector, respectively, we have to construct a complex random vector with covariance matrix C and pseudo-covariance matrix P. Define C_1 = I_n and P_1 = B^{-1}PB^{-T}, and consider

    ½ (C̄_1 + P̂_1) = ½ (I_{2n} + \widehat{B^{-1}PB^{-T}}),

which is a symmetric and non-negative definite (apply Corollary 3.10) matrix. Therefore, there exists⁵ a complex random vector y ∈ C^n whose composite real random vector ȳ has this matrix as its covariance matrix, i.e., y has covariance matrix C_1 and pseudo-covariance matrix P_1. The random vector x = By then has covariance matrix BC_1B^H = C and pseudo-covariance matrix BP_1B^T = P (Proposition 3.1), which concludes the proof.
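Both directions of Theorem 3.15 can be probed numerically. The NumPy sketch below (ours) verifies the singular-value criterion for an admissible pair (C, P) derived from an arbitrary composite covariance, and shows that a symmetric P with a singular value above 1 would make the composite matrix indefinite:

```python
import numpy as np

rng = np.random.default_rng(5)

def hat_mat(P):
    # P-hat: real representation [[Re P, Im P], [Im P, -Re P]]
    return np.block([[P.real, P.imag], [P.imag, -P.real]])

n = 3
# an admissible pair (C, P): second-order description of an arbitrary
# positive definite composite covariance S
M = rng.standard_normal((2 * n, 2 * n))
S = M @ M.T + np.eye(2 * n)
Suu, Suv, Svu, Svv = S[:n, :n], S[:n, n:], S[n:, :n], S[n:, n:]
C = (Suu + Svv) + 1j * (Svu - Suv)
P = (Suu - Svv) + 1j * (Suv + Svu)

Binv = np.linalg.inv(np.linalg.cholesky(C))
sv = np.linalg.svd(Binv @ P @ Binv.T, compute_uv=False)
assert np.allclose(P, P.T)       # P symmetric
assert np.all(sv < 1)            # singular-value criterion (strict, since S is PD)

# conversely, a symmetric P with a singular value larger than 1 is ruled out:
# (I_{2n} + P1-hat)/2 would fail to be non-negative definite
Pbad = 1.5 * np.eye(n)
wmin = np.linalg.eigvalsh(0.5 * (np.eye(2 * n) + hat_mat(Pbad))).min()
assert wmin < 0
```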


with

    λ_1 = (C_x + r_x)/2 and λ_2 = (C_x − r_x)/2 (3.15)

and

    U = [cos(φ_x/2), −sin(φ_x/2); sin(φ_x/2), cos(φ_x/2)]. (3.16)

Proof. (3.13) is a consequence of (3.5). For (3.14) with (3.15) and (3.16), observe that

    C_x̄ = ½ [C_x, 0; 0, C_x] + (r_x/2) [cos φ_x, sin φ_x; sin φ_x, −cos φ_x]
        = ½ [C_x, 0; 0, C_x] + (r_x/2) U [1, 0; 0, −1] U^T
        = U diag(λ_1, λ_2) U^T.

Note that we do not require r_x ≥ 0, i.e., we explicitly allow ambiguities of φ_x in this relaxed polar-coordinates representation of P_x.
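For the scalar case the eigendecomposition above can be checked directly. A small NumPy sketch (our own, with arbitrarily chosen numerical values for C_x, r_x, and φ_x):

```python
import numpy as np

# scalar case n = 1: pseudo-covariance P_x = r_x * exp(1j * phi_x)
Cx, rx, phix = 2.0, 0.8, 0.7        # example values (our choice)

# composite covariance of [Re x; Im x]
Cbar = 0.5 * np.array([[Cx, 0.0], [0.0, Cx]]) \
     + 0.5 * rx * np.array([[np.cos(phix), np.sin(phix)],
                            [np.sin(phix), -np.cos(phix)]])

lam1, lam2 = (Cx + rx) / 2, (Cx - rx) / 2
U = np.array([[np.cos(phix / 2), -np.sin(phix / 2)],
              [np.sin(phix / 2),  np.cos(phix / 2)]])

# eigendecomposition (3.14): C_xbar = U diag(lam1, lam2) U^T
assert np.allclose(U @ np.diag([lam1, lam2]) @ U.T, Cbar)
```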

    3.1.2 Differential Entropy of Complex Random Vectors

The differential entropy h(x) of a complex random vector x is defined as the differential entropy h(x̄) of the corresponding real random vector⁷ x̄:

Definition 3.18

    h(x) = h(x̄) = −∫_{supp{f_x̄}} f_x̄(x) log f_x̄(x) dx,

provided that the p.d.f. f_x̄ of x̄ and the integral exist. supp{f_x̄} ⊆ R^{2n} is the support set of the random vector x̄.

For complex Gaussian random vectors, the differential entropy is calculated in [41] or [58] as

    h(x) = h(x̄) = ½ log det(2πe C_x̄), (3.17)

which, unfortunately, uses the covariance matrix of the corresponding real random vector x̄. We are interested in a result depending on the covariance matrix C_x and the pseudo-covariance matrix P_x of x.

It is well known, cf. [41] or [58], that a rotationally invariant complex Gaussian random vector maximizes entropy under the constraint of a given covariance matrix:

Theorem 3.19 (Maximum Entropy Theorem for Complex Random Vectors)
Suppose the complex random vector x ∈ C^n is zero-mean and has a non-singular covariance matrix C_x. Then the differential entropy of x satisfies

    h(x) ≤ log det(πe C_x)

7 Throughout this manuscript log = log₂; the unit of entropy is [bit].


    with equality if and only if x is rotationally invariant and Gaussian.

    Proof. [41], [58].

Theorem 3.19 tells us that all zero-mean complex random vectors with a specified covariance matrix C have entropies smaller than or equal to log det(πeC). But if we are given a complex random vector with a certain covariance and a non-vanishing pseudo-covariance matrix, the bound is not tight anymore (since it does not take into account the pseudo-covariance matrix). For this case, it should be possible to strengthen the theorem in the sense that it becomes a statement for a class of complex random vectors with a specified covariance and pseudo-covariance matrix. In order to prove such an extension, we need the real counterpart of Theorem 3.19:

Theorem 3.20 (Maximum Entropy Theorem for Real Random Vectors)
Suppose the real random vector⁸ x̄ ∈ R^{2n} has a non-singular covariance matrix C_x̄. Then the differential entropy of x̄ satisfies

    h(x̄) ≤ ½ log det(2πe C_x̄)

with equality if and only if x̄ is Gaussian.

Proof. We will first prove this theorem for the special case of x̄ being a zero-mean random vector (cf. also [6]). Let f_x̄ denote the p.d.f. of x̄, and let g ∈ R^{2n} be a random vector with p.d.f.

    f_g(x) = (1/√(det(2π C_x̄))) e^{−½ x^T C_x̄^{-1} x}, (3.18)

so that g is a zero-mean, Gaussian distributed random vector with covariance matrix C_x̄. Observe that (x = [x^(1) ... x^(2n)]^T)

    ∫_{R^{2n}} f_x̄(x) x^(i) x^(j) dx = ∫_{R^{2n}} f_g(x) x^(i) x^(j) dx,  1 ≤ i, j ≤ 2n,

and that log f_g(x) is an affine combination of the terms x^(i)x^(j). Thus,

    E_{f_x̄}{log f_g(x)} = E_{f_g}{log f_g(x)}.

8 Note that this theorem is also valid for real random vectors which are not induced by complex random vectors. Furthermore, odd dimensions are allowed as well.


Then,

    h(x̄) − h(g) = −∫_{supp{f_x̄}} f_x̄(x) log f_x̄(x) dx + ∫_{R^{2n}} f_g(x) log f_g(x) dx
                = −∫_{supp{f_x̄}} f_x̄(x) log f_x̄(x) dx + ∫_{supp{f_x̄}} f_x̄(x) log f_g(x) dx
                = ∫_{supp{f_x̄}} f_x̄(x) log (f_g(x)/f_x̄(x)) dx
                ≤ (1/ln 2) ∫_{supp{f_x̄}} f_x̄(x) ((f_g(x)/f_x̄(x)) − 1) dx
                = (1/ln 2) (∫_{supp{f_x̄}} f_g(x) dx − 1)
                ≤ (1/ln 2) (1 − 1) = 0,

with equality if and only if f_x̄ = f_g almost everywhere (cf. [21]). Thus h(x̄) ≤ h(g).

The general case, where the random vector x̄ has a non-vanishing expectation vector μ_x̄, is an immediate consequence of the special case, since h(x̄ − μ_x̄) = h(x̄) and C_{x̄−μ_x̄} = C_x̄.

    Combining this theorem with Lemma 3.14, we end up with

Theorem 3.21 (Generalized Maximum Entropy Theorem)
Suppose the complex random vector x ∈ C^n has a non-singular covariance matrix C_x and a pseudo-covariance matrix P_x. Let B_x be a generalized Cholesky factor of C_x and let σ_i denote the singular values of B_x^{-1}P_xB_x^{-T}, which must⁹ be smaller than 1. Then the differential entropy of x satisfies

    h(x) ≤ log det(πe C_x) + ½ ∑_i log(1 − σ_i²),

with equality if and only if x is Gaussian.

9 According to Theorem 3.15, the singular values are smaller than or equal to 1. This additional constraint is mandatory for the validity of the statement.


Proof. We have, using Lemma 3.14,

    ½ log det(2πe C_x̄) = ½ log((2πe)^{2n}) + ½ log det C_x̄
                        = n log(2πe) + ½ log(2^{-2n}) + ½ log((det C_x)²) + ½ ∑_i log(1 − σ_i²)
                        = n log 2 + n log(πe) − n log 2 + log det C_x + ½ ∑_i log(1 − σ_i²)
                        = log det(πe C_x) + ½ ∑_i log(1 − σ_i²),

which, together with Definition 3.18, Theorem 3.20, and equation (3.17), implies the theorem.
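The equality case of Theorem 3.21 (a Gaussian x) can be verified numerically by computing the differential entropy once via the real-valued formula (3.17) and once via the complex-domain expression with the correction term. A NumPy sketch (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(6)

n = 3
# composite covariance of [Re x; Im x] for an improper Gaussian x
M = rng.standard_normal((2 * n, 2 * n))
S = M @ M.T + 2 * n * np.eye(2 * n)

# covariance and pseudo-covariance of the complex vector x
Suu, Suv, Svu, Svv = S[:n, :n], S[:n, n:], S[n:, :n], S[n:, n:]
C = (Suu + Svv) + 1j * (Svu - Suv)
P = (Suu - Svv) + 1j * (Suv + Svu)

# h(x) from the real-valued formula (3.17): 0.5 * log det(2 pi e C_xbar)
h_real = 0.5 * np.log2(np.linalg.det(2 * np.pi * np.e * S))

# h(x) from Theorem 3.21 with equality: log det(pi e C) + correction term
Binv = np.linalg.inv(np.linalg.cholesky(C))
sig = np.linalg.svd(Binv @ P @ Binv.T, compute_uv=False)
h_complex = np.log2(np.linalg.det(np.pi * np.e * C).real) \
          + 0.5 * np.sum(np.log2(1 - sig ** 2))

assert np.isclose(h_real, h_complex)
```

The agreement of the two expressions is exactly the content of the determinant relation of Lemma 3.14.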

    Note that Theorem 3.19 is a corollary of Theorem 3.21. Therefore, Theorem3.21 is really a generalization of Theorem 3.19.

The Generalized Maximum Entropy Theorem (Theorem 3.21) compares two complex random vectors: the original one and its Gaussian distributed counterpart, i.e., a complex random vector with the same covariance matrix and pseudo-covariance matrix but with a Gaussian distribution. The differential entropy of this Gaussian random vector is equal to the differential entropy of a rotationally invariant Gaussian random vector with the same covariance matrix, modified by a correction term (½ ∑_i log(1 − σ_i²), cf. Theorem 3.21) that comes from the non-vanishing pseudo-covariance matrix. We will show in the following that this correction term is a very fundamental quantity that does not only apply to the differential entropy of a Gaussian distributed random vector or to the upper bound of Theorem 3.21. To be more precise, we will show that this correction term is independent of the actual distribution, i.e., it always measures the deviation of the differential entropy from the ideal rotationally invariant differential entropy, no matter according to what law the random vector is distributed. Of course, we have to define carefully what we mean by ideal rotationally invariant differential entropy for an arbitrarily distributed random vector, but we will see quite soon that this can be done in a very intuitive and natural way. But before we proceed in this direction, we will introduce the concept of a widely affine transformation:

Definition 3.22 A mapping

    A : C^n → C^m, x ↦ A(x)

is called a widely affine transformation or widely affine mapping if there exist two matrices A_1, A_2 ∈ C^{m×n} and a vector b ∈ C^m, such that

    A(x) = A_1x + A_2x^* + b,  ∀x ∈ C^n.


Note that this definition is a modification of the definition of a widely linear transformation / mapping that can be found in the literature (see e.g. [33, 44]). The difference is the additional translation vector b.

A widely affine transformation can be equivalently described by an affine transformation on the real and imaginary part level:

Theorem 3.23 A mapping

    A : C^n → C^m, x ↦ A(x)

is a widely affine transformation if and only if there exist a matrix A ∈ R^{2m×2n} and a vector b ∈ C^m, such that

    \overline{A(x)} = A x̄ + b̄,  ∀x ∈ C^n.

Proof. We have, using Lemma 3.5,

    \overline{A(x)} = \overline{A_1x} + \overline{A_2x^*} + b̄
                    = Ā_1 x̄ + Ā_2 \overline{x^*} + b̄
                    = Ā_1 x̄ + Ā_2 [I_n, 0; 0, −I_n] x̄ + b̄
                    = Ā_1 x̄ + Â_2 x̄ + b̄, (3.19)

so that A = Ā_1 + Â_2. Conversely, an arbitrary A ∈ R^{2m×2n} with blocks [W, X; Y, Z] can always be decomposed as A = Ā_1 + Â_2 by choosing Re{A_1} = (W + Z)/2, Im{A_1} = (Y − X)/2, Re{A_2} = (W − Z)/2, and Im{A_2} = (Y + X)/2, which proves the converse part.
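The equivalence of Theorem 3.23 can be confirmed numerically: applying the widely affine map on the complex level and the corresponding affine map on the real/imaginary-part level gives the same result. A NumPy sketch (ours):

```python
import numpy as np

rng = np.random.default_rng(7)

def bar_vec(x):
    return np.concatenate([x.real, x.imag])

def bar_mat(A):
    return np.block([[A.real, -A.imag], [A.imag, A.real]])

def hat_mat(A):
    return np.block([[A.real, A.imag], [A.imag, -A.real]])

m = n = 3
A1 = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
A2 = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
b = rng.standard_normal(m) + 1j * rng.standard_normal(m)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# widely affine map on the complex level ...
y = A1 @ x + A2 @ x.conj() + b

# ... equals an ordinary affine map on the real/imaginary-part level,
# with A = bar(A1) + hat(A2)
A = bar_mat(A1) + hat_mat(A2)
assert np.allclose(bar_vec(y), A @ bar_vec(x) + bar_vec(b))
```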


The rotationally invariant counterpart should follow the distribution law of the given random vector. For a given Gaussian random vector everything is clear: the associated random vector will be Gaussian as well and is fully specified by the mean vector, the covariance matrix, and rotational invariance. But what about a random vector that is not Gaussian?

Let us look again at the Gaussian case. On the real and imaginary part level we deal with two Gaussian random vectors with the same mean vector but with different covariance matrices (in the complex domain both random vectors have the same covariance matrix, but one random vector has a vanishing pseudo-covariance matrix whereas the other has a non-vanishing pseudo-covariance matrix). Recalling that affine transformations of Gaussian random vectors are again Gaussian (see e.g. [34]), we can view one vector as an affine transformed version of the other. It is a consequence of the theory of generalized Cholesky factors that it is possible, cf. also Theorem 3.27, to construct an affine transformation (on the real and imaginary part level, or a widely affine transformation on the complex level) that transforms one covariance matrix into the other. This observation, together with the fact that a (widely) affine transformation preserves the character of a random vector also in the non-Gaussian case, is used for the definition of a canonically associated rotationally invariant random vector to a given rotationally variant random vector:

Definition 3.24 Let y ∈ C^n denote a complex random vector with mean vector μ_y, covariance matrix C_y, and pseudo-covariance matrix P_y. A complex random vector x ∈ C^n is called a rotationally invariant analogon of y if its mean vector, covariance matrix, and pseudo-covariance matrix satisfy

    μ_x = μ_y,  C_x = C_y,  P_x = 0,

and there exists a widely affine transformation

    A : C^n → C^n, x ↦ A(x),

such that y and A(x) are identically distributed [21].

Since we do not want to exclude the existence of a rotationally invariant analogon in advance, the mapping A is allowed to be widely affine and not only affine. According to Corollary 3.3, an affine A would imply P_y = 0. Furthermore, in order to ensure μ_x = μ_y, the mapping A is allowed to be widely affine and not only widely linear.

We want to emphasize that a rotationally invariant analogon of a given complex random vector is not uniquely defined: suppose x denotes a rotationally invariant analogon of a complex random vector y. Then, for all θ ∈ [0, 2π[ (θ fixed), the random vectors x_θ = e^{jθ}x + (1 − e^{jθ})μ_x are rotationally invariant analogons of y as well. To see this, observe that x and x_θ have the same mean vectors, covariance matrices, and pseudo-covariance matrices (equal to the zero matrix), respectively, and that the widely affine transformations A_θ are obtained from the widely affine transformation A via A_θ(x) = A(e^{−jθ}x + (1 − e^{−jθ})μ_x).


In the following we will deal with the existence of a rotationally invariant analogon, i.e., with the existence of the widely affine transformation of Definition 3.24. First of all, observe that invertible affine transformations do not change any existence statements regarding rotationally invariant analogons:

Proposition 3.25 Suppose the random vector x ∈ C^n is a rotationally invariant analogon of the random vector y ∈ C^n. Then, for any invertible matrix M ∈ C^{n×n} and any vector c ∈ C^n (both deterministic), the random vector Mx + c is a rotationally invariant analogon of the random vector My + c.

Proof. We have, cf. Proposition 3.1,

    μ_{Mx+c} = Mμ_x + c = Mμ_y + c = μ_{My+c},
    C_{Mx+c} = MC_xM^H = MC_yM^H = C_{My+c},
    P_{Mx+c} = MP_xM^T = 0,

so that it remains to show the existence of the widely affine transformation of Definition 3.24. From Definition 3.24, there exists a widely affine transformation

    A : C^n → C^n, x ↦ A(x),

such that A(x) and y are identically distributed. Then the mapping

    A′ : C^n → C^n,

defined by

    A′(x) = MA(M^{-1}x − M^{-1}c) + c,  ∀x ∈ C^n,

is a widely affine transformation that satisfies

    A′(Mx + c) = MA(x) + c,

so that A′(Mx + c) and My + c are identically distributed.

Next, we will prove the existence of a rotationally invariant analogon for a simple special case:

Lemma 3.26 Let y ∈ C^n denote a zero-mean complex random vector with covariance matrix and pseudo-covariance matrix

    C_y = I_n and P_y = diag(λ_1, ..., λ_n) with 0 ≤ λ_i < 1,

respectively. Then there exists a rotationally invariant analogon x of y.


Proof. According to (3.5) we have

    C_ȳ = ½ diag(1 + λ_1, ..., 1 + λ_n, 1 − λ_1, ..., 1 − λ_n),

whereas a rotationally invariant analogon x would satisfy

    C_x̄ = ½ I_{2n}.

Let A ∈ R^{2n×2n} denote the (invertible) matrix

    A = diag(√(1 + λ_1), ..., √(1 + λ_n), √(1 − λ_1), ..., √(1 − λ_n)).

Then the random vector x defined (from y) via

    x̄ = A^{-1} ȳ

is a rotationally invariant analogon of y. To see this, consider the widely affine¹⁰ transformation

    A : C^n → C^n, x ↦ A(x),

defined by (apply Theorem 3.23)

    \overline{A(x)} = A x̄,  ∀x ∈ C^n.

Note that λ_i < 1 is mandatory for the validity of this proof, because otherwise the inverse A^{-1} would not exist. However, if we are given a rotationally invariant random vector x (with C_x̄ = ½ I_{2n}) and consider the random vector y that is obtained from x via the same widely affine transformation y = A(x) as in the proof, then x is a rotationally invariant analogon of y also in the case λ_i = 1 (i ∈ {1, ..., n}). This fact is very natural and is the reason why the widely affine transformation of Definition 3.24 is chosen to be a mapping from the rotationally invariant random vector to the rotationally variant random vector and not from the

10 It is even a widely linear transformation.


rotationally variant random vector to the rotationally invariant random vector. We want to mention that it is even possible to prove Lemma 3.26 allowing λ_i = 1, i = 1, ..., n. Since this special case is not relevant for our work, we omit the more sophisticated proof.
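The transformation used in the proof of Lemma 3.26 can be verified on the covariance level. The NumPy sketch below (our own, with arbitrarily chosen λ_i) applies the diagonal real-level transformation and checks that the result is indeed rotationally invariant:

```python
import numpy as np

# Lemma 3.26 setting: C_y = I, P_y = diag(lam) with 0 <= lam_i < 1
lam = np.array([0.2, 0.5, 0.9])       # example values (our choice)
n = lam.size

# composite covariance of y-bar, cf. (3.5)
Cybar = 0.5 * np.diag(np.concatenate([1 + lam, 1 - lam]))

# real-level transformation matrix A of the proof
A = np.diag(np.concatenate([np.sqrt(1 + lam), np.sqrt(1 - lam)]))

# x-bar = A^{-1} y-bar has composite covariance (1/2) I_{2n} ...
Cxbar = np.linalg.inv(A) @ Cybar @ np.linalg.inv(A).T
assert np.allclose(Cxbar, 0.5 * np.eye(2 * n))

# ... which corresponds to C_x = I_n and P_x = 0: x is rotationally invariant
Suu, Suv, Svu, Svv = Cxbar[:n, :n], Cxbar[:n, n:], Cxbar[n:, :n], Cxbar[n:, n:]
Cx = (Suu + Svv) + 1j * (Svu - Suv)
Px = (Suu - Svv) + 1j * (Suv + Svu)
assert np.allclose(Cx, np.eye(n)) and np.allclose(Px, 0)
```

The general construction of Theorem 3.27 reduces to this diagonal case after whitening with a generalized Cholesky factor and rotating with the Takagi factor.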

Finally, we present the following (general) existence theorem regarding rotationally invariant analogons:

Theorem 3.27 Suppose the complex random vector y ∈ C^n has a non-singular covariance matrix C_y and a pseudo-covariance matrix P_y. Let B_y be a generalized Cholesky factor of C_y and assume that the singular values of B_y^{-1}P_yB_y^{-T} are smaller than 1. Then there exists a rotationally invariant analogon x of y.

Proof. According to Proposition 3.25 we can assume (maintaining full generality) that y is a zero-mean random vector. Consider Takagi's factorization (Theorem 3.9) of the pseudo-covariance matrix of the random vector y′ = B_y^{-1}y, i.e.,

    Q_{y′} Σ_{y′} Q_{y′}^T = B_y^{-1} P_y B_y^{-T}.

The random vector y″ = Q_{y′}^{-1} y′ = Q_{y′}^{-1} B_y^{-1} y has a covariance matrix C_{y″} = I_n and a (diagonal) pseudo-covariance matrix P_{y″} = Σ_{y′} with entries within the interval [0, 1[. A combination of Lemma 3.26 with Proposition 3.25 (M = B_yQ_{y′} and c = 0) shows the existence of a rotationally invariant analogon x of y and, hence, concludes the proof.

With the introduced framework we are now in the position to formulate the promised theorem that quantifies the deviation of the differential entropy of a rotationally variant random vector from the differential entropy of its rotationally invariant analogon¹¹, independently of the actual probability distribution:

Theorem 3.28 Suppose the complex random vector y ∈ C^n has a non-singular covariance matrix C_y and a pseudo-covariance matrix P_y. Let B_y be a generalized Cholesky factor of C_y and let σ_i denote the singular values of B_y^{-1}P_yB_y^{-T}, which must be smaller than 1. Furthermore, let x denote a rotationally invariant analogon of y (which exists according to Theorem 3.27). Then the differential entropy of y satisfies

    h(y) = h(x) + ½ ∑_i log(1 − σ_i²).