
TECHNICAL REPORTS IN COMPUTER SCIENCE

Technische Universität Dortmund

Technische Universität Dortmund — Fakultät für Informatik
Otto-Hahn-Str. 14, 44227 Dortmund

A Companion to Mathematics for Computer Science:
Sets, Categories, Topologies, Measures

Ernst-Erich Doberkat
[email protected]

Number 847
April 2015

Contents

Preface

1 The Axiom of Choice and Some of Its Equivalents
  1.1 The Axiom of Choice
  1.2 Cantor's Enumeration of N × N
  1.3 Well-Ordered Sets
  1.4 Ordinal Numbers
  1.5 Zorn's Lemma and Tukey's Maximality Principle
    1.5.1 Compactness for Propositional Logic
    1.5.2 Extending Orders
    1.5.3 Bases in Vector Spaces
    1.5.4 Extending Linear Functionals
    1.5.5 Maximal Filters
    1.5.6 Ideals and Filters
    1.5.7 The Stone Representation Theorem
    1.5.8 Compactness and Alexander's Subbase Theorem
  1.6 Boolean σ-Algebras
    1.6.1 Construction Through Transfinite Induction
    1.6.2 Factoring Through σ-Ideals
    1.6.3 Measures
    1.6.4 µ-Measurable Sets
  1.7 Games
    1.7.1 Determined Games
    1.7.2 Proofs Through Games
  1.8 Wrapping it up
  1.9 Bibliographic Notes
  1.10 Exercises

2 Categories
  2.1 Basic Definitions
  2.2 Elementary Constructions
    2.2.1 Products and Coproducts
    2.2.2 Pullbacks and Pushouts
  2.3 Functors and Natural Transformations
    2.3.1 Functors
    2.3.2 Natural Transformations
    2.3.3 Limits and Colimits
  2.4 Monads and Kleisli Triples
    2.4.1 Kleisli Triples
    2.4.2 Monads
    2.4.3 Monads in Haskell
  2.5 Adjunctions and Algebras
    2.5.1 Adjunctions
    2.5.2 Eilenberg-Moore Algebras
  2.6 Coalgebras
    2.6.1 Bisimulations
    2.6.2 Congruences
  2.7 Modal Logics
    2.7.1 Frames and Models
    2.7.2 The Lindenbaum Construction
    2.7.3 Coalgebraic Logics
  2.8 Bibliographic Notes
  2.9 Exercises

3 Topological Spaces
  3.1 Defining Topologies
    3.1.1 Continuous Functions
    3.1.2 Neighborhood Filters
  3.2 Filters and Convergence
  3.3 Separation Properties
  3.4 Local Compactness and Compactification
  3.5 Pseudometric and Metric Spaces
    3.5.1 Completeness
    3.5.2 Baire's Theorem and a Banach-Mazur Game
  3.6 A Gallery of Spaces and Techniques
    3.6.1 Gödel's Completeness Theorem
    3.6.2 Topological Systems or: Topology Via Logic
    3.6.3 The Stone-Weierstraß Theorem
    3.6.4 Uniform Spaces
  3.7 Bibliographic Notes
  3.8 Exercises

4 Measures for Probabilistic Systems
  4.1 Measurable Sets and Functions
    4.1.1 Measurable Sets
    4.1.2 A σ-Algebra On Spaces Of Measures
    4.1.3 Case Study: Stochastic Effectivity Functions
    4.1.4 The Alexandrov Topology On Spaces of Measures
  4.2 Real-Valued Functions
    4.2.1 Essentially Bounded Functions
    4.2.2 Convergence almost everywhere and in measure
  4.3 Countably Generated σ-Algebras
    4.3.1 Borel Sets in Polish and Analytic Spaces
    4.3.2 Manipulating Polish Topologies
  4.4 Analytic Sets and Spaces
    4.4.1 Souslin's Separation Theorem
    4.4.2 Smooth Equivalence Relations
  4.5 The Souslin Operation
  4.6 Universally Measurable Sets
    4.6.1 Lubin's Extension through von Neumann's Selectors
    4.6.2 Completing a Transition Kernel
  4.7 Measurable Selections
  4.8 Integration
    4.8.1 From Measure to Integral
    4.8.2 The Daniell Integral and Riesz's Representation Theorem
  4.9 Product Measures
    4.9.1 Fubini's Theorem
    4.9.2 Infinite Products and Projective Limits
    4.9.3 Case Study: Continuous Time Stochastic Logic
    4.9.4 Case Study: A Stochastic Interpretation of Game Logic
  4.10 The Weak Topology
    4.10.1 The Hutchinson Metric
    4.10.2 Case Study: Eilenberg-Moore Algebras for the Giry Monad
    4.10.3 Case Study: Bisimulation
    4.10.4 Case Study: Quotients for Stochastic Relations
  4.11 Lp-Spaces
    4.11.1 A Spoonful of Hilbert Space Theory
    4.11.2 The Lp-Spaces are Banach Spaces
    4.11.3 The Lebesgue-Radon-Nikodym Theorem
    4.11.4 Continuous Linear Functionals on Lp
    4.11.5 Disintegration
  4.12 Bibliographic Notes
  4.13 Exercises

List of Examples

References

Index


Preface

The idea to write this book came to me when, after having taught an undergraduate course on Concrete Mathematics using the wonderful eponymous book [GKP89], I wanted to do a graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time to laying the foundations from sets, measures and topology, and that I could not find an adequate textbook to recommend to my students. This contrasts significantly with the situation in other areas, such as the analysis of algorithms, where many very fine textbooks are available. Consequently I had to dig through the mathematical literature, taking pieces from here and there, in effect trying to nail together a firm albeit makeshift mathematical scaffold the students could stand on securely.

Looking at the research literature in this area and in related fields, one also finds a lack of quotable resources. Each author has to construct her or his own foundations in order to get going, wasting considerable effort to prove the same lemmata over and over again.

So the plan for this book grew. I decided to focus on sets, topologies, categories and measures. Let me tell you why.

Sets and the Axiom of Choice. Sets are a universal tool for computer scientists, the tool which has been imported as a lingua franca from mathematics. When surveying the computer science literature, we see that sets and the corresponding constructs like maps, power sets, orders etc. are being used freely, but there is usually no concern regarding the axiomatic basis of these objects — sets are being used, albeit in a fairly naive way. This should not surprise anybody, because they are just tools and most of the time not the objects of consideration themselves. Fairly early in the education of the computer scientist, however, she or he encounters the phenomenon of recursion, be it as a recursive function, be it as a recursive definition. And here the question arises immediately why the corresponding constructs work, specifically, how one can be sure that a particular recursive program actually terminates. The same question, probably a little bit more focused, appears in techniques related to term rewriting. Here one inquires whether a particular chain of replacements will actually lead to a result in a finite amount of time. People in term rewriting have found a way of writing this down, namely a termination condition which is closely related to some well-ordering. This means that there are no infinitely long chains, which is of course a very similar condition to the one encountered when talking about the termination of a recursive procedure: here one does not want to have infinitely long chains of procedure or method calls. This suggests structural similarities between the termination of a recursive method invocation and the rewriting of a term.


When investigating the background against which all this happens, we¹ find that we need to look at well-orderings. These are orderings which forbid the existence of infinitely long decreasing chains. Do well-orderings always exist? This question is of course fairly easy to answer when we talk about finite scenarios, but sometimes it is mandatory to consider infinite objects as well. The world may be finite, but our models of the world are not always. Hence the question arises whether we can take an arbitrary set and construct a well-ordering for it. As it turns out, this question is very closely connected with another question, which at first glance does not really look related at all: we are given a collection of non-empty sets; are we able to select from each set exactly one element? One of the interesting points which indicates that things are probably a little bit more complicated than they look is the observation that the possibility of well-ordering arbitrary sets is equivalent to this question of selection, which came to be known as the Axiom of Choice. It turned out during the discussion in mathematics that there is really a whole and very full bag of other properties which are equivalent to this axiom. We will see that the Axiom of Choice is equivalent to some well-known proof principles like Zorn's Lemma or Tukey's Maximality Principle. Because this discussion relating the Axiom of Choice and similar constructions has been raging in mathematics for more than one century now, we cannot hope to be able to even completely list all those things which we have to leave out. Nevertheless we try to touch upon some topics which appear to be important for developing mathematical structures within computer science. We can even show that some results do not hold if another path is pursued and the Axiom of Choice is replaced by another axiom; this refers to a game theoretic scenario, which is of course of interest to computer scientists as well.

¹ For the usage of the first person plural in this book, let me quote William Goldbloom Bloch. He writes in his enjoyable book on Borges' mathematical ideas in a similar situation: "This should not be construed as a 'royal we.' It has been a construct of the community of mathematicians for centuries and it traditionally signifies two ideas: that 'we' are all in consultation with each other through space and time, making use of each other's insights and ideas to advance the ongoing human project of mathematics, and that 'we' — the author and reader — are together following the sequences of logical ideas that lead to inexorable, and sometimes poetic, conclusions." [Blo08, p. 19]

Because the discussion on the Axiom of Choice touches upon so many areas in mathematics, it gives the opportunity to pay a visit to some of them. In this sense the Axiom of Choice is a peg on which we hang our walking stick.

Categories. Many areas of mathematics show surprising structural similarities, which suggests that it might be interesting and helpful to focus on an abstract, unifying view of their concepts. This abstract view looks at the mathematical objects from the outside and studies the relationships between them, for example groups (as objects) and homomorphisms (as an indicator of their relationship), or topological spaces together with continuous maps, or ordered sets with monotone maps, or … It leads to the general notion of a category. A category is based on a class of objects together with morphisms between any two objects, which can be composed; composition follows some laws which are considered evident and natural.

This is an approach which has considerable appeal to a software engineer as well. In software engineering, the implementation details of a software system are usually not particularly important from an architectural point of view; they are encapsulated in a component. In contrast, the relationship of components to each other is of interest, because this knowledge is necessary for composing a system from its components. Roughly speaking, the architecture of a software system is characterized both through its components and their interaction, the static part of which can be described through what we may perceive as morphisms.

This has been recognized fairly early in the software architecture community, witnessed by the April 1995 issue of the IEEE Transactions on Software Engineering. It was devoted to software architecture and demonstrated that formalisms from category theory are very helpful for clarifying structures when discussing architectures. So the language of categories offers some attractions to software engineers. We will also see that the tool set of modal logics, another area which is important to software construction, profits substantially from constructions which are firmly grounded in categories.

We are going to discuss categories here and to introduce the reader to the basic constructions. The world of categories is too rich to be captured in these few pages, so we have made an attempt to provide the reader with some basic proficiency in categories, helping her or him to get a grasp of the basic techniques. This modest goal is attained by blending the abstract mathematical development with a plethora of examples.

Topological Spaces. A topology formalizes the notion of an open set; call a set open iff each of its members has a little room like a breathing space around it. This gives immediately a hint at the structure of the collection of open sets — they should be closed under finite intersections, but under arbitrary unions, yielding the basis for a calculus of observable properties. This approach puts its emphasis subtly away from the classic approach, e.g., in mathematical analysis or probability theory, by stressing different properties of a space. The traditional approach, for example, stresses separation properties such as being able to separate two distinct points through an open set. Such a strong emphasis is not necessarily observed in the computationally oriented use of topologies, where, just to give one example, pseudometrics for measuring the conceptual distance between objects are important for finding an approximation between Markov transition systems.

We give a brief introduction to some of the main properties of topological spaces. The objective is to provide the tools and methods offered by set-theoretic topology to an application oriented reader, thus we introduce the very basic notions of topology and hint at applications of these tools. Some connections to logic and set theory are indicated. Compactness is made available very early, thus compact spaces serve occasionally as an exercise ground before compactness with its ramifications is discussed in depth. Continuity is an important topic, and so are the basic constructions like products or quotients which are enabled by it. Since some interesting and important topological constructions are tied to filters, we study filters and convergence, comparing in examples the sometimes more easily handled nets to the occasionally more cumbersome filters. Talking about convergence, separation properties suggest themselves. It happens so often that one works with a powerful concept, but that this concept requires assumptions which are too strong, hence one has to weaken it in a sensible way. This is demonstrated in the transition from compactness to local compactness; we discuss locally compact spaces, and we give an example of a compactification. Quantitative aspects enter when one measures openness through a pseudometric; here many concepts are seen in a new, sharper light, and in particular the problem of completeness comes up. Much more is possible on complete spaces, and, even better, if a space is not complete, then you can complete it. Complete spaces have some very special properties, for example that the intersection of countably many open dense sets is dense again. This is Baire's Theorem; we show through a Banach-Mazur game played on a topological space that being of first category can be determined through one of the players having a winning strategy.

This completes the round trip of basic properties of topological spaces. We then present a small gallery in which topology is in action. The reason for singling out some topics is that we want to demonstrate the techniques developed with topological spaces in some interesting applications. For example, Gödel's Completeness Theorem for (countable) first order logic has been proved by Rasiowa and Sikorski through a combination of Baire's Theorem and Stone's topological representation of Boolean algebras. This topic is discussed. The calculus of observations, which is mentioned above, leads to the notion of topological systems. This hints at an interplay of topology and order, since a topology is after all a complete Heyting algebra in the partial order provided by inclusion. Another important topic is the approximation of continuous functions by a given class of functions, like the polynomials on a closed interval, leading quickly to the Stone-Weierstraß Theorem on a compact topological space, a topic with a rich history. Finally, the relationship of pseudometric spaces to general topological spaces is reflected upon again; we introduce uniform spaces as an ample class of spaces which is more general than pseudometric spaces, but less general than their topological cousins. Here we find concepts like completeness or uniform continuity, which are formulated for metric spaces, but which cannot be realized in general topological ones.

Measures for Probabilistic Systems. Markov transition systems are based on transition probabilities on a measurable space. This is a generalization of discrete spaces, where certain sets are declared to be measurable. So, in contrast to assuming that we know the probability for the transition between two states, we have to model the probability of a transition going from one state to a set of states: point-to-point probabilities are no longer available due to working in a comparatively large space. Measurable spaces are the domains of these probabilities. This approach has the advantage of being more general than finite or countable spaces, but now one deals with a fairly involved mathematical structure.

We start off with a discussion of σ-algebras, which have already been met in the chapter on sets, and we look at the structure of σ-algebras, in particular at their generators; it turns out that the underlying space has something to say about it. In particular we will deal with Polish spaces and their brethren. Some constructions in this area are studied; they have immediate applications in logic and in Markov transition systems, in which measures are vital. We show also that we can construct measurable selections, which we use for an investigation into the structure of quotients in the Kleisli monad, providing an interesting and fruitful example for the interplay of arguments from measure theory and categories. This interplay is stressed also for the investigation of stochastic effectivity functions, which leads to an interpretation of game logics.

After having laid the groundwork we construct the integral of a measurable function through an approximation process, very much in the tradition of the Riemann integral, but with a larger scope. We also go the other way: given an integral, we construct a measure from it. This is the elegant way P. J. Daniell proposed for constructing measures, and it can be brought to fruition in this context for a fairly simple proof of the Riesz Representation Theorem on compact metric spaces.


Having all these tools at our disposal, we look at product measures, which can be introduced now through a kind of line sweeping — if you want to measure an area in the plane, you measure the line length as you sweep over the area; this produces a function of the abscissa, which then yields the area through integration. One of the main tools here is Fubini's Theorem. Applications include a discussion of projective systems. A case study shows that projective systems arise naturally in the study of continuous time stochastic logics.

Finally, we take up a classic: Lp-spaces. We start from Hilbert spaces, apply the representation of linear functionals on L2 to obtain the Radon-Nikodym Theorem through von Neumann's ingenious proof, and derive from it the representation of the dual spaces.

Because we are driven by applications to Markov transition systems and similar objects, we do not strive for the most general approach to measure and integral. We usually formulate the results for finite or σ-finite measures, leaving the more general cases outside our focus. This means also that we do not deal with complex measures; we show, however, in which way one could start to deal with complex measures when the occasion arises.

Things left out. So many things had to be left out. This is an incomplete list, from which many things had to be left out as well. For example, I do not discuss ill-founded sets in the chapter on set theory, and I cannot take even a teeny tiny step into forcing or infinite combinatorics. I do not cover final coalgebras or coinduction in the chapter on categories, where one will also miss an extensive discussion of limits and colimits. It would have been nice to look into hyperspaces in the chapter on topologies, or to discuss topological groups with their Haar measure, let alone provide a glimpse at topological vector spaces. Talking about measures, martingales are missing, and the connections to topological measure theory are looked at through the lens of Polish or analytic spaces.

But, alas, many a choice had to be made, and since I am a confessed Westphalian, I quote this proverb:

Wat dem een sin Uhl, is dem anneren sin Nachtigal.
(What is one man's owl is the other man's nightingale.)

So I tried to incorporate topics which seem useful to me.

Organization. Each chapter derives its content in the usual strict mathematical way, with proofs and all that. It certainly belongs to the education of a computer scientist of the theoretical variety to carry out proofs, and not to rely on good faith or on the well-known art of handwaving. The development is supported by many examples, some motivating, some provided because they are mathematically interesting, but most of them oriented towards applications in computer science. Larger examples are dubbed case studies. They are interesting from a modelling point of view as well as for their application of the mathematical techniques at hand. The same applies to the exercises, which are collected at the end of each chapter. Bibliographic notes usually give the source for some particular approach, a proof, or an idea which is pursued; they also give hints where further information may be found. It has not always been easy to attribute a development to a particular paper, book, or author, since folklore quickly spreads, and results and ideas are amended or otherwise modified, sometimes obscuring the true originators. The author therefore apologizes if results are not always attributed properly, or not at all.


How to read this book. This is a book with an intended audience which is somewhat advanced, hence it seems a bit out of place to give suggestions on how to read it. The advice to the reader is just to take what she or he needs, do the exercises, and, if this is not enough, look for further information in the literature. The author took some pains to provide an ample list of references. Good luck.

Thanks. Finally, I want to thank J. Bessai, B. Fuchssteiner, H. G. Kellerer, E. O. Omodeo, P. Panangaden, D. Pumplün, H. Sabadhikary & S. M. Srivastava, P. Sánchez Terraf, and F. Stetter. They helped in opening some doors — mathematically or otherwise — for me, made some insightful comments, or gave me a helping hand which, sometimes in the transitive closure, had some impact on this book. The Deutsche Forschungsgemeinschaft supported my research into algebraic and coalgebraic properties of stochastic relations for more than ten years; some results of this work could be used for this book. Stefan Dissmann and Alla Stankjawitschene, my former secretary, helped clear my path and took many administrative obstacles off my back.

The cooperation with Dr. Mario Aigner and Sonja Gasser of Springer-Verlag was constructive and helpful. I want to thank them all.

Above all I owe my thanks to Gudrun for all her love and understanding. I dedicate this book to our youngest granddaughter Nina Luise; she will probably like the idea of playing with symbols.

Bochum, April 2015
Ernst-Erich Doberkat


Chapter 1

The Axiom of Choice and Some of Its Equivalents

Sets are a universal tool for computer scientists, the tool which has been imported as a lingua franca from mathematics. Program development, for example, sometimes starts from a mathematical description of the things to be done, the specification, and the data structures — and, you guessed it, sets are the language in which these first designs are usually written down. There is even a programming language called SETL based on sets [SDDS86]. This language served as a prototyping tool, its development having been essentially motivated by the ambition to make the road from a formal description of an object to its representation through an executable program as short as possible, see [CFO89, COP01], and for practical issues [DF89].

In fact, it turned out that programming in what might be called executable set theory has the advantage of being able to experiment with the objects at hand, leading, for example, to the first implementation of the programming language Ada, an implementation which for quite some time was deemed nearly impossible. On the other hand it turned out that sets may be a nice feature to have in a programming language, but that they are probably not always the appropriate universal data structure for engineering program systems. This is witnessed by the fact that some languages, like for example Haskell [OGS09], have set-like constructs such as list comprehension, but do not implement sets fully; see the small illustration below. As the case may be, sets are important objects when arguing about programs. They constitute an important component of the tool kit which a serious computer scientist should have ready in her or his backpack.
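As an illustration (a sketch of mine, not the author's): Haskell's list comprehensions mimic set-builder notation, but the result is a list which, unlike a set, carries order and multiplicity.

    -- {〈a,b,c〉 | 1 ≤ a ≤ b ≤ c ≤ 20, a² + b² = c²} as a list comprehension;
    -- the constraint a ≤ b ≤ c rules out duplicates here, something a true
    -- set type would guarantee by itself.
    pythagoreanTriples :: [(Int, Int, Int)]
    pythagoreanTriples =
      [ (a, b, c) | c <- [1 .. 20], b <- [1 .. c], a <- [1 .. b]
                  , a * a + b * b == c * c ]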

When surveying the computer science literature, we see that sets and the corresponding constructs like maps, power sets, orders etc. are being used freely, but there is usually no concern regarding the axiomatic basis of these objects — sets are being used, albeit in a fairly naive way. This should surprise nobody, because they are just tools and most of the time not the objects of consideration themselves. A tool should be handy and come to the computer scientist's aid as soon as needed, but it really should not bring with it complications of its own. Fairly early in the education of the computer scientist, however, she or he encounters the phenomenon of recursion, be it as a recursive function, be it as a recursive definition. And here of course the question arises immediately why the corresponding constructs work, specifically, how one can be sure that a particular recursive program actually terminates. The same question, probably a little bit more focused, appears in techniques related to term rewriting. Here one inquires whether a particular chain of replacements will actually lead to a result in a finite amount of time. People in term rewriting have found a way of writing this down, namely a termination condition which is closely related to some well-ordering. This means that we do not have infinitely long chains, which is of course a very similar condition to the one encountered when talking about the termination of a recursive procedure: here we do not want to have infinitely long chains of procedure or method calls. This suggests structural similarities between the termination of a recursive method invocation and the rewriting of a term. If you think about it, mathematical induction enters this family of observations, the members of which show a considerable similarity.
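A standard example for this connection (my illustration, not from the text): the Ackermann function terminates because every recursive call strictly decreases its argument pair in the lexicographic well-ordering on N × N, even though the second argument may grow enormously.

    -- Each call descends in the lexicographic order on (m, n): either m
    -- decreases, or m stays fixed and n decreases. A well-ordering admits
    -- no infinite descending chain, hence the recursion terminates.
    ack :: Integer -> Integer -> Integer
    ack 0 n = n + 1
    ack m 0 = ack (m - 1) 1
    ack m n = ack (m - 1) (ack m (n - 1))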

When we investigate the background against which all this happens, we find that we need to look at well-orderings. These are orderings which forbid the existence of infinitely long decreasing chains. It turns out that the mathematical ideas expressed here are fairly closely connected to ordinal numbers. It is not difficult to construct a bridge from orderings and well-orders to the question whether it is actually possible to find a well-order for each and every set. The bridge a computer scientist might traverse is loosely described as follows: because we want to be able to deal with arbitrary objects, and because we want to run programs with these arbitrary objects, it should be possible to construct terminating recursive methods for those objects. But in order to do that, we should make sure that no infinite chains of method invocations may occur, which in turn poses the question whether or not we can impose an order on these objects that renders infinite chains impossible (admittedly somewhat indirectly, because the order is actually imposed by procedure calls). But here we are — we want to know whether such a construction is possible; mathematically this leads to the possibility of well-ordering each and every set.

This question is of course fairly easy to answer when we talk about finite scenarios, but sometimes it is mandatory to consider infinite objects as well. The world may be finite, but our models of the world are not always. Hence the question arises whether we can take an arbitrarily large set and construct a well-ordering for it. As it turns out, this question is very closely connected with another question, which at first glance does not really look similar at all: we are given a collection of non-empty sets; are we able to select from each set exactly one element? This question has vexed mathematicians for more than one century now. One of the interesting points, which indicates that things are probably a little bit more complicated than they look, is the observation that the possibility of well-ordering arbitrary sets is equivalent to this question of selection, which came to be known as the Axiom of Choice. It turned out during the discussion in mathematics that there is really a whole and very full bag of other properties which are equivalent to this axiom. We will see that the Axiom of Choice is equivalent to some well-known proof principles like Zorn's Lemma or Tukey's Maximality Principle. Because this discussion relating the Axiom of Choice and similar constructions has been raging in mathematics for more than one century now, we cannot hope to be able to even completely list all those things which we have to leave out. Nevertheless we try to touch upon some topics which appear to be important for developing mathematical structures within computer science.

Since the Axiom of Choice and its variants touch upon those topics in mathematics much in use in computer science, this gives us the opportunity to select some of these topics and discuss them, independently and in the light of the use of the Axiom of Choice. We discuss, for example, lattices, introduce ideals and filters, and pose the maximality question: is it always possible to extend a filter to a maximal filter? It turns out that the answer is in the affirmative, and this has some interesting applications to the structure theory of, for example, Boolean algebras. Because of this we are able to discuss one of the true classics in this area, namely Stone's Representation Theorem, which says that every Boolean algebra is isomorphic to an algebra of sets. Another interesting application of Zorn's Lemma is Alexander's Theorem, which shows that for establishing compactness of a topological space one may restrict one's attention to covers by subbase elements. Because we then have compactness at our disposal, we also establish compactness of the space of all prime ideals of a Boolean algebra. Quite apart from these questions, which are oriented towards order structures, we establish the Hahn-Banach Theorem, which shows that a dominated linear functional can be extended from a linear subspace to the entire space in a dominated way.

A particular class of Boolean algebras is closed even under countable infima and suprema; these algebras are called σ-algebras. Since these algebras are particularly interesting when it comes to probabilistic models in computer science, we treat σ-algebras in some detail, in particular with respect to measures and their extensions. The general situation in applications is sometimes that one has the generator of a Boolean σ-algebra and a set function which behaves decently on this generator, and one wants to extend this set function to the whole σ-algebra. This gives rise to a fairly rich and interesting construction, which in turn has some connections to the Axiom of Choice. The extension process extends the measure far beyond the Boolean σ-algebra generated by the family under consideration, and the question arises how far this extension really goes. This may be of interest, e.g., if one wants to measure some set, so that one has to know whether this set can be measured at all, hence whether it is actually in the domain of the extended measure. The Axiom of Choice helps in demonstrating that this is not always possible: it can be shown that there are sets which cannot be measured. This depends on a selection argument for the classes of an easily constructed equivalence relation.

This will be discussed. Then we turn to games — games as a model for human interaction, in which two players, Angel and Demon, play against each other. We describe how a game is played and what strategies are. In particular we say what constitutes winning strategies. This is done first in the context of infinite sequences of natural numbers. This model has the advantage of being fairly easy to grasp; it has the additional structural advantage that we can map many applications to this scenario.

Games become really interesting when we know that one of the participants actually has a chance to win. Hence we postulate that our games are of this kind, so that always either Angel or Demon has a strategy to win the game. Unfortunately it turns out that this postulate, called the Axiom of Determinacy, is in conflict with the Axiom of Choice. This is of course a fairly unpleasant situation, because both axioms appear to be reasonable statements. So we have to see what can be done about this. We show that if we assume the Axiom of Determinacy, we can actually demonstrate that each and every subset of the real line is measurable. This contradicts the observation we just related.

This discussion serves two purposes. The first one is that one sometimes wants to challenge the Axiom of Choice in favor of other postulates, which may turn out to have more advantages (in the context of games, the postulate that one of the players has a winning strategy, no matter how the game is constructed, certainly has some advantageous aspects). But the Axiom of Choice is, as we will see, quite a fundamental postulate, so one has to find a balance between both. This does look terribly complicated, but on the other hand it does not seem difficult to manage from a practical point of view — and computer scientists are by definition practical people! The second reason for introducing games and for elaborating on these results is to demonstrate that games can actually be used as tools for proofs. These tools are used quite extensively in some branches of mathematics, and it appears that they may be an attractive choice for computer scientists as well.

We usually work in what is called naive set theory, in which sets are used as a formal manner of speaking, without much thinking about it. Sets are just tools to express ideas formally.

When mathematicians and logicians like G. Frege, G. Cantor or B. Russell thought about the basic foundations of mathematics, they found a huge pile of unposed and unanswered questions about the basic building blocks of mathematics; e.g., the definition of a cardinal number was usually taken for granted, without a formal foundation — a foundation was even resisted or ridiculed¹.

¹ Frege's position, for example, was considered somewhat hare-brained in the polemic by J. K. Thomae, "Gedankenlose Denker, eine Ferienplauderei" (Thinkers without thoughts, a causerie for the vacations), Jahresber. Deut. Math.-Ver. 15, 1906, 434-438; see Thiel's treatise [Thi65].

The Axioms of ZFC. Nevertheless, at around the turn of the century there seems to have been some consensus among mathematicians that the following axioms are helpful for their work; taken together, they are called the Zermelo-Fraenkel system with Choice (ZFC) after E. Zermelo and A. A. Fraenkel.

We will discuss them briefly and informally now. Here they are.

Extensionality Two sets are equal iff they contain the same elements. This requires that sets exist, and that we know which elements are contained in them; usually these notions (set, element) are taken for granted.

Empty Set Axiom There is a set with no elements. This is of course the empty set, denotedby ∅.

Axiom of Pairs For any two sets, there exists a set whose elements are precisely these sets. From the extensionality axiom we conclude that this set is uniquely determined. Without the axiom of pairs it would be difficult to construct maps. Hence we can construct sets like {a, b} and singleton sets {a} (because the axiom does not talk about different elements, we can form the set {a, a}, which, by the axiom of extensionality, equals the set {a}). We can also define an ordered pair through 〈a, b〉 := {{a}, {a, b}}. From this encoding one checks, by a small case distinction on whether a = b, that 〈a, b〉 = 〈c, d〉 iff a = c and b = d — the characteristic property of ordered pairs.

Axiom of Separation Let ϕ be a statement of the formal language with a free variable z. For any set x there is a set containing all z in x for which ϕ(z) holds. This permits forming sets by describing the properties of their elements. Note the restriction "for any set x"; suppose we drop this and postulate "there is a set containing all z for which ϕ(z) holds". Let ϕ(z) be the statement z ∉ z; then we would have postulated the existence of the set a := {z | z ∉ z} (is a ∈ a?). Hence we have to be a bit more modest.

1Frege’s position, for example, was considered in the polemic by J. K. Thomae, “Gedankenlose Denker,eine Ferienplauderei” (Thinkers without a thought, a causerie for the vacations). Jahresber. Deut. Math.Ver.15, 1906, 434 - 438 as somewhat hare-brained, see Thiel’s treatise [Thi65]

Page 17: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

5

Power Set Axiom For any set x there exists a set consisting of all subsets of x. This set is called the power set of x and denoted by P(x). Of course, we have to define the notion of a subset before we can determine all subsets of a set x. A set u is called a subset of the set x (in symbols u ⊆ x) iff every element of u is an element of x.

Union Axiom For any set x there is a set which is the union of all elements of x. This set is denoted by ⋃x. If x contains only a handful of elements, like x = {a, b, c}, we write ⋃x as a ∪ b ∪ c. The notion of a union is not yet defined, although intuitively clear. We could rewrite this axiom a little by stating it as: given a set x, there exists a set y such that w ∈ y iff there exists a set a with a ∈ x and w ∈ a. The intersection of two sets a and b can then be defined through the axiom of separation with the predicate ϕ(z) := z ∈ a and z ∈ b, so that we obtain a ∩ b := {z ∈ a ∪ b | z ∈ a and z ∈ b}.

This is the first group of axioms, which are somewhat intuitive. It is possible to build from them many mathematical notions (like maps with their domains and ranges, or finite Cartesian products). But it turns out that they are not yet sufficient, so an extension is needed.

Axiom of Infinity There is an inductive set. This means that there exists a set x with the property that ∅ ∈ x and that y ∪ {y} ∈ x whenever y ∈ x. Apparently, this permits building infinite sets.

Axiom of Replacement Let ϕ be a formula with two arguments. If for every a there exists exactly one b such that ϕ(a, b) holds, then for each set x there exists a set y which contains exactly those elements b for which ϕ(a, b) holds for some a ∈ x. Intuitively, if the formula ϕ assigns to each a ∈ x exactly one element b such that ϕ(a, b) is true, then we can collect all these elements b in a set. Let ϕ be the formula with ϕ(x, y) iff x is a set and y = P(x); then there exists for a given family x of sets the set of all power sets P(a) with a ∈ x.

Axiom of Foundation Every non-empty set contains an ∈-minimal element. Sets contain other sets as elements, as we have seen, so there might be the danger that a situation like a ∈ b ∈ c ∈ a occurs, hence that there is an ∈-cycle. In some situations this might be desirable, but not in this very basic scenario, where we try to find firm ground to work on. A formal description of this axiom reads that for each non-empty set x there exists a set y such that y ∈ x and x ∩ y = ∅. We will meet a very similar property when discussing ordinal numbers in Section 1.4.

Now we have recorded some axioms which provide the foundation for our daily work, to be used without any qualms. They permit building up mathematical structures like relations, maps, injectivity, surjectivity, what have you. We will not do this here (it gets somewhat boring after a time if one is not after some special effect — then it may become awfully hard), but rather trust that these structures are available and familiar.

But there still is a catch: look at the argument in the following proposition, which constructs some sort of an inverse for a surjective map.

Proposition 1.0.1 There exists for each surjective map f : A → B a function g : B → A such that (f ∘ g)(b) = b for all b ∈ B.

Proof For each b ∈ B, the set f⁻¹[{b}] is not empty, because f is surjective. Thus we can pick for each b ∈ B an element g(b) ∈ f⁻¹[{b}]. Then g : B → A is a map, and f(g(b)) = b by construction. ⊣

(The symbol ⊣ marks the end of a proof.)

Where is the catch? The proof seems to be perfectly innocent and straightforward. We simply have a look at all the inverse images of elements of the image set B; all these inverse images are not empty, so we pick from each of these inverse images exactly one element and construct a map from this.

Well, the catch lies in picking an element from each member of this collection. The collection of axioms above says nowhere that this selection is permitted (now you might think that mathematicians find a sneaky way of permitting such a pick, through the back door, so to speak; trust me — they cannot!).
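Note that for finite, enumerable types the pick is unproblematic: a right inverse of a surjection can simply be computed by exhaustive search, as in this minimal Haskell sketch (mine, not the report's; rightInverse is a made-up name). The difficulty arises only for arbitrary collections of sets.

    -- Search for a preimage; surjectivity of f guarantees the search succeeds.
    -- No choice principle is involved: [minBound .. maxBound] is finite.
    rightInverse :: (Bounded a, Enum a, Eq b) => (a -> b) -> b -> a
    rightInverse f b = head [ a | a <- [minBound .. maxBound], f a == b ]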

Hence we need some additional device, and this is the Axiom of Choice. It will be discussed at length now; we take the opportunity to use this discussion as a kind of peg onto which we hang some other objects as well. The general approach will be that we discuss mathematical objects of interest, and at crucial points the discussion of (AC) and its equivalents will be continued (if you have ever listened to a Wagner opera, you will have encountered his leitmotifs).

1.1 The Axiom of Choice

The Axiom of Choice states that

(AC) Given a family F of non-empty subsets of some set X, there exists a function f : F → X such that f(F) ∈ F for all F ∈ F.

The function the existence of which is postulated by this axiom is called a choice function on F.

It is at this point not quite clear why mathematicians make such a fuss about (AC):

W. Sierpiński It is the great and ancient problem of existence that underlies the whole controversy about the axiom of choice.

P. Maddy The Axiom of Choice has easily the most tortured history of all set-theoretic axioms.

T. Jech There has been some controversy about the Axiom of Choice, indeed.

H. Herrlich AC, the axiom of choice, because of its non-constructive character, is the most controversial mathematical axiom, shunned by some, used indiscriminately by others.

(see [Her06]). In fact, let X = N, the set of natural numbers. If F is a set of non-empty subsets of N, a choice function is immediate — just let f(F) := min F. So why bother? We will see below that N is a special case. B. Russell gave an interesting illustration: suppose that you have an infinite set of pairs of shoes, and you are to select systematically one shoe from each pair. You can always take either the left or the right one. But now try the same with an infinite set of pairs of socks, where the left sock cannot be told from the right one. Then you have to have a choice function.
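The remark about N can be made concrete (again a sketch of mine, with a made-up name): representing a non-empty decidable subset of the naturals by its membership predicate, the choice function f(F) := min F is directly executable.

    -- Take the least element; termination is exactly the well-ordering of N.
    minChoice :: (Integer -> Bool) -> Integer
    minChoice p = head (filter p [0 ..])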

But we do not have to turn to socks in order to see that a choice function is helpful; we rather prove Proposition 1.0.1 again.

Proof (of Proposition 1.0.1) Define

F := {f⁻¹[{b}] | b ∈ B};

then F is a collection of non-empty subsets of A, since f is onto. By assumption there exists a choice function G : F → A on F. Put g(b) := G(f⁻¹[{b}]); then f(g(b)) = b. ⊣

So this is a pure, simple and direct application of (AC), making one wonder what further applications the existence of a choice function will find. We'll see.

1.2 Cantor’s Enumeration of N× N

We will deal in this section with the comparison of sets with respect to their size. We say that two sets A and B have the same cardinality iff there exists a bijection between them. This condition can sometimes be relaxed by saying that there exists an injective map f : A → B and an injective map g : B → A. Intuitively, A and B have the same size, since the image of each set is contained in the other one. So we would expect that there exists a bijection between A and B. This is what the famous Schröder-Bernstein Theorem says.

Theorem 1.2.1 (Schröder-Bernstein Theorem) Let f : A → B and g : B → A be injective maps. Then there exists a bijection h : A → B.

Proof Define recursively

A_0 := A \ g[B], A_{n+1} := g[f[A_n]], and B_n := f[A_n].

If a ∈ A with a ∉ A_0, there exists a unique b =: g*(a) such that a = g(b), because g is an injection. Now define the map h : A → B through

h(a) := f(a), if a ∈ ⋃_{n≥0} A_n; h(a) := g*(a), otherwise.

Assume that h(a) = h(a′). If a, a′ ∈ ⋃_{n≥0} A_n, we may conclude that a = a′, since f is one to one. If a ∈ A_n for some n and a′ ∉ ⋃_{n≥0} A_n, then h(a) = f(a) and h(a′) = g*(a′), hence a′ = g(h(a′)) = g(h(a)) = g(f(a)). This implies a′ ∈ A_{n+1}, contrary to our assumption. Hence h is an injection. If b ∈ ⋃_{n≥0} B_n, then b = f(a) = h(a) for some a ∈ ⋃_{n≥0} A_n. Now let b ∉ ⋃_{n≥0} B_n. We claim that g(b) ∉ A_n for any n ≥ 0. For n = 0 this is clear, since A_0 is disjoint from g[B]; and if g(b) ∈ A_n with n > 0, we know that g(b) = g(f(a)) for some a ∈ A_{n−1}, so b = f(a) ∈ f[A_{n−1}] = B_{n−1}, contrary to our assumption. Hence h(g(b)) = g*(g(b)) = b. Thus h is also onto. ⊣


Another proof will be suggested in Exercise 1.7 through a fixed point argument.

This is a first application of the Schröder-Bernstein Theorem.

Example 1.2.2 Let X = N. If there existed an injection P(N) → N, then the Schröder-Bernstein Theorem would imply that there exists a bijection f : N → P(N), because we have the injective map N ∋ x ↦ {x} ∈ P(N). Now look at A := {x ∈ N | x ∉ f(x)}. Then there exists a ∈ N with A = f(a). But a ∈ A iff a ∉ A, thus there cannot exist an injection P(N) → N.
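The heart of this example is Cantor's diagonal argument, which can be spelled out in Haskell (my sketch, not part of the text) with sets of naturals represented as membership predicates:

    -- Given any enumeration f of sets of naturals, diag f differs from the
    -- set f n at the point n, so f cannot be surjective.
    diag :: (Integer -> (Integer -> Bool)) -> (Integer -> Bool)
    diag f x = not (f x x)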

Call a set A countably infinite iff there exists a bijection A → N. By the Schröder-Bernstein Theorem 1.2.1 it then suffices to find an injective map A → N and an injective map N → A. A set is called countable iff it is either finite or countably infinite. Example 1.2.2 tells us that P(N) is not countable.

We will now have a closer look at countably infinite sets and show that the set of all finite sequences of natural numbers is countable; for simplicity, we work with N0 := N ∪ {0}.

We start with showing that there exists a bijection N0 × N0 → N0. Cantor's celebrated procedure for producing an enumeration of N0 × N0 runs through the pairs diagonal by diagonal; on an initial section it looks as follows:

〈0, 0〉  〈0, 1〉  〈0, 2〉  〈0, 3〉  . . .        0   1   3   6  . . .
〈1, 0〉  〈1, 1〉  〈1, 2〉  〈1, 3〉  . . .        2   4   7  11  . . .
〈2, 0〉  〈2, 1〉  〈2, 2〉  〈2, 3〉  . . .        5   8  12  17  . . .
〈3, 0〉  〈3, 1〉  〈3, 2〉  〈3, 3〉  . . .        9  13  18  24  . . .

Define the function

J(x, y) := C(x + y + 1, 2) + x,

where C(n, 2) = n(n − 1)/2 denotes the binomial coefficient; then an easy computation shows that this yields just the enumeration scheme of Cantor's procedure. We will have a closer look at J now; note that the function x ↦ C(x, 2) increases monotonically.

Proposition 1.2.3 J : N0 × N0 → N0 is a bijection.

Proof 1. J is injective. We show first that J(a, b) = J(x, y) implies a = x. Assume that x > a; then x can be written as x = a + r for some positive r, so

C(a + r + y + 1, 2) + r = C(a + b + 1, 2),

hence b > r + y, so that b can be written as b = r + y + s with some positive s. Abbreviating c := a + r + y + 1 we obtain

C(c, 2) + r = C(c + s, 2).

But because we have r < c, we get

C(c, 2) + r < C(c, 2) + c = C(c + 1, 2) ≤ C(c + s, 2).

This is a contradiction. Hence x ≤ a. Interchanging the roles of x and a, one obtains a ≤ x, so that x = a may be inferred.

Thus we obtain

C(a + y + 1, 2) = C(a + b + 1, 2).

This yields the quadratic equation

y² + (2a + 1) · y − (b² + (2a + 1) · b) = 0,

which has the solutions b and −(2a + b + 1). The latter is negative, so the only non-negative solution is y = b. Hence we have shown that J(a, b) = J(x, y) implies 〈a, b〉 = 〈x, y〉.

2. J is onto. Define Z := J[N0 × N0]; then 0 = J(0, 0) ∈ Z. Assume that n ∈ Z, so that n = J(x, y) for some 〈x, y〉 ∈ N0 × N0. We consider these cases:

y > 0 : n + 1 = J(x, y) + 1 = C(x + y + 1, 2) + x + 1 = J(x + 1, y − 1) ∈ Z.

y = 0 : n = J(x, 0) = C(x + 1, 2) + x = C(x + 2, 2) − 1, thus n + 1 = C(x + 2, 2) = J(0, x + 1) ∈ Z.

Thus we have shown that 0 ∈ Z, and that n ∈ Z implies n + 1 ∈ Z, from which we infer Z = N0. a
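The proof is entirely constructive, and J is easily computed. Here is a small sketch (illustrative names of my own) of J together with a brute-force inverse, which first recovers the diagonal d = x + y:

def J(x: int, y: int) -> int:
    s = x + y + 1
    return s * (s - 1) // 2 + x            # C(s, 2) + x

def J_inverse(n: int):
    d = 0                                   # largest d with C(d + 1, 2) <= n
    while (d + 1) * (d + 2) // 2 <= n:
        d += 1
    x = n - d * (d + 1) // 2
    return x, d - x

# The enumeration starts 0, 1, 2, ... without gaps, and J_inverse undoes J:
assert sorted(J(x, y) for x in range(10) for y in range(10 - x)) == list(range(55))
assert all(J_inverse(J(x, y)) == (x, y) for x in range(30) for y in range(30))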

This construction permits the construction of an enumeration for the set of all non-empty finite sequences of elements of N0. First we have a look at sequences of fixed length. For this, define inductively

t1(x) := x,
tk+1(x1, . . . , xk, xk+1) := J(tk(x1, . . . , xk), xk+1)

(x ∈ N0 and k ∈ N, 〈x1, . . . , xk+1〉 ∈ N0^{k+1}); the idea being that an enumeration of N0^k × N0 is reduced to an enumeration of N0 × N0, an enumeration of which in turn is known.

Proposition 1.2.4 The maps tk are bijections N0^k → N0.

Proof 1. The proof proceeds by induction on k. It is trivial for k = 1, since t1 is the identity. Now assume that we have established bijectivity for tk : N0^k → N0.

2. tk+1 is injective: Assume tk+1(x1, . . . , xk, xk+1) = tk+1(x′1, . . . , x′k, x′k+1); this means

J(tk(x1, . . . , xk), xk+1) = J(tk(x′1, . . . , x′k), x′k+1),

hence tk(x1, . . . , xk) = tk(x′1, . . . , x′k) and xk+1 = x′k+1 by Proposition 1.2.3. By induction hypothesis, 〈x1, . . . , xk〉 = 〈x′1, . . . , x′k〉.


3. tk+1 is onto: Given n ∈ N0, there exists 〈a, b〉 ∈ N0 × N0 with J(a, b) = n. Given a ∈ N0, there exists 〈x1, . . . , xk〉 ∈ N0^k with tk(x1, . . . , xk) = a by induction hypothesis, so

n = J(a, b) = J(tk(x1, . . . , xk), b) = tk+1(x1, . . . , xk, b). a

From this, we can manufacture a bijection

⋃k∈N N0^k → N0

in this way: Given a finite sequence v of natural numbers, we use its length, say, k, as one parameter of an enumeration of N × N; the other parameter for this enumeration is tk(v). This yields a bijection.

Proposition 1.2.5 There exists a bijection s : ⋃k∈N N0^k → N0.

Proof Define

s(x1, . . . , xk) := J(k − 1, tk(x1, . . . , xk))

for k ∈ N and 〈x1, . . . , xk〉 ∈ N0^k. Because J and all tk are injective, s is injective. Given n ∈ N0, we can find 〈a, b〉 ∈ N0 × N0 with J(a, b) = n, and by Proposition 1.2.4 we can find 〈x1, . . . , xa+1〉 ∈ N0^{a+1} with ta+1(x1, . . . , xa+1) = b, so that

n = J(a, b) = J(a, ta+1(x1, . . . , xa+1)) = s(x1, . . . , xa+1).

Hence s is also surjective. a

One wonders why we went through this somewhat elaborate construction. First, the construction is elegant in its simplicity, but there is another, more subtle reason. When tracing the arguments leading to Proposition 1.2.5 one sees that the argumentation is elementary; it does not require any set theoretic assumptions like (AC). But now look at this:

Proposition 1.2.6 Let {An | n ∈ N0} be a sequence of countably infinite sets. Then (AC) implies that ⋃n∈N0 An is countable.

Proof We assume for simplicity that the An are mutually disjoint. Given n ∈ N0, there exists an enumeration ψn : An → N0. (AC) permits us to fix for each n such an enumeration ψn; then define

ψ : ⋃n∈N0 An → N0,
x ↦ J(k, ψk(x)), if x ∈ Ak,

with J as the bijection defined in Proposition 1.2.3. a

This is somewhat puzzling at first; but note that the proof of Proposition 1.2.5 does not require a selection argument, because we are in a position to construct tk for all k ∈ N.

Having (AC), hence Proposition 1.2.6, at our disposal, one shows by induction that

N0^{k+1} = ⋃n∈N0 (N0^k × {n})

is countable for every k ∈ N. This establishes the countability of ⋃k∈N N0^k immediately. On the other hand it can be shown that Proposition 1.2.6 is not valid if (AC) is not assumed [KM76, p. 172] or [Her06, Section 3.1]. This is also true if (AC) is weakened somewhat to postulate the existence of a choice function for countable families of non-empty sets (which in our case would suffice). The proof of non-validity, however, is in either case far beyond our scope.

1.3 Well-Ordered Sets

A relation R on a set M is called an order relation iff it is reflexive (thus xRx holds for all x ∈ M), antisymmetric (this means that xRy and yRx implies x = y for all x, y ∈ M) and transitive (hence xRy and yRz implies xRz for all x, y, z ∈ M). The relation R is called linear iff one of the cases x = y, xRy or yRx applies for all x, y ∈ M, and it is called strict iff xRx is false for each x ∈ M. If R is strict and transitive, then it is called a strict order.

Let R be an order relation; then x ∈ M is called a lower bound for ∅ ≠ A ⊆ M iff xRz holds for all z ∈ A, and a smallest element for A iff it is both a lower bound for A and a member of A. Upper bounds and largest elements are defined similarly. An element y is called maximal iff there exists no element x ≠ y with yRx; minimal elements are defined similarly. A smallest upper bound for a set A ≠ ∅ is called the supremum of A and denoted by sup A; similarly, a largest lower bound for A is called the infimum of A and denoted by inf A. Neither infimum nor supremum of a non-empty set need to exist.

Example 1.3.1 Look at an ordered set drawn as a diagram with three levels: A at the top, B and C in the middle below A, and D, E and F at the bottom. Here A is the maximum, because every element is smaller than A; the minimal elements are D, E and F, but there is no minimum. The minimal elements cannot be compared to each other. ,

Example 1.3.2 Define a ≤d b iff a divides b for a, b ∈ N, thus a ≤d b iff there exists k ∈ N such that b = k · a. Let g be the greatest common divisor of a and b; then g = inf{a, b}, and if s is the least common multiple of a and b, then s = sup{a, b}. Here is why: One notes first that both g ≤d a and g ≤d b hold, because g is a common divisor of a and b. Let g′ be another common divisor of a and b; then one shows easily that g′ divides g, so that g′ ≤d g holds. Thus g is in fact the infimum of {a, b}. One argues similarly for the least common multiple of a and b. ,
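This characterization of inf and sup under divisibility can be checked mechanically for concrete numbers; the following snippet (an illustration only) does so with Python's gcd:

from math import gcd

def divides(a, b):
    return b % a == 0

a, b = 12, 18
g = gcd(a, b)                       # inf{a, b} with respect to divisibility
l = a * b // g                      # sup{a, b} with respect to divisibility
assert divides(g, a) and divides(g, b) and divides(a, l) and divides(b, l)
# every common divisor d of a and b divides g, so g is the greatest lower bound:
assert all(divides(d, g) for d in range(1, a + 1) if divides(d, a) and divides(d, b))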

Example 1.3.3 Order S := P (N) \ {N} by inclusion. Then N \ {k} is maximal in S for every k ∈ N. For, we obtain from the definition of S and its order that each element which contains N \ {k} properly would be outside the basic set S. The set A := {{n, n + 2} | n ∈ N} is unbounded in S: Assume that T is an upper bound for A; then n ∈ {n, n + 2} ⊆ T for each n ∈ N, so that T = N ∉ S. ,

Usually strict orders are written as < (or <M, if the underlying set is to be emphasized), and order relations as ≤ or ≤M, resp.


Let <M be a strict order on M and <N a strict order on N; then a map f : M → N is called increasing iff x <M y implies f(x) <N f(y). M and N are called similar iff there exists a bijection f : M → N such that x <M y is equivalent to f(x) <N f(y). An order isomorphism is a bijection which together with its inverse is increasing.

Definition 1.3.4 The strict linear order < on a set M is called a well-ordering on M iff each non-empty subset of M has a smallest element. M is then called well-ordered (under <).

These are simple examples of well-ordered sets.

Example 1.3.5 N (this shows the special role of N alluded to above), finite linearly ordered sets, and {1 − 1/n | n ∈ N} are well-ordered. ,

Not every ordered set, however, is well-ordered, as witnessed by these simple examples.

Example 1.3.6 Z is not well-ordered, because it does not have a minimal element. R is neither, because, e.g., the open interval ]0, 1[ does not have a smallest element. The powerset of N, denoted by P (N), is not well-ordered by inclusion because a well-order is linear, and {1, 2} and {3, 4} are not comparable. Finally, {1 + 1/n | n ∈ N} is not well-ordered, because the set does not contain a smallest element. ,

Example 1.3.7 A reduction system R = (A, →) is a set A together with a relation → ⊆ A × A; the intent is to have a set of rewrite rules, say, 〈a, b〉 ∈ →, such that a may be replaced by b in words over an alphabet which includes the carrier A of R. Usually, one writes a → b iff 〈a, b〉 ∈ →. Denote by →+ the transitive closure of the relation →, i.e., x →+ y iff there exists a chain x = a0 → . . . → ak = y with k ≥ 1.

We call R terminating iff there are no infinite chains a0 → a1 → . . . → ak → . . . . The following proof rule is associated with a reduction system:

         ∀x ∈ A : ((∀y ∈ A : x →+ y ⇒ P(y)) ⇒ P(x))
(WFI)    ────────────────────────────────────────────
                      ∀x ∈ A : P(x)

Here P is a predicate on A, so that P(x) is true iff x has property P. The rule (WFI) says that if we can conclude for every x that P(x) holds, provided the property holds for all elements reachable from x, then we may conclude that P holds for each element of A.

This rule is equivalent to termination. In fact:

• If → terminates, then (WFI) holds. Assume that (WFI) does not hold; then we find x0 ∈ A such that P(x0) does not hold, hence we can find some x1 with x0 →+ x1 such that P(x1) does not hold. For x1 we find in the same way some x2 with x1 →+ x2 for which P does not hold, etc. Hence we construct an infinite chain x0 →+ x1 →+ . . . of elements for which P does not hold. But this means that → does not terminate.

• If (WFI) holds, then → terminates. Take the predicate P(x) iff there is no infinite chain starting from x. If P(y) holds for every y with x →+ y, then no infinite chain starts from any element reachable from x, hence no infinite chain starts from x either, so P(x) holds. By (WFI) we conclude that P holds for every element, so no x is the starting point of an infinite chain; consequently → terminates.


Now let (A, →) be a terminating reduction system; then each non-empty subset B ⊆ A has a minimal element, because if this were not the case, we could construct an infinite descending chain. But (A, →) is usually not well-ordered, because →+ need not be a linear order. ,
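For a finite carrier A, termination can actually be decided: there is no infinite chain iff no cycle can be built from →. The following sketch (an illustration only, assuming a finite reduction system given as a set of edge pairs) checks this by depth-first search:

def terminates(A, edges):
    """edges: set of pairs (a, b) with a, b in A, meaning a -> b."""
    succ = {a: [] for a in A}
    for a, b in edges:
        succ[a].append(b)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {a: WHITE for a in A}

    def has_cycle(a):
        colour[a] = GREY
        for b in succ[a]:
            if colour[b] == GREY or (colour[b] == WHITE and has_cycle(b)):
                return True
        colour[a] = BLACK
        return False

    return not any(colour[a] == WHITE and has_cycle(a) for a in A)

assert terminates({1, 2, 3}, {(1, 2), (2, 3)})
assert not terminates({1, 2, 3}, {(1, 2), (2, 1)})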

There are some helpful ways of producing new well-orders from old ones.

Example 1.3.8 Let M and N be well-ordered and disjoint sets; define on M ∪ N

a < b iff a <M b, if a, b ∈ M,
          a <N b, if a, b ∈ N,
          a ∈ M and b ∈ N, otherwise.

Then M ∪ N is well-ordered; this well-ordered set is usually denoted by M + N. Note that M + N is not the same as N + M.

If the sets are not disjoint, make a copy of each upon setting M′ := M × {1}, N′ := N × {2}, and order these sets through, e.g., 〈m, 1〉 <M′ 〈m′, 1〉 iff m <M m′. ,

Example 1.3.9 Define on the Cartesian product M × N

〈m, n〉 < 〈m′, n′〉 iff m <M m′, or m = m′ and n <N n′.

This lexicographic order yields a well-ordering again. ,

Example 1.3.10 Let Z be well-ordered, and assume that for each z ∈ Z the set Mz is well-ordered so that the sets (Mz)z∈Z are mutually disjoint. Then ⋃z∈Z Mz is well-ordered (compare elements within one Mz as before, and elements of different sets according to their indices in Z). ,

Having a look at (AC) again, we see that it holds in a well-ordered set:

Proposition 1.3.11 Let F be a family of non-empty subsets of the well-ordered set M. Then there exists a choice function on F.

Proof For each F ∈ F there exists a smallest element mF ∈ F. Put f(F) := mF; then f : F → M is a choice function on F. a

Thus, if we can find a well-order on a set, then we know that we can find choice functions. We first formulate this property.

(WO) Each set can be well-ordered.

We will refer to this property as (WO). Hence we can rephrase Proposition 1.3.11 as

(WO) =⇒ (AC)

Establishing the converse will turn out to be more interesting, since it will require the introduction of a new class of objects, viz., the ordinal numbers. This is what we will undertake in the next section.

We start with some preparations which deal with properties of well-orders.


Lemma 1.3.12 Let M be well-ordered and f : M → M be an increasing map. Then x ≤M f(x) (thus x < f(x) or x = f(x)) holds for all x ∈ M.

Proof Suppose that the set Y := {y ∈ M | f(y) < y} is not empty; then it has a smallest element z. Since f(z) < z we obtain f(f(z)) < f(z) < z, because f is increasing. This contradicts the choice of z. a

Let M be well-ordered; then define for x ∈ M the initial segment O(x) (or OM(x)) for x as O(x) := {z ∈ M | z < x}.

We obtain as a consequence

Corollary 1.3.13 No well-ordered set is order isomorphic to an initial segment of itself.

Proof An isomorphism f : M → OM(x) for some x ∈ M would have f(x) < x, which contradicts Lemma 1.3.12. a

A surprising consequence of Lemma 1.3.12 is that there exists at most one isomorphism between well-ordered sets.

Corollary 1.3.14 Let A and B be well-ordered sets. If f : A → B and g : A → B are order isomorphisms, then f = g.

Proof Clearly both g−1 ◦ f and f−1 ◦ g are increasing, yielding x ≤ (g−1 ◦ f)(x) and x ≤ (f−1 ◦ g)(x) for each x ∈ A, which means g(x) ≤ f(x) and f(x) ≤ g(x) for each x ∈ A. a

This is an important property of well-ordered sets, akin to induction in the set of natural numbers. Accordingly it is called the principle of transfinite induction, sometimes also called Noetherian induction (after the eminent German mathematician Emmy Noether) or well-founded induction (after the virtually unknown Chinese mathematician Wel Fun Ded).

Theorem 1.3.15 Let M be well-ordered, B ⊆ M be a set which has for each x ∈ M the property that O(x) ⊆ B implies x ∈ B. Then B = M.

Proof Assume that M \ B ≠ ∅; then there exists a smallest element x in this set. Since x is minimal, all elements smaller than x are elements of B, hence O(x) ⊆ B. But this implies x ∈ B, a contradiction. a

Let us have a second look at a proof of Lemma 1.3.12, this time using the principle of transfinite induction. Put B := {z ∈ M | z ≤ f(z)}, and assume O(x) ⊆ B. If y ∈ B with y ≠ f(y), then y < f(y), and y < x yields f(y) < f(x), thus y < f(x); if y = f(y), then y < x yields y = f(y) < f(x) as well. Hence f(x) is larger than any element of O(x), thus f(x) ∈ M \ O(x). But x is the smallest element of the latter set, which implies x ≤ f(x), so x ∈ B. From Theorem 1.3.15 we see now that B = M.

We will show now that each set can be well-ordered. In order to do this, we construct prototypical well-orders and show that each set can be mapped bijectively onto such a prototype. This will then serve as the basis for the construction of a well-order for the set.

Carrying out this programme requires the prototype. This will be considered next.


1.4 Ordinal Numbers

Following von Neumann [KM76, § VII.9], ordinal numbers are defined as sets with these special properties.

Definition 1.4.1 A set α is called an ordinal number iff these conditions are satisfied:

(1) Every element of α is a set.

(2) If β ∈ α, then β ⊆ α.

(3) If β, γ ∈ α, then β = γ or β ∈ γ or γ ∈ β.

(4) If ∅ ≠ B ⊆ α, then there exists γ ∈ B with γ ∩ B = ∅.

Hence in order to show that a given set is an ordinal, we have to show that properties (1), (2), (3) and (4) hold. We will demonstrate this principle for some examples.

Example 1.4.2 Consider this definition of the somewhat natural numbers N0:

0 := ∅,
n + 1 := {0, . . . , n},
N0 := {0, 1, . . .}.

Then N0 is an ordinal number. Each element of N0 is a set by definition. Let β ∈ N0. If β = 0, then β = ∅ ⊆ N0; if β ≠ 0, then β = n = {0, . . . , n − 1} ⊆ N0. One argues similarly for property (3). Finally, let ∅ ≠ β ⊆ N0, and let γ be the smallest element of β. If δ ∈ γ ∩ β, then δ is both an element of β and smaller than γ, which is a contradiction. Hence γ ∩ β = ∅. ,

Example 1.4.3 Let α be an ordinal number; then β := α ∪ {α} is an ordinal. It is the smallest ordinal which is greater than α. Property (1) is evident, and so is property (2). Let γ, γ′ ∈ β with γ ≠ γ′, and assume that, say, γ = α; then γ′ ≠ α, consequently γ′ ∈ γ. If neither γ nor γ′ is equal to α, property (3) trivially holds. Assume finally that ∅ ≠ B ⊆ β. If B ∩ α ≠ ∅, property (4) for β follows from this property for α; if, however, B = {α}, observe that α ∈ B with α ∩ B = ∅. Hence this property holds for β as well. ,

Definition 1.4.4 Let α be an ordinal; then α ∪ {α} is called the successor of α and denoted by α + 1.

It is clear from this definition that no ordinal can be squeezed in between an ordinal α and its successor α + 1.
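The first few von Neumann numerals from Example 1.4.2 can be built quite literally as nested sets; the following sketch (an illustration only) does so with frozensets, using the successor operation of Definition 1.4.4, and checks that membership mirrors the order:

def successor(alpha):
    return alpha | {alpha}            # alpha + 1 = alpha ∪ {alpha}

numerals = [frozenset()]              # 0 = ∅
for _ in range(4):
    numerals.append(successor(numerals[-1]))

# membership acts as the strict order: m < n iff numerals[m] ∈ numerals[n]
assert all((numerals[m] in numerals[n]) == (m < n) for m in range(5) for n in range(5))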

Lemma 1.4.5 If M is a non-empty set of ordinals, then

1. ⋂M is an ordinal; it is the largest ordinal contained in all elements of M.

2. ⋃M is an ordinal; it is the smallest ordinal which contains all elements of M.

Proof We iterate over the defining properties of an ordinal number for ⋂M. Since every element γ of ⋂M is also an element of every α ∈ M, we may conclude that γ is a set, and that γ ⊆ ⋂M. If γ, δ ∈ ⋂M ⊆ α for each α ∈ M, we have either γ = δ, γ ∈ δ or δ ∈ γ. Finally, if ∅ ≠ B ⊆ ⋂M ⊆ α for each α ∈ M, we find η ∈ B such that η ∩ B = ∅. Thus ⋂M has all the properties of an ordinal number from Definition 1.4.1. It is clear that ⋂M is the largest ordinal contained in all elements of M.

The proof for ⋃M works along the same lines. a

Corollary 1.4.6 Given a non-empty set M of ordinals, there is always an ordinal which is strictly larger than all the elements of M.

Proof If α∗ := ⋃M ∈ M, then α∗ + 1 is the desired ordinal, otherwise α∗ is suitable. a

This has an interesting consequence.

Corollary 1.4.7 There is no set of all ordinals.

Proof If Z is the set of all ordinals, then Lemma 1.4.5 shows that α∗ := ⋃Z is an ordinal. But the successor α∗ + 1 of α∗ is an ordinal as well by Example 1.4.3, which, however, is not an element of Z. This is a contradiction. a

Definition 1.4.8 An ordinal λ is called a limit ordinal iff α < λ implies α + 1 < λ for all ordinals α.

Thus a limit ordinal is not reachable through the successor operation. This is a convenient characterization of limit ordinals.

Proposition 1.4.9 Let λ be an ordinal. Then

1. If λ is a limit ordinal, then ⋃λ = λ.

2. If ⋃λ = λ, then λ is a limit ordinal.

Proof 1. Assume first that λ is a limit ordinal. Let β ∈ ⋃λ; then β ∈ α for some α ∈ λ. Since λ is an ordinal, we conclude β ∈ α ⊆ λ, so ⋃λ ⊆ λ. On the other hand, if α ∈ λ, then α + 1 ∈ λ, since λ is a limit ordinal; thus α ∈ ⋃λ, so ⋃λ ⊇ λ. This proves part 1.

2. Let α < λ = ⋃λ; then α ∈ β for some β ∈ λ. Then either α + 1 ∈ β or α + 1 = β (the remaining case β ∈ α + 1 would give β ∈ α or β = α, contradicting α ∈ β); in either case α + 1 ∈ λ, since β ⊆ λ. Thus λ is a limit ordinal. This establishes part 2. a

Ordinals can be odd or even: A limit ordinal is said to be even; if the ordinal ζ can be written as ζ = ξ + 1 and ξ is even, then ζ is odd; if ξ is odd, ζ is even. This classification is sometimes helpful; some constructions involving ordinals depend on it, see for example Section 1.6.1 on page 50.

Several properties of ordinal numbers are established now; this is required for carrying out the programme sketched above. The first property states that the ∈-relation is not cyclic, which seems to be trivial. But since ordinal numbers have the dual face of being elements and subsets of the same set, we will need to establish this explicitly by showing that the properties of ordinals prevent this undesired behavior.

Lemma 1.4.10 If α is an ordinal number, then there does not exist a sequence β1, . . . , βk of sets with βk ∈ β1 ∈ . . . ∈ βk−1 ∈ βk ∈ α.


Proof Assume there exist such sets β1, . . . , βk, and put B := {β1, . . . , βk}. Now βk ∈ α implies βk ⊆ α, thus βk−1 ∈ α, hence βk−1 ⊆ α, so that β1, . . . , βk ∈ α; consequently ∅ ≠ B ⊆ α. But βi−1 ∈ βi ∩ B for 1 < i ≤ k and βk ∈ β1 ∩ B, so no element of B is disjoint from B, and property (4) in Definition 1.4.1 is violated for α. a

Lemma 1.4.11 If α is an ordinal, then each β ∈ α is an ordinal as well.

Proof 1. The properties of ordinal numbers from Definition 1.4.1 are inherited. This is immediate for properties (1), (3) and (4), so we have to take care of property (2).

2. Let γ ∈ β; we have to show that γ ⊆ β. So let η ∈ γ, and note that η ∈ γ ⊆ α and β ∈ α. By property (3) for α we have η = β (which would imply β ∈ γ ∈ β ∈ α, contradicting Lemma 1.4.10), or β ∈ η (which would yield β ∈ η ∈ γ ∈ β ∈ α, contradicting Lemma 1.4.10 again), or η ∈ β. Thus η ∈ β, so that property (2) also holds. a

Lemma 1.4.12 Let α and β be ordinals, then these properties are equivalent

1. α ∈ β.

2. α ⊆ β and α ≠ β.

Proof 1 ⇒ 2: We obtain α ⊆ β from α ∈ β and from property (2), and α ≠ β from Lemma 1.4.10, for otherwise we could conclude β ∈ β.

2 ⇒ 1: Because α is a proper subset of β, thus ∅ ≠ β \ α ⊆ β, we infer from property (4) for ordinals that we can find γ ∈ β \ α such that γ ∩ (β \ α) = ∅. We claim that γ = α.

"⊆": Since γ ∈ β we know that γ ⊆ β, and since γ ∩ (β \ α) = ∅, it follows that γ ⊆ α.

"⊇": We will show that the assumption α \ γ ≠ ∅ is contradictory. Because ∅ ≠ α \ γ ⊆ α we find η ∈ α \ γ with η ∩ (α \ γ) = ∅. Because η ∈ α \ γ ⊆ α ⊆ β we conclude η ∈ β. From property (3) we infer that the cases η = γ, η ∈ γ and γ ∈ η may occur. Look at these cases in turn:

• η = γ: This is impossible, because we would then have η ∈ α and η = γ ∈ β \ α.

• η ∈ γ: This is impossible because η ∈ α \ γ.

• γ ∈ η: Since η ∈ α, property (2) gives η ⊆ α, hence γ ∈ α, contradicting γ ∈ β \ α.

Thus we conclude that the assumption α \ γ ≠ ∅ leads us to a contradiction, from which the desired inclusion is inferred.

a

Consequently, the membership relation ∈ yields a total order on an ordinal number.

Lemma 1.4.13 If α and β are ordinals, then either α ⊆ β or β ⊆ α. Thus α ∈ β or β ∈ α for α ≠ β.

Proof Suppose α ≠ α ∩ β ≠ β; note that α ∩ β is an ordinal by Lemma 1.4.5. Then α ∩ β ∈ α and α ∩ β ∈ β by Lemma 1.4.12, hence α ∩ β ∈ α ∩ β, contradicting Lemma 1.4.10. a


Since we want to use the ordinals as prototypes for well-orders, we have to show that they constitute a well-order themselves; as an order relation, inclusion or, what amounts to the same, membership suggests itself.

Lemma 1.4.14 Every ordinal is well-ordered by the inclusion relation.

Proof Let α be an ordinal; we show first that α is linearly ordered by inclusion. Take β, γ ∈ α; then either β = γ, β ∈ γ or γ ∈ β. The last two conditions translate because of property (2) into β ⊆ γ or γ ⊆ β. Now let B be a non-empty subset of α; then we know from property (4) that there exists γ ∈ B with γ ∩ B = ∅. This is the smallest element of B. In fact, let η ∈ B with η ≠ γ; then either γ ∈ η or η ∈ γ. But η ∈ γ is impossible, since otherwise η ∈ γ ∩ B. So γ ∈ η, hence γ ⊆ η. a

We can describe this strict order even a bit more precisely.

Lemma 1.4.15 If α and β are distinct ordinals, then either α is an initial segment of β or β is an initial segment of α.

Proof Because α ≠ β, we have either α ⊆ β or β ⊆ α by Lemma 1.4.13. Assume that α ⊆ β holds, so that α ∈ β by Lemma 1.4.12. Every γ ∈ α precedes the element α in β, and conversely each η ∈ β with η < α, i.e., η ∈ α, is an element of α. Hence α is the initial segment Oβ(α) of β. It cannot be similar to β because of Corollary 1.3.13. a

Historically, ordinal numbers have been introduced as some sort of equivalence classes of well-ordered sets under order isomorphisms (note that some sort of equivalence classes is a cautionary expression, alluding to the fact that there is no such thing as a set of all sets). We show now that the present definition is not too far away from the traditional definition. Loosely speaking, the ordinals defined here may serve as representatives for those classes of well-ordered sets. We want to establish

Theorem 1.4.16 If M is a well-ordered set, then there exists an ordinal α such that M and α are isomorphic.

The proof will be done in several steps. Call two well-ordered sets A and B similar (A ∼ B) iff there exists an isomorphism between them. Recall that isomorphisms preserve order relations in both directions.

Define the set H of all elements of M whose initial segment is similar to some ordinal number, i.e.,

H := {z ∈ M | αz ∼ O(z) for some ordinal αz}.

In view of Lemma 1.4.15, if αz ∼ O(z) and α′z ∼ O(z), then αz = α′z, so the ordinal αz is uniquely determined, if it exists. We first show by induction that H = M. For this, assume that O(z) ⊆ H; then we have to show that z ∈ H, so we have to find an ordinal αz with αz ∼ O(z). In fact, the natural choice is

αz := {αx | x < z},

so we show that this is an ordinal number by going through the properties according to Definition 1.4.1. Since each element of αz is an ordinal, property (1) is satisfied. Let αx ∈ αz; then x < z, and if η ∈ αx, then η is an ordinal number, hence an initial segment of αx by Lemma 1.4.15, thus η ∼ O(t) for some t. Hence t < x < z, so that αt = η ∈ αz. Thus property (2) is satisfied. Property (3) follows from Lemma 1.4.13: Take αx, αy ∈ αz; then αx and αy are ordinals. Assume that they are different; then either αx ⊆ αy or αy ⊆ αx, so that by Lemma 1.4.12 αx ∈ αy or αy ∈ αx follows. Finally, let ∅ ≠ B ⊆ αz. Then B corresponds to a non-empty subset of M with a smallest element y. Then αy ∈ αz, because y < z, and we claim that αy ∩ B = ∅. In fact, if η ∈ αy ∩ B, then η = αt for some t < y with αt ∈ B, so that y would not be minimal. This shows that property (4) is satisfied. Hence αz is an ordinal. In order to establish that z ∈ H we have to show that αz is similar to O(z). But this follows from the construction.

Consequently we know that the initial segment of each element of M is similar to an ordinal.

We are now in a position to complete the proof.

Proof (of Theorem 1.4.16) Let

α := {αz | αz ∼ O(z) for some z ∈ M};

then one shows with exactly the arguments from above that α is an ordinal. Moreover, α is similar to M: Consider the map z ↦ αz, provided αz ∼ O(z). It is clear that it is one to one, since x < y implies αx ∈ αy, for O(x) is a (proper) initial segment of O(y). It is also onto, because given η ∈ α, we find z ∈ M with η ∼ O(z), so that z ↦ η. a

Let us have a brief look at the countable ordinals. They will be used later on, in Section 1.7 for the construction of a game, and in Section 1.6.1 for the construction of a σ-algebra.

Proposition 1.4.17 Let ω1 := {α | α is a countable ordinal}. Then ω1 is an ordinal, the first uncountable ordinal.

Proof Exercise 1.11; the proof will have to look at the properties (1) through (4). a

Denote by W(α) := {ζ | ζ < α} the set of all ordinals smaller than α, hence the initial segment of ordinals determined by α. Given an arbitrary non-empty set S, a map f : W(α) → S is called an α-sequence over S and sometimes denoted by 〈aζ | ζ < α〉, where aζ := f(ζ). The next very general statement says that these sequences can be defined by transfinite recursion in the following manner.

Theorem 1.4.18 Let S be a non-empty set and α an ordinal, and let Φ be the set of all ζ-sequences over S with ζ ≤ α. Moreover, assume that h : Φ → S is a map assigning to each such sequence an element of S. Then there exists a uniquely determined (α + 1)-sequence 〈aζ | ζ ≤ α〉 such that

aζ = h(〈aξ | ξ < ζ〉)

for all ζ ≤ α.


Proof 0. The proof works by induction. We show first that the sequence is uniquely determined, and then we define this uniquely determined sequence inductively.

1. We show uniqueness first. Assume that we have two (α + 1)-sequences 〈aζ | ζ ≤ α〉 and 〈bζ | ζ ≤ α〉 such that

aζ = h(〈aη | η < ζ〉),
bζ = h(〈bη | η < ζ〉)

for all ζ ≤ α. Then we show by induction on ζ that aζ = bζ. The induction begins at the smallest ordinal ζ = ∅, so that a∅ = h(∅) = b∅, and the induction step is trivial.

2. The sequence 〈aζ | ζ ≤ α〉 is defined now by induction on ζ. If 〈aη | η ≤ ζ〉 is defined, then define

aζ+1 := h(〈aη | η ≤ ζ〉).

If, however, λ is a limit ordinal such that 〈aη | η ≤ ζ〉 is defined for each ζ < λ, then one notes that 〈aξ | ξ < ζ〉 is the restriction of 〈aξ | ξ < ζ′〉 for ζ < ζ′ < λ by uniqueness, so that

aλ := h(〈aζ | ζ < λ〉)

defines aλ uniquely. a

We are now in a position to show that the existence of a choice function implies that each set S can be well-ordered. The idea of the proof is to find for some suitable ordinal α an α-sequence 〈aζ | ζ < α〉 over S which exhausts S, hence so that S = {aζ | ζ < α}, and then to use the well-ordering of the ordinals by saying that aζ < aξ iff ζ < ξ.

Constructing the sequence will use the choice function, selecting an element in such a way that one can be sure that it has not been selected previously.

Theorem 1.4.19 If (AC) holds, then each set S can be well-ordered.

Proof Let f : P(S) \ {∅} → S be a choice function on the non-empty subsets of S. Extend f by putting f(∅) := p, where p ∉ S. This element p serves as an indicator that we are done with constructing the sequence. Let C be the set of all ordinals ζ such that there exists a well-order <B on a subset B ⊆ S with (B, <B) similar to O(ζ), cp. Theorem 1.4.16. Since C is a set of ordinals, there exists a smallest ordinal α not in C by Corollary 1.4.6.

By Theorem 1.4.18 there exists an α-sequence 〈aζ | ζ < α〉 such that

aζ := f(S \ {aη | η < ζ})

for all ζ < α. Now if S \ {aη | η < ζ} ≠ ∅, then aζ ≠ p, and aζ ∉ {aη | η < ζ}, so that the aζ are mutually different. Suppose that this process does not exhaust S; then aζ ≠ p for all ζ < α. Construct the corresponding well-order < on {aζ | ζ < α}; then ({aζ | ζ < α}, <) ∼ O(α). Thus α ∈ C, contradicting the choice of α. Hence there exists a smallest ordinal ξ < α with aξ = p, which implies that S = {aζ | ζ < ξ}, so that elements having different labels are in fact different. This yields a well-order on S. a

Hence we have shown


Theorem 1.4.20 The following statements are equivalent

(AC) The Axiom of Choice.

(WO) Each set can be well-ordered.

a

(AC) has other important and often used equivalent formulations, which we will discuss now.

1.5 Zorn's Lemma and Tukey's Maximality Principle

Let A be an ordered set; then B ⊆ A is called a chain iff it is linearly ordered. An ordered set in which each chain has an upper bound is sometimes called inductively ordered. Zorn's Lemma states:

(ZL) If A is an ordered set in which every chain has an upper bound, then A has a maximal element.

Proposition 1.5.1 (ZL) implies (AC).

Proof 0. Given a family of non-empty sets F, we want to find a choice function for it. In order to be in a position to apply Zorn's Lemma, we need an ordered set at our disposal in which every chain has an upper bound. We take as the ordered set all functions f which are defined on subsets of F and for which f(F) ∈ F holds whenever F is in the domain of f. This set can be ordered in a natural way, and it is then not difficult to see that each chain has an upper bound. Thus we obtain from (ZL) a maximal element, which is easily shown to be a map with F as its domain. This is the plan.

1. Let F ≠ ∅ be a family of nonempty subsets of a set S; we want to find a choice function on F. Define

R := {〈F, s〉 | s ∈ F ∈ F},

then R ⊆ F × S is a relation. Put

C := {f | f is a function with f ⊆ R}

(note that we use functions here as sets of pairs). Then C ≠ ∅, because {〈F, s〉} ∈ C for each 〈F, s〉 ∈ R. C is ordered by inclusion, and each chain has an upper bound in C. In fact, if K ⊆ C is a chain, then ⋃K is a map: let 〈F, s〉, 〈F, s′〉 ∈ ⋃K; then there exist f1, f2 ∈ K with 〈F, s〉 ∈ f1 and 〈F, s′〉 ∈ f2. Because K is a chain, either f1 ⊆ f2 or vice versa; let us assume f1 ⊆ f2. Thus 〈F, s〉 ∈ f2, and since f2 is a map, we may conclude that s = s′. Hence ⋃K is an upper bound for K in C.

2. By (ZL), C has a maximal element f∗. We prove that f∗ is the desired choice function, hence that there exists for each and every F ∈ F some s ∈ F with f∗(F) = s, or, equivalently, 〈F, s〉 ∈ f∗. Consequently, the domain of f∗ should be all of F. Assume that the domain of f∗ does not contain some F∗ ∈ F; then for each a ∈ F∗ the map f∗ ∪ {〈F∗, a〉} belongs to C and contains the map f∗ properly. This is a contradiction, thus f∗ : F → S with f∗(F) ∈ F for all F ∈ F. a

We will encounter this pattern over and over again when applying (ZL). We need an ordered set for which we can establish the chain condition. The maximal element obtained from (ZL) will then have to be checked for its suitability, usually by bringing the assumption that it is not the one we are looking for to a contradiction.

A proof for (AC)⇒ (ZL) uses a well-ordering argument for constructing a maximal chain.

Proposition 1.5.2 Assume that A is an ordered set in which each chain has an upper bound, and assume that there exists a choice function on P(A) \ {∅}. Then A has a maximal element.

Proof 0. We construct from the choice function a maximal element for A by transfinite induction.

1. As in the proof of Theorem 1.4.19, let C be the set of all ordinals ζ such that there exists a well-order <B on a subset B ⊆ A with (B, <B) ∼ O(ζ), and let α be the smallest ordinal not in C, see Corollary 1.4.6. Extend the choice function f on P(A) \ {∅} upon setting f(∅) := p with p ∉ A. This element will again serve as a marker, indicating that the selection process is finished.

2. Define by induction a transfinite sequence 〈aζ | ζ < α〉 such that a∅ ∈ A is arbitrary and

aζ := f({x ∈ A | x > aη for all η < ζ}).

Assume that aζ ≠ p; then aζ > aη for all η < ζ. As in the proof of Theorem 1.4.19, there is a smallest ordinal β < α such that aβ = p. The selection process makes sure that 〈aζ | ζ < β〉 is an increasing sequence, and that there does not exist an element x ∈ A such that aζ < x for all ζ < β.

3. Let t be an upper bound for the chain {aζ | ζ < β}. If t is not a maximal element of A, then there exists x with x > t, hence x > aζ for all ζ < β, which is a contradiction. a

Call a subset F ⊆ P(A) of finite character iff the following condition holds: F is a member of F iff each finite subset of F is a member of F. The following statement is known as Tukey's Lemma or as Tukey's Maximality Principle:

(MP) Each family of finite character has a maximal element.

This is another equivalent to (AC).

Proposition 1.5.3 (MP) ⇔ (AC).

Proof 0. We show (MP) ⇒ (AC); the other direction is delegated to the Exercises.

1. Let F ⊆ P(S) \ {∅} be a family of nonempty sets. We construct a choice function for F. Consider

G := {f | f is a choice function for some E ⊆ F}.

Then G is of finite character. In fact, let f be a map from E ⊆ F to S such that each finite subset f0 ⊆ f is a choice function for some E0 ⊆ E; then f is itself a choice function for E. Conversely, if f : E → S is a choice function for E ⊆ F, then each finite subset of f is a choice function for its domain. Thus there exists by the Maximality Principle a maximal element f∗ ∈ G. The domain of f∗ is all of F, because otherwise f∗ could be extended as in the proof of Proposition 1.5.1, and it is clear that f∗ is a choice function on F. a

Thus we have shown

Theorem 1.5.4 The following statements are equivalent

(AC) The Axiom of Choice.

(WO) Each set can be well-ordered.

(ZL) If A is an ordered set in which every chain has an upper bound, then A has a maximal element (Zorn's Lemma).

(MP) Each family of finite character has a maximal element (Tukey's Maximality Principle).

We will discuss some applications of Zorn's Lemma and the Maximality Principle now. From Theorem 1.5.4 we know that in each case we could also use (AC) or (WO), as the case may be. An application of Zorn's Lemma sometimes appears to be more convenient and less technical than using (WO).

1.5.1 Compactness for Propositional Logic

We will show that a set of propositional formulas is satisfiable iff each finite subset is satisfiable. This is usually called the Compactness Theorem for Propositional Logic.

Fix a set V ≠ ∅ of variables. A propositional formula ϕ is given through this grammar:

ϕ ::= x | ϕ ∧ ϕ | ¬ϕ

with x ∈ V. Hence a formula is either a variable, the conjunction of two formulas, or the negation of a formula. The disjunction ϕ ∨ ψ is defined through ¬(¬ϕ ∧ ¬ψ), implication ϕ → ψ as ¬ϕ ∨ ψ, and finally ϕ ↔ ψ is defined through (ϕ → ψ) ∧ (ψ → ϕ). Denote by F the set of all propositional formulas — actually, the set of all formulas depends on the set of variables, so we ought to write F(V); since we fix V, however, we use this somewhat lighter notation.

A valuation v evaluates formulas. Instead of using true and false, we use the values 0 and 1; hence a valuation is a map V → {0, 1} which is extended in a straightforward manner to a map F → {0, 1}, which is again denoted by v:

v(ϕ1 ∧ ϕ2) := min{v(ϕ1), v(ϕ2)},
v(¬ϕ) := 1 − v(ϕ).

Then we have obviously, e.g.,

v(ϕ1 ∨ ϕ2) = max{v(ϕ1), v(ϕ2)},
v(ϕ1 → ϕ2) = 1 iff v(ϕ1) ≤ v(ϕ2),
v(ϕ1 ↔ ϕ2) = 1 iff v(ϕ1) = v(ϕ2).


For example,

v(ϕ → (ψ → γ)) = max{1 − v(ϕ), max{1 − v(ψ), v(γ)}}
               = max{1 − v(ϕ), 1 − v(ψ), v(γ)}
               = max{1 − v(ψ), max{1 − v(ϕ), v(γ)}}
               = v(ψ → (ϕ → γ)).

Hence (ϕ → (ψ → γ)) ↔ (ψ → (ϕ → γ)) ↔ ((ϕ ∧ ψ) → γ).

A formula is true for a valuation iff this valuation gives it the value 1; a set A of formulas is satisfied by a valuation iff each formula in A is true under this valuation. Formally:

Definition 1.5.5 Let v : F → {0, 1} be a valuation. Then formula ϕ is true for v (in symbols: v |= ϕ) iff v(ϕ) = 1. If A ⊆ F is a set of propositional formulas, then A is said to be satisfied by v iff each formula in A is true for v, i.e., iff v |= ϕ for all ϕ ∈ A. This is written as v |= A.

We are interested in the question whether or not we can find for a set of formulas a valuation satisfying it.

Definition 1.5.6 A ⊆ F is called satisfiable iff there exists a valuation v : F → {0, 1} with v |= A.

Depending on the size of the set of variables, the set of formulas may be quite large. If V is countable, however, F is countable as well, so in this case the question may be easier to answer; this will be discussed briefly after giving the proof of the Compactness Theorem. We want to establish the general case.

Before we state and prove the result, we need a lemma which permits us to extend the range of our knowledge of satisfiability just by one formula.

Lemma 1.5.7 Let A ⊆ F be satisfiable, and let ϕ ∉ A be a formula. Then one of A ∪ {ϕ} and A ∪ {¬ϕ} is satisfiable.

Proof If A ∪ {ϕ} is not satisfiable, but A is, let v be a valuation for which v |= A holds. Because v(ϕ) = 0 we conclude v(¬ϕ) = 1, so that v |= A ∪ {¬ϕ}. a

We establish now the Compactness Theorem for propositional logic. It permits reducing the question of satisfiability of a set A of formulas to finite subsets of A.

Theorem 1.5.8 Let A ⊆ F be a set of propositional formulas. Then A is satisfiable iff each finite subset of A is satisfiable.

Proof 0. We will focus on satisfiability of A provided each finite subset of A is satisfiable, because the other half of the assertion is trivial. The idea is to apply (ZL), so that we have to construct an ordered set which satisfies the chain condition. This set will consist of pairs 〈B, v〉 with B ⊆ A and v |= B. We know from the assumption that we have plenty of these pairs. The order is straightforward, and we will establish the chain condition easily. The maximal element will have A as its first component.

1. Let

C := {〈B, v〉 | B ⊆ A, v |= B}

and define 〈B1, v1〉 ≤ 〈B2, v2〉 iff B1 ⊆ B2 and v1(ϕ) = v2(ϕ) for all ϕ ∈ B1, so that 〈B1, v1〉 ≤ 〈B2, v2〉 holds iff B1 is contained in B2, and if the valuations coincide on the smaller set. This is a partial order. If D ⊆ C is a chain, then put B := ⋃{B′ | 〈B′, v′〉 ∈ D}, and define v(ϕ) := v′(ϕ) if ϕ ∈ B′ with 〈B′, v′〉 ∈ D. Since D is a chain, v is well defined. Moreover, v |= B: let ϕ ∈ B; then ϕ ∈ B′ for some 〈B′, v′〉 ∈ D, and since v′ |= B′, we have v(ϕ) = v′(ϕ) = 1. Hence by Zorn's Lemma there exists a maximal element 〈M, w〉, in particular w |= M.

We claim that M = A. Suppose this is not the case; then there exists ϕ ∈ A with ϕ ∉ M. But either M ∪ {ϕ} or M ∪ {¬ϕ} is satisfiable by Lemma 1.5.7, hence 〈M, w〉 is not maximal. This is a contradiction.

But this means that M = A, hence A is satisfiable. a

Suppose that V is countable; then we know that F is countable as well. In this case another proof for Theorem 1.5.8 can be given, which will be sketched now. Enumerate F as ϕ1, ϕ2, . . .. Call — just temporarily — A ⊆ F finitely satisfiable iff each finite subset of A is satisfiable. Let A be such a finitely satisfiable set. We construct a sequence M0, M1, . . . of finitely satisfiable sets, starting from M0 := A. If Mn is defined, put

Mn+1 := Mn ∪ {ϕn+1}, if Mn ∪ {ϕn+1} is finitely satisfiable,
        Mn ∪ {¬ϕn+1}, otherwise.

This will give a finitely satisfiable set M∗ := ⋃n≥0 Mn. Now define v∗(ϕ) := 1 iff ϕ ∈ M∗. We claim that v∗ |= ϕ iff ϕ ∈ M∗; this is proved by a straightforward induction on ϕ. Because A ⊆ M∗, we know that v∗ |= A. This approach could be modified for the general case, well-ordering F.

The approach used for the general proof can be extended from propositional logic to first-order logic by introducing suitable constants (they are called Henkin constants). We refer the reader to [Bar77, Chapter 1], since we are presently concentrating on applications of Zorn's Lemma; see, however, the discussion in Section 3.6.1, where additional references can be found.

1.5.2 Extending Orders

We will establish a generalization of the well-known fact that each finite graph G can be embedded into a linear order, provided the graph does not have any cycles. This is known as topological sorting of the graph [Knu73, Algorithm T, p. 262] or [CLR92, Section 23.4]. One notes first that G must have a node k which does not have any predecessor (hence there is no node ℓ which is connected to k through an edge ℓ → k); if such a node k did not exist, one could construct for each node a cycle on which it lies. The algorithm proceeds recursively, as in the sketch below. If the graph contains at most one node, it returns either the empty list or the list containing the node. The general case constructs a list having k as its head and the list for G \ k as its tail; here G \ k is the graph with node k and all edges emanating from k removed.
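The following rendering of this recursive scheme is an illustration only (names are ad hoc); it assumes the graph is given as a node set and a set of edge pairs:

def topological_sort(nodes, edges):
    """edges: pairs (a, b) for an edge a -> b; assumes the graph is acyclic."""
    nodes = set(nodes)
    if not nodes:
        return []
    # pick a node k without predecessors among the remaining nodes
    k = next(n for n in nodes if not any(b == n for (a, b) in edges if a in nodes))
    rest = [(a, b) for (a, b) in edges if a != k]      # remove k and its edges
    return [k] + topological_sort(nodes - {k}, rest)

order = topological_sort({1, 2, 3, 4}, {(1, 2), (1, 3), (2, 4), (3, 4)})
assert order.index(1) < order.index(2) < order.index(4)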


Finiteness plays a special role in the argument above, because it makes sure that we have a well-order among the nodes, which in turn is needed for making sure that the algorithm terminates. Let us turn to the general case. Given a partial order ≤ on a set S, we show that ≤ can be extended to a total order ≤s (hence a ≤ b implies a ≤s b for all a, b ∈ S).

This will be shown through Zorn's Lemma. Put

G := {R | R is a partial order on S with ≤ ⊆ R}

and order G by inclusion. Let C ⊆ G be a chain; then we claim that R0 := ⋃C is a partial order. It is obvious that R0 is reflexive; if aR0b and bR0a, then there exist relations R1, R2 ∈ C with aR1b and bR2a. Since C is a chain, we know that R1 ⊆ R2 or R2 ⊆ R1 holds. Assume that the former holds; then aR2b follows, so that we may conclude a = b. Hence R0 is antisymmetric. Transitivity is proved along the same lines, using that C is a chain. By Zorn's Lemma, G has a maximal element M; since M ∈ G, M is a partial order which contains the given partial order ≤.

We have to show that M is linear. Assume that it is not, so that there exist a, b ∈ S such that both aMb and bMa are false. Put

M′ := M ∪ {〈x, y〉 | xMa and bMy}.

Then M′ contains M properly, since 〈a, b〉 ∈ M′ \ M by reflexivity of M. If we can show that M′ is a partial order, we have shown that M is not maximal, which is a contradiction. Let's see:

• M′ is reflexive: Since M ⊆ M′ and M is reflexive, xM′x holds for all x ∈ S.

• M′ is transitive: Let xM′y and yM′z; then these cases are possible:

1. xMy and yMz, hence xMz, thus xM′z.

2. xMy and yMa and bMz, thus xMa and bMz, so that xM′z.

3. xMa and bMy and yMz, hence bMz, so that xM′z.

4. xMa and bMy and yMa and bMz; but then bMa, contrary to our assumption. Hence this case cannot occur.

Thus we may conclude that M′ is transitive.

• M′ is antisymmetric: Assume that xM′y and yM′x, and look at the cases above with z = x. Case 2 would imply xMa and bMx, hence bMa, so it is not possible; case 3 is excluded for the same reason; so only case 1 is left, which implies x = y.

Thus the assumption that there exist a, b ∈ S such that both aMb and bMa are false leads to the conclusion that M is not maximal in G, which is a contradiction.

Now a ≤s b iff 〈a, b〉 ∈ M defines the desired total order, and by construction it extends the given order.

Hence we have shown:

Proposition 1.5.9 Each partial order on a set can be extended to a total order. a

It is clear that this applies to acyclic graphs, so that we have here a very general version of topological sorting.


1.5.3 Bases in Vector Spaces

Fix a vector space V over a field K. A set B ⊆ V is called linearly independent iff ∑b∈B0 ab · b = 0 implies ab = 0 for all b ∈ B0, whenever B0 is a finite non-empty subset of B. Hence, e.g., a single vector v with v ≠ 0 is linearly independent.

Example 1.5.10 The reals R form a vector space over the rationals Q. Then √2 and √3 are linearly independent. In fact, assume that q1 · √2 + q2 · √3 = 0 with rational numbers q1 = r1/s1 and q2 = r2/s2. Then we can find integers t1, t2 such that t1 · √2 = t2 · √3 and such that t1 and t2 have no common divisors. But 2t1² = 3t2² implies that 2 and 3 are both common divisors of t1 and of t2. ,

The linearly independent set B is called a base for V iff each element v ∈ V can be represented as

v = a1 · b1 + . . . + an · bn

for some a1, . . . , an ∈ K and b1, . . . , bn ∈ B. This representation is unique.

Proposition 1.5.11 Each vector space V has a base.

Proof 0. We first find a maximal independent set through (ZL) by considering the family of all independent sets, and then we show that this set is a base.

1. Let

V := {B ⊆ V | B is linearly independent}.

Then V contains all singletons with non-null vectors, hence it is not empty. Order V by inclusion, and let B be a chain in V. Then B0 := ⋃B is independent. In fact, if a1 · b1 + . . . + an · bn = 0, let bi ∈ Bi ∈ B for 1 ≤ i ≤ n. Since B is linearly ordered, we find some k such that b1, . . . , bn ∈ Bk; since Bk is independent, we may conclude a1 = . . . = an = 0. By Zorn's Lemma there exists a maximal independent set B∗ ∈ V.

2. If B∗ is not a base, then we find a vector x which cannot be represented as a finite linear combination of elements of B∗. Clearly x ∉ B∗. But then B∗ ∪ {x} is linearly independent, for otherwise x could be represented by elements from B∗. This contradicts the maximality of B∗. a

One notes that part 1 of the proof could as well argue with the Maximality Principle, because a set is linearly independent iff each of its finite subsets is linearly independent. The family V constructed in the proof is thus of finite character, hence contains by (MP) a maximal element. Then one argues exactly as in part 2 of the proof. This shows that (ZL) and (MP) are close relatives.

These proofs are not constructive, since they do not tell us how to construct a base for a given vector space, not even in the finite dimensional case.


1.5.4 Extending Linear Functionals

Sometimes one is given a linear map from a subspace of a vector space to the reals, and one wants to extend this map to a linear map on the whole space. Usually there is the constraint that both the given map and the extension should be dominated by a sublinear map.

Let V be a vector space over the reals. A map f : V → R is said to be a linear functional (or a linear map) on V iff f(α · x + β · y) = α · f(x) + β · f(y) holds for all x, y ∈ V and α, β ∈ R. Thus a linear functional is compatible with the vector space structure of V. Call p : V → R sublinear iff p(x + y) ≤ p(x) + p(y) and p(α · x) = α · p(x) for all x, y ∈ V and α ≥ 0.

We have a look at the situation in the finite dimensional case first. This will permit us to isolate the central argument easily, which will then be applied to the general situation.

Proposition 1.5.12 Let V be a finite dimensional real vector space with a sublinear functional p : V → R, and let V0 be a subspace with a linear map f0 : V0 → R such that f0(x) ≤ p(x) for all x ∈ V0. Then there exists a linear functional f : V → R which extends f0 such that f(x) ≤ p(x) for all x ∈ V.

Proof 1. It is enough to show that f0 can be extended to a linear functional dominated by p on the vector space generated by V0 ∪ {z} with z ∉ V0. In fact, we can then repeat this procedure a finite number of times, in each step adding a new basis vector not contained in the previous subspace. Since V is finite dimensional, this will eventually give us V as the domain of the linear functional.

2. Let z ∉ V0; then {v + α · z | v ∈ V0, α ∈ R} is the vector space generated by V0 and z, because this is clearly a vector space containing V0 ∪ {z}, and each vector space containing V0 ∪ {z} must also contain all linear combinations of the form v + α · z with v ∈ V0 and α ∈ R. The representation of an element in this vector space is unique: assume v + α · z = v′ + α′ · z; then v − v′ = (α′ − α) · z, and because z ∉ V0, this implies v − v′ = 0, hence also α = α′.

3. Now set

f(v + α · z) := f0(v) + α · c

with a value c which will have to be determined. Consider v, v′ ∈ V0; then we have

f0(v) − f0(v′) = f0(v − v′) ≤ p(v − v′) ≤ p(v + z) + p(−z − v′).

Thus we obtain −p(−z − v′) − f0(v′) ≤ p(v + z) − f0(v). Note that the left hand side of this inequality is independent of v, and that the right hand side is independent of v′, which means that we can find c with

i. c ≤ p(v + z) − f0(v) for all v ∈ V0,

ii. c ≥ −p(−z − v′) − f0(v′) for all v′ ∈ V0.

Now let's see what happens. Fix α. If α = 0, we have f(v + 0 · z) = f0(v) ≤ p(v + 0 · z). If α > 0, we have

f(v + α · z) = α · f(v/α + z) = α · (f0(v/α) + c) ≤ α · (f0(v/α) + p(v/α + z) − f0(v/α)) = p(v + α · z)

by i. and sublinearity. If, however, α < 0, we use inequality ii. with v′ := v/α and the sublinearity of p; note that the factor −1/α is positive in this case.

Summarizing, we have f(v + α · z) ≤ p(v + α · z) for all v ∈ V0 and α ∈ R. a

When having a closer look at the proof, we see that the assumption of working in a finite dimensional vector space is only important for making sure that the extension process terminates in a finite number of steps. The core of this proof, however, consists in the observation that we can extend a linear functional from a vector space V0 to the vector space {v + α · z | v ∈ V0, α ∈ R} with z ∉ V0 without losing domination by the sublinear functional p. Let us record this important intermediate result.

Corollary 1.5.13 Let V0 ⊆ V be vector spaces, p : V → R a sublinear functional, and z ∉ V0. Then each linear functional f0 : V0 → R which is dominated by p can be extended to a linear functional f on the vector space generated by V0 and z such that f is also dominated by p. a

Now we are in a position to formulate and prove the Hahn-Banach Theorem. We will use Zorn's Lemma for the proof by setting up a partial order such that each chain has an upper bound. The elements of this ordered set will be pairs 〈V′, f′〉 such that V′ is a subspace of V with V0 ⊆ V′, and f′ will be a linear map extending f0 and being dominated by p; the order is straightforward, induced by the extension condition. We may conclude then that there exists a maximal element. By the "dimension free" version of the extension just stated we will then show that the assumption that we did not capture the whole vector space through our maximal element yields a contradiction.

Theorem 1.5.14 Let V be a real vector space with a sublinear functional p : V → R. Given a subspace V0 and a linear map f0 : V0 → R such that f0(x) ≤ p(x) for all x ∈ V0, there exists a linear functional f : V → R which extends f0 such that f(x) ≤ p(x) for all x ∈ V.

Proof 1. Define 〈V′, f′〉 ∈ W iff V′ is a vector space with V0 ⊆ V′ ⊆ V, and f′ : V′ → R extends f0 and is dominated by p. Define 〈V′, f′〉 ≤ 〈V″, f″〉 iff V′ is a subspace of V″ and f″ is an extension of f′, for 〈V′, f′〉, 〈V″, f″〉 ∈ W. Then ≤ is a partial order on W. Let (〈Vi, fi〉)i∈I be a chain in W; then V′ := ⋃i∈I Vi is a subspace of V. In fact, let x, x′ ∈ V′, then x ∈ Vi and x′ ∈ Vi′. Then either Vi ⊆ Vi′ or Vi′ ⊆ Vi. Assume the former, hence x, x′ ∈ Vi′, thus α · x + β · x′ ∈ Vi′ ⊆ V′ for all α, β ∈ R. Put f′(x) := fi(x) if x ∈ Vi for some i ∈ I; then f′ : V′ → R is well defined, linear and dominated by p; moreover, f′ extends every fi, hence, by transitivity, f0. This implies 〈V′, f′〉 ∈ W, and this is obviously an upper bound for the chain.

2. Hence each chain has an upper bound in W, so that Zorn's Lemma implies the existence of a maximal element 〈V⁺, f⁺〉 ∈ W. Assume that V⁺ ≠ V; then there exists z ∈ V with z ∉ V⁺. Then the vector space V* generated by V⁺ ∪ {z} contains V⁺ properly, and f⁺ has a linear extension f* to V* which is dominated by p by Corollary 1.5.13. But this means 〈V⁺, f⁺〉 is strictly smaller than 〈V*, f*〉 ∈ W, a contradiction. Hence V⁺ = V, and f⁺ is the desired extension. ∎

The Hahn-Banach Theorem is sometimes considered one of the cornerstones of functional analysis, because it makes it possible to construct linear functionals with given properties. A first idea


why this might be important is hinted at in Exercise 4.29.

1.5.5 Maximal Filters

Fix a set S. The power set P(S) is ordered by inclusion ⊆, exhibiting some interesting properties. We single out subsets of P(S) which are called filters. These filters will be discussed in subsequent sections, and then the aspect that a filter lives in an ordered environment becomes dominant. But here is the definition of a filter of subsets.

Definition 1.5.15 A non-empty subset F ⊆ P (S) is called a filter iff

1. ∅ ∉ F,

2. if F1, F2 ∈ F , then F1 ∩ F2 ∈ F ,

3. if F ∈ F and F ⊆ F ′, then F ′ ∈ F .

Thus a filter is closed under finite intersections and closed with respect to supersets, and it must not contain the empty set.

Example 1.5.16 Given s ∈ S, the set Fs := {A ⊆ S | s ∈ A} is a filter, which is called the principal ultrafilter associated with s. Let M be an infinite set. Then F := {A ⊆ M | M \ A is finite} is a filter, the filter of cofinite sets.

The principal ultrafilter Fs from Example 1.5.16 is special because it is maximal: we cannot find a filter G which properly contains Fs. Let's try: take G ∈ G with G ∉ Fs, then s ∉ G, hence s ∈ S \ G, so that both G ∈ G and S \ G ∈ G, the latter one via Fs ⊆ G. Since a filter is closed under finite intersections, this implies ∅ = G ∩ (S \ G) ∈ G. We have arrived at a contradiction, giving rise to the definition of a maximal filter (Definition 1.5.20) in a moment.

Before stating it, we will introduce filter bases. Sometimes we are not presented with a filter proper, but rather with a family of sets which generates one.

Definition 1.5.17 A subset B ⊆ P(S) is called a filter base iff no intersection of a finite collection of elements of B is empty, thus iff ∅ ∉ {B1 ∩ . . . ∩ Bn | B1, . . . , Bn ∈ B}.

Example 1.5.18 Fix x ∈ R, then the set B := {]a, b[ | a < x < b} of all open intervals containing x is a filter base. Let (an)n∈N be a sequence in R, then the set E := {{ak | k ≥ n} | n ∈ N} of infinite tails of the sequence is a filter base as well.

Clearly, if B is to be contained in a filter F, then it must not have the empty set among its finite intersections, because all these finite intersections are elements of F. It is easy to characterize the filter generated by a base.

Lemma 1.5.19 Let B ⊆ P (S) be a filter base, then

F := {B ⊆ S | B ⊇ B1 ∩ . . . ∩ Bn for some B1, . . . , Bn ∈ B}

is the smallest filter containing B.

Proof It is clear that F is a filter, because it cannot contain the empty set, it is closed under finite intersections, and it is closed under supersets. Let G be a filter containing B, and let


B ⊇ B1 ∩ . . . ∩ Bn for some B1, . . . , Bn ∈ B ⊆ G; then B1 ∩ . . . ∩ Bn ∈ G, hence B ∈ G. Thus F ⊆ G, so that F is in fact the smallest filter containing B. ∎
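For a finite carrier set the construction of Lemma 1.5.19 can be carried out mechanically. The following Python sketch (the set S and the base B are illustrative assumptions) generates the smallest filter containing a base:

from itertools import combinations

# Generate the smallest filter containing a filter base B on a finite S,
# following Lemma 1.5.19: all supersets of finite intersections of B.
S = frozenset({1, 2, 3, 4})

def powerset(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def is_filter_base(B):
    # no finite intersection of base elements may be empty
    return all(frozenset.intersection(*sel)
               for r in range(1, len(B) + 1) for sel in combinations(B, r))

def generated_filter(B, S):
    inters = {frozenset.intersection(*sel)
              for r in range(1, len(B) + 1) for sel in combinations(B, r)}
    return {A for A in powerset(S) if any(i <= A for i in inters)}

B = [frozenset({1, 2, 3}), frozenset({2, 3, 4})]
assert is_filter_base(B)
F = generated_filter(B, S)
assert F == {A for A in powerset(S) if frozenset({2, 3}) <= A}

Here the generated filter consists exactly of the supersets of {2, 3}, the smallest intersection of base elements.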

Let us return to the properties of the filter which is defined in Example 1.5.16.

Definition 1.5.20 A filter is called maximal iff it is not properly contained in another filter. Maximal filters are also called ultrafilters.

This is an easy characterization of maximal filters.

Lemma 1.5.21 These conditions are equivalent for a filter F

1. F is maximal.

2. For each subset A ⊆ S, either A ∈ F or S \A ∈ F .

Proof 1 ⇒ 2: Assume there is a set A ⊆ S such that both A ∉ F and S \ A ∉ F hold. Then

G0 := {F ∩ A | F ∈ F}

is a filter base, because F ∩ A = ∅ for some F ∈ F would imply F ⊆ S \ A, thus S \ A ∈ F. Because each F ∈ F is a superset of F ∩ A ∈ G0, and because A = S ∩ A ∈ G0 with A ∉ F, we conclude that the filter G generated by G0 contains F properly. Thus F is not maximal.

2 ⇒ 1: A filter G which contains F properly will contain a set A ∉ F. By assumption, S \ A ∈ F ⊆ G, so that ∅ = A ∩ (S \ A) ∈ G. Thus G is not a filter. ∎
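On a small finite set the content of Lemma 1.5.21 can be checked by brute force. The following Python sketch (the three-element S is an illustrative assumption) also confirms that on a finite set every ultrafilter is principal, as in Example 1.5.16:

from itertools import combinations

# Enumerate all filters on a tiny set, pick the maximal ones, and check
# Lemma 1.5.21: a maximal filter contains A or its complement for each A.
S = frozenset({0, 1, 2})
subsets = [frozenset(c) for r in range(4) for c in combinations(S, r)]

def is_filter(F):
    return (F and frozenset() not in F
            and all(A & B in F for A in F for B in F)      # meets
            and all(B in F for A in F for B in subsets if A <= B))  # supersets

filters = [frozenset(F)
           for r in range(1, len(subsets) + 1)
           for F in combinations(subsets, r) if is_filter(F)]
maximal = [F for F in filters if not any(F < G for G in filters)]

for F in maximal:
    assert all(A in F or (S - A) in F for A in subsets)    # Lemma 1.5.21
assert len(maximal) == len(S)          # exactly the principal ultrafilters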

Example 1.5.22 The filter F of cofinite sets from Example 1.5.16 for an infinite set M is not an ultrafilter. In fact, decompose M = M0 ∪ M1 into disjoint sets M0 and M1 which are both infinite. Then neither M0 nor its complement is contained in F.

The existence of ultrafilters is trivial by Example 1.5.16, but we do not know whether each filter is actually contained in an ultrafilter. The answer is in the affirmative.

Theorem 1.5.23 Each filter can be extended to a maximal filter.

Proof Let F be a filter on S, and define

V := {G | G is a filter with F ⊆ G}.

Order V by inclusion. Then each chain C in V has an upper bound in V. In fact, let H := ⋃C. If A ∈ H and A ⊆ B, there exists a filter G ∈ C with A ∈ G, hence B ∈ G, so that B ∈ H. If A, B ∈ H, we find GA, GB ∈ C with A ∈ GA and B ∈ GB; because C is linearly ordered, either GA ⊆ GB or GB ⊆ GA. Assume the former, hence A, B ∈ GB, hence A ∩ B ∈ GB ⊆ H. Since no filter in C contains ∅, neither does H. So H is a filter in V.

Thus there exists in V a maximal element F*, which is a maximal filter (just repeat the argument in the proof of 2 ⇒ 1 for Lemma 1.5.21). F* contains F. ∎

Corollary 1.5.24 Let ∅ ≠ A ⊆ X be a non-empty subset of a set X. Then there exists an ultrafilter containing A.

Proof Using Theorem 1.5.23, extend the filter {B ⊆ X | A ⊆ B} to an ultrafilter. ∎


1.5.6 Ideals and Filters

We will now translate some of the arguments above from the power set of a set to a partially ordered set which has at least part of the algebraic properties of the power set.

Recall that a lattice (L, ≤) is a set L with an order relation ≤ such that each non-empty finite subset has an infimum and a supremum. Put a ∧ b := inf{a, b}, this is the meet of a and b, and a ∨ b := sup{a, b}, this is the join of a and b. In a similar way, we put ∨S := sup S and ∧S := inf S for S ⊆ L, provided sup S resp. inf S exists in L.

We note these properties (a, b ∈ L):

Idempotency a ∧ a = a ∨ a = a.

Commutativity a ∧ b = b ∧ a and a ∨ b = b ∨ a.

Absorption a ∧ (a ∨ b) = a and a ∨ (a ∧ b) = a. In fact, a ≤ a ∨ b, thus a = a ∧ a ≤ a ∧ (a ∨ b); on the other hand a ∧ (a ∨ b) ≤ a. The second equality is proved similarly.

For simplicity we assume that the lattice is bounded, i.e., that it has a smallest element ⊥ and a largest element ⊤, so that we can put ⊥ := sup ∅ and ⊤ := inf ∅, resp.

That this is a generalization of the properties of a power set is clear:

Example 1.5.25 The power set P(S) of a set S is a lattice, where A ≤ B iff A ⊆ B, so that

A ∩ B = inf{A, B},
A ∪ B = sup{A, B}.

But there are lattices which do not derive from a power set, as the following example shows.

Example 1.5.26 Look at this example: a partially ordered set with a top element ⊤, elements H, I, J on the level below, E, F, G below these, and A, B, C, D at the bottom (the Hasse diagram is not reproduced here). Then {B, C} has these upper bounds: ⊤, H, I; thus it has no smallest upper bound, so that, perhaps contrary to first view, B ∨ C does not exist. Trying to determine A ∨ B, we see that the set of upper bounds to {A, B} is just {⊤, F, H, I}, hence A ∨ B = F.

So this example does not yield a lattice. It indicates that we have to look carefully at the context when discussing joins and meets.


Example 1.5.27 Consider the set J of all open intervals ]a, b[ with a, b ∈ R, and take the order inherited from P(R); then J is closed under taking the infimum of two elements (since the intersection of two open intervals is again an open interval), but J is not closed under taking the supremum of two elements in P(R), since the union of two open intervals is not necessarily an open interval. Nevertheless, J is a lattice in its own right, because we have

]a1, b1[ ∨ ]a2, b2[ = ]min{a1, a2}, max{b1, b2}[

in J. Hence we have to make sure that we look for the supremum in the proper set.

The next example also calls for a cautious approach.

Example 1.5.28 Similarly, consider the set R of all closed rectangles in the plane R × R, again with the order inherited from P(R × R). The intersection R1 ∩ R2 of two closed rectangles R1, R2 ∈ R is an element of R and is indeed the infimum of R1 and R2. But what do we take as the supremum in R, if it exists at all? From the definition of the supremum we have

R1 ∨ R2 = ⋂{R ∈ R | R1 ⊆ R and R2 ⊆ R},

in plain words, the smallest closed rectangle which encloses both R1 and R2. Hence, e.g.,

[0, 1] × [0, 1] ∨ [5, 6] × [8, 9] = [0, 6] × [0, 9].

This renders R a lattice indeed.

A lattice is called distributive iff

a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c),
a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c)

holds (both equations are actually equivalent, see Exercise 1.23).

Example 1.5.29 The powerset lattice P(S) is a distributive lattice, because unions and intersections are distributive.

But beware! Distributivity is not necessarily inherited. Consider the lattice of all closed intervals of the real line, ordered as in Example 1.5.27; then

I1 ∧ I2 = I1 ∩ I2,
I1 ∨ I2 = [min(I1 ∪ I2), max(I1 ∪ I2)],

as above. Put A := [−3, −2], B := [−1, 1], C := [2, 3], then

(A ∧ B) ∨ (B ∧ C) = ∅,
B ∧ (A ∨ C) = [−1, 1].

Thus this lattice is not distributive, although the order has been inherited from the powerset.
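The failure of distributivity can be replayed with a few lines of code. A Python sketch (intervals encoded as pairs, with None standing for the empty meet; the encoding is an illustrative choice):

# Check the two sides of the distributive law for closed intervals.
def meet(i, j):           # intersection of closed intervals
    if i is None or j is None:
        return None
    lo, hi = max(i[0], j[0]), min(i[1], j[1])
    return (lo, hi) if lo <= hi else None

def join(i, j):           # smallest closed interval containing both
    if i is None: return j
    if j is None: return i
    return (min(i[0], j[0]), max(i[1], j[1]))

A, B, C = (-3, -2), (-1, 1), (2, 3)
assert join(meet(A, B), meet(B, C)) is None      # (A ∧ B) ∨ (B ∧ C) = ∅
assert meet(B, join(A, C)) == (-1, 1)            # B ∧ (A ∨ C) = [-1, 1]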

Example 1.5.30 Let P be a set with a partial order ≤. A set D ⊆ P is called a down set iff t ∈ D and s ≤ t imply s ∈ D. Hence a down set is downward closed in the sense that all elements below an element of the set belong to the set as well. A generic example for a down set is {s ∈ P | s ≤ t} with t ∈ P. Down sets of this shape are called principal down sets.


The intersection and the union of two down sets are down sets again. For example, let D1 and D2 be down sets, let t ∈ D1 ∪ D2, and assume s ≤ t. Because t ∈ D1 or t ∈ D2 we may conclude that s ∈ D1 or s ∈ D2, hence s ∈ D1 ∪ D2. Let D(P) be the set of all down sets of P, then D(P) is a distributive lattice; this is so because the infimum and the supremum of two elements in D(P) are the same as in P(P).

Define

Ψ : P → D(P), t ↦ {s ∈ P | s ≤ t}.

Then t1 ≤ t2 implies Ψ(t1) ⊆ Ψ(t2), hence the order structure carries over from P to D(P). Moreover Ψ(t1) = Ψ(t2) implies t1 = t2, so that Ψ is injective. Hence we have embedded the partially ordered set P into a distributive lattice.
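For a finite ordered set, the down sets and the embedding Ψ can be enumerated directly. A Python sketch (the divisibility order on {1, . . . , 6} is an illustrative choice):

from itertools import combinations

# Compute all down sets of a small poset and check that the embedding
# t -> {s | s <= t} is order-preserving and injective.
P = range(1, 7)
leq = lambda s, t: t % s == 0          # s <= t iff s divides t

def is_down_set(D):
    return all(s in D for t in D for s in P if leq(s, t))

down_sets = [frozenset(D)
             for r in range(len(P) + 1)
             for D in combinations(P, r) if is_down_set(D)]

psi = {t: frozenset(s for s in P if leq(s, t)) for t in P}
assert all(psi[t] in down_sets for t in P)          # principal down sets
assert all(psi[s] <= psi[t] for s in P for t in P if leq(s, t))
assert len(set(psi.values())) == len(psi)           # injective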

Filters and ideals are important structures in a lattice.

Definition 1.5.31 Let L be a lattice.

J ⊆ L is called an ideal iff

• ∅ ≠ J ≠ L,

• if a, b ∈ J, then a ∨ b ∈ J,

• if a ∈ J and b ∈ L with b ≤ a, then b ∈ J.

The ideal J is called prime iff a ∧ b ∈ J implies a ∈ J or b ∈ J; it is called maximal iff it is not properly contained in another ideal.

F ⊆ L is called a filter iff

• ∅ ≠ F ≠ L,

• if a, b ∈ F, then a ∧ b ∈ F,

• if a ∈ F and b ∈ L with b ≥ a, then b ∈ F.

The filter F is called prime iff a ∨ b ∈ F implies a ∈ F or b ∈ F; it is called maximal iff it is not properly contained in another filter.

Maximal filters are also called ultrafilters. Recall the definition of an ultrafilter of sets in Definition 1.5.20; we have already defined ultrafilters for the special case that the underlying lattice is the power set of a given set. The notion of a filter base fortunately carries over directly from Definition 1.5.17, so that we may use Lemma 1.5.19 in the present context as well. In this section we will be a bit more general, but first some simple examples.

Example 1.5.32 I := {F ⊆ N | F is finite} is an ideal in P(N) with set inclusion as the partial order. This is so since the union of two finite sets is finite again, and because subsets of finite sets are finite again. Also ∅ ≠ I ≠ P(N). This ideal is not prime: the intersection of the set of even numbers with the set of odd numbers is empty, hence in I, but neither set is finite.

Example 1.5.33 Consider all divisors of 24, i.e., the lattice {1, 2, 3, 4, 6, 8, 12, 24} ordered by divisibility, with meet given by gcd and join given by lcm (the Hasse diagram is not reproduced here). Then {1, 2, 3, 6} is an ideal, while {1, 2, 3, 4, 6} is not, since the join 3 ∨ 4 = 12 is missing.
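Both claims can be verified mechanically. A Python sketch (the encoding of the divisor lattice via gcd and lcm follows the example):

from math import gcd

# The divisors of 24 ordered by divisibility form a lattice with
# meet = gcd and join = lcm; check which candidate set is an ideal.
L = [d for d in range(1, 25) if 24 % d == 0]
lcm = lambda a, b: a * b // gcd(a, b)

def is_ideal(J):
    down_closed = all(b in J for a in J for b in L if a % b == 0)
    join_closed = all(lcm(a, b) in J for a in J for b in J)
    return 0 < len(J) < len(L) and down_closed and join_closed

assert is_ideal({1, 2, 3, 6})
assert not is_ideal({1, 2, 3, 4, 6})   # 3 ∨ 4 = 12 is missing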


Example 1.5.34 Let S ≠ ∅ be a set, a ∈ S. Then P(S \ {a}) is a prime ideal in P(S) (with set inclusion as the partial order). In fact, ∅ ≠ P(S \ {a}) ≠ P(S), and if a ∉ A and a ∉ B, then a ∉ A ∪ B. On the other hand, if a ∉ A ∩ B, then a ∉ A or a ∉ B.

Lemma 1.5.35 Let L be a lattice, ∅ ≠ F ≠ L be a proper non-empty subset of L.

• These conditions are equivalent:

1. F is a filter.

2. ⊤ ∈ F and (a ∧ b ∈ F ⇔ a ∈ F and b ∈ F).

• If the filter F is maximal and L is distributive, then F is a prime filter.

Proof 1. The implication 1 ⇒ 2 in the first part is trivial; for 2 ⇒ 1 one notes that a ≤ b is equivalent to a ∧ b = a.

2. In order to show that the maximal filter F is prime, we show that a ∨ b ∈ F implies a ∈ F or b ∈ F. Assume that a ∨ b ∈ F with a ∉ F. Consider B := {f ∧ b | f ∈ F}, then ⊥ ∉ B. In fact, assume that f ∧ b = ⊥ for some f ∈ F; then we could write a = (f ∧ b) ∨ a = (f ∨ a) ∧ (b ∨ a) by distributivity. Since f ∈ F and F is a filter, f ∨ a ∈ F follows, and since b ∨ a ∈ F, we obtain a ∈ F, contradicting the assumption. Thus B is a filter base, and because F is maximal we may conclude that B ⊆ F, which in turn implies b ∈ F (note that b = ⊤ ∧ b ∈ B). ∎

Hence maximal filters are prime in a distributive lattice. If the lattice is not distributive, this may not be true. Look at the lattice {⊥, a, b, c, ⊤} in which a, b, c lie pairwise incomparable between ⊥ and ⊤ (the Hasse diagram is not reproduced here). This lattice is not distributive, because (a ∧ b) ∨ c = c ≠ ⊤ = (a ∨ c) ∧ (b ∨ c). Then {⊤}, {⊤, a}, {⊤, b} and {⊤, c} are filters; {⊤, a}, {⊤, b}, {⊤, c} are maximal, but none of them is prime.

Prime ideals and prime filters are not only dual notions, they are also complementary concepts.

Lemma 1.5.36 In a lattice L a subset F is a prime filter iff its complement L \ F is a prime ideal.

Proof Exercise 1.15. ∎

When defining a lattice as a generalization of the power set construct, we restricted the attention to joins and meets of elements, but neglected the observation that in a power set each set has a complement. The corresponding abstraction is a Boolean algebra. Such a Boolean algebra B is a distributive lattice such that there exists a unary operation − : B → B such that

a ∨ −a = ⊤,
a ∧ −a = ⊥.

−a is called the complement of a. We assume that ∧ binds stronger than ∨, and that complementation binds stronger than the binary operations.


The power of complementation shows already in the next lemma, which relates prime ideals to maximal ideals, and prime filters to maximal filters.

Lemma 1.5.37 Let B be a Boolean algebra. Then an ideal is maximal iff it is prime, and a filter is maximal iff it is prime.

Proof Note that a Boolean algebra is a distributive lattice with more than one element (viz., ⊥ and ⊤). We prove the assertion only for filters. That a maximal filter is prime has been shown in Lemma 1.5.35. If F is not maximal, there exists a with a ∉ F and −a ∉ F by the argument of Lemma 1.5.21. But ⊤ = a ∨ −a ∈ F, hence F is not prime. ∎

This is another and probably surprising equivalent to (AC).

(MI) Each lattice with more than one element contains a maximal ideal.

Theorem 1.5.38 (MI) is equivalent to (AC).

Proof 1. (MI) ⇒ (AC): We show actually that (MI) implies (MP); an application of Theorem 1.5.4 will then establish the claim. Let F ⊆ P(S) be a family of finite character. In order to apply (MI), we need a lattice, which we will define now. Define L := F ∪ {S}, and put for X, Y ∈ F

X ∧ Y := X ∩ Y,
X ∨ Y := X ∪ Y, if X ∪ Y ∈ F, and X ∨ Y := S otherwise.

Then L is a lattice with top element S and bottom element ∅. Let M be a maximal ideal in L; then we assert that M* := ⋃M is a maximal element of F. Note that M* ≠ S.

First we show that M* ∈ F. If a1, . . . , ak ∈ M*, then we can find Mi ∈ M such that ai ∈ Mi for 1 ≤ i ≤ k. Since M is an ideal in L, we know that M1 ∨ . . . ∨ Mk ∈ M, so that {a1, . . . , ak} ∈ F, being a finite subset of M1 ∨ . . . ∨ Mk ∈ F with F of finite character. Hence M* ∈ F.

Now assume that M* is not maximal; then we can find N ∈ F such that M* is a proper subset of N, hence there exists t ∈ N such that t ∉ M*. Because N ∈ F and F is of finite character, {t} ∈ F. Now put M′ := M ∪ {M ∨ {t} | M ∈ M} = M ∪ {M ∪ {t} | M ∈ M} (note that M ∪ {t} ⊆ N, hence M ∪ {t} ∈ F by finite character); then M′ is an ideal in L which properly contains M. This is a contradiction, hence we have found a maximal element of F.

2. (AC) ⇒ (MI): Again, we use the equivalences in Theorem 1.5.4, because we actually show (ZL) ⇒ (MI). Let L be a lattice with at least two elements, and order

I := {I ⊆ L | I is an ideal in L}

by inclusion. Because {b ∈ L | b ≤ a} ∈ I for a ∈ L, a ≠ ⊤ (by assumption, such an element exists), we know that I ≠ ∅. If C ⊆ I is a chain, then I := ⋃C ∈ I. In fact, ∅ ≠ I ≠ L, because ⊤ ∉ I; and if a, b ∈ I, we find I1, I2 ∈ C with a ∈ I1, b ∈ I2; because C is a chain, we may assume that I1 ⊆ I2, hence a, b ∈ I2, so that a ∨ b ∈ I2 ⊆ I. If a ≤ b and b ∈ I then a ∈ I, because b ∈ I1 for some I1 ∈ C. Hence each chain has an upper bound in I, and (ZL) implies the existence of a maximal element M ∈ I. ∎


Since each Boolean algebra is a lattice with more than one element, the following corollary is a consequence of Theorem 1.5.38. It is known under the name Prime Ideal Theorem. We know from Lemma 1.5.37 that in a Boolean algebra prime ideals and maximal ideals are really the same.

Theorem 1.5.39 (AC) implies the existence of a prime ideal in a Boolean algebra. ∎

The converse does not hold: it can be shown that the Prime Ideal Theorem is strictly weaker than (AC) [Jec06, p. 81].

1.5.7 The Stone Representation Theorem

Let us stick for a moment to Boolean algebras and discuss the famous Stone Representation Theorem, which requires the Prime Ideal Theorem at a crucial point.

Fix a Boolean algebra B and define for two elements a, b ∈ B their symmetric difference a ⊖ b through

a ⊖ b := (a ∧ −b) ∨ (−a ∧ b).

If B = P(S) for some set S, and if ∧, ∨, − are the respective set operations ∩, ∪, S \ ·, then A ⊖ B is in fact equal to the symmetric difference A∆B := (A \ B) ∪ (B \ A) = (A ∪ B) \ (B ∩ A).

Fix an ideal I of B, and define

a ∼I b ⇔ a ⊖ b ∈ I.

Then ∼I is a congruence, i.e., an equivalence relation which is compatible with the operations on the Boolean algebra. This will be shown now through a sequence of statements.

We state some helpful properties.

Lemma 1.5.40 Let B be a Boolean algebra, then

1. a ⊖ a = ⊥, a ⊖ b = b ⊖ a and a ⊖ b = (−a) ⊖ (−b).

2. a ⊖ b = (a ∨ b) ∧ −(a ∧ b).

3. (a ⊖ b) ∧ c = (a ∧ c) ⊖ (b ∧ c) and c ∧ (a ⊖ b) = (c ∧ a) ⊖ (c ∧ b).

Proof The properties under 1. are fairly obvious, 2. is calculated directly using distributivity; finally the first part of 3. follows thus:

(a ∧ c) ⊖ (b ∧ c) = (a ∧ c ∧ −(b ∧ c)) ∨ (b ∧ c ∧ −(a ∧ c))
= (a ∧ c ∧ (−b ∨ −c)) ∨ (b ∧ c ∧ (−a ∨ −c))
= (a ∧ −b ∧ c) ∨ (b ∧ −a ∧ c)
= (a ⊖ b) ∧ c,

because a ∧ c ∧ −c = ⊥ = b ∧ c ∧ −c. ∎
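Since every power set is a Boolean algebra, the identities of Lemma 1.5.40 can be spot-checked there. A Python sketch (using ^ for the symmetric difference of sets, with a small illustrative carrier set S):

from itertools import combinations

# Verify the identities of Lemma 1.5.40 in the Boolean algebra P(S).
S = frozenset({0, 1, 2})
subsets = [frozenset(c) for r in range(4) for c in combinations(S, r)]

for a in subsets:
    assert a ^ a == frozenset()                          # part 1
    for b in subsets:
        assert a ^ b == b ^ a                            # part 1
        assert a ^ b == (S - a) ^ (S - b)                # part 1
        assert a ^ b == (a | b) - (a & b)                # part 2
        for c in subsets:
            assert (a ^ b) & c == (a & c) ^ (b & c)      # part 3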

Lemma 1.5.41 ∼I is an equivalence relation on B with these properties

1. a ∼I a′ and b ∼I b′ imply a ∧ b ∼I a′ ∧ b′ and a ∨ b ∼I a′ ∨ b′.

Page 50: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

38 CHAPTER 1. THE AXIOM OF CHOICE AND SOME OF ITS EQUIVALENTS

2. a ∼I a′ implies −a ∼I −a′.

Proof Because a ⊖ a′ ∈ I and b ⊖ b′ ∈ I we conclude that (a ⊖ a′) ∨ (b ⊖ b′) ∈ I, thus

(a ∨ b) ⊖ (a′ ∨ b′) ≤ (a ⊖ a′) ∨ (b ⊖ b′) ∈ I.

Since I is an ideal, hence closed under going downwards, we conclude (a ∨ b) ⊖ (a′ ∨ b′) ∈ I, i.e., a ∨ b ∼I a′ ∨ b′.

From Lemma 1.5.40 we conclude that a ∧ b ∼I a′ ∧ b ∼I a′ ∧ b′. The assertion about complementation follows from Lemma 1.5.40 as well. ∎

Denote by [x]∼I the equivalence class of x ∈ B, and let η∼I : x ↦ [x]∼I be the associated factor map. Define on the factor space B/I := {[x]∼I | x ∈ B} the operations

[a]∼I ∧ [b]∼I := [a ∧ b]∼I,
[a]∼I ∨ [b]∼I := [a ∨ b]∼I,
−[a]∼I := [−a]∼I.

We have also

[a]∼I ≤ [b]∼I ⇔ a ⊖ (a ∧ b) ∈ I ⇔ b ⊖ (a ∨ b) ∈ I,
a ∈ I ⇔ a ∼I ⊥.

The following statement is now fairly easy to prove. Recall that a homomorphism f : (B, ∧, ∨, −) → (B′, ∧′, ∨′, −′) is a map f : B → B′ such that

f(a ∧ b) = f(a) ∧′ f(b), f(a ∨ b) = f(a) ∨′ f(b) and f(−a) = −′f(a)

hold for all a, b ∈ B.

Proposition 1.5.42 The factor space B/I is a Boolean algebra, and η∼I is a homomorphism of Boolean algebras.

Proof The operations on B/I are well defined by Lemma 1.5.41 and yield a distributive lattice with [⊤]∼I as the largest and [⊥]∼I as the smallest element, resp. Moreover, − is a complementation operator on B/I because

[a]∼I ∧ [−a]∼I = [⊥]∼I,
[a]∼I ∨ [−a]∼I = [⊤]∼I.

It is evident from the construction that η∼I is a homomorphism. ∎

The Prime Ideal Theorem implies that the Boolean algebra B/I has a prime ideal J by Theorem 1.5.39. This observation leads to a stronger version of this theorem for the given Boolean algebra.


Theorem 1.5.43 Let I be an ideal in a Boolean algebra. Then (AC) implies that there exists a prime ideal K which contains I.

Proof 0. The plan of the proof is fairly straightforward: we know that B/I has a prime ideal by the Prime Ideal Theorem. This prime ideal is lifted to the given Boolean algebra B, and then we claim that this is the prime ideal on B we are looking for.

1. Construct the factor algebra B/I; then (AC) implies that this Boolean algebra has a prime ideal J. We claim that

K := {x ∈ B | [x]∼I ∈ J}

is the desired prime ideal. Since I = [⊥]∼I as a set, and [⊥]∼I ∈ J, we see that I ⊆ K holds, thus K ≠ ∅.

2. K is an ideal. If K = B, then ⊤ ∈ K, which would mean [⊤]∼I ∈ J, but this is impossible. Let a ≤ b with b ∈ K, hence a = a ∧ b, so that [a]∼I = [a ∧ b]∼I. Because b ∈ K, we infer [a ∧ b]∼I ∈ J, hence [a]∼I ∈ J, so that a ∈ K. If a, b ∈ K, then a ∨ b ∈ K, because J is an ideal.

3. K is prime. In fact, we have

a ∧ b ∈ K ⇔ [a ∧ b]∼I ∈ J ⇔ [a]∼I ∧ [b]∼I ∈ J ⇒ [a]∼I ∈ J or [b]∼I ∈ J ⇔ a ∈ K or b ∈ K. ∎

As a consequence, we can find in a Boolean algebra for any given element a ≠ ⊤ a prime ideal which does contain it.

Corollary 1.5.44 Let B be a Boolean algebra and assume that (AC) holds.

1. Given a ≠ ⊤, there exists a prime ideal which contains a.

2. Given a, b ∈ B with a ≠ b, there exists a prime ideal which contains exactly one of a and b.

3. Given a, b ∈ B with a ≠ b, there exists an ultrafilter which contains exactly one of a and b.

Proof We find a prime ideal K which extends the ideal {x ∈ B | x ≤ a}. This establishes the first part.

If a ≠ b, we have a ⊖ b ≠ ⊥, so a ∧ −b ≠ ⊥ or −a ∧ b ≠ ⊥. Assume the former; then there exists a prime ideal K with −(a ∧ −b) ∈ K, so that both b ∈ K and −a ∈ K hold. Since −a ∈ K implies a ∉ K, the prime ideal K contains b but not a; the other case symmetrically yields a prime ideal containing a but not b. This establishes the second part. The third part follows through Lemma 1.5.36. ∎

This yields one of the classics, the Stone Representation Theorem. It states that each Boolean algebra is essentially a set algebra, i.e., a Boolean algebra comprised of sets.

Theorem 1.5.45 Let B be a Boolean algebra, and assume that (AC) holds. Then there exists a set S0 and a Boolean set algebra S ⊆ P(S0) such that B is isomorphic to S.

Proof 0. We map each element of the Boolean algebra to the set of ultrafilters which contain it as an element. This yields a map which is compatible with the Boolean structure, from which we obtain the objects we are looking for.


1. Define

S0 := {U | U is an ultrafilter on B},
ψ(b) := {U ∈ S0 | b ∈ U}.

Then these properties are easily established:

ψ(b1 ∧ b2) = ψ(b1) ∩ ψ(b2),

ψ(b1 ∨ b2) = ψ(b1) ∪ ψ(b2),

ψ(−b) = S0 \ ψ(b).

For example, we obtain from Lemma 1.5.35 that

U ∈ ψ(b1 ∧ b2) ⇔ b1 ∧ b2 ∈ U ⇔ b1 ∈ U and b2 ∈ U ⇔ U ∈ ψ(b1) and U ∈ ψ(b2) ⇔ U ∈ ψ(b1) ∩ ψ(b2).

Similarly, U ∈ ψ(−b) ⇔ −b ∈ U ⇔ b ∉ U ⇔ U ∉ ψ(b) by Lemma 1.5.21, because U is an ultrafilter.

3. Because we can find by Corollary 1.5.44 for b1 ≠ b2 an ultrafilter which contains exactly one of b1 and b2, we conclude that ψ is injective (this is actually the place where (AC) is used). Thus the Boolean algebras B and ψ[B] are isomorphic, and the latter one is comprised of sets. ∎
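For a finite Boolean algebra the Stone map can be written down explicitly. The following Python sketch takes B = P({0, 1, 2}) as an illustrative assumption; there every ultrafilter is principal, so ψ essentially relabels each set by the points it contains, and the homomorphism properties from the proof reduce to set identities:

from itertools import combinations

# A finite sanity check of the Stone map psi for B = P({0,1,2}).
S = frozenset({0, 1, 2})
subsets = [frozenset(c) for r in range(4) for c in combinations(S, r)]

# The principal ultrafilters U_s = {A : s in A}, one for each point s.
ultrafilters = {s: frozenset(A for A in subsets if s in A) for s in S}

def psi(b):
    return frozenset(s for s, U in ultrafilters.items() if b in U)

# psi maps meet/join/complement to intersection/union/complement,
# and it is injective.
for a in subsets:
    assert psi(S - a) == S - psi(a)
    for b in subsets:
        assert psi(a & b) == psi(a) & psi(b)
        assert psi(a | b) == psi(a) | psi(b)
assert len({psi(a) for a in subsets}) == len(subsets)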

The Stone Representation Theorem gives a representation of a Boolean algebra as a set algebra. So, at first sight, the effort of introducing the additional abstraction looks futile. But it is not. First, the theorem does not say that a Boolean algebra is always a full power set. Second, it is sometimes much easier in an application to work with an abstract tool like a Boolean algebra than with a set algebra (because you then have to cater for a carrier set, after all). A third reason is a classification issue: Boolean algebras are isomorphic to set algebras, but are lattices then always isomorphic to set lattices? The representation through down sets from Example 1.5.30 seems to suggest just that. But looking at this representation more closely, one sees that additional properties are probably not preserved; for example, a Boolean algebra may be represented as a lattice through its down sets, but it is far from clear that the representation also preserves complements. Thus we do not have in general such a clear cut picture for general lattices as we have for Boolean algebras.

1.5.8 Compactness and Alexander’s Subbase Theorem

We will prove in this section Alexander's Subbase Theorem as an application of Zorn's Lemma. The Theorem states that when proving a topological space compact one may restrict one's attention to a particular subclass of open sets, a class which is usually easier to handle than the full family of open sets. This application of Zorn's Lemma is interesting because it shows in which way a maximality argument can be used for establishing a property through a subclass (rather than extending a property until maximality puts a stop to it, as we did


in showing that each vector space has a basis). Alexander's Theorem is also a very practical tool, as we will see later.

This section assumes that (AC) holds.

We start with the closed interval [u, v] with −∞ < u < v < +∞ as an important example of a compact space. It has the following property: each cover through a countable number of open intervals contains a finite subcover which already covers the interval. This is what the famous Heine-Borel Theorem states. We give below Borel's proof [Fic64, vol. I, p. 163].

Theorem 1.5.46 Let an interval [u, v] with −∞ < u < v < +∞ be given. Then each cover {]xn, yn[ | n ∈ N} of [u, v] through a countable number of open intervals contains a finite cover ]xn1, yn1[, . . . , ]xnk, ynk[.

Proof Suppose the assertion is false; then either [u, 1/2(u + v)] or [1/2(u + v), v] is not covered by finitely many of those intervals; select the corresponding one, call it [a1, b1]. This interval can be halved; let [a2, b2] be the half which cannot be covered by finitely many intervals. Repeating this process, one obtains a sequence {[an, bn] | n ∈ N} of closed intervals, each having half of the length of its predecessor, and each one not being covered by a finite number of intervals from {]xn, yn[ | n ∈ N}. Because the lengths of the intervals shrink to zero, there exists c ∈ [u, v] with limn→∞ an = c = limn→∞ bn, hence c ∈ ]xm, ym[ for some m. But there is some n0 ∈ N with [an, bn] ⊆ ]xm, ym[ for n ≥ n0, contradicting the assumption that [an, bn] cannot be covered by a finite number of those intervals. ∎

Although the proof is given for a countable cover, its analysis shows that it goes through for an arbitrary cover of open intervals. This is so because each cover induces a partition of the interval considered into two parts, so a sequence of intervals will result in any case.
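The theorem is an existence statement, but for a concrete finite family of intervals a subcover can be extracted constructively. The following Python sketch uses a greedy sweep rather than Borel's bisection argument, and it assumes that the given open intervals really do cover [u, v]:

# Greedily pick open intervals (x, y) covering the closed interval [u, v].
def finite_subcover(u, v, intervals):
    chosen, point = [], u
    while point <= v:
        candidates = [(x, y) for (x, y) in intervals if x < point < y]
        if not candidates:
            raise ValueError("the given intervals do not cover [u, v]")
        best = max(candidates, key=lambda i: i[1])  # reaches furthest right
        chosen.append(best)
        point = best[1]       # everything below best[1] is now covered
    return chosen

cover = [(-0.1, 0.3), (0.2, 0.6), (0.5, 0.9), (0.8, 1.1), (0.25, 0.4)]
print(finite_subcover(0.0, 1.0, cover))
# -> [(-0.1, 0.3), (0.2, 0.6), (0.5, 0.9), (0.8, 1.1)]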

This section will discuss compact spaces, which have the property that an arbitrary cover contains a finite one. To be on firm ground, we first introduce topological spaces as the kind of objects to be discussed here.

Definition 1.5.47 Given a set X, a subset τ ⊆ P(X) is called a topology iff these conditions are satisfied:

• ∅, X ∈ τ .

• If G1, . . . , Gk ∈ τ , then G1 ∩ . . . ∩Gk ∈ τ , thus τ is closed under finite intersections.

• If τ0 ⊆ τ, then ⋃τ0 ∈ τ, thus τ is closed under arbitrary unions.

The pair (X, τ) is then called a topological space, the elements of τ are called open sets. An open neighborhood U of an element x ∈ X is an open set U with x ∈ U; a neighborhood of x is a set which contains an open neighborhood of x.

These are the topologies one can always find on a set X.

Example 1.5.48 P(X) and {∅, X} are always topologies; the former one is called the discrete topology, the latter one is called indiscrete.

The topology one usually deals with on the reals is given by intervals, and the plane is topologically described by open balls (well, they really are disks, but they are given through measuring a distance, and in this case the name "ball" sticks).


Example 1.5.49 Call a set G ⊆ R open iff for each x ∈ G there exist a, b ∈ R with a < b such that x ∈ ]a, b[ ⊆ G; note that ∅ is open. Then the open sets form a topology on the reals, which is also called the interval topology.

Clearly, G is open iff, given x ∈ G, there exists ε > 0 with ]x − ε, x + ε[ ⊆ G. Call a subset G ⊆ R² of the Euclidean plane open iff, given x ∈ G, there exists ε > 0 such that B(x, ε) ⊆ G, where

B(〈x1, x2〉, r) := {〈y1, y2〉 | √((y1 − x1)² + (y2 − x2)²) < r}

is the open ball centered at 〈x1, x2〉 with radius r.

Let (X, τ) be a topological space. If Y ⊆ X, then the trace of τ on Y gives a topology τY on Y, formally τY := {G ∩ Y | G ∈ τ}, the subspace topology. This sometimes permits transferring a property from the space to its subsets.

A set F ⊆ X is called closed iff its complement X \ F is open. Then both ∅ and X are closed, and the closed sets are closed (no pun intended) under arbitrary intersections and finite unions. We associate with each set an open set and a closed set:

Definition 1.5.50 Let M ⊆ X, then

• Mᵒ := ⋃{G ∈ τ | G ⊆ M} is called the interior of M.

• Mᵃ := ⋂{F ⊆ X | M ⊆ F and F is closed} is called the closure of M.

• ∂M := Mᵃ \ Mᵒ is called the boundary of M.

We always have Mᵒ ⊆ M ⊆ Mᵃ; this is apparent from the definition. Clearly, Mᵒ is an open set, and it is the largest open set which is contained in M, so that M is open iff M = Mᵒ. Similarly, Mᵃ is a closed set, and it is the smallest closed set which contains M. We also have: M is closed iff M = Mᵃ. The boundary ∂M is also a closed set, because it is the intersection of two closed sets, and we have ∂M = ∂(X \ M). M is closed iff ∂M ⊆ M. All this is easily established through the definitions.

Look at the indiscrete topology: here we have {x}ᵒ = ∅ and {x}ᵃ = X for each x ∈ X. For the discrete topology one sees Aᵒ = Aᵃ = A for each A ⊆ X.
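When the topology is finite, interior, closure and boundary can be computed literally from Definition 1.5.50. A Python sketch (with a small hand-made topology as an illustrative assumption):

from itertools import chain

# Interior, closure and boundary on a tiny topology over X = {a, b, c}.
X = frozenset("abc")
tau = [frozenset(), frozenset("a"), frozenset("ab"), X]   # a topology

def interior(M):
    return frozenset().union(*(G for G in tau if G <= M))

def closure(M):
    closed = [X - G for G in tau]
    return frozenset(X).intersection(*(F for F in closed if M <= F))

def boundary(M):
    return closure(M) - interior(M)

M = frozenset("b")
print(interior(M), closure(M), boundary(M))
# interior = empty, closure = {b, c}, boundary = {b, c}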

Example 1.5.51 In the Euclidean topology on R² of Example 1.5.49, we have

B(〈x1, x2〉, r)ᵃ = {〈y1, y2〉 | √((y1 − x1)² + (y2 − x2)²) ≤ r},
∂B(〈x1, x2〉, r) = {〈y1, y2〉 | √((y1 − x1)² + (y2 − x2)²) = r}.

Just to get familiar with boundaries:

Lemma 1.5.52 Let (X, τ) be a topological space, A ⊆ X. Then x ∈ ∂A iff each open neighborhood of x has a non-empty intersection with A and with X \ A. In particular ∂A = ∂(X \ A) and ∂(A ∪ B) ⊆ (∂A) ∪ (∂B).


Proof Let x ∈ ∂A, and let U be an open neighborhood of x. If A ∩ U = ∅, then A ⊆ X \ U, so x ∉ Aᵃ; if U ∩ (X \ A) = ∅, it follows that x ∈ Aᵒ. In both cases we arrive at a contradiction. Conversely, assume that each open neighborhood U of x satisfies U ∩ A ≠ ∅ and U ∩ (X \ A) ≠ ∅. Then x ∉ Aᵒ, since no open neighborhood of x is contained in A; and x ∈ Aᵃ, since otherwise X \ Aᵃ would be an open neighborhood of x disjoint from A. Hence x ∈ Aᵃ \ Aᵒ = ∂A. ∎

A set without a boundary is both closed and open, so it is called clopen. The clopen sets of a topological space form a Boolean algebra.

Sometimes it is sufficient to describe the topology in terms of some special sets, like the open balls for the Euclidean topology. These balls form a base in the following sense.

Definition 1.5.53 A subset β ⊆ τ of the open sets is called a base for the topology iff for each open set G ∈ τ and for each x ∈ G there exists B ∈ β such that x ∈ B ⊆ G, thus iff each open set is the union of all base elements contained in it.

A subset σ ⊆ τ is called a subbase for τ iff the set of finite intersections of elements of σ forms a base for τ.

Then the open intervals are a base for the interval topology, and the open balls are a base for the Euclidean topology (actually, we did introduce the respective topologies through their bases). A subbase for the interval topology is given by the sets {]−∞, b[ | b ∈ R} ∪ {]a, +∞[ | a ∈ R}, because the set of finite intersections includes all open intervals, which in turn form a base. Bases and subbases are not uniquely determined; for example, {]r, s[ | r < s, r, s ∈ Q} is a base for the interval topology.

Let us return to the problem discussed in the opening of this section. We have seen that bounded closed intervals have the remarkable property that, whenever we cover them by an arbitrary number of open intervals, we can find a finite collection among these intervals which already covers the interval. This property can be generalized to arbitrary topological spaces; subsets with this property are called compact, formally:

Definition 1.5.54 The topological space (X, τ) is called compact iff each cover of X by open sets contains a finite subcover.

Thus X is compact iff, whenever (Gi)i∈I is a collection of open sets with X = ⋃i∈I Gi, there exists a finite I0 ⊆ I such that X = ⋃i∈I0 Gi. It is apparent that compactness is a generalization of finiteness, so that compact sets are somewhat small, measured in terms of open sets. Consider as a trivial example the discrete topology. Then X is compact precisely when X is finite.

This is an easy consequence of the definition.

Lemma 1.5.55 Let (X, τ) be a compact topological space, and F ⊆ X closed. Then (F, τF) is a compact topological space. ∎

The following example shows a close connection of Boolean algebras to compact topological spaces; this is the famous Stone Duality Theorem.

Example 1.5.56 Let B be a Boolean algebra with ℘B as the set of all prime ideals of B. Define

Xa := {I ∈ ℘B | a ∉ I}.


Then we have these properties:

• X⊤ = ℘B, since an ideal does not contain ⊤.

• X−a = ℘B \ Xa. To see this, let I be a prime ideal; then I ∈ X−a iff −a ∉ I, this is the case iff −a ∈ B \ I, hence iff a ∉ B \ I, since B \ I is a maximal filter by Lemma 1.5.36 and Lemma 1.5.37; the latter condition is equivalent to a ∈ I, hence to I ∉ Xa.

• Xa∧b = Xa ∩ Xb and Xa∨b = Xa ∪ Xb. This follows similarly from Lemma 1.5.35.

Define a topology τ on ℘B by taking the sets Xa as a base, formally

β := {Xa | a ∈ B}.

We claim that (℘B, τ) is compact. In fact, let U be a cover of ℘B with open sets. Because each U ∈ U can be written as a union of elements of β, we may and do assume that U ⊆ β, so that U = {Xa | a ∈ A} for some A ⊆ B. Now let J be the ideal generated by A, so that J can be written as

J := {b ∈ B | b ≤ a1 ∨ . . . ∨ ak for some a1, . . . , ak ∈ A}.

We distinguish these cases:

⊤ ∈ J: In this case we have ⊤ = a1 ∨ . . . ∨ ak for some a1, . . . , ak ∈ A, which means

℘B = X⊤ = Xa1∨...∨ak = Xa1 ∪ . . . ∪ Xak

with Xa1, . . . , Xak ∈ U, so we have found a finite subcover in U.

⊤ ∉ J: Then J is a proper ideal, so by Theorem 1.5.43 there exists a prime ideal K with J ⊆ K. But we cannot find a ∈ A such that K ∈ Xa: assume on the contrary that K ∈ Xa for some a ∈ A, then a ∉ K, hence −a ∈ K, since K is prime (Lemma 1.5.36). But by construction a ∈ J, since a ∈ A, which implies a ∈ K, hence ⊤ ∈ K, a contradiction. Thus K ∈ ℘B, but K fails to be covered by U, which is a contradiction.

Thus (℘B, τ) is a compact space, which is sometimes called the prime ideal space of the Boolean algebra.

We conclude that the sets Xa are clopen, since X−a = ℘B \ Xa. Moreover, each clopen set in this space can be represented in this way. In fact, let U be clopen, thus U = ⋃{Xa | a ∈ A} for some A ⊆ B. Since U is closed, it is compact by Lemma 1.5.55, so there exist a1, . . . , an ∈ A such that U = Xa1 ∪ . . . ∪ Xan = Xa1∨...∨an.

Compactness is formulated in terms of a cover through arbitrary open sets. Alexander's Theorem states that it is sufficient to consider covers which come from a subbase for the topology. This is usually quite a considerable help, since subbases are mostly easier to handle than the collection of all open sets; Example 1.5.58 confirms this impression. The proof comes as an application of Zorn's Lemma and follows essentially the one given in [HS65, Theorem 6.40].

Theorem 1.5.57 Let (X, τ) be a topological space with a subbase σ. Then the following statements are equivalent.

1. X is compact.


2. Each cover of X by elements of σ contains a finite subcover.

Proof 0. Because the elements of a subbase are open, the implication 1 ⇒ 2 is trivial, hence we have to show 2 ⇒ 1. The idea of the proof goes as follows: if the assertion is false, there exists a cover which does not have a finite subcover. Take the set of all these covers, and order them by inclusion. It is not difficult to see that each chain has an upper bound in this set, so Zorn's Lemma gives a maximal element. Maximality is somewhat fragile here, because adding something to this maximal element will break it. This will permit us to derive a contradiction.

1. Assume that the assertion is false, and define

Z := {C | C is an open cover of X without a finite subcover}.

Order Z by inclusion, and let Z0 ⊆ Z be a chain; then C := ⋃Z0 ∈ Z. In fact, it is clear that C is a cover; assume that C has a finite subcover, say E1, . . . , Ek. Then Ej ∈ Cj ∈ Z0, and since Z0 is a chain with respect to inclusion, we find some Ci ∈ Z0 with E1, . . . , Ek ∈ Ci, which is a contradiction. By Zorn's Lemma, Z has a maximal element V. This means that

• V is an open cover of X.

• V does not contain a finite subcover.

• If U ∈ τ is open with U ∉ V, then V ∪ {U} contains a finite subcover.

Let W := V ∩ σ, hence W contains all elements of V which are taken from the subbase. By assumption, no finite subfamily of W covers X, hence W is not a cover for X, which implies that R := X \ ⋃W ≠ ∅. Let x ∈ R, then there exists V ∈ V such that x ∈ V, because V is a cover for X. Since V is open and σ is a subbase, we find S1, . . . , Sk ∈ σ with x ∈ S1 ∩ . . . ∩ Sk ⊆ V. Because x ∉ ⋃W, we conclude that no Sj is an element of V (otherwise Sj ∈ V ∩ σ = W, a contradiction). V is maximal and each Sj is open, thus V ∪ {Sj} contains a finite cover of X. Hence we can find for each j some open set Aj which is a finite union of elements of V such that Aj ∪ Sj = X. But this means

V ∪ A1 ∪ . . . ∪ Ak ⊇ (S1 ∩ . . . ∩ Sk) ∪ (A1 ∪ . . . ∪ Ak) = X.

Hence X can be covered through a finite number of elements of V; this contradicts the fact that V does not contain a finite subcover. ∎

Observe how (ZL) enters the argument precisely when we need a maximal cover with no finite subcover.

The Priestley topology as discussed by Goldblatt [Gol12] provides a first example for the use of Alexander's Theorem. It indicates that using a cover comprised of elements coming from a subbase simplifies the argumentation considerably.

Example 1.5.58 Given x ∈ X, define

‖x‖ := {A ⊆ X | x ∈ A},
−‖x‖ := {A ⊆ X | x ∉ A}.


The Priestley topology on P (X) is defined as the topology which is generated by the subbase

σ := {‖x‖ | x ∈ X} ∪ {−‖x‖ | x ∈ X}.

Hence the basic sets of this topology have the form

‖x1‖ ∩ . . . ∩ ‖xk‖ ∩ −‖y1‖ ∩ . . . ∩ −‖yn‖

for x1, . . . , xk, y1, . . . , yn ∈ X and some n, k ∈ N.

We claim that P(X) is compact in the Priestley topology. In fact, let C be a cover of P(X) with elements from the subbase σ. Put P := {x ∈ X | −‖x‖ ∈ C}. Then P ∈ P(X), so we must find some element from C which contains P. If P ∈ −‖x‖ ∈ C for some x ∈ X, this means x ∉ P, so by definition −‖x‖ ∉ C, which is a contradiction. Thus there exists x ∈ X such that P ∈ ‖x‖ ∈ C. But this means x ∈ P, hence −‖x‖ ∈ C, so {‖x‖, −‖x‖} ⊆ C is a cover of P(X). Thus P(X) is compact by Alexander's Theorem 1.5.57.

Alexander's Subbase Theorem will be of considerable help, e.g., when characterizing compactness through ultrafilters in Theorem 3.2.11 and in establishing Tihonov's Theorem 3.2.12 on product compactness. It also emphasizes the close connection of compactness and (AC).

1.6 Boolean σ-Algebras

We generalize the notion of a Boolean algebra by introducing countable operations, leading to Boolean σ-algebras. This extension becomes important, e.g., when working with probabilities or, more generally, with measures. For example, one of the fundamental probability laws states that the probability of a countable union of mutually disjoint events equals the infinite sum of the events' probabilities. In order to express this adequately, the domain of the probability must be closed under countable unions.

We assume in this section that (AC) holds.

Given a Boolean algebra B, we associate as above with the lattice operations on B an order relation ≤ by

a ≤ b ⇔ a ∧ b = a (⇔ a ∨ b = b).

We will switch in the discussion below between the order and the use of the algebraic operations.

Definition 1.6.1 A Boolean algebra B is called a Boolean σ-algebra iff it is closed under countable suprema and infima.

Example 1.6.2 The power set of each set is a Boolean σ-algebra. Consider

A := {A ⊆ R | A is countable or R \ A is countable}.

Then A is a Boolean σ-algebra (we use here that the countable union of countable sets is countable again, hence (AC)). This is sometimes called the countable-cocountable σ-algebra. On the other hand, its little sister,

D := {A ⊆ R | A is finite or R \ A is finite},


the finite-cofinite algebra, is a Boolean algebra, but evidently not a σ-algebra.

We define for the countable subset A = {an | n ∈ N} of a Boolean algebra B

∧A := ∧n∈N an := inf{an | n ∈ N},
∨A := ∨n∈N an := sup{an | n ∈ N}

as its infimum resp. its supremum, which both exist, since B is closed under countable infima and suprema. In addition, we note that

inf ∅ = ⊤,
sup ∅ = ⊥.

We know that a Boolean algebra is a distributive lattice; in addition, a stronger infinite distributive law holds for a Boolean σ-algebra.

Lemma 1.6.3 Let B be a Boolean σ-algebra, (an)n∈N be a sequence of elements in B, then

b ∧ ∨n∈N an = ∨n∈N (b ∧ an),
b ∨ ∧n∈N an = ∧n∈N (b ∨ an)

holds for any b ∈ B.

Proof We establish the first equality; the second one follows by duality. Since b ∧ an ≤ b and b ∧ an ≤ an, we see that ∨n∈N (b ∧ an) ≤ b ∧ ∨n∈N an. For establishing the reverse inequality, assume that s is an upper bound to {b ∧ an | n ∈ N}, hence b ∧ an ≤ s for all n ∈ N; consequently, an = (b ∧ an) ∨ (−b ∧ an) ≤ s ∨ (−b ∧ an) ≤ s ∨ −b. Thus

b ∧ ∨n∈N an ≤ b ∧ (s ∨ −b) = (b ∧ s) ∨ (b ∧ −b) ≤ b ∧ s ≤ s.

Hence s is an upper bound to b ∧ ∨n∈N an as well. Now apply this to the upper bound s := ∨n∈N (b ∧ an). ∎

Let A be a non-empty subset of a Boolean σ-algebra B; then there exists a smallest σ-algebra C which contains A. In fact, this must be

C = ⋂{D ⊆ B | D is a σ-algebra with A ⊆ D}.

We first note that the intersection of a set of σ-algebras is a σ-algebra again. Moreover, there always exists a σ-algebra which contains A, viz., B itself. Consequently, C is the object of our desire; it is denoted by σ(A), so that σ(A) denotes the smallest σ-algebra containing A. σ is an example of a closure operator: we have A ⊆ σ(A), and A1 ⊆ A2 implies σ(A1) ⊆ σ(A2); moreover, applying the operator twice does not yield anything new: σ(σ(A)) = σ(A).
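On a finite carrier set the closure σ(A) can also be computed from the inside, by iterating complements and unions until a fixed point is reached; for finite sets, countable operations reduce to finite ones. A Python sketch (the carrier set and the generators are illustrative assumptions):

# Generate the (sigma-)algebra of sets on a finite X from generators A.
def generated_algebra(X, A):
    algebra = {frozenset(), frozenset(X)} | {frozenset(a) for a in A}
    while True:
        new = ({X - a for a in algebra} |
               {a | b for a in algebra for b in algebra})
        if new <= algebra:          # fixed point reached
            return algebra
        algebra |= new

X = frozenset(range(4))
print(sorted(map(sorted, generated_algebra(X, [{0}, {0, 1}]))))
# atoms {0}, {1}, {2,3}: the generated algebra has 2**3 = 8 elements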


Example 1.6.4 Let A := {[a, b] | a, b ∈ [0, 1]} be the set of all closed intervals [a, b] := {x ∈ R | a ≤ x ≤ b} of the unit interval [0, 1]. Denote by B := σ(A) the σ-algebra generated by A; the elements of B are sometimes called the Borel sets of [0, 1]. Then the half open intervals [a, b[ and ]a, b] are members of B. We can write, e.g., [a, b[ = ⋃n∈N [a, b − 1/n]. Since [a, b − 1/n] ∈ A ⊆ B for all n ∈ N, and since B is closed under countable unions, the claim follows.

A more complicated Borel set is constructed in this way: define C0 := [0, 1] and assume that Cn is defined already as a union of 2ⁿ mutually disjoint intervals of length 1/3ⁿ each, say Cn = ⋃1≤j≤2ⁿ Ij. Obtain Cn+1 by removing the open middle third of each interval Ij. For example

C0 = [0, 1],
C1 = [0, 1/3] ∪ [2/3, 1],
C2 = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1],
C3 = [0, 1/27] ∪ [2/27, 1/9] ∪ [2/9, 7/27] ∪ [8/27, 1/3] ∪ [2/3, 19/27] ∪ [20/27, 7/9] ∪ [8/9, 25/27] ∪ [26/27, 1]

and so on. Clearly Cn ∈ B, because this set is the finite union of closed intervals. Now put

C := ⋂n∈N Cn,

then C ∈ B, because it is the countable intersection of sets in B. This set is known as the Cantor ternary set.
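The sets Cn can be generated exactly with rational arithmetic. A Python sketch reproducing the intervals listed above:

from fractions import Fraction

# Generate the intervals of C_n from Example 1.6.4 exactly:
# C_n consists of 2**n closed intervals of length 3**(-n).
def cantor_steps(n):
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        third = []
        for a, b in intervals:
            d = (b - a) / 3
            third += [(a, a + d), (b - d, b)]   # drop the open middle third
        intervals = third
    return intervals

C2 = cantor_steps(2)
print(C2)   # [0,1/9], [2/9,1/3], [2/3,7/9], [8/9,1]
assert len(C2) == 4 and all(b - a == Fraction(1, 9) for a, b in C2)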

The next two examples deal with σ-algebras of sets, each defined on the infinite product {0, 1}ᴺ. It may be used as a model for an infinite sequence of coin flips, 0 denoting head, 1 denoting tail. But we can only observe a finite number of these events, albeit for as long as we want. So we cater for that by having a look at the σ-algebra which is defined by these finite observations.

Example 1.6.5 Let X := {0, 1}ᴺ be the set of all infinite binary sequences, and put B := σ({Ak,i | k ∈ N, i = 0, 1}) with Ak,i := {〈x1, x2, . . .〉 | xk = i} as the set of all sequences the k-th component of which is i.

We claim that for r ∈ N0 both Sk,r := {〈x1, x2, . . .〉 ∈ X | x1 + . . . + xk = r} and Tr := {〈x1, x2, . . .〉 ∈ X | ∑i∈N xi = r} are elements of B.

In fact, given a finite binary sequence v := 〈v1, . . . , vk〉, the set

Qv := {x ∈ X | 〈x1, . . . , xk〉 = v} = ⋂1≤i≤k Ai,vi

is a member of B, and the set Lk,r of binary sequences of length k which sum up to r is finite. Thus

Sk,r = ⋃v∈Lk,r Qv ∈ B.


Since

Tr = ⋃n∈N (Sn,r ∩ ({0, 1}ⁿ × ∏k>n {0})),

the assertion follows also for Tr.
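Restricted to the first k coordinates, Sk,r is the finite union of the cylinders Qv over the words in Lk,r. A Python sketch enumerating these words and confirming |Lk,r| = C(k, r):

from itertools import product
from math import comb

# Enumerate L_{k,r}, the binary words of length k summing to r;
# S_{k,r} is the union of the cylinders Q_v over exactly these words.
def L(k, r):
    return [v for v in product((0, 1), repeat=k) if sum(v) == r]

print(L(3, 2))   # [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
assert all(len(L(k, r)) == comb(k, r) for k in range(1, 8) for r in range(k + 1))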

We continue the example by looking at all sequences for which the average result of flipping a coin n times will converge as n tends to infinity. This is a bit more involved because we now have to take care of the limit.

Example 1.6.6 Let X := {0, 1}ᴺ be the set of all infinite binary sequences as in Example 1.6.5, and put

W := {〈x1, x2, . . .〉 ∈ X | (1/n) ∑1≤i≤n xi converges as n → ∞}.

We claim that W ∈ B, noting that a real sequence (yn)n∈N converges iff it is a Cauchy sequence, i.e., iff given 0 < ε ∈ Q there exists n0 ∈ N such that |ym − yn| < ε for all n, m ≥ n0.

Given F ⊆ N finite, the set

HF := {x ∈ X | xj = 1 for all j ∈ F and xi = 0 for all i ∉ F} = ⋂j∈F Aj,1 ∩ ⋂i∉F Ai,0

is a member of B; since there are countably many finite subsets of N which have exactly r elements, we obtain Tr = ⋃{HF | F ⊆ N with |F| = r}, which is a countable union of elements of B, hence an element of B.

The sequence ((1/n) ∑1≤i≤n xi)n∈N converges iff

∀ε > 0, ε ∈ Q ∃n0 ∈ N ∀n ≥ n0 ∀m ≥ n0 : |(1/n) ∑1≤i≤n xi − (1/m) ∑1≤i≤m xi| < ε,

thus iff

〈x1, x2, . . .〉 ∈ ⋂ε>0, ε∈Q ⋃n0∈N ⋂n≥n0 ⋂m≥n0 Wn,m,ε

with

Wn,m,ε := {〈x1, x2, . . .〉 | |(1/n) ∑1≤i≤n xi − (1/m) ∑1≤i≤m xi| < ε}.

Now 〈x1, x2, . . .〉 ∈ Wn,m,ε iff |m · ∑1≤i≤n xi − n · ∑1≤j≤m xj| < n · m · ε. If n < m, this is equivalent to

−n · m · ε < (m − n) · ∑1≤i≤n xi − n · ∑n<j≤m xj < n · m · ε,

hence −n · m · ε < (m − n) · a − n · b < n · m · ε for a = ∑1≤i≤n xi and b = ∑n<j≤m xj; the same applies to the case m < n. Since there are only finitely many combinations of 〈a, b〉 satisfying these constraints, we conclude that Wn,m,ε ∈ B, so that the set W of all sequences for which the average sum converges is a member of B as well.


1.6.1 Construction Through Transfinite Induction

We will in this section show that the σ-algebra generated by a subset of a Boolean σ-algebra can actually be constructed directly through transfinite induction. We have introduced σ(H) as a closure operation, viz., the smallest σ-algebra containing H; this is an operation which works from outside H. In contrast, the inductive construction works from the inside, constructing σ(H) through operations with the elements of H, the elements derived from it, etc. In addition, the description of σ(A) given above is non-constructive. Transfinite induction permits us to construct σ(A) explicitly (if one dares to speak in these terms of a transfinite operation).

In order to describe it, we introduce two operators on the subsets of B as follows. Let H ⊆ B, then

Hσ := {∨n∈N an | an ∈ H for all n ∈ N},
Hδ := {∧n∈N an | an ∈ H for all n ∈ N}.

Thus Hσ contains all countable suprema of elements of H, and Hδ contains all countable infima. Hence A is a Boolean sub-σ-algebra of B iff

Aσ ⊆ A, Aδ ⊆ A, and {−a | a ∈ A} ⊆ A

hold.

So couldn't we, when constructing σ(A), just take all complements, then all countable infima and suprema of elements in A, and then their countable suprema and infima, and so on? This is the basic idea for the construction. But since the process indicated above is not guaranteed to terminate after a finite number of applications of the σ- and the δ-operations, we do a transfinite construction.

In order to implement this idea, we fix a set algebra A ⊆ B, hence A is a Boolean algebra. Thus we only have to focus on the infinitary operations; union and intersection of two elements are special cases. Define by transfinite induction

A0 := A,
Aζ := ⋃η<ζ Aη, if ζ is a limit ordinal,
Aζ+1 := (Aζ)σ, if ζ is odd,
Aζ+1 := (Aζ)δ, if ζ is even,
Aω1 := ⋃ζ<ω1 Aζ.

It is clear that Aζ ⊆ C holds for each σ-algebra C which contains A, so that Aω1 ⊆ σ(A) is inferred.

Let us work on the other inclusion. It is sufficient to show that Aω1 is a σ-algebra. This is so because A ⊆ Aω1, so that in this case Aω1 would contribute to the intersection defining σ(A), hence we could infer σ(A) ⊆ Aω1. We prove the assertion through a series of auxiliary statements, noting that 〈Aζ | ζ < ω1〉 forms a chain with respect to set inclusion.

Lemma 1.6.7 For each ζ < ω1, if a ∈ Aζ , then −a ∈ Aζ+1.

Proof 0. The proof proceeds by transfinite induction on ζ; it will have to discuss the case that ζ is a limit ordinal, and distinguish whether ζ is even or odd.

1. The assertion is true for ζ = 0. Assume for the induction step that it is true for all η < ζ.

2. If ζ is a limit ordinal, we know that we can find for a ∈ Aζ an ordinal η < ζ with a ∈ Aη, hence by induction hypothesis −a ∈ Aη+1 ⊆ Aζ, because η + 1 < ζ by the definition of a limit ordinal (see Definition 1.4.8).

If ζ is even, but not a limit ordinal, we can write ζ as ζ = ξ + 1. Then Aζ = (Aξ)σ, hence a = ∨n∈N an for some an ∈ Aξ; by the induction hypothesis, −an ∈ Aξ+1 = Aζ for all n ∈ N, so that −a = ∧n∈N (−an) ∈ (Aζ)δ = Aζ+1. The argumentation for ζ odd is exactly the same. ∎

Thus Aω1 is closed under complementation. Closure under countable infima and suprema is shown similarly, but we have to cater for a countable sequence of countable ordinals.

Lemma 1.6.8 Aω1 is closed under countable infima and countable suprema.

The proof rests on the observation that the supremum of a countable number of countable ordinals is countable again (or, putting it differently, that ω1 is not reachable by countable suprema of countable ordinals).

Proof We focus on countable suprema; the proof for infima works exactly in the same way. Let (an)n∈N be a sequence of elements in Aω1, then we find ordinal numbers ζn < ω1 such that an ∈ Aζn. Because ζn is countable for each n ∈ N, we conclude from Proposition 1.4.17 that ζ* := ⋃n∈N ζn is a countable ordinal, so that ζ* < ω1. Because 〈Aζ | ζ < ω1〉 forms a chain, we infer that an ∈ Aζ* for all n ∈ N. Consequently, ∨n∈N an ∈ (Aζ*)σ ⊆ Aω1. ∎

Thus we have shown

Proposition 1.6.9 Aω1 = σ(A). ∎

To summarize, we have two possibilities to construct the σ-algebra generated by an algebra A: we can use the closure operation suggested by the σ-operator, or we can go through the explicit construction using transfinite induction. One usually prefers the first way, since it is easier to handle, and, as we will see, information about the context is easier to factor in. But sometimes one has no choice but to go the inductive way. An immediate use of this construction will be the Extension Theorem 1.6.29 for measures.

1.6.2 Factoring Through σ-Ideals

Factoring a Boolean σ-algebra through an ideal works as for general Boolean algebras, resulting in a Boolean algebra again. There is no reason why the factor algebra should be a σ-algebra, however, so if we want to obtain a σ-algebra we have to make stronger assumptions on the object used for factoring.

Definition 1.6.10 Let B be a Boolean algebra, I ⊆ B an ideal. I is called a σ-ideal iff sup_{n∈N} an ∈ I, provided an ∈ I for all n ∈ N.

Not every ideal is a σ-ideal: {F ⊆ N | F is finite} is an ideal but certainly not a σ-ideal in P (N), even though P (N) is a Boolean σ-algebra.

The following statement is the σ-variant of Proposition 1.5.42; its proof follows [Aum54, p.79] quite closely.

Proposition 1.6.11 Let B be a Boolean σ-algebra and I ⊆ B be a σ-ideal. Then B/I is a Boolean σ-algebra.

Proof 1. Because each Boolean σ-algebra is a Boolean algebra, and each σ-ideal is an ideal, we may conclude from Proposition 1.5.42 that B/I is a Boolean algebra. Hence it remains to be shown that this Boolean algebra is closed under countable suprema; since B/I is closed under complementation, closedness under countable infima will follow.

2. Let an ∈ B, then a := ⋁_{n∈N} an ∈ B. We claim that [a]∼I = ⋁_{n∈N} [an]∼I. Because an ≤ a for all n ∈ N, we conclude that [an]∼I ≤ [a]∼I for all n ∈ N, hence ⋁_{n∈N} [an]∼I ≤ [a]∼I. Now let [an]∼I ≤ [b]∼I for all n ∈ N; we show that then [a]∼I ≤ [b]∼I. In fact, because [an]∼I ≤ [b]∼I, we conclude that cn := an − (an ∧ b) ∈ I and b ∧ cn = ⊥ for all n ∈ N (since cn = an ∧ −(an ∧ b)). Thus an = cn ∨ (an ∧ b), so that we have ⋁_{n∈N} an = (a ∧ b) ∨ ⋁_{n∈N} cn by the infinite distributive law from Lemma 1.6.3. This implies a − (a ∧ b) = ⋁_{n∈N} cn ∈ I, or, equivalently, [a]∼I ≤ [b]∼I. Consequently, [a]∼I is the smallest upper bound of {[an]∼I | n ∈ N}. □

This construction will be put to use later on when we want to identify sets which differ only by a set of measure zero. Then it will turn out that this equivalence relation on subsets is based on a σ-ideal. But before we can do that, we have to know what a measure is. This is what we will discuss next.

1.6.3 Measures

Boolean σ-algebras model events. The top element ⊤ is interpreted as an event which can happen unconditionally and always; the bottom element ⊥ is the impossible event. The complement of an event is an event, and if we have a countable sequence of events, then their supremum is an event, viz., the event that at least one of the events in the sequence happens.

To illustrate, suppose that we have a set T of traders which may form unions or coalitions; then T as well as ∅ are coalitions; if A is a coalition, then T \ A is a coalition as well; and if An is a coalition for each n ∈ N, then we want to be able to form the “big” coalition ⋃_{n∈N} An. Hence the set of all coalitions forms a σ-algebra.

We deal in the sequel with set-based σ-algebras, so we fix a set S of events.

Definition 1.6.12 Let C ⊆ P (S) be a family of sets with ∅ ∈ C. A map µ : C → [0,∞] with µ(∅) = 0 is called

1. monotone iff µ(A) ≤ µ(B) for A,B ∈ C, A ⊆ B,

2. additive iff µ(A ∪B) = µ(A) + µ(B) for all A,B ∈ C, with A ∪B ∈ C and A ∩B = ∅,

3. countably subadditive iff µ(⋃_{n∈N} An) ≤ ∑_{n∈N} µ(An), whenever (An)n∈N is a sequence of sets in C with ⋃_{n∈N} An ∈ C,

4. countably additive iff µ(⋃_{n∈N} An) = ∑_{n∈N} µ(An), whenever (An)n∈N is a mutually disjoint sequence of sets in C with ⋃_{n∈N} An ∈ C.

If C is a σ-algebra, then a map µ : C → [0,∞] with µ(∅) = 0 is called a measure iff µ is monotone and countably additive. If S can be written as S = ⋃_{n∈N} Sn with Sn ∈ C and µ(Sn) < ∞ for all n ∈ N, then the measure is called σ-finite.

Note that we permit that µ assumes the value +∞. Clearly, a countably additive set function is additive, and it is countably subadditive, provided it is monotone.

Example 1.6.13 Let S be a set, and define for a ∈ S and A ⊆ S

δa(A) := 1, if a ∈ A, and δa(A) := 0 otherwise.

Then δa is a measure on the power set of S. It is usually referred to as the Dirac measure on a.
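As a quick illustration, here is a minimal sketch (hypothetical code, not from the text) of the Dirac measure, together with a spot check of additivity on a finite disjoint family:

# Sketch: the Dirac measure delta_a assigns 1 to sets containing a, else 0.
def dirac(a):
    return lambda A: 1 if a in A else 0

delta = dirac(3)
disjoint = [{0, 1}, {2, 3}, {4}]              # pairwise disjoint events
union = set().union(*disjoint)
# additivity on this family: the measure of the union equals the sum
assert delta(union) == sum(delta(A) for A in disjoint)   # 1 == 0 + 1 + 0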

A slightly more complicated example indicates the connection to ultrafilters.

Example 1.6.14 Let µ : P (S) → {0, 1} be a binary valued measure. Define

F := {A ⊆ S | µ(A) = 1}.

Then F is an ultrafilter on P (S). First, we check that F is a filter: ∅ ∉ F is obvious, and if A ∈ F with A ⊆ B, then certainly B ∈ F. Let A, B ∈ F; then 2 = µ(A) + µ(B) = µ(A ∪ B) + µ(A ∩ B), hence µ(A ∩ B) = 1, thus A ∩ B ∈ F. Thus F is indeed a filter. It is also an ultrafilter by Lemma 1.5.21, because A ∉ F implies S \ A ∈ F.

The converse construction, viz., to generate a binary valued measure from a filter, would require ⋃_{n∈N} An ∈ F if and only if there exists n ∈ N with An ∈ F, for any disjoint family (An)n∈N. This, however, leads to very deep questions of set theory; see [Jec06, Chapter 10] for a discussion.

Let us have a look at an important example.

Example 1.6.15 Let C := {]a, b] | a, b ∈ [0, 1]} be the set of all left open, right closed intervals of the unit interval. Put ℓ(]a, b]) := b − a, hence ℓ(I) is the length of the interval I. Note that ℓ(∅) = ℓ(]a, a]) = 0. Certainly ℓ : C → R+ is monotone and additive.

1. If ⋃_{i=1}^k ]ai, bi] ⊆ ]a, b], and the intervals are disjoint, then ∑_{i=1}^k ℓ(]ai, bi]) ≤ ℓ(]a, b]). The proof proceeds by induction on the number k of intervals. For the induction step we have mutually disjoint intervals with ⋃_{i=1}^{k+1} ]ai, bi] ⊆ ]a, b]. Renumbering, if necessary, we may assume that a1 ≤ b1 ≤ a2 ≤ b2 ≤ . . . ≤ ak ≤ bk ≤ ak+1 ≤ bk+1. Then ∑_{i=1}^k ℓ(]ai, bi]) + ℓ(]ak+1, bk+1]) ≤ ℓ(]a1, bk]) + ℓ(]ak+1, bk+1]) ≤ ℓ(]a, b]), because ℓ is monotone and additive.


2. If ⋃_{i=1}^∞ ]ai, bi] ⊆ ]a, b], and the intervals are disjoint, then ∑_{i=1}^k ℓ(]ai, bi]) ≤ ℓ(]a, b]) for all k, hence

∑_{i=1}^∞ ℓ(]ai, bi]) = sup_{k∈N} ∑_{i=1}^k ℓ(]ai, bi]) ≤ ℓ(]a, b]).

3. If ]a, b] ⊆ ⋃_{i=1}^k ]ai, bi], then ℓ(]a, b]) ≤ ∑_{i=1}^k ℓ(]ai, bi]), with not necessarily disjoint intervals. This is established by induction on k. If k = 1, the assertion is obvious. The induction step proceeds as follows: assume that ]a, b] ⊆ ⋃_{i=1}^{k+1} ]ai, bi]. By renumbering, if necessary, we can assume that ak+1 < b ≤ bk+1. If ak+1 ≤ a, the assertion follows, so let us assume that a < ak+1. Then ]a, ak+1] ⊆ ⋃_{i=1}^k ]ai, bi], so that by the induction hypothesis ℓ(]a, ak+1]) = ak+1 − a ≤ ∑_{i=1}^k ℓ(]ai, bi]). Thus

ℓ(]a, b]) = b − a ≤ (ak+1 − a) + (bk+1 − ak+1) ≤ ∑_{i=1}^{k+1} ℓ(]ai, bi]).

4. Now assume that ]a, b] ⊆ ⋃_{i=1}^∞ ]ai, bi]. This is a little bit more complicated, since we do not know whether the interval ]a, b] is covered already by a finite number of intervals, so we have to resort to a little trick. The interval [a + ε, b] is closed and bounded, hence compact, for every fixed ε > 0; we also know that for each i ∈ N the semi-open interval ]ai, bi] is contained in the open interval ]ai, bi + ε/2^i[, so that we have

[a + ε, b] ⊆ ⋃_{i=1}^∞ ]ai, bi + ε/2^i[.

By the Heine-Borel Theorem 1.5.46 we can find a finite subset of these intervals which covers [a + ε, b], say [a + ε, b] ⊆ ⋃_{i∈K} ]ai, bi + ε/2^i[, with K ⊆ N finite. Hence

]a + ε, b] ⊆ ⋃_{i∈K} ]ai, bi + ε/2^i],

and we conclude from the finite case that

b − (a + ε) = ℓ(]a + ε, b]) ≤ ∑_{i∈K} ℓ(]ai, bi + ε/2^i]) ≤ ∑_{i=1}^∞ (bi + ε/2^i − ai) = ∑_{i=1}^∞ ℓ(]ai, bi]) + ε.

Since ε > 0 was arbitrary, we have established

ℓ(]a, b]) ≤ ∑_{i=1}^∞ ℓ(]ai, bi]).

Thus we have shown

Proposition 1.6.16 Let C be the set of all left open, right closed intervals of the unit interval, and denote by ℓ(]a, b]) := b − a the length of the interval ]a, b] ∈ C. Then ℓ : C → R+ is monotone and countably additive. □

When having a look at C, we note that this family is not closed under complementation, but the complement of a set in C can be represented through elements of C, e.g., ]0, 1] \ ]1/3, 1/2] = ]0, 1/3] ∪ ]1/2, 1]. This is captured through the following definition.

Definition 1.6.17 R ⊆ P (S) is called a semiring iff

1. ∅ ∈ R,

2. R is closed under finite intersections,

3. If B ∈ R, then there exists a finite family of mutually disjoint sets C1, . . . , Ck ∈ R with S \ B = C1 ∪ . . . ∪ Ck.

Thus the complement of a set in R can be represented through a finite disjoint union of elements of R.
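For the intervals this representation is completely explicit. The following sketch (hypothetical code, not from the text; an interval ]a, b] is represented as a pair (a, b)) computes the complement of one interval within ]0, 1]:

# Sketch: in the semiring of half-open intervals ]a, b] inside ]0, 1],
# the complement of one interval is a disjoint union of at most two others.
def complement_in_unit(a, b):
    """Complement of ]a, b] in ]0, 1] as a list of disjoint pairs (c, d)."""
    parts = []
    if a > 0:
        parts.append((0, a))   # the piece ]0, a]
    if b < 1:
        parts.append((b, 1))   # the piece ]b, 1]
    return parts

print(complement_in_unit(1/3, 1/2))   # [(0, 0.333...), (0.5, 1)]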

We want to extend ℓ : C → R+ from the semiring of left open, right closed intervals to a measure λ on the σ-algebra σ(C). This measure is fairly important; it is called the Lebesgue measure on the unit interval.

A first step towards an extension of ℓ to the σ-algebra generated by the intervals is the extension to the algebra generated by them. This can be accomplished easily once this algebra has been identified.

Lemma 1.6.18 Let C be the set of all left open, right closed intervals in ]0, 1]. Then the algebra generated by C consists of all finite disjoint unions of elements of C.

Proof Denote by

D := {⋃_{i=1}^n ]ai, bi] | n ∈ N, a1 ≤ b1 ≤ a2 ≤ b2 ≤ . . . ≤ an ≤ bn}.

Then all elements of D are certainly contained in the algebra generated by C. If we can show that D is an algebra itself, we are done, because then D is the smallest algebra containing C.

D is certainly closed under finite unions and finite intersections, and ∅ ∈ D. Moreover,

]0, 1] \ ⋃_{i=1}^n ]ai, bi] = ]0, a1] ∪ ]b1, a2] ∪ . . . ∪ ]bn, 1],

which is a member of D as well. Thus D is also closed under complementation, hence is an algebra. □

This permits us to extend ℓ to the algebra generated by the intervals:

Corollary 1.6.19 ℓ extends uniquely to the algebra generated by C such that the extension is monotone and countably additive.

Proof Put

ℓ(⋃_{i=1}^n ]ai, bi]) := ∑_{i=1}^n ℓ(]ai, bi]),

whenever the intervals ]ai, bi] ∈ C are mutually disjoint. This is well defined: assume

⋃_{i=1}^n ]ai, bi] = ⋃_{j=1}^m ]cj, dj];

then ]ai, bi] can be represented as the disjoint union of those intervals ]cj, dj] which it contains, so that we have

∑_{i=1}^n ℓ(]ai, bi]) = ∑_{i=1}^n ∑_{j=1}^m ℓ(]ai, bi] ∩ ]cj, dj]) = ∑_{j=1}^m ∑_{i=1}^n ℓ(]cj, dj] ∩ ]ai, bi]) = ∑_{j=1}^m ℓ(]cj, dj]).

We may conclude from Example 1.6.15 that ℓ is countably additive on the algebra. □
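A small numeric sketch of this well-definedness (hypothetical code, not from the text): two different disjoint representations of the same set yield the same total length.

# Sketch: extend the length to finite disjoint unions of half-open intervals,
# given as lists of pairs (a, b), and compare two representations.
def ell(intervals):
    """Total length of a disjoint union of intervals ]a, b]."""
    return sum(b - a for a, b in intervals)

rep1 = [(0.0, 0.25), (0.25, 0.5), (0.75, 1.0)]   # ]0,1/4] + ]1/4,1/2] + ]3/4,1]
rep2 = [(0.0, 0.5), (0.75, 1.0)]                 # ]0,1/2] + ]3/4,1], same set
assert abs(ell(rep1) - ell(rep2)) < 1e-12        # both give 0.75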

For the sake of illustration, let us assume that we have Lebesgue measure constructed already, and let us compute λ(C), where C is the Cantor ternary set constructed in Example 1.6.4 on page 48. The construction of the ternary set is done through sets Cn, each of which is the union of 2^n mutually disjoint intervals of length 3^−n. If I is an interval of length 3^−n, we know that λ(I) = 3^−n, so that λ(Cn) = (2/3)^n. We also know that C1 ⊇ C2 ⊇ . . ., so that we have a descending chain of sets with C = ⋂_{n∈N} Cn and inf_{n∈N} λ(Cn) = 0.

In order to compute λ(C), we need to know something about the behavior of measures when monotone limits of sets are encountered.

Lemma 1.6.20 Let µ : A → [0,∞] be a measure on the σ-algebra A.

1. If (An)n∈N is a monotone increasing sequence of sets in A, and A = ⋃_{n∈N} An, then µ(A) = sup_{n∈N} µ(An).

2. If (An)n∈N is a monotone decreasing sequence of sets in A, and A = ⋂_{n∈N} An, then µ(A) = inf_{n∈N} µ(An), provided µ(Ak) < ∞ for some k ∈ N.

Proof 1. We can write An = ⋃_{i=1}^n Bi with B1 := A1 and Bi := Ai \ Ai−1. Because the An form an increasing sequence, the Bn are mutually disjoint. Assume without loss of generality that µ(An) < ∞ for all n ∈ N (otherwise the assertion is trivial); then by countable additivity and through telescoping

µ(A) = ∑_{i=1}^∞ µ(Bi) = µ(A1) + ∑_{i=1}^∞ (µ(Ai+1) − µ(Ai)) = lim_{n→∞} µ(An) = sup_{n∈N} µ(An).

2. We may assume that µ(A1) < ∞ (drop the first finitely many sets otherwise); then the sequence A1 \ An increases towards A1 \ A, hence

µ(A) = µ(A1) − µ(A1 \ A) = µ(A1) − sup_{n∈N} µ(A1 \ An) = inf_{n∈N} µ(An). □


Ok, so let us return to the discussion of Cantor’s set. We know that λ(Cn) = (2/3)^n and that C1 ⊇ C2 ⊇ C3 ⊇ . . ., so we conclude from Lemma 1.6.20 that

λ(C) = inf_{n∈N} λ(Cn) = 0.

We have identified a geometrically fairly complicated set of measure zero; it is not easy to visualize, since it does not contain any interval of positive length.
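Numerically, the collapse of the measures λ(Cn) is plain to see; this throwaway sketch (hypothetical code) just evaluates (2/3)^n:

# Sketch: lambda(C_n) = (2/3)**n decreases to 0, so by continuity from
# above (Lemma 1.6.20) the Cantor set has Lebesgue measure zero.
measures = [(2 / 3) ** n for n in range(1, 30)]
assert all(x > y for x, y in zip(measures, measures[1:]))   # strictly decreasing
print(measures[-1])   # about 8e-6, tending to 0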

Now fix a semiring C ⊆ P (S) and µ : C → [0,∞] with µ(∅) = 0, which is monotone and countably subadditive. We will first compute an outer approximation for each subset of S by elements of C. But since the subsets of S may be as a whole somewhat inaccessible, and since C may be somewhat small, we try to cover the subsets of S by countable unions of elements of C and take the best approximation we can, i.e., we take the infimum. Define

µ∗(A) := inf{∑_{n∈N} µ(Cn) | A ⊆ ⋃_{n∈N} Cn, Cn ∈ C}

for A ⊆ S. This is the outer measure of A associated with µ.
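On a finite toy example the infimum can be computed by brute force, since a cover by countably many sets reduces to a cover by a finite subfamily. The following sketch is purely illustrative (hypothetical code; the family C and the weights mu are made up and merely mimic a monotone, subadditive µ):

# Sketch: outer measure on a finite ground set, by minimizing the total
# weight over all subfamilies of C that cover A.
from itertools import combinations

def outer_measure(A, C, mu):
    """Minimal cover weight of A by subfamilies of C (inf = min here)."""
    best = float("inf")
    for r in range(len(C) + 1):
        for cover in combinations(C, r):
            if set(A) <= set().union(*cover):
                best = min(best, sum(mu[c] for c in cover))
    return best

C = [frozenset({0}), frozenset({1}), frozenset({0, 1, 2})]
mu = {C[0]: 0.2, C[1]: 0.3, C[2]: 1.0}
print(outer_measure({0, 1}, C, mu))   # 0.5: the cover {0}, {1} beats {0,1,2}
print(outer_measure({2}, C, mu))      # 1.0: only the big set covers 2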

These are some interesting (for us, that is) properties of µ∗.

Lemma 1.6.21 µ∗ : P (S) → [0,∞] is monotone and countably subadditive, and µ∗(∅) = 0. If A ∈ C, then µ∗(A) = µ(A).

Proof 1. Let (An)n∈N be a sequence of subsets of S; put A := ⋃_{n∈N} An. Assume ∑_{n∈N} µ∗(An) < ∞, and fix ε > 0. For each n we find a cover {Cn,m | m ∈ N} ⊆ C of An with

µ∗(An) ≤ ∑_{m∈N} µ(Cn,m) ≤ µ∗(An) + ε/2^n;

thus {Cn,m | n, m ∈ N} ⊆ C is a cover of A with

µ∗(A) ≤ ∑_{n,m∈N} µ(Cn,m) ≤ ∑_{n∈N} µ∗(An) + ε.

Since ε > 0 was arbitrary, we conclude µ∗(A) ≤ ∑_{n∈N} µ∗(An). If, however, ∑_{n∈N} µ∗(An) = ∞, the assertion is immediate.

2. The other properties are readily seen. □

The next step is somewhat mysterious — it has been suggested by Caratheodory around 1914 for the construction of a measure extension. Given a candidate set A, it splits an arbitrary set X = (X ∩ A) ∪ (X ∩ S \ A) along A and looks at what happens to the outer measure. If µ∗(X) = µ∗(X ∩ A) + µ∗(X ∩ S \ A), then X is split additively by A. Those sets A which split every X ⊆ S in this way are considered well behaved, and they are studied next.

Definition 1.6.22 A set A ⊆ S is called µ-measurable iff µ∗(X) = µ∗(X ∩ A) + µ∗(X ∩ S \ A) holds for all X ⊆ S. The set of all µ-measurable sets is denoted by Cµ.

Page 70: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

58 CHAPTER 1. THE AXIOM OF CHOICE AND SOME OF ITS EQUIVALENTS

So take a µ-measurable set A and an arbitrary subset X ⊆ S; then X splits into a part X ∩ A which belongs to A and another one, X ∩ S \ A, outside of A. Measuring these pieces through µ∗, we demand that they add up to µ∗(X) again.
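Continuing the toy example above (this sketch reuses outer_measure, C, and mu from the previous hypothetical snippet), the Caratheodory condition can be checked by brute force over all test sets X. Note that the toy set {0} fails the test, illustrating that not every set need be µ-measurable:

# Sketch: brute-force check of the Caratheodory splitting condition.
from itertools import chain, combinations

def is_measurable(A, S, C, mu, tol=1e-12):
    """Does A split every test set X additively (finite toy case)?"""
    S, A = set(S), set(A)
    all_X = chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))
    return all(
        abs(outer_measure(X, C, mu)
            - outer_measure(X & A, C, mu)
            - outer_measure(X - A, C, mu)) < tol
        for X in map(set, all_X)
    )

S = {0, 1, 2}
print(is_measurable(set(), S, C, mu))   # True: the empty set always splits
print(is_measurable({0}, S, C, mu))     # False here: for X = {0, 2} we get
                                        # mu*(X) = 1.0 < 0.2 + 1.0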

These properties are immediate:

Lemma 1.6.23 The outer measure has these properties

1. µ∗(∅) = 0.

2. µ∗(A) ≥ 0 for all A ⊆ S.

3. µ∗ is monotone.

4. µ∗ is countably subadditive.

Proof We establish only the last property. Here we have to show that µ∗(⋃_{n∈N} An) ≤ ∑_{n∈N} µ∗(An). We may and do assume that all µ∗(An) are finite. Given ε > 0, we find for each n ∈ N a sequence (Bn,k)k∈N in C such that An ⊆ ⋃_{k∈N} Bn,k and ∑_{k∈N} µ(Bn,k) ≤ µ∗(An) + ε/2^n. Thus

∑_{n,k∈N} µ(Bn,k) ≤ ∑_{n∈N} (µ∗(An) + ε/2^n) ≤ ∑_{n∈N} µ∗(An) + ε,

which implies µ∗(⋃_{n∈N} An) ≤ ∑_{n∈N} µ∗(An), because ⋃_{n∈N} An ⊆ ⋃_{n,k∈N} Bn,k, and because ε > 0 was arbitrary. □

Because of countable subadditivity, we conclude:

Corollary 1.6.24 A ∈ Cµ iff µ∗(X ∩ A) + µ∗(X ∩ S \ A) ≤ µ∗(X) for all X ⊆ S. □

Let us have a look at the set of all µ-measurable sets. It turns out that the originally given sets are all µ-measurable, and that Cµ is an algebra.

Proposition 1.6.25 Cµ is an algebra. Also, if µ is additive, then C ⊆ Cµ and µ(A) = µ∗(A) for all A ∈ C.

Proof 1. Cµ is closed under complementation; this is obvious from its definition, and S ∈ Cµ is also clear. So we only have to show that Cµ is closed under finite intersections. For simplicity, denote complementation by ·^c.

Now let A, B ∈ Cµ; we want to show that

µ∗(X) ≥ µ∗((A ∩ B) ∩ X) + µ∗((A ∩ B)^c ∩ X)

for each X ⊆ S; from Corollary 1.6.24 we infer that this implies A ∩ B ∈ Cµ. Splitting X first along B ∈ Cµ and then along A ∈ Cµ, we know

µ∗(X) = µ∗(X ∩ B) + µ∗(X ∩ B^c)
 = µ∗(X ∩ (A ∩ B)) + µ∗(X ∩ (A^c ∩ B)) + µ∗(X ∩ (A ∩ B^c)) + µ∗(X ∩ (A^c ∩ B^c))
 ≥ µ∗(X ∩ (A ∩ B)) + µ∗(X ∩ ((A^c ∩ B) ∪ (A ∩ B^c) ∪ (A^c ∩ B^c)))   (by subadditivity of µ∗)
 (‡)= µ∗(X ∩ (A ∩ B)) + µ∗(X ∩ (A ∩ B)^c).


Equality (‡) uses

(A^c ∩ B) ∪ (A ∩ B^c) ∪ (A^c ∩ B^c) = (A^c ∩ (B ∪ B^c)) ∪ (A ∩ B^c) = A^c ∪ (A ∩ B^c) = A^c ∪ B^c = (A ∩ B)^c.

Hence we see that A ∩B satisfies the defining inequality.

2. We still have to show that C ⊆ Cµ, and that µ∗ extends µ. Let A ∈ C; then S \ A = D1 ∪ . . . ∪ Dk for some mutually disjoint D1, . . . , Dk ∈ C, because C is a semiring. Fix X ⊆ S, and assume that µ∗(X) < ∞ (otherwise the assertion is trivial). Given ε > 0, there exists in C a cover (An)n∈N of X with ∑_{n∈N} µ(An) ≤ µ∗(X) + ε. Now put Bn := A ∩ An and Ci,n := An ∩ Di. Then X ∩ A ⊆ ⋃_{n∈N} Bn with Bn ∈ C, and X ∩ A^c ⊆ ⋃_{n∈N,1≤i≤k} Ci,n with Ci,n ∈ C. Hence

µ∗(X ∩ A) + µ∗(X ∩ A^c) ≤ ∑_{n∈N} µ(Bn) + ∑_{n∈N,1≤i≤k} µ(Ci,n) ≤ ∑_{n∈N} µ(An) ≤ µ∗(X) + ε,

because µ is (finitely) additive. Since ε > 0 was arbitrary, Corollary 1.6.24 yields A ∈ Cµ. Finally, µ∗ is an extension of µ by Lemma 1.6.21. □

But we can in fact say more about the behavior of µ∗ on Cµ: it turns out to be additive on the splitting parts.

Lemma 1.6.26 Let D ⊆ Cµ be a finite or countably infinite family of mutually disjoint sets in Cµ. Then

µ∗(X ∩ ⋃_{D∈D} D) = ∑_{D∈D} µ∗(X ∩ D)

holds for all X ⊆ S.

Proof 1. The proof goes like this: we establish the equality above for finite D, say D = {A1, . . . , An} with Aj ∈ Cµ for 1 ≤ j ≤ n. From this we obtain that the equality holds in the countable case as well, because then

µ∗(X ∩ ⋃_{i=1}^∞ Ai) ≥ µ∗(X ∩ ⋃_{i=1}^n Ai) = ∑_{i=1}^n µ∗(X ∩ Ai)

for all n ∈ N, so that µ∗(X ∩ ⋃_{i=1}^∞ Ai) ≥ ∑_{i=1}^∞ µ∗(X ∩ Ai), which together with countable subadditivity gives the desired result.

2. The proof for µ∗(X ∩ ⋃_{i=1}^n Ai) = ∑_{i=1}^n µ∗(X ∩ Ai) proceeds by induction on n, starting with n = 2. If A1 ∪ A2 = S, this is just the definition that A1 (or A2) is µ-measurable, so the equality holds. If A1 ∪ A2 ≠ S, we note that, testing the measurability of A1 against the set X ∩ (A1 ∪ A2),

µ∗(X ∩ (A1 ∪ A2)) = µ∗((X ∩ (A1 ∪ A2)) ∩ A1) + µ∗((X ∩ (A1 ∪ A2)) ∩ S \ A1).

Evaluating the pieces, we see that

(X ∩ (A1 ∪ A2)) ∩ A1 = X ∩ A1,
(X ∩ (A1 ∪ A2)) ∩ S \ A1 = X ∩ A2,

because A1 ∩ A2 = ∅. The induction step is straightforward:

µ∗(X ∩ ⋃_{i=1}^{n+1} Ai) = µ∗((X ∩ ⋃_{i=1}^n Ai) ∪ (X ∩ An+1))
 = ∑_{i=1}^n µ∗(X ∩ Ai) + µ∗(X ∩ An+1)
 = ∑_{i=1}^{n+1} µ∗(X ∩ Ai). □

We can relax the condition on a set being a member of Cµ if we know that the domain C from which we started is an algebra, and that µ is additive on C. Then we do not have to test whether a candidate set splits all the subsets of S; rather, it is sufficient that A splits S. To be specific:

Proposition 1.6.27 Let C be an algebra, and µ : C → [0,+∞] be additive. Then A ∈ Cµ iff µ∗(A) + µ∗(S \ A) = µ∗(S).

Proof This is a somewhat lengthy and laborious computation, similar to the one above; see [Bog07, 1.11.7, 1.11.8]. □

Returning to the general discussion, we have:

Proposition 1.6.28 Cµ is a σ-algebra, and µ∗ is countably additive on Cµ.

Proof 0. Let (An)n∈N be a countable family of mutually disjoint sets in Cµ; then we have to show that A := ⋃_{n∈N} An ∈ Cµ, i.e., that

µ∗(X ∩ A) + µ∗(X ∩ A^c) ≤ µ∗(X)

for each X ⊆ S (here ·^c is complementation again). Fix X.

1. We know that Cµ is closed under finite unions, so we have for each n ∈ N

µ∗(X) ≥ ∑_{i=1}^n µ∗(X ∩ Ai) + µ∗(X ∩ ⋂_{i=1}^n Ai^c) ≥ ∑_{i=1}^n µ∗(X ∩ Ai) + µ∗(X ∩ A^c),

because ⋂_{i=1}^n Ai^c ⊇ A^c. Letting n → ∞ and using countable subadditivity, we obtain the desired inequality.

2. Thus Cµ is closed under disjoint countable unions. Using the first entrance trick (Exercise 1.36) and the observation that Cµ is an algebra by Proposition 1.6.25, we convert each countable union into a countable union of mutually disjoint sets, so we have shown that Cµ is a σ-algebra. Countable additivity of µ∗ on Cµ follows from Lemma 1.6.26 when putting X := S. □

Summarizing, we have demonstrated this Extension Theorem.

Theorem 1.6.29 Let C be an algebra over a set S, and let µ : C → [0,∞] be monotone and countably additive.

1. There exists an extension of µ to a measure on the σ-algebra σ(C) generated by C.

2. If µ is σ-finite, the extension is uniquely determined.

Proof 0. For establishing the existence of an extension, we simply collect the results obtained so far. For a finite measure we obtain uniqueness by following the construction of σ(C) through transfinite induction, as outlined in Section 1.6.1; note that finiteness is necessary here because Lemma 1.6.20 binds us to finite measures for establishing the limit of a decreasing sequence. Finally, we localize the measure in the σ-finite case to the countably many finite pieces which make up the entire space.

1. Proposition 1.6.28 shows that Cµ is a σ-algebra containing C, and that µ∗ is a measure on Cµ. Hence σ(C) ⊆ Cµ, and we can restrict µ∗ to σ(C). Denote this restriction also by µ; then µ is a measure on σ(C).

2. In order to establish uniqueness, assume first that µ(S) < ∞. Let ν be a measure which extends µ to σ(C). Recall the construction of σ(C) through transfinite induction on page 50. We claim that

µ(A) = ν(A) for all A ∈ Cζ

holds for all ordinals ζ < ω1. Because C is an algebra, it is easy to see that for even ordinals ζ a set A belongs to (Cζ)δ iff there exists a decreasing sequence (An)n∈N ⊆ Cζ with A = ⋂_{n∈N} An; similarly, each element of (Cζ)σ can be represented as the union of an increasing sequence of elements of Cζ if ζ is odd. Assume for the induction step that ζ is even, and let A ∈ Cζ+1; thus A = ⋂_{n∈N} An with A1 ⊇ A2 ⊇ . . . and An ∈ Cζ. Hence by Lemma 1.6.20

µ(A) = µ(⋂_{n∈N} An) = inf_{n∈N} µ(An) = inf_{n∈N} ν(An) = ν(A).

Thus µ and ν coincide on Cζ+1 if ζ is even. One argues similarly, but with a monotone increasing sequence, in the case that ζ is odd. If µ and ν coincide on Cη for all η < ζ for a limit ordinal ζ, then it is clear that they coincide on Cζ as well.

3. Assume that µ(S) = ∞, but that there exists a sequence (Sn)n∈N in C with µ(Sn) < ∞ and S = ⋃_{n∈N} Sn. Because µ(S1 ∪ . . . ∪ Sn) ≤ µ(S1) + . . . + µ(Sn) < ∞, we may and do assume that the sequence is monotonically increasing. Let µn(A) := µ(A ∩ Sn) be the localization of µ to Sn. Each µn has a unique extension to σ(C), and since we have µ(A) = sup_{n∈N} µn(A) for all A ∈ σ(C), the assertion follows. □

But we are not quite done yet, as a glance at Lebesgue measure witnesses: there we started from the semiring of intervals, but our uniqueness theorem states only what happens when we carry out our extension process starting from an algebra. It turns out to be most convenient to have a closer look at the construction of σ-algebras when the family of sets we start from already has some structure. This gives the occasion to introduce Dynkin’s π-λ-Theorem. This is a very important tool, which sometimes simplifies the identification of the σ-algebra generated from some family of sets.

Theorem 1.6.30 (π-λ-Theorem) Let P be a family of subsets of S that is closed under finite intersections (such a family is called a π-class). Then σ(P) is the smallest λ-class containing P, where a family L of subsets of S is called a λ-class iff it is closed under complements and countable disjoint unions.

Proof 1. Let L be the smallest λ-class containing P , then we show that L is a σ-algebra.

2. We show first that it is an algebra. Being a λ-class, L is closed under complementation. Let A ⊆ S; then LA := {B ⊆ S | A ∩ B ∈ L} is a λ-class again: closure under countable disjoint unions is immediate, since A ∩ ⋃_{n∈N} Bn = ⋃_{n∈N} (A ∩ Bn); and if A ∩ B ∈ L, then

A ∩ (S \ B) = A \ B = S \ ((A ∩ B) ∪ (S \ A)),

which is in L, since (A ∩ B) ∩ (S \ A) = ∅, and since L is closed under disjoint unions and complements.

If A ∈ P, then P ⊆ LA, because P is closed under intersections. Because LA is a λ-class, this implies L ⊆ LA for all A ∈ P. Now take B ∈ L; then the preceding argument shows that P ⊆ LB, and again we may conclude that L ⊆ LB. Thus we have shown that A ∩ B ∈ L, provided A, B ∈ L, so that L is closed under finite intersections. Thus L is a Boolean algebra.

3. L is a σ-algebra as well: it is enough to show that L is closed under countable unions. But since

⋃_{n∈N} An = ⋃_{n∈N} (An \ ⋃_{i=1}^{n−1} Ai),

and the sets on the right hand side are mutually disjoint and belong to L (which we know to be an algebra), this follows immediately. □

Consider an immediate and fairly typical application. It states that two finite measures are equal on a σ-algebra, provided they are equal on a generator which is closed under finite intersections. The proof technique, called the Principle of good sets in [Els99], is worth noting: we collect all sets for which the assertion holds into one family of sets and investigate its properties, starting from an originally given set. If we find that the family has the desired property, then we look at the corresponding closure. With this in mind, we have a look at the proof of the following statement.

Lemma 1.6.31 Let µ, ν be finite measures on a σ-algebra σ(B), where B is a family of sets which is closed under finite intersections. Then µ(A) = ν(A) for all A ∈ σ(B), provided µ(B) = ν(B) for all B ∈ B.

Proof We investigate the family of all sets for which the assertion is true. Put

G := {A ∈ σ(B) | µ(A) = ν(A)};

then G has these properties:

• B ⊆ G by assumption.


• Since B is closed under finite intersections, S ∈ B ⊆ G.

• G is closed under complements.

• G is closed under countable disjoint unions; in fact, let (An)n∈N be a sequence of mutually disjoint sets in G and A := ⋃_{n∈N} An; then

µ(A) = ∑_{n∈N} µ(An) = ∑_{n∈N} ν(An) = ν(A),

hence A ∈ G.

But this means that G is a λ-class containing B. Now the smallest λ-class containing B is σ(B) by Theorem 1.6.30, so that we have

σ(B) ⊆ G ⊆ σ(B),

the last inclusion coming from the definition of G. Thus we may conclude that G = σ(B); hence all sets in σ(B) have the desired property. □

We obtain as a slight extension of Theorem 1.6.29:

Theorem 1.6.32 Let C be a semiring over a set S, and let µ : C → [0,∞] be monotone and countably additive.

1. There exists an extension of µ to a measure on the σ-algebra σ(C) generated by C.

2. If µ is σ-finite, then the extension is uniquely determined.

□

The assumption on µ being σ-finite is in fact necessary:

Example 1.6.33 Let S be the semiring of all left open, right closed intervals on R, and put

µ(I) := 0, if I = ∅, and µ(I) := ∞ otherwise.

Then µ has more than one extension to σ(S). For example, let c > 0 and put νc(A) := c · |A|, with |A| the number of elements of A. Plainly, νc extends µ for every c.

Consequently, the assumption that µ is σ-finite cannot be omitted in order to make sure that the extension is uniquely determined.

1.6.4 µ-Measurable Sets

Caratheodory’s approach gives even more than an extension to the σ-algebra generated from a semiring. This is what we will discuss next, in order to point out a connection with the discussion about the Axiom of Choice.

Fix for the time being an outer measure µ on P (S), which we assume to be finite. Call A ⊆ S a µ-null set iff we can find a µ-measurable set A1 with A ⊆ A1 and µ(A1) = 0. Thus a µ-null set is a set which is covered by a measurable set with µ-measure 0. Because µ(X ∩ S \ A) ≤ µ(X) for every X ⊆ S, and because an outer measure is monotone, we conclude that each µ-null set is itself µ-measurable. In the same way we conclude that each set A which can be squeezed between two µ-measurable sets of the same measure (hence A1 ⊆ A ⊆ A2 with µ(A1) = µ(A2)) must be µ-measurable, because in this case A \ A1 ⊆ A2 \ A1 with µ(A2 \ A1) = 0. Hence Cµ is complete in the sense that any A which can be sandwiched in this way is a member of Cµ.

This is a characterization of Cµ using these ideas.

Corollary 1.6.34 Let C be an algebra over a set S and µ : C → R+ monotone and countably additive with µ(∅) = 0. Then these statements are equivalent for A ⊆ S:

1. A ∈ Cµ.

2. There exist A1, A2 ∈ σ(C) with A1 ⊆ A ⊆ A2 and µ(A1) = µ(A2).

Proof The implication 2 ⇒ 1 follows from the discussion above, so we will look at 1 ⇒ 2. But this is trivial. □

Looking back at this development, we see that we can extend our measure far beyond the σ-algebra which is generated from the given semiring. One might even suspect that this extension process gives us the whole power set of the set we started from as the domain for the extended measure. That would of course be tremendously practical, because we then could assign a measure to each subset. But, alas, if the Axiom of Choice is assumed, these hopes are shattered. The following example demonstrates this. Before discussing it, however, we define and characterize µ-measurable sets on a σ-algebra.

If µ is a finite measure on a σ-algebra B, we can define the outer measure µ^*(A) for any subset A ⊆ S in the same way as we did for functions on a semiring. But since the algebraic structure of a σ-algebra is richer, it is not difficult to see that

µ^*(A) = inf{µ(B) | B ∈ B, A ⊆ B}.

This is so because a cover of the set A through a countable union of elements of B is the same as a cover of A through a single element of B, because the σ-algebra B is closed under countable unions. In a similar way we can try to approximate A from the inside, defining the inner measure through

µ_*(A) := sup{µ(B) | B ∈ B, B ⊆ A}.

We can perform the approximation through a sequence of sets, so we are able to precisely fixthe inner and the outer measure through elements of the σ-algebra.

Lemma 1.6.35 Let A ⊆ S and µ be a finite measure on the σ-algebra B.

1. There exists A^* ∈ B such that µ^*(A) = µ(A^*).

2. There exists A_* ∈ B such that µ_*(A) = µ(A_*).


Proof We demonstrate only the first part. For each n ∈ N there exists Bn ∈ B such that A ⊆ Bn and µ(Bn) < µ^*(A) + 1/n. Put An := B1 ∩ . . . ∩ Bn ∈ B; then A ⊆ An, µ(An) < µ^*(A) + 1/n, and (An)n∈N decreases. Let A^* := ⋂_{n∈N} An ∈ B; then µ(A^*) = inf_{n∈N} µ(An) = µ^*(A) by the second part of Lemma 1.6.20, because µ(A1) < ∞. □

The set A^* could be called the measurable closure of A; similarly, A_* is its measurable kernel. Using this terminology, we call a set µ-measurable iff its closure and its kernel give the same value.

Definition 1.6.36 Let µ be a finite measure on the σ-algebra B. A ⊆ S is called µ-measurable iff µ_*(A) = µ^*(A).

Every set in B is µ-measurable, and Bµ is the σ-algebra of all µ-measurable sets.

The example which has been announced above shows us that under the assumption of (AC) not every subset of the unit interval is λ-measurable, where λ is Lebesgue measure. Hence we will present a set whose inner and outer measure are different.

Example 1.6.37 Define x α y iff x − y is rational, for x, y ∈ [0, 1]. Then α is an equivalence relation, because sums and differences of rational numbers are rational again. This is sometimes called Vitali’s equivalence relation. The relation α partitions the interval [0, 1] into equivalence classes. Select from each equivalence class an element (which we can do by (AC)), and denote by V the set of selected elements. Hence V ∩ [x]α contains exactly one element for each x ∈ [0, 1]. We want to show that V is not λ-measurable, where λ is Lebesgue measure.

The set P := Q ∩ [0, 1] is countable. Define Vp := {v + p | v ∈ V} for p ∈ P. If p, q ∈ P are different, then Vp ∩ Vq = ∅, since v1 + p = v2 + q implies v1 − v2 = q − p ∈ Q, thus v1 α v2, so v1 and v2 are in the same class, hence v1 = v2, thus p = q, which is a contradiction.

Put A := ⋃_{p∈P} Vp; then [0, 1] ⊆ A ⊆ [0, 2]: take x ∈ [0, 1], then there exists v ∈ V with x α v, thus r := x − v ∈ Q, hence x ∈ Vr. On the other hand, if x ∈ Vr, then x = v + r, hence 0 ≤ x ≤ 2.

Assume that V is λ-measurable; then each Vp is λ-measurable with λ(Vp) = λ(V ), because Lebesgue measure is invariant under translation. If λ(V ) = 0, then λ(A) = 0 by countable additivity, which is impossible because it would imply λ([0, 1]) = 0, λ being monotone. If λ(V ) > 0, then λ(A) = ∑_{p∈P} λ(Vp) = ∞ by countable additivity, which contradicts A ⊆ [0, 2] and λ([0, 2]) = 2. Hence V is not λ-measurable.

So, let us record for later use: if (AC) holds, then there exists a subset of the unit interval which is not Lebesgue measurable.

1.7 Games

We will now show that games are a tool for replacing (AC) by another axiom, which will be the base for establishing that every subset of [0, 1] is Lebesgue measurable. This will be done through a suitable two-person game. We have two players, Angel and Demon, playing against each other. For simplicity, we assume that playing means offering a natural number, and that the game — like True Love — never ends. Let A be a set of infinite sequences of natural numbers; then the game GA is played as follows. Angel starts with a0 ∈ N, Demon answers with b0 ∈ N, taking Angel’s move a0 into account. Angel replies with a1, taking the game’s history 〈a0, b0〉 into account, then Demon answers with b1, contingent upon 〈a0, b0, a1〉, and so on. Angel wins this game if the sequence 〈a0, b0, a1, b1, . . .〉 is a member of A; otherwise Demon wins.

Let us have a look at strategies. Define

N := N^∞

as the set of all infinite sequences of natural numbers, and let

S := {〈n1, . . . , nk〉 | k ≥ 0, n1, . . . , nk ∈ N}

be the set of all finite sequences of natural numbers, with ε denoting the empty sequence. For easier notation later on, we define appending an element to a finite sequence by 〈n1, . . . , nk〉 ⌢ n := 〈n1, . . . , nk, n〉. Su and Sg denote all sequences of odd, resp. even, length; we assume ε ∈ Sg.

A strategy d for Angel is a map d : Sg → N which works in the following way: a0 := d(ε) is the first move of Angel, Demon replies with b0, then Angel answers with a1 := d(〈a0, b0〉), Demon reacts with b1, which Angel answers with a2 := d(〈a0, b0, a1, b1〉), and so on. If Angel plays according to strategy d and Demon’s moves are given by b := 〈b0, b1, . . .〉 ∈ N, then the game’s events are collected in d ∗ b ∈ N, where we define d ∗ b := 〈a0, b0, a1, b1, . . .〉 with a0 := d(ε) and ak+1 := d(〈a0, b0, . . . , ak, bk〉) for k ≥ 0. Similarly, a strategy a for Demon is a map a : Su → N, working in this manner: if Angel plays a0, Demon answers with b0 := a(〈a0〉), then Angel plays a1, to which Demon replies with b1 := a(〈a0, b0, a1〉), and so on. If Angel’s moves are collected in a := 〈a0, a1, . . .〉, and if Demon plays strategy a, then the entire game is recorded in the sequence a ∗ a; thus we define a ∗ a := 〈a0, b0, a1, b1, . . .〉 with bk := a(〈a0, b0, . . . , ak〉) for k ≥ 0.
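The bookkeeping behind d ∗ b is easy to mechanize. The following sketch (hypothetical code, not from the text) interleaves a strategy for Angel with a fixed, finitely truncated sequence of Demon’s moves:

# Sketch: interleave Angel's strategy d (a function of the even-length
# history) with a fixed sequence b of Demon's moves, truncated to n rounds.
def play(d, b, rounds):
    history = []                              # alternating a0, b0, a1, b1, ...
    for k in range(rounds):
        history.append(d(tuple(history)))     # Angel moves on an even history
        history.append(b[k])                  # Demon's k-th answer
    return history

# A toy strategy: Angel always answers with the sum of the history so far.
d = lambda h: sum(h)                          # d(()) == 0 is Angel's first move
print(play(d, b=[1, 2, 3], rounds=3))         # [0, 1, 1, 2, 4, 3]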

Definition 1.7.1 d : Sg → N is a winning strategy for Angel in the game GA iff

{d ∗ b | b ∈ N} ⊆ A;

a : Su → N is a winning strategy for Demon in the game GA iff

{a ∗ a | a ∈ N} ⊆ N \ A.

It is clear that at most one of Angel and Demon can have a winning strategy. Suppose, on the contrary, that both have one, say d for Angel and a for Demon, and let p be the play in which Angel follows d and Demon follows a. Then p ∈ A, since d is winning for Angel, and p ∉ A, since a is winning for Demon. So this assumption does not make sense.

We have a look at Banach-Mazur games, another formulation of games which is sometimes more convenient. Each Banach-Mazur game can be transformed into a game as defined above.

Before discussing it, it will be convenient to introduce some notation. Let a, b ∈ S, hence a and b are finite sequences of natural numbers. We say that a ≼ b iff a is an initial piece of b (including a = b), so that there exists c ∈ S with b = a⌢c; this c is denoted by b/a. If we want to exclude equality, we write a ≺ b.

Example 1.7.2 The game is played over S; a subset B ⊆ N indicates a winning situation. Angel plays a0 ∈ S, Demon plays b0 with a0 ≺ b0, then Angel plays a1 with b0 ≺ a1, etc. Angel wins this game iff the increasing sequence a0 ≺ b0 ≺ a1 ≺ b1 ≺ . . . converges to an infinite sequence x ∈ B.

We encode this game in the following way. S is countable by Proposition 1.2.5, so write this set as S = {rn | n ∈ N}. Put

A := {〈w0, w1, . . .〉 | rw0 ≺ rw1 ≺ rw2 ≺ . . . converges to a sequence in B}.

It is then immediate that Angel has a strategy for winning the Banach-Mazur game iff it has one for winning the game GA.

Consequently, this class of games will be called Banach-Mazur games throughout (we will encounter other games as well).

1.7.1 Determined Games

Games in which neither Angel nor Demon has a winning strategy are somewhat, well, indeterminate and might be avoided. We see some similarity between a strategy and the selection argument in (AC), because a strategy selects an answer among several possible choices, while a choice function picks elements, each from a given set. This intuitive similarity will be investigated now.

Definition 1.7.3 A game GA is called determined iff either Angel or Demon has a winning strategy.

Suppose that each game GA is determined, no matter what set A ⊆ N we choose; then we can define a choice function for countable families of non-empty subsets of N.

Theorem 1.7.4 Assume that each game is determined. Then there exists a choice function for countable families of non-empty subsets of N.

Proof 1. Let F := {Xn | n ∈ N} be a countable family with ∅ ≠ Xn ⊆ N for n ∈ N. We will define a function f : F → N such that f(Xn) ∈ Xn for all n ∈ N. The idea is to play a game which Angel cannot win, hence for which Demon has a winning strategy. To be specific, if Angel plays 〈a0, a1, . . .〉 and Demon plays b := 〈b0, b1, . . .〉, then Demon wins iff b ∈ Xa0. Since by assumption Demon has a winning strategy a, we then put

f(Xn) := 〈n, 0, 0, . . .〉 ∗ a.

2. Let us look at this idea. Put

A := {〈x0, x1, . . .〉 ∈ N | 〈x1, x2, . . .〉 ∉ Xx0}.

Suppose that Angel starts upon playing a0. Since Xa0 ≠ ∅, Demon can take an arbitrary b ∈ Xa0 and play 〈b0, b1, . . .〉. Hence Angel cannot win, and since each game is determined by assumption, Demon has a winning strategy a.


3. Now look at 〈n, 0, 0, . . .〉 ∗ a ∉ A, which holds because a is a winning strategy for Demon. From the definition of A we see that the tail of this sequence is an element of Xn, so we have found a choice function indeed. □

The space N looks a bit artificial, just a mathematical object to play around with. But this is not the case. It can be shown that there exists a bijection N → R with some desirable properties (we will not enter into this construction, however, but refer the reader to Section 4.4).

We state as a consequence of Theorem 1.7.4:

Corollary 1.7.5 Assume that each game is determined. Then there exists a choice function for countable families of non-empty subsets of R. □

Let us capture the assumption on the existence of a winning strategy for either Angel or Demon in an axiom, the Axiom of Determinacy.

(AD) Each game is determined.

Given Corollary 1.7.5, the relationship of the Axiom of Determinacy to the Axiom of Choice is of interest.

Does (AD) imply (AC)?

The hope of establishing this is shattered, however, by the following observation.

Proposition 1.7.6 If (AC) holds, there exists A ⊆ N such that GA is not determined.

Before entering the proof, we observe that the set SA of all strategies for Angel, resp. SD for Demon, has the same cardinality as the power set P (N) of N.

Proof 0. We have to find a set A ⊆ N such that neither Angel nor Demon has a winning strategy for the game GA. By (AC), the sets SA resp. SD can be well-ordered; by the observation just made we can write

SA = {dα | α < ω1}, SD = {aα | α < ω1}.

1. We will now construct disjoint sets X = {xα | α < ω1} ⊆ N and Y = {yα | α < ω1} ⊆ N, indexed by {α | α < ω1}, which will help define the game. Suppose xβ and yβ are defined for all β < α. Then, because α is countable, the sets {xβ | β < α} and {yβ | β < α} are countable as well, and there are uncountably many b ∈ N such that dα ∗ b ∉ {xβ | β < α}. Take one of them and put yα := dα ∗ b. For the same reason, there are uncountably many a ∈ N such that a ∗ aα ∉ {yβ | β ≤ α}; take one of them and put xα := a ∗ aα.

2. Clearly, X and Y are disjoint. Angel does not have a winning strategy for the game GX: suppose it has one, say d = dα for some α < ω1. But yα = dα ∗ b ∉ X by construction; this is a play consistent with dα which Angel loses, a contradiction. One similarly shows that Demon cannot have a winning strategy for the game GX. Hence this game is not determined. □


1.7.2 Proofs Through Games

Games are a tool for proofs. The basic idea is to attach a statement to a game: if Angel has a strategy for winning the game, then the statement is established; otherwise it is not. Hence we have to encode the statement in such a way that this mechanism can be used, but we also have to establish a scenario in which to argue. The formulation chosen suggests that Angel has to have a winning strategy, which in turn suggests that we assume a framework in which games are determined. But we have seen above that this is not without conflicts when considering (AC).

This section is devoted to establishing that every subset of the unit interval is Lebesgue measurable, provided each game is determined. We have seen in Example 1.6.37 that (AC) implies that there exists a set which is not Lebesgue measurable. Hence “it is natural to postulate that Determinacy holds to the extent that it does not contradict the Axiom of Choice”, as T. Jech writes in his massive treatise of set theory [Jec06, p. 628].

We will discuss another example of using games as tools for proofs when we formulate a Banach-Mazur game for establishing properties of a topological space in Section 3.5.2.

The Goal. We want to show that each subset of the unit interval is measurable, provided each game is determined. This is based on the observation that it is sufficient to establish that λ_*(A) > 0 or λ_*([0, 1] \ A) > 0 for each and every subset A ⊆ [0, 1], where, as above, λ is Lebesgue measure on the unit interval. That this suffices follows from the next observation, which is proved along the lines of the construction for Vitali’s equivalence relation, see Example 1.6.37.

Lemma 1.7.7 Assume that there exists a subset of the unit interval which is not λ-measurable. Then there exists a subset M ⊆ [0, 1] with λ_*(M) = 0 and λ^*(M) = 1. □

The Basic Approach. Given an arbitrary subset X ⊆ [0, 1], we will define a game GX such that, if there exists a winning strategy for Angel, then we can find a measurable subset A ⊆ X which has positive Lebesgue measure (hence λ_*(X) > 0). If there exists, however, a winning strategy for Demon, then we can find a measurable subset A ⊆ [0, 1] with positive Lebesgue measure such that A ∩ X = ∅ (hence λ_*([0, 1] \ X) > 0).

Little Helpers. We need some preparations before we start, so let’s get on with them now so as not to interrupt the flow of discussion later on.

Lemma 1.7.8 Let (Fn)n∈N be a sequence of non-empty subsets of the unit interval [0, 1] such that

1. Each Fn is a finite union of closed intervals.

2. The sequence is monotonically decreasing, hence F1 ⊇ F2 ⊇ . . ..

3. The sequence of diameters diam(Fn) := sup_{x,y∈Fn} |x − y| tends to zero.

Then there exists a unique p ∈ [0, 1] with {p} = ⋂_{n∈N} Fn.


Proof 1. It is clear from the last condition that there can be at most one point in the intersection of this sequence: suppose there are two distinct points p, q in this intersection; then δ := |p − q| > 0. But there exists some n0 ∈ N with diam(Fm) < δ for all m ≥ n0. This is a contradiction.

2. Assume that ⋂_{n∈N} Fn = ∅. Put Gn := [0, 1] \ Fn; then Gn is the union of a finite number of (relatively) open intervals, say Gn = Hn,1 ∪ . . . ∪ Hn,kn, and [0, 1] ⊆ ⋃_{n∈N} Gn. By the Heine-Borel Theorem 1.5.46 there exists a finite set of intervals Hni,ji with 1 ≤ i ≤ r, 1 ≤ ji ≤ kni such that [0, 1] ⊆ ⋃_{i=1}^r Hni,ji. Because the sequence of the Fn decreases, the sequence (Gn)n∈N is increasing, so we find an index N such that Hni,ji ⊆ GN for 1 ≤ i ≤ r. But this means [0, 1] ⊆ GN, thus FN = ∅, contradicting the assumption that all Fn are non-empty. □

A more general version of this lemma will be found in Proposition 3.5.25 and in Proposition 3.6.67; interestingly enough, Proposition 3.5.25 will also be an important tool in the investigation of the game in Section 3.5.2.

Another preparation concerns the convergence of an infinite product.

Lemma 1.7.9 Let (an)n∈N be a sequence of real numbers with 0 < an < 1 for all n ∈ N. Then the following statements are equivalent:

1. ∏_{i∈N} (1 − ai) := lim_{n→∞} ∏_{i=1}^n (1 − ai) exists.

2. ∑_{n∈N} an converges.

Proof One shows easily by induction on n that

∏_{i=1}^n (1 − ai) > 1 − ∑_{i=1}^n ai

for n ≥ 2. Since 0 < an < 1 for all n ∈ N, this implies the equivalence. □

This has an interesting consequence, viz., that we have a positive infinite product, provided the corresponding series converges. To be specific:

Corollary 1.7.10 Let (an)n∈N be a sequence of real numbers with 0 < an < 1 for all n ∈ N. Then the following statements are equivalent:

1. ∏_{i∈N} (1 − ai) is positive.

2. ∑_{n∈N} an converges.

Proof 1. Put Qk := ∏_{i=1}^k (1 − ai) and Q := lim_{k→∞} Qk. Assume that ∑_{n∈N} an converges; then there exists m ∈ N such that dm := ∑_{i=m}^∞ ai < 1. Hence we have

Qn/Qm > 1 − (am+1 + . . . + an) > 1 − dm

for n > m, so that

Q = lim_{k→∞} Qk ≥ Qm · (1 − dm) > 0.


2. On the other hand, if the series diverges, its partial sums a1 + . . . + ak grow beyond every bound. Because 1 − a ≤ 1/(1 + a) for 0 < a < 1, and because ∏_{i=1}^k (1 + ai) ≥ 1 + (a1 + . . . + ak), we obtain

∏_{n∈N} (1 − an) ≤ lim_{k→∞} 1/(1 + (a1 + . . . + ak)) = 0. □
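A quick numerical sanity check of this dichotomy (hypothetical throwaway code, not from the text):

# Sketch: partial products of prod (1 - a_i) for a convergent and a
# divergent series of terms a_i.
def partial_products(a_seq):
    p, out = 1.0, []
    for a in a_seq:
        p *= 1.0 - a
        out.append(p)
    return out

convergent = [1 / (n * n) for n in range(2, 2000)]   # sum of 1/n^2 converges
divergent = [1 / n for n in range(2, 2000)]          # harmonic series diverges
print(partial_products(convergent)[-1])   # stays near 1/2: the product is positive
print(partial_products(divergent)[-1])    # about 1/1999: the product tends to 0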

This observation will be helpful when looking at our game.

The Game. Before discussing the game proper, we set its stage. Fix a sequence (rn)n∈N of positive reals such that ∑_{n∈N} rn < 1 and 1/2 > r1 > r2 > . . ..

Let k ∈ N be a natural number, and define Jk as the collection of sets S with these properties:

• S ⊆ [0, 1] is a finite union of closed intervals with rational end points.

• The diameter diam(S) = sup_{x,y∈S} |x − y| of S is smaller than 1/2^k.

• The Lebesgue measure λ(S) of S is r1 · . . . · rk.

Put J0 := {[0, 1]}; choosing the unit interval is the mandatory first draw of Angel. Note that Jk is countable for all k ∈ N, so that ⋃_{k≥0} Jk is countable as well by Proposition 1.2.6 (it is important to note that this was proved without using (AC)).

The game starts. We fix X ⊆ [0, 1] as the Great Prize; this is the set we want to investigate. Angel starts with choosing the unit interval S0 := [0, 1], Demon chooses a set S1 ∈ J1, then Angel chooses a set S2 ∈ J2 with S2 ⊂ S1 ⊂ S0, Demon chooses a set S3 ⊂ S2 with S3 ∈ J3, and so on. In this way, the game defines a decreasing sequence (Sn)n∈N of closed sets whose diameters tend to zero. By Lemma 1.7.8 there exists exactly one point p with {p} = ⋂_{n∈N} Sn. If p ∈ X, then Angel wins; if p ∈ [0, 1] \ X, then Demon wins.

Analysis of the Game. First note that we will not encode the game into a syntactic form according to the definition of GA. This would require much encoding and decoding between the formal representation and the informal one, so that the basic ideas might get lost. Since life is difficult enough, we stick to the informal representation, trusting that the formal one is easily derived from it, and focus on the ideas behind the game. After all, we want to prove something through this game which is not entirely trivial.

The game spawns a tree rooted at S0 := [0, 1] whose offspring are all those elements S1 of J1 with S1 ⊂ S0. Continuing inductively, assume that we are at node Sk ∈ Jk; then this node has as offspring all elements S ∈ Jk+1 for which S ⊂ Sk holds. Consequently, the tree’s depth will be infinite, because the game continues forever. The offspring of a node will be investigated in a moment.


We define for easier discussion the sets

Wk := {〈S0, . . . , Sk〉 ∈ ∏_{i=0}^k Ji | S0 ⊃ S1 ⊃ . . . ⊃ Sk},
W∗ := ⋃_{k≥0} Wk

as the sets of all paths which are possible in this game. Hence Angel chooses initially S0 = [0, 1], Demon chooses S1 ∈ J1 with S1 ⊂ S0 (hence 〈S0, S1〉 ∈ W1), then Angel chooses S2 ∈ J2 with S2 ⊂ S1, so that 〈S0, S1, S2〉 ∈ W2, etc. W2n is the set of all possible paths after the n-th draw of Angel, and W2n+1 yields the state of affairs after the n-th move of Demon.

For an analysis of strategies, we now fix k ∈ N and a map Γ : Wk → Jk+1 such that Γ(S0, . . . , Sk) ⊂ Sk, hence 〈S0, . . . , Sk〉 ⌢ Γ(S0, . . . , Sk) ∈ Wk+1. Just to have a handy name for it, call such a map admissible at k.

Lemma 1.7.11 Assume Γ is admissible at k. Given 〈S0, . . . , Sk〉 ∈ Wk, there exist m ∈ N and a finite sequence Tk+1,i ∈ Jk+1 for 1 ≤ i ≤ m such that

1. Tk+1,i ⊂ Sk for all i,

2. λ(⋃_{i=1}^m Γ(S0, . . . , Sk, Tk+1,i)) ≥ λ(Sk) · (1 − 2 · rk+1),

3. the sets Γ(S0, . . . , Sk, Tk+1,1), . . . , Γ(S0, . . . , Sk, Tk+1,m) are mutually disjoint.

Proof The sets Tk+1,i are defined by induction. Assume that Tk+1,1, . . . , Tk+1,j are already defined for some j ≥ 0; put

Rj := Sk \ ⋃_{i=1}^j Γ(S0, S1, . . . , Sk, Tk+1,i).

Now we have two possible cases: either

(‡) λ(Rj) > 2 · λ(Sk) · rk+1

or this inequality is false. Note that λ(Sk) = r1 · . . . · rk and 1/2 > rk > rk+1, so that initially λ(R0) = λ(Sk) > 2 · λ(Sk) · rk+1. Now assume that (‡) holds. Because Sk is the union of a finite number of closed intervals, and because Rj does not exhaust Sk, we conclude that Rj contains a subset P with diameter diam(P) ≤ 2^{−(k+1)} and λ(P) > λ(Sk) · rk+1; we can select P in such a way that it is a finite union of intervals. Then there exists Tk+1,j+1 ⊆ P which belongs to Jk+1. Take it. Then the first property is satisfied.

This process continues until inequality (‡) becomes false, which gives the second property. Because

Γ(S0, . . . , Sk, Tk+1,i) ⊂ Tk+1,i ⊂ Sk \ ⋃_{j=1}^{i−1} Γ(S0, . . . , Sk, Tk+1,j),

we conclude that the sets Γ(S0, . . . , Sk, Tk+1,1), . . . , Γ(S0, . . . , Sk, Tk+1,m) are mutually disjoint. □

Now let a be a strategy for Demon, hence a : ⋃_{k≥0} W2k → ⋃_{k≥0} J2k+1 is a map such that a(S0, . . . , S2k) ⊂ S2k. If the game’s history at time k is given by the path 〈S0, . . . , S2k〉 with Angel having played S2k as the last move, then the game continues with a(S0, . . . , S2k) as Demon’s next move, so that the new path is just 〈S0, . . . , S2k〉 ⌢ a(S0, . . . , S2k).

Let’s see what happens if Angel selects the next move according to Lemma 1.7.11. Initially, Angel plays S0, then Demon plays a(S0), so that the game’s history is now 〈S0〉 ⌢ a(S0); let T0,1, . . . , T0,m0 be the sets selected according to Lemma 1.7.11 for this history. Then the possible continuations in the game are ti := 〈S0〉 ⌢ a(S0) ⌢ T0,i for 1 ≤ i ≤ m0, so that Demon’s next move extends ti to ti ⌢ a(ti); thus

Ka(〈S0〉 ⌢ a(S0)) := {〈S0〉 ⌢ a(S0) ⌢ T0,i ⌢ a(〈S0〉 ⌢ a(S0) ⌢ T0,i) | 1 ≤ i ≤ m0} ⊆ W3

describes all possible moves for Demon in this scenario. Given a, this depends on S0 as the history up to that moment, and on the choice of Angel’s moves according to Lemma 1.7.11. To see the pattern, consider Demon’s next move. Take t = 〈t0, t1, t2, t3〉 ∈ Ka(〈S0〉 ⌢ a(S0)); then a(t) ∈ J4 with a(t) ⊂ t3, and we choose T1,1, . . . , T1,m1 according to Lemma 1.7.11 as possible next moves for Angel, so that the set of all possible continuations for Demon given this history is

Ka(t) := {t ⌢ T1,i ⌢ a(t ⌢ T1,i) | 1 ≤ i ≤ m1} ⊆ W5.

This provides a window into what is happening. Now let us look at the broader picture. Denote for t ∈ Wn by Ja(t) the set {t ⌢ Tn,1, . . . , t ⌢ Tn,m}, where Tn,1, . . . , Tn,m are determined for t and a according to Lemma 1.7.11; this is the set of all possible moves for Angel. Hence, given history t, Ja(t) is the set of all possible paths for which Demon has to provide the next move. Then put

J_a^n := ⋃_{s2 ∈ Ja(〈S0〉⌢a(S0))} ⋃_{s4 ∈ Ja(s2⌢a(s2))} . . . ⋃_{s2(n−1) ∈ Ja(s2(n−2)⌢a(s2(n−2)))} Ja(s2(n−1) ⌢ a(s2(n−1)))

with

J_a^1 := Ja(〈S0〉 ⌢ a(S0)).

Finally, define

An := ⋃{a(s2n) | s2n ∈ J_a^n}.

Hence J_a^n contains all possible moves of Angel at time 2n, so that An tells us what Demon can do at time 2n + 1. These are the important properties of (An)n∈N:

Lemma 1.7.12 We have for all n ∈ N:

1. λ(An) ≥ r1 · ∏_{i=1}^n (1 − 2 · r2i),

2. An+1 ⊂ An.

Proof 1. The second property follows immediately from Lemma 1.7.11, so we will focus on the first property. It will be proved by induction on n. We infer from Lemma 1.7.11 that the sets a(s2n) are mutually disjoint when s2n runs through J_a^n.

2. The induction begins at n = 1. We obtain immediately from Lemma 1.7.11 that

λ(A1) = λ(⋃{a(s2) | s2 ∈ Ja(〈S0〉 ⌢ a(S0))}) ≥ r1 · (1 − 2 · r2)


(set Γ := a and k = 1).

3. Induction step n → n + 1. We infer from Lemma 1.7.11 that

(†) λ(⋃{a(s2(n+1)) | s2(n+1) ∈ Ja(s2n ⌢ a(s2n))}) ≥ λ(a(s2n)) · (1 − 2 · r2(n+1)).

Disjointness then implies

λ(An+1) = ∑_{s2n ∈ J_a^n} λ(⋃{a(s2(n+1)) | s2(n+1) ∈ Ja(s2n ⌢ a(s2n))})
 ≥ ∑_{s2n ∈ J_a^n} λ(a(s2n)) · (1 − 2 · r2(n+1))   (inequality (†))
 = λ(⋃_{s2n ∈ J_a^n} a(s2n)) · (1 − 2 · r2(n+1))   (disjointness)
 = λ(An) · (1 − 2 · r2(n+1))   (definition of An)
 ≥ r1 · ∏_{i=1}^{n+1} (1 − 2 · r2i)   (induction hypothesis). □

Now we are getting somewhere — we show that we can find for every element in ⋂_{n∈N} An a strategy so that the moves of Angel and of Demon converge to this point. To be more specific:

Lemma 1.7.13 Assume that Demon adopts strategy a. For every point p ∈ ⋂_{n∈N} An there exists for Angel a strategy dp with this property: if Angel plays dp and Demon plays a, then ⋂_{i=0}^∞ Si = {p}, where S0, S1, . . . are the consecutive moves of the players.

Proof The sets a(s_{2n}) with s_{2n} ∈ J_a^n are mutually disjoint for fixed n, so we find a unique sequence s′_{2n} ∈ J_a^n for which p ∈ a(s′_{2n}). Represent s′_{2n} = ⟨S_0, …, S_{2n}⟩, and let d_p be a strategy for Angel such that d_p(⟨S_0, …, S_{2n−1}⟩ ⌢ a(S_0, …, S_{2n−1})) = S_{2n} holds. Thus p ∈ ⋂_{n∈N} S_n if Angel plays d_p and Demon plays a. ⊣

Now let a be a winning strategy for Demon; then A := ⋂_{n∈N} A_n ⊆ [0, 1] \ X. This is the outcome if Angel plays one of the strategies in {d_p | p ∈ A}. There may be other strategies for Angel than the ones described above, but no matter how Angel plays the game, we will end up in an element not in X. This implies λ(A) ≤ λ∗([0, 1] \ X). But we know from Lemma 1.7.12 that λ(A) ≥ r_1 · ∏_{i=1}^{∞} (1 − 2·r_{2i}) > 0 by Lemma 1.7.9 and its corollary; consequently, λ∗([0, 1] \ X) > 0. If, however, Demon does not have a winning strategy, then Angel has one, provided we assume that the game is determined. The same argument as above then shows that λ∗(X) > 0.

Thus we have shown:

Theorem 1.7.14 If each game is determined, then each subset of the unit interval is λ-measurable. ⊣

We have seen that games are not just for fun, but are a tool for investigating properties of sets. In fact, one can define games for investigating many topological properties, not all as laborious as the one we have defined above.


1.8 Wrapping it up

This summarizes the discussion. Some hints for further information can be found in the Bibliographic Notes. The Lecture Note [Her06] by H. Herrlich and the list of its references is a particularly rich bag of suggestions for further reading. The discussion in P. Taylor's book [Tay99, p. 58] ("Although we, at the cusp of the century, now reject Choice ...") is also worth looking at, albeit from a completely different angle.

This is a small diagram indicating the dependencies discussed here.

[Diagram: arrows connect (WO), (ZL), (AC), (AD), (MI), and (MP); each arrow carries one of the symbols explained below.]

The symbols provide a guide to the corresponding statements:

• Theorem 1.4.20

• Proposition 1.5.1

• Theorem 1.5.38

• Proposition 1.5.3

• Existence of a non-determined game under (AC), Proposition 1.7.6

• Choice function for countable families under (AD), Theorem 1.7.4

• Measurability of every subset of [0, 1] under (AD), Theorem 1.7.14

1.9 Bibliographic Notes

This chapter contains mostly classical topics. The proof of Cantor's enumeration and its consequences for enumerating the set of all finite sequences of natural numbers is taken from [KM76], as is the discussion of ordinals. Jech's representation [Jec06] has been helpful as well, as was [Gol96]. The books by Davis [Dav00] and by Aczel [Acz00] contain some gripping historical information on the subject of early set theory; the monograph [COP01] discusses implications for computing when the Axiom of Foundation (page 5) is weakened.

Term rewriting is discussed in [BN98]; reduction systems (Example 1.3.7) are central to it. Aumann's classic [Aum54], unfortunately not as frequently used as this valuable book should be, helped in discussing Boolean algebras; the proof of the general distributive law in Boolean algebras as well as some exercises have been taken from [Bir67] and from [DP02], see also [Sta97] for finite lattices. The discussion on measure extension follows quite closely the representation given in the first three chapters of [Bil95], with an occasional glimpse at [Els99] and the awesome [Bog07]. Finally, games are introduced as in [Jec06, Chapter 33], see also [Jec73]; the game-theoretic proofs on measurability are taken from [MS64]. Infinite products are discussed at length in the delightful textbook [Bro08], see also [Chr64]. A general


source for this chapter was the exposition by H. Herrlich [Her06], providing a tour d'horizon. A graphic view of the foundational crisis in mathematics at the turn of the twentieth century and B. Russell's attempts to solve it can be found in [DPP09].

1.10 Exercises

Exercise 1.1 The Axiom of Pairs defines ⟨a, b⟩ := {{a}, {a, b}}, see page 4. Using the axioms of ZFC, show that ⟨a, b⟩ = ⟨a′, b′⟩ iff a = a′ and b = b′.

Exercise 1.2 Show that f : A → B is injective iff f^{-1} : P (B) → P (A) is surjective; f is surjective iff f^{-1} is injective.

Exercise 1.3 Define ≤_d on N as in Example 1.3.2. Show that p is prime iff p is a minimal element of N \ {1}.

Exercise 1.4 Order S := P (N) \ {N} by inclusion as in Example 1.4. Show that the set A := {{2·n, 2·n + 1} | n ∈ N} is bounded in S; does A have a smallest lower bound?

Exercise 1.5 Let S be a set and H : P (S) → P (S) be an order preserving map. Show that A := ⋃{X ∈ P (S) | X ⊆ H(X)} is a fixed point of H, i.e., satisfies H(A) = A. Moreover, A is the greatest fixed point of H, i.e., if H(Y) = Y, then Y ⊆ A.

Exercise 1.6 Let f : X → Y and g : Y → X be maps. Using Exercise 1.5, show that there exist disjoint subsets X_1 and X_2 of X and disjoint subsets Y_1 and Y_2 of Y such that X = X_1 ∪ X_2, Y = Y_1 ∪ Y_2 and f[X_1] = Y_1, g[Y_2] = X_2. The map A ↦ X \ g[Y \ f[A]] might be helpful.

This decomposition is attributed to S. Banach.

Exercise 1.7 Use Exercise 1.6 for a proof of the Schröder-Bernstein Theorem 1.2.1.

Exercise 1.8 Show that there exist for the bijection J from Proposition 1.2.3 surjective maps K : N_0 → N_0 and L : N_0 → N_0 such that J(K(x), L(x)) = x, K(x) ≤ x and L(x) ≤ x for all x ∈ N_0.

Exercise 1.9 Construct a bijection from the power set P (N) to R using the Schröder-Bernstein Theorem 1.2.1.

Exercise 1.10 Show using the Schröder-Bernstein Theorem 1.2.1 that the set of all subsets of N of size exactly 2 is countable. Extend this result by showing that the set of all subsets of N of size exactly k is countable. Can you show without (AC) that the set of all finite subsets of N is countable?

Exercise 1.11 Show that ω_1 := {α | α is a countable ordinal} is an ordinal. Show that ω_1 is not countable.

Exercise 1.12 An undirected graph G = (V, E) has nodes V and (undirected) edges E. An edge connecting nodes x and y is written as {x, y}; note x ≠ y. A subgraph G′ = (V′, E′) of G is a graph with V′ ⊆ V and E′ ⊆ E. G is k-colorable iff there exists a map c : V → {1, …, k} such that c(x) ≠ c(y) whenever {x, y} ∈ E is an edge in G. Show that G is k-colorable iff each of its finite subgraphs is k-colorable.
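The finite half of this equivalence can be experimented with directly; a brute-force Haskell sketch for small graphs (names ours):

    -- brute-force k-colorability of a finite graph; edges as pairs (x, y)
    kColorable :: Eq v => Int -> [v] -> [(v, v)] -> Bool
    kColorable k vs es = any ok assignments
      where
        assignments = mapM (\v -> [(v, c) | c <- [1 .. k]]) vs  -- all colorings
        color asg v = maybe 0 id (lookup v asg)
        ok asg = all (\(x, y) -> color asg x /= color asg y) es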


Exercise 1.13 Let B be a Boolean algebra, and define a ⊕ b := (a ∨ b) ∧ −(a ∧ b), as in Section 1.5.6. Show that (B, ⊕, ∧) is a commutative ring.

Exercise 1.14 Complete the proof of Proposition 1.5.3 by proving that (AC) ⇒ (MP).

Exercise 1.15 Complete the proof of Lemma 1.5.36.

Exercise 1.16 Using the notation of Section 1.5.1, show that v∗ |= ϕ iff ϕ ∈ M∗, using induction on the structure of ϕ.

Exercise 1.17 Do Exercise 1.12 again, using the Compactness Theorem 1.5.8.

Exercise 1.18 Let F be an ultrafilter over an infinite set S. Show that if F contains a finite set, then there exists s ∈ S such that F = F_s, the ultrafilter defined by s, see Example 1.5.16.

Exercise 1.19 Consider the ordered set given by the following Hasse diagram [levels from bottom to top: A, B; then C, D, E; then F, G, H; then I, J; then K, L, M; and the top element ⊤]. Discuss whether these values exist, and determine their values if they do:

D ∧ E; D ∨ E; (J ∧ L) ∧ K; (L ∧ E) ∧ K; L ∧ (E ∧ K); C ∧ E; (C ∨ D) ∨ E; sup{C, D, E}; C ∨ (D ∨ E); J ∧ (L ∨ K); (J ∧ L) ∨ (J ∧ K); C ∨ G.

Exercise 1.20 Let L be a lattice. An element s ∈ L with s ≠ ⊥ is called join irreducible iff s = r ∨ t implies s = r or s = t. Element t covers element s iff s < t, and if s < v < t for no element v. Show that if L is a finite distributive lattice, then s is join irreducible iff s covers exactly one element.

Exercise 1.21 Let P be a finite partially ordered set. Show that a down set I ∈ D(P) is join irreducible iff I is a principal down set.

Exercise 1.22 Identify the join-irreducible elements in P (S) for S ≠ ∅ and in the lattice of all open intervals {]a, b[ | a ≤ b}, both ordered by inclusion.

Exercise 1.23 Show that in a lattice one distributive law implies the other one.

Exercise 1.24 Give an example for a down set in a lattice which is not an ideal.

Exercise 1.25 Show that in a distributive lattice c ∧ x = c ∧ y and c ∨ x = c ∨ y for some c implies x = y.

Exercise 1.26 Let (G, +) be a commutative group. Show that the subgroups form a lattice under the subset relation.

Exercise 1.27 Assume that in lattice L there exists for each a, b ∈ L the relative pseudo-complement a → b of a in b; this is the largest element x ∈ L such that a ∧ x ≤ b. Show that a pseudo-complemented lattice is distributive. Furthermore, show that each Boolean algebra is pseudo-complemented. Lattices with pseudo-complements are called Brouwerian lattices or Heyting algebras.

Exercise 1.28 A lattice is called complete iff it contains suprema and infima for arbitrary subsets. Show that a bounded partially ordered set (L, ≤) is a complete lattice if the infimum for arbitrary sets exists. Conclude that the set of all equivalence relations on a set forms a complete lattice under inclusion.

Exercise 1.29 Let S ≠ ∅ be a set, a ∈ S. Compute for the Boolean algebra B := P (S) and the ideal I := P (S \ {a}) the factor algebra B/I.

Exercise 1.30 Show that a topology τ forms a complete Brouwerian lattice under inclusion.

Exercise 1.31 Given a topological space (X, τ), show that the following conditions are equivalent for all x, y ∈ X.

1. x^a ⊆ y^a.

2. x ∈ y^a.

3. x ∈ U implies y ∈ U for all open sets U.

Exercise 1.32 Characterize those ideals I in a Boolean algebra B for which the factor algebra B/I consists of exactly two elements.

Exercise 1.33 Let ∅ ≠ A ⊆ P (S) be a finite family of sets with S ∈ A, say A = {A_1, …, A_n}. Define A_T := ⋂_{i∈T} A_i ∩ ⋂_{i∉T} (S \ A_i) for T ⊆ {1, …, n}.

1. P := {A_T | ∅ ≠ T ⊆ {1, …, n}, A_T ≠ ∅} forms a partition of S.

2. {⋃P_0 | P_0 ⊆ P} is the smallest set algebra over S which contains A.

Exercise 1.34 As in Example 1.6.5 on page 48, let X := {0, 1}^N be the space of all infinite sequences. Put

C := {A × ∏_{j>k} {0, 1} | k ∈ N, A ∈ P({0, 1}^k)}.

Show that C is an algebra.

Exercise 1.35 Let X and C be as in Exercise 1.34. Show that

µ(A × ∏_{j>k} {0, 1}) := |A| / 2^k

defines a monotone and countably additive map µ : C → [0, 1] with µ(∅) = 0.

Exercise 1.36 Show that a countably subadditive and monotone set function on a set algebra is additive.


Chapter 2

Categories

Many areas of Mathematics show surprising structural similarities, which suggests that it might be interesting and helpful to focus on an abstract view, thereby unifying concepts. This view looks at the mathematical objects from the outside and studies the relationships between them, for example groups and homomorphisms, or topological spaces together with continuous maps, or ordered sets with monotone maps. The list could be extended. It leads to the general notion of a category. A category is based on a class of objects together with morphisms for each two objects. Morphisms can be composed; composition follows laws which are considered evident and natural.

This approach has considerable appeal to a software engineer as well. In software engineering, the implementation details of a software system are usually not particularly important from an architectural point of view; they are encapsulated in a component. In contrast, the relationship of components to each other is of interest, because this knowledge is necessary for composing a system from its components. Roughly speaking, the architecture of a software system is characterized both through its components and their interaction, the static part of which can be described through what we may perceive as morphisms.

This has been recognized fairly early in the software architecture community, witnessed by the April 1995 issue of the IEEE Transactions on Software Engineering, which was devoted to software architecture and introduced some categorical language in discussing architectures. So the language of categories offers some attractions to software engineers, as can also be seen from, e.g., [Bar01, Dob03]. We will also see that the tool set of modal logics, another area which is important to software construction, profits substantially from constructions which are firmly grounded in categories.

We will discuss categories here and introduce the reader to the basic constructions. The world of categories is too rich to be captured in these few pages, so we have made an attempt to provide the reader with some basic proficiency in categories, helping her or him to get a grasp of the basic techniques. This modest goal is attained by blending the abstract mathematical development with a plethora of examples. We give a brief overview of the contents.


Overview. The definition of a category and a discussion of their most elementary properties is found in Section 2.1; examples show that categories are indeed a very versatile and general instrument for mathematical modelling. Section 2.2 discusses constructions like products and coproducts, which are familiar from other contexts, in this new language, and we look at pushouts and pullbacks, first in the familiar context of sets, then in a more general setting. Functors are introduced in Section 2.3 for relating categories to each other, and natural transformations permit functors to enter into a relationship. We also show that set-valued functors play a special role, which gives occasion to investigate more deeply the hom-sets of a category. Products and coproducts make an appearance again, but this time as instances of the more general concepts of limits and colimits.

Monads and Kleisli triples as very special functors are introduced and discussed in Section 2.4, their relationship is investigated, and some examples are given which provide an idea about the usefulness of this concept; a small section on monads in the programming language Haskell provides a pointer to the practical use of monads. Next we show that monads are generated from adjunctions. This important concept is introduced and discussed in Section 2.5; we define adjunctions, show by examples that adjunctions are a colorfully blooming and nourished flower in the garden of Mathematics, and give an alternative formulation in terms of units and counits; we then show that each adjunction gives us a monad, and that each monad also generates an adjunction. The latter part is interesting since it gives the occasion of introducing the algebras for a monad; we discuss two examples fairly extensively, indicating what such algebras might look like.

While an algebra provides a morphism F a → a, a coalgebra provides a morphism a → F a. This is introduced and discussed in Section 2.6; many examples show that this concept models a broad variety of applications in the area of systems. Coalgebras and their properties are studied, among them bisimulations, a concept which originates from the theory of concurrent systems and which is now captured coalgebraically. The Kripke models for modal logics provide an excellent playground for coalgebras, so they are introduced in Section 2.7; examples show the broad applicability of this concept (but neighborhood models as a generalization are introduced as well). We go a bit beyond a mere application of coalgebras and also give the construction of the canonical model through Lindenbaum's construction of maximally consistent sets, which, by the way, provides an application of transfinite induction as well. We finally show that coalgebras may be put to use when constructing coalgebraic logics, a very fruitful and general approach to modal logics and their generalizations.

2.1 Basic Definitions

We will define what a category is, and give some examples for categories. It will turn out that this is a very general notion, covering also many formal structures that are studied in theoretical computer science. A very rough description would be to say that a category is a bunch of objects which are related to each other, the relationships being called morphisms. This already gives the gist of the definition: objects which are related to each other. But the relationship has to be made a bit more precise to be amenable to further investigation. So here is the definition of a category.

Definition 2.1.1 A category K consists of a class |K| of objects and, for any objects a, b in |K|, of a set homK(a, b) of morphisms, together with a composition operation ∘ mapping homK(b, c) × homK(a, b) to homK(a, c), with the following properties:

Identity For every object a in |K| there exists a morphism ida ∈ homK(a, a) with f ∘ ida = f = idb ∘ f, whenever f ∈ homK(a, b),

Associativity If f ∈ homK(a, b), g ∈ homK(b, c) and h ∈ homK(c, d), then h ∘ (g ∘ f) = (h ∘ g) ∘ f.

Note that we do not think that a category is based on a set of objects (which would yield difficulties) but rather on a class. In fact, if we insisted on having a set of objects, we could not talk about the category of sets, which is an important specimen of a category. We insist, however, on having sets of morphisms, because we want morphisms to be somewhat clearly represented. Usually we write f : a → b for f ∈ homK(a, b), if the context is clear. Thus if f : a → b and g : b → c, then g ∘ f : a → c; one may think that first f is applied (or executed), and then g is applied to the result of f. Note the order in which the application is written down: g ∘ f means that g is applied to the result of f. The first postulate says that there is an identity morphism ida : a → a for each object a of K which does not have an effect on the other morphisms upon composition, so no matter if you do ida first and then morphism f : a → b, or if you do f first and then idb, you end up with the same result as if doing only f. Associativity is depicted through this diagram

[Diagram: a --f--> b --g--> c --h--> d, with the composites g ∘ f : a → c and h ∘ g : b → d drawn as curved arrows.]

Hence if you take the fast train g ∘ f from a to c first (no stop at b) and then switch to train h, or if you travel first with f from a to b and then change to the fast train h ∘ g (no stop at c), you will end up with the same result.

Given f ∈ homK(a, b), we call object a the domain and object b the codomain of morphism f.
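Since we will later touch upon Haskell when discussing monads, it may help to see this definition rendered in code. The following is a minimal sketch (modelled on Haskell's Control.Category; the laws appear only as comments, since the compiler cannot enforce them), with ordinary functions as the motivating instance:

    import Prelude hiding (id, (.))

    -- morphisms from a to b form the type  cat a b;  objects are the types
    class Category cat where
      id  :: cat a a                        -- the identity morphism id_a
      (.) :: cat b c -> cat a b -> cat a c  -- composition: in g . f, first f, then g

    -- Laws (comments only):
    --   f . id == f == id . f            (identity)
    --   h . (g . f) == (h . g) . f       (associativity)

    -- functions form a category; compare the category Set below
    instance Category (->) where
      id x      = x
      (g . f) x = g (f x)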

Let us have a look at some examples.

Example 2.1.2 The category Set is the most important of them all. It has sets as its class of objects, and the morphisms homSet(a, b) are just the maps from set a to set b. The identity map ida : a → a maps each element to itself, and composition is just composition of maps, which is associative:

(f ∘ (g ∘ h))(x) = f((g ∘ h)(x)) = f(g(h(x))) = (f ∘ g)(h(x)) = ((f ∘ g) ∘ h)(x).

The next example shows that one class of objects can carry more than one kind of morphisms.


Example 2.1.3 The category Rel has sets as its class of objects. Given sets a and b, f ∈ homRel(a, b) is a morphism from a to b iff f ⊆ a × b is a relation. Given set a, define

ida := {⟨x, x⟩ | x ∈ a}

as the identity relation, and define for f ∈ homRel(a, b), g ∈ homRel(b, c) the composition as

g ∘ f := {⟨x, z⟩ | there exists y ∈ b with ⟨x, y⟩ ∈ f and ⟨y, z⟩ ∈ g}.

Because existential quantifiers can be interchanged, composition is associative, and ida serves in fact as the identity element for composition.
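For finite carriers the relational composition above can be spelled out directly; here is a small Haskell sketch (relations as lists of pairs; the names are ours, chosen for the example):

    import Data.List (nub)

    type Rel a b = [(a, b)]   -- a finite relation f ⊆ a × b, as a list of pairs

    -- the identity relation on a finite carrier
    idRel :: [a] -> Rel a a
    idRel xs = [(x, x) | x <- xs]

    -- comp g f = { (x,z) | there is y with (x,y) in f and (y,z) in g }
    comp :: (Eq a, Eq b, Eq c) => Rel b c -> Rel a b -> Rel a c
    comp g f = nub [(x, z) | (x, y) <- f, (y', z) <- g, y == y']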

But morphisms do not need to be maps or relations.

Example 2.1.4 Let (P, ≤) be a partially ordered set. Define P by taking P as the class |P| of objects, and put

homP(p, q) := {⟨p, q⟩} if p ≤ q, and homP(p, q) := ∅ otherwise.

Then idp is ⟨p, p⟩, the only element of homP(p, p); and if f : p → q and g : q → r, then p ≤ q and q ≤ r, hence p ≤ r by transitivity, so that we put g ∘ f := ⟨p, r⟩. Let h : r → s, then

h ∘ (g ∘ f) = h ∘ ⟨p, r⟩ = ⟨p, s⟩ = ⟨q, s⟩ ∘ f = (h ∘ g) ∘ f.

It is clear that idp = ⟨p, p⟩ serves as a neutral element.

A directed graph generates a category through all its finite paths. Composition of two paths is then just their combination, indicating movement from one node to another, possibly via intermediate nodes. But we also have to cater for the situation that we want to stay in a node.

Example 2.1.5 Let G = (V, E) be a directed graph. Recall that a path ⟨p_0, …, p_n⟩ is a finite sequence of nodes such that adjacent nodes form an edge, i.e., such that ⟨p_i, p_{i+1}⟩ ∈ E for 0 ≤ i < n; each node a has an empty path ⟨a, a⟩ attached to it, which may or may not be an edge in the graph. The objects of the category F(G) are the nodes V of G, and a morphism a → b in F(G) is a path connecting a with b in G, hence a path ⟨p_0, …, p_n⟩ with p_0 = a and p_n = b. The empty path serves as the identity morphism, and the composition of morphisms is just their concatenation; this is plainly associative. This category is called the free category generated by graph G.

These two examples base categories on a set of objects; they are instances of small categories. A category is called small iff the objects form a set (rather than a class).

The discrete category is a trivial but helpful example.

Example 2.1.6 Let X ≠ ∅ be a set, and define a category K through |K| := X with

homK(x, y) := {idx} if x = y, and homK(x, y) := ∅ otherwise.

This is the discrete category on X.


Algebraic structures furnish a plentiful source for examples. Let us have a look at groups, and at Boolean algebras.

Example 2.1.7 The category of groups has as objects all groups (G, ·), and as morphisms f : (G, ·) → (H, ∗) all maps f : G → H which are group homomorphisms, i.e., for which f(1_G) = 1_H (with 1_G, 1_H as the respective neutral elements), f(a^{-1}) = (f(a))^{-1} and f(a · b) = f(a) ∗ f(b) always hold. The identity morphism id(G,·) is the identity map, and composition of homomorphisms is composition of maps. Because composition is inherited from category Set, we do not have to check for associativity or for identity.

We do not associate the category with a particular name; it is simply referred to as the category of groups.

Example 2.1.8 Similarly, the category of Boolean algebras has Boolean algebras as objects, and a morphism f : G → H for the Boolean algebras G and H is a map f between the carrier sets with these properties:

f(−a) = −f(a), f(a ∧ b) = f(a) ∧ f(b), f(⊤) = ⊤

(hence also f(⊥) = ⊥, and f(a ∨ b) = f(a) ∨ f(b)). Again, composition of morphisms is composition of maps, and the identity morphism is just the identity map.

The next example deals with transition systems. Formally, a transition system is a directed graph. But whereas discussing a graph usually puts the emphasis on its paths, a transition system is concerned more with the study of, well, the transitions between states, hence the focus is usually more localized. This is reflected when defining morphisms, which, as we will see, come in two flavors.

Example 2.1.9 A transition system (S, →_S) is a set S of states together with a transition relation →_S ⊆ S × S. Intuitively, s →_S s′ iff there is a transition from s to s′. Transition systems form a category: the objects are transition systems, and a morphism f : (S, →_S) → (T, →_T) is a map f : S → T such that s →_S s′ implies f(s) →_T f(s′). This means that a transition from s to s′ in (S, →_S) entails a transition from f(s) to f(s′) in the transition system (T, →_T). Note that the defining condition for f can be written as →_S ⊆ (f × f)^{-1}[→_T] with f × f : ⟨s, s′⟩ ↦ ⟨f(s), f(s′)⟩.

The morphisms in Example 2.1.9 are interesting from a relational point of view. We will require an additional property which, roughly speaking, makes sure that we not only transport transitions through morphisms, but that we are also able to capture transitions which emanate from the image of a state. So we want to be sure that, if f(s) →_T t, we obtain this transition from a transition arising from s in the original system. This idea is formulated in the next example; it will arise again in a very natural manner in Example 2.6.12 in the context of coalgebras.

Example 2.1.10 We continue with transition systems, so we define a category which has transition systems as objects. A morphism f : (S, →_S) → (T, →_T) in the present category is a map f : S → T such that for all s, s′ ∈ S and t′ ∈ T:

Forward: s →_S s′ implies f(s) →_T f(s′),

Backward: if f(s) →_T t′, then there exists s′ ∈ S with f(s′) = t′ and s →_S s′.

The forward condition is already known from Example 2.1.9; the backward condition is new. It states that if we start a transition from some f(s) in T, then this transition originates from some transition starting from s in S; to distinguish these morphisms from the ones considered in Example 2.1.9, they are called bounded morphisms. The identity map S → S yields a bounded morphism, and the composition of bounded morphisms is a bounded morphism again. In fact, let f : (S, →_S) → (T, →_T) and g : (T, →_T) → (U, →_U) be bounded morphisms, and assume that g(f(s)) →_U u′. Then we can find t′ ∈ T with g(t′) = u′ and f(s) →_T t′, hence we find s′ ∈ S with f(s′) = t′ and s →_S s′.

Bounded morphisms are of interest in the study of models for modal logics [BdRV01], see Lemma 2.7.25.
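For finite transition systems both conditions can be checked exhaustively. A small Haskell sketch (transition relations as lists of pairs; the names are ours):

    type Trans s = [(s, s)]   -- a finite transition relation on states of type s

    -- forward: s ->_S s' implies f s ->_T f s'
    forwardOK :: Eq t => Trans s -> Trans t -> (s -> t) -> Bool
    forwardOK stepS stepT f = and [ (f s, f s') `elem` stepT | (s, s') <- stepS ]

    -- backward: f s ->_T t' implies f s' == t' and s ->_S s' for some s'
    backwardOK :: (Eq s, Eq t) => [s] -> Trans s -> Trans t -> (s -> t) -> Bool
    backwardOK states stepS stepT f =
      and [ or [ f s' == t' && (s, s') `elem` stepS | s' <- states ]
          | s <- states, (t, t') <- stepT, t == f s ]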

The next examples reverse arrows when it comes to defining morphisms. The examples so far observed the effects of maps in the direction in which the maps were defined. We will, however, also have an opportunity to operate in the backward direction, and to see what properties the inverse image of a map is supposed to have.

We study this in the context of topological and of measurable spaces.

Example 2.1.11 Let (S, τ) be a topological space, see Definition 1.5.47; hence τ ⊆ P (S) with ∅, S ∈ τ, and τ is closed under finite intersections and arbitrary unions. Given another topological space (T, ϑ), a map f : S → T is called τ-ϑ-continuous iff the inverse image of an open set is open again, i.e., iff f^{-1}[G] ∈ τ for all G ∈ ϑ; this will be discussed in greater depth in Section 3.1.1. The category Top of topological spaces has all topological spaces as objects, and continuous maps as morphisms. The identity (S, τ) → (S, τ) is certainly continuous. Again, it follows that the composition of morphisms yields a morphism, and that composition is associative.

The next example deals with σ-algebras, which are of course also sets of subsets. Measurability is formulated, similarly to continuity, in terms of the inverse rather than the direct image.

Example 2.1.12 Let S be a set, and assume that A is a σ-algebra on S. Then the pair (S, A) is called a measurable space; the elements of the σ-algebra are sometimes called A-measurable sets. The category Meas has as objects all measurable spaces.

Given two measurable spaces (S, A) and (T, B), a map f : S → T is called a morphism of measurable spaces iff f is A-B-measurable. This means that f^{-1}[B] ∈ A for all B ∈ B, hence the set {s ∈ S | f(s) ∈ B} is an A-measurable set for each B-measurable set B. Each σ-algebra is a Boolean algebra, but the definition of a morphism of measurable spaces does not entail that such a morphism induces a morphism of Boolean algebras, as defined in Example 2.1.8. Measurable maps are rather modelled on the prototype of continuous maps, for which the inverse image of an open set is open again. Replace "open" by "measurable", then you obtain the definition of a measurable map. Consequently the behavior of f^{-1} rather than the one of f determines whether f belongs to this distinguished set of morphisms.


Thus the A-B-measurable maps f : S → T are the morphisms f : (S, A) → (T, B) in category Meas. The identity morphism on (S, A) is the identity map (this map is measurable because id^{-1}[A] = A ∈ A for each A ∈ A). Composition of measurable maps yields a measurable map again: let f : (S, A) → (T, B) and g : (T, B) → (U, C), then (g ∘ f)^{-1}[D] = f^{-1}[g^{-1}[D]] ∈ A for D ∈ C, because g^{-1}[D] ∈ B. It is clear that composition is associative, since it is based on composition of ordinary maps.

Now that we know what a category is, we begin constructing new categories from given ones. We start by building another interesting category on category Meas, indicating that a category can be used as a building block for another one.

Example 2.1.13 A measurable space (S, A) together with a probability measure µ on A (see Definition 1.6.12) is called a probability space and written as (S, A, µ). The category Prob of all probability spaces has (you guessed it) all probability spaces as objects; a morphism f : (S, A, µ) → (T, B, ν) is a morphism f : (S, A) → (T, B) in Meas for the underlying measurable spaces such that ν(B) = µ(f^{-1}[B]) holds for all B ∈ B. Thus the ν-probability for event B ∈ B is the same as the µ-probability for all those s ∈ S whose image is in B. Note that f^{-1}[B] ∈ A due to f being a morphism in Meas, so that µ(f^{-1}[B]) is in fact defined.

We go a bit further and combine two measurable spaces into a third one; this requires adjusting the notion of a morphism: morphisms in this new category are basically pairs of morphisms from the underlying category. This shows the flexibility with which we may (and do) manipulate morphisms.

Example 2.1.14 Denote for the measurable space (S, A) by P(S, A) the set of all subprobability measures. Define

β_A(A, r) := {µ ∈ P(S, A) | µ(A) ≥ r},
℘(S, A) := ℘(A) := σ({β_A(A, r) | A ∈ A, 0 ≤ r ≤ 1}).

Thus β_A(A, r) denotes all subprobability measures which evaluate the set A not smaller than r, and ℘(S, A) collects all these sets into a σ-algebra; ℘(A) is sometimes called the weak-*-σ-algebra or the weak σ-algebra associated with A, being the σ-algebra generated by this family of sets (we will usually omit the carrier set from the notation). This makes (P(S, A), ℘(A)) a measurable space, based on the subprobabilities over (S, A).

Let (T, B) be another measurable space. A map K : S → P(T, B) is A-℘(B)-measurable iff {s ∈ S | K(s)(B) ≥ r} ∈ A for all B ∈ B and all r; this follows from Exercise 2.7. We take as objects for our category the triplets ((S, A), (T, B), K), where (S, A) and (T, B) are measurable spaces and K : S → P(T, B) is A-℘(B)-measurable. A morphism

(f, g) : ((S, A), (T, B), K) → ((S′, A′), (T′, B′), K′)

is a pair of morphisms

f : (S, A) → (S′, A′) and g : (T, B) → (T′, B′)


such that

K(s)(g^{-1}[B′]) = K′(f(s))(B′)

holds for all s ∈ S and for all B′ ∈ B′.

The composition of morphisms is defined componentwise:

(f′, g′) ∘ (f, g) := (f′ ∘ f, g′ ∘ g).

Note that f′ ∘ f and g′ ∘ g refer to the composition of maps, while (f′, g′) ∘ (f, g) refers to the newly defined composition in our new spic-and-span category (we should probably use another symbol, but no confusion can arise, since the new composition operates on pairs). The identity morphism for ((S, A), (T, B), K) is just the pair (idS, idT). Because the composition of maps is associative, composition in our new category is associative as well, and because (idS, idT) is composed from identities, it is also an identity.

This category is sometimes called the category of stochastic relations.


Before continuing, we introduce commutative diagrams. Suppose that we have in a category K morphisms f : a → b and g : b → c. The combined morphism g ∘ f is represented graphically as

[Diagram: a --f--> b --g--> c, with the composite g ∘ f : a → c drawn as the diagonal.]

If the morphisms h : a → d and ℓ : d → c satisfy g ∘ f = ℓ ∘ h, we have a commutative diagram; in this case we do not draw the morphism in the diagonal.

[Diagram: the square with f : a → b, g : b → c, h : a → d, ℓ : d → c commutes: g ∘ f = ℓ ∘ h.]

We consider automata next, to get some feeling for the handling of commutative diagrams, and as an illustration of an important formalism looked at through the glasses of categories.

Example 2.1.15 Given sets X and S of inputs and states, respectively, an automaton (X, S, δ) is defined by a map δ : X × S → S. The interpretation is that δ(x, s) is the new state after input x ∈ X in state s ∈ S. Reformulating, δ(x) : s ↦ δ(x, s) is perceived as a map S → S for each x ∈ X, so that the new state now is written as δ(x)(s); manipulating a map with two arguments in this way is called currying and will be examined in Example 2.5.2. The objects of our category of automata are the automata, and an automaton morphism f : (X, S, δ) → (X, S′, δ′) is a map f : S → S′ such that this diagram commutes for all x ∈ X:

[Diagram: the square with top row f : S → S′, left arrow δ(x) : S → S, right arrow δ′(x) : S′ → S′, and bottom row f : S → S′ commutes.]


Hence we have f(δ(x)(s)) = δ′(x)(f(s)) for each x ∈ X and s ∈ S (or, in the old notation, f(δ(x, s)) = δ′(x, f(s))); this means that computing the new state and mapping it through f yields the same result as computing the new state for the mapped one. The identity map S → S yields a morphism, hence automata with these morphisms form a category.

Note that morphisms are defined only for automata with the same input alphabet. This reflects the observation that the input alphabet is usually given by the environment, while the set of states represents a model of the automaton's behavior, hence is at our disposal for manipulation.

Whereas we constructed above new categories from given ones in an ad hoc manner, categories also yield new categories systematically. This is a simple example.

Example 2.1.16 Let K be a category; fix an object x in K. The objects of our new category are the morphisms f ∈ homK(a, x) for an object a. Given objects f ∈ homK(a, x) and g ∈ homK(b, x) in the new category, a morphism ϕ : f → g is a morphism ϕ ∈ homK(a, b) with f = g ∘ ϕ, so that this diagram commutes:

[Diagram: the triangle a --ϕ--> b with f : a → x and g : b → x commutes, f = g ∘ ϕ.]

Composition is inherited from K. The identity idf : f → f is ida ∈ homK(a, a), provided f ∈ homK(a, x). Since the composition in K is associative, we only have to make sure that the composition of two morphisms is a morphism again. This can be read off the following diagram: h ∘ (ψ ∘ ϕ) = (h ∘ ψ) ∘ ϕ = g ∘ ϕ = f.

[Diagram: a --ϕ--> b --ψ--> c, with f : a → x, g : b → x, h : c → x, and the composite ψ ∘ ϕ : a → c.]

This category is sometimes called the slice category K/x; the object x is interpreted as an index, so that a morphism f : a → x serves as an indexing function. A morphism ϕ : a → b in K/x is then compatible with the index operation.

The next example reverses arrows while at the same time maintaining the same class of objects.

Example 2.1.17 Let K be a category. We define Kop, the category dual to K, in the following way: the objects are the same as for the original category, hence |Kop| = |K|, and the arrows are reversed, hence we put homKop(a, b) := homK(b, a) for the objects a, b; the identity remains the same. We have to define composition in this new category. Let f ∈ homKop(a, b) and g ∈ homKop(b, c); then g ∗ f := f ∘ g ∈ homKop(a, c). It is readily verified that ∗ satisfies all the laws for composition from Definition 2.1.1.

The dual category is sometimes helpful because it permits casting notions into a uniform framework.


Example 2.1.18 Let us look at Rel again. The morphisms homRelop(S, T) from S to T in Relop are just the morphisms homRel(T, S) in Rel. Take f ∈ homRelop(S, T), then f ⊆ T × S, hence f^t ⊆ S × T, where the relation

f^t := {⟨s, t⟩ | ⟨t, s⟩ ∈ f}

is the transpose of relation f. The map f ↦ f^t is injective and compatible with composition; moreover it maps homRelop(S, T) onto homRel(S, T). But this means that Relop is essentially the same as Rel.

It is sometimes helpful to combine two categories into a product:

Lemma 2.1.19 Given categories K and L, define the objects of K × L as pairs ⟨a, b⟩, where a is an object in K, and b is an object in L. A morphism ⟨a, b⟩ → ⟨a′, b′⟩ in K × L is comprised of morphisms a → a′ in K and b → b′ in L. Then K × L is a category. ⊣

We have a closer look at morphisms now. Experience tells us that injective and surjective maps are important, so a characterization in a category might be desirable. There is a small but not insignificant catch, however. We have seen that morphisms are not always maps, so we are forced to find a characterization purely in terms of composition and equality, because this is all we have in a category. The following characterization of injective maps provides a clue for a more general definition.

Proposition 2.1.20 Let f : X → Y be a map, then these statements are equivalent.

1. f is injective.

2. If A is an arbitrary set and g1, g2 : A → X are maps with f ∘ g1 = f ∘ g2, then g1 = g2.

Proof 1 ⇒ 2: Assume f is injective and f ∘ g1 = f ∘ g2, but g1 ≠ g2. Thus there exists x ∈ A with g1(x) ≠ g2(x). But f(g1(x)) = f(g2(x)), and since f is injective, g1(x) = g2(x). This is a contradiction.

2 ⇒ 1: Assume the condition holds, but f is not injective. Then there exist x1 ≠ x2 with f(x1) = f(x2). Let A := {∗} and put g1(∗) := x1, g2(∗) := x2, thus f(x1) = (f ∘ g1)(∗) = (f ∘ g2)(∗) = f(x2). By the condition g1 = g2, thus x1 = x2. Another contradiction. ⊣

This leads to a definition of the category version of injectivity as a morphism which is cancelable on the left.

Definition 2.1.21 Let K be a category, a, b objects in K. Then f : a → b is called a monomorphism (or a mono) iff whenever g1, g2 : x → a are morphisms with f ∘ g1 = f ∘ g2, then g1 = g2.

These are some simple properties of monomorphisms, which are also sometimes called monos.

Lemma 2.1.22 In a category K,

1. The identity is a monomorphism.

2. The composition of two monomorphisms is a monomorphism again.

3. If k ∘ f is a monomorphism for some morphism k, then f is a monomorphism.


Proof The first part is trivial. Let f : a → b and g : b → c both be monos. Assume h1, h2 : x → a with (g ∘ f) ∘ h1 = (g ∘ f) ∘ h2. We want to show h1 = h2. By associativity, g ∘ (f ∘ h1) = g ∘ (f ∘ h2). Because g is a mono, we conclude f ∘ h1 = f ∘ h2; because f is a mono, we see h1 = h2.

Finally, let f : a → b and k : b → c. Assume h1, h2 : x → a with f ∘ h1 = f ∘ h2. We claim h1 = h2. Now f ∘ h1 = f ∘ h2 implies k ∘ f ∘ h1 = k ∘ f ∘ h2; since k ∘ f is a mono, h1 = h2. ⊣

In the same way we characterize surjectivity purely in terms of composition, exhibiting a nice symmetry between the two notions.

Proposition 2.1.23 Let f : X → Y be a map, then these statements are equivalent.

1. f is surjective.

2. If B is an arbitrary set and g1, g2 : Y → B are maps with g1 ∘ f = g2 ∘ f, then g1 = g2.

Proof 1 ⇒ 2: Assume f is surjective, g1 ∘ f = g2 ∘ f, but g1(y) ≠ g2(y) for some y. If we can find x ∈ X with f(x) = y, then g1(y) = (g1 ∘ f)(x) = (g2 ∘ f)(x) = g2(y), which would be a contradiction. Thus y ∉ f[X], hence f is not onto, contradicting the assumption.

2 ⇒ 1: Assume that there exists y ∈ Y with y ∉ f[X]. Define g1, g2 : Y → {0, 1, 2} through

g1(y) := 0 if y ∈ f[X], and 1 otherwise;  g2(y) := 0 if y ∈ f[X], and 2 otherwise.

Then g1 ∘ f = g2 ∘ f, but g1 ≠ g2. This is a contradiction. ⊣

This suggests a definition of surjectivity through a morphism which is right cancelable.

Definition 2.1.24 Let K be a category, a, b objects in K. Then f : a → b is called an epimorphism (or an epi) iff whenever g1, g2 : b → c are morphisms with g1 ∘ f = g2 ∘ f, then g1 = g2.

These are some important properties of epimorphisms, which are sometimes called epis:

Lemma 2.1.25 In a category K,

1. The identity is an epimorphism.

2. The composition of two epimorphisms is an epimorphism again.

3. If f ∘ k is an epimorphism for some morphism k, then f is an epimorphism.

Proof We sketch the proof only for the third part:

g1 ∘ f = g2 ∘ f ⇒ g1 ∘ (f ∘ k) = g2 ∘ (f ∘ k) ⇒ g1 = g2. ⊣

This is a small application of the decomposition of a map into an epimorphism and a monomorphism.


Proposition 2.1.26 Let f : X → Y be a map. Then there exists a factorization of f into m ∘ e with e an epimorphism and m a monomorphism.

[Diagram: f : X → Y factors as e : X → M followed by m : M → Y.]

The idea of the proof may best be described in terms of X as inputs, Y as outputs of system f. We collect all inputs with the same functionality, and assign each collection the functionality through which it is defined.

Proof Define

ker(f) := {⟨x1, x2⟩ | f(x1) = f(x2)}

(the kernel of f). This is an equivalence relation on X:

• reflexivity: ⟨x, x⟩ ∈ ker(f) for all x;

• symmetry: if ⟨x1, x2⟩ ∈ ker(f), then ⟨x2, x1⟩ ∈ ker(f);

• transitivity: ⟨x1, x2⟩ ∈ ker(f) and ⟨x2, x3⟩ ∈ ker(f) together imply ⟨x1, x3⟩ ∈ ker(f).

Define

e : X → X/ker(f), x ↦ [x]_{ker(f)};

then e is an epimorphism. In fact, if g1 ∘ e = g2 ∘ e for g1, g2 : X/ker(f) → B for some set B, then g1(t) = g2(t) for all t ∈ X/ker(f), hence g1 = g2.

Moreover,

m : X/ker(f) → Y, [x]_{ker(f)} ↦ f(x)

is well defined, since if [x]_{ker(f)} = [x′]_{ker(f)}, then f(x) = f(x′). It is also a monomorphism. In fact, if m ∘ g1 = m ∘ g2 for arbitrary g1, g2 : A → X/ker(f) for some set A, then f(g1(a)) = f(g2(a)) for all a, hence ⟨g1(a), g2(a)⟩ ∈ ker(f). But this means [g1(a)]_{ker(f)} = [g2(a)]_{ker(f)} for all a ∈ A, so g1 = g2. Evidently f = m ∘ e. ⊣

Looking a bit harder at the diagram, we find that we can say even more, viz., that the decomposition is unique up to isomorphism.

Corollary 2.1.27 If the map f : X → Y can be written as f = m ∘ e = m′ ∘ e′ with epimorphisms e, e′ and monomorphisms m, m′, then there is a bijection b with e′ = b ∘ e and m = m′ ∘ b.

Proof Since the composition of bijections is a bijection again, we may and do assume without loss of generality that e : X → X/ker(f) maps x to its class [x]_{ker(f)}, and that m : X/ker(f) → Y maps [x]_{ker(f)} to f(x). Then we have this diagram for the primed factorization e′ : X → Z


and m′ : Z → Y:

[Diagram: f : X → Y, with e : X → X/ker(f), m : X/ker(f) → Y, e′ : X → Z, m′ : Z → Y, and b : X/ker(f) → Z.]

Note that

[x]_{ker(f)} ≠ [x′]_{ker(f)} ⇔ f(x) ≠ f(x′) ⇔ m′(e′(x)) ≠ m′(e′(x′)) ⇔ e′(x) ≠ e′(x′).

Thus defining b([x]_{ker(f)}) := e′(x) gives an injective map X/ker(f) → Z. Given z ∈ Z, there exists x ∈ X with e′(x) = z, hence b([x]_{ker(f)}) = z, thus b is onto. Finally, m′(b([x]_{ker(f)})) = m′(e′(x)) = f(x) = m([x]_{ker(f)}). ⊣

This factorization of a morphism is called an epi/mono factorization, and we have just shown that such a factorization is unique up to isomorphisms (a.k.a. bijections in Set).

The following example shows that epimorphisms are not necessarily surjective, even if they are maps.

Example 2.1.28 Recall that (M, ∗) is a monoid iff ∗ : M × M → M is associative with a neutral element 0_M. For example, (Z, +) and (N, ·) are monoids; so is the set X∗ of all strings over alphabet X with concatenation as composition, and the empty string as neutral element. A morphism f : (M, ∗) → (N, ‡) is a map f : M → N such that f(a ∗ b) = f(a) ‡ f(b) and f(0_M) = 0_N.

Now let f : (Z, +) → (N, ‡) be a morphism; then f is uniquely determined by the value f(1). This is so since m = 1 + … + 1 (m times) for m > 0, thus f(m) = f(1 + … + 1) = f(1) ‡ … ‡ f(1). Also f(−1) ‡ f(1) = f(−1 + 1) = f(0), so f(−1) is inverse to f(1), hence f(−m) is inverse to f(m). Consequently, if two morphisms map 1 to the same value, then the morphisms are identical.

Note that the inclusion i : x ↦ x is a morphism i : (N_0, +) → (Z, +). We claim that i is an epimorphism. Let g1 ∘ i = g2 ∘ i for some morphisms g1, g2 : (Z, +) → (M, ∗). Then g1(1) = (g1 ∘ i)(1) = (g2 ∘ i)(1) = g2(1). Hence g1 = g2. Thus epimorphisms are not necessarily surjective.

Composition induces maps between the hom-sets of a category, which we are going to study now. Specifically, let K be a fixed category, take objects a and b, and fix for the moment a morphism f : a → b. Then g ↦ f ∘ g maps homK(x, a) to homK(x, b), and h ↦ h ∘ f maps homK(b, x) to homK(a, x) for each object x. We investigate g ↦ f ∘ g first. Define for an object x of K the map

homK(x, f) : homK(x, a) → homK(x, b), g ↦ f ∘ g.


Then homK(x, f) defines a map between morphisms, and we can determine through this map whether or not f is a monomorphism.

Lemma 2.1.29 f : a→ b is a monomorphism iff homK(x, f) is injective for all objects x.

Proof This follows immediately from the observation

f ∘ g1 = f ∘ g2 ⇔ homK(x, f)(g1) = homK(x, f)(g2). ⊣

Dually, define for an object x of K the map

homK(f, x) : homK(b, x) → homK(a, x), g ↦ g ∘ f.

Note that we change directions here: f : a → b corresponds to homK(b, x) → homK(a, x). Note also that we did reuse the name homK(·, ·); but no confusion should arise, because the signature tells us which map we specifically have in mind. Lemma 2.1.29 seems to suggest that surjectivity of homK(f, x) and f being an epimorphism are related. This, however, is not the case. But try this:

Lemma 2.1.30 f : a→ b is an epimorphism iff homK(f, x) is injective for each object x.

Proof homK(f, x)(g1) = homK(f, x)(g2) is equivalent to g1 ∘ f = g2 ∘ f. ⊣

Not surprisingly, an isomorphism is an invertible morphism; this is described in our scenario as follows.

Definition 2.1.31 f : a → b is called an isomorphism iff there exists a morphism g : b → a such that g ∘ f = ida and f ∘ g = idb.

It is clear that morphism g is in this case uniquely determined: let g and g′ be morphisms with the property above, then we obtain g = g ∘ idb = g ∘ (f ∘ g′) = (g ∘ f) ∘ g′ = ida ∘ g′ = g′.

When we are in the category Set of sets with maps, an isomorphism f is bijective. In fact, let g be chosen for f according to Definition 2.1.31; then

h1 ∘ f = h2 ∘ f ⇒ h1 ∘ f ∘ g = h2 ∘ f ∘ g ⇒ h1 = h2,
f ∘ g1 = f ∘ g2 ⇒ g ∘ f ∘ g1 = g ∘ f ∘ g2 ⇒ g1 = g2,

so that the first line makes f an epimorphism, and the second one a monomorphism.

The following lemma is often helpful (and serves as an example of the popular art of diagram chasing).

Lemma 2.1.32 Assume that in the diagram

[Diagram: top row a --f--> b --g--> c; bottom row x --r--> y --s--> z; vertical morphisms k : a → x, ℓ : b → y, m : c → z]

the outer diagram commutes, that the leftmost diagram commutes, and that f is an epimorphism. Then the rightmost diagram commutes as well.


Proof In order to show that m ∘ g = s ∘ ℓ it is enough to show that m ∘ g ∘ f = s ∘ ℓ ∘ f, because we can then cancel f, since f is an epi. But now

(m ∘ g) ∘ f = m ∘ (g ∘ f)
    = (s ∘ r) ∘ k   (commutativity of the outer diagram)
    = s ∘ (r ∘ k)
    = s ∘ (ℓ ∘ f)   (commutativity of the leftmost diagram)
    = (s ∘ ℓ) ∘ f

Now cancel f. ⊣

2.2 Elementary Constructions

In this section we deal with some elementary constructions, showing mainly how some important constructions for sets can be carried over to categories, hence are available in more general structures. Specifically, we will study products and sums (coproducts) as well as pullbacks and pushouts. We will not study more general constructs at present; in particular we will not have a look at limits and colimits. Once products and pullbacks are understood, the step to limits should not be too complicated, similarly for colimits, as the reader can see in the brief discussion in Section 2.3.3.

We fix a category K.

2.2.1 Products and Coproducts

The Cartesian product of two sets is just the set of pairs. In a general category we do not have a characterization through sets and their elements at our disposal, so we have to fill this gap by going back to morphisms. Thus we require a characterization of the product through morphisms. The first thought is using the projections ⟨x, y⟩ ↦ x and ⟨x, y⟩ ↦ y, since a pair can be reconstructed through its projections. But this is not specific enough. An additional characterization of the projections is obtained through factoring: if there is another pair of maps pretending to be projections, they had better be related to the "genuine" projections. This is what the next definition expresses.

Definition 2.2.1 Given objects a and b in K, an object c is called the product of a and b iff

1. there exist morphisms πa : c → a and πb : c → b,

2. for each object d and morphisms αa : d → a and αb : d → b there exists a unique morphism ρ : d → c such that αa = πa ∘ ρ and αb = πb ∘ ρ.

Morphisms πa and πb are called the projections to a resp. b.

Thus αa and αb factor uniquely through πa and πb. Note that we insist on having a unique factor, and that the factor should be the same for both pretenders. We will see in a minute why this is a sensible assumption. If it exists, the product of objects a and b is denoted by a × b; the projections πa and πb are usually understood and not mentioned explicitly.


This diagram depicts the situation:

[Diagram (Product): d with αa : d → a and αb : d → b, the unique ρ : d → c, and the projections πa : c → a, πb : c → b.]

Lemma 2.2.2 If the product of two objects exists, it is unique up to isomorphism.

Proof Let a and b be the objects in question, and assume that c1 and c2 are both products, with morphisms πi,a : ci → a and πi,b : ci → b as the corresponding projections, i = 1, 2.

Because c1 together with π1,a and π1,b is a product, we find a unique morphism ξ : c2 → c1 with π2,a = π1,a ∘ ξ and π2,b = π1,b ∘ ξ; similarly, we find a unique morphism ζ : c1 → c2 with π1,a = π2,a ∘ ζ and π1,b = π2,b ∘ ζ.

[Diagram: c1 and c2 with their projections to a and b, and the morphisms ξ : c2 → c1 and ζ : c1 → c2.]

Now look at ξ ∘ ζ: we obtain

π1,a ∘ ξ ∘ ζ = π2,a ∘ ζ = π1,a,
π1,b ∘ ξ ∘ ζ = π2,b ∘ ζ = π1,b.

Then uniqueness of the factorization implies that ξ ∘ ζ = idc1; similarly, ζ ∘ ξ = idc2. Thus ξ and ζ are isomorphisms. ⊣

Let us have a look at some examples, first and foremost sets.

Example 2.2.3 Consider the category Set with maps as morphisms. Given sets A and B, we claim that A × B together with the projections πA : ⟨a, b⟩ ↦ a and πB : ⟨a, b⟩ ↦ b constitutes the product of A and B in Set. In fact, if ϑA : D → A and ϑB : D → B are maps for some set D, then ρ : d ↦ ⟨ϑA(d), ϑB(d)⟩ satisfies the equations ϑA = πA ∘ ρ and ϑB = πB ∘ ρ, and it is clear that this is the only way to factor, so ρ is uniquely determined.
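In Haskell the mediating morphism of this example is just pairing; a minimal sketch (Control.Arrow offers the same combinator for functions as (&&&)):

    -- the projections of the product A × B
    piA :: (a, b) -> a
    piA = fst

    piB :: (a, b) -> b
    piB = snd

    -- the unique mediating morphism rho for thetaA : d -> a, thetaB : d -> b
    pairing :: (d -> a) -> (d -> b) -> d -> (a, b)
    pairing thetaA thetaB d = (thetaA d, thetaB d)

    -- then: piA . pairing thetaA thetaB == thetaA, and similarly for piB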

If sets carry an additional structure, this demands additional attention.

Example 2.2.4 Let (S, A) and (T, B) be measurable spaces, so we are now in the category Meas of measurable spaces with measurable maps as morphisms, see Example 2.1.12. For constructing a product, one is tempted to take the product S × T in Set and to find a suitable σ-algebra C on S × T such that the projections πS and πT become measurable. Thus C would have to contain π_S^{-1}[A] = A × T and π_T^{-1}[B] = S × B for each A ∈ A and each B ∈ B. Because a σ-algebra is closed under intersections, C would have to contain all measurable rectangles A × B with sides in A and B. So let's try this:

C := σ({A × B | A ∈ A, B ∈ B})


Then clearly πS : (S × T, C) → (S, A) and πT : (S × T, C) → (T, B) are morphisms in Meas. Now let (D, D) be a measurable space with morphisms ϑS : D → S and ϑT : D → T, and define ρ as above through ρ(d) := ⟨ϑS(d), ϑT(d)⟩. We claim that ρ is a morphism in Meas. It has to be shown that ρ^{-1}[C] ∈ D for all C ∈ C. We have a look at all elements of C for which this is true, and we define

G := {C ∈ C | ρ^{-1}[C] ∈ D}.

We plan to use the principle of good sets from page 62. If we can show that G = C, we are done. It is evident that G is a σ-algebra, because the inverse image of a map respects countable Boolean operations. Moreover, if A ∈ A and B ∈ B, then ρ^{-1}[A × B] = ϑ_S^{-1}[A] ∩ ϑ_T^{-1}[B] ∈ D, so that A × B ∈ G, provided A ∈ A, B ∈ B. But now we have

C = σ({A × B | A ∈ A, B ∈ B}) ⊆ G ⊆ C.

Hence each element of C is a member of G, thus ρ is D-C-measurable. Again, the construction shows that there is no other possibility for defining ρ. Hence we have shown that two objects in the category Meas of measurable spaces with measurable maps have a product.

The σ-algebra C which is constructed above is usually denoted by A ⊗ B and is called the product σ-algebra of A and B.

The next example requires a forward reference to the construction of the product measure in Section 4.9. I suggest that you skip such examples on first reading; just to make things easier, I have marked them with a special symbol.

Example 2.2.5 While the category Meas has products, the situation changes when taking probability measures into account, hence when changing to the category Prob of probability spaces, see Example 2.1.13. The product measure µ ⊗ ν of two probability measures µ on σ-algebra A resp. ν on B is the unique probability measure on the product σ-algebra A ⊗ B with (µ ⊗ ν)(A × B) = µ(A) · ν(B) for A ∈ A and B ∈ B; in particular,

πS : (S × T,A⊗ B, µ⊗ ν)→ (S,A, µ),

πT : (S × T,A⊗ B, µ⊗ ν)→ (T,B, ν)

are morphisms in Prob.

Now define S := T := [0, 1] and take in each case the smallest σ-algebra which is generated by the open intervals, hence put A := B := B([0, 1]); λ is Lebesgue measure on B([0, 1]). Define

κ(E) := λ({x ∈ [0, 1] | ⟨x, x⟩ ∈ E})

for E ∈ A ⊗ B (well, we have to show that {x ∈ [0, 1] | ⟨x, x⟩ ∈ E} ∈ B([0, 1]) whenever E ∈ A ⊗ B; this is relegated to Exercise 2.10). Then

πS : (S × T, A ⊗ B, κ) → (S, A, λ),
πT : (S × T, A ⊗ B, κ) → (T, B, λ)

are morphisms in Prob, because

κ(π_S^{-1}[G]) = κ(G × T) = λ({x ∈ [0, 1] | ⟨x, x⟩ ∈ G × T}) = λ(G)


for G ∈ B(S). If we could find a morphism f : (S × T, A ⊗ B, κ) → (S × T, A ⊗ B, λ ⊗ λ) factoring through the projections, f would have to be the identity; this would imply that κ = λ ⊗ λ, but this is not the case: take E := [1/2, 1] × [0, 1/3], then κ(E) = 0, but (λ ⊗ λ)(E) = 1/6.

Thus we conclude that the category Prob of probability spaces does not have products.

The product topology on the Cartesian product of the carrier sets of topological spaces is familiar; open sets in the product are built from open rectangles. The categorical view is that of a product in the category of topological spaces.

Example 2.2.6 Let (S, τ) and (T, ϑ) be topological spaces, and equip the Cartesian product S × T with the product topology τ × ϑ. This is the smallest topology on S × T which contains all the open rectangles G × H with G ∈ τ and H ∈ ϑ. We claim that this is a product in the category Top of topological spaces. In fact, the projections πS : S × T → S and πT : S × T → T are continuous, because, e.g., π_S^{-1}[G] = G × T ∈ τ × ϑ. Now let (D, ρ) be a topological space with continuous maps ξS : D → S and ξT : D → T, and define ζ : D → S × T through ζ : d ↦ ⟨ξS(d), ξT(d)⟩. Then ζ^{-1}[G × H] = ξ_S^{-1}[G] ∩ ξ_T^{-1}[H] ∈ ρ, and since the inverse image of a topology under a map is a topology again, ζ : (D, ρ) → (S × T, τ × ϑ) is continuous. Again, this is the only way to define a morphism ζ so that ξS = πS ∘ ζ and ξT = πT ∘ ζ.

The category arising from a partially ordered set (Example 2.1.4) is investigated next.

Example 2.2.7 Let (P, ≤) be a partially ordered set, considered as a category P. Let a, b ∈ P, and assume that a and b have a product x in P. Thus there exist morphisms πa : x → a and πb : x → b, which means by the definition of this category that x ≤ a and x ≤ b hold, hence that x is a lower bound of a and b. Moreover, if y is such that there are morphisms τa : y → a and τb : y → b, then there exists a unique σ : y → x with τa = πa ∘ σ and τb = πb ∘ σ. Translated into (P, ≤), this means that if y ≤ a and y ≤ b, then y ≤ x (morphisms in P are unique, if they exist). Hence the product x is just the greatest lower bound of a and b.

So the product corresponds to the infimum. This example demonstrates again that productsdo not necessarily exist in a category, and, if they exist, are not always what one would expect.,

Given morphisms f : x → a and g : x → b, and assuming that the product a × b exists, we want to “lift” f and g to the product, i.e., we want to find a morphism h : x → a × b with f = πa ∘ h and g = πb ∘ h. Let us see how this is done in Set: here f : X → A and g : X → B are maps, and one defines the lifted map h : X → A × B through h : x ↦ 〈f(x), g(x)〉, so that the conditions on the projections are satisfied. The next lemma states that this is always possible in a unique way.

Lemma 2.2.8 Assume that the product a × b exists for the objects a and b. Let f : x → a and g : x → b be morphisms. Then there exists a unique morphism q : x → a × b such that f = πa ∘ q and g = πb ∘ q. Morphism q is denoted by f × g.


Proof The diagram looks like this:

              x
        f /  q!|  \ g
         v     v    v
     a <-πa- a × b -πb-> b

Because f : x → a and g : x → b, there exists a unique q : x → a × b with f = πa ∘ q and g = πb ∘ q. This follows from the definition of the product. a

Let us look at the product through our homK-glasses. If a × b exists, and if ζa : d → a and ζb : d → b are morphisms, we know that there is a unique ξ : d → a × b rendering this diagram commutative:

              d
       ζa /  ξ!|  \ ζb
          v    v    v
     a <-πa- a × b -πb-> b

Thus the map

  pd : homK(d, a) × homK(d, b) → homK(d, a × b),  〈ζa, ζb〉 ↦ ξ

is well defined. In fact, we can say more:

Proposition 2.2.9 pd is a bijection.

Proof Assume ξ = pd(f, g) = pd(f′, g′). Then f = πa ∘ ξ = f′ and g = πb ∘ ξ = g′. Thus 〈f, g〉 = 〈f′, g′〉, hence pd is injective. Similarly, one shows that pd is surjective: let h ∈ homK(d, a × b); then πa ∘ h : d → a and πb ∘ h : d → b are morphisms, so pd(πa ∘ h, πb ∘ h) is the unique morphism h′ : d → a × b with πa ∘ h′ = πa ∘ h and πb ∘ h′ = πb ∘ h. Since h has these properties as well, uniqueness implies that h = h′, so h occurs in the image of pd. a
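In Set this bijection can be written down directly; the following Haskell fragment is a minimal sketch (the names pairInto and unpair are ours, not the text's):

  -- Proposition 2.2.9 in Set: p_d sends a pair of morphisms to the unique
  -- morphism into the product; composing with the projections inverts it.
  pairInto :: (d -> a, d -> b) -> (d -> (a, b))
  pairInto (f, g) = \x -> (f x, g x)

  unpair :: (d -> (a, b)) -> (d -> a, d -> b)
  unpair h = (fst . h, snd . h)

  -- pairInto . unpair == id and unpair . pairInto == id (pointwise),
  -- witnessing that p_d is a bijection.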

Let us consider the construction dual to the product.

Definition 2.2.10 Given objects a and b in category K, the object s together with morphisms ia : a → s and ib : b → s is called the coproduct (or the sum) of a and b iff for each object t with morphisms ja : a → t and jb : b → t there exists a unique morphism r : s → t such that ja = r ∘ ia and jb = r ∘ ib. Morphisms ia and ib are called injections, the coproduct of a and b is denoted by a + b.

This is the corresponding diagram:

              t                  (Coproduct)
        ja ↖  ↑r!  ↗ jb
     a --ia--> s <--ib-- b

These are again some simple examples.

Example 2.2.11 Let (P, ≤) be a partially ordered set, and consider category P, as in Example 2.2.7. The coproduct of the elements a and b is just the supremum sup{a, b}. This is shown with exactly the same arguments which have been used in Example 2.2.7 for showing that the product of two elements corresponds to their infimum. ,

And then there is of course category Set.

Example 2.2.12 Let A and B be disjoint sets. Then S := A ∪ B together with the injections

  iA : A → S, a ↦ a   and   iB : B → S, b ↦ b

forms the coproduct of A and B. In fact, if T is a set with maps jA : A → T and jB : B → T, then define r : S → T through

  r(s) := jA(a), if s = iA(a),
  r(s) := jB(b), if s = iB(b).

Then jA = r ∘ iA and jB = r ∘ iB, and these definitions are the only possible ones.

Note that we needed disjointness of the participating sets for this construction to work. Consider for example A := {−1, 0}, B := {0, 1} and let T := {−1, 0, 1} with jA(x) := −1, jB(x) := +1. If we simply took S := A ∪ B with the inclusions as injections, r(0) would have to be both jA(0) = −1 and jB(0) = +1, so jA and jB cannot be factored.

If the sets are not disjoint, we first do a preprocessing step and embed them, so that the embedded sets are disjoint; the injections have to be adjusted accordingly. So the following construction works: given sets A and B, define S := {〈a, 1〉 | a ∈ A} ∪ {〈b, 2〉 | b ∈ B} with iA : a ↦ 〈a, 1〉 and iB : b ↦ 〈b, 2〉. Note that we do not take a product like A × {1}, but rather use this very specific construction; this is so since the product is determined uniquely only up to isomorphism, so we might not have gained anything by using that product. Of course, one has to be sure that the sum does not depend in an essential way on this embedding. ,
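In Haskell the tagging is built into the language: Either is the standard rendering of the coproduct in Set, with the constructors playing the role of the injections. A minimal sketch (copair is our name for the factoring map r):

  -- the coproduct in Set, with the disjointness preprocessing built in:
  -- Left and Right tag the summands, so A and B never collide
  inA :: a -> Either a b
  inA = Left

  inB :: b -> Either a b
  inB = Right

  -- the unique factoring morphism r determined by jA and jB
  copair :: (a -> t) -> (b -> t) -> Either a b -> t
  copair jA jB (Left a)  = jA a
  copair jA jB (Right b) = jB b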

The question of uniqueness is answered through this observation. It relates the coproduct inK to the product in the dual category Kop (see Example 2.1.17).

Proposition 2.2.13 The coproduct s of objects a and b with injections ia : a → s and ib : b → s in category K is the product in category Kop with projections ia : s →op a and ib : s →op b.

Proof Reverse the arrows in diagram (Coproduct) on page 97 to obtain diagram (Product) on page 94. a

Corollary 2.2.14 If the coproduct of two objects in a category exists, it is unique up to isomorphism.

Proof Proposition 2.2.13 together with Lemma 2.2.2. a

Let us have a look at the coproduct for topological spaces.

Example 2.2.15 Given topological spaces (S, τ) and (T, ϑ), we may and do assume that S and T are disjoint. Otherwise wrap the elements of the sets accordingly; put

  A† := {〈a, 1〉 | a ∈ A},  B‡ := {〈b, 2〉 | b ∈ B},

and consider the topological spaces (S†, {G† | G ∈ τ}) and (T‡, {H‡ | H ∈ ϑ}) instead of (S, τ) and (T, ϑ). Define on the coproduct S + T of S and T in Set with injections iS and iT the topology

  τ + ϑ := {W ⊆ S + T | iS⁻¹[W] ∈ τ and iT⁻¹[W] ∈ ϑ}.

This is a topology: both ∅ and S + T are members of τ + ϑ, and since τ and ϑ are topologies, τ + ϑ is closed under finite intersections and arbitrary unions. Moreover, both iS : (S, τ) → (S + T, τ + ϑ) and iT : (T, ϑ) → (S + T, τ + ϑ) are continuous; in fact, τ + ϑ is the largest topology on S + T with this property.

Now assume that jS : (S, τ) → (R, ρ) and jT : (T, ϑ) → (R, ρ) are continuous maps, and let r : S + T → R be the unique map determined by the coproduct in Set. Wouldn't it be nice if r were continuous? Actually, it is. Let W ∈ ρ be open in R, then iS⁻¹[r⁻¹[W]] = (r ∘ iS)⁻¹[W] = jS⁻¹[W] ∈ τ, and similarly iT⁻¹[r⁻¹[W]] ∈ ϑ, thus by definition r⁻¹[W] ∈ τ + ϑ. Hence we have found the factorization jS = r ∘ iS and jT = r ∘ iT in the category Top. This factorization is unique, because it is inherited from the unique factorization in Set. Hence we have shown that Top has finite coproducts. ,

A similar construction applies to the category of measurable spaces.

Example 2.2.16 Let (S, A) and (T, B) be measurable spaces; we may assume again that the carrier sets S and T are disjoint. Take the injections iS : S → S + T and iT : T → S + T from Set. Then

  A + B := {W ⊆ S + T | iS⁻¹[W] ∈ A and iT⁻¹[W] ∈ B}

is a σ-algebra, and iS : (S, A) → (S + T, A + B) and iT : (T, B) → (S + T, A + B) are measurable. The unique factorization property is established in exactly the same way as in Top. ,

Example 2.2.17 Let us consider the category Rel of relations, which is based on sets as objects. If S and T are sets, we again may and do assume that they are disjoint. Then S + T = S ∪ T together with the injections

  IS := {〈s, iS(s)〉 | s ∈ S},  IT := {〈t, iT(t)〉 | t ∈ T}

forms the coproduct, where iS and iT are the injections into S + T from Set. In fact, we have to show that we can find for given relations qS ⊆ S × D and qT ⊆ T × D a unique relation Q ⊆ (S + T) × D with qS = IS · Q and qT = IT · Q (the composition of relations is written here in diagrammatic order). The choice is fairly straightforward: define

  Q := {〈iS(s), q〉 | 〈s, q〉 ∈ qS} ∪ {〈iT(t), q〉 | 〈t, q〉 ∈ qT}.

Thus

  〈s, q〉 ∈ IS · Q ⇔ there exists x with 〈s, x〉 ∈ IS and 〈x, q〉 ∈ Q ⇔ 〈s, q〉 ∈ qS.

Hence qS = IS · Q, and similarly qT = IT · Q. It is clear that no other choice is possible.

Consequently, the coproduct is the same as in Set. ,

We have just seen in a simple example that dualizing, i.e., going to the dual category, is very helpful. Instead of proving directly that the coproduct is uniquely determined up to isomorphism, if it exists, we turned to the dual category and reused the already established result that the product is uniquely determined, casting it into a new context. The duality, however, is a purely structural property; it usually does not help us with specific constructions. This became apparent when we constructed the coproduct of two sets; it did not help here at all that we knew how to construct the product of two sets, even though product and coproduct are intimately related through dualization. We will make the same observation when we deal with pullbacks and pushouts.

2.2.2 Pullbacks and Pushouts

Sometimes one wants to complete the square as in the diagram below on the left hand side:

         b                     d --i2--> b
         |g                  i1|         |g
         v                     v         v
    a --f--> c                 a --f---> c

Hence one wants to find an object d together with morphisms i1 : d → a and i2 : d → b rendering the diagram on the right hand side commutative. This completion should be as coarse as possible in the following sense: if we have another object, say e, with morphisms j1 : e → a and j2 : e → b such that f ∘ j1 = g ∘ j2, then we want to be able to uniquely factor through i1 and i2.

This is captured in the following definition.

Definition 2.2.18 Let f : a → c and g : b → c be morphisms in K with the same codomain. An object d together with morphisms i1 : d → a and i2 : d → b is called a pullback of f and g iff

1. f ∘ i1 = g ∘ i2,

2. if e is an object with morphisms j1 : e → a and j2 : e → b such that f ∘ j1 = g ∘ j2, then there exists a unique morphism h : e → d such that j1 = i1 ∘ h and j2 = i2 ∘ h.

If we postulate the existence of the morphism h : e → d, but do not insist on its uniqueness, then d with i1 and i2 is called a weak pullback.

A diagram for a pullback looks like this: the inner square

    d --i2--> b
  i1|         |g
    v         v
    a --f---> c

commutes, and each competitor e with j1 : e → a and j2 : e → b such that f ∘ j1 = g ∘ j2 factors uniquely through d via h : e → d with j1 = i1 ∘ h and j2 = i2 ∘ h.

It is clear that a pullback is unique up to isomorphism; this is shown in exactly the same way as in Lemma 2.2.2. Let us have a look at Set as an important example to get a first impression of the inner workings of a pullback.

Example 2.2.19 Let f : X → Z and g : Y → Z be maps. We claim that

  P := {〈x, y〉 ∈ X × Y | f(x) = g(y)}

together with the projections πX : 〈x, y〉 ↦ x and πY : 〈x, y〉 ↦ y is a pullback for f and g. Let 〈x, y〉 ∈ P, then

  (f ∘ πX)(x, y) = f(x) = g(y) = (g ∘ πY)(x, y),

so that the first condition is satisfied. Now assume that jX : T → X and jY : T → Y satisfy f(jX(t)) = g(jY(t)) for all t ∈ T. Thus 〈jX(t), jY(t)〉 ∈ P for all t, and defining r(t) := 〈jX(t), jY(t)〉, we obtain jX = πX ∘ r and jY = πY ∘ r. Moreover, this is the only possibility to define a factor map with the desired property.

An interesting special case occurs for X = Y and f = g. Then P = ker (f), so that the kernelof a map occurs as a pullback in category Set. ,
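For finite carriers the pullback of Example 2.2.19 can be computed quite literally; here is a small Haskell sketch (the function name pullback is ours):

  -- the pullback of f and g in Set: the subset of the product on which
  -- f and g agree, here represented as a list of pairs
  pullback :: Eq z => [x] -> [y] -> (x -> z) -> (y -> z) -> [(x, y)]
  pullback xs ys f g = [ (x, y) | x <- xs, y <- ys, f x == g y ]

  -- the projections are fst and snd; taking ys == xs and g == f yields
  -- the kernel ker(f) mentioned above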

As an illustration for the use of a pullback construction, look at this simple statement.

Lemma 2.2.20 Assume that d with morphisms ia : d → a and ib : d → b is a pullback forf : a→ c and g : b→ c. If g is a mono, so is ia.

Proof Let g1, g2 : e → d be morphisms with ia ∘ g1 = ia ∘ g2. We have to show that g1 = g2 holds. If we know that ib ∘ g1 = ib ∘ g2, we may use the definition of a pullback and capitalize on the uniqueness of the factorization. But let's see.

From ia ∘ g1 = ia ∘ g2 we conclude f ∘ ia ∘ g1 = f ∘ ia ∘ g2, and because f ∘ ia = g ∘ ib, we obtain g ∘ ib ∘ g1 = g ∘ ib ∘ g2. Since g is a mono, we may cancel it on the left of this equation, and we obtain, as desired, ib ∘ g1 = ib ∘ g2.

But since we have a pullback, there exists a unique h : e → d with ia ∘ h = ia ∘ g1 (= ia ∘ g2) and ib ∘ h = ib ∘ g1 (= ib ∘ g2). We see that the morphisms g1, g2 and h have the same properties with respect to factoring, so they must be identical by uniqueness. Hence g1 = h = g2, and we are done. a

This is another simple example for the use of a pullback in Set.

Example 2.2.21 Let R be an equivalence relation on a set X with projections π1 : 〈x1, x2〉 ↦ x1 and π2 : 〈x1, x2〉 ↦ x2, both maps R → X. Then

    R --π2--> X
  π1|         |ηR
    v         v
    X --ηR--> X/R

(with ηR : x ↦ [x]R) is a pullback diagram. In fact, the diagram commutes. Let α, β : M → X be maps with ηR ∘ α = ηR ∘ β, thus [α(m)]R = [β(m)]R for all m ∈ M; hence 〈α(m), β(m)〉 ∈ R for all m. The only map ϑ : M → R with α = π1 ∘ ϑ and β = π2 ∘ ϑ is ϑ(m) := 〈α(m), β(m)〉. ,


Pullbacks are compatible with products in a sense which we will make precise in a moment.Before we do that, however, we need an auxiliary statement:

Lemma 2.2.22 Assume that the products a × a′ and b × b′ exist in category K. Given morphisms f : a → b and f′ : a′ → b′, there exists a unique morphism f × f′ : a × a′ → b × b′ such that

  πb ∘ (f × f′) = f ∘ πa,  πb′ ∘ (f × f′) = f′ ∘ πa′.

Proof Apply the definition of a product to the morphisms f ∘ πa : a × a′ → b and f′ ∘ πa′ : a × a′ → b′. a

The morphism f × f′ constructed in the lemma renders both parts of this diagram commutative:

    a <--πa-- a × a′ --πa′--> a′
    |            |              |
   f|       f×f′|              |f′
    v            v              v
    b <--πb-- b × b′ --πb′--> b′

We note that × is thus overloaded for morphisms; a look at domains and codomains indicates without ambiguity, however, which version is intended.

Quite apart from its general interest, this is what we need Lemma 2.2.22 for.

Lemma 2.2.23 Assume that we have these pullbacks:

    a --f--> b         a′ --f′--> b′
    |        |          |          |
   g|        |h       g′|          |h′
    v        v          v          v
    c --k--> d         c′ --k′--> d′

Then this is a pullback diagram as well:

    a × a′ --f×f′--> b × b′
       |                |
   g×g′|                |h×h′
       v                v
    c × c′ --k×k′--> d × d′

Proof 1. We show first that the diagram commutes. It is sufficient to compute the compositions with the projections; from uniqueness, equality will then follow. Here we go:

  πd ∘ (h × h′) ∘ (f × f′) = h ∘ πb ∘ (f × f′) = h ∘ f ∘ πa,
  πd ∘ (k × k′) ∘ (g × g′) = k ∘ πc ∘ (g × g′) = k ∘ g ∘ πa = h ∘ f ∘ πa.

A similar computation is carried out for πd′.

2. Let j : t → c × c′ and ℓ : t → b × b′ be morphisms such that (k × k′) ∘ j = (h × h′) ∘ ℓ; then we claim that there exists a unique morphism r : t → a × a′ such that j = (g × g′) ∘ r and ℓ = (f × f′) ∘ r. The plan is to obtain r from the projections and then show that this morphism is unique.

3. We show that the square with the morphisms πb ∘ ℓ : t → b and πc ∘ j : t → c over the left hand pullback commutes. We have

  k ∘ (πc ∘ j) = (k ∘ πc) ∘ j = (πd ∘ (k × k′)) ∘ j
              = πd ∘ ((k × k′) ∘ j) (‡)= πd ∘ ((h × h′) ∘ ℓ)
              = (πd ∘ (h × h′)) ∘ ℓ = (h ∘ πb) ∘ ℓ = h ∘ (πb ∘ ℓ).

Here Lemma 2.2.22 enters in the second and in the sixth equation, and (‡) is the assumption on j and ℓ. Using the primed part of Lemma 2.2.22 we obtain in the same way k′ ∘ (πc′ ∘ j) = h′ ∘ (πb′ ∘ ℓ).

Because the left hand side in the assumption is a pullback diagram, there exists a unique morphism ρ : t → a with πc ∘ j = g ∘ ρ and πb ∘ ℓ = f ∘ ρ. Similarly, there exists a unique morphism ρ′ : t → a′ with πc′ ∘ j = g′ ∘ ρ′ and πb′ ∘ ℓ = f′ ∘ ρ′.

4. Put r := ρ × ρ′, then r : t → a × a′, and we compute, using Lemma 2.2.22 together with the definition of ρ × ρ′ from Lemma 2.2.8:

  πc ∘ (g × g′) ∘ (ρ × ρ′) = g ∘ πa ∘ (ρ × ρ′) = g ∘ ρ = πc ∘ j,
  πc′ ∘ (g × g′) ∘ (ρ × ρ′) = g′ ∘ πa′ ∘ (ρ × ρ′) = g′ ∘ ρ′ = πc′ ∘ j.

Because a morphism into a product is uniquely determined by its compositions with the projections, we conclude that (g × g′) ∘ (ρ × ρ′) = j. Similarly, we obtain (f × f′) ∘ (ρ × ρ′) = ℓ.

5. Thus r = ρ × ρ′ can be used for factoring; it remains to be shown that this is the only possible choice. In fact, let σ : t → a × a′ be a morphism with (g × g′) ∘ σ = j and (f × f′) ∘ σ = ℓ; then it is enough to show that πa ∘ σ has the same properties as ρ, and that πa′ ∘ σ has the same properties as ρ′. Calculating the compositions with g resp. f, we obtain

  g ∘ πa ∘ σ = πc ∘ (g × g′) ∘ σ = πc ∘ j,
  f ∘ πa ∘ σ = πb ∘ (f × f′) ∘ σ = πb ∘ ℓ.


This implies πa ∘ σ = ρ by uniqueness of ρ; the same argument implies πa′ ∘ σ = ρ′. But this means σ = ρ × ρ′, and uniqueness is established. a

Let’s dualize. The pullback was defined so that the upper left corner of a square is filled in an essentially unique way; the dual construction will have to fill the lower right corner of a square in the same way. But by reversing arrows, we convert a diagram in which the lower right corner is missing into a diagram in which the upper left corner is missing.

The corresponding construction is called a pushout.

Definition 2.2.24 Let f : a → b and g : a → c be morphisms in category K with the same domain. An object d together with morphisms pb : b → d and pc : c → d is called the pushout of f and g iff these conditions are satisfied:

1. pb ∘ f = pc ∘ g,

2. if qb : b → e and qc : c → e are morphisms such that qb ∘ f = qc ∘ g, then there exists a unique morphism h : d → e such that qb = h ∘ pb and qc = h ∘ pc.

This diagram obviously looks like this: the square

    a --g--> c
   f|        |pc
    v        v
    b --pb-> d

commutes, and for every pair qb : b → e, qc : c → e with qb ∘ f = qc ∘ g the unique h : d → e satisfies qb = h ∘ pb and qc = h ∘ pc.

It is clear that the pushout of f ∈ homK(a, b) and g ∈ homK(a, c) is the pullback of f ∈homKop(b, a) and of g ∈ homKop(c, a) in the dual category. This, however, does not reallyprovide assistance when constructing a pushout. Let us consider specifically the category Setof sets with maps as morphisms. We know that dualizing a product yields a sum, but it isnot quite clear how to proceed further. The next example tells us what to do.

Example 2.2.25 We are in the category Set of sets with maps as morphisms now. Consider maps f : A → B and g : A → C. Construct on the sum B + C the smallest equivalence relation R which contains R0 := {〈(iB ∘ f)(a), (iC ∘ g)(a)〉 | a ∈ A}. Here iB and iC are the injections of B resp. C into the sum. Let D be the factor space (B + C)/R with pB : b ↦ [iB(b)]R and pC : c ↦ [iC(c)]R. The construction yields pB ∘ f = pC ∘ g, because R identifies the embedded elements f(a) and g(a) for any a ∈ A.

Now assume that qB : B → E and qC : C → E are maps with qB ∘ f = qC ∘ g. Let q : B + C → E be the unique map with q ∘ iB = qB and q ∘ iC = qC (Lemma 2.2.8 together with Proposition 2.2.13). Then R0 ⊆ ker (q): let a ∈ A, then

  q(iB(f(a))) = qB(f(a)) = qC(g(a)) = q(iC(g(a))),

so that 〈iB(f(a)), iC(g(a))〉 ∈ ker (q). Because ker (q) is an equivalence relation on B + C, we conclude R ⊆ ker (q). Thus h([x]R) := q(x) defines a map D → E with

  h(pB(b)) = h([iB(b)]R) = q(iB(b)) = qB(b),
  h(pC(c)) = h([iC(c)]R) = q(iC(c)) = qC(c)

for b ∈ B and c ∈ C. It is clear that there is no other way to define a map h with the desired properties. ,

So we have shown that the pushout in the category Set of sets with maps exists. To illustrate the construction, consider the pushout of two factor maps. In this example ρ ∨ ϑ denotes the smallest equivalence relation which contains the equivalence relations ρ and ϑ.

Example 2.2.26 Let ρ and ϑ be equivalence relations on a set X with factor maps ηρ : X → X/ρ and ηϑ : X → X/ϑ. Then the pushout of these maps is X/(ρ ∨ ϑ) with ζρ : [x]ρ ↦ [x]ρ∨ϑ and ζϑ : [x]ϑ ↦ [x]ρ∨ϑ as the associated maps. In fact, we have ζρ ∘ ηρ = ζϑ ∘ ηϑ, so the first property is satisfied. Now let tρ : X/ρ → E and tϑ : X/ϑ → E be maps with tρ ∘ ηρ = tϑ ∘ ηϑ for a set E; then h : [x]ρ∨ϑ ↦ tρ([x]ρ) maps X/(ρ ∨ ϑ) to E with plainly tρ = h ∘ ζρ and tϑ = h ∘ ζϑ; moreover, h is uniquely determined by this property. Because the pushout is uniquely determined up to isomorphism by Lemma 2.2.2 and Proposition 2.2.13, we have shown that the supremum of two equivalence relations in the lattice of equivalence relations can be computed through the pushout of its components. ,
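For finite sets the construction of Example 2.2.25 can be carried out mechanically. The following Haskell sketch (the names classesFrom and pushout are ours; equivalence classes are plain lists, and we assume f and g map into the given enumerations bs and cs) computes the pushout object together with pB and pC:

  import Data.List (nub)

  -- the smallest equivalence relation containing the given pairs,
  -- returned as a partition of the universe (quadratic, finite sets only)
  classesFrom :: Eq x => [x] -> [(x, x)] -> [[x]]
  classesFrom univ = foldl merge [ [u] | u <- nub univ ]
    where
      merge part (p, q)
        | cp == cq  = part
        | otherwise = (cp ++ cq) : [ c | c <- part, c /= cp, c /= cq ]
        where cp = classOf p part
              cq = classOf q part
      classOf x part = head [ c | c <- part, x `elem` c ]

  -- pushout of f : A -> B and g : A -> C: quotient B + C by the smallest
  -- equivalence identifying Left (f a) with Right (g a), as in the text
  pushout :: (Eq a, Eq b, Eq c)
          => [a] -> [b] -> [c] -> (a -> b) -> (a -> c)
          -> ([[Either b c]], b -> [Either b c], c -> [Either b c])
  pushout as bs cs f g = (classes, pB, pC)
    where
      univ    = map Left bs ++ map Right cs
      pairs   = [ (Left (f a), Right (g a)) | a <- as ]
      classes = classesFrom univ pairs
      pB b = head [ cl | cl <- classes, Left b  `elem` cl ]
      pC c = head [ cl | cl <- classes, Right c `elem` cl ]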

2.3 Functors and Natural Transformations

We introduce functors which help in transporting information between categories in a waysimilar to morphisms, which are thought to transport information between objects. Of course,we will have to observe some properties in order to capture the intuitive understanding ofa functor as a structure preserving element in a formal way. Functors themselves can berelated, leading to the notion of a natural transformation. Given a category, there is aplethora of functors and natural transformations provided by the hom sets; this is studiedin some detail, first, because it is a built-in for every category, second because the YonedaLemma relates this rich structure to set based functors, which in turn will be used whenstudying adjunctions.

2.3.1 Functors

Loosely speaking, a functor is a pair of structure preserving maps between categories: it maps one category to another one in a compatible way. A bit more precisely, a functor F between categories K and L assigns to each object a in category K an object F(a) in L, and to each morphism f : a → b in K a morphism F(f) : F(a) → F(b) in L; some obvious properties have to be observed. In this way it is possible to compare categories, and to carry properties from one category to another one. To be more specific:

Definition 2.3.1 A functor F : K → L assigns to each object a in category K an objectF (a) in category L and maps each hom set homK(a, b) of K to the hom set homL(F (a),F (b))of L subject to these conditions


• F(ida) = idF(a) for each object a of K,

• if f : a → b and g : b → c are morphisms in K, then F(g ∘ f) = F(g) ∘ F(f).

A functor F : K →K is called an endofunctor on K.

The first condition says that the identity morphisms in K are mapped to the identity morphisms in L, and the second condition tells us that F has to be compatible with composition in the respective categories. Note that for specifying a functor, we have to say what the functor does with objects, and how the functor transforms morphisms. By the way, we often write F(a) as F a, and F(f) as F f.
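Readers who know Haskell may find the following rendering helpful; it is a sketch for Set-like categories only (the class MyFunctor mirrors Haskell's own Functor), and the two laws are stated as comments, since the compiler cannot check them:

  -- Definition 2.3.1 in Haskell clothing: the object part is a type
  -- constructor f, the morphism part is myFmap
  class MyFunctor f where
    myFmap :: (a -> b) -> f a -> f b

  -- the two functor laws, corresponding to the conditions above:
  --   myFmap id      == id                      -- F(id_a) = id_{F a}
  --   myFmap (g . f) == myFmap g . myFmap f     -- F(g ∘ f) = F(g) ∘ F(f)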

Let us have a look at some examples. Trivial examples for functors include the identity functor IdK, which maps objects resp. morphisms to themselves, and the constant functor ∆x for an object x, which maps every object to x, and every morphism to idx.

Example 2.3.2 Consider the category Set of sets with maps as morphisms. Given a set X, PX is a set again; define P(f)(A) := f[A] for the map f : X → Y and for A ⊆ X, then Pf : PX → PY. We check the laws for a functor:

• P(idX)(A) = idX[A] = A = idPX(A), so that P idX = idPX;

• let f : X → Y and g : Y → Z, then Pf : PX → PY and Pg : PY → PZ with

  (P(g) ∘ P(f))(A) = P(g)(P(f)(A)) = g[f[A]] = {g(f(a)) | a ∈ A} = (g ∘ f)[A] = P(g ∘ f)(A)

for A ⊆ X. Thus the power set functor P is compatible with composition of maps.

,

Example 2.3.3 Given a category K and an object a of K, associate with a the assignments

  a+ : x ↦ homK(a, x),
  a_+ : x ↦ homK(x, a),

together with the corresponding maps on hom-sets homK(a, ·) resp. homK(·, a); these are the hom functors associated with a. Then a+ is a functor K → Set. In fact, given a morphism f : x → y, we have a+f : homK(a, x) → homK(a, y), taking g into f ∘ g. Plainly, a+(idx) = idhomK(a,x) = ida+(x), and

  a+(g ∘ f)(h) = (g ∘ f) ∘ h = g ∘ (f ∘ h) = a+(g)(a+(f)(h)),

if f : x → y, g : y → z and h : a → x. ,
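In Haskell the covariant hom functor a+ is the functor of functions out of a fixed type; a minimal sketch (the wrapper Hom is ours):

  -- the covariant hom functor a+ = hom(a, -): its action on f : x -> y
  -- post-composes, taking g to f . g, exactly as in Example 2.3.3
  newtype Hom a x = Hom { runHom :: a -> x }

  instance Functor (Hom a) where
    fmap f (Hom g) = Hom (f . g)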

Functors come in handy when we want to forget part of the structure.

Example 2.3.4 Let Meas be the category of measurable spaces. Assign to each measurablespace (X, C) its carrier set X, and to each morphism f : (X, C) → (Y,D) the correspondingmap f : X → Y . It is immediately checked that this constitutes a functor Meas → Set.Similarly, we might forget the topological structure by assigning each topological space itscarrier set, and assign each continuous map to itself. These functors are sometimes calledforgetful functors. ,


The following example twists Example 2.3.4 a little bit.

Example 2.3.5 Assign to each measurable space (X, C) its σ-algebra B(X, C) := C. Let f : (X, C) → (Y, D) be a morphism in Meas; put B(f) := f⁻¹, then B(f) : B(Y, D) → B(X, C), because f is C-D-measurable. We plainly have B(id(X,C)) = idB(X,C) and B(g ∘ f) = (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹ = B(f) ∘ B(g), so B : Meas → Set is no functor, although it behaves like one. Don't panic! If we reverse arrows, things work out properly: B : Meas → Setop is, as we have just shown, a functor (the dual Kop of a category K has been introduced in Example 2.1.17).

This functor could be called the Borel functor (the measurable sets are sometimes called the Borel sets). ,

Definition 2.3.6 A functor F : K → Lop is called a contravariant functor between K andL; in contrast, a functor according to Definition 2.3.1 is called covariant.

If we talk about functors, we always mean the covariant flavor; contravariance is mentioned explicitly.

Let us complete the discussion from Example 2.3.3 by considering a_+, which takes f : x → y to a_+f : homK(y, a) → homK(x, a) through g ↦ g ∘ f. a_+ maps the identity on x to the identity on homK(x, a). If g : y → z, we have

  a_+(f)(a_+(g)(h)) = a_+(f)(h ∘ g) = h ∘ g ∘ f = a_+(g ∘ f)(h)

for h : z → a. Thus a_+ is a contravariant functor K → Set, while its cousin a+ is covariant.

Functors may also be used to model structures.

Example 2.3.7 Consider the functor S : Set → Set which assigns to each set X the set X^N of all sequences over X; the map f : X → Y is assigned the map S f : (xn)n∈N ↦ (f(xn))n∈N. Evidently, idX is mapped to idX^N, and it is easily checked that S(g ∘ f) = S(g) ∘ S(f). Hence S constitutes an endofunctor on Set. ,

Example 2.3.8 Similarly, define the endofunctor F on Set by assigning to X the set X^N ∪ X*, with X* as the set of all finite sequences over X. Then FX contains all finite or infinite sequences over the set X. Let f : X → Y be a map, and let (xi)i∈I ∈ FX be a finite or infinite sequence; then put (F f)(xi)i∈I := (f(xi))i∈I ∈ FY. It is not difficult to see that F satisfies the laws for a functor. ,

The next example deals with automata which produce an output (in contrast to Exam-ple 2.1.15 where we mainly had state transitions in view).

Example 2.3.9 An automaton with output (X, A, B, δ) has an input alphabet A, an output alphabet B and a set X of states with a map δ : X × A → X × B; δ(x, a) = 〈x′, b〉 yields the next state x′ and the output b, if the input is a in state x. A morphism f : (X, A, B, δ) → (Y, A, B, ϑ) of automata is a map f : X → Y such that ϑ(f(x), a) = (f × idB)(δ(x, a)) for all x ∈ X, a ∈ A, thus (f × idB) ∘ δ = ϑ ∘ (f × idA). This yields apparently a category AutO, the category of automata with output.

We want to expose the state space X in order to make it a parameter to an automaton, because input and output alphabets are given from the outside, so for modeling purposes only states are at our disposal. Hence we reformulate δ and take it as a map δ* : X → (X × B)^A with δ*(x)(a) := δ(x, a). Now f : (X, A, B, δ) → (Y, A, B, ϑ) is a morphism iff this diagram commutes:

      X ----f----> Y
    δ*|            |ϑ*
      v            v
  (X × B)^A --f•-> (Y × B)^A

with f•(t)(a) := (f × idB)(t(a)). Let’s see why this is the case. Given x ∈ X, a ∈ A, we have

f•(δ∗(x))(a) = (f × idB)(δ∗(x)(a)) = (f × idB)(δ(x, a)) = ϑ(f(x), a) = ϑ∗(f(x))(a),

thus f•(δ*(x)) = ϑ*(f(x)) for all x ∈ X, hence f• ∘ δ* = ϑ* ∘ f, so the diagram is commutative indeed. Define F(X) := (X × B)^A for an object (X, A, B, δ) in category AutO, and put F(f) := f• for the automaton morphism f : (X, A, B, δ) → (Y, A, B, ϑ); thus F(f) sends t : A → X × B to the composite

  F(f)(t) = (f × idB) ∘ t : A → Y × B.

We claim that F : AutO → Set is a functor. Let g : (Y, A, B, ϑ) → (Z, A, B, ζ) be a morphism; then F(g)(s) = (g × idB) ∘ s for all s ∈ (Y × B)^A. In particular, we have for s := F(f)(t) with an arbitrary t ∈ (X × B)^A

  F(g)(F(f)(t)) = (g × idB) ∘ (f × idB) ∘ t = ((g ∘ f) × idB) ∘ t = F(g ∘ f)(t).

Now F(idX) = id(X×B)^A is trivial, so that we have established indeed that F : AutO → Set is a functor, assigning states to possible state transitions. ,

The next example shows that we may perceive labeled transition systems as functors basedon the power set functor.

Example 2.3.10 A labeled transition system is a collection of transitions indexed by a set of actions. Formally, given a set A of actions, (S, (→a)a∈A) is a labeled transition system iff →a ⊆ S × S for all a ∈ A. Thus state s may go into state s′ after action a ∈ A; this is written as s →a s′. A morphism f : (S, (→S,a)a∈A) → (T, (→T,a)a∈A) of transition systems is a map f : S → T such that s →S,a s′ implies f(s) →T,a f(s′) for all actions a, cp. Example 2.1.9.

We model a transition system (S, (→a)a∈A) as a map F : S → P(A × S) with F(s) := {〈a, s′〉 | s →a s′} (or, conversely, we may recover →a from F: →a = {〈s, s′〉 | 〈a, s′〉 ∈ F(s)}), thus F(s) ⊆ A × S collects actions and new states. This suggests defining a map F(S) := P(A × S) on objects, which can be made a functor once we have decided what to do with morphisms f : (S, (→S,a)a∈A) → (T, (→T,a)a∈A). Take V ⊆ A × S and define F(f)(V) := {〈a, f(s)〉 | 〈a, s〉 ∈ V} (clearly we want to leave the actions alone). Then we have

  F(g ∘ f)(V) = {〈a, g(f(s))〉 | 〈a, s〉 ∈ V} = {〈a, g(y)〉 | 〈a, y〉 ∈ F(f)(V)} = F(g)(F(f)(V))

for a morphism g : (T, (→T,a)a∈A) → (U, (→U,a)a∈A). Thus we have shown that F(g ∘ f) = F(g) ∘ F(f) holds. Because F maps the identity to the identity, F is a functor from the category of labeled transition systems to Set. ,
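A Haskell rendering of this modeling step might look as follows; it is a sketch with finite transition sets represented as lists, and the names Step, LTS and stepMap are ours:

  -- a labeled transition system as a map into P(A × S), cf. Example 2.3.10
  type Step act s = [(act, s)]       -- one state's 〈action, successor〉 pairs
  type LTS  act s = s -> Step act s

  -- the functor's action on a morphism f : s -> t leaves the actions alone
  stepMap :: (s -> t) -> Step act s -> Step act t
  stepMap f v = [ (a, f s') | (a, s') <- v ]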

The next examples deal with functors induced by probabilities.

Example 2.3.11 Given a set X, define the support supp(p) for a map p : X → [0, 1] as supp(p) := {x ∈ X | p(x) ≠ 0}. A discrete probability p on X is a map p : X → [0, 1] with finite support such that

  ∑_{x∈X} p(x) := ∑_{x∈supp(p)} p(x) = 1.

Denote by

  D(X) := {p : X → [0, 1] | p is a discrete probability}

the set of all discrete probabilities. Let f : X → Y be a map, and define

  D(f)(p)(y) := ∑_{x∈X, f(x)=y} p(x).


Because D(f)(p)(y) > 0 iff y ∈ f[supp(p)], the map D(f)(p) : Y → [0, 1] has finite support, and

  ∑_{y∈Y} D(f)(p)(y) = ∑_{y∈Y} ∑_{x∈X, f(x)=y} p(x) = ∑_{x∈X} p(x) = 1.

It is clear that D(idX)(p) = p, so we have to check whether D(g ∘ f) = D(g) ∘ D(f) holds.

We use a little trick for this, which will turn out to be helpful later as well. Define

  p(A) := sup{∑_{x∈F} p(x) | F ⊆ A ∩ supp(p) finite}

for p ∈ D(X) and A ⊆ X; then p, extended in this way to sets, is a probability measure on PX. A direct calculation shows that D(f)(p)(y) = p(f⁻¹[{y}]) and D(f)(p)(B) = p(f⁻¹[B]) for B ⊆ Y hold. Thus we obtain for the maps f : X → Y and g : Y → Z

  D(g ∘ f)(p)(z) = p((g ∘ f)⁻¹[{z}]) = p(f⁻¹[g⁻¹[{z}]]) = D(f)(p)(g⁻¹[{z}]) = D(g)(D(f)(p))(z).

Thus D(g ∘ f) = D(g) ∘ D(f), as claimed.

Hence D is an endofunctor on Set, the discrete probability functor. It is immediate that all the arguments above hold also for probabilities the support of which is countable; but since we will discuss an interesting example on page 141 which deals with the finite case, we stick to that here. ,
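A Haskell sketch of D on finite supports: dmap below is the action D(f), merging the weights of all points that f identifies, as in the definition of D(f)(p)(y) above (the names D and dmap are ours):

  import Data.List (partition)

  -- a discrete probability with finite support: point/weight pairs
  newtype D a = D { runD :: [(a, Double)] }

  -- the action of D on a map f : a -> b; all x with f x == y
  -- contribute their weight to y
  dmap :: Eq b => (a -> b) -> D a -> D b
  dmap f (D ps) = D (collect [ (f x, w) | (x, w) <- ps ])
    where
      collect [] = []
      collect ((b, w) : rest) =
        let (same, other) = partition ((== b) . fst) rest
        in (b, w + sum (map snd same)) : collect other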

There is a continuous version of this functor as well. We generalize things a bit and formulatethe example for subprobabilities.

Example 2.3.12 We are now working in the category Meas of measurable spaces with measurable maps as morphisms. Given a measurable space (X, A), the set S(X, A) of all subprobability measures is a measurable space with the weak σ-algebra ℘(A) associated with A, see Example 2.1.14. Hence S maps measurable spaces to measurable spaces. Define for a morphism f : (X, A) → (Y, B)

  S(f)(µ)(B) := µ(f⁻¹[B])

for B ∈ B. Then S(f) : S(X, A) → S(Y, B) is ℘(A)-℘(B)-measurable by Exercise 2.8. Now let g : (Y, B) → (Z, C) be a morphism in Meas, then we show as in Example 2.3.11 that

  S(g ∘ f)(µ)(C) = µ(f⁻¹[g⁻¹[C]]) = S(g)(S(f)(µ))(C)

for C ∈ C, thus S(g ∘ f) = S(g) ∘ S(f). Since S preserves the identity, S : Meas → Meas is an endofunctor, the (continuous space) subprobability functor. ,

The next two examples deal with upper closed sets, the first one with these sets proper, thesecond one with a more refined version, viz., with ultrafilters. Upper closed sets are used,e.g., for the interpretation of game logic, a variant of modal logics, see Example 2.7.22 andSection 4.1.3.


Example 2.3.13 Call a subset V ⊆ PS upper closed iff A ∈ V and A ⊆ B together imply B ∈ V; for example, each filter is upper closed. Denote by

  V S := {V ⊆ PS | V is upper closed}

the set of all upper closed subsets of PS. Given f : S → T, define

  (V f)(V) := {W ⊆ T | f⁻¹[W] ∈ V}

for V ∈ V S. Let W ∈ (V f)(V) and W0 ⊇ W; then f⁻¹[W] ⊆ f⁻¹[W0], so that f⁻¹[W0] ∈ V, hence W0 ∈ (V f)(V). Thus V f : V S → V T. It is easy to see that V(g ∘ f) = V(g) ∘ V(f), provided f : S → T and g : T → U. Moreover, V(idS) = idV(S). Hence V is an endofunctor on the category Set of sets with maps as morphisms. ,

Ultrafilters are upper closed, but are much more complex than plain upper closed sets, sincethey are filters, and they are maximal. Thus we have to look a bit closer at the propertieswhich the functor is to represent.

Example 2.3.14 Let

  US := {q | q is an ultrafilter over S}

assign to each set S its ultrafilters, to be more precise, all ultrafilters of the power set of S. This is the object part of an endofunctor over the category Set with maps as morphisms. Given a map f : S → T, we have to define Uf : US → UT. Before doing so, a preliminary consideration will help.

One first notes that, given two Boolean algebras B and B′ and a Boolean algebra morphism γ : B → B′, γ⁻¹ maps ultrafilters over B′ to ultrafilters over B. In fact, let w be an ultrafilter over B′, and put v := γ⁻¹[w]; we go quickly over the properties an ultrafilter should have. First, v does not contain the bottom element ⊥B of B, for otherwise ⊥B′ = γ(⊥B) ∈ w. If a ∈ v and b ≥ a, then γ(b) ≥ γ(a) ∈ w, hence γ(b) ∈ w, thus b ∈ v; plainly, v is closed under ∧. Now assume a ∉ v, then γ(a) ∉ w, hence γ(−a) = −γ(a) ∈ w, since w is an ultrafilter. Consequently, −a ∈ v. This establishes the claim.

Given a map f : S → T, define F_f : PT → PS through F_f := f⁻¹. This is a homomorphism of the Boolean algebras PT and PS, thus F_f⁻¹ maps US to UT. Put U(f) := F_f⁻¹; note that we reverse the arrows' directions twice. It is clear that U(idS) = idU(S), and if g : T → Z, then

  U(g ∘ f) = F_{g∘f}⁻¹ = (F_f ∘ F_g)⁻¹ = F_g⁻¹ ∘ F_f⁻¹ = U(g) ∘ U(f).

This shows that U is an endofunctor on the category Set of sets with maps as morphisms (U is sometimes denoted by β). ,

We can use functors for constructing new categories from given ones. As an example we definethe comma category associated with two functors.

Definition 2.3.15 Let F : K → L and G : M → L be functors. The comma category (F, G) associated with F and G has as objects the triplets 〈a, f, b〉 with objects a from K, b from M, and morphisms f : F a → G b. A morphism (ϕ, ψ) : 〈a, f, b〉 → 〈a′, f′, b′〉 is a pair of morphisms ϕ : a → a′ of K and ψ : b → b′ of M such that this diagram commutes:

    F a --F ϕ--> F a′
     |             |
    f|             |f′
     v             v
    G b --G ψ--> G b′

Composition of morphisms is component wise.

The slice category K/x defined in Example 2.1.16 is apparently the comma category (IdK , ∆x).

Functors can be composed, yielding a new functor. The proof for this statement is straight-forward.

Proposition 2.3.16 Let F : C → D and G : D → E be functors. Define (G ∘ F)a := G(F a) for an object a of C, and (G ∘ F)f := G(F f) for a morphism f : a → b in C; then G ∘ F : C → E is a functor. a

2.3.2 Natural Transformations

We see that we can compose functors in an obvious way. This raises the question whether or not functors themselves form a category. But we do not yet have morphisms between functors at our disposal; natural transformations will assume this role. Nevertheless, the question remains, but it will not be answered in the positive; this is so because morphisms between objects should form a set, and it will become clear that this is not the case. Pumplun [Pum99] points at some difficulties that might arise and arrives at the pragmatic view that for practical problems this question is not particularly relevant.

But let us introduce natural transformations between functors F ,G now. The basic idea isthat for each object a, F a is transformed into Ga in a way which is compatible with thestructure of the participating categories.

Definition 2.3.17 Let F, G : K → L be covariant functors. A family η = (ηa)a∈|K| is called a natural transformation η : F → G iff ηa : F a → G a is a morphism in L for all objects a in K such that this diagram commutes for any morphism f : a → b in K:

    F a --ηa--> G a
     |           |
  F f|           |G f
     v           v
    F b --ηb--> G b

Thus a natural transformation η : F → G is a family of morphisms, indexed by the objectsof the common domain of F and G; ηa is called the component of η at a.
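In Haskell, a natural transformation between two Functors is simply a polymorphic function, and the naturality condition holds automatically by parametricity; a small sketch:

  -- a natural transformation from the list functor to Maybe:
  -- one component for every type a
  safeHead :: [a] -> Maybe a
  safeHead []      = Nothing
  safeHead (x : _) = Just x

  -- the naturality square of Definition 2.3.17 reads here:
  --   fmap f . safeHead == safeHead . fmap f    for every f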

If F and G are both contravariant functors K → L, we may perceive them as covariant functors K → Lop, so that we get for the contravariant case, given f : a → b, the square

    F a --ηa--> G a
     ^           ^
  F f|           |G f
    F b --ηb--> G b

i.e., ηa ∘ F f = G f ∘ ηb.

Let us have a look at some examples.

Example 2.3.18 a+ : x ↦ homK(a, x) yields a (covariant) functor K → Set for each object a in K, see Example 2.3.3 (just for simplifying notation, we use again a+ rather than homK(a, −), see page 91). Let ζ : b → a be a morphism in K; then this induces a natural transformation ηζ : a+ → b+ with

  ηζ,x : a+(x) → b+(x),  g ↦ g ∘ ζ.

In fact, look at this diagram with a K-morphism f : x → y:

    a+(x) --ηζ,x--> b+(x)
      |               |
 a+(f)|               |b+(f)
      v               v
    a+(y) --ηζ,y--> b+(y)

Then we have for h ∈ a+(x) = homK(a, x)

  (ηζ,y ∘ a+(f))(h) = ηζ,y(f ∘ h) = (f ∘ h) ∘ ζ = f ∘ (h ∘ ζ) = b+(f)(ηζ,x(h)) = (b+(f) ∘ ηζ,x)(h).

Hence ηζ is in fact a natural transformation. ,

This is an example in the category of groups:

Example 2.3.19 Let K be the category of groups (see Example 2.1.7). It is not difficult to see that K has products. Define for a group H the map FH(G) := H × G on objects, and if f : G → G′ is a morphism in K, define FH(f) : H × G → H × G′ through FH(f) : 〈h, g〉 ↦ 〈h, f(g)〉. Then FH is an endofunctor on K. Now let ϕ : H → K be a morphism. Then ϕ induces a natural transformation ηϕ : FH → FK upon setting

  ηϕ,G : FH G → FK G,  〈h, g〉 ↦ 〈ϕ(h), g〉.

In fact, let ψ : L → L′ be a group homomorphism; then this diagram commutes:

    FH L --ηϕ,L--> FK L
      |              |
  FH ψ|              |FK ψ
      v              v
    FH L′ --ηϕ,L′--> FK L′

To see this, take 〈h, ℓ〉 ∈ FH L = H × L, and chase it through the diagram:

  (ηϕ,L′ ∘ FH ψ)(h, ℓ) = 〈ϕ(h), ψ(ℓ)〉 = (FK(ψ) ∘ ηϕ,L)(h, ℓ).

,

Consider as another example a comma category (F, G) (Definition 2.3.15). There are functors akin to projections which permit us to recover the original functors, and which are connected through a natural transformation. To be specific:

Proposition 2.3.20 Let F : K → L and G : M → L be functors. Then there are functors S : (F, G) → K and R : (F, G) → M fitting into this diagram:

    (F, G) --R--> M
       |          |
      S|          |G
       v          v
       K ---F---> L

Moreover, there exists a natural transformation η : F ∘ S → G ∘ R.

Proof Put for the object 〈a, f, b〉 of (F, G) and the morphism (ϕ, ψ)

  S〈a, f, b〉 := a,  S(ϕ, ψ) := ϕ,
  R〈a, f, b〉 := b,  R(ϕ, ψ) := ψ.

Then it is clear that S and R are functors with the desired properties. Moreover, η〈a,f,b〉 := f is the desired natural transformation; the crucial diagram commutes by the definition of morphisms in the comma category. a

Example 2.3.21 Assume that the product a × b for the objects a and b in category K exists; then Proposition 2.2.9 tells us that we have for each object d a bijection pd : homK(d, a) × homK(d, b) → homK(d, a × b). Thus πa ∘ pd(f, g) = f and πb ∘ pd(f, g) = g for every morphism f : d → a and g : d → b. Actually, pd is the component of a natural transformation p : F → G with F := homK(−, a) × homK(−, b) and G := homK(−, a × b) (note that this is shorthand for the obvious assignments to objects and morphisms). Both F and G are contravariant functors from K to Set. So in order to establish naturalness, we have to establish that the following diagram commutes for f : c → d:

    homK(c, a) × homK(c, b) --pc--> homK(c, a × b)
             ^                             ^
         F f |                             | G f
    homK(d, a) × homK(d, b) --pd--> homK(d, a × b)

Now take 〈g, h〉 ∈ homK(d, a) × homK(d, b), then

  πa ∘ ((pc ∘ F f)(g, h)) = g ∘ f = πa ∘ (((G f) ∘ pd)(g, h)),
  πb ∘ ((pc ∘ F f)(g, h)) = h ∘ f = πb ∘ (((G f) ∘ pd)(g, h)).

From this, commutativity follows. ,


We will — for the sake of illustration — define two ways of composing natural transformations.One is somewhat canonical, since it is based on the composition of morphisms, the other oneis a bit tricky, since it involves the functors directly. Let us have a look at the direct onefirst.

Lemma 2.3.22 Let η : F → G and ϑ : G → H be natural transformations. Then

  (ϑ ∘ η)a := ϑa ∘ ηa

defines a natural transformation ϑ ∘ η : F → H.

Proof Let K be the domain of functor F, and assume that f : a → b is a morphism in K. Pasting the naturality squares of η and ϑ next to each other,

    F a --ηa--> G a --ϑa--> H a
     |           |           |
  F f|           |G f        |H f
     v           v           v
    F b --ηb--> G b --ϑb--> H b

we obtain

  H(f) ∘ (ϑ ∘ η)a = H(f) ∘ ϑa ∘ ηa = ϑb ∘ G(f) ∘ ηa = ϑb ∘ ηb ∘ F(f) = (ϑ ∘ η)b ∘ F(f).

Hence the outer diagram commutes. a

The next composition is slightly more involved.

Proposition 2.3.23 Given natural transformations η : F → G and ϑ : S → R for functors F, G : K → L and S, R : L → M, the equation ϑGa ∘ S(ηa) = R(ηa) ∘ ϑF a always holds. Put

  (ϑ ∗ η)a := ϑGa ∘ S(ηa).

Then ϑ ∗ η defines a natural transformation S ∘ F → R ∘ G; it is called the Godement product of η and ϑ.

Proof 1. Because ηa : F a → G a, this diagram commutes by naturality of ϑ:

    S(F a) --ϑF a--> R(F a)
       |               |
  S(ηa)|               |R(ηa)
       v               v
    S(G a) --ϑG a--> R(G a)

This establishes the first claim.

2. Now let f : a → b be a morphism in K, then the outer diagram commutes, since S is afunctor, and since ϑ is a natural transformation.

In detail:

  R(G f) ∘ (ϑ ∗ η)a = R(G f) ∘ ϑGa ∘ S(ηa)
                    = ϑGb ∘ S(G f) ∘ S(ηa)    (naturality of ϑ)
                    = ϑGb ∘ S(G f ∘ ηa)
                    = ϑGb ∘ S(ηb ∘ F f)        (naturality of η)
                    = ϑGb ∘ S(ηb) ∘ S(F f)
                    = (ϑ ∗ η)b ∘ S(F f).

Hence ϑ ∗ η : S ∘ F → R ∘ G is natural indeed. a

In [ML97], ϑ ∘ η is called the vertical, and ϑ ∗ η the horizontal composition of the natural transformations η and ϑ. If η : F → G is a natural transformation between endofunctors, then the morphisms (F η)a := F(ηa) : (F ∘ F)(a) → (F ∘ G)(a) and (ηF)a := ηF a : (F ∘ F)(a) → (G ∘ F)(a) are available.

We know from Example 2.3.3 that homK(a, −) defines a covariant set valued functor; suppose we have another set valued functor F : K → Set. Can we somehow compare these functors? This question looks at first sight quite strange, because we do not have any yardstick to compare these functors against. On second thought, we might use natural transformations for such an endeavor. It turns out that for any object a of K the set F a is essentially given by the natural transformations η : homK(a, −) → F. We will show now that there exists a bijective assignment between F a and these natural transformations. The reader might wonder about this somewhat intricate formulation; it is due to the observation that these natural transformations in general do not form a set but rather a class, so that we cannot set up a proper bijection (which would require sets as the basic scenario).

Lemma 2.3.24 Let F : K → Set be a functor; given the object a of K and a natural transformation η : homK(a, −) → F, define the Yoneda isomorphism

  ya,F(η) := ηa(ida) ∈ F a.

Then ya,F is bijective (i.e., onto and one-to-one).

Proof 0. The assertion is established by defining for each t ∈ F a a natural transformation σa,F(t) : homK(a, −) → F which is inverse to ya,F. This is also the outline for the proof: we define a map, establish that it is a natural transformation, and show that it is inverse to ya,F.

1. Given an object b of K and t ∈ F a, put

  σa,F(t)(b) : homK(a, b) → F b,  f ↦ (F f)(t)

(note that F f : F a → F b for f : a → b, hence (F f)(t) ∈ F b). This defines a natural transformation σa,F(t) : homK(a, −) → F. In fact, if f : b → b′, then

  σa,F(t)(b′)(homK(a, f)(g)) = σa,F(t)(b′)(f ∘ g) = F(f ∘ g)(t) = (F f)(F(g)(t)) = (F f)(σa,F(t)(b)(g)).


Hence σa,F(t)(b′) ∘ homK(a, f) = (F f) ∘ σa,F(t)(b).

2. We obtain

  (ya,F ∘ σa,F)(t) = ya,F(σa,F(t)) = σa,F(t)(a)(ida) = (F ida)(t) = idF a(t) = t.

That's not too bad, so let us try to establish that σa,F ∘ ya,F is the identity as well. Given a natural transformation η : homK(a, −) → F, we obtain

  (σa,F ∘ ya,F)(η) = σa,F(ya,F(η)) = σa,F(ηa(ida)).

Thus we have to evaluate σa,F(ηa(ida)). Take an object b and a morphism f : a → b, then

  σa,F(ηa(ida))(b)(f) = (F f)(ηa(ida))
                      = ((F f) ∘ ηa)(ida)
                      = (ηb ∘ homK(a, f))(ida)   (η is natural)
                      = ηb(f)                     (since homK(a, f)(ida) = f ∘ ida = f).

Thus σa,F(ηa(ida)) = η. Consequently we have shown that ya,F is left and right invertible, hence is a bijection. a
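The Yoneda isomorphism has a well known Haskell incarnation; this sketch needs the RankNTypes extension, and lower and yLift are our names for ya,F and σa,F:

  {-# LANGUAGE RankNTypes #-}

  -- a natural transformation from hom(a, -) to f, packaged as a type
  newtype Yoneda f a = Yoneda { runYoneda :: forall b. (a -> b) -> f b }

  -- y_{a,F}: evaluate the transformation at the identity
  lower :: Yoneda f a -> f a
  lower (Yoneda eta) = eta id

  -- sigma_{a,F}: an element t of F a induces f |-> F(f)(t)
  yLift :: Functor f => f a -> Yoneda f a
  yLift t = Yoneda (\f -> fmap f t)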

Now consider the set valued functor homK(b,−), then the Yoneda embedding says thathomK(b, a) can be mapped bijectively to the natural transformations from homK(a,−) tohomK(b,−). This means that these natural transformations are essentially the morphismsb→ a, and, conversely, each morphism b→ a yields a natural transformation homK(a,−)→homK(b,−). The following statement makes this observation precise.

Proposition 2.3.25 Given a natural transformation η : homK(a, −) → homK(b, −), there exists a unique morphism g : b → a such that ηc(h) = h ∘ g for every object c and every morphism h : a → c (thus η = homK(g, −)).

Proof 0. Let y := ya,homK(b,−) and σ := σa,homK(b,−). Then y is a bijection with (σ ∘ y)(η) = η and (y ∘ σ)(h) = h.

1. Put g := ηa(ida); then g ∈ homK(b, a), since ηa : homK(a, a) → homK(b, a) and ida ∈ homK(a, a). Now let h ∈ homK(a, c), then

  ηc(h) = σ(ηa(ida))(c)(h)   (since η = (σ ∘ y)(η))
        = σ(g)(c)(h)          (definition of g)
        = homK(b, h)(g)       (homK(b, −) is the target functor)
        = h ∘ g.

2. If η = homK(g, −), then ηa(ida) = ida ∘ g = g, so g : b → a is uniquely determined. a

A final example for natural transformations comes from measurable spaces, dealing with theweak-σ-algebra. We consider in Example 2.3.5 the contravariant functor which assigns to eachmeasurable space its σ-algebra, and we have defined in Example 2.1.14 the weak σ-algebraon its set of probability measures together with a set of generators. We show that this set ofgenerators yields a family of natural transformations between the two contravariant functorsinvolved.


Example 2.3.26 The contravariant functor B : Meas → Set assigns to each measurable space its σ-algebra, and to each measurable map its inverse. Denote by W := B ∘ P the functor that assigns to each measurable space the weak σ-algebra on its probability measures; W : Meas → Set is contravariant as well. Recall from Example 2.1.14 that the set

  βA(A, r) := {µ ∈ P(S, A) | µ(A) ≥ r}

denotes the set of all probability measures which evaluate the measurable set A not smaller than a given r, and that the weak σ-algebra on P(S, A) is generated by all these sets. We claim that β(·, r) is a natural transformation B → W. Thus we have to show that this diagram commutes for each morphism f : (S, A) → (T, B):

    B(S, A) --βA(·, r)--> W(S, A)
       ^                     ^
   B f |                     | W f
    B(T, B) --βB(·, r)--> W(T, B)

Recall that we have B(f)(C) = f⁻¹[C] for C ∈ B, and that W(f)(D) = P(f)⁻¹[D], if D ⊆ P(T, B) is measurable. Now, given C ∈ B, by expanding definitions we obtain

  µ ∈ W(f)(βB(C, r)) ⇔ µ ∈ P(f)⁻¹[βB(C, r)]
                     ⇔ P(f)(µ) ∈ βB(C, r)
                     ⇔ P(f)(µ)(C) ≥ r
                     ⇔ µ(f⁻¹[C]) ≥ r
                     ⇔ µ ∈ βA(B(f)(C), r).

Thus the diagram commutes in fact, and we have established that the generators for the weak σ-algebra come from a natural transformation. ,

2.3.3 Limits and Colimits

We defined above some constructions which permit building new objects in a category from given ones, e.g., the product of two objects or the pushout. Each time we had some universal condition which had to be satisfied.

We will discuss the general construction very briefly and refer the reader to [ML97, BW99,Pum99], where they are studied in great detail.

Definition 2.3.27 Given a functor F : K → L, a cone on F consists of an object c in L and of a family of morphisms pd : c → F d in L for each object d in K such that pd′ = (F g) ∘ pd for each morphism g : d → d′ in K.

So a cone (c, (pd)d∈|K|) on F looks like, well, a cone:

            c
      pd ↙     ↘ pd′
    F d --F g--> F d′

A limiting cone provides a factorization for every other cone; to be specific:

Definition 2.3.28 Let F : K → L be a functor. The cone (c, (pd)d∈|K|) is a limit of F iff for every cone (e, (qd)d∈|K|) on F there exists a unique morphism f : e → c such that qd = pd ∘ f for each object d in K.

Thus we have locally this situation for each morphism g : d → d′ in K:

            e
      qd ↙  |f!  ↘ qd′
            v
            c
      pd ↙     ↘ pd′
    F d --F g--> F d′

The unique factorization probably already gives a clue to the applications of this concept. Let us interpret two known examples in the light of this concept.

The unique factorization probably gives already a clue for the application of this concept. Letus interpret two known examples in the light of this concept.

Example 2.3.29 Let X := {1, 2} and K be the discrete category on X (see Example 2.1.6). Put F 1 := a and F 2 := b for the objects a, b ∈ |L|. Assume that the product a × b with projections πa and πb exists in L, and put p1 := πa, p2 := πb. Then (a × b, p1, p2) is a limit of F. Clearly, this is a cone on F, and if q1 : e → a and q2 : e → b are morphisms, there exists a unique morphism f : e → a × b with q1 = p1 ∘ f and q2 = p2 ∘ f by the definition of a product. ,

The next example shows that a pullback can be interpreted as a limit.

Example 2.3.30 Let a, b, c be objects in category L with morphisms f : a → c and g : b → c. Define category K by |K| := {a, b, c}; the hom-sets are defined as follows:

  homK(x, y) := {idx}  if x = y,
                {f}    if x = a, y = c,
                {g}    if x = b, y = c,
                ∅      otherwise.

Let F be the identity on |K| with F f := f, F g := g, and F idx := idx for x ∈ |K|. If object p together with morphisms ta : p → a and tb : p → b is a pullback for f and g, then it is immediate that (p, ta, tb, tc) is a limit cone for F, where tc := f ∘ ta = g ∘ tb. ,

Dualizing the concept of a cone, we obtain cocones.

Definition 2.3.31 Given a functor F : K → L, an object c ∈ |L| together with morphisms sd : F d → c for each object d of K such that sd = sd′ ∘ F g for each morphism g : d → d′ is called a cocone on F.

Thus we have this situation:

    F d --F g--> F d′
      sd ↘     ↙ sd′
            c

A colimit is then defined for a cocone.

Definition 2.3.32 A cocone (c, (sd)d∈|K|) is called a colimit for the functor F : K → L iff for every cocone (e, (td)d∈|K|) for F there exists a unique morphism f : c → e such that td = f ∘ sd for every object d ∈ |K|.

So this yields:

    F d --F g--> F d′
      sd ↘     ↙ sd′
            c
            |f!
            v
            e

with td = f ∘ sd and td′ = f ∘ sd′.

Coproducts are examples of colimits.

Example 2.3.33 Let a and b be objects in category L and assume that their coproduct a + b with injections ja and jb exists in L. Take again I := {1, 2} and let K be the discrete category over I. Put F 1 := a and F 2 := b; then it follows from the definition of the coproduct that the cocone (a + b, ja, jb) is a colimit for F. ,

One shows that the pushout can be represented as a colimit in the same way as in Exam-ple 2.3.30 for the representation of the pullback as a limit.

Both limits and colimits are powerful general concepts for representing important construc-tions with and on categories. We will encounter them later on, albeit mostly indirectly.

2.4 Monads and Kleisli Triples

We have now functors and natural transformations at our disposal, and we will put them to work. The first application we will tackle concerns monads. Moggi's work [Mog91, Mog89] shows a connection between monads and computation which we will discuss now. Kleisli triples as a practical disguise for monads are introduced first, and it will be shown through Manes' Theorem that they are equivalent in the sense that each Kleisli triple generates a monad, and vice versa, in a reversible construction. Some examples for monads follow, and we will finally have a brief look at the monadic construction in the programming language Haskell.

2.4.1 Kleisli Triples

Assume that we work in a category K and interpret values and computations of a pro-gramming language in K. We need to distinguish between the values of a type a and thecomputations of type a, which are of type T a. For example


Non-deterministic computations Taking the values from a set A yields computations of type T A = Pf(A), where the latter denotes the set of all finite subsets of A.

Probabilistic computations Taking values from a set A will give computations in the set T A = D A of all discrete probabilities on A, see Example 2.3.11.

Exceptions Here values of type A will result in values taken from T A = A + E with E as the set of exceptions.

Side effects Let L be the set of addresses in the store and U the set of all storage cells; a computation of type A will assign to each store, i.e., to each element of U^L, a value of type A together with a new element of U^L, so we have T A = (A × U^L)^(U^L).

Interactive input Let U be the set of characters; then T A is the set of all trees with finite fan-out, so that the internal nodes have labels coming from U, and the leaves have labels taken from A.
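These computation types can be written down as Haskell type constructors. The following sketch uses our own names; in particular the list representation of finite-support distributions and the explicit store type s are illustrative assumptions, not definitions from the text.

import Data.Set (Set)

-- the five shapes of T from the list above (names are ours)
type Nondet a       = Set a           -- finite subsets: T A = Pf(A)
type Prob a         = [(a, Double)]   -- discrete probabilities with finite support
type Except e a     = Either e a      -- values plus exceptions: T A = A + E
type SideEffect s a = s -> (a, s)     -- stores s = U^L: T A = (A x U^L)^(U^L)
data InTree u a     = Leaf a | Node u [InTree u a]  -- interactive input: finite fan-out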

In order to model this, we require an embedding of the values taken from a into the computations of type T a, which is represented as a morphism ηa : a → T a. Moreover, we want to be able to "lift" values to computations in this sense: if f : a → T b is a map from values to computations, we want to extend f to a map f∗ : T a → T b from computations to computations (thus we will be able to combine computations in a modular fashion). Understanding a morphism a → T b as a program performing computations of type b on values of type a, this lifting will then permit performing computations of type b depending on computations of type a.

This leads to the definition of a Kleisli triple.

Definition 2.4.1 Let K be a category. A Kleisli triple (T, η, −∗) over K consists of a map T : |K| → |K| on objects, a morphism ηa : a → T a for each object a, and an operation ∗ such that f∗ : T a → T b whenever f : a → T b, with the following properties:

① η∗a = idT a.

② f∗ ∘ ηa = f, provided f : a → T b.

③ g∗ ∘ f∗ = (g∗ ∘ f)∗ for f : a → T b and g : b → T c.

The first property says that lifting the embedding ηa : a → T a will give the identity on T a. The second condition says that applying the lifted morphism f∗ to an embedded value yields the same result as the given f. The third condition says that composing two lifted



morphisms is the same as lifting the composite g∗ ∘ f of the lifted second morphism with the first one.

The category associated with a Kleisli triple has the same objects as the originally given category (which is not too much of a surprise), but morphisms will correspond to programs: a program which performs a computation of type b on values of type a. Hence a morphism in this new category is of type a → T b (this is a morphism in K).

Definition 2.4.2 Given a Kleisli triple (T, η, −∗) over the category K, the Kleisli category KT is defined as follows:

• |KT| = |K|, thus KT has the same objects as K.

• homKT(a, b) = homK(a, T b), hence f is a morphism a → b in KT iff f : a → T b is a morphism in K.

• The identity for a in KT is ηa : a → T a.

• The composition g ∗ f of f ∈ homKT(a, b) and g ∈ homKT(b, c) is defined through g ∗ f := g∗ ∘ f.

We have to show that Kleisli composition is associative; in fact, we have

(h ∗ g) ∗ f = (h ∗ g)∗ ∘ f
            = (h∗ ∘ g)∗ ∘ f    (definition of h ∗ g)
            = h∗ ∘ g∗ ∘ f      (property ③)
            = h∗ ∘ (g ∗ f)     (definition of g ∗ f)
            = h ∗ (g ∗ f).

Thus KT is in fact a category. The map on objects in a Kleisli category extends to a functor (note that we did not postulate for a Kleisli triple that T f is defined for morphisms). This functor is associated with two natural transformations which together form a monad. We will first define what a monad formally is, and then discuss the construction in some detail.

2.4.2 Monads

Definition 2.4.3 A monad over a category K is a triple (T , η, µ) with these properties:

❶ T is an endofunctor on K.

❷ η : IdK → T and µ : T² → T are natural transformations. η is called the unit, µ the multiplication of the monad.

❸ These diagrams commute:

[Diagram: the associativity square µa ∘ µT a = µa ∘ T µa : T³a → T a, and the unit triangles µa ∘ ηT a = idT a = µa ∘ T ηa : T a → T a.]



Each Kleisli triple generates a monad, and vice versa. This is what Manes’ Theorem says:

Theorem 2.4.4 Given a category K, there is a one-to-one correspondence between Kleisli triples and monads.

Proof 0. The proof will be somewhat longish, because so many properties have to be established or checked. In the first part, T will be extended to a functor, and a multiplication will be defined (the unit remains what it is in the Kleisli triple), and the laws for a monad will be established. The second part will define the −∗-operation and establish the corresponding properties of a Kleisli triple; again, the unit remains what it is in the monad. The proof as a whole demonstrates the interaction of the concepts in a very clean way; this is why I did not split it into separate pieces.

1. Let (T, η, −∗) be a Kleisli triple. We will extend T to a functor K → K, and define the multiplication; the monad's unit will be η. Define

T f := (ηb ∘ f)∗, if f : a → b,

µa := (idT a)∗.

Then µ is a natural transformation T² → T. Clearly, µa : T²a → T a is a morphism. Let f : a → b be a morphism in K; then we have

µb ∘ T²f = id∗T b ∘ (ηT b ∘ (ηb ∘ f)∗)∗
         = (id∗T b ∘ ηT b ∘ (ηb ∘ f)∗)∗   (by ③)
         = (idT b ∘ (ηb ∘ f)∗)∗           (since id∗T b ∘ ηT b = idT b)
         = (ηb ∘ f)∗∗.

Similarly, we obtain

(T f) ∘ µa = (ηb ∘ f)∗ ∘ id∗T a = ((ηb ∘ f)∗ ∘ idT a)∗ = (ηb ∘ f)∗∗.

Hence µ : T² → T is natural. Because we obtain for the morphisms f : a → b and g : b → c the identity

(T g) ∘ (T f) = (ηc ∘ g)∗ ∘ (ηb ∘ f)∗ = ((ηc ∘ g)∗ ∘ ηb ∘ f)∗ = (ηc ∘ g ∘ f)∗ = T(g ∘ f),

and since by ①

T ida = (ηa ∘ ida)∗ = η∗a = idT a,

we conclude that T is an endofunctor on K.

We check the laws for unit and multiplication according to ❸. One notes first that

µa ∘ ηT a = id∗T a ∘ ηT a (‡)= idT a

(in equation (‡) we use ②), and that

µa ∘ T ηa = id∗T a ∘ (ηT a ∘ ηa)∗ = (id∗T a ∘ ηT a ∘ ηa)∗ (†)= η∗a = (ηa ∘ ida)∗ = T ida



(in equation (†) we use ② again). Hence the rightmost diagram in ❸ commutes. Turning to the leftmost diagram, we note that

µa ∘ µT a = id∗T a ∘ id∗T²a = (id∗T a ∘ idT²a)∗ (%)= µ∗a,

using ③ in equation (%). On the other hand,

µa ∘ T µa = id∗T a ∘ T(id∗T a) = id∗T a ∘ (ηT a ∘ id∗T a)∗ = (id∗T a ∘ ηT a ∘ id∗T a)∗ = (id∗T a)∗ = µ∗a,

because id∗T a ∘ ηT a = idT a by ②. Hence the leftmost diagram commutes as well, and we have defined a monad indeed.

2. To establish the converse, define f∗ := µb ∘ (T f) for the morphism f : a → T b. We obtain from the right hand triangle η∗a = µa ∘ (T ηa) = idT a, thus ① holds. Since η : IdK → T is natural, we have (T f) ∘ ηa = ηT b ∘ f for f : a → T b. Hence

f∗ ∘ ηa = µb ∘ (T f) ∘ ηa = µb ∘ ηT b ∘ f = f

by the left hand side of the right triangle, giving ②. Finally, note that due to µ : T² → T being natural, we have for g : b → T c the commutative diagram

[Diagram: the naturality square (T g) ∘ µb = µT c ∘ T²g : T²b → T²c.]

Then

g∗ ∘ f∗ = µc ∘ (T g) ∘ µb ∘ (T f)
        = µc ∘ µT c ∘ (T²g) ∘ (T f)   (since (T g) ∘ µb = µT c ∘ T²g)
        = µc ∘ (T µc) ∘ T(T g ∘ f)    (since µc ∘ µT c = µc ∘ T µc)
        = µc ∘ T(µc ∘ T g ∘ f)
        = µc ∘ T(g∗ ∘ f)
        = (g∗ ∘ f)∗.

This establishes ③ and shows that this defines a Kleisli triple. ∎
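The two constructions of this proof are familiar from Haskell, where a monad is specified as a Kleisli triple via return and >>=. The following sketch (the names are ours) spells out the correspondence:

-- from a Kleisli triple (return, >>=) to a monad:
fmapT :: Monad m => (a -> b) -> m a -> m b
fmapT f m = m >>= (return . f)     -- T f = (eta . f)*

joinT :: Monad m => m (m a) -> m a
joinT mm = mm >>= id               -- mu = (id)*

-- and back, from (fmap, join) to the lifting operation:
star :: Monad m => (a -> m b) -> m a -> m b
star f = joinT . fmap f            -- f* = mu . T f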

Taking a Kleisli triple and producing a monad from it, one suspects that one might end up with a different Kleisli triple for the generated monad. But this is not the case; just for the record:

Corollary 2.4.5 If the monad is given by a Kleisli triple, then the Kleisli triple defined by the monad coincides with the given one. Similarly, if the Kleisli triple is given by the monad, then the monad defined by the Kleisli triple coincides with the given one.

Proof We use the notation from above. Given the monad, put f⁺ := µb ∘ T f = id∗T b ∘ (ηT b ∘ f)∗; then

f⁺ = (id∗T b ∘ ηT b ∘ f)∗ = (idT b ∘ f)∗ = f∗.



On the other hand, given the Kleisli triple, put T₀f := (ηb ∘ f)∗ for f : a → b; then

T₀f = µb ∘ T(ηb ∘ f) = µb ∘ (T ηb) ∘ (T f) = T f,

since µb ∘ T ηb = idT b. ∎

Some examples explain this development. Theorem 2.4.4 tells us that the specification of a Kleisli triple will give us the monad, and vice versa. Thus we are free to specify one or the other; usually the specification of the Kleisli triple is shorter and more concise.

Example 2.4.6 Nondeterministic computations may be modelled through a map f : S → P(T): given a state (or an input, or whatever) from the set S, the set f(s) describes the set of all possible outcomes. Thus we work in the category Set with maps as morphisms and take the power set functor P as the functor. Define

ηS(x) := {x},
f∗(B) := ⋃x∈B f(x)

for the set S, for B ⊆ S and the map f : S → P(T). Then clearly ηS : S → P(S), and f∗ : P(S) → P(T). We check the laws for a Kleisli triple:

① Since η∗S(B) = ⋃x∈B ηS(x) = ⋃x∈B {x} = B, we see that η∗S = idP(S).

② It is clear that f∗ ∘ ηS = f holds for f : S → P(T).

③ Let f : S → P(T) and g : T → P(U); then

u ∈ (g∗ ∘ f∗)(B) ⇔ u ∈ g(y) for some x ∈ B and some y ∈ f(x)
               ⇔ u ∈ g∗(f(x)) for some x ∈ B.

Thus (g∗ ∘ f∗)(B) = (g∗ ∘ f)∗(B).

Hence the laws for a Kleisli triple are satisfied. Let us just compute µS = id∗P(S): given β ∈ P(P(S)), we obtain

µS(β) = id∗P(S)(β) = ⋃B∈β B = ⋃β.

The same argumentation can be carried out when the power set functor is replaced by the finite power set functor Pf : S ↦ {A ⊆ S | A is finite} with the obvious definition of Pf on maps. ♦
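This Kleisli triple is easily sketched in Haskell with Data.Set standing in for the finite power set functor (the function names are ours):

import Data.Set (Set)
import qualified Data.Set as Set

etaP :: a -> Set a
etaP = Set.singleton                       -- eta_S(x) = {x}

starP :: Ord b => (a -> Set b) -> Set a -> Set b
starP f = Set.unions . map f . Set.toList  -- f*(B) is the union of all f(x), x in B

muP :: Ord a => Set (Set a) -> Set a
muP = starP id                             -- mu_S = id*, i.e. the big union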

In contrast to nondeterministic computations, probabilistic ones argue with probability distributions. We consider the discrete case first, and here we focus on probabilities with finite support.

Example 2.4.7 We work in the category Set of sets with maps as morphisms and consider the discrete probability functor D S := {p : S → [0, 1] | p is a discrete probability}, see Example 2.3.11. Let f : S → D S be a map and p ∈ D S; put

f∗(p)(s) := ∑t∈S f(t)(s) · p(t).



Then

∑s∈S f∗(p)(s) = ∑s ∑t f(t)(s) · p(t) = ∑t ∑s f(t)(s) · p(t) = ∑t p(t) = 1,

hence f∗ : D S → D S. Note that the set {⟨s, t⟩ ∈ S × S | f(t)(s) · p(t) > 0} is finite, because p has finite support, and because each f(t) has finite support as well. Since each of the summands is non-negative, we may reorder the summations at our convenience. Define moreover

ηS(s)(s′) := dS(s)(s′) := 1 if s = s′, and 0 otherwise,

so that ηS(s) is the discrete Dirac measure on s. Then

① η∗S(p)(s) = ∑s′ dS(s′)(s) · p(s′) = p(s), hence we may conclude that η∗S(p) = p.

② f∗(ηS(s)) = f(s) is immediate.

③ Let f : S → D T and g : T → D U; then we have for p ∈ D S and u ∈ U

(g∗ ∘ f∗)(p)(u) = ∑t∈T g(t)(u) · f∗(p)(t)
               = ∑t∈T ∑s∈S g(t)(u) · f(s)(t) · p(s)
               = ∑⟨s,t⟩∈S×T g(t)(u) · f(s)(t) · p(s)
               = ∑s∈S [∑t∈T g(t)(u) · f(s)(t)] · p(s)
               = ∑s∈S g∗(f(s))(u) · p(s)
               = (g∗ ∘ f)∗(p)(u).

Again, we are not bound to any particular order of summation.

We obtain for M ∈ (D ∘ D)S

µS(M)(s) = id∗D S(M)(s) = ∑q∈D S M(q) · q(s).

The last sum extends over a finite set, because the support of M is finite. ♦
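In Haskell the same Kleisli triple can be sketched with finite-support distributions represented as association lists. This representation is our choice for illustration; it does not merge duplicate outcomes, which does not affect the laws up to regrouping:

newtype Dist a = Dist { runDist :: [(a, Double)] }

etaD :: a -> Dist a
etaD x = Dist [(x, 1)]          -- the Dirac distribution on x

starD :: (a -> Dist b) -> Dist a -> Dist b
starD f (Dist p) =              -- f*(p)(y) = sum over x of f(x)(y) * p(x)
  Dist [ (y, q * w) | (x, w) <- p, (y, q) <- runDist (f x) ]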

Since programs may fail to halt, one works sometimes in models which are formulated in terms of subprobabilities rather than probabilities. This is what we consider next, extending the previous example to the case of general measurable spaces. Recall that examples requiring techniques from Chapter 4 are marked.

Example 2.4.8 (⋆) We work in the category of measurable spaces with measurable maps as morphisms, see Example 2.1.12. In Example 2.3.12 the subprobability functor was introduced, and it was shown that for a measurable space S the set S S of all subprobabilities is a measurable space again (we omit in this example the σ-algebra from the notation; a measurable space is for the time being a pair consisting of a carrier set and a σ-algebra on it). A probabilistic computation f on the measurable spaces S and T produces from an input of an element of S a subprobability distribution f(s) on T, hence an element of S T. We want f to be a morphism in Meas, so f : S → S T is assumed to be measurable.

We know from Example 2.1.14 and Exercise 2.7 that f : S → S T is measurable iff these conditions are satisfied:

1. f(s) ∈ S(T ) for all s ∈ S, thus f(s) is a subprobability on (the measurable sets of) T .

2. For each measurable set D in T , the map s 7→ f(s)(D) is measurable.



Returning to the definition of a Kleisli triple, we define for the measurable space S and f : S → S T

eS := δS,
f∗(µ)(B) := ∫S f(s)(B) µ(ds)    (µ ∈ S S, B ⊆ T measurable).

Thus eS(x) = δS(x), the Dirac measure associated with x, and f∗ : S S → S T is a morphism (in this example, we write e for the unit and m for the multiplication). Note that f∗(µ) ∈ S T in the scenario above; in order to see whether the properties of a Kleisli triple are satisfied, we need to know how to integrate with this measure. Standard arguments like Levi's Theorem 4.8.2 show that

∫T h df∗(µ) = ∫S ∫T h(t) f(s)(dt) µ(ds),    (2.1)

whenever h : T → R+ is measurable and bounded.

Let us again check the properties of a Kleisli triple. Fix f : S → S S and g : T → S U as morphisms in Meas; B denotes a measurable subset of the space under consideration.

① Let µ ∈ S S; then

e∗S(µ)(B) = ∫S δS(x)(B) µ(dx) = µ(B),

hence e∗S = idS S.

② If x ∈ S, then

f∗(eS(x))(B) = ∫S f(s)(B) δS(x)(ds) = f(x)(B),

since ∫S h dδS(x) = h(x) for every measurable map h. Thus f∗ ∘ eS = f.

③ Given µ ∈ S S, we have

(g∗ ∘ f∗)(µ)(B) = g∗(f∗(µ))(B)
              = ∫T g(t)(B) f∗(µ)(dt)
              = ∫S ∫T g(t)(B) f(s)(dt) µ(ds)    (by (2.1))
              = ∫S g∗(f(s))(B) µ(ds)
              = (g∗ ∘ f)∗(µ)(B).

Thus g∗ ∘ f∗ = (g∗ ∘ f)∗.

Hence (S, e, −∗) forms a Kleisli triple over the category Meas of measurable spaces.

Let us finally determine the monad's multiplication. We have for M ∈ (S ∘ S)S and the measurable set B ⊆ S

mS(M)(B) = id∗S S(M)(B) = ∫S S τ(B) M(dτ).



The underlying monad has been investigated by M. Giry, so it is called in her honor the Giry monad, and S is called the Giry functor. Both are used extensively as the machinery on which Markov transition systems are based. ♦

The next example shows that ultrafilters define a monad as well.

Example 2.4.9 Let U be the ultrafilter functor on Set, see Example 2.3.14. Define for the set S and the map f : S → U T

ηS(s) := {A ⊆ S | s ∈ A},
f∗(U) := {B ⊆ T | {s ∈ S | B ∈ f(s)} ∈ U},

provided U ∈ U S is an ultrafilter. Then ∅ ∉ f∗(U), since ∅ ∉ U. ηS(s) is the principal ultrafilter associated with s ∈ S (see page 30), hence ηS : S → U S. Because the intersection of two sets is a member of an ultrafilter iff both sets are elements of it, and since

{s ∈ S | B1 ∩ B2 ∈ f(s)} = {s ∈ S | B1 ∈ f(s)} ∩ {s ∈ S | B2 ∈ f(s)},

f∗(U) is closed under intersections; moreover, B ⊆ C and B ∈ f∗(U) imply C ∈ f∗(U). If B ∉ f∗(U), then {s ∈ S | B ∈ f(s)} ∉ U, hence {s ∈ S | B ∉ f(s)} ∈ U, thus T \ B ∈ f∗(U), and vice versa. Hence f∗(U) is an ultrafilter, and thus f∗ : U S → U T.

We check whether (U, η, −∗) is a Kleisli triple.

① Since B ∈ η∗S(U) iff {s ∈ S | s ∈ B} = B ∈ U, we conclude that η∗S = idU S.

② Similarly, if f : S → U T and s ∈ S, then B ∈ (f∗ ∘ ηS)(s) iff B ∈ f(s), hence f∗ ∘ ηS = f.

③ Let f : S → U T and g : T → U W. Then

B ∈ (g∗ ∘ f∗)(U) ⇔ {s ∈ S | {t ∈ T | B ∈ g(t)} ∈ f(s)} ∈ U ⇔ B ∈ (g∗ ∘ f)∗(U)

for U ∈ U S. Consequently, g∗ ∘ f∗ = (g∗ ∘ f)∗.

Let us compute the monad's multiplication. Define for B ⊆ S the set

[B] := {C ∈ U S | B ∈ C}

as the set of all ultrafilters on S which contain B as an element; then an easy computation shows

µS(V) = id∗U S(V) = {B ⊆ S | [B] ∈ V} for V ∈ (U ∘ U)S. ♦

Example 2.4.10 This example deals with upper closed subsets of the power set of a set, see Example 2.3.13. Let again

V S := {V ⊆ P S | V is upper closed}

be the endofunctor on Set which assigns to the set S all upper closed subsets of P S. We define the components of a Kleisli triple as follows: ηS(s) is the principal ultrafilter generated by s ∈ S, which is upper closed, and if f : S → V T is a map, we put

f∗(V) := {B ⊆ T | {s ∈ S | B ∈ f(s)} ∈ V}



for V ∈ V S, cf. Example 2.4.9.

The argumentation in Example 2.4.9 carries over and shows that this defines a Kleisli triple. ♦

These examples show that monads and Kleisli triples are constructions which model many computationally interesting subjects. After looking at the practical side of this, we return to the discussion of the relationship of monads to adjunctions, another important concept.

2.4.3 Monads in Haskell

The functional programming language Haskell thrives on the construction of monads. We take a brief look.

Haskell permits the definition of type classes; the definition of a type class requires the specification of the types on which the class is based, and the signatures of the functions defined by this class. The definition of class Monad is given below (actually, it is rather a specification of Kleisli triples).

class Monad m where
  (>>=)  :: m a -> (a -> m b) -> m b
  return :: a -> m a
  (>>)   :: m a -> m b -> m b
  fail   :: String -> m a

Thus class Monad is based on the type constructor m; it specifies four functions, of which >>= and return are the most interesting. The first one is called bind and used as an infix operator: given x of type m a and a function f of type a -> m b, the evaluation of x >>= f will yield a result of type m b. This corresponds to f∗. The function return takes a value of type a and evaluates to a value of type m a; hence it corresponds to ηa (the name return has probably not been a fortunate choice). The function >>, usually used as an infix operator as well, is defined by default in terms of >>=, and the function fail serves to handle exceptions; neither function will concern us here.

Not every conceivable definition of the function return and the bind function >>= is suitable for the definition of a monad. These are the laws the Haskell programmer has to enforce, and it becomes evident that they are just the laws for a Kleisli triple from Definition 2.4.1:

return x >>= f            == f x
p >>= return              == p
p >>= (\x -> (f x >>= g)) == (p >>= (\x -> f x)) >>= g

(here x is not free in g; \x -> f x is Haskell's way of expressing the anonymous function λx.f x). The compiler for Haskell cannot check these laws, so the programmer has to make sure that they hold.

We demonstrate the concept with a simple example. Lists are a popular data structure. They are declared as a monad in this way:

instance Monad [] where
  return t = [t]
  x >>= f  = concat (map f x)



This makes the polymorphic type constructor [] for lists a monad; it specifies essentially that f is mapped over the list x (x has to be a list, and f a function defined on the list's base type yielding a list as a value); this results in a list of lists, which then will be flattened through an application of the function concat. This example should clarify things:

>>> q = (\w -> [0 .. w])
>>> [0 .. 2] >>= q
[0,0,1,0,1,2]

The effect is explained in the following way: the definition of the bind operation >>= requires the computation of

concat (map q [0 .. 2]) == concat [(q 0), (q 1), (q 2)]
                        == concat [[0], [0, 1], [0, 1, 2]]
                        == [0,0,1,0,1,2].

We check the laws of a monad.

• We have return x == [x], hence

return x >>= f == [x] >>= f
               == concat (map f [x])
               == concat [f x]
               == f x

• Similarly, if p is a given list, then

p >>= return == concat (map return p)
             == concat [[x] | x <- p]
             == [x | x <- p]
             == p

• For the third law, if p is the empty list, then the left and the right hand side are empty as well. Hence let us assume that p = [x1, .., xn]. We obtain for the left hand side

p >>= (\x -> (f x >>= g))
  == concat (map (\x -> (f x >>= g)) p)
  == concat (concat [map g (f x) | x <- p]),

and for the right hand side

(concat [f x | x <- p]) >>= g
  == ((f x1) ++ (f x2) ++ .. ++ (f xn)) >>= g
  == concat (map g ((f x1) ++ (f x2) ++ .. ++ (f xn)))
  == concat (concat [map g (f x) | x <- p])

(this argumentation could of course be made more precise through a proof by induction on the length of the list p, but this would lead us too far from the present discussion).

Kleisli composition >=> can be defined in a monad as follows:

(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
f >=> g = \x -> (f x) >>= g

This gives in the first line a type declaration for the operation >=> by indicating that the infix operator >=> takes two arguments, viz., a function the signature of which is a -> m b, a second one with signature b -> m c, and that the result will be a function of type a -> m c,



as expected. The precondition to this type declaration is that m is a monad. The body of the function uses the bind operator for binding f x to g; this results in a function depending on x. It can be shown that this composition is associative.
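A quick check in GHCi (a hypothetical session, in the style of the list example above):

>>> import Control.Monad
>>> r = (\w -> [w, w + 1]) >=> (\v -> [10 * v])
>>> r 3
[30,40]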

2.5 Adjunctions and Algebras

An adjunction relates two functors F : K → L and G : L → K in a systematic way. We define this formally and investigate some examples in order to show that this is a natural concept which arises in a variety of situations. In fact, we will show that monads are closely related to adjunctions via algebras, so we will study algebras as well and provide the corresponding constructions.

2.5.1 Adjunctions.

We define the basic notion of an adjunction and show that an adjunction defines a pair of natural transformations through universal arrows (which is sometimes taken as the basis for adjunctions).

Definition 2.5.1 Let K and L be categories. Then (F ,G, ϕ) is called an adjunction iff

1. F : K → L and G : L→K are functors,

2. for each object a in L and x in K there is a bijection

ϕx,a : homL(Fx, a)→ homK(x,Ga)

which is natural in x and a.

F is called the left adjoint to G, G is called the right adjoint to F .

That ϕx,a is natural for each x, a means that for all morphisms f : a → b in L and g : y → x in K both diagrams commute:

[Diagrams: (G f)∗ ∘ ϕx,a = ϕx,b ∘ f∗ as maps homL(F x, a) → homK(x, G b), and g∗ ∘ ϕx,a = ϕy,a ∘ (F g)∗ as maps homL(F x, a) → homK(y, G a).]

Here f∗ := homL(F x, f) and g∗ := homK(g, G a) are the hom-set functors associated with f resp. g, similarly for (G f)∗ and for (F g)∗; for the hom-set functors see Example 2.3.3.

Let us have a look at currying as a simple example.

Example 2.5.2 A map f : X × Y → Z is sometimes considered as a map f : X → (Y → Z), so that f(x, y) is considered as the value F(x)(y) at y for the "higher order" map F(x) := λb.f(x, b). This technique is popular in functional programming; it is called currying and will be discussed now.



Fix a set E and define the endofunctors F, G : Set → Set by F := − × E resp. G := −E. Thus we have in particular (F f)(x, e) := ⟨f(x), e⟩ and (G f)(g)(e) := f(g(e)), whenever f : X → Y is a map.

Define the map ϕX,A : homSet(F X, A) → homSet(X, G A) by ϕX,A(k)(x)(e) := k(x, e). Then ϕX,A is a bijection. In fact, let k1, k2 : F X → A be different maps; then k1(x, e) ≠ k2(x, e) for some ⟨x, e⟩ ∈ X × E, hence ϕX,A(k1)(x)(e) ≠ ϕX,A(k2)(x)(e), so that ϕX,A is one-to-one. Let ℓ : X → G A be a map; then ℓ = ϕX,A(k) with k(x, e) := ℓ(x)(e). Thus ϕX,A is onto.

In order to show that ϕ is natural both in X and in A, take maps f : A → B and g : Y → X and trace k ∈ homSet(F X, A) through the diagrams in Definition 2.5.1. We have

ϕX,B(f∗(k))(x)(e) = f∗(k)(x, e) = f(k(x, e)) = f(ϕX,A(k)(x)(e)) = (G f)∗(ϕX,A(k))(x)(e).

Similarly,

g∗(ϕX,A(k))(y)(e) = k(g(y), e) = (F g)∗(k)(y, e) = ϕY,A((F g)∗(k))(y)(e).

This shows that (F, G, ϕ) with ϕ as the currying function is an adjunction. ♦
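In Haskell, the bijection ϕ and its inverse are precisely the Prelude functions curry and uncurry; the sketch below merely renames them to match the notation of the example:

phi :: ((x, e) -> a) -> x -> (e -> a)
phi = curry          -- phi_{X,A}(k)(x)(e) = k(x, e)

phiInv :: (x -> e -> a) -> (x, e) -> a
phiInv = uncurry     -- the inverse direction: l goes to k with k(x, e) = l(x)(e)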

Another popular example is furnished through the diagonal functor.

Example 2.5.3 Let K be a category such that for any two objects a and b their product a × b exists. Recall the definition of the Cartesian product of categories from Lemma 2.1.19. Define the diagonal functor ∆ : K → K × K through ∆a := ⟨a, a⟩ for objects and ∆f := ⟨f, f⟩ for a morphism f. Conversely, define T : K × K → K by putting T(a, b) := a × b for objects, and T⟨f, g⟩ := f × g for a morphism ⟨f, g⟩.

Let ⟨k1, k2⟩ ∈ homK×K(∆a, ⟨b1, b2⟩); hence we have morphisms k1 : a → b1 and k2 : a → b2. By the definition of the product, there exists a unique morphism k : a → b1 × b2 with k1 = π1 ∘ k and k2 = π2 ∘ k, where πi : b1 × b2 → bi are the projections, i = 1, 2. Define ϕa,⟨b1,b2⟩(k1, k2) := k; then it is immediate that ϕa,⟨b1,b2⟩ : homK×K(∆a, ⟨b1, b2⟩) → homK(a, T(b1, b2)) is a bijection.

Let 〈f1, f2〉 : 〈a1, a2〉 → 〈b1, b2〉 be a morphism, then the diagram

[Diagram: (T(f1, f2))∗ ∘ ϕx,⟨a1,a2⟩ = ϕx,⟨b1,b2⟩ ∘ ⟨f1, f2⟩∗ as maps homK×K(∆x, ⟨a1, a2⟩) → homK(x, b1 × b2).]

splits into the two commutative diagrams

[Diagrams: the component diagrams homK(x, ai) → homK(x, bi) for i = 1, 2, obtained by composing with the projections πi.]

for i = 1, 2, hence it is commutative itself. One argues similarly for a morphism g : y → x. Thus the bijection ϕ is natural.

Hence we have found that (∆, T, ϕ) is an adjunction, so that the diagonal functor has the product functor as a right adjoint. ♦



A map f : X → Y between sets provides us with another example, which is a special case of a Galois connection: a pair f : P → Q and g : Q → P of monotone maps between the partially ordered sets P and Q forms a Galois connection iff f(p) ≤ q ⇔ p ≤ g(q) for all p ∈ P, q ∈ Q.

Example 2.5.4 Let X and Y be sets; then the inclusion on P X resp. P Y makes these sets categories of their own, see Example 2.1.4. Given a map f : X → Y, define f? : P X → P Y as the direct image f?(A) := f[A] and f! : P Y → P X as the inverse image f!(B) := f⁻¹[B].

Now we have for A ⊆ X and B ⊆ Y

f?(A) ⊆ B ⇔ f[A] ⊆ B ⇔ A ⊆ f⁻¹[B] ⇔ A ⊆ f!(B).

This means in terms of the hom-sets that homP Y(f?(A), B) ≠ ∅ iff homP X(A, f!(B)) ≠ ∅. Hence this gives an adjunction (f?, f!, ϕ): the direct image is left adjoint to the inverse image. ♦
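The defining equivalence can be checked mechanically on finite sets; here is a small Haskell sketch (the names are ours, and the inverse image needs the domain X passed in explicitly):

import Data.Set (Set)
import qualified Data.Set as Set

direct :: (Ord a, Ord b) => (a -> b) -> Set a -> Set b   -- f?(A) = f[A]
direct = Set.map

inverse :: (Ord a, Ord b) => Set a -> (a -> b) -> Set b -> Set a  -- f!(B) = f^{-1}[B]
inverse dom f b = Set.filter (\x -> f x `Set.member` b) dom

-- the adjunction: f?(A) is a subset of B iff A is a subset of f!(B)
adjoint :: (Ord a, Ord b) => Set a -> (a -> b) -> Set a -> Set b -> Bool
adjoint dom f as bs =
  (direct f as `Set.isSubsetOf` bs) == (as `Set.isSubsetOf` inverse dom f bs)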

Back to the general development. This auxiliary statement will help in some computations.

Lemma 2.5.5 Let (F, G, ϕ) be an adjunction, and let f : a → b and g : y → x be morphisms in L resp. K. Then we have

(G f) ∘ ϕx,a(t) = ϕx,b(f ∘ t),
ϕx,a(t) ∘ g = ϕy,a(t ∘ F g)

for each morphism t : F x → a in L.

Proof Chase t through the left hand diagram of Definition 2.5.1 to obtain

((G f)∗ ∘ ϕx,a)(t) = (G f) ∘ ϕx,a(t) = ϕx,b(f∗(t)) = ϕx,b(f ∘ t).

This yields the first equation; the second is obtained from tracing t through the diagram on the right hand side. ∎

An adjunction induces natural transformations which make this important construction easier to handle, and which help indicate connections of adjunctions to monads and Eilenberg-Moore algebras. Before entering the discussion, universal arrows are introduced.

Definition 2.5.6 Let S : C → D be a functor, and c an object in D.

1. The pair ⟨r, u⟩ is called a universal arrow from c to S iff r is an object in C and u : c → S r is a morphism in D such that for any arrow f : c → S d there exists a unique arrow f′ : r → d in C such that f = (S f′) ∘ u.

2. The pair ⟨r, v⟩ is called a universal arrow from S to c iff r is an object in C and v : S r → c is a morphism in D such that for any arrow f : S d → c there exists a unique arrow f′ : d → r in C such that f = v ∘ (S f′).

Thus, if the pair ⟨r, u⟩ is universal from c to S, then each arrow c → S d in D factors uniquely through the S-image of an arrow r → d in C. Similarly, if the pair ⟨r, v⟩ is universal from S to c, then each D-arrow S d → c factors uniquely through the S-image of a C-arrow d → r. These diagrams depict the situation for a universal arrow u : c → S r resp. a universal arrow



v : S r → c.

[Diagrams: u : c → S r with any f : c → S d factoring as f = (S f′) ∘ u for a unique f′ : r → d, and v : S r → c with any f : S d → c factoring as f = v ∘ (S f′) for a unique f′ : d → r.]

This is a characterization of a universal arrow from c to S.

Lemma 2.5.7 Let S : C → D be a functor. Then ⟨r, u⟩ is a universal arrow from c to S iff the function ψd which maps each morphism f′ : r → d to the morphism (S f′) ∘ u is a natural bijection homC(r, d) → homD(c, S d).

Proof 1. If ⟨r, u⟩ is a universal arrow, then bijectivity of ψd is just a reformulation of the definition. It is also clear that ψd is natural in d, because if g : d → d′ is a morphism, then ψd′(g ∘ f′) = S(g ∘ f′) ∘ u = (S g) ∘ (S f′) ∘ u = (S g) ∘ ψd(f′).

2. Now assume that ψd : homC(r, d) → homD(c, S d) is a bijection for each d, and choose in particular d = r. Define u := ψr(idr); then u : c → S r is a morphism in D. Consider this diagram for an arbitrary f′ : r → d:

[Diagram: homD(c, S f′) ∘ ψr = ψd ∘ homC(r, f′) as maps homC(r, r) → homD(c, S d).]

Given a morphism f : c → S d in D, there exists a unique morphism f′ : r → d such that f = ψd(f′), because ψd is a bijection. Then we have

f = ψd(f′)
  = (ψd ∘ homC(r, f′))(idr)
  = (homD(c, S f′) ∘ ψr)(idr)    (commutativity)
  = homD(c, S f′)(u)             (u = ψr(idr))
  = (S f′) ∘ u. ∎

Universal arrows will now be used for a characterization of adjunctions in terms of natural transformations (we will sometimes omit the indices for the natural transformation ϕ that comes with an adjunction).

Theorem 2.5.8 Let (F, G, ϕ) be an adjunction for the functors F : K → L and G : L → K. Then there exist natural transformations η : IdK → G ∘ F and ε : F ∘ G → IdL with these properties:

1. the pair ⟨F x, ηx⟩ is a universal arrow from x to G for each x in K, and ϕ(f) = G f ∘ ηx holds for each f : F x → a,

2. the pair ⟨G a, εa⟩ is universal from F to a for each a in L, and ϕ⁻¹(g) = εa ∘ F g holds for each g : x → G a,



3. the composites

G --ηG--> G ∘ F ∘ G --Gε--> G    and    F --F η--> F ∘ G ∘ F --εF--> F

are the identities for G resp. F.

Proof 1. Put ηx := ϕx,F x(idF x); then ηx : x → G F x. In order to show that ⟨F x, ηx⟩ is a universal arrow from x to G, we take a morphism f : x → G a for some object a in L. Since (F, G, ϕ) is an adjunction, we know that there exists a unique morphism f′ : F x → a such that ϕx,a(f′) = f. We also have this commutative diagram:

[Diagram: homL(x, G f′) ∘ ϕx,F x = ϕx,a ∘ homK(F x, f′) as maps homK(F x, F x) → homL(x, G a).]

Thus

(G f′) ∘ ηx = (homL(x, G f′) ∘ ϕx,F x)(idF x)
           = (ϕx,a ∘ homK(F x, f′))(idF x)
           = ϕx,a(f′)
           = f.

2. η : IdK → G ∘ F is a natural transformation. Let h : x → y be a morphism in K; then we have by Lemma 2.5.5

G(F h) ∘ ηx = G(F h) ∘ ϕx,F x(idF x) = ϕx,F y(F h ∘ idF x) = ϕx,F y(idF y ∘ F h) = ϕy,F y(idF y) ∘ h = ηy ∘ h.

3. Put εa := ϕ⁻¹G a,a(idG a) for the object a in L; then the properties for ε are proved in exactly the same way as those of η.

4. From ϕx,a(f) = G f ∘ ηx we obtain

idG a = ϕ(εa) = G εa ∘ ηG a = (Gε ∘ ηG)(a),

so that Gε ∘ ηG is the identity transformation on G. Similarly, εF ∘ F η is the identity for F. ∎

The transformation η is sometimes called the unit of the adjunction, whereas ε is called its counit. The converse to Theorem 2.5.8 holds as well: from two transformations η and ε with the signatures as above one can construct an adjunction. The proof is a fairly straightforward verification.



Proposition 2.5.9 Let F : K → L and G : L → K be functors, and assume that natural transformations η : IdK → G ∘ F and ε : F ∘ G → IdL are given so that (Gε) ∘ (ηG) is the identity of G, and (εF) ∘ (F η) is the identity of F. Define ϕx,a(k) := (G k) ∘ ηx, whenever k : F x → a is a morphism in L. Then (F, G, ϕ) defines an adjunction.

Proof 1. Define ϑx,a(g) := εa ∘ F g for g : x → G a; then we have

ϕx,a(ϑx,a(g)) = G(εa ∘ F g) ∘ ηx
             = (G εa) ∘ (G F g) ∘ ηx
             = (G εa) ∘ ηG a ∘ g    (η is natural)
             = ((Gε ∘ ηG)(a)) ∘ g
             = idG a ∘ g
             = g.

Thus ϕx,a ∘ ϑx,a = idhomK(x,G a). Similarly, one shows that ϑx,a ∘ ϕx,a = idhomL(F x,a), so that ϕx,a is a bijection.

2. We have to show that ϕx,a is natural for each x, a, so take a morphism f : a → b in L and chase k : F x → a through this diagram:

[Diagram: (G f)∗ ∘ ϕx,a = ϕx,b ∘ f∗ as maps homL(F x, a) → homK(x, G b).]

Then

((G f)∗ ∘ ϕx,a)(k) = (G f) ∘ (G k) ∘ ηx = G(f ∘ k) ∘ ηx = ϕx,b(f ∘ k) = (ϕx,b ∘ f∗)(k). ∎

Thus, for identifying an adjunction, it is sufficient to identify its unit and its counit. This includes verifying the identity laws of the functors for the corresponding compositions. The following example has another look at currying (Example 2.5.2), demonstrating the approach and suggesting that identifying unit and counit is sometimes easier than working with the originally given definition.

Example 2.5.10 Continuing Example 2.5.2, we take the definitions of the endofunctors F and G from there. Define for the set X the natural transformations η : IdSet → G ∘ F and ε : F ∘ G → IdSet through

ηX : X → (X × E)E, x ↦ λe.⟨x, e⟩    and    εX : XE × E → X, ⟨g, e⟩ ↦ g(e).

Note that we have (G f)(h) = f ∘ h for f : X → Y and h ∈ XE, so that we obtain

GεX(ηGX(g))(e) = (εX ∘ ηGX(g))(e) = εX(ηGX(g)(e)) = εX(g, e) = g(e),



whenever e ∈ E and g ∈ G X = XE; hence (Gε) ∘ (ηG) = idG. One shows similarly that (εF) ∘ (F η) = idF through

εF X(F ηX(x, e)) = εF X(⟨ηX(x), e⟩) = ηX(x)(e) = ⟨x, e⟩. ♦

Now let (F, G, ϕ) be an adjunction with functors F : K → L and G : L → K, unit η and counit ε. Define the functor T through T := G ∘ F. Then T : K → K defines an endofunctor on the category K with µa := (GεF)(a) = GεF a as a morphism µa : T²a → T a. Because εa : F G a → a is a morphism in L, and because ε : F ∘ G → IdL is natural, the diagram

[Diagram: the naturality square εa ∘ ε(F G)a = εa ∘ (F G)εa : (F G F G)a → a]

is commutative. This means that the diagram

[Diagram: ε ∘ ε(F G) = ε ∘ (F G)ε as natural transformations F ∘ G ∘ F ∘ G → IdL]

of functors and natural transformations commutes. Multiplying from the left with G and from the right with F gives this diagram:

[Diagram: GεF ∘ Gε(F G F) = GεF ∘ (G F G)εF as natural transformations G ∘ F ∘ G ∘ F ∘ G ∘ F → G ∘ F]

Because T µ = (G F G)εF and Gε(F G F) = µT, this diagram can be written as

[Diagram: µ ∘ T µ = µ ∘ µT : T³ → T]

This gives the commutativity of the left hand diagram in Definition 2.4.3 for a monad. Because Gε ∘ ηG is the identity on G, we obtain

GεF a ∘ ηG F a = (Gε ∘ ηG)(F a) = idG F a,

which implies that the diagram

[Diagram: the triangle µa ∘ ηT a = idT a]



commutes. On the other hand, we know that εF ∘ F η is the identity on F; this yields

GεF a ∘ G F ηa = G(εF ∘ F η)a = idG F a.

Hence we may complement the last diagram:

[Diagram: the unit triangles µa ∘ ηT a = idT a = µa ∘ T ηa]

This gives the right hand side diagram in Definition 2.4.3 for a monad. We have shown

Proposition 2.5.11 Each adjunction defines a monad. ∎

It turns out that we may not only proceed from an adjunction to a monad, but that it is also possible to traverse this path in the other direction.

2.5.2 Eilenberg-Moore Algebras

We will show that a monad defines an adjunction. In order to do that, we have to represent the functorial part of a monad as the composition of two other functors, so we need a second category for this. The algebras which are defined for a monad provide us with this category. So we will define algebras (and in a later chapter, their counterparts, coalgebras), and we will study them. This will help us in showing that each monad defines an adjunction. Finally, we will have a look at two examples for algebras, in order to illuminate this concept.

Given a monad (T, η, µ) in a category K, a pair ⟨x, h⟩ consisting of an object x and a morphism h : T x → x in K is called an Eilenberg-Moore algebra for the monad iff the following diagrams commute:

[Diagrams: h ∘ T h = h ∘ µx : T²x → x, and h ∘ ηx = idx : x → x.]

The morphism h is called the structure morphism of the algebra, x its carrier.


An algebra morphism f : ⟨x, h⟩ → ⟨x′, h′⟩ between the algebras ⟨x, h⟩ and ⟨x′, h′⟩ is a morphism f : x → x′ in K which renders the diagram

[Diagram: f ∘ h = h′ ∘ T f : T x → x′]

commutative. Eilenberg-Moore algebras together with their morphisms form a category Alg(T,η,µ). We will usually omit the reference to the monad. Fix for the moment (T, η, µ) as



a monad in the category K, and let Alg := Alg(T,η,µ) be the associated category of Eilenberg-Moore algebras.

We give some simple examples.

Lemma 2.5.12 The pair 〈Tx, µx〉 is a T -algebra for each x in K.

Proof This is immediate from the laws for η and µ in a monad. ∎

These algebras are usually called the free algebras for the monad. Morphisms in the base category K translate into morphisms in Alg through the functor T.

Lemma 2.5.13 If f : x → y is a morphism in K, then T f : ⟨T x, µx⟩ → ⟨T y, µy⟩ is a morphism in Alg. If ⟨x, h⟩ is an algebra, then h : ⟨T x, µx⟩ → ⟨x, h⟩ is a morphism in Alg.

Proof Because µ : T² → T is a natural transformation, we see µy ∘ T²f = (T f) ∘ µx. This is just the defining equation for a morphism in Alg. The second assertion also follows from the defining equation of an algebra morphism. ∎

We will now identify the algebras for the power set monad, which are closely connected to semi-lattices. Recall that an ordered set (X, ≤) is a complete sup-semi-lattice iff each subset has its supremum in X.

Example 2.5.14 The algebras for the monad (P, η, µ) in the category Set of sets with maps (sometimes called the Manes monad) may be identified with the complete sup-semi-lattices. We will show this now.

Assume first that ≤ is a partial order on a set X that is sup-complete, so that sup A exists for each A ⊆ X. Define h(A) := sup A; then we have for each A ∈ P(P(X)) from the familiar properties of the supremum

sup(⋃ A) = sup {sup a | a ∈ A}.

This translates into (h ∘ µX)(A) = (h ∘ (P h))(A). Because x = sup {x} holds for each x ∈ X, we see that ⟨X, h⟩ defines an algebra.

Assume on the other hand that ⟨X, h⟩ is an algebra, and put

x ≤ x′ ⇔ h({x, x′}) = x′

for x, x′ ∈ X. This defines a partial order: reflexivity and antisymmetry are obvious. Transitivity is seen as follows: assume x ≤ x′ and x′ ≤ x′′; then

h({x, x′′}) = h({h({x}), h({x′, x′′})}) = (h ∘ (P h))({{x}, {x′, x′′}})
           = (h ∘ µX)({{x}, {x′, x′′}}) = h({x, x′, x′′})
           = (h ∘ µX)({{x, x′}, {x′′}}) = (h ∘ (P h))({{x, x′}, {x′′}})
           = h({h({x, x′}), h({x′′})}) = h({x′, x′′}) = x′′.

It is clear from h({h(∅), x}) = (h ∘ µX)({∅, {x}}) = h({x}) = x for every x ∈ X that h(∅) is the smallest element. Finally, it has to be shown that h(A) is the smallest upper bound for A ⊆ X in the order ≤. We may



assume that A ≠ ∅. Suppose that x ≤ t holds for all x ∈ A; then

h(A ∪ {t}) = h(⋃x∈A {x, t}) = (h ∘ µX)({{x, t} | x ∈ A})
           = (h ∘ (P h))({{x, t} | x ∈ A}) = h({h({x, t}) | x ∈ A})
           = h({t}) = t.

Thus, if x ≤ t for all x ∈ A, then h(A) ≤ t; a similar computation shows that h(A) is an upper bound for A. Hence h(A) is the smallest upper bound of A. ♦
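For illustration, the free algebra ⟨P(X), µX⟩ from Lemma 2.5.12 has the big union as its structure map, and the induced order is inclusion. A Haskell sketch (names ours):

import Data.Set (Set)
import qualified Data.Set as Set

-- structure map of the free algebra <P(X), mu_X>: the big union
hFree :: Ord a => Set (Set a) -> Set a
hFree = Set.unions . Set.toList

-- the order from the example, x <= y iff h {x, y} == y; here it is inclusion
leq :: Ord a => Set a -> Set a -> Bool
leq x y = hFree (Set.fromList [x, y]) == y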

We have shown that each adjunction defines a monad, and, as announced above, we now turn to the converse. In fact, we will show that each monad defines an adjunction whose monad is the given one.

Fix the monad (T, η, µ) over the category K, and define as above Alg := Alg(T,η,µ) as the category of Eilenberg-Moore algebras. We intend to define an adjunction, so by Proposition 2.5.9 the most convenient approach is to define unit and counit, after the corresponding functors have been identified.

Lemma 2.5.15 Define F a := ⟨T a, µa⟩ for each object a ∈ |K|, and if f : a → b is a morphism in K, define F f := T f. Then F : K → Alg is a functor.

Proof We have to show that F f : ⟨T a, µa⟩ → ⟨T b, µb⟩ is an algebra morphism. Since µ : T² → T is natural, we obtain this commutative diagram:

[Diagram: (T f) ∘ µa = µb ∘ T²f : T²a → T b]

But this is just the defining property of an algebra morphism. ∎

This statement is trivial:

Lemma 2.5.16 Given an Eilenberg-Moore algebra ⟨x, h⟩ ∈ |Alg|, define G(x, h) := x; if f : ⟨x, h⟩ → ⟨x′, h′⟩ is a morphism in Alg, put G f := f. Then G : Alg → K is a functor. Moreover we have G ∘ F = T. ∎

We require two natural transformations, which are defined now, and which are intended to serve as the unit and the counit, respectively, of the adjunction. For the unit we take the originally given η, so that η : IdK → G ∘ F is a natural transformation. The counit ε is defined through ε⟨x,h⟩ := h, so that ε⟨x,h⟩ : (F ∘ G)(x, h) → IdAlg(x, h). This defines a natural transformation ε : F ∘ G → IdAlg. In fact, let f : ⟨x, h⟩ → ⟨x′, h′⟩ be a morphism in Alg; then, by expanding definitions, the diagram on the left hand side translates to the one on the right hand side, which commutes:

[Diagrams: naturality f ∘ ε⟨x,h⟩ = ε⟨x′,h′⟩ ∘ (F G)f unfolds to f ∘ h = h′ ∘ T f, the defining property of an algebra morphism.]



Now take an object a ∈ |K|, then

(εF ∘ F η)(a) = εF a ∘ F ηa = ε⟨T a,µa⟩ ∘ T ηa = µa ∘ T ηa = idF a.

On the other hand, we have for the algebra 〈x, h〉

(Gε ∘ ηG)(x, h) = Gε⟨x,h⟩ ∘ ηG⟨x,h⟩ = Gε⟨x,h⟩ ∘ ηx = ε⟨x,h⟩ ∘ ηx = h ∘ ηx (∗)= idx = idG⟨x,h⟩,

where (∗) uses that h : T x → x is the structure morphism of an algebra. Taken together, we see that η and ε satisfy the requirements of unit and counit for an adjunction according to Proposition 2.5.9.

Hence we have nearly established

Proposition 2.5.17 Every monad defines an adjunction. The monad defined by the adjunction is the original one.

Proof We only have to prove the last assertion. But this is trivial, because (GεF)a = (Gε)⟨T a, µa⟩ = Gµa = µa. ∎

Algebras for discrete probabilities. We now identify the Eilenberg-Moore algebras for the functor D, which assigns to each set its discrete subprobabilities with finite support, see Example 2.3.11. Some preliminary and motivating observations are made first.

Put

Ω := {⟨α1, . . . , αk⟩ | k ∈ N, αi ≥ 0, ∑1≤i≤k αi ≤ 1}

as the set of all positive convex coefficients, and call a subset V of a real vector space positive convex iff ∑1≤i≤k αi · xi ∈ V for x1, . . . , xk ∈ V and ⟨α1, . . . , αk⟩ ∈ Ω. Positive convexity is to be related to subprobabilities: if ∑1≤i≤k αi · xi is perceived as an observation in which item xi is assigned probability αi, then clearly ∑1≤i≤k αi ≤ 1 under the assumption that the observation is incomplete, i.e., that not every possible case has been observed.

Suppose a set X over which we formulate subprobabilities is embedded as a positive convex set into a linear space V over R. In this case we could read off from a positive convex combination for an element the probabilities with which the respective components occur.

These observations meet the intuition about positive convexity, but this approach has the drawback that we have to look for a linear space V into which to embed X. It has the additional shortcoming that once we have identified V, the positive convex structure on X is fixed through the vector space; but we will soon see that we need flexibility. Consequently, we propose an abstract description of positive convexity, much in the spirit of Pumplün's approach [Pum03]. Thus the essential properties (for us, that is) of positive convexity are described intrinsically for X without having to resort to a vector space, leading to the definition of a positive convex structure.

Definition 2.5.18 A positive convex structure p on a set X assigns to each α = ⟨α1, . . . , αn⟩ ∈ Ω a map αp : X^n → X, which we write as

αp(x1, . . . , xn) = ∑p1≤i≤n αi · xi,

such that

I ∑p1≤i≤n δi,k · xi = xk, where δi,j is Kronecker's δ (thus δi,j = 1 if i = j, and δi,j = 0 otherwise),

J the identity ∑p1≤i≤n αi · (∑p1≤k≤m βi,k · xk) = ∑p1≤k≤m (∑p1≤i≤n αi βi,k) · xk holds whenever ⟨α1, . . . , αn⟩, ⟨βi,1, . . . , βi,m⟩ ∈ Ω, 1 ≤ i ≤ n.

Property I looks quite trivial when written down this way. Rephrasing, it states that the map ⟨δ1,k, . . . , δn,k⟩p : X^n → X which is assigned to the n-tuple ⟨δ1,k, . . . , δn,k⟩ through p acts as the projection to the k-th component for 1 ≤ k ≤ n. Similarly, property J may be re-coded in a formal but less concise way. Thus we will freely use the notation from vector spaces, omitting in particular the explicit reference to the structure whenever possible. Hence the simple addition α1 · x1 + α2 · x2 will be written rather than ∑p1≤i≤2 αi · xi, with the understanding that it refers to a given positive convex structure p on X.

It is an easy exercise to establish that for a positive convex structure the usual rules for manipulating sums in vector spaces apply, e.g., 1 · x = x, ∑1≤i≤n αi · xi = ∑1≤i≤n,αi≠0 αi · xi, or the law of associativity, (α1 · x1 + α2 · x2) + α3 · x3 = α1 · x1 + (α2 · x2 + α3 · x3). Nevertheless, care should be observed, for of course not all rules apply: we cannot in general conclude x = x′ from α · x = α · x′, even if α ≠ 0.

A morphism ϑ : ⟨X1, p1⟩ → ⟨X2, p2⟩ between positive convex structures is a map ϑ : X1 → X2 such that

ϑ(∑p1 1≤i≤n αi · xi) = ∑p2 1≤i≤n αi · ϑ(xi)

holds for x1, . . . , xn ∈ X1 and ⟨α1, . . . , αn⟩ ∈ Ω. In analogy to linear algebra, ϑ will be called an affine map. Positive convex structures with their morphisms form a category StrConv.

We need some technical preparations, which are collected in the following lemma.

Lemma 2.5.19 Let X and Y be sets.

1. Given a map f : X → Y, let p = α1 · δx1 + . . . + αn · δxn be the linear combination of Dirac measures for x1, . . . , xn ∈ X with positive convex coefficients ⟨α1, . . . , αn⟩ ∈ Ω. Then D(f)(p) = α1 · δf(x1) + . . . + αn · δf(xn).

2. Let p1, . . . , pn be discrete subprobabilities on X, and let M = α1 · δp1 + . . . + αn · δpn be the linear combination of the corresponding Dirac measures in (D ∘ D)X with positive convex coefficients ⟨α1, . . . , αn⟩ ∈ Ω. Then µX(M) = α1 · p1 + . . . + αn · pn.

Proof The first part follows directly from the observation D(f)(δx)(B) = δx(f⁻¹[B]) = δf(x)(B), and the second one is easily inferred from the formula for µ in Example 2.4.7. ∎



The algebras are now described without having to resort to D X, through an intrinsic characterization using positive convex structures with affine maps. This characterization is comparable to the one given by Manes for the power set monad (which also does not resort explicitly to the underlying monad or its functor); see Example 2.5.14.

Lemma 2.5.20 Given an algebra ⟨X, h⟩ for D, define for x1, . . . , xn ∈ X and positive convex coefficients ⟨α1, . . . , αn⟩ ∈ Ω

⟨α1, . . . , αn⟩p(x1, . . . , xn) := h(∑1≤i≤n αi · δxi).

This defines a positive convex structure p on X.

Proof 1. Write ∑1≤i≤n αi · xi := h(∑1≤i≤n αi · δxi) for convenience. Because h(∑1≤i≤n δi,j · δxi) = h(δxj) = xj, property I in Definition 2.5.18 is satisfied.

2. Proving property J, we resort to the properties of an algebra and a monad:

∑1≤i≤n αi · (∑1≤k≤m βi,k · xk)
  = h(∑1≤i≤n αi · δ∑1≤k≤m βi,k·xk)              (2.2)
  = h(∑1≤i≤n αi · δh(∑1≤k≤m βi,k·δxk))          (2.3)
  = h(∑1≤i≤n αi · D(h)(δ∑1≤k≤m βi,k·δxk))       (2.4)
  = (h ∘ D(h))(∑1≤i≤n αi · δ∑1≤k≤m βi,k·δxk)    (2.5)
  = (h ∘ µX)(∑1≤i≤n αi · δ∑1≤k≤m βi,k·δxk)      (2.6)
  = h(∑1≤i≤n αi · µX(δ∑1≤k≤m βi,k·δxk))         (2.7)
  = h(∑1≤i≤n αi · (∑1≤k≤m βi,k · δxk))          (2.8)
  = h(∑1≤k≤m (∑1≤i≤n αi · βi,k) δxk)            (2.9)
  = ∑1≤k≤m (∑1≤i≤n αi · βi,k) xk.               (2.10)

The equations (2.2) and (2.3) reflect the definition of the structure, equation (2.4) applies δh(τ) = D(h)(δτ), equation (2.5) uses the linearity of D(h) according to Lemma 2.5.19, and equation (2.6) is due to h being an algebra. Winding down, equation (2.7) uses Lemma 2.5.19 again, this time for µX, equation (2.8) uses that µX(δτ) = τ, equation (2.9) is just rearranging terms, and equation (2.10) is the definition again. ∎

The converse holds as well, as we will show now.

Lemma 2.5.21 Let p be a positive convex structure on X. Put

h(∑1≤i≤n αi · δxi) := ∑p1≤i≤n αi · xi

for ⟨α1, . . . , αn⟩ ∈ Ω and x1, . . . , xn ∈ X. Then ⟨X, h⟩ is an algebra.

Proof 0. We show first that h is well defined, and then we establish that h is an affine map, so that we may interchange the application of h with summation. Then we apply the elementary properties established in Lemma 2.5.19 for µX : (D ∘ D)X → D X to show that the equation h ∘ µX = h ∘ D(h) holds; the unit law h ∘ ηX = idX follows from property I.

1. We first check that h is well-defined: this is so since ∑1≤i≤n αi · δxi = ∑1≤j≤m α′j · δx′j implies that

∑1≤i≤n,αi≠0 αi · δxi = ∑1≤j≤m,α′j≠0 α′j · δx′j,

hence given i with αi ≠ 0 there exists j with α′j ≠ 0 such that xi = x′j and αi = α′j, and vice versa. Consequently,

∑p1≤i≤n αi · xi = ∑p1≤i≤n,αi≠0 αi · xi = ∑p1≤j≤m,α′j≠0 α′j · x′j = ∑p1≤j≤m α′j · x′j

is inferred from the properties of positive convex structures. Thus h : D X → X.

An easy induction using property J shows that h is an affine map, i.e., that we have

h(∑1≤i≤n αi · τi) = ∑p1≤i≤n αi · h(τi)    (2.11)

for ⟨α1, . . . , αn⟩ ∈ Ω and τ1, . . . , τn ∈ D X.

Now let f = ∑1≤i≤n αi · δτi ∈ D²X with τ1, . . . , τn ∈ D X. Then we obtain from Lemma 2.5.19 that µX(f) = ∑1≤i≤n αi · τi. Consequently, we obtain from (2.11) that h(µX(f)) = ∑p1≤i≤n αi · h(τi). On the other hand, Lemma 2.5.19 implies together with (2.11)

(h ∘ D h)(f) = h(∑1≤i≤n αi · (D h)(δτi)) = h(∑1≤i≤n αi · δh(τi)) = ∑p1≤i≤n αi · h(δh(τi)) = ∑p1≤i≤n αi · h(τi),

because (D h)(δτi) = δh(τi) and h(δh(τi)) = h(τi) by property I. Hence h ∘ µX = h ∘ D h. Finally, h(ηX(x)) = h(δx) = x by I, so h ∘ ηX = idX. ∎

Hence we have established

Proposition 2.5.22 Each positive convex structure on X induces a D-algebra with carrier X. ∎

Summarizing, we obtain a complete characterization of the Eilenberg-Moore algebras for this monad.

Theorem 2.5.23 The Eilenberg-Moore algebras for the discrete probability monad are exactly the positive convex structures. ∎

This characterization carries over to the probabilistic version of the monad; we leave the simple formulation to the reader. A similar characterization is possible for the continuous version of this functor, at least in Polish spaces. This requires a continuity condition, however, and is further discussed in Section 4.10.2.
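As a concrete instance (our example, not taken from the text): the unit interval carries a positive convex structure by taking actual convex combinations of reals, and the corresponding structure map is expectation. A Haskell sketch:

newtype Dist a = Dist [(a, Double)]   -- finite-support subprobabilities, as before

-- structure map of the algebra on the unit interval: the expectation
hExp :: Dist Double -> Double
hExp (Dist p) = sum [ x * w | (x, w) <- p ]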

2.6 Coalgebras

A coalgebra for a functor F is characterized by a carrier object c and by a morphism c → F c. This fairly general structure can be found in many applications, as we will see. So we will first define formally what a coalgebra is, and then provide a gallery of examples, some of them already discussed in another disguise, some of them new. The common thread is their formulation as coalgebras. The fundamental notion of bisimilarity is introduced, and bisimilar coalgebras will be discussed, indicating some interesting facets of the possibilities to describe behavioral equivalence of some sort.



Definition 2.6.1 Given an endofunctor F on the category K, an object a of K together with a morphism f : a → F a is a coalgebra (a, f) for F. The morphism f is sometimes called the dynamics of the coalgebra, a its carrier.

Comparing the definitions of an algebra and a coalgebra, we see that for a coalgebra the functor F is an arbitrary endofunctor on K, while an algebra requires a monad and compatibility with unit and multiplication. Thus coalgebras are conceptually simpler, imposing fewer constraints.

We are now going to enter the gallery of examples and start with coalgebras for the power set functor. This example will be with us for quite some time, in particular when we interpret modal logics. A refinement of this example will be provided by labelled transition systems.

Example 2.6.2 We consider the category Set of sets with maps as morphisms and the power set functor P. A P-coalgebra consists of a set A and a map f : A → P(A). Hence we have f(a) ⊆ A for all a ∈ A, so that a P-coalgebra can be represented as the relation {⟨a, b⟩ | b ∈ f(a), a ∈ A} over A. If, conversely, R ⊆ A × A is a relation, then f(a) := {b ∈ A | ⟨a, b⟩ ∈ R} defines a map f : A → P(A). ♦

A slight extension is to be observed when we introduce actions, formally captured as labels for our transitions. Here a transition depends on an action, which then serves as a label for the corresponding relation.

Example 2.6.3 Let us interpret a labeled transition system (S, (→a)a∈A) over the state space S with the set A of actions, see Example 2.3.10. Then →a ⊆ S × S for all actions a ∈ A.

Working again in Set, we define for the set S and for a map f : S → T

T S := P(A × S),
(T f)(B) := {⟨a, f(x)⟩ | ⟨a, x⟩ ∈ B}

(hence T = P(A × −)). Define f(s) := {⟨a, s′⟩ | s →a s′}; thus f : S → T S is a morphism in Set. Consequently, a labeled transition system is interpreted as a coalgebra for the functor P(A × −). ♦
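Both of these coalgebras read naturally in Haskell (a sketch, names ours):

import Data.Set (Set)

-- a coalgebra for the power set functor is a relation on s,
-- presented as a map into subsets
type Rel s = s -> Set s

-- a labelled transition system is a coalgebra for P(A x -)
type LTS a s = s -> Set (a, s)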

Example 2.6.4 Let A be the inputs, B the outputs and X the states of an automaton with output, see Example 2.3.9. Put F := (− × B)A. For f : X → Y we have this commutative diagram:

[Diagram: for t ∈ (X × B)A, the map (F f)(t) = (f × idB) ∘ t : A → Y × B.]

Let (S, f) be an F-coalgebra, thus f : S → F S = (S × B)A. Input a ∈ A in state s ∈ S yields f(s)(a) = ⟨s′, b⟩, so that s′ is the new state, and b is the output. Hence automata with output are perceived as coalgebras, in this case for the functor (− × B)A. ♦
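In Haskell such a coalgebra is a step function, and iterating it runs the automaton over a word of inputs (a sketch, names ours):

-- an automaton with output: dynamics f : S -> (S x B)^A
type Auto s a b = s -> a -> (s, b)

-- feed a word of inputs, collecting the outputs
run :: Auto s a b -> s -> [a] -> [b]
run _ _ []       = []
run f s (a : as) = let (s', b) = f s a in b : run f s' as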

While the automata in Example 2.6.4 are deterministic (and completely specified), we can also use a similar approach to model nondeterministic automata.



Example 2.6.5 Let A, B, X be as in Example 2.6.4, but take this time F := P(− × B)A as the functor, so that this diagram commutes for f : X → Y:

[Diagram: for t ∈ P(X × B)A, the map (F f)(t) = P(f × idB) ∘ t : A → P(Y × B).]

Thus P(f × idB)(D) = {⟨f(x), b⟩ ∈ Y × B | ⟨x, b⟩ ∈ D}. Then (S, g) is an F-coalgebra iff input a ∈ A in state s ∈ S gives g(s)(a) ⊆ S × B as the set of possible new states and outputs.

As a variant, we can replace P(− × B) by Pf(− × B), so that the automaton presents only a finite number of alternatives. ♦

Binary trees may be modelled through coalgebras as well:

Example 2.6.6 Put FX := {∗} + X × X, where ∗ is a new symbol. If f : X → Y, put

F(f)(t) := ∗, if t = ∗,
F(f)(t) := 〈f(x1), f(x2)〉, if t = 〈x1, x2〉.

Then F is an endofunctor on Set. Let (S, f) be an F-coalgebra; then f(s) ∈ {∗} + S × S. This is interpreted as follows: s is a leaf iff f(s) = ∗, and an inner node with offspring 〈s1, s2〉 iff f(s) = 〈s1, s2〉. Thus such a coalgebra represents a binary tree (which may be of infinite depth).
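The functor {∗} + X × X has a direct Haskell rendering via Either; unfolding a state of a coalgebra then grows the represented tree. Since the tree may be infinite, the sketch below cuts it off at a given depth; TreeCoalg, Tree and unfold are our names.

type TreeCoalg s = s -> Either () (s, s)   -- F X = {*} + X × X

data Tree = Leaf | Node Tree Tree deriving Show

-- unfold a state into the tree it represents, up to a depth bound
unfold :: Int -> TreeCoalg s -> s -> Tree
unfold 0 _ _ = Leaf
unfold n f s = case f s of
  Left ()      -> Leaf
  Right (l, r) -> Node (unfold (n - 1) f l) (unfold (n - 1) f r)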

The following example shows that probabilistic transitions may also be modelled as coalgebras.

Example 2.6.7 Working in the category Meas of measurable spaces with measurable maps, we have introduced in Example 2.4.8 the subprobability functor S as an endofunctor on Meas. Let (X, K) be a coalgebra for S (we omit here the σ-algebra from the notation); then K : X → SX is measurable, so that

1. K(x) is a subprobability on (the measurable sets of) X,

2. for each measurable set D ⊆ X, the map x 7→ K(x)(D) is measurable,

see Example 2.1.14 and Exercise 2.7. Thus K is a subprobabilistic transition kernel (or a stochastic relation) on X.
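A discrete toy version of such a kernel can be written down without any measure theory: on a countable state space every map into finitely supported subprobabilities is automatically measurable. The following sketch assumes weights summing to at most 1 per state; Kernel and mass are our names.

type Kernel s = s -> [(s, Double)]   -- K(x) as a finitely supported subprobability

-- K(x)(D): the mass which the kernel at x assigns to the set D
mass :: Kernel s -> s -> (s -> Bool) -> Double
mass k x d = sum [ p | (y, p) <- k x, d y ]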

Let us have a look at the upper closed sets introduced in Example 2.3.13. Coalgebras for this functor will be used for an interpretation of games, see Example 2.7.22.

Example 2.6.8 Let V S := {V ⊆ P(S) | V is upper closed}. This functor has been studied in Example 2.3.13. A coalgebra (S, f) for V is given by a map f : S → V S, so that f(s) ⊆ P(S) is upper closed, hence A ∈ f(s) and B ⊇ A imply B ∈ f(s) for each s ∈ S. We interpret f(s) as the family of sets of states a player may reach in state s, so that if the player can reach A and A ⊆ B, then the player certainly can reach B.

V is the basis for neighborhood models in modal logics, see, e.g., [Che89, Ven07] and page 170.


It is natural to ask for morphisms of coalgebras, which relate coalgebras to each other. This is a fairly straightforward definition.

Definition 2.6.9 Let F be an endofunctor on a category K. Then t : (a, f) → (b, g) is a coalgebra morphism for the F-coalgebras (a, f) and (b, g) iff t : a → b is a morphism in K such that g ∘ t = F(t) ∘ f.

Thus t : (a, f) → (b, g) is a coalgebra morphism iff t : a → b is a morphism making the square commute which has t : a → b and F t : F a → F b as horizontal arrows and the dynamics f : a → F a and g : b → F b as vertical arrows, i.e., iff g ∘ t = F(t) ∘ f.

It is clear that F-coalgebras form a category with coalgebra morphisms as morphisms. We reconsider some previously discussed examples and shed some light on the morphisms for these coalgebras.
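For the power set functor, the defining equation g ∘ t = P(t) ∘ f can be checked pointwise on a finite carrier: g(t(s)) must equal the image of f(s) under t. A sketch under the usual finiteness assumption, with isMorphism as our name; lists again stand in for finite sets.

import Data.List (nub, sort)

-- check g . t == P(t) . f on every state of the (finite) carrier
isMorphism :: Ord b => [a] -> (a -> [a]) -> (b -> [b]) -> (a -> b) -> Bool
isMorphism carrier f g t = all ok carrier
  where
    asSet = sort . nub               -- compare lists as sets
    ok s  = asSet (g (t s)) == asSet (map t (f s))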

Example 2.6.10 Continuing Example 2.6.6 on binary trees, let r : (S, f) → (T, g) be a morphism for the F-coalgebras (S, f) and (T, g). Thus g ∘ r = F(r) ∘ f. This entails:

1. if f(s) = ∗, then g(r(s)) = (F r)(f(s)) = ∗ (thus s is a leaf iff r(s) is one),

2. if f(s) = 〈s1, s2〉, then g(r(s)) = 〈t1, t2〉 with t1 = r(s1) and t2 = r(s2) (thus r(s) branches out to 〈r(s1), r(s2)〉, provided s branches out to 〈s1, s2〉).

A coalgebra morphism preserves the tree structure.

Example 2.6.11 Continuing the discussion of deterministic automata with output from Example 2.6.4, let (S, f) and (T, g) be F-coalgebras and r : (S, f) → (T, g) be a morphism. Given state s ∈ S, let f(s)(a) = 〈s′, b〉 be the new state and the output, respectively, after input a ∈ A for the automaton (S, f). Then g(r(s))(a) = 〈r(s′), b〉, so after input a ∈ A in state r(s) the automaton (T, g) will move to state r(s′) and give the output b, as expected. Hence coalgebra morphisms preserve the working of the automata.

Example 2.6.12 Continuing the discussion of transition systems from Example 2.6.3, let (S, f) and (T, g) be labelled transition systems with A as the set of actions. Thus a transition from s to s′ on action a is given in (S, f) iff 〈a, s′〉 ∈ f(s). Let us just for convenience write s →a,S s′ iff this is the case; similarly, we write t →a,T t′ iff t, t′ ∈ T with 〈a, t′〉 ∈ g(t).

Now let r : (S, f) → (T, g) be a coalgebra morphism. We claim that for given s ∈ S we have a transition r(s) →a,T t0 for some t0 iff we can find s0 such that s →a,S s0 and r(s0) = t0. Because r : (S, f) → (T, g) is a coalgebra morphism, we have g ∘ r = (T r) ∘ f with T = P(A × −). Thus

g(r(s)) = P(A × r)(f(s)) = {〈a, r(s′)〉 | 〈a, s′〉 ∈ f(s)}.

Consequently,

r(s) →a,T t0 ⇔ 〈a, t0〉 ∈ g(r(s))
⇔ 〈a, t0〉 = 〈a, r(s0)〉 for some 〈a, s0〉 ∈ f(s)
⇔ s →a,S s0 for some s0 with r(s0) = t0.


This means that the transitions in (T, g) are essentially controlled by the morphism r and the transitions in (S, f). Hence a coalgebra morphism between transition systems is a bounded morphism in the sense of Example 2.1.10.

Example 2.6.13 We continue the discussion of upper closed sets from Example 2.6.8. Let (S, f) and (T, g) be V-coalgebras, so for a morphism r : (S, f) → (T, g) the square with r : S → T on top, V r : V S → V T at the bottom and the dynamics f and g as vertical arrows is commutative. Consequently, W ∈ g(r(s)) iff r⁻¹[W] ∈ f(s). Taking up the interpretation of sets of states which may be achieved by a player, we see that it¹ may achieve W in state r(s) in (T, g) iff it may achieve in (S, f) the set r⁻¹[W] in state s.

2.6.1 Bisimulations

The notion of bisimilarity is fundamental for the application of coalgebras to system modelling. Bisimilar coalgebras behave in a similar fashion, witnessed by a mediating system.

Definition 2.6.14 Let F be an endofunctor on a category K. The F-coalgebras (S, f) and (T, g) are said to be bisimilar iff there exists a coalgebra (M, m) and coalgebra morphisms (S, f) ← (M, m) → (T, g). The coalgebra (M, m) is called mediating.

Thus we obtain a characteristic diagram with ℓ : M → S and r : M → T as the corresponding morphisms: the two squares formed by ℓ and F ℓ resp. r and F r together with the dynamics m, f and g commute. This gives us f ∘ ℓ = (F ℓ) ∘ m together with g ∘ r = (F r) ∘ m. It is easy to see why (M, m) is called mediating.

Bisimilarity was originally investigated when concurrent systems became of interest. The original formulation, however, was not coalgebraic but rather relational. Here it is:

Definition 2.6.15 Let (S, →S) and (T, →T) be transition systems. Then B ⊆ S × T is called a bisimulation iff for all 〈s, t〉 ∈ B these conditions are satisfied:

1. if s →S s′, then there is a t′ ∈ T such that t →T t′ and 〈s′, t′〉 ∈ B,

2. if t →T t′, then there is an s′ ∈ S such that s →S s′ and 〈s′, t′〉 ∈ B.

¹The present author is not really sure about the players' gender — players are considered female in the overwhelming majority of papers in the literature, but addressed as Angel or Demon. This may be politically correct, but does not seem to be biblically so with a view toward Matthew 22:30. To be on the safe side — it is so hopeless to argue with feminists — players are neutral in the present treatise.


Hence a bisimulation simulates transitions in one system through the other one. At first sight, these notions of bisimilarity are not related to each other. Recall that transition systems are coalgebras for the power set functor P. This is the connection:

Theorem 2.6.16 Given the transition systems (S, →S) and (T, →T) with the associated P-coalgebras (S, f) and (T, g), these statements are equivalent for B ⊆ S × T:

1. B is a bisimulation.

2. There exists a P-coalgebra structure h on B such that (S, f) ← (B, h) → (T, g) with the projections as morphisms is mediating.

Proof That (S, f) ← (B, h) → (T, g) with the projections πS and πT is mediating follows from the commutativity of the corresponding diagram, i.e., from f ∘ πS = P(πS) ∘ h and g ∘ πT = P(πT) ∘ h.

1 ⇒ 2: We have to construct a map h : B → P(B) such that f(πS(s, t)) = P(πS)(h(s, t)) and g(πT(s, t)) = P(πT)(h(s, t)) for all 〈s, t〉 ∈ B. The choice is somewhat obvious: put for 〈s, t〉 ∈ B

h(s, t) := {〈s′, t′〉 ∈ B | s →S s′ and t →T t′}.

Thus h : B → P(B) is a map, hence (B, h) is a P-coalgebra.

Now fix 〈s, t〉 ∈ B; we claim that f(s) = P(πS)(h(s, t)).

“⊆”: Let s′ ∈ f(s), hence s →S s′; thus there exists t′ with 〈s′, t′〉 ∈ B such that t →T t′, hence

s′ ∈ {πS(s0, t0) | 〈s0, t0〉 ∈ h(s, t)} = {s0 | 〈s0, t0〉 ∈ h(s, t) for some t0} = P(πS)(h(s, t)).

“⊇”: If s′ ∈ P(πS)(h(s, t)), then in particular s →S s′, thus s′ ∈ f(s).

Thus we have shown that P(πS)(h(s, t)) = f(s) = f(πS(s, t)). One shows P(πT)(h(s, t)) = g(t) = g(πT(s, t)) in exactly the same way. We have constructed h such that (B, h) is a P-coalgebra, and such that the diagrams above commute.

2 ⇒ 1: Assume that h exists with the properties described in the assertion; then we have to show that B is a bisimulation. Let 〈s, t〉 ∈ B and s →S s′, hence s′ ∈ f(s) = f(πS(s, t)) = P(πS)(h(s, t)). Thus there exists t′ with 〈s′, t′〉 ∈ h(s, t) ⊆ B, and hence 〈s′, t′〉 ∈ B. We claim that t →T t′, which is tantamount to saying t′ ∈ g(t). But g(t) = P(πT)(h(s, t)), and 〈s′, t′〉 ∈ h(s, t), hence t′ ∈ P(πT)(h(s, t)) = g(t). This establishes t →T t′. A similar argument finds s′ with s →S s′ and 〈s′, t′〉 ∈ B in case t →T t′.

This completes the proof. ⊣
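Both directions of the theorem can be exercised on finite systems. The sketch below checks the relational conditions of Definition 2.6.15 directly and also builds the mediating dynamics h from the proof; isBisim and mediate are our names, and everything is assumed finite.

-- transition systems as relations, a candidate bisimulation as a list of pairs
isBisim :: (Eq s, Eq t) => [(s, s)] -> [(t, t)] -> [(s, t)] -> Bool
isBisim rS rT b = all ok b
  where
    succS s = [ s' | (x, s') <- rS, x == s ]
    succT t = [ t' | (y, t') <- rT, y == t ]
    ok (s, t) =
         all (\s' -> any (\t' -> (s', t') `elem` b) (succT t)) (succS s)
      && all (\t' -> any (\s' -> (s', t') `elem` b) (succS s)) (succT t)

-- the dynamics h(s,t) := {<s',t'> in B | s -> s' and t -> t'} from the proof
mediate :: (Eq s, Eq t) => [(s, s)] -> [(t, t)] -> [(s, t)] -> (s, t) -> [(s, t)]
mediate rS rT b (s, t) =
  [ (s', t') | (s', t') <- b, (s, s') `elem` rS, (t, t') `elem` rT ]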


Thus we may use bisimulations for transition systems as relations and bisimulations as coalgebras interchangeably, and this characterization suggests a definition in purely coalgebraic terms for those cases in which a set-theoretic relation is not available or not adequate. The connection between P-coalgebra morphisms and bisimulations is further strengthened by investigating the graph of a morphism (recall that the graph graph(r) of a map r : S → T is the relation {〈s, r(s)〉 | s ∈ S}).

Proposition 2.6.17 Given coalgebras (S, f) and (T, g) for the power set functor P, r : (S, f) → (T, g) is a morphism iff graph(r) is a bisimulation for (S, f) and (T, g).

Proof 1. Assume that r : (S, f) → (T, g) is a morphism, so that g ∘ r = P(r) ∘ f. Now define

h(s, t) := {〈s′, r(s′)〉 | s′ ∈ f(s)} ⊆ graph(r)

for 〈s, t〉 ∈ graph(r). Then g(πT(s, t)) = g(t) = P(πT)(h(s, t)) for t = r(s):

“⊆”: If t′ ∈ g(t) for t = r(s), then

t′ ∈ g(r(s)) = P(r)(f(s)) = {r(s′) | s′ ∈ f(s)} = P(πT)({〈s′, r(s′)〉 | s′ ∈ f(s)}) = P(πT)(h(s, t)).

“⊇”: If 〈s′, t′〉 ∈ h(s, t), then s′ ∈ f(s) and t′ = r(s′), but this implies t′ ∈ P(r)(f(s)) = g(r(s)).

Thus g ∘ πT = P(πT) ∘ h. The equation f ∘ πS = P(πS) ∘ h is established similarly.

Hence we have found a coalgebra structure h on graph(r) such that the projections yield coalgebra morphisms (S, f) ← (graph(r), h) → (T, g), so that (graph(r), h) is now officially a bisimulation.

2. If, conversely, (graph(r), h) is a bisimulation with the projections as morphisms, then we have r = πT ∘ πS⁻¹. Then πT is a morphism, and πS⁻¹ is a morphism as well (note that we work on the graph of r, on which πS is a bijection). So r is a morphism. ⊣

Let us have a look at upper closed sets from Example 2.4.10. There we find a comparable situation. We cannot, however, translate the definition directly, because we do not have access to the transitions proper, but rather to the sets from which the next state may come. Let (S, f) and (T, g) be V-coalgebras, and assume that 〈s, t〉 ∈ B. Assume X ∈ f(s); then we want to find Y ∈ g(t) such that, when we take t′ ∈ Y, we find a state s′ ∈ X with s′ being related via B to t′, and vice versa. Formally:

Definition 2.6.18 Let

V S := {V ⊆ P(S) | V is upper closed}

be the endofunctor on Set which assigns to a set S all upper closed subsets of P(S). Given V-coalgebras (S, f) and (T, g), a subset B ⊆ S × T is called a bisimulation of (S, f) and (T, g) iff for each 〈s, t〉 ∈ B:


1. For all X ∈ f(s) there exists Y ∈ g(t) such that for each t′ ∈ Y there exists s′ ∈ X with 〈s′, t′〉 ∈ B.

2. For all Y ∈ g(t) there exists X ∈ f(s) such that for each s′ ∈ X there exists t′ ∈ Y with 〈s′, t′〉 ∈ B.

We have then a comparable characterization of bisimilar coalgebras.

Proposition 2.6.19 Let (S, f) and (T, g) be coalgebras for V. Then the following statements are equivalent for B ⊆ S × T with πS[B] = S and πT[B] = T:

1. B is a bisimulation of (S, f) and (T, g).

2. There exists a coalgebra structure h on B so that the projections πS : B → S and πT : B → T are morphisms (S, f) ← (B, h) → (T, g).

Proof 1 ⇒ 2: Define for 〈s, t〉 ∈ B

h(s, t) := {D ⊆ B | πS[D] ∈ f(s) and πT[D] ∈ g(t)}.

Hence h(s, t) ⊆ P(B), and because both f(s) and g(t) are upper closed, so is h(s, t).

Now fix 〈s, t〉 ∈ B. We show first that f(s) = {πS[Z] | Z ∈ h(s, t)}. From the definition of h(s, t) it follows that πS[Z] ∈ f(s) for each Z ∈ h(s, t). So we have to establish the other inclusion. Let X ∈ f(s); then X = πS[πS⁻¹[X]], because πS : B → S is onto, so it suffices to show that πS⁻¹[X] ∈ h(s, t), hence that πT[πS⁻¹[X]] ∈ g(t). Given X, there exists Y ∈ g(t) so that for each t′ ∈ Y there exists s′ ∈ X such that 〈s′, t′〉 ∈ B. Thus Y = πT[(X × Y) ∩ B]. But this implies Y ⊆ πT[πS⁻¹[X]], hence πT[πS⁻¹[X]] ∈ g(t), since g(t) is upper closed. One similarly shows that g(t) = {πT[Z] | Z ∈ h(s, t)}.

In a second step, we show that

{πS[Z] | Z ∈ h(s, t)} = {C | πS⁻¹[C] ∈ h(s, t)}.

In fact, if C = πS[Z] for some Z ∈ h(s, t), then Z ⊆ πS⁻¹[C] = πS⁻¹[πS[Z]], hence πS⁻¹[C] ∈ h(s, t), since h(s, t) is upper closed. If, conversely, Z := πS⁻¹[C] ∈ h(s, t), then C = πS[Z]. Thus we obtain

f(s) = {πS[Z] | Z ∈ h(s, t)} = {C | πS⁻¹[C] ∈ h(s, t)} = (V πS)(h(s, t))

for 〈s, t〉 ∈ B. Summarizing, this means that πS : (B, h) → (S, f) is a morphism. A very similar proof shows that πT : (B, h) → (T, g) is a morphism as well.

2 ⇒ 1: Assume, conversely, that the projections are coalgebra morphisms, and let 〈s, t〉 ∈ B. Given X ∈ f(s), we know that X = πS[Z] for some Z ∈ h(s, t), and Y := πT[Z] ∈ g(t). Thus we find for any t′ ∈ Y some s′ ∈ X with 〈s′, t′〉 ∈ B, since such a pair may be chosen in Z ⊆ B. The symmetric property of a bisimulation is established in exactly the same way. Hence B is a bisimulation for (S, f) and (T, g). ⊣

Encouraged by these observations, we define bisimulations for set-based functors, i.e., for endofunctors on the category Set of sets with maps as morphisms. This is nothing but a specialization of the general notion of bisimilarity, taking specifically into account that in Set we may consider subsets of the Cartesian product, and that we have projections at our disposal.


Definition 2.6.20 Let F be an endofunctor on Set. Then R ⊆ S × T is called a bisimulation for the F-coalgebras (S, f) and (T, g) iff there exists a map h : R → F(R) rendering the projections coalgebra morphisms, i.e., such that

f ∘ πS = F(πS) ∘ h and g ∘ πT = F(πT) ∘ h.

These are immediate consequences:

Lemma 2.6.21 ∆S := {〈s, s〉 | s ∈ S} is a bisimulation for every F-coalgebra (S, f). If R is a bisimulation for the F-coalgebras (S, f) and (T, g), then R⁻¹ is a bisimulation for (T, g) and (S, f). ⊣

It is instructive to look back and investigate again the graph of a morphism r : (S, f) → (T, g), where this time we do not have the power set functor — as in Proposition 2.6.17 — but a general endofunctor F on Set.

Corollary 2.6.22 Given coalgebras (S, f) and (T, g) for the endofunctor F on Set, r : (S, f) → (T, g) is a morphism iff graph(r) is a bisimulation for (S, f) and (T, g).

Proof 0. The proof for Proposition 2.6.17 needs some small adjustments, because we do not know how exactly functor F operates on maps.

1. If r : (S, f) → (T, g) is a morphism, we know that g ∘ r = F(r) ∘ f. Consider the map τ : S → graph(r) which is defined as s ↦ 〈s, r(s)〉; thus F(τ) : F(S) → F(graph(r)). Define

h : graph(r) → F(graph(r)), 〈s, r(s)〉 ↦ F(τ)(f(s)).

Then it is not difficult to see that both g ∘ πT = F(πT) ∘ h and f ∘ πS = F(πS) ∘ h hold. Hence (graph(r), h) is an F-coalgebra mediating between (S, f) and (T, g).

2. Assume that graph(r) is a bisimulation for (S, f) and (T, g); then both πT and πS⁻¹ are morphisms for the F-coalgebras, so the proof proceeds exactly as the corresponding one for Proposition 2.6.17. ⊣

We will study some properties of bisimulations now, including a preservation property of the functor F : Set → Set. This functor will be fixed for the time being.

We may construct bisimulations from morphisms. This is what we show first.

Lemma 2.6.23 Let (S, f), (T, g) and (U, h) be F-coalgebras with morphisms ϕ : (S, f) → (T, g) and ψ : (S, f) → (U, h). Then the image of S under the pairing 〈ϕ, ψ〉,

〈ϕ, ψ〉[S] := {〈ϕ(s), ψ(s)〉 | s ∈ S},

is a bisimulation for (T, g) and (U, h).


Proof 1. Define j : S → 〈ϕ, ψ〉[S] through j(s) := 〈ϕ(s), ψ(s)〉; hence j is surjective, and the projections satisfy πT ∘ j = ϕ and πU ∘ j = ψ. We can find a map i : 〈ϕ, ψ〉[S] → S so that j ∘ i = id on 〈ϕ, ψ〉[S], using the Axiom of Choice: for each r ∈ 〈ϕ, ψ〉[S] there exists at least one s ∈ S with r = 〈ϕ(s), ψ(s)〉. Pick for each r such an s and call it i(r); thus r = 〈ϕ(i(r)), ψ(i(r))〉. So j has a right inverse i, which will help us in the construction below.

2. We want to define a coalgebra structure on 〈ϕ, ψ〉[S] such that the projections πT and πU become coalgebra morphisms, i.e., such that we obtain a bisimulation diagram. Put k := F(j) ∘ f ∘ i; then we have

F(πT) ∘ k = F(πT) ∘ F(j) ∘ f ∘ i
= F(πT ∘ j) ∘ f ∘ i
= F(ϕ) ∘ f ∘ i   (since πT ∘ j = ϕ)
= g ∘ ϕ ∘ i   (since F(ϕ) ∘ f = g ∘ ϕ)
= g ∘ πT   (since ϕ ∘ i = πT ∘ j ∘ i = πT).

Hence the left hand diagram commutes. Similarly,

F(πU) ∘ k = F(πU ∘ j) ∘ f ∘ i = F(ψ) ∘ f ∘ i = h ∘ ψ ∘ i = h ∘ πU.

Thus we obtain a commutative diagram on the right hand side as well. ⊣

This technical result is applied to the composition of relations:

Lemma 2.6.24 Let R ⊆ S × T and Q ⊆ T × U be relations, and put X := {〈s, t, u〉 | 〈s, t〉 ∈ R, 〈t, u〉 ∈ Q}. Then

R ∘ Q = 〈πS ∘ πR, πU ∘ πQ〉[X],

where πR : X → R and πQ : X → Q map 〈s, t, u〉 to 〈s, t〉 resp. 〈t, u〉.

Proof Simply trace an element of R ∘ Q through this construction:

〈s, u〉 ∈ R ∘ Q ⇔ ∃t ∈ T : 〈s, t〉 ∈ R, 〈t, u〉 ∈ Q
⇔ ∃t ∈ T : 〈s, t, u〉 ∈ X
⇔ ∃t ∈ T : s = (πS ∘ πR)(s, t, u) and u = (πU ∘ πQ)(s, t, u). ⊣

Looking at X in its relation to the projections, we see that X is actually a weak pullback (Definition 2.2.18), to be precise:


Lemma 2.6.25 Let R, Q, X be as above. Then X is a weak pullback of πRT : R → T and πQT : Q → T, so that in particular πQT ∘ πQ = πRT ∘ πR.

Proof 1. We establish first that the square with πR : X → R, πQ : X → Q, πRT : R → T and πQT : Q → T commutes. In fact, given 〈s, t, u〉 ∈ X, we know that 〈s, t〉 ∈ R and 〈t, u〉 ∈ Q, hence (πQT ∘ πQ)(s, t, u) = πQT(t, u) = t and (πRT ∘ πR)(s, t, u) = πRT(s, t) = t.

2. Now for the pullback property in its weak form. If f1 : Y → R and f2 : Y → Q are maps for some set Y such that πRT ∘ f1 = πQT ∘ f2, we can write f1(y) = 〈f1S(y), f1T(y)〉 ∈ R and f2(y) = 〈f2T(y), f2U(y)〉 ∈ Q, where the assumption gives f1T = f2T. Put σ(y) := 〈f1S(y), f1T(y), f2U(y)〉; then σ : Y → X with f1 = πR ∘ σ and f2 = πQ ∘ σ. Thus X is a weak pullback. ⊣

It will turn out that the functor should preserve the pullback property. Preserving the uniqueness property of a pullback will be too strong a requirement, but preserving weak pullbacks will be helpful and not too restrictive.

Definition 2.6.26 Functor F preserves weak pullbacks iff F maps weak pullbacks to weak pullbacks.

Spelled out: if P together with g : P → X and f : P → Y is a weak pullback of h : X → Z and i : Y → Z, then FP together with F g and F f is a weak pullback of F h and F i. So every H with morphisms g′ : H → FX and f′ : H → FY satisfying F(h) ∘ g′ = F(i) ∘ f′ factors (not necessarily uniquely) through FP.

We want to show that the composition of bisimulations is a bisimulation again; this requires that the functor preserves weak pullbacks. Before we state and prove a corresponding property, we need an auxiliary statement which is of independent interest, viz., that the weak pullback of coalgebra morphisms carries a bisimulation. To be specific:

Lemma 2.6.27 Assume that functor F preserves weak pullbacks, and let r : (S, f) → (T, g) and s : (U, h) → (T, g) be morphisms for the F-coalgebras (S, f), (T, g) and (U, h). Then there exists a coalgebra structure p : P → FP for the weak pullback P of r and s with projections πS and πU such that (P, p) is a bisimulation for (S, f) and (U, h).

Proof We will need three diagrams: the two morphism squares

F(r) ∘ f = g ∘ r and F(s) ∘ h = g ∘ s, (2.12)

the weak pullback square

r ∘ πS = s ∘ πU (2.13)

with the projections πS : P → S and πU : P → U, and finally the intended bisimulation diagram

f ∘ πS = F(πS) ∘ ? and h ∘ πU = F(πU) ∘ ? (2.14)

whose middle arrow P → FP is still missing. While the first two diagrams are helping with the proof's argument, the third diagram has a gap in the middle; it is this gap which we have to fill. We want to find an arrow P → FP so that the diagrams will commute. Actually, the preservation of weak pullbacks will help us in obtaining this arrow.

Because

F(r) ∘ f ∘ πS = g ∘ r ∘ πS   (diagram 2.12, left)
= g ∘ s ∘ πU   (diagram 2.13)
= F(s) ∘ h ∘ πU   (diagram 2.12, right)

we may conclude that F(r) ∘ (f ∘ πS) = F(s) ∘ (h ∘ πU). Diagram 2.13 is a weak pullback diagram. Because F preserves weak pullbacks, FP is a weak pullback of F r and F s, so the pair of maps f ∘ πS : P → FS and h ∘ πU : P → FU can be complemented by an arrow P → FP rendering the corresponding triangles commutative.

Hence there exists p : P → FP with F(πS) ∘ p = f ∘ πS and F(πU) ∘ p = h ∘ πU. Thus p makes diagram (2.14) a bisimulation diagram. ⊣

Now we are in a position to show that the composition of bisimulations is a bisimulation again, provided the functor F behaves decently.

Proposition 2.6.28 Given the F-coalgebras (S, f), (T, g) and (U, h), assume that R is a bisimulation of (S, f) and (T, g), Q is a bisimulation of (T, g) and (U, h), and assume moreover that F preserves weak pullbacks. Then R ∘ Q is a bisimulation of (S, f) and (U, h).

Proof We can write R ∘ Q = 〈πS ∘ πR, πU ∘ πQ〉[X] with X := {〈s, t, u〉 | 〈s, t〉 ∈ R, 〈t, u〉 ∈ Q} by Lemma 2.6.24. Since X is a weak pullback of πRT and πQT by Lemma 2.6.25, and these projections are coalgebra morphisms into (T, g), we know from Lemma 2.6.27 that X carries a coalgebra structure making it a bisimulation of (R, r) and (Q, q), with r and q as the dynamics of the corresponding F-coalgebras. Then πS ∘ πR : X → S and πU ∘ πQ : X → U are morphisms, thus 〈πS ∘ πR, πU ∘ πQ〉[X] is a bisimulation by Lemma 2.6.23. Thus the assertion follows from Lemma 2.6.24. ⊣


The proof shows in which way the existence of the morphism P → FP is used for achieving the desired properties.

Bisimulations on a single coalgebra may have an additional structure, viz., they may be equivalence relations as well. Accordingly, we call these bisimulations bisimulation equivalences. Hence given a coalgebra (S, f), a bisimulation equivalence α for (S, f) is a bisimulation for (S, f) which is also an equivalence relation. While bisimulations carry properties which are concerned with the coalgebraic structure, an equivalence relation is purely related to the set structure. It is, however, fairly natural to ask, in view of the properties which we did explore so far (Lemma 2.6.21, Proposition 2.6.28), whether or not we can take a bisimulation and turn it into an equivalence relation, or at least do so under favorable conditions on the functor F. We will deal with this question and some of its cousins now.

Observe first that the factor space of a bisimulation equivalence can be turned into a coalgebra.

Lemma 2.6.29 Let (S, f) be an F-coalgebra, and α be a bisimulation equivalence on (S, f). Then there exists a unique dynamics fα : S/α → F(S/α) with F(ηα) ∘ f = fα ∘ ηα.

Proof Because α is in particular a bisimulation, we know that there exists a dynamics ρ : α → F(α) rendering the projections π1, π2 : α → S coalgebra morphisms (Definition 2.6.20), so that

f ∘ πi = F(πi) ∘ ρ for i = 1, 2.

The obvious choice for the dynamics fα would be to define fα([s]α) := (F(ηα) ∘ f)(s), but this is only possible if we know that the map is well defined, so we have to check whether (F(ηα) ∘ f)(s1) = (F(ηα) ∘ f)(s2) holds whenever s1 α s2.

But this holds indeed, for s1 α s2 means 〈s1, s2〉 ∈ α, so that f(s1) = f(π1(s1, s2)) = (F(π1) ∘ ρ)(s1, s2), and similarly for f(s2). Because α is an equivalence relation, we have ηα ∘ π1 = ηα ∘ π2. Thus

F(ηα)(f(s1)) = (F(ηα ∘ π1) ∘ ρ)(s1, s2) = (F(ηα ∘ π2) ∘ ρ)(s1, s2) = F(ηα)(f(s2)).

This means that fα is in fact well defined, and that ηα is a morphism. Hence the dynamics fα exists and renders ηα a morphism.

Now assume that β : S/α → F(S/α) also satisfies F(ηα) ∘ f = β ∘ ηα. But then β ∘ ηα = F(ηα) ∘ f = fα ∘ ηα, and, since ηα is onto, it is an epi, so that we may conclude β = fα. Hence fα is uniquely determined. ⊣

Bisimulations can be transported along morphisms, if the functor preserves weak pullbacks.


Proposition 2.6.30 Assume that F preserves weak pullbacks, and let r : (S, f) → (T, g) be a morphism. Then:

1. If R is a bisimulation on (S, f), then (r × r)[R] = {〈r(s), r(s′)〉 | 〈s, s′〉 ∈ R} is a bisimulation on (T, g).

2. If Q is a bisimulation on (T, g), then (r × r)⁻¹[Q] = {〈s, s′〉 | 〈r(s), r(s′)〉 ∈ Q} is a bisimulation on (S, f).

Proof 0. Note that graph(r) is a bisimulation by Corollary 2.6.22, because r is a morphism.

1. We claim that

(r × r)[R] = (graph(r))⁻¹ ∘ R ∘ graph(r)

holds. Granted that, we can apply Proposition 2.6.28 together with Lemma 2.6.21 for establishing the first property. But 〈t, t′〉 ∈ (r × r)[R] iff we can find 〈s, s′〉 ∈ R with 〈t, t′〉 = 〈r(s), r(s′)〉, hence iff 〈t, s〉 ∈ graph(r)⁻¹, 〈s, s′〉 ∈ R and 〈s′, t′〉 ∈ graph(r) for some such pair, hence iff 〈t, t′〉 ∈ graph(r)⁻¹ ∘ R ∘ graph(r). Thus the equality holds indeed.

2. Similarly, one shows that (r × r)⁻¹[Q] = graph(r) ∘ Q ∘ graph(r)⁻¹. This is left to the reader. ⊣

For investigating further structural properties, we need

Lemma 2.6.31 If (S, f) and (T, g) are F-coalgebras, then there exists a unique coalgebraic structure on S + T such that the injections iS and iT are morphisms.

Proof We have to find a map S + T → F(S + T) rendering the squares for iS and iT commutative. Because F(iS) ∘ f : S → F(S + T) and F(iT) ∘ g : T → F(S + T) are maps into F(S + T), there exists by the universal property of the coproduct a unique map h : S + T → F(S + T) with h ∘ iS = F(iS) ∘ f and h ∘ iT = F(iT) ∘ g. Thus (S + T, h) is a coalgebra, and iS as well as iT are morphisms. ⊣

An attempt to establish a comparable property for the product does not work along these lines: the universal property of the product provides arrows into the product, not out of it, as a look at the corresponding diagram shows.

We obtain as a consequence that bisimulations are closed under finite unions.

Lemma 2.6.32 Let (S, f) and (T, g) be coalgebras with bisimulations R1 and R2. Then R1 ∪ R2 is a bisimulation.


Proof 1. We can find maps ri : Ri → FRi for i = 1, 2 rendering the corresponding bisimulation diagrams commutative. Then R1 + R2 is an F-coalgebra with a dynamics r satisfying

r ∘ ji = F(ji) ∘ ri for i = 1, 2,

where ji : Ri → R1 + R2 is the respective injection (Lemma 2.6.31).

2. We claim that the projections π′S : R1 + R2 → S and π′T : R1 + R2 → T are morphisms. We establish this property only for π′S. First note that π′S ∘ j1 = πS on R1, so that we have f ∘ π′S ∘ j1 = F(π′S) ∘ F(j1) ∘ r1 = F(π′S) ∘ r ∘ j1; similarly, f ∘ π′S ∘ j2 = F(π′S) ∘ r ∘ j2. Since two maps on a coproduct coincide iff they coincide when composed with both injections, we may conclude that f ∘ π′S = F(π′S) ∘ r, so that indeed π′S : R1 + R2 → S is a morphism.

3. Since R1 + R2 is a coalgebra with morphisms π′S and π′T, we know from Lemma 2.6.23 that 〈π′S, π′T〉[R1 + R2] is a bisimulation. But this set equals R1 ∪ R2. ⊣

We briefly explore lattice properties for bisimulations on a coalgebra. For this, we investigate the union of an arbitrary family of bisimulations. Looking back at the union of two bisimulations, we used their sum as an intermediate construction. A more general consideration requires the sum of an arbitrary family. The following definition describes the coproduct as a specific form of a colimit, see Definition 2.3.32.

Definition 2.6.33 Let (sk)k∈I be an arbitrary non-empty family of objects in a category K. The object s together with morphisms ik : sk → s is called the coproduct of (sk)k∈I iff given morphisms jk : sk → t for an object t, there exists a unique morphism j : s → t with jk = j ∘ ik for all k ∈ I. The coproduct s is denoted as ∑k∈I sk.

Taking I = {1, 2}, one sees that the coproduct of two objects is in fact a special case of the coproduct just defined: every family of morphisms jk : sk → t factors uniquely through the injections ik : sk → s via a single morphism j : s → t. The coproduct is uniquely determined up to isomorphism.

Example 2.6.34 Consider the category Set of sets with maps as morphisms, and let (Sk)k∈I be a family of sets. Then

S := ⋃k∈I {〈s, k〉 | s ∈ Sk}

is a coproduct. In fact, ik : s ↦ 〈s, k〉 maps Sk to S, and if jk : Sk → T, put j : S → T with j(s, k) := jk(s); then jk = j ∘ ik for all k.
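In Haskell the binary case of this construction is the familiar Either type: Left and Right play the roles of the injections, and the unique morphism j is the copairing. A brief sketch; copair is our name for what the standard library calls either.

-- injections i1 = Left, i2 = Right; copair j1 j2 is the unique j with j . ik = jk
copair :: (a -> c) -> (b -> c) -> Either a b -> c
copair j1 _  (Left  x) = j1 x
copair _  j2 (Right y) = j2 y

The tagging 〈s, k〉 of Example 2.6.34 corresponds to the constructor wrapping here: an element remembers which summand it came from.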

We put this new machinery to use right away, returning to our scenario given by the functor F.


Proposition 2.6.35 Assume that F preserves weak pullbacks. Let (Rk)k∈I be a family of bisimulations for the coalgebras (S, f) and (T, g). Then ⋃k∈I Rk is a bisimulation for these coalgebras.

Proof 1. Given k ∈ I, let rk : Rk → FRk be the dynamics on Rk such that πS,k : (Rk, rk) → (S, f) and πT,k : (Rk, rk) → (T, g) are coalgebra morphisms. Then there exists a unique coalgebra structure r on ∑k∈I Rk such that iℓ : (Rℓ, rℓ) → (∑k∈I Rk, r) is a coalgebra morphism for all ℓ ∈ I. This is shown through exactly the same argument as in the proof of Lemma 2.6.32 (mutatis mutandis: replace the coproduct of two bisimulations by the general coproduct).

2. The projections π′S : ∑k∈I Rk → S and π′T : ∑k∈I Rk → T are morphisms, and one shows exactly as in the proof of Lemma 2.6.32 that

⋃k∈I Rk = 〈π′S, π′T〉[∑k∈I Rk].

An application of Lemma 2.6.23 now establishes the claim. ⊣

This is applied to an investigation of the lattice structure on the set of all bisimulations between two coalgebras.

Proposition 2.6.36 Assume that F preserves weak pullbacks. Let (Rk)k∈I be a non-empty family of bisimulations for the coalgebras (S, f) and (T, g). Then:

1. There exists a smallest bisimulation R^* with Rk ⊆ R^* for all k.

2. There exists a largest bisimulation R_* with R_* ⊆ Rk for all k.

Proof 1. We claim that R^* = ⋃k∈I Rk. It is clear that Rk ⊆ R^* for all k ∈ I. If R′ is a bisimulation on (S, f) and (T, g) with Rk ⊆ R′ for all k, then ⋃k Rk ⊆ R′, thus R^* ⊆ R′. In addition, R^* is a bisimulation by Proposition 2.6.35. This establishes part 1.

2. Put

R := {R | R is a bisimulation for (S, f) and (T, g) with R ⊆ Rk for all k}.

If R = ∅, we put R_* := ∅, so we may assume that R ≠ ∅. Put R_* := ⋃R. By Proposition 2.6.35 this is a bisimulation for (S, f) and (T, g) with R_* ⊆ Rk for all k. Assume that R′ is a bisimulation for (S, f) and (T, g) with R′ ⊆ Rk for all k; then R′ ∈ R, hence R′ ⊆ R_*, so R_* is the largest one. This settles part 2. ⊣

Looking a bit harder at bisimulations for (S, f) alone, we find that the largest bisimulation is actually an equivalence relation. But we have to make sure first that a largest bisimulation exists at all.

Proposition 2.6.37 If the functor F preserves weak pullbacks, then there exists a largest bisimulation R∗ on a coalgebra (S, f), and R∗ is an equivalence relation.

Proof 1. Let

R := {R | R is a bisimulation on (S, f)}.

Then ∆S ∈ R, hence R ≠ ∅. We know from Lemma 2.6.21 that R ∈ R entails R⁻¹ ∈ R, and from Proposition 2.6.35 we infer that R∗ := ⋃R ∈ R. Hence R∗ is a bisimulation on (S, f), and by construction it is the largest one.


2. R∗ is even an equivalence relation:

• Since ∆S ∈ R, we know that ∆S ⊆ R∗, thus R∗ is reflexive.

• Because R∗ ∈ R we conclude that (R∗)⁻¹ ∈ R, thus (R∗)⁻¹ ⊆ R∗. Hence R∗ is symmetric.

• Since R∗ ∈ R, we conclude from Proposition 2.6.28 that R∗ ∘ R∗ ∈ R, hence R∗ ∘ R∗ ⊆ R∗. This means that R∗ is transitive. ⊣

This has an interesting consequence. Given a bisimulation equivalence on a coalgebra, we not only find a larger one which contains it, but we can also find a morphism between the corresponding factor spaces. This is what I mean:

Corollary 2.6.38 Assume that the functor F preserves weak pullbacks, and that α is a bisimulation equivalence on (S, f). Then there exists a unique morphism ϑα : (S/α, fα) → (S/R∗, fR∗), where fα : S/α → F(S/α) and fR∗ : S/R∗ → F(S/R∗) are the induced dynamics.

Proof 0. The dynamics fα : S/α → F(S/α) and fR∗ : S/R∗ → F(S/R∗) exist by Lemma 2.6.29.

1. Define

ϑα([s]α) := [s]R∗

for s ∈ S. This is well defined. In fact, if s α s′, we conclude by the maximality of R∗ that s R∗ s′, so [s]α = [s′]α implies [s]R∗ = [s′]R∗.

2. We claim that ϑα is a morphism, hence that the right hand square of the diagram built from ηα : S → S/α, ϑα : S/α → S/R∗ and their images under F commutes; the left hand square is just for nostalgia. Now ϑα ∘ ηα = ηR∗, and the outer diagram commutes. The left square commutes because ηα : (S, f) → (S/α, fα) is a morphism; moreover, ηα is a surjective map. Hence the claim follows from Lemma 2.1.32, so that ϑα is a morphism indeed.

3. If ϑ′α is another morphism with these properties, then we have ϑ′α ∘ ηα = ηR∗ = ϑα ∘ ηα, and since ηα is surjective, it is an epi by Proposition 2.1.23, which implies ϑα = ϑ′α. ⊣

This is all very well, but where do we get bisimulation equivalences from? If we cannot find examples for them, the efforts just spent may run dry. Fortunately, we are provided with ample bisimulation equivalences through coalgebra morphisms, specifically through their kernels (for a definition see page 90). It will turn out that all such equivalences can be generated in this way.


Proposition 2.6.39 Assume that F preserves weak pullbacks, and that ϕ : (S, f) → (T, g) is a coalgebra morphism. Then ker(ϕ) is a bisimulation equivalence on (S, f). Conversely, if α is a bisimulation equivalence on (S, f), then there exists a coalgebra (T, g) and a coalgebra morphism ϕ : (S, f) → (T, g) with α = ker(ϕ).

Proof 1. We know that ker(ϕ) is an equivalence relation; since ker(ϕ) = graph(ϕ) ∘ graph(ϕ)⁻¹, we conclude from Corollary 2.6.22 together with Lemma 2.6.21 and Proposition 2.6.28 that ker(ϕ) is a bisimulation.

2. Let α be a bisimulation equivalence on (S, f); then the factor map ηα : (S, f) → (S/α, fα) is a morphism by Lemma 2.6.29, and ker(ηα) = {〈s, s′〉 | [s]α = [s′]α} = α. ⊣

We know what morphisms are, and usually morphisms are studied also in the context of congruences as those equivalence relations which respect the underlying structure. This is what we will do next.

2.6.2 Congruences

Bisimulations compare two systems with each other, while a congruence makes it possible to talk about elements of a single coalgebra which behave in a similar manner. Let us have a look at Abelian groups. An equivalence relation α on an Abelian group (G, +) is a congruence iff g α h and g′ α h′ together imply (g + g′) α (h + h′). This means that α is compatible with the group structure; an equivalent formulation says that there exists a group structure on G/α such that the factor map ηα : G → G/α is a group morphism. Thus the factor map is the harbinger of the good news. We translate this observation now into the language of coalgebras.

Definition 2.6.40 Let (S, f) be an F-coalgebra for the endofunctor F on the category Set of sets. An equivalence relation α on S is called an F-congruence iff there exists a coalgebra structure fα on S/α such that ηα : (S, f) → (S/α, fα) is a coalgebra morphism.

Thus we want the square built from ηα : S → S/α, F ηα : FS → F(S/α) and the dynamics f and fα to be commutative, so that we have

fα([s]α) = (F ηα)(f(s))

for each s ∈ S. A brief look at Lemma 2.6.29 shows that bisimulation equivalences are congruences, and we see from Proposition 2.6.39 that the kernels of coalgebra morphisms are congruences, provided the functor F preserves weak pullbacks.

Hence congruences and bisimulations on a coalgebra are actually very closely related. They are, however, not the same, because we have

Proposition 2.6.41 Let ϕ : (S, f) → (T, g) be a morphism for the F-coalgebras (S, f) and (T, g). Assume that ker(Fϕ) ⊆ ker(F ηker(ϕ)). Then ker(ϕ) is a congruence for (S, f).


Proof Define fker(ϕ)([s]ker(ϕ)) := F(ηker(ϕ))(f(s)) for s ∈ S. Then fker(ϕ) : S/ker(ϕ) → F(S/ker(ϕ)) is well defined. In fact, assume that [s]ker(ϕ) = [s′]ker(ϕ); then g(ϕ(s)) = g(ϕ(s′)), so that (Fϕ)(f(s)) = (Fϕ)(f(s′)), consequently 〈f(s), f(s′)〉 ∈ ker(Fϕ). By assumption, (F ηker(ϕ))(f(s)) = (F ηker(ϕ))(f(s′)), so that fker(ϕ)([s]ker(ϕ)) = fker(ϕ)([s′]ker(ϕ)). It is then clear that ηker(ϕ) is a coalgebra morphism. ⊣

The definition of a congruence is not tied to functors which operate on the category of sets. The next example leaves this category and considers the category of measurable spaces, introduced in Example 2.1.12. The subprobability functor S, introduced in Example 2.3.12, is an endofunctor on Meas, and we know that the coalgebras for this functor are just the subprobabilistic transition kernels K : (S, A) → (S, A), see Example 2.6.7.

Definition 2.6.42 A measurable map f : (S, A) → (T, B) between measurable spaces (S, A) and (T, B) is called final iff B is the largest σ-algebra on T which renders f measurable.

Thus if f is onto, we conclude from f⁻¹[B] ∈ A that B ∈ B, because f⁻¹ is injective. Given an equivalence relation α on S, we can make the factor space S/α a measurable space by endowing it with the final σ-algebra A/α with respect to ηα, compare Exercise 2.25.

This, then, is the definition of a morphism for coalgebras for the Giry functor (see Example 2.4.8).

Definition 2.6.43 Let (S, A, K) and (T, B, L) be coalgebras for the subprobability functor; then ϕ : (S, A, K) → (T, B, L) is a coalgebra morphism iff ϕ : (S, A) → (T, B) is a measurable map such that L ∘ ϕ = S(ϕ) ∘ K, i.e., such that the corresponding square commutes.

Thus we have

L(ϕ(s))(B) = S(ϕ)(K(s))(B) = K(s)(ϕ⁻¹[B])

for each s ∈ S and for each measurable set B ∈ B. We will investigate the kernel of a morphism now in order to obtain a result similar to the one reported in Proposition 2.6.41. The crucial property in that development has been the comparison of the kernel ker(Sϕ) with ker(S ηker(ϕ)). We will concentrate on this property now.

Call a morphism ϕ strong iff ϕ is surjective and final. Now fix a strong morphism ϕ : K → L. A measurable subset A ∈ A is called ϕ-invariant iff a ∈ A and ϕ(a) = ϕ(a′) together imply a′ ∈ A, so that A ∈ A is ϕ-invariant iff A is the union of ker(ϕ)-equivalence classes.

We obtain this characterization of ϕ-invariant sets (note that this is an intrinsic property of the map proper):


Lemma 2.6.44 Let ϕ : (S, A) → (T, B) be a strong morphism, and define

Σϕ := {A ∈ A | A is ϕ-invariant}.

Then

1. Σϕ is a σ-algebra.

2. Σϕ is isomorphic to {ϕ⁻¹[B] | B ∈ B} as a Boolean σ-algebra.

Proof 1. Clearly, both ∅ and S are ϕ-invariant, and the complement of an invariant set is invariant again. Invariant sets are closed under countable unions. Hence Σϕ is a σ-algebra.

2. Given B ∈ B, it is clear that ϕ⁻¹[B] is ϕ-invariant; since it is also a measurable subset of S, we conclude that {ϕ⁻¹[B] | B ∈ B} ⊆ Σϕ. Now let A ∈ Σϕ; we claim that A = ϕ⁻¹[ϕ[A]]. In fact, since ϕ(a) ∈ ϕ[A] for a ∈ A, the inclusion A ⊆ ϕ⁻¹[ϕ[A]] is trivial. Let a ∈ ϕ⁻¹[ϕ[A]], so that there exists a′ ∈ A with ϕ(a) = ϕ(a′). Since A is ϕ-invariant, we conclude a ∈ A, establishing the other inclusion. Because ϕ is final and surjective, we infer from this representation that ϕ[A] ∈ B whenever A ∈ Σϕ, and that ϕ⁻¹ : B → Σϕ is surjective. Since ϕ is surjective, ϕ⁻¹ is injective, hence ϕ⁻¹ yields a bijection. The latter map is compatible with the operations of a Boolean σ-algebra, so it is an isomorphism. ⊣

This result is usually the hidden and deeper reason why the constructions of various kinds of bisimulations for Markov transition systems work. In our context, it helps in establishing the crucial property for kernels.

Corollary 2.6.45 Let ϕ : K → L be a strong morphism. Then ker(Sϕ) ⊆ ker(S ηker(ϕ)).

Proof Let 〈µ, µ′〉 ∈ ker(Sϕ), thus (Sϕ)(µ)(B) = (Sϕ)(µ′)(B) for all B ∈ B. Now let C ∈ A/ker(ϕ); then ηker(ϕ)⁻¹[C] ∈ Σϕ, so that there exists by Lemma 2.6.44 some B ∈ B such that ηker(ϕ)⁻¹[C] = ϕ⁻¹[B]. This is the central property. Hence

(S ηker(ϕ))(µ)(C) = µ(ηker(ϕ)⁻¹[C]) = µ(ϕ⁻¹[B]) = (Sϕ)(µ)(B) = (Sϕ)(µ′)(B) = µ′(ϕ⁻¹[B]) = (S ηker(ϕ))(µ′)(C),

so that 〈µ, µ′〉 ∈ ker(S ηker(ϕ)). ⊣

Now everything is in place to show that the kernel of a strong morphism is a congruence for the S-coalgebra (S, A, K).

Proposition 2.6.46 Let ϕ : K → L be a strong morphism for the S-coalgebras (S, A, K) and (T, B, L). Then ker(ϕ) is a congruence for (S, A, K).

Proof 0. We have to find a coalgebra structure on the measurable space (S/ker(ϕ), A/ker(ϕ)) first; the candidate is obvious. After having established that this is possible indeed, we check the condition for a congruence. This invites an application of Proposition 2.6.41 through Corollary 2.6.45.

1. We want to define the dynamics Kker(ϕ) on (S/ker(ϕ), A/ker(ϕ)) upon setting

Kker(ϕ)([s]ker(ϕ))(C) := (S ηker(ϕ))(K(s))(C)  (= K(s)(ηker(ϕ)⁻¹[C]))


for C ∈ A/ker(ϕ), but we have to be sure first that this is well defined. In fact, let [s]ker(ϕ) = [s′]ker(ϕ), which means ϕ(s) = ϕ(s′), hence L(ϕ(s)) = L(ϕ(s′)), so that (Sϕ)(K(s)) = (Sϕ)(K(s′)), because ϕ : K → L is a morphism. But the latter equality implies 〈K(s), K(s′)〉 ∈ ker(Sϕ) ⊆ ker(S ηker(ϕ)), the inclusion holding by Corollary 2.6.45. Thus we conclude

(S ηker(ϕ))(K(s)) = (S ηker(ϕ))(K(s′)),

so that Kker(ϕ) is well defined indeed.

2. It is immediate that C ↦ Kker(ϕ)([s]ker(ϕ))(C) is a subprobability on A/ker(ϕ) for fixed s ∈ S, so it remains to show that t ↦ Kker(ϕ)(t)(C) is a measurable map on the factor space (S/ker(ϕ), A/ker(ϕ)). Let q ∈ [0, 1], and consider for C ∈ A/ker(ϕ) the set

G := {t ∈ S/ker(ϕ) | Kker(ϕ)(t)(C) < q}.

We have to show that G ∈ A/ker(ϕ). Because C ∈ A/ker(ϕ), we know that A := ηker(ϕ)⁻¹[C] ∈ Σϕ, hence it is sufficient to show that the set H := {s ∈ S | K(s)(A) < q} is a member of Σϕ. Since K is the dynamics of an S-coalgebra, we know that H ∈ A, so it remains to show that H is ϕ-invariant. Because A ∈ Σϕ, we infer from Lemma 2.6.44 that A = ϕ⁻¹[B] for some B ∈ B. Now take s ∈ H and assume ϕ(s) = ϕ(s′). Thus

K(s′)(A) = K(s′)(ϕ⁻¹[B]) = (Sϕ)(K(s′))(B) = L(ϕ(s′))(B) = L(ϕ(s))(B) = (Sϕ)(K(s))(B) = K(s)(A) < q,

so that s′ ∈ H as well, and H ∈ Σϕ indeed. Because H = ηker(ϕ)⁻¹[G], it follows that G ∈ A/ker(ϕ), and we are done. ⊣

We will deal with coalgebras now when interpreting modal logics. This connection between modal logics and coalgebras is at first sight fairly surprising, but becomes at second sight interesting, because modal logics are firmly tied to transition systems, and we have seen that transition systems can be interpreted as coalgebras. So one wants to know what coalgebraic properties are reflected in the relational interpretation of modal logics; in particular the relation to the coalgebraic reading of bisimilarity is interesting and may promise new insights.

2.7 Modal Logics

This section will discuss modal logics and provide a closer look at the interface between models for these logics and coalgebras. Thus the topics of this section may be seen as an application and illustration of coalgebras.

We will define the language for the formulas of modal logics, first for the conventional logic which permits expressing sentences like “it is possible that formula ϕ holds” or “formula ϕ holds necessarily”, then for an extended version, allowing for modal operators that govern more than one formula. The interpretation through Kripke models is discussed, and it becomes clear that at least elementary elements of the language of categories are helpful in investigating these logics. For completeness, we also give the construction of the canonical model, displaying the elegant construction through the Lindenbaum Lemma.


It turns out that coalgebras can be used directly in the interpretation of modal logics. We demonstrate that a set of predicate liftings defines a modal logic, discuss briefly expressivity for these modal logics, and display an interpretation of CTL*, one of the basic logics for model checking, through coalgebras.

We fix a set Φ of propositional letters.

Definition 2.7.1 The basic modal language L(Φ) over Φ is given by this grammar:

ϕ ::= ⊥ | p | ϕ1 ∧ ϕ2 | ¬ϕ | ◇ϕ

with p ∈ Φ.

We introduce additional operators:

⊤ denotes ¬⊥,
ϕ1 ∨ ϕ2 denotes ¬(¬ϕ1 ∧ ¬ϕ2),
ϕ1 → ϕ2 denotes ¬ϕ1 ∨ ϕ2,
□ϕ denotes ¬◇¬ϕ.

The constant ⊥ denotes falsehood; consequently, ⊤ = ¬⊥ denotes truth. Negation ¬ and conjunction ∧ should not come as a surprise; informally, ◇ϕ means that it is possible that formula ϕ holds, while □ϕ expresses that ϕ holds necessarily. Syntactically, this looks like propositional logic, extended by the modal operators ◇ and □.
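The grammar translates directly into an algebraic data type, with the defined operators as derived functions. A sketch in Haskell; all names are ours.

data Form p
  = Falsum                 -- the constant ⊥
  | Prop p                 -- a propositional letter
  | Form p :/\: Form p     -- conjunction
  | Neg (Form p)           -- negation
  | Dia (Form p)           -- the modal operator ◇
  deriving Show

verum :: Form p            -- ⊤ = ¬⊥
verum = Neg Falsum

(\/), (-->) :: Form p -> Form p -> Form p
phi \/  psi = Neg (Neg phi :/\: Neg psi)   -- disjunction
phi --> psi = Neg phi \/ psi               -- implication

box :: Form p -> Form p    -- □ϕ = ¬◇¬ϕ
box = Neg . Dia . Neg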

Before we have a look at the semantics of modal logic, we indicate that this logic is syntactically sometimes a bit too restricted; after all, the modal operators operate only on one argument at a time.

The extension we want should offer modal operators with more arguments. For this, we introduce the notion of a modal similarity type t = (O, ρ), which consists of a set O of operators; each operator ∆ ∈ O has an arity ρ(∆) ∈ N0. Note that ρ(∆) = 0 is not excluded; these modal constants will not play a distinguished role, however, they are sometimes nice to have.

Clearly, the set {◇} together with ρ(◇) = 1 is an example of such a modal similarity type.

Definition 2.7.2 Given a modal similarity type t = (O, ρ) and the set Φ of propositional letters, the extended modal language L(t, Φ) is given by this grammar:

ϕ ::= ⊥ | p | ϕ1 ∧ ϕ2 | ¬ϕ | ∆(ϕ1, . . . , ϕk)

with p ∈ Φ and ∆ ∈ O such² that ρ(∆) = k.

We also introduce for the general case operators which apply a modal operator to the negations of its arguments and negate the result; they are called nablas. The nabla ∇ of ∆ (where ∆ ∈ O, ρ(∆) = k) is defined through

∇(ϕ1, . . . , ϕk) := ¬∆(¬ϕ1, . . . , ¬ϕk).

²In this section, ∆ does not denote the diagonal of a set (as elsewhere in this book); we could have used another letter, but the symmetry of ∆ and ∇ is too good to be missed.


Hence □ is the nabla of ◇.

It is time to have a look at some examples.

Example 2.7.3 Let O = {F, P} with ρ(F) = ρ(P) = 1; the operator F looks into the future, and P into the past. This may be useful, e.g., when you are traversing a tree and are visiting an inner node. The future may then look at all nodes in its subtree, the past at all nodes on a path from the root to this node.

Then tFut := (O, ρ) is a modal similarity type. If ϕ is a formula in L(tFut, Φ), formula Fϕ is true iff ϕ will hold in the future, and Pϕ is true iff ϕ did hold in the past. The nablas are defined as

Gϕ := ¬F¬ϕ (ϕ will always be the case)

Hϕ := ¬P¬ϕ (ϕ has always been the case).

Look at some formulas:

Pϕ → GPϕ: If something has happened, it will always have happened.

Fϕ → FFϕ: If ϕ will be true in the future, then it will be true in the future that ϕ will be true.

GFϕ → FGϕ: If it is always the case that ϕ will be true at some future point, then at some point ϕ will always be true.


The next example deals with a simple model for sequential programs.

Example 2.7.4 Take Ψ as a set of atomic programs (think of elements of Ψ as executable program components). The set of programs is defined through this grammar:

t ::= ψ | t1 ∪ t2 | t1; t2 | t∗ | ϕ?

with ψ ∈ Ψ and ϕ a formula of the underlying modal logic.

Here t1 ∪ t2 denotes the nondeterministic choice between programs t1 and t2, t1; t2 is the sequential execution of t1 and t2 in that order, and t∗ is iteration of program t a finite number of times (including zero). The program ϕ? tests whether or not formula ϕ holds; ϕ? serves as a guard: (ϕ?; t1) ∪ (¬ϕ?; t2) tests whether ϕ holds; if it does, t1 is executed, otherwise t2 is. So the informal meaning of 〈t〉ϕ is that formula ϕ holds after program t is executed (we use here and later an expression like 〈t〉ϕ rather than the functional notation or just juxtaposition).

So, formally we have the modal similarity type tPDL := (O, ρ) with O := {〈t〉 | t is a program}. This logic is known as PDL — propositional dynamic logic.

The next example deals with games and a syntax very similar to the one just explored for PDL.

Example 2.7.5 We introduce two players, Angel and Demon, playing against each other, taking turns. So Angel starts, then Demon makes the next move, then Angel replies, etc.

For modelling game logic, we assume that we have a set Γ of simple games; the syntax for games looks like this:

g ::= γ | g1 ∪ g2 | g1 ∩ g2 | g1; g2 | gd | g∗ | g× | ϕ?


with γ ∈ Γ and ϕ a formula of the underlying logic. The informal interpretation of g1 ∪ g2, g1; g2, g∗ and ϕ? is as in PDL (Example 2.7.4), but as actions of player Angel. The actions of player Demon are indicated by:

g1 ∩ g2: Demon chooses between games g1 and g2; this is called demonic choice (in contrast to angelic choice g1 ∪ g2).

g×: Demon decides to play game g a finite number of times (including not at all).

gd: Angel and Demon change places.

Again, we indicate through 〈g〉ϕ that formula ϕ holds after game g. We obtain the similarity type tGL := (O, ρ) with O := {〈g〉 | g is a game} and ρ(〈g〉) = 1. The corresponding logic is called game logic.

Another example is given by arrow logic. Assume that you have arrows in the plane; you can compose them, i.e., place the beginning of one arrow at the end of another one, and you can reverse them. Finally, you can leave them alone, i.e., do nothing with an arrow.

Example 2.7.6 The set O of operators for arrow logic is given by {∘, ⊗, skip} with ρ(∘) = 2, ρ(⊗) = 1 and ρ(skip) = 0. The arrow composed from arrows a1 and a2 is the arrow a1 ∘ a2, ⊗a1 is the reversed arrow a1, and skip does nothing.

2.7.1 Frames and Models

For interpreting the basic modal language, we introduce frames. A frame models transitions, which are at the very heart of modal logics. Let us have a brief look at a modal formula like □p for some propositional letter p ∈ Φ. This formula models “p always holds”, which implies a transition from the current state to another one, in which p is assumed to hold always; without a transition, we would not have to think about whether p always holds — it would just hold or not. Hence we need to have transitions at our disposal, thus a transition system, as in Example 2.1.9. In the current context, we take the disguise of a transition system as a relation. All this is captured in the notion of a frame.

Definition 2.7.7 A Kripke frame F := (W, R) for the basic modal language is a set W ≠ ∅ of states together with a relation R ⊆ W × W. W is sometimes called the set of worlds, R the accessibility relation.

The accessibility relation of a Kripke frame does not yet carry enough information about the meaning of a modal formula, since the propositional letters are not captured by the frame. This is the case, however, in a Kripke model.

Definition 2.7.8 A Kripke model (or simply a model, for the time being) M = (W, R, V) for the basic modal language consists of a Kripke frame (W, R) together with a map V : Φ → P(W).

So, roughly speaking, the frame part of a Kripke model caters for the propositional and the modal part of the logic, whereas the map V takes care of the propositional letters. This now permits us to define the meaning of the formulas for the basic modal language. We state under which conditions a formula ϕ is true in a world w ∈ W; this is expressed through


M, w |= ϕ; note that this will depend on the model M, hence we usually incorporate it into the notation. Here we go:

M, w |= ⊥ is always false.
M, w |= p ⇔ w ∈ V(p), if p ∈ Φ.
M, w |= ϕ1 ∧ ϕ2 ⇔ M, w |= ϕ1 and M, w |= ϕ2.
M, w |= ¬ϕ ⇔ M, w |= ϕ is false.
M, w |= ◇ϕ ⇔ there exists v with 〈w, v〉 ∈ R and M, v |= ϕ.

The interesting part is of course the last line. We want ◇ϕ to hold in state w; by our informal understanding this means that a transition into a state such that ϕ holds in this state is possible. But this means that there exists some state v with 〈w, v〉 ∈ R such that ϕ holds in v. This is just the formulation we used above. Look at □ϕ; an easy calculation shows that M, w |= □ϕ iff M, v |= ϕ for all v with 〈w, v〉 ∈ R; thus M, w |= □ϕ holds iff M, v |= ϕ, no matter what transition from world w to another world v we make. But we want to emphasize that M, w |= ◇ϕ implies that w has at least one successor under relation R.

We define [[ϕ]]M as the set of all states in which formula ϕ holds. Formally,

[[ϕ]]M := {w ∈ W | M, w |= ϕ}.
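For finite models these clauses are immediately executable. The sketch below reuses the Form data type from the sketch after Definition 2.7.1; Model, sat and ext are our names.

data Model p w = Model { worlds :: [w], rel :: [(w, w)], val :: p -> [w] }

-- M, w |= ϕ, clause by clause as above
sat :: (Eq p, Eq w) => Model p w -> w -> Form p -> Bool
sat m w frm = case frm of
  Falsum       -> False
  Prop p       -> w `elem` val m p
  phi :/\: psi -> sat m w phi && sat m w psi
  Neg phi      -> not (sat m w phi)
  Dia phi      -> any (\v -> sat m v phi) [ v | (u, v) <- rel m, u == w ]

-- [[ϕ]]_M, the extension of a formula
ext :: (Eq p, Eq w) => Model p w -> Form p -> [w]
ext m phi = [ w | w <- worlds m, sat m w phi ]

For instance, in the model of Example 2.7.9 below — worlds [1..5], rel [(1,2),(2,3),(3,4),(4,5)], and val mapping p to [2,3], q to [1..5] and r to [] — the call sat m 1 (Dia (box (Prop 'p'))) yields True, matching the first item of that example.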

Let us look at some examples.

Example 2.7.9 Put Φ := {p, q, r} as the set of propositional letters, W := {1, 2, 3, 4, 5} as the set of states; relation R is given through

1 → 2 → 3 → 4 → 5

Finally, put

V(p) := {2, 3}, V(q) := {1, 2, 3, 4, 5}, V(r) := ∅.

Then we have for the Kripke model M := (W, R, V) for example:

M, 1 |= ◇□p: This is so since M, 3 |= p (because 3 ∈ V(p)), thus M, 2 |= □p, hence M, 1 |= ◇□p.

M, 1 ⊭ ◇□p → p: Since 1 ∉ V(p), we have M, 1 ⊭ p, while M, 1 |= ◇□p.

M, 2 |= ◇(p ∧ ¬r): The only successor of 2 in R is state 3, and we see that 3 ∈ V(p) and 3 ∉ V(r).

M, 1 |= q ∧ ◇(q ∧ ◇(q ∧ ◇(q ∧ ◇q))): Because 1 ∈ V(q) and 2 is the successor of 1, we investigate whether M, 2 |= q ∧ ◇(q ∧ ◇(q ∧ ◇q)) holds. Since 2 ∈ V(q) and 〈2, 3〉 ∈ R, we look at M, 3 |= q ∧ ◇(q ∧ ◇q); now M, 3 |= q and 〈3, 4〉 ∈ R, so we investigate M, 4 |= q ∧ ◇q. Since 4 ∈ V(q) and 〈4, 5〉 ∈ R, we find that this is true. Let ϕ denote the formula q ∧ ◇(q ∧ ◇(q ∧ ◇(q ∧ ◇q))); then this peeling off of layers of parentheses shows that M, 2 ⊭ ϕ, because M, 5 |= ◇q does not hold: state 5 has no successor.


M, 1 6|= 3ϕ ∧ q: Since M, 2 6|= ϕ, and since state 2 is the only successor to 1, we see thatM, 1 6|= ϕ.

M, w |= □q: This is true for all worlds w, because w′ ∈ V(q) for all w′ which are successors to some w ∈ W.


Example 2.7.10 We have two propositional letters p and q; as set of states we put W := {1, 2, 3, 4, 6, 8, 12, 24}, and we say

x R y ⇔ x ≠ y and x divides y.

This is what R looks like without transitive arrows:

1 → 2 → 4 → 8 and 3 → 6 → 12 → 24, together with the arrows 1 → 3, 2 → 6, 4 → 12, and 8 → 24.

Put V(p) := {4, 8, 12, 24} and V(q) := {6}. Define the Kripke model M := (W, R, V); then we obtain for example

M, 4 |= □p: The set of successors to state 4 is just {8, 12, 24}, which is a subset of V(p).

M, 6 |= □p: Here we may reason in the same way.

M, 2 ⊭ □p: State 6 is a successor to 2, but 6 ∉ V(p).

M, 2 |= ◇(q ∧ □p) ∧ ◇(¬q ∧ □p): State 6 is a successor to state 2 with M, 6 |= q ∧ □p, and state 4 is a successor to state 2 with M, 4 |= ¬q ∧ □p.


Let us introduce some terminology which will be needed later. We say that a formula ϕ is globally true in a Kripke model M with state space W iff [[ϕ]]M = W, hence iff M, w |= ϕ for all states w ∈ W; this is indicated by M |= ϕ. If [[ϕ]]M ≠ ∅, thus if there exists w ∈ W with M, w |= ϕ, we say that formula ϕ is satisfiable; ϕ is said to be refutable or falsifiable iff ¬ϕ is satisfiable. A set Σ of formulas is said to be globally true iff M, w |= Σ for all w ∈ W (where we put M, w |= Σ iff M, w |= ϕ for all ϕ ∈ Σ); Σ is satisfiable iff M, w |= Σ for some w ∈ W.

Kripke models are but one approach to interpreting modal logics. We observe that for a given transition system (S, ⇝) the set N(s) := {s′ ∈ S | s ⇝ s′} may consist of more than one state; one may consider N(s) as the neighborhood of state s. An external observer may not be able to observe N(s) exactly, but may determine that N(s) ⊆ A for some subset A ⊆ S. Obviously, N(s) ⊆ A and A ⊆ B implies N(s) ⊆ B, so the sets containing the neighborhood N(s) of a state s form an upper closed family. This leads to the definition of neighborhood frames.

Definition 2.7.11 Given a set S of states, a neighborhood frame N := (S, N) is defined by a map N : S → V(S) := {V ⊆ P(S) | V is upper closed}. N is called an effectivity function on S.


The set V(S) of all upper closed families of subsets of S was introduced in Example 2.3.13.

So if we consider state s ∈ S in a neighborhood frame, then N(s) is an upper closed family which collects all sets the next state may be a member of. These frames occur in a natural way in topological spaces.

Example 2.7.12 Let T be a topological space; then

V(t) := {A ⊆ T | U ⊆ A for some open neighborhood U of t}

defines a neighborhood frame (T, V).

Another straightforward example is given by ultrafilters.

Example 2.7.13 Given a set S, define

U(x) := {U ⊆ S | x ∈ U},

the ultrafilter associated with x. Then (S, U) is a neighborhood frame.

Each Kripke frame gives rise to a neighborhood frame in this way:

Example 2.7.14 Let (W, R) be a Kripke frame, and define for the world w ∈ W the sets

VR(w) := {A ∈ P(W) | R(w) ⊆ A},
V′R(w) := {A ∈ P(W) | R(w) ∩ A ≠ ∅}

(with R(w) := {v ∈ W | 〈w, v〉 ∈ R}); then both (W, VR) and (W, V′R) are neighborhood frames.

A neighborhood frame induces a map from the power set of the state space into this power set; this map is sometimes used for an interpretation in lieu of the neighborhood function. Fix a map P : S → V(S) for illustrating this. Given a subset A ⊆ S, we determine the set ϑP(A) of those states which can achieve a state in A through P; hence ϑP(A) := {s ∈ S | A ∈ P(s)}. This yields a map ϑP : P(S) → P(S), which is monotone since P(s) is upper closed for each s. Conversely, given a monotone map ϑ : P(S) → P(S), we define Pϑ : S → V(S) through Pϑ(s) := {A ⊆ S | s ∈ ϑ(A)}. It is plain that ϑ_{Pϑ} = ϑ and P_{ϑP} = P.
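For a finite state space this correspondence can be spelled out by brute force. A minimal Haskell sketch, assuming that V(S) is represented as an explicit list of its member sets (all names are ours):

import qualified Data.Set as S

-- theta_P : P(S) -> P(S), obtained from an effectivity function P
thetaOf :: Ord s => [s] -> (s -> [S.Set s]) -> S.Set s -> S.Set s
thetaOf states p a = S.fromList [s | s <- states, a `elem` p s]

-- P_theta : S -> V(S), obtained from a monotone theta; we enumerate all
-- subsets of the finite state space, which is exponential but illustrative
effOf :: Ord s => [s] -> (S.Set s -> S.Set s) -> s -> [S.Set s]
effOf states theta s = [a | a <- subsets states, s `S.member` theta a]

subsets :: Ord s => [s] -> [S.Set s]
subsets = map S.fromList . foldr (\x acc -> acc ++ map (x :) acc) [[]]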

Definition 2.7.15 Given a set S of states, a neighborhood frame (S, N), and a map V : Φ → P(S) associating each propositional letter with a set of states, N := (S, N, V) is called a neighborhood model.

We define validity in a neighborhood model by induction on the structure of a formula, this time through the validity sets:

[[⊤]]N := S,
[[p]]N := V(p), if p ∈ Φ,
[[ϕ1 ∧ ϕ2]]N := [[ϕ1]]N ∩ [[ϕ2]]N,
[[¬ϕ]]N := S \ [[ϕ]]N,
[[□ϕ]]N := {s ∈ S | [[ϕ]]N ∈ N(s)}.


In addition, we put N, s |= ϕ iff s ∈ [[ϕ]]N. Consider the last line and assume that the neighborhood frame underlying the model is generated by a Kripke frame (W, R), so that A ∈ N(w) iff R(w) ⊆ A. Then N, w′ |= □ϕ translates into w′ ∈ {w ∈ W | R(w) ⊆ [[ϕ]]N}, so that N, w |= □ϕ iff each world which is accessible from world w satisfies ϕ; this is what we want. Extending the definition above, we put

[[◇ϕ]]N := {s ∈ S | S \ [[ϕ]]N ∉ N(s)},

so that N, s |= ◇ϕ iff N, s |= ¬□¬ϕ.
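The recursion for the validity sets is again executable on finite neighborhood models. A minimal sketch, assuming the neighborhood map is given as an explicit list of member sets (names ours); ◇ can be added as the derived operator ¬□¬:

import qualified Data.Set as S

data NForm = NTop | NAtom String | NAnd NForm NForm | NNot NForm | NBox NForm

data NModel s = NModel
  { nStates :: [s]                -- the state space S
  , nbhd    :: s -> [S.Set s]     -- N(s), an upper closed family
  , nVal    :: String -> S.Set s  -- the map V
  }

-- the validity sets [[phi]]_N, clause by clause as above
nSem :: Ord s => NModel s -> NForm -> S.Set s
nSem m NTop       = S.fromList (nStates m)
nSem m (NAtom p)  = nVal m p
nSem m (NAnd f g) = nSem m f `S.intersection` nSem m g
nSem m (NNot f)   = S.fromList (nStates m) `S.difference` nSem m f
nSem m (NBox f)   = S.fromList [s | s <- nStates m, nSem m f `elem` nbhd m s]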

Back to the general discussion. We generalize the notion of a Kripke model for capturing extended modal languages. The idea for an extension is straightforward: for interpreting a modal formula given by a modal operator of arity n we require a subset of W^{n+1}. This leads to the definition of a frame adapted to this purpose.

Definition 2.7.16 Given a similarity type t = (O, ρ), F = (W, (R∆)∆∈O) is said to be a t-frame iff W ≠ ∅ is a set of states and R∆ ⊆ W^{ρ(∆)+1} for each ∆ ∈ O. A t-model M = (F, V) is a t-frame F together with a map V : Φ → P(W).

Given a t-model M, we define the interpretation of formulas like ∆(ϕ1, . . . , ϕn) and its nabla cousin ∇(ϕ1, . . . , ϕn) in this way:

• M, w |= ∆(ϕ1, . . . , ϕn) iff there exist w1, . . . , wn with

1. M, wi |= ϕi for 1 ≤ i ≤ n,

2. 〈w, w1, . . . , wn〉 ∈ R∆,

if n > 0,

• M, w |= ∆ iff w ∈ R∆, if n = 0,

• M, w |= ∇(ϕ1, . . . , ϕn) iff (〈w, w1, . . . , wn〉 ∈ R∆ implies M, wi |= ϕi for some i ∈ {1, . . . , n}) for all w1, . . . , wn ∈ W, if n > 0,

• M, w |= ∇ iff w ∉ R∆, if n = 0.

In the last two cases, ∇ is the nabla for modal operator ∆.

Just in order to get a grip on these definitions, let us have a look at some examples.

Example 2.7.17 The set O of modal operators consists just of the unary operators 〈a〉, 〈b〉, 〈c〉; the relations on the set W := {w1, w2, w3, w4} of worlds are given by

Ra := {〈w1, w2〉, 〈w4, w4〉},
Rb := {〈w2, w3〉},
Rc := {〈w3, w4〉}.

There is only one propositional letter p, and we put V(p) := {w2}. This comprises a t-model M. We want to check whether M, w1 |= 〈a〉p → 〈b〉p holds. In order to establish whether M, w1 |= 〈a〉p holds, we have to find a state v such that 〈w1, v〉 ∈ Ra and M, v |= p; state w2 is the only possible choice, and indeed M, w2 |= p, so M, w1 |= 〈a〉p. On the other hand, w1 has no successor under Rb, so M, w1 ⊭ 〈b〉p. Hence M, w1 ⊭ 〈a〉p → 〈b〉p.


Example 2.7.18 Let W = {u, v, w, s} be the set of worlds; we take O := {♦, ♣} with ρ(♦) = 2 and ρ(♣) = 3. Put R♦ := {〈u, v, w〉} and R♣ := {〈u, v, w, s〉}. The set Φ of propositional letters is {p0, p1, p2} with V(p0) := {v}, V(p1) := {w}, and V(p2) := {s}. This yields a model M.

1. We want to determine [[♦(p0, p1)]]M. From the definition of |= we see that

M, x |= ♦(p0, p1) iff ∃x0, x1 : M, x0 |= p0 and M, x1 |= p1 and 〈x, x0, x1〉 ∈ R♦.

We obtain by inspection [[♦(p0, p1)]]M = {u}.

2. We have M, u |= ♣(p0, p1, p2). This is so since M, v |= p0, M, w |= p1, and M, s |= p2, together with 〈u, v, w, s〉 ∈ R♣.

3. Consequently, we have [[♦(p0, p1) → ♣(p0, p1, p2)]]M = W: the implication holds at u by the previous two items, and it holds trivially at v, w, and s, where the antecedent fails.


Example 2.7.19 Let's look into the future and into the past. We are given the unary operators O = {F, P} as in Example 2.7.3. The interpretation requires two binary relations RF and RP; we have defined the corresponding nablas G resp. H. Unless we want to change the past, we assume that RP = RF^{−1}, so just one relation R := RF suffices for interpreting this logic. Hence

M, x |= Fϕ: This is the case iff there exists z ∈ W such that 〈x, z〉 ∈ R and M, z |= ϕ.

M, x |= Pϕ: This is true iff there exists z ∈ W with 〈z, x〉 ∈ R and M, z |= ϕ.

M, x |= Gϕ: This holds iff we have M, y |= ϕ for all y with 〈x, y〉 ∈ R.

M, x |= Hϕ: Similarly, for all y with 〈y, x〉 ∈ R we have M, y |= ϕ.


The next case is a little more involved, since we have to construct the relations from the information that is available. In the case of PDL (see Example 2.7.4), we only have information about the behavior of atomic programs, and we construct from it the relations for compound programs since, after all, a compound program is composed from atomic programs according to the rules laid down in Example 2.7.4.

Example 2.7.20 Let Ψ be the set of all atomic programs, and assume that we have for each t ∈ Ψ a relation Rt ⊆ W × W; so if atomic program t is executed in state s, then Rt(s) yields the set of all possible successor states after execution. Now we define these relations by induction on the structure of the programs:

Rπ1∪π2 := Rπ1 ∪ Rπ2,
Rπ1;π2 := Rπ1 ∘ Rπ2,
Rπ* := ⋃n≥0 Rπn

(here we put Rπ0 := {〈w, w〉 | w ∈ W} and Rπn+1 := Rπ ∘ Rπn). Then, if 〈x, y〉 ∈ Rπ1∪π2, we know that 〈x, y〉 ∈ Rπ1 or 〈x, y〉 ∈ Rπ2, which reflects the observation that we can enter a new state y upon choosing between π1 and π2. Hence, executing π1 ∪ π2 in state x, we should be


able to enter this state upon executing one of the programs. Similarly, if in state x we execute first π1 and then π2, we should enter an intermediate state z after executing π1 and then execute π2 in state z, yielding the resulting state. Executing π* means that we execute π a finite number of times (possibly not at all). This explains the definition of Rπ*.

Finally, we should define Rϕ? for a formula ϕ. The intuitive meaning of a program like ϕ?;π is that we want to execute π provided formula ϕ holds. This suggests defining

Rϕ? := {〈w, w〉 | M, w |= ϕ}.

Note that we rely here on a model M which is already defined.
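On a finite state space the relations Rπ can be computed literally along this induction. A sketch, assuming programs are encoded as a syntax tree and the test operator receives the validity set [[ϕ]]M as a precomputed argument; Rπ* is obtained as a least fixed point, which is reached after finitely many rounds since W is finite (all names ours):

import qualified Data.Set as S

type Rel a = S.Set (a, a)

data Prog a = AtomicP (Rel a)           -- R_t for an atomic program t
            | Choice (Prog a) (Prog a)  -- pi1 ∪ pi2
            | Seq    (Prog a) (Prog a)  -- pi1 ; pi2
            | Star   (Prog a)           -- pi*
            | Test   (S.Set a)          -- phi?, given through [[phi]]_M

comp :: Ord a => Rel a -> Rel a -> Rel a
comp r s = S.fromList [ (x, z) | (x, y)  <- S.toList r
                               , (y', z) <- S.toList s
                               , y == y' ]

relOf :: Ord a => [a] -> Prog a -> Rel a
relOf _  (AtomicP r)  = r
relOf ws (Choice p q) = relOf ws p `S.union` relOf ws q
relOf ws (Seq p q)    = relOf ws p `comp` relOf ws q
relOf ws (Test v)     = S.fromList [(w, w) | w <- ws, w `S.member` v]
relOf ws (Star p)     = grow (S.fromList [(w, w) | w <- ws])
  where r = relOf ws p
        grow acc = let acc' = acc `S.union` comp r acc
                   in if acc' == acc then acc else grow acc'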

Just to get familiar with these definitions, let us have a look at the composition operator.

M, x |= 〈π1;π2〉ϕ ⇔ ∃v : M, v |= ϕ and 〈x, v〉 ∈ Rπ1;π2
⇔ ∃w ∈ W ∃v ∈ [[ϕ]]M : 〈x, w〉 ∈ Rπ1 and 〈w, v〉 ∈ Rπ2
⇔ ∃w ∈ [[〈π2〉ϕ]]M : 〈x, w〉 ∈ Rπ1
⇔ M, x |= 〈π1〉〈π2〉ϕ

This means that 〈π1;π2〉ϕ and 〈π1〉〈π2〉ϕ are semantically equivalent, which is intuitively quite clear.

The test operator is examined in the next formula. We have

Rϕ?;π = Rϕ? ∘ Rπ = {〈x, y〉 | M, x |= ϕ and 〈x, y〉 ∈ Rπ} = Rπ ∩ ([[ϕ]]M × W);

hence M, y |= 〈ϕ?;π〉ψ iff M, y |= ϕ and M, y |= 〈π〉ψ, so that

M, y |= 〈(ϕ?;π1) ∪ (¬ϕ?;π2)〉ϕ iff
  M, y |= 〈π1〉ϕ, if M, y |= ϕ,
  M, y |= 〈π2〉ϕ, otherwise.


The next example shows that we can interpret PDL in a neighborhood model as well.

Example 2.7.21 We associate with each atomic program t ∈ Ψ of PDL an effectivity function Et : W → V(W) on the state space W. Hence if we execute t in state w, then Et(w) is the set of all subsets A of the states such that the next state is a member of A (we say that the program t can achieve a state in A). Hence (W, (Et)t∈Ψ) is a neighborhood frame. We have indicated in the discussion preceding Definition 2.7.15 that we can construct a monotone map on the power set from a neighborhood function; so we define

R′t(A) := {w ∈ W | A ∈ Et(w)},

giving a function R′t : P(W) → P(W). R′t is monotone since Et is an effectivity function: if A ⊆ B and w ∈ R′t(A), then A ∈ Et(w), hence B ∈ Et(w), thus w ∈ R′t(B). These maps can be extended to programs along the syntax for programs in the following way, which is very similar to the one for relations:

R′π1∪π2 := R′π1 ∪ R′π2,
R′π1;π2 := R′π1 ∘ R′π2,
R′π* := ⋃n≥0 R′πn


with R′π0 and R′πn defined as above.

Assume that we again have a function V : Φ → P(W), yielding a neighborhood model N. The definitions above are now used for the interpretation of formulas 〈π〉ϕ through

[[〈π〉ϕ]]N := R′π([[ϕ]]N ).

Interpreting Rπ(w) := {A ⊆ W | w ∈ R′π(A)} as the collection of sets which can be achieved by the execution of program π, we have [[〈π〉ϕ]]N = {w ∈ W | [[ϕ]]N ∈ Rπ(w)}, so that [[〈π〉ϕ]]N describes the set of all states for which [[ϕ]]N can be achieved upon execution of π. The definition of Rϕ? carries over, so that this yields an interpretation of PDL.

The construction of R′π was technically necessary, since the direct composition of effectivity functions is difficult.

Turning to game logic (see Example 2.7.5), we note that neighborhood models are suited to interpret this logic as well. Assign for each atomic game γ ∈ Γ the effectivity function Pγ to Angel; then Pγ(s) indicates what Angel can achieve when playing γ in state s. Specifically, A ∈ Pγ(s) indicates that Angel has a strategy for achieving, by playing γ in state s, that the next state of the game is a member of A. We will not formalize the notion of a strategy here but appeal rather to an informal understanding. The dual operator permits converting a game into its dual, where the players change roles: the moves of Angel become moves of Demon, and vice versa.

We should note that the Banach–Mazur games modelled in Section 1.7 and the games considered here display significant differences. First, the Banach–Mazur games are played over the playground N^N, and it should always be possible to map a Banach–Mazur game which is played over another domain to this urform. By construction, those games continue infinitely, and they have a well-defined notion of strategy, which permits defining what a winning strategy is. As was just pointed out, the games we are about to consider do not have a formal definition of a strategy; we work rather with the informal notion that, e.g., Angel has a strategy to achieve something. When discussing games, we will be careful to distinguish both varieties.

Let us just indicate informally by 〈γ〉ϕ that Angel has a strategy in game γ which makes sure that game γ results in a state which satisfies formula ϕ. We assume the game to be determined: if one player does not have a winning strategy, then the other one has. Thus if Angel does not have a ¬ϕ-strategy, then Demon has a ϕ-strategy, and vice versa. This means that we can derive the way Demon plays the game from the way Angel does, and vice versa.

Example 2.7.22 As in Example 2.7.5 we assume that games are given through this grammar

g ::= γ | g1 ∪ g2 | g1 ∩ g2 | g1; g2 | g^d | g* | g^× | ϕ?

with γ ∈ Γ, the set of atomic games. We assume that the game is determined, hence we may express demonic choice g1 ∩ g2 through (g1^d ∪ g2^d)^d, and demonic iteration g^× through angelic iteration ((g^d)*)^d.

Assign to each γ ∈ Γ an effectivity function Pγ on the set W of worlds, and put

P′γ(A) := {w ∈ W | A ∈ Pγ(w)}.


Hence w ∈ P′γ(A) indicates that Angel has a strategy to achieve A by playing game γ in state w. We extend P′ to games along the lines of the games' syntax, and put

[[〈g〉ϕ]]N := P′g([[ϕ]]N)

for the neighborhood model N, the game g, and the formula ϕ (see the construction after Definition 2.7.15). This is the extension:

P′g1∪g2(A) := P′g1(A) ∪ P′g2(A),
P′g1;g2(A) := P′g1(P′g2(A)),
P′g*(A) := ⋃n≥0 P′gn(A),
P′gd(A) := W \ P′g(W \ A),
P′g1∩g2(A) := P′(g1d∪g2d)d(A),
P′g×(A) := P′((gd)*)d(A),
P′ϕ?(A) := [[ϕ]]N ∩ A.
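A fragment of this extension as code, assuming atomic games are given directly by their predicate transformers and ws denotes the full set W of worlds; the sketch (names ours) shows how the dual reduces to complementation and how demonic choice is derived from it:

import qualified Data.Set as S

data Game w = AtomicG (S.Set w -> S.Set w)  -- P'_gamma for an atomic game
            | AChoice (Game w) (Game w)     -- g1 ∪ g2, Angel chooses
            | DChoice (Game w) (Game w)     -- g1 ∩ g2, Demon chooses
            | SeqG    (Game w) (Game w)     -- g1 ; g2
            | Dual    (Game w)              -- g^d

-- P'_g(A), for a finite set ws of worlds
effect :: Ord w => S.Set w -> Game w -> S.Set w -> S.Set w
effect _  (AtomicG e)   a = e a
effect ws (AChoice g h) a = effect ws g a `S.union` effect ws h a
effect ws (SeqG g h)    a = effect ws g (effect ws h a)
effect ws (Dual g)      a = ws `S.difference` effect ws g (ws `S.difference` a)
-- demonic choice via determinacy: g1 ∩ g2 = (g1^d ∪ g2^d)^d
effect ws (DChoice g h) a = effect ws (Dual (AChoice (Dual g) (Dual h))) a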

A straightforward suggestion for the interpretation of game logic is the approach through Kripke models, very much in line with the interpretation of modal logics in general. There are, however, some difficulties associated with this idea, which are discussed in [PP03, Section 3], in particular [PP03, Theorem 1].

This is the reason why Kripke models are not adequate for interpreting game logics: if games are interpreted through Kripke models, the interpretation turns out to be disjunctive. This means that 〈g1; (g2 ∪ g3)〉ϕ is semantically equivalent to 〈(g1; g2) ∪ (g1; g3)〉ϕ for all games g1, g2, g3. This, however, is not desirable: Angel's decision whether to play g2 or g3 after playing g1 should not be equivalent to deciding beforehand whether to play g1; g2 or g1; g3. Neighborhood models in their greater generality do not enforce this equivalence.

To conclude this little gallery of examples, let us finally have a look at arrow logic; see Example 2.7.6.

Example 2.7.23 Arrows are interpreted as vectors, hence, e.g., as pairs. Let W be a set of states; then we take W × W as the domain of our interpretation. We have three modal operators.

• The nullary operator skip is interpreted through Rskip := {〈w, w〉 | w ∈ W}.

• The unary operator ⊗ is interpreted through R⊗ := {〈〈a, b〉, 〈b, a〉〉 | a, b ∈ W}.

• The binary operator ∘ is intended to model composition; thus one end of the first arrow should be the other end of the second arrow, hence R∘ := {〈〈a, c〉, 〈a, b〉, 〈b, c〉〉 | a, b, c ∈ W}.

With this, we obtain for example M, 〈w1, w2〉 |= ψ1 ∘ ψ2 iff there exists v such that M, 〈w1, v〉 |= ψ1 and M, 〈v, w2〉 |= ψ2.

Frames are related through frame morphisms. Take a frame (W, R) for the basic modal language; then R : W → P(W) is perceived as a coalgebra for the power set functor. This helps in defining morphisms.

Definition 2.7.24 Let F = (W, R) and G = (X, S) be Kripke frames. A frame morphism f : F → G is a map f : W → X which makes this diagram commutative:


W ────f────→ X
│            │
R            S
↓            ↓
P(W) ──Pf──→ P(X)

Hence we have for a frame morphism f : F → G the condition

S(f(w)) = (Pf)(R(w)) = f[R(w)] = {f(w′) | w′ ∈ R(w)}

for all w ∈ W.
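For finite frames this condition can be checked pointwise. A small sketch, with illustrative names:

-- Check whether f is a frame morphism from (ws, r) to a frame with
-- successor map s, by testing S(f(w)) = f[R(w)] for all w (a sketch).
isFrameMorphism :: (Eq w, Eq x) => [w] -> (w -> [w]) -> (x -> [x]) -> (w -> x) -> Bool
isFrameMorphism ws r s f = and [ sameSet (s (f w)) (map f (r w)) | w <- ws ]
  where sameSet as bs = all (`elem` bs) as && all (`elem` as) bs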

This is a characterization of frame morphisms.

Lemma 2.7.25 Let F and G be frames, as above. Then f : F → G is a frame morphism iff these conditions hold:

1. w R w′ implies f(w) S f(w′).

2. If f(w) S z, then there exists w′ ∈W with z = f(w′) and w R w′.

Proof 1. These conditions are necessary. In fact, if 〈w, w′〉 ∈ R, then f(w′) ∈ f[R(w)] = S(f(w)), so that 〈f(w), f(w′)〉 ∈ S. Similarly, assume that f(w) S z, thus z ∈ S(f(w)) = (Pf)(R(w)) = f[R(w)]. Hence there exists w′ with 〈w, w′〉 ∈ R and z = f(w′).

2. The conditions are sufficient. The first condition implies f[R(w)] ⊆ S(f(w)). Now assume z ∈ S(f(w)), hence f(w) S z; thus there exists w′ ∈ R(w) with f(w′) = z, and consequently z = f(w′) ∈ f[R(w)]. ⊣

We see that the bounded morphisms from Example 2.1.10 appear here again in a natural context.

If we want to compare models for the basic modal language, then we certainly should be able to compare the underlying frames. But this is not yet enough, because the interpretation of the atomic propositions has to be taken care of.

Definition 2.7.26 Let M = (W, R, V) and N = (X, S, Y) be models for the basic modal language and f : (W, R) → (X, S) a frame morphism. Then f : M → N is said to be a model morphism iff f⁻¹ ∘ Y = V.

Hence f⁻¹[Y(p)] = V(p) for a model morphism f and for each atomic proposition p; thus M, w |= p iff N, f(w) |= p for each atomic proposition. This extends to all formulas of the basic modal language.

Proposition 2.7.27 Assume M and N are models as above, and f : M → N is a model morphism. Then

M, w |= ϕ iff N, f(w) |= ϕ

for all worlds w of M, and for all formulas ϕ.

Proof 0. The assertion is equivalent to

[[ϕ]]M = f⁻¹[[[ϕ]]N]


for all formulas ϕ. This is the claim which will now be established by induction on the structure of a formula.

1. If p is an atomic proposition, then the claim is just the definition of a frame morphism being a model morphism:

[[p]]M = V(p) = f⁻¹[Y(p)] = f⁻¹[[[p]]N].

Assume that the assertion holds for ϕ1 and ϕ2; then

[[ϕ1 ∧ ϕ2]]M = [[ϕ1]]M ∩ [[ϕ2]]M = f⁻¹[[[ϕ1]]N] ∩ f⁻¹[[[ϕ2]]N] = f⁻¹[[[ϕ1]]N ∩ [[ϕ2]]N] = f⁻¹[[[ϕ1 ∧ ϕ2]]N].

Similarly, one shows that [[¬ϕ]]M = f⁻¹[[[¬ϕ]]N].

2. Now consider ◇ϕ and assume that the hypothesis holds for formula ϕ; then we have

[[◇ϕ]]M = {w | ∃w′ ∈ R(w) : w′ ∈ [[ϕ]]M}
        = {w | ∃w′ ∈ R(w) : f(w′) ∈ [[ϕ]]N}   (by hypothesis)
        = {w | ∃x′ ∈ S(f(w)) : x′ ∈ [[ϕ]]N}   (by Lemma 2.7.25)
        = f⁻¹[{x | ∃x′ ∈ S(x) : x′ ∈ [[ϕ]]N}]
        = f⁻¹[[[◇ϕ]]N].

Thus the assertion holds for all formulas ϕ. ⊣

The observation from Proposition 2.7.27 permits comparing worlds which are given through two models. Two worlds are said to be equivalent iff they cannot be separated by a formula, i.e., iff they satisfy exactly the same formulas.

Definition 2.7.28 Let M and N be models with state spaces W resp. X. States w ∈ W and x ∈ X are called modally equivalent iff we have

M, w |= ϕ iff N, x |= ϕ

for all formulas ϕ.

Hence if f : M → N is a model morphism, then w and f(w) are modally equivalent for each world w of M. One might be tempted to compare models with respect to their transition behavior; after all, underlying a model is a transition system, a.k.a. a frame. This leads directly to the following notion of bisimilarity for models; note that we have to take the atomic propositions into account.

Definition 2.7.29 Let M = (W, R, V) and N = (X, S, Y) be models for the basic modal language; then a relation B ⊆ W × X is called a bisimulation iff

1. If w B x, then w and x satisfy the same propositional letters ("atomic harmony").

2. If w B x and w R w′, then there exists x′ with x S x′ and w′ B x′ (forth condition).

3. If w B x and x S x′, then there exists w′ with w R w′ and w′ B x′ (back condition).

States w and x are called bisimilar iff there exists a bisimulation B with 〈w, x〉 ∈ B.
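On finite models the greatest bisimulation can be computed by starting from all atomically harmonious pairs and repeatedly discarding pairs violating the forth or back condition. A sketch, reusing the Model record from the first code fragment above; the parameter letters enumerates the propositional letters to be tested:

-- The greatest bisimulation between two finite Kripke models,
-- computed by refinement (a sketch; Model as defined earlier).
bisimulation :: (Eq w, Eq x) => Model w -> Model x -> [String] -> [(w, x)]
bisimulation m n letters = refine start
  where
    -- atomic harmony: w and x satisfy the same propositional letters
    start = [ (w, x) | w <- worlds m, x <- worlds n
            , all (\p -> (w `elem` val m p) == (x `elem` val n p)) letters ]
    ok b (w, x) =
         all (\w' -> any (\x' -> (w', x') `elem` b) (rel n x)) (rel m w)  -- forth
      && all (\x' -> any (\w' -> (w', x') `elem` b) (rel m w)) (rel n x)  -- back
    refine b = let b' = filter (ok b) b
               in if length b' == length b then b else refine b'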


Hence the forth condition says for a pair of worlds 〈w, x〉 ∈ B that if w R w′, then there exists x′ with 〈w′, x′〉 ∈ B such that x S x′; similarly for the back condition. So this rings a bell: we did discuss this in Definition 2.6.15. Consequently, if models M and N are bisimilar, then the underlying frames are bisimilar coalgebras.

Consider this example for bisimilar states.

Example 2.7.30 Let relation B be defined through

B := {〈1, a〉, 〈2, b〉, 〈2, c〉, 〈3, d〉, 〈4, e〉, 〈5, e〉}

with Y(p) := {a, d} and Y(q) := {b, c, e}; atomic harmony then requires V(p) = {1, 3} and V(q) = {2, 4, 5}.

The transitions for M are given through

1 → 2 → 3, 3 → 4, 3 → 5.

N is given through

a → b, a → c, b → d, c → d, d → e.

Then B is a bisimulation.

The first result relating bisimulation and modal equivalence is intuitively quite clear. Since a bisimulation reflects the structural similarity of the transition structure of the underlying transition systems, and since the validity of modal formulas is determined through this transition structure (and the behavior of the atomic propositional formulas), it does not come as a surprise that bisimilar states are modally equivalent.

Proposition 2.7.31 Let M and N be models with states w and x. If w and x are bisimilar,then they are modally equivalent.

Proof 0. Let B be the bisimulation for which we know that 〈w, x〉 ∈ B. We have to show that

M, w |= ϕ ⇔ N, x |= ϕ

for all formulas ϕ. This is done by induction on the formula.

1. Because of atomic harmony, the equivalence holds for propositional formulas. It is also clear that conjunction and negation are preserved under this equivalence, so the remaining (and interesting) case is proving the equivalence for a formula ◇ϕ under the assumption that it holds for ϕ.

“⇒”: Assume that M, w |= ◇ϕ holds. Thus there exists a world w′ in M with w R w′ and M, w′ |= ϕ. Hence by the forth condition there exists a world x′ in N with x S x′ and 〈w′, x′〉 ∈ B, so that N, x′ |= ϕ by the induction hypothesis. Because x′ is a successor to x, we conclude N, x |= ◇ϕ.

“⇐”: This is shown in the same way, using the back condition for B.

⊣


The converse holds only under the restrictive condition that the models are image finite, i.e., that each state has only a finite number of successor states; formally, a model (W, R, V) is called image finite iff for each world w the set R(w) is finite. The famous Hennessy–Milner Theorem then says:

Theorem 2.7.32 If the models M and N are image finite, then modally equivalent states are bisimilar.

Proof 1. Given two modally equivalent states w* and x*, we have to find a bisimulation B with 〈w*, x*〉 ∈ B. The only thing we know about these states is that they are modally equivalent, hence that they satisfy exactly the same formulas. This suggests defining

B := {〈w′, x′〉 | w′ and x′ are modally equivalent}

and establishing B as a bisimulation. Since by assumption 〈w*, x*〉 ∈ B, this will then prove the claim.

2. If 〈w, x〉 ∈ B, then both satisfy the same atomic propositions by the definition of modal equivalence. Now let 〈w, x〉 ∈ B and w R w′. Assume that we cannot find x′ with x S x′ and 〈w′, x′〉 ∈ B. We know that M, w |= ◇⊤, because this says that there exists a successor to w, viz., w′. Since w and x satisfy the same formulas, N, x |= ◇⊤ follows, hence S(x) ≠ ∅. Let S(x) = {x1, . . . , xk}. Since w′ and xi are not modally equivalent, we can find for each xi ∈ S(x) a formula ψi such that M, w′ |= ψi but N, xi ⊭ ψi. Hence M, w |= ◇(ψ1 ∧ . . . ∧ ψk), but N, x ⊭ ◇(ψ1 ∧ . . . ∧ ψk). This is a contradiction, so the assumption is false, and we can find x′ with x S x′ and 〈w′, x′〉 ∈ B.

The other conditions for a bisimulation are shown in exactly the same way. ⊣

Neighborhood models can be compared through morphisms as well. Recall that the functor V underlies a neighborhood frame; see Example 2.3.13.

Definition 2.7.33 Let N = (W, N, V) and M = (X, M, Y) be neighborhood models for the basic modal language. A map f : W → X is called a neighborhood morphism f : N → M iff

• M ∘ f = V(f) ∘ N,

• V = f⁻¹ ∘ Y.

A neighborhood morphism is a morphism for the neighborhood frame (the definition of which is straightforward) which respects the validity of the atomic propositions. In this way, the definition follows the pattern laid out for morphisms of Kripke models. Expanding the definition above, f : N → M is a neighborhood morphism iff these conditions hold: B ∈ M(f(w)) iff f⁻¹[B] ∈ N(w) for all B ⊆ X and all worlds w ∈ W, and V(p) = f⁻¹[Y(p)] for all atomic sentences p ∈ Φ. Morphisms for neighborhood models preserve validity in the same way as morphisms for Kripke models do:

Proposition 2.7.34 Let f : N → M be a neighborhood morphism for the neighborhood models N = (W, N, V) and M = (X, M, Y). Then

N, w |= ϕ ⇔ M, f(w) |= ϕ


for all formulas ϕ and for all states w ∈ W.

Proof The proof proceeds by induction on the structure of the formula ϕ. The induction starts with ϕ an atomic proposition; the assertion is true in this case because of atomic harmony, see the proof of Proposition 2.7.27. We pick only the interesting modal case for the induction step. Hence assume the assertion is established for formula ϕ; then

M, f(w) |= □ϕ ⇔ [[ϕ]]M ∈ M(f(w))   (by definition)
⇔ f⁻¹[[[ϕ]]M] ∈ N(w)   (f is a morphism)
⇔ [[ϕ]]N ∈ N(w)   (by the induction hypothesis)
⇔ N, w |= □ϕ

⊣

We will not pursue this observation further at this point, but rather turn to the construction of a canonical model. When we discuss coalgebraic logics, however, this striking structural similarity of models and their morphisms will be shown to be an instance of a more general pattern.

Before proceeding, we introduce the notion of a substitution, which is a map σ : Φ → L(t, Φ). We extend a substitution in a natural way to formulas: define by induction on the structure of a formula

p^σ := σ(p), if p ∈ Φ,
(¬ϕ)^σ := ¬(ϕ^σ),
(ϕ1 ∧ ϕ2)^σ := ϕ1^σ ∧ ϕ2^σ,
(∆(ϕ1, . . . , ϕk))^σ := ∆(ϕ1^σ, . . . , ϕk^σ), if ∆ ∈ O with ρ(∆) = k.
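The extension of σ to formulas is plain structural recursion. A sketch, assuming a tree encoding of formulas over a similarity type (names ours):

-- Formulas over a similarity type: letters, negation, conjunction, and
-- modal operators applied to argument lists (our encoding).
data MForm = PVar String
           | MNeg MForm
           | MConj MForm MForm
           | MOp String [MForm]

type Subst = String -> MForm   -- sigma : Phi -> L(t, Phi)

-- phi^sigma, following the four clauses above
applySubst :: Subst -> MForm -> MForm
applySubst s (PVar p)      = s p
applySubst s (MNeg f)      = MNeg (applySubst s f)
applySubst s (MConj f g)   = MConj (applySubst s f) (applySubst s g)
applySubst s (MOp op args) = MOp op (map (applySubst s) args)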

2.7.2 The Lindenbaum Construction

We will now show how to obtain from a set of formulas a model which satisfies exactly these formulas. The scenario is the basic modal language, and it is clear that not every set of formulas is in a position to generate such a model.

Let Λ be a set of formulas; then we say that

• Λ is closed under modus ponens iff ϕ ∈ Λ and ϕ → ψ ∈ Λ together imply ψ ∈ Λ;

• Λ is closed under uniform substitution iff given ϕ ∈ Λ we may conclude that ϕ^σ ∈ Λ for all substitutions σ.

These two closure properties turn out to be crucial for the generation of a model from a set of formulas. Those sets which satisfy them will be called modal logics; to be precise:

Definition 2.7.35 Let Λ be a set of formulas of the basic modal language. Λ is called a modal logic iff these conditions are satisfied:

1. Λ contains all propositional tautologies.

2. Λ is closed under modus ponens and under uniform substitution.


If formula ϕ ∈ Λ, then ϕ is called a theorem of Λ; we write this as ⊢Λ ϕ.

Example 2.7.36 These are some instances of elementary properties of modal logics.

1. If Λi is a modal logic for each i ∈ I ≠ ∅, then ⋂i∈I Λi is a modal logic. This is fairly easy to check.

2. We say for a formula ϕ and a frame F over a set W of states that ϕ holds in this frame (in symbols F |= ϕ) iff M, w |= ϕ for each w ∈ W and each model M which is based on F. Let C be a class of frames; then

ΛC := ⋂F∈C {ϕ | F |= ϕ}

is a modal logic. We abbreviate ϕ ∈ ΛC by C |= ϕ.

3. Define similarly M |= ϕ for a model M iff M, w |= ϕ for each world w of M. Then put for a class M of models

ΛM := ⋂M∈M {ϕ | M |= ϕ}.

There are classes M for which ΛM is not a modal logic. In fact, take a model M with set of worlds W and two propositional letters p, q with V(p) = W and V(q) ≠ W; then M, w |= p for all w, hence M |= p, but M ⊭ q. On the other hand, q = p^σ under the substitution σ : p ↦ q. Hence Λ{M} is not closed under uniform substitution.


This formalizes the notion of deduction:

Definition 2.7.37 Let Λ be a logic, and Γ ∪ {ϕ} a set of modal formulas.

• ϕ is deducible in Λ from Γ iff either ⊢Λ ϕ, or there exist formulas ψ1, . . . , ψk ∈ Γ such that ⊢Λ (ψ1 ∧ . . . ∧ ψk) → ϕ. We write this down as Γ ⊢Λ ϕ.

• Γ is Λ-consistent iff Γ ⊬Λ ⊥; otherwise Γ is called Λ-inconsistent.

• ϕ is called Λ-consistent iff {ϕ} is Λ-consistent.

This yields a simple and intuitive criterion for inconsistency. We fix a modal logic Λ for the discussion below.

Lemma 2.7.38 Let Γ be a set of formulas. Then these statements are equivalent:

1. Γ is Λ-inconsistent.

2. Γ ⊢Λ ϕ ∧ ¬ϕ for some formula ϕ.

3. Γ ⊢Λ ψ for all formulas ψ.

Proof 1 ⇒ 2: Because Γ ⊢Λ ⊥, we know that ψ1 ∧ . . . ∧ ψk → ⊥ is in Λ for some formulas ψ1, . . . , ψk ∈ Γ. But ⊥ → ϕ ∧ ¬ϕ is a tautology, hence Γ ⊢Λ ϕ ∧ ¬ϕ.


2 ⇒ 3: By assumption there exist ψ1, . . . , ψk ∈ Γ such that ⊢Λ ψ1 ∧ . . . ∧ ψk → ϕ ∧ ¬ϕ, and ϕ ∧ ¬ϕ → ψ is a tautology for an arbitrary formula ψ, hence ⊢Λ ϕ ∧ ¬ϕ → ψ. Thus Γ ⊢Λ ψ.

3 ⇒ 1: We have in particular Γ ⊢Λ ⊥. ⊣

Λ-consistent sets have an interesting compactness property.

Lemma 2.7.39 A set Γ of formulas is Λ-consistent iff each finite subset of Γ is Λ-consistent.

Proof If Γ is Λ-consistent, then certainly each finite subset is. If, on the other hand, each finite subset is Λ-consistent, then the whole set must be consistent, since consistency is tested with finite witness sets. ⊣

Proceeding on our path to finding a model for a modal logic, we define normal logics. These logics are closed under some properties which appear as fairly, well, normal, so it is not surprising that they will play an important role.

Definition 2.7.40 A modal logic Λ is called normal iff it satisfies these conditions for all propositional letters p, q ∈ Φ and all formulas ϕ:

(K) ⊢Λ □(p → q) → (□p → □q),

(D) ⊢Λ ◇p ↔ ¬□¬p,

(G) If ⊢Λ ϕ, then ⊢Λ □ϕ.

Property (K) states that if it is necessary that p implies q, then the fact that p is necessary will imply that q is necessary. Note that the formulas in Λ do not have a semantics yet; they are for the time being just syntactic entities. Property (D) connects the constructors ◇ and □ in the desired manner. Finally, (G) states that, loosely speaking, if something is the case, then it is necessarily the case. We should finally note that (K) and (D) are both formulated for propositional letters only. This, however, is sufficient for modal logics, since they are closed under uniform substitution.

In a normal logic, the equivalence of formulas is preserved by the diamond.

Lemma 2.7.41 Let Λ be a normal modal logic; then ⊢Λ ϕ ↔ ψ implies ⊢Λ ◇ϕ ↔ ◇ψ.

Proof We show that ⊢Λ ϕ → ψ implies ⊢Λ ◇ϕ → ◇ψ; the rest will follow in the same way.

⊢Λ ϕ → ψ ⇒ ⊢Λ ¬ψ → ¬ϕ   (contraposition)
⇒ ⊢Λ □(¬ψ → ¬ϕ)   (by (G))
⇒ ⊢Λ □(¬ψ → ¬ϕ) → (□¬ψ → □¬ϕ)   (uniform substitution, (K))
⇒ ⊢Λ □¬ψ → □¬ϕ   (modus ponens)
⇒ ⊢Λ ¬□¬ϕ → ¬□¬ψ   (contraposition)
⇒ ⊢Λ ◇ϕ → ◇ψ   (by (D))

⊣


Let us define a semantic counterpart to Γ ⊢Λ ϕ. Let F be a frame and Γ a set of formulas; then we say that Γ holds on F (written as F |= Γ) iff each formula in Γ holds in each model which is based on frame F (see Example 2.7.36). We say that Γ entails formula ϕ (Γ |=F ϕ) iff F |= Γ implies F |= ϕ. This carries over to classes of frames in an obvious way: let C be a class of frames, then Γ |=C ϕ iff we have Γ |=F ϕ for all frames F ∈ C.

Definition 2.7.42 Let C be a class of frames; then the normal logic Λ is called C-sound iff Λ ⊆ ΛC. If Λ is C-sound, then C is called a class of frames for Λ.

Note that C-soundness indicates that ⊢Λ ϕ implies F |= ϕ for all frames F ∈ C and for all formulas ϕ.

This example dwells on traditional names.

Example 2.7.43 Let Λ4 be the smallest modal logic which contains ◇◇p → ◇p (if it is possible that p is possible, then p is possible), and let K4 be the class of transitive frames. Then Λ4 is K4-sound. In fact, it is easy to see that M, w |= ◇◇p → ◇p for all worlds w whenever M is a model the frame of which carries a transitive relation.

Thus C-soundness permits us to conclude that a formula which is deducible from Γ also holds in all frames from C. Completeness goes the other way: roughly, if we know that a formula holds in a class of frames, then it is deducible. To be more precise:

Definition 2.7.44 Let C be a class of frames and Λ a normal modal logic.

1. Λ is strongly C-complete iff for any set Γ ∪ {ϕ} of formulas, Γ |=C ϕ implies Γ ⊢Λ ϕ.

2. Λ is weakly C-complete iff C |= ϕ implies ⊢Λ ϕ for any formula ϕ.

This is a characterization of completeness.

Proposition 2.7.45 Let Λ and C be as above.

1. Λ is strongly C-complete iff every Λ-consistent set of formulas is satisfiable on some F ∈ C.

2. Λ is weakly C-complete iff every Λ-consistent formula is satisfiable on some F ∈ C.

Proof 1. If Λ is not strongly C-complete, then we can find a set Γ of formulas and a formula ϕ with Γ |=C ϕ but Γ ⊬Λ ϕ. Then Γ ∪ {¬ϕ} is Λ-consistent, but this set cannot be satisfied on C. So the condition for strong completeness is sufficient. It is also necessary. In fact, we may assume by compactness that Γ is finite. Thus by consistency Γ ⊬Λ ⊥, hence Γ ⊭C ⊥ by completeness, thus there exists a frame F ∈ C with F |= Γ but F ⊭ ⊥.

2. This is but the special case of sets of cardinality 1. ⊣

Consistent sets are not yet sufficient for the construction of a model, as we will see soon. We need consistent sets which cannot be extended further without jeopardizing their consistency. To be specific:

Definition 2.7.46 The set Γ of formulas is maximally Λ-consistent iff Γ is Λ-consistent and is not properly contained in a Λ-consistent set.

Thus if we have a maximally Λ-consistent set Γ, and if we know that Γ ⊆ Γ0 with Γ ≠ Γ0, then we know that Γ0 is not Λ-consistent. This criterion is sometimes a bit unpractical, but


we have

Lemma 2.7.47 Let Λ be a normal logic and Γ a maximally Λ-consistent set of formulas. Then

1. Γ is closed under modus ponens.

2. Λ ⊆ Γ .

3. ϕ ∈ Γ or ¬ϕ ∈ Γ for all formulas ϕ.

4. ϕ ∨ ψ ∈ Γ iff ϕ ∈ Γ or ψ ∈ Γ for all formulas ϕ,ψ.

5. ϕ1 ∧ ϕ2 ∈ Γ if ϕ1, ϕ2 ∈ Γ .

Proof 1. Assume that ϕ ∈ Γ and ϕ → ψ ∈ Γ, but ψ ∉ Γ. Then Γ ∪ {ψ} is inconsistent, hence Γ ∪ {ψ} ⊢Λ ⊥ by Lemma 2.7.38. Thus we can find formulas ψ1, . . . , ψk ∈ Γ such that ⊢Λ ψ ∧ ψ1 ∧ . . . ∧ ψk → ⊥. Because ⊢Λ ϕ ∧ (ϕ → ψ) ∧ ψ1 ∧ . . . ∧ ψk → ψ ∧ ψ1 ∧ . . . ∧ ψk, we conclude Γ ⊢Λ ⊥. This contradicts Λ-consistency by Lemma 2.7.38.

2. In order to show that Λ ⊆ Γ, we assume that there exists ψ ∈ Λ such that ψ ∉ Γ; then Γ ∪ {ψ} is inconsistent, hence ⊢Λ ψ1 ∧ . . . ∧ ψk → ¬ψ for some ψ1, . . . , ψk ∈ Γ (here we use Γ ∪ {ψ} ⊢Λ ⊥ and Lemma 2.7.38). By propositional logic, ⊢Λ ψ → ¬(ψ1 ∧ . . . ∧ ψk), hence ψ ∈ Λ implies Γ ⊢Λ ¬(ψ1 ∧ . . . ∧ ψk). But Γ ⊢Λ ψ1 ∧ . . . ∧ ψk; consequently, Γ is Λ-inconsistent.

3. If both ϕ ∉ Γ and ¬ϕ ∉ Γ, then by maximality both Γ ∪ {ϕ} and Γ ∪ {¬ϕ} are Λ-inconsistent, hence Γ ⊢Λ ¬ϕ as well as Γ ⊢Λ ϕ, so Γ is Λ-inconsistent.

4. Assume first that ϕ ∨ ψ ∈ Γ, but ϕ ∉ Γ and ψ ∉ Γ; hence both Γ ∪ {ϕ} and Γ ∪ {ψ} are inconsistent. Thus we can find ψ1, . . . , ψk, ϕ1, . . . , ϕn ∈ Γ with ⊢Λ ψ1 ∧ . . . ∧ ψk → ¬ψ and ⊢Λ ϕ1 ∧ . . . ∧ ϕn → ¬ϕ. This implies ⊢Λ ψ1 ∧ . . . ∧ ψk ∧ ϕ1 ∧ . . . ∧ ϕn → ¬ψ ∧ ¬ϕ, and, arguing propositionally, ⊢Λ (ψ ∨ ϕ) ∧ ψ1 ∧ . . . ∧ ψk ∧ ϕ1 ∧ . . . ∧ ϕn → ⊥, which contradicts the Λ-consistency of Γ. For the converse, assume that ϕ ∈ Γ. Since ϕ → ϕ ∨ ψ is a tautology, we obtain ϕ ∨ ψ ∈ Γ from modus ponens.

5. Assume ϕ1 ∧ ϕ2 ∉ Γ; then ¬(ϕ1 ∧ ϕ2) ∈ Γ by part 3, hence ¬ϕ1 ∨ ¬ϕ2 ∈ Γ. Thus ¬ϕ1 ∈ Γ or ¬ϕ2 ∈ Γ by part 4, hence ϕ1 ∉ Γ or ϕ2 ∉ Γ. ⊣

Hence maximally consistent sets have rather convenient properties; they probably remind the reader of the properties of an ultrafilter in Lemma 1.5.35, and we will use this close relationship when establishing Gödel's Completeness Theorem in Section 3.6.1. But how do we construct them? The famous Lindenbaum Lemma states that we may obtain them by enlarging consistent sets.

From now on we fix a normal modal logic Λ.

Lemma 2.7.48 If Γ is a Λ-consistent set, then there exists a maximally Λ-consistent set Γ+ with Γ ⊆ Γ+.

We will give two proofs for the Lindenbaum Lemma, depending on the cardinality of the set of all formulas. If the set Φ of propositional letters is countable, the set of all formulas is countable as well, so the first proof may be applied. If, however, we have more than countably many formulas, then this proof will fail to exhaust all formulas, and we have to apply another method, in this case transfinite induction (in the disguise of Tuckey's Lemma). The


basic idea, however, is the same in each case. Based on the observation that either ϕ or ¬ϕ is a member of a maximally consistent set, we take the consistent set presented to us and look at each formula. If adding the formula leaves the set consistent, we add it; otherwise we add its negation. If the set of all formulas is not countable, the process of adding formulas will be controlled by a well-ordering, i.e., through transfinite induction in the form of Tuckey's Lemma; if it is countable, however, life is easier, and we may access the set of all formulas through an enumeration.

Proof (First: countable case) Assume that the set of all formulas is countable, and let {ϕn | n ∈ N} be an enumeration of it. Define by induction

Γ0 := Γ,
Γn+1 := Γn ∪ {ψn},

where

ψn := ϕn, if Γn ∪ {ϕn} is consistent, and ψn := ¬ϕn otherwise.

Put

Γ+ := ⋃n∈N Γn.

Then these properties are easily checked:

• Γn is consistent for all n ∈ N0.

• Either ϕ ∈ Γ+ or ¬ϕ ∈ Γ+ for all formulas ϕ.

• If Γ+ ⊢Λ ϕ, then ϕ ∈ Γ+.

• Γ+ is maximal.

⊣

It may be noted that this proof is fairly similar to the second proof of the Compactness Theorem 1.5.8 for propositional logic in Section 1.5.1, suggested for the countable case.
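The countable construction has the shape of a simple recursion once the consistency test is abstracted into an oracle; the following sketch produces the chain Γ0 ⊆ Γ1 ⊆ . . . lazily. The oracle consistent is a pure assumption, since Λ-consistency need not be decidable; the code is illustration, not implementation.

-- The countable Lindenbaum construction, relative to an assumed oracle
-- `consistent` for Lambda-consistency of finite sets (hypothetical).
-- `neg` builds the negation, `gamma` is the initial consistent set,
-- `phis` enumerates all formulas.
lindenbaumChain :: ([f] -> Bool) -> (f -> f) -> [f] -> [f] -> [[f]]
lindenbaumChain consistent neg gamma phis = scanl step gamma phis
  where step g phi
          | consistent (phi : g) = phi : g
          | otherwise            = neg phi : g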

Proof (Second: general case) Let

C := {Γ′ | Γ′ is Λ-consistent and Γ ⊆ Γ′}.

Then C contains Γ, hence C ≠ ∅, and C is ordered by inclusion. By Tuckey's Lemma, it contains a maximal chain C0. Let Γ+ := ⋃C0. Then Γ+ is a Λ-consistent set which contains Γ as a subset. While the latter is evident, we have to take care of the former. Assume that Γ+ is not Λ-consistent, hence Γ+ ⊢Λ ϕ ∧ ¬ϕ for some formula ϕ. Thus we find ψ1, . . . , ψk ∈ Γ+ with ⊢Λ ψ1 ∧ . . . ∧ ψk → ϕ ∧ ¬ϕ. Given ψi ∈ Γ+, we find Γi ∈ C0 with ψi ∈ Γi. Since C0 is linearly ordered, we find some Γ′ among the Γi such that Γi ⊆ Γ′ for all i. Hence ψ1, . . . , ψk ∈ Γ′, so that Γ′ is not Λ-consistent. This is a contradiction. Now assume that Γ+ is not maximal; then there exists ϕ such that ϕ ∉ Γ+ and ¬ϕ ∉ Γ+. If Γ+ ∪ {ϕ} is not consistent, then Γ+ ∪ {¬ϕ} is, and vice versa, so at least one of Γ+ ∪ {ϕ} and Γ+ ∪ {¬ϕ} is consistent. But this means that C0 is not maximal. ⊣


We are now in a position to construct a model: specifically, we will define a set of states, a transition relation, and the validity sets for the propositional letters. Put

W♯ := {Σ | Σ is Λ-consistent and maximal},
R♯ := {〈w, v〉 ∈ W♯ × W♯ | for all formulas ψ, ψ ∈ v implies ◇ψ ∈ w},
V♯(p) := {w ∈ W♯ | p ∈ w} for p ∈ Φ.

Then M♯ := (W♯, R♯, V♯) is called the canonical model for Λ.

This is another view of the relation R♯:

Lemma 2.7.49 Let v, w ∈ W♯; then w R♯ v iff □ψ ∈ w implies ψ ∈ v for all formulas ψ.

Proof 1. Assume that 〈w, v〉 ∈ R♯, but that ψ ∉ v for some formula ψ. Since v is maximal, we conclude from Lemma 2.7.47 that ¬ψ ∈ v; hence the definition of R♯ tells us that ◇¬ψ ∈ w, which in turn implies by the maximality of w that ¬◇¬ψ ∉ w, hence □ψ ∉ w.

2. If ◇ψ ∉ w, then by maximality ¬◇ψ ∈ w, so □¬ψ ∈ w, which means by assumption that ¬ψ ∈ v; hence ψ ∉ v. ⊣

The next lemma gives a more detailed look at the transitions which are modelled by R♯.

Lemma 2.7.50 Let w ∈ W♯ with ◇ϕ ∈ w. Then there exists a state v ∈ W♯ such that ϕ ∈ v and w R♯ v.

Proof 0. Because we can extend Λ-consistent sets to maximally consistent ones by the Lindenbaum Lemma 2.7.48, it is enough to show that v0 := {ϕ} ∪ {ψ | □ψ ∈ w} is Λ-consistent. If this succeeds, we extend the set v0 to obtain v.

1. Assume it is not. Then we have ⊢Λ (ψ1 ∧ . . . ∧ ψk) → ¬ϕ for some ψ1, . . . , ψk ∈ v0, from which we obtain with (G) and (K) that ⊢Λ □(ψ1 ∧ . . . ∧ ψk) → □¬ϕ. Because ⊢Λ □ψ1 ∧ . . . ∧ □ψk → □(ψ1 ∧ . . . ∧ ψk), this implies ⊢Λ □ψ1 ∧ . . . ∧ □ψk → □¬ϕ. Since □ψ1, . . . , □ψk ∈ w, we conclude from Lemma 2.7.47 that □ψ1 ∧ . . . ∧ □ψk ∈ w; thus we have □¬ϕ ∈ w by modus ponens, hence ¬◇ϕ ∈ w. Since w is maximal, this implies ◇ϕ ∉ w. This is a contradiction. So v0 is consistent; thus there exists by the Lindenbaum Lemma a maximally consistent set v with v0 ⊆ v. We have in particular ϕ ∈ v, and we know that □ψ ∈ w implies ψ ∈ v, hence 〈w, v〉 ∈ R♯. ⊣

This helps in characterizing the model, in particular the validity relation |=, through the well-known Truth Lemma.

Lemma 2.7.51 M♯, w |= ϕ iff ϕ ∈ w.

Proof The proof proceeds by induction on the formula ϕ. The statement is trivially true if ϕ = p ∈ Φ is a propositional letter. The set of formulas for which the assertion holds is certainly closed under the Boolean operations, so the only interesting case is that the formula in question has the shape ◇ϕ, and that the assertion is true for ϕ.

“⇒”: If M♯, w |= ◇ϕ, then we can find some v with w R♯ v and M♯, v |= ϕ. Thus there exists v with 〈w, v〉 ∈ R♯ such that ϕ ∈ v by the induction hypothesis, which in turn means ◇ϕ ∈ w.

“⇐”: Assume ◇ϕ ∈ w; then by Lemma 2.7.50 there exists v ∈ W♯ with w R♯ v and ϕ ∈ v, thus M♯, v |= ϕ. But this means M♯, w |= ◇ϕ. ⊣

Finally, we obtain

Theorem 2.7.52 Any normal logic is complete with respect to its canonical model.

Proof Let Σ be a Λ-consistent set for the normal logic Λ. Then there exists by Lindenbaum's Lemma 2.7.48 a maximally Λ-consistent set Σ+ with Σ ⊆ Σ+. By the Truth Lemma we now have M♯, Σ+ |= Σ. ⊣

We will now generalize modal logics to coalgebraic logics, which are particularly streamlined to be interpreted within the context of coalgebras, and which contain the interpretation of modal logics as a special case. Neighborhood models may be captured in this realm as well, which indicates that the coalgebraic way of thinking about logics is a useful generalization of the relational way.

2.7.3 Coalgebraic Logics

We have seen several points where coalgebras and modal logics touch each other; for example, morphisms for Kripke models are based on morphisms for the underlying P-coalgebra, as a comparison of Example 2.1.10 and Lemma 2.7.25 demonstrates. Let M = (W, R, V) be a Kripke model; then the accessibility relation R ⊆ W × W can be perceived as a map, again denoted by R, with the signature W → P(W). The map V : Φ → P(W), which indicates the validity of atomic propositions, can be encoded through a map V1 : W → P(Φ) upon setting V1(w) := {p ∈ Φ | w ∈ V(p)}. Both V and V1 describe the same relation {〈p, w〉 ∈ Φ × W | M, w |= p}, albeit from different angles, and they are trivially interchangeable. This new representation has the advantage of describing the model from the vantage point of w.

Define FX := P(X) × P(Φ) for a set X, and put, given a map f : X → Y, (Ff)(A, Q) := 〈f[A], Q〉 = 〈(Pf)(A), Q〉 for A ⊆ X, Q ⊆ Φ; then F is an endofunctor on Set. Hence we obtain from the Kripke model M the F-coalgebra (W, γ) with γ(w) := 〈R(w), V1(w)〉. This construction can easily be reversed: given an F-coalgebra (W, γ), we put R(w) := π1(γ(w)) and V1(w) := π2(γ(w)) and construct V from V1; then (W, R, V) is a Kripke model (here π1, π2 are the projections). Thus Kripke models and F-coalgebras are in one-to-one correspondence with each other. This correspondence goes a bit deeper, as can be seen when considering morphisms.

Proposition 2.7.53 Let M = (W, R, V) and N = (X, S, Y) be Kripke models with associated F-coalgebras (W, γ) resp. (X, δ). Then these statements are equivalent for a map f : W → X:

1. f : (W,γ)→ (X, δ) is a morphism of coalgebras.

2. f : M→ N is a morphism of Kripke models.


Proof 1 ⇒ 2: We obtain for each w ∈ W from the defining equation (Ff) ∘ γ = δ ∘ f the equalities f[R(w)] = S(f(w)) and V1(w) = Y1(f(w)). Since f[R(w)] = (Pf)(R(w)), we conclude that (Pf) ∘ R = S ∘ f, so f is a morphism of the P-coalgebras. We have moreover for each atomic sentence p ∈ Φ

w ∈ V(p) ⇔ p ∈ V1(w) ⇔ p ∈ Y1(f(w)) ⇔ f(w) ∈ Y(p).

This means V = f⁻¹ ∘ Y, so that f : M → N is a morphism.

2 ⇒ 1: Because we know that S ∘ f = (Pf) ∘ R, and because one shows as above that V1 = Y1 ∘ f, we obtain for w ∈ W

(δ ∘ f)(w) = 〈S(f(w)), Y1(f(w))〉 = 〈(Pf)(R(w)), V1(w)〉 = ((Ff) ∘ γ)(w).

Hence f : (W, γ) → (X, δ) is a morphism for the F-coalgebras. ⊣

Given a world w, the value of γ(w) represents the worlds which are accessible from w, making sure that the validity of the atomic propositions is maintained; recall that they are not affected by a transition. This information is to be extracted in a variety of ways. We need predicate liftings for this.

Before we define liftings, however, we observe that the same mechanism works for neighbor-hood models.

Example 2.7.54 Let N = (W, N, V) be a neighborhood model. Define the functor G by putting G(X) := V(X) × P(Φ) for sets, and if f : X → Y is a map, put (Gf)(U, Q) := 〈(Vf)(U), Q〉. Then G is an endofunctor on Set. The G-coalgebra (W, ν) associated with N is defined through ν(w) := 〈N(w), V1(w)〉 (with V1 defined through V as above).

Let M = (X, M, Y) be another neighborhood model with associated coalgebra (X, µ). Exactly the same proof as the one for Proposition 2.7.53 shows that f : N → M is a neighborhood morphism iff f : (W, ν) → (X, µ) is a coalgebra morphism.

Proceeding to define predicate liftings, let Pop : Set → Set be the contravariant power set functor, i.e., given the set X, Pop(X) is the power set P(X) of X, and if f : X → Y is a map, then Popf : Pop(Y) → Pop(X) works as B ↦ f⁻¹[B].

Definition 2.7.55 Given a (covariant) endofunctor T on Set, a predicate lifting λ for T is a monotone natural transformation λ : Pop → Pop ∘ T.

Interpret A ∈ Pop(X) as a predicate on X; then λX(A) ∈ Pop(TX) is a predicate on TX, hence λX lifts the predicate into the realm of the functor T. The requirement of naturality is intended to reflect compatibility with morphisms, as we will see below. Thus a predicate lifting helps in specifying a requirement on the level of sets, which it then transports onto the level of those sets that are controlled by the functor T. Technically, this requirement means that this diagram commutes whenever f : X → Y is a map:

P(X) ──λX──→ P(TX)
 ↑              ↑
f⁻¹         (Tf)⁻¹
 │              │
P(Y) ──λY──→ P(TY)


Hence we have λX(f⁻¹[G]) = (Tf)⁻¹[λY(G)] for any G ⊆ Y.

Finally, monotonicity says that λX(D) ⊆ λX(E) whenever D ⊆ E ⊆ X; this condition models the requirement that the lifted predicate grows with the underlying one. Informally it is reflected in the rule ⊢ (ϕ → ψ) → (□ϕ → □ψ).

This example illuminates the idea.

Example 2.7.56 Let F = P(−) × P(Φ) be defined as above, and put for the set X and for D ⊆ X

λX(D) := {〈D′, Q〉 ∈ P(X) × P(Φ) | D′ ⊆ D}.

This defines a predicate lifting λ : Pop → Pop ∘ F. In fact, let f : X → Y be a map and G ⊆ Y; then

λX(f⁻¹[G]) = {〈D′, Q〉 | D′ ⊆ f⁻¹[G]}
           = {〈D′, Q〉 | f[D′] ⊆ G}
           = (Ff)⁻¹[{〈G′, Q〉 ∈ P(Y) × P(Φ) | G′ ⊆ G}]
           = (Ff)⁻¹[λY(G)]

(remember that Ff leaves the second component of a pair alone). It is clear that λX is monotone for each set X.

Let γ : W → FW be the coalgebra associated with the Kripke model M := (W, R, V), and look at this (ϕ is a formula):

w ∈ γ⁻¹[λW([[ϕ]]M)] ⇔ γ(w) ∈ λW([[ϕ]]M)
⇔ 〈R(w), V1(w)〉 ∈ λW([[ϕ]]M)
⇔ R(w) ⊆ [[ϕ]]M
⇔ w ∈ [[□ϕ]]M

This means that we can describe the semantics of the □-operator through a predicate lifting which cooperates with the coalgebra's dynamics.

Note that it would be equally possible to do this for the ◇-operator: define the lifting through D ↦ {〈D′, Q〉 | D′ ∩ D ≠ ∅}. But we'll stick to the □-operator, keeping up with tradition.

Example 2.7.57 The same technique works for neighborhood models. In fact, let (W, ν) be the G-coalgebra associated with the neighborhood model N = (W, N, V) as in Example 2.7.54, and define

λX(D) := {〈U, Q〉 ∈ V(X) × P(Φ) | D ∈ U}.

Then λX : P(X) → P(V(X) × P(Φ)) is monotone, because the elements of V(X) are upper closed. If f : W → X is a map, we obtain for D ⊆ X

λW(f⁻¹[D]) = {〈U, Q〉 ∈ V(W) × P(Φ) | f⁻¹[D] ∈ U}
           = {〈U, Q〉 ∈ V(W) × P(Φ) | D ∈ (Vf)(U)}
           = (Gf)⁻¹[{〈U′, Q〉 ∈ V(X) × P(Φ) | D ∈ U′}]
           = (Gf)⁻¹[λX(D)].


Consequently, λ is a predicate lifting for G. We also see for a formula ϕ

w ∈ ν⁻¹[λW([[ϕ]]N)] ⇔ ν(w) ∈ λW([[ϕ]]N)
⇔ [[ϕ]]N ∈ N(w)   (by definition of ν)
⇔ w ∈ [[□ϕ]]N.

Hence we can define the semantics of the □-operator through a predicate lifting in this case as well.

There is a general mechanism permitting us to define predicate liftings, which is outlined inthe next lemma.

Lemma 2.7.58 Let η : T → P be a natural transformation, and define

λX(D) := {c ∈ TX | ηX(c) ⊆ D}

for D ⊆ X. Then λ defines a predicate lifting for T.

Proof It is clear from the construction that D ↦ λX(D) defines a monotone map, so we have to show that the diagram below commutes for f : X → Y.

P(X) ──λX──→ P(TX)
 ↑              ↑
f⁻¹         (Tf)⁻¹
 │              │
P(Y) ──λY──→ P(TY)

We note that

ηX(c) ⊆ f⁻¹[E] ⇔ f[ηX(c)] ⊆ E ⇔ (Pf)(ηX(c)) ⊆ E

and

(Pf) ∘ ηX = ηY ∘ (Tf),

because η is natural.

because η is natural. Hence we obtain for E ⊆ Y :

ηX(f−1[E]) = c ∈ TX | ηX(c) ⊆ f−1

[E]

= c ∈ TX |((Pf) ηX

)(c) ⊆ E

= c ∈ TX | (ηY T f)(c) ⊆ E= (T f)−1

[d ∈ TY | ηY (d) ⊆ E

]=((T f)−1 ηY

)(E).

a

Let us return to the endofunctor F = P(−) × P(Φ) and fix for the moment an atomic proposition p ∈ Φ. Define the constant function

λp,X(D) := {〈D′, Q〉 ∈ FX | p ∈ Q}.

Then an easy calculation shows that λp : Pop → Pop ∘ F is a natural transformation, hence a predicate lifting for F. Let γ : W → FW be a coalgebra with carrier W which corresponds to the Kripke model M = (W, R, V); then

w ∈ (γ⁻¹ ∘ λp,W)(D) ⇔ γ(w) ∈ λp,W(D) ⇔ p ∈ π2(γ(w)) ⇔ w ∈ V(p),


which means that we can use λp for expressing the meaning of the formula p ∈ Φ. A very similar construction can be made for the functor G, leading to the same conclusion.

We now cast this into a more general framework. Let ℓX : X → {0} be the unique map from the set X to the singleton set {0}. Given A ⊆ T({0}), define λA,X(D) := {c ∈ TX | (TℓX)(c) ∈ A} = (TℓX)⁻¹[A]. This defines a predicate lifting for T. In fact, let f : X → Y be a map; then ℓX = ℓY ∘ f, so

(Tf)⁻¹ ∘ (TℓY)⁻¹ = ((TℓY) ∘ (Tf))⁻¹ = (T(ℓY ∘ f))⁻¹ = (TℓX)⁻¹,

hence

λA,X(f⁻¹[B]) = (Tf)⁻¹[λA,Y(B)].

As we have seen, this construction is helpful for capturing the semantics of atomic propositions.

Negation can be treated in this framework as well. Given a predicate lifting λ for T, we define for the set X and A ⊆ X the set

λ¬X(A) := (TX) \ λX(X \ A);

then this defines a predicate lifting for T. This is easily checked: monotonicity of λ¬ follows from λ being monotone, and since f⁻¹ is compatible with the Boolean operations, naturality follows.

Summarizing, those operations which are dear to us when interpreting modal logics througha Kripke model or through a neighborhood model can also be represented using predicateliftings.

We now take a family L of predicate liftings and define a logic for it.

Definition 2.7.59 Let T be an endofunctor on the category Set of sets with maps, and let L be a set of predicate liftings for T. The formulas of the language L(L) are defined through

ϕ ::= ⊥ | ϕ1 ∧ ϕ2 | ¬ϕ | [λ]ϕ

with λ ∈ L.

The semantics of a formula in L(L) in a T-coalgebra (W, γ) is defined recursively by describing the sets [[ϕ]]γ of worlds in which formula ϕ holds (with w |=γ ϕ iff w ∈ [[ϕ]]γ):

[[⊥]]γ := ∅,
[[ϕ1 ∧ ϕ2]]γ := [[ϕ1]]γ ∩ [[ϕ2]]γ,
[[¬ϕ]]γ := W \ [[ϕ]]γ,
[[[λ]ϕ]]γ := (γ⁻¹ ∘ λW)([[ϕ]]γ).
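For the functor FX = P(X) × P(Φ) of the running example this recursion can be written out directly. A sketch, representing a predicate lifting by its membership test on TW (names ours); the box lifting of Example 2.7.56 and the constant liftings λp are included:

import qualified Data.Set as S

-- a coalgebra for T W = P(W) × P(Phi): gamma(w) = (R(w), V1(w))
data Coalg w = Coalg
  { carrier  :: [w]
  , dynamics :: w -> (S.Set w, S.Set String)
  }

-- a predicate lifting, given by its membership test on T W
type Lifting w = S.Set w -> (S.Set w, S.Set String) -> Bool

boxLift :: Ord w => Lifting w              -- Example 2.7.56: D' ⊆ D
boxLift d (d', _) = d' `S.isSubsetOf` d

atomLift :: String -> Lifting w            -- the constant lifting for p
atomLift p _ (_, q) = p `S.member` q

data CForm w = CFalse
             | CAnd (CForm w) (CForm w)
             | CNot (CForm w)
             | CMod (Lifting w) (CForm w)  -- [lambda] phi

-- [[phi]]_gamma, following the recursion above
sem :: Ord w => Coalg w -> CForm w -> S.Set w
sem _ CFalse     = S.empty
sem c (CAnd f g) = sem c f `S.intersection` sem c g
sem c (CNot f)   = S.fromList (carrier c) `S.difference` sem c f
sem c (CMod l f) = S.fromList [ w | w <- carrier c, l (sem c f) (dynamics c w) ]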

The most interesting definition is of course the last one. It says that formula [λ]ϕ holds in world w iff the transition γ(w) satisfies the predicate obtained by lifting [[ϕ]]γ through λ; for the lifting of Example 2.7.56 this means exactly that each successor of w satisfies ϕ.


Example 2.7.60 Continuing Example 2.7.56, we see that the simple modal logic can be defined as the modal logic for L = {λ} ∪ {λp | p ∈ Φ}, where λ is defined in Example 2.7.56, and the λp are the constant liftings associated with Φ. ,

We obtain also in this case the invariance of validity under morphisms.

Proposition 2.7.61 Let f : (W,γ)→ (X, δ) be a T -coalgebra morphism. Then

w |=γ ϕ⇔ f(w) |=δ ϕ

holds for all formulas ϕ ∈ L(L) and all worlds w ∈W .

Proof The proof proceeds by induction on ϕ; the interesting case occurs for a modal formula [λ]ϕ with λ ∈ L. So assume that the hypothesis is true for ϕ; then we have

f⁻¹[ [[[λ]ϕ]]δ ] = ((δ ∘ f)⁻¹ ∘ λX)([[ϕ]]δ)
               = ((Tf ∘ γ)⁻¹ ∘ λX)([[ϕ]]δ)     (f is a morphism)
               = (γ⁻¹ ∘ (Tf)⁻¹ ∘ λX)([[ϕ]]δ)
               = (γ⁻¹ ∘ λW ∘ f⁻¹)([[ϕ]]δ)      (λ is natural)
               = (γ⁻¹ ∘ λW)([[ϕ]]γ)            (by hypothesis)
               = [[[λ]ϕ]]γ.

⊣

Let (C, γ) be a T-coalgebra; then we define the theory of c as

Thγ(c) := {ϕ ∈ L(L) | c |=γ ϕ}

for c ∈ C. Two worlds which have the same theory cannot be distinguished through formulas of the logic L(L).

Definition 2.7.62 Let (C, γ) and (D, δ) be T-coalgebras, c ∈ C and d ∈ D.

• We call the states c and d logically equivalent iff Thγ(c) = Thδ(d).

• The states c and d are called behaviorally equivalent iff there exists a T-coalgebra (E, ε) and morphisms (C, γ) −f→ (E, ε) ←g− (D, δ) such that f(c) = g(d).

Thus, logical equivalence looks locally at all the formulas which are true in a state, and then compares two states with each other. Behavioral equivalence looks for an external instance, viz., a mediating coalgebra, and at morphisms; whenever we find states whose images coincide, we know that the states are behaviorally equivalent.

This implication is fairly easy to obtain.

Proposition 2.7.63 Behaviorally equivalent states are logically equivalent.

Proof Let c ∈ C and d ∈ D be behaviorally equivalent for the T-coalgebras (C, γ) and (D, δ), and assume that we have a mediating T-coalgebra (E, ε) with morphisms

(C, γ) −f→ (E, ε) ←g− (D, δ)

and f(c) = g(d). Then we obtain

ϕ ∈ Thγ(c) ⇔ c |=γ ϕ ⇔ f(c) |=ε ϕ ⇔ g(d) |=ε ϕ ⇔ d |=δ ϕ ⇔ ϕ ∈ Thδ(d)

from Proposition 2.7.61. ⊣

We have seen that coalgebras are useful when it comes to generalizing modal logics to coalgebraic logics. Morphisms arise in a fairly natural way in this context, giving rise to the definition of behaviorally equivalent coalgebras. It is quite clear that bisimilarity can be treated on this level as well, by introducing a mediating coalgebra and morphisms from it; bisimilar states are logically equivalent, and the argument showing this runs exactly as in the case above through Proposition 2.7.61. In each case the question arises whether the implications can be reversed: are logically equivalent states behaviorally equivalent? Bisimilar? Answering this question requires a fairly elaborate machinery and depends strongly on the underlying functor. We will not discuss this question here but rather point to the literature, e.g., to [Pat04]. For the subprobability functor some answers and some techniques can be found in [DS11].

The following example discusses the basic modal language with no atomic propositions.

Example 2.7.64 We interpret the basic modal language L(◊) with Φ = ∅ through P-coalgebras, i.e., through transition systems. Given a transition system (S, R), denote by ∼ the equivalence provided by logical equivalence, so that s ∼ s′ iff states s and s′ cannot be separated through a formula in the logic, i.e., iff ThR(s) = ThR(s′). Then η∼ : (S, R) → (S/∼, R/∼) is a coalgebra morphism. Here

R/∼ := {〈[s1], [s2]〉 | 〈s1, s2〉 ∈ R}.

In fact, look at this diagram:

           η∼
   S ──────────→ S/∼
   │               │
  R│               │R/∼
   ↓               ↓
  P(S) ─────────→ P(S/∼)
          P(η∼)

Then

[s2] ∈ R/∼([s1]) ⇔ [s2] ∈ {[s] | s ∈ R(s1)} = P(η∼)(R(s1)),

which means that the diagram commutes. We denote the factor model (S/∼, R/∼) by (S′, R′), and denote the class of an element without an indication of the equivalence relation. It will be clear from the context from which set of worlds a state is taken.

Call the transition systems (S, R) and (T, L) logically equivalent iff for each state in one system there exists a logically equivalent state in the other one. We carry over behavioral equivalence and bisimilarity from individual states to systems, taking the discussion for coalgebras in Section 2.6.1 into account. Call the transition systems (S, R) and (T, L) behaviorally equivalent iff there exists a transition system (U, M) with surjective morphisms

(S, R) −f→ (U, M) ←g− (T, L).

Finally, they are called bisimilar iff there exists a transition system (U, M) with surjective morphisms

(S, R) ←f− (U, M) −g→ (T, L).

We claim that logically equivalent transition systems have isomorphic factor spaces under the equivalence induced by the logic, provided both are image finite. Consider this diagram:

           ζ
   S′ ─────────→ T′
   │               │
 R′│               │L′
   ↓               ↓
  P(S′) ────────→ P(T′)
          P(ζ)

with ζ([s]) := [t] iff ThR(s) = ThL(t). Thus ζ preserves classes of logically equivalent states. It is immediate that ζ : S′ → T′ is a bijection, so commutativity has to be established.

Before working on the diagram, we first show that for any 〈t, t′〉 ∈ L and for any s ∈ S with ThR(s) = ThL(t) there exists s′ ∈ S with 〈s, s′〉 ∈ R and ThR(s′) = ThL(t′); written more graphically in terms of arrows, we claim that t →L t′ and ThR(s) = ThL(t) together imply the existence of s′ with s →R s′ and ThR(s′) = ThL(t′). This is established by adapting the idea from the proof of the Hennessy-Milner Theorem 2.7.32 to the situation at hand. Well, then: assume that such a state s′ cannot be found. Since L, t′ |= >, we know that L, t |= ◊>, thus ◊> ∈ ThL(t) = ThR(s), and hence R(s) ≠ ∅. Let R(s) = {s1, . . . , sk} for some k ≥ 1; then we can find for each si a formula ψi with L, t′ |= ψi and R, si ⊭ ψi. Thus L, t |= ◊(ψ1 ∧ . . . ∧ ψk), but R, s ⊭ ◊(ψ1 ∧ . . . ∧ ψk), which contradicts the assumption that ThR(s) = ThL(t). This uses only image finiteness of (S, R), by the way.

Now let s ∈ S with [t1] ∈ L′(ζ([s])) = L′([t]) for some t ∈ T. Thus 〈t, t1〉 ∈ L, so we find s1 ∈ S with ThR(s1) = ThL(t1) and 〈s, s1〉 ∈ R. Consequently, [t1] = ζ([s1]) ∈ P(ζ)(R′([s])). Hence L′(ζ([s])) ⊆ P(ζ)(R′([s])).

Working on the other inclusion, we take [t1] ∈ P(ζ)(R′([s])), and we want to show that [t1] ∈ L′(ζ([s])). Now [t1] = ζ([s1]) for some s1 ∈ S with 〈s, s1〉 ∈ R, hence ThR(s1) = ThL(t1). Put [t] := ζ([s]), thus ThR(s) = ThL(t). Because (T, L) is image finite as well, we may conclude from the Hennessy-Milner argument above (interchanging the roles of the two transition systems) that we can find t2 ∈ T with 〈t, t2〉 ∈ L so that ThL(t2) = ThR(s1) = ThL(t1). This implies [t2] = [t1], and [t1] ∈ L′([t]) = L′(ζ([s])). Hence L′(ζ([s])) ⊇ P(ζ)(R′([s])).

Thus the diagram above commutes, and we have shown that the factor models are isomorphic. Consequently, two image finite transition systems which are logically equivalent are behaviorally equivalent, with one of the factors acting as a mediating system. ,
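For finite systems the equivalence ∼ can be computed without enumerating formulas, by the familiar partition refinement: start from the one-block partition and repeatedly split blocks whose states reach different sets of blocks. This is a hedged Haskell sketch (TS, refine, logicalClasses are our own names); for finite systems, which are in particular image finite, the stable partition coincides with logical equivalence by the Hennessy-Milner argument above.

    import Data.Function (on)
    import Data.List (groupBy, nub, sort, sortOn)

    -- A finite transition system: carrier and successor function.
    data TS s = TS { states :: [s], succs :: s -> [s] }

    -- One refinement step: two states stay in a common block iff they
    -- reach the same set of blocks of the current partition.
    refine :: Ord s => TS s -> [[s]] -> [[s]]
    refine ts part = concatMap split part
      where
        blockOf t   = head [b | b <- part, t `elem` b]
        signature s = sort (nub (map blockOf (succs ts s)))
        split       = groupBy ((==) `on` signature) . sortOn signature

    -- Iterate to a fixed point; the resulting blocks are the classes of ~.
    logicalClasses :: Ord s => TS s -> [[s]]
    logicalClasses ts = go [states ts]
      where go p = let p' = refine ts p
                   in if length p' == length p then p else go p'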

Clearly, behaviorally equivalent systems are bisimilar, so that we obtain these relationships:

• bisimilarity implies logical equivalence (Proposition 2.7.27),
• logical equivalence implies behavioral equivalence, provided the systems are image finite (as shown above),
• behavioral equivalence implies bisimilarity (Theorem 2.6.16).


We finally give an idea of modelling CTL*, a popular logic for model checking, coalgebraically. This shows how the modelling technique is applied, and it shows also that some additional steps become necessary, since things are not always straightforward.

Example 2.7.65 The logic CTL* is used for model checking [CGP99]. The abbreviation CTL stands for computation tree logic. CTL* is actually one of the simpler members of this family of tree logics used for this purpose, some of which involve continuous time [BHHK03, Dob07]. The logic has state formulas and path formulas; the former are used to describe a particular state of the system, the latter express dynamic properties. Hence CTL* operates on two levels.

The following operators are used:

State operators These include the operators A and E, indicating that a property holds in a state iff it holds on all paths resp. on at least one path emanating from it.

Path operators These include the operators

• X for next time: the property holds in the next, i.e., second state of a path,

• F for in the future: the specified property holds for some state on the path,

• G for globally: the property holds always on a path,

• U for until: this one requires two properties as arguments; it holds on a path iff there exists a state on the path for which the second property holds, and the first one holds on each preceding state.

State formulas are given through this syntax:

ϕ ::= ⊥ | p | ¬ϕ | ϕ1 ∧ ϕ2 | Eψ | Aψ

with p ∈ Φ an atomic proposition and ψ a path formula. Path formulas are given through

ψ ::= ϕ | ¬ψ | ψ1 ∧ ψ2 | Xψ | Fψ | Gψ | ψ1 U ψ2

with ϕ a state formula. So both state and path formulas are closed under the usual Boolean operations, each atomic proposition is a state formula, and state formulas are also path formulas. Path formulas are closed under the operators X, F, G, U, and the operators A and E convert a path formula into a state formula.

Let W be the set of all states, and assume that V : Φ → P(W) assigns to each atomic formula the set of states for which it is valid. We assume also that we are given a transition relation R ⊆ W × W; it is sometimes assumed [CGP99] that R is left total, but this is mostly for computational purposes, so we will not make this assumption here. Put

S := {〈w1, w2, . . .〉 ∈ W^N | wi R wi+1 for all i ∈ N}

as the set of all infinite R-paths over W. The interpretation of formulas is then defined as follows:


State formulas Let w ∈ W, let ϕ, ϕ1, ϕ2 be state formulas and ψ be a path formula; then

w |= >  always holds
w |= p ⇔ w ∈ V(p)
w |= ¬ϕ ⇔ w |= ϕ is false
w |= ϕ1 ∧ ϕ2 ⇔ w |= ϕ1 and w |= ϕ2
w |= Eψ ⇔ σ |= ψ for some path σ starting from w
w |= Aψ ⇔ σ |= ψ for all paths σ starting from w

Path formulas Let σ ∈ S be an infinite path with first node σ₁, and let σ^k denote the path with the first k nodes deleted; ψ, ψ1, ψ2 are path formulas, and ϕ is a state formula; then

σ |= ϕ ⇔ σ₁ |= ϕ
σ |= ¬ψ ⇔ σ |= ψ is false
σ |= ψ1 ∧ ψ2 ⇔ σ |= ψ1 and σ |= ψ2
σ |= Xψ ⇔ σ^1 |= ψ
σ |= Fψ ⇔ σ^k |= ψ for some k ≥ 0
σ |= Gψ ⇔ σ^k |= ψ for all k ≥ 0
σ |= ψ1 U ψ2 ⇔ ∃k ≥ 0 : σ^k |= ψ2 and ∀ 0 ≤ j < k : σ^j |= ψ1.

Thus a state formula holds on a path iff it holds on the first node, Xψ holds on path σ iff ψ holds on σ with its first node deleted, and ψ1 U ψ2 holds on path σ iff ψ2 holds on σ^k for some k, and ψ1 holds on σ^j for all j preceding k.
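Paths being infinite, these clauses transcribe naturally into Haskell over lazy streams; note that on genuinely infinite paths F, G and U are only semi-decidable, so the hedged sketch below (Path, suffixList and friends are our own names) complements the literal X clause with bounded approximations that inspect only the first n shifted paths.

    -- Infinite R-paths as lazy streams of states.
    data Path s = s :> Path s
    infixr 5 :>

    tailP :: Path s -> Path s
    tailP (_ :> rest) = rest

    firstNode :: Path s -> s
    firstNode (s :> _) = s

    -- The stream sigma, sigma^1, sigma^2, ... of shifted paths.
    suffixList :: Path s -> [Path s]
    suffixList sigma = sigma : suffixList (tailP sigma)

    -- sigma |= X psi: psi holds on sigma with its first node deleted.
    xHolds :: (Path s -> Bool) -> Path s -> Bool
    xHolds psi = psi . tailP

    -- Bounded approximations of F, G and U over the first n suffixes.
    fUpTo, gUpTo :: Int -> (Path s -> Bool) -> Path s -> Bool
    fUpTo n psi = any psi . take n . suffixList
    gUpTo n psi = all psi . take n . suffixList

    untilUpTo :: Int -> (Path s -> Bool) -> (Path s -> Bool) -> Path s -> Bool
    untilUpTo n psi1 psi2 sigma =
      or [ psi2 (sufs !! k) && all (psi1 . (sufs !!)) [0 .. k - 1]
         | k <- [0 .. n - 1] ]
      where sufs = suffixList sigma

    -- An eventually constant example path 0, 1, 2, 2, 2, ...
    example :: Path Int
    example = 0 :> 1 :> rep 2 where rep x = x :> rep x

For instance, fUpTo 5 ((== 2) . firstNode) example evaluates to True.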

We need to provide interpretations only for conjunction, negation, and for A, X, and U. This is so since E is the nabla of A, G is the nabla of F, and Fψ is equivalent to (¬⊥) U ψ. Conjunction and negation are easily interpreted, so we have to take care only of the temporal operators A, X and U.

A coalgebraic interpretation reads as follows. The P-coalgebras together with their morphisms form a category CoAlg. Let (X, R) be a P-coalgebra; then

R(X, R) := {(xn)n∈N ∈ X∞ | xn R xn+1 for all n ∈ N}

is the object part of a functor, which sends each coalgebra morphism f : (X, R) → (Y, S) to the map R(f) : R(X, R) → R(Y, S) with R(f)((xn)n∈N) := (f(xn))n∈N; recall that x R x′ implies f(x) S f(x′). Thus R : CoAlg → Set is a functor. Note that the transition structure of the underlying Kripke model is already encoded through the functor R. This is reflected in the definition of the dynamics γ : X → R(X, R) × P(Φ) upon setting

γ(x) := 〈{w ∈ R(X, R) | w₁ = x}, V₁(x)〉,

where V₁ : X → P(Φ) is defined according to V : Φ → P(X) as above, so V₁(x) = {p ∈ Φ | x ∈ V(p)}. Define for the model M := (W, R, V) the map λR(W,R) : C ↦ {〈C′, A〉 ∈ P(R(W, R)) × P(Φ) | C′ ⊆ C}; then λ


defines a natural transformation Pᵒᵖ ∘ R → Pᵒᵖ ∘ F ∘ R (the functor F has been defined in Example 2.7.56); note that we have to check naturality in terms of model morphisms, which are in particular morphisms for the underlying P-coalgebra. Thus we can define for w ∈ W

w |=M Aψ ⇔ w ∈ (γ⁻¹ ∘ λR(W,R))([[ψ]]M).

In a similar way we define w |=M p for atomic propositions p ∈ Φ; this is left to the reader.

The interpretation of path formulas requires a slightly different approach. We define

µR(X,R)(A) := {σ ∈ R(X, R) | σ^1 ∈ A},
ϑR(X,R)(A, B) := ⋃k∈N {σ ∈ R(X, R) | σ^k ∈ B, σ^i ∈ A for 0 ≤ i < k},

whenever A, B ⊆ R(X, R). Then µ : Pᵒᵖ ∘ R → Pᵒᵖ ∘ R and ϑ : (Pᵒᵖ ∘ R) × (Pᵒᵖ ∘ R) → Pᵒᵖ ∘ R are natural transformations, and we put

[[Xψ]]M := µR(W,R)([[ψ]]M),
[[ψ1 U ψ2]]M := ϑR(W,R)([[ψ1]]M, [[ψ2]]M).

The example shows that a two-level logic can be interpreted as well through a coalgebraic approach, provided the predicate liftings which characterize this approach are complemented by additional natural transformations (called bridge operators in [Dob09]). It indicates also that defining a coalgebraic logic requires first and foremost the definition of a functor and of natural transformations. Thus a certain overhead comes with it. ,

2.8 Bibliographic Notes

The monograph by Mac Lane [ML97] discusses all the definitions and basic constructions; the text [BW99] takes much of its motivation for categorical constructions from applications in computer science. Monads are introduced following essentially Moggi's seminal paper [Mog91]. The textbook [Pum99] is an exposition fine-tuned towards students interested in categories; the proof of Lemma 2.3.24 and the discussion of Yoneda's construction follow its exposition rather closely. There are many other fine textbooks on categories available, catering also for the needs of computer scientists, among them [Awo10, Bor94a, Bor94b]; giving an exhaustive list is difficult.

The discrete probability functor has been studied extensively in [Sok05], its continuous counterpart in [Gir81, Dob03]. The use of upper closed subsets for the interpretation of game logic is due to Parikh [Par85]; [PP03] defines bisimilarity in this context. The coalgebraic interpretation is investigated in [Dob10]. Coalgebras are carefully discussed at length in [Rut00], to which the present discussion in Section 2.6 owes much of its structure.

The programming language Haskell is discussed in a growing number of accessible books, a personal selection includes [OGS09, Lip11]; the present short discussion is taken from [Dob12a]. The representation of modal logics draws substantially from [BdRV01], and the discussion on coalgebraic logic is strongly influenced by [Pat04], by the survey paper [DS11] and the monograph [Dob09].


2.9 Exercises

Exercise 2.1 The category uGraph has as objects undirected graphs. A morphism f : (G, E) → (H, F) is a map f : G → H such that {f(x), f(y)} ∈ F whenever {x, y} ∈ E (hence a morphism respects edges). Show that the laws of a category are satisfied.

Exercise 2.2 A morphism f : a → b in a category K is a split monomorphism iff it has a left inverse, i.e., there exists g : b → a such that g ∘ f = ida. Similarly, f is a split epimorphism iff it has a right inverse, i.e., there exists g : b → a such that f ∘ g = idb.

1. Show that every split monomorphism is monic and every split epimorphism is epic.

2. Show that a split epimorphism that is monic must be an isomorphism.

3. Show that for a morphism f : a → b:

(i) f is a split monomorphism ⇔ homK(f, x) is surjective for every object x,

(ii) f is a split epimorphism ⇔ homK(x, f) is surjective for every object x.

4. Characterize the split monomorphisms in Set. What can you say about split epimorphisms in Set?

Exercise 2.3 The category Par of sets and partial functions is defined as follows:

1. Objects are sets.

2. A morphism in homPar(A, B) is a partial function f : A ⇀ B, i.e., a set-theoretic function f : car(f) → B from a subset car(f) ⊆ A into B; car(f) is called the carrier of f.

3. The identity idA : A ⇀ A is the usual identity function with car(idA) = A.

4. For f : A ⇀ B and g : B ⇀ C the composition g ∘ f is defined as the usual composition g(f(x)) on the carrier:

car(g ∘ f) := {x ∈ car(f) | f(x) ∈ car(g)}.

Now:

1. Show that Par is a category and characterize its monomorphisms and epimorphisms.

2. Show that the usual set-theoretic Cartesian product is not the categorical product in Par. Characterize binary products in Par.
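As an orientation for this exercise (a hedged sketch, not a solution): in Haskell, partial functions are conveniently the Kleisli arrows of the Maybe monad, and Kleisli composition realizes exactly the carrier condition above; PartialFn and compose are our own names.

    import qualified Data.Map as M

    -- A partial function from a to b, defined exactly on its carrier.
    type PartialFn a b = a -> Maybe b

    -- g . f is defined at x iff x is in car(f) and f x is in car(g),
    -- matching car(g o f) = { x in car(f) | f x in car(g) }.
    compose :: PartialFn b c -> PartialFn a b -> PartialFn a c
    compose g f x = f x >>= g

    identityP :: PartialFn a a
    identityP = Just

    -- A finite partial function given by an association table.
    fromTable :: Ord a => [(a, b)] -> PartialFn a b
    fromTable = flip M.lookup . M.fromList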

Exercise 2.4 Define the category Pos of ordered sets and monotone maps. The objects are ordered sets (P, ≤), morphisms are monotone maps f : (P, ≤) → (Q, ⊑), i.e., maps f : P → Q such that x ≤ y implies f(x) ⊑ f(y). Composition and identities are inherited from Set.

1. Show that under this definition Pos is a category.

2. Characterize monomorphisms and epimorphisms in Pos.

3. Give an example of an ordered set (P, ≤) which is isomorphic (in Pos) to (P, ≤)ᵒᵖ but (P, ≤) ≠ (P, ≤)ᵒᵖ.

An ordered set (P, ≤) is called totally ordered if for all x, y ∈ P it holds that x ≤ y or y ≤ x.


Show that if (P, ≤) is isomorphic (in Pos) to a totally ordered set (Q, ⊑), then (P, ≤) is also totally ordered. Use this result to give an example of a monotone map f : (P, ≤) → (Q, ⊑) that is monic and epic but not an isomorphism.

Exercise 2.5 Given a set X, the set of (finite) strings of elements of X is again denoted by X∗.

1. Show that X∗ forms a monoid under concatenation, the free monoid over X.

2. Given a map f : X → Y, extend it uniquely to a monoid morphism f∗ : X∗ → Y∗. In particular, for all x ∈ X it should hold that f∗(〈x〉) = 〈f(x)〉, where 〈x〉 denotes the string consisting only of the character x.

3. Under what conditions on X is X∗ a commutative monoid, i.e., one with a commutative operation?

Exercise 2.6 Let (M, ∗) be a monoid. We define a category M as follows: it has only one object ∗, homM(∗, ∗) = M with id∗ the unit of the monoid, and composition is defined through m2 ∘ m1 := m2 ∗ m1.

1. Show that M indeed forms a category.

2. Characterize the dual category M op. When are M and M op equal?

3. Characterize monomorphisms, epimorphisms and isomorphisms for finite M. (What happens in the infinite case?)

Exercise 2.7 Let (S, A) and (T, B) be measurable spaces, and assume that the σ-algebra B is generated by B0. Show that a map f : S → T is A-B-measurable iff f⁻¹[B0] ∈ A for all B0 ∈ B0.

Exercise 2.8 Let (S, A) and (T, B) be measurable spaces and f : S → T be A-B-measurable. Define f∗(µ)(B) := µ(f⁻¹[B]) for µ ∈ P(S, A), B ∈ B; then f∗ : P(S, A) → P(T, B). Show that f∗ is ℘(A)-℘(B)-measurable. Hint: Use Exercise 2.7.

Exercise 2.9 Let S be a countable set with p : S → [0, 1] a discrete probability distribution, thus ∑s∈S p(s) = 1; denote the corresponding probability measure on P(S) by µp, hence µp(A) = ∑s∈A p(s). Let T be an at most countable set with a discrete probability distribution q. Show that a map f : S → T is a morphism for the probability spaces (S, P(S), µp) and (T, P(T), µq) iff q(t) = ∑f(s)=t p(s) holds for all t ∈ T.

Exercise 2.10 Show that {x ∈ [0, 1] | 〈x, x〉 ∈ E} ∈ B([0, 1]) whenever E ∈ B([0, 1]) ⊗ B([0, 1]).

Exercise 2.11 Let's chase some objects through diagrams. Consider the following diagram in a category K:

        f         g
   a ─────→ b ─────→ c
   │          │          │
  k│         ℓ│         m│
   ↓          ↓          ↓
   x ─────→ y ─────→ z
        r         s


1. Show that if the left inner and right inner diagrams commute, then the outer diagram commutes as well.

2. Show that if the outer and right inner diagrams commute and s is a monomorphism, then the left inner diagram commutes as well.

3. Give examples in Set such that:

(a) the outer and left inner diagrams commute, but not the right inner diagram,

(b) the outer and right inner diagrams commute, but not the left inner diagram.

Exercise 2.12 Give an example of a product in a category K such that one of the projections is not epic.

Exercise 2.13 What can you say about products and sums in the category M given by a finite monoid (M, ∗), as defined in Exercise 2.6? (Consider first the case that (M, ∗) is commutative.)

Exercise 2.14 Show that the product topology has this universal property: f : (D, D) → (S × T, G × H) is continuous iff πS ∘ f : (D, D) → (S, G) and πT ∘ f : (D, D) → (T, H) are continuous. Formulate and prove the corresponding property for morphisms in Meas.

Exercise 2.15 A collection of morphisms {fi : a → bi}i∈I with the same domain in category K is called jointly monic whenever the following holds: if g1 : x → a and g2 : x → a are morphisms such that fi ∘ g1 = fi ∘ g2 for all i ∈ I, then g1 = g2. Dually one defines a collection of morphisms to be jointly epic.

Show that the projections from a categorical product are jointly monic and the injections into a categorical sum are jointly epic.

Exercise 2.16 Assume the following diagram in a category K commutes:

        f         g
   a ─────→ b ─────→ c
   │          │          │
  k│         ℓ│         m│
   ↓          ↓          ↓
   x ─────→ y ─────→ z
        r         s

Prove or disprove: if the outer diagram is a pullback, one of the inner diagrams is a pullback as well. Which inner diagram has to be a pullback for the outer one to be a pullback as well?

Exercise 2.17 Suppose f, g : a → b are morphisms in a category C. An equalizer of f and g is a morphism e : x → a such that f ∘ e = g ∘ e, and whenever h : y → a is a morphism with f ∘ h = g ∘ h, then there exists a unique j : y → x such that h = e ∘ j.

(In the diagram: e : x → a is followed by the parallel pair f, g : a → b, and every h : y → a with f ∘ h = g ∘ h factors uniquely through e via j : y → x.)

1. Show that equalizers are uniquely determined up to isomorphism.

2. Show that the morphism e : x→ a is a monomorphism.


3. Show that a category has pullbacks if it has products and equalizers.

Exercise 2.18 A terminal object in category K is an object 1 such that for every object a there exists a unique morphism ! : a → 1.

1. Show that terminal objects are uniquely determined up to isomorphism.

2. Show that a category has (binary) products and equalizers if it has pullbacks and a terminal object.

Exercise 2.19 Show that the coproduct σ-algebra has this universal property: f : (S + T, A + B) → (R, X) is A + B-X-measurable iff f ∘ iS and f ∘ iT are A-X- resp. B-X-measurable. Formulate and prove the corresponding property for morphisms in Top.

Exercise 2.20 Assume that in category K any two objects have a product. Show that a × (b × c) and (a × b) × c are isomorphic.

Exercise 2.21 Prove Lemma 2.2.22.

Exercise 2.22 Assume that the coproducts a + a′ and b + b′ exist in category K. Given morphisms f : a → b and f′ : a′ → b′, show that there exists a unique morphism q : a + a′ → b + b′ such that this diagram commutes:

        ia            ia′
   a ─────→ a + a′ ←───── a′
   │           │            │
  f│          q│           f′│
   ↓           ↓            ↓
   b ─────→ b + b′ ←───── b′
        ib            ib′

Exercise 2.23 Show that the category Prob has no coproducts. (Hint: Considering (S, C) + (T, D), show that, e.g., iS⁻¹[iS[A]] equals A for A ⊆ S.)

Exercise 2.24 Identify the product of two objects in the category Rel of relations.

Exercise 2.25 We investigate the epi-mono factorization in the category Meas of measurable spaces. Fix two measurable spaces (S, A) and (T, B) and a morphism f : (S, A) → (T, B).

1. Let A/ker(f) be the largest σ-algebra X on S/ker(f) rendering the factor map ηker(f) : S → S/ker(f) A-X-measurable. Show that A/ker(f) = {C ⊆ S/ker(f) | ηker(f)⁻¹[C] ∈ A}, and show that A/ker(f) has this universal property: given a measurable space (Z, C), a map g : S/ker(f) → Z is A/ker(f)-C-measurable iff g ∘ ηker(f) : S → Z is A-C-measurable.

2. Show that ηker(f) is an epimorphism in Meas, and that f• : [x]ker(f) ↦ f(x) is a monomorphism in Meas.

3. Let f = m ∘ e with an epimorphism e : (S, A) → (Z, C) and a monomorphism m : (Z, C) → (T, B), and define b : S/ker(f) → Z through [s]ker(f) ↦ e(s), see Corollary 2.1.27. Show that b is A/ker(f)-C-measurable, and prove or disprove measurability of b⁻¹.

Exercise 2.26 Let AbGroup be the category of Abelian groups. Its objects are commutative groups; a morphism ϕ : (G, +) → (H, ∗) is a map ϕ : G → H with ϕ(a + b) = ϕ(a) ∗ ϕ(b) and ϕ(−a) = −ϕ(a). Each subgroup V of an Abelian group (G, +) defines an equivalence relation ρV through a ρV b iff a − b ∈ V. Characterize the pushout of ηρV and ηρW for subgroups V and W in AbGroup.

Exercise 2.27 Given a set X, define F(X) := X × X; for a map f : X → Y, define F(f)(x1, x2) := 〈f(x1), f(x2)〉. Show that F is an endofunctor on Set.

Exercise 2.28 Fix a set A of labels; define F(X) := {∗} ∪ A × X for the set X. If f : X → Y is a map, put F(f)(∗) := ∗ and F(f)(a, x) := 〈a, f(x)〉. Show that F : Set → Set defines an endofunctor.

This endofunctor models termination or labeled output.

Exercise 2.29 Fix a set A of labels, and put for the set X

F(X) := Pf(A × X),

where Pf denotes the set of all finite subsets of its argument. Thus G ∈ F(X) is a finite subset of A × X, which models finite branching, with 〈a, x〉 ∈ G as one of the possible branches, in this case labeled by a ∈ A. Define

F(f)(B) := {〈a, f(x)〉 | 〈a, x〉 ∈ B}

for the map f : X → Y and B ⊆ A × X. Show that F : Set → Set is an endofunctor.

Exercise 2.30 Show that the limit cone for a functor F : K → L is unique up to isomorphism, provided it exists.

Exercise 2.31 Let I ≠ ∅ be an arbitrary index set, and let K be the discrete category over I. Given a family (Xi)i∈I, define F : I → Set by F(i) := Xi. Show that

X := ∏i∈I Xi := {x : I → ⋃i∈I Xi | x(i) ∈ Xi for all i ∈ I}

with πi : x ↦ x(i) is a limit (X, (πi)i∈I) of F.

Exercise 2.32 Formulate the equalizer of two morphisms (cp. Exercise 2.17) as a limit.

Exercise 2.33 Define for the set X the free monoid X∗ generated by X through

X∗ := {〈x1, . . . , xk〉 | xi ∈ X, k ≥ 0}

with juxtaposition as multiplication, i.e., 〈x1, . . . , xk〉 ∗ 〈x′1, . . . , x′r〉 := 〈x1, . . . , xk, x′1, . . . , x′r〉; the neutral element ε is 〈x1, . . . , xk〉 with k = 0. Define

f∗(x1 ∗ . . . ∗ xk) := f(x1) ∗ . . . ∗ f(xk),
ηX(x) := 〈x〉

for a map f : X → Y∗ and x ∈ X. Put F(X) := X∗. Show that (F, η, −∗) is a Kleisli triple, and compare it with the list monad, see page 129. Compute µX for this monad.
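For orientation (again a hedged sketch, not a solution): in Haskell the free monoid monad is the list monad, and the data of the Kleisli triple take the familiar forms below, with kleisliExt and mu as our own names.

    -- eta embeds a letter as a one-letter word.
    eta :: a -> [a]
    eta x = [x]

    -- The Kleisli extension f* of f : X -> Y*: apply f letterwise and
    -- multiply the resulting words out in Y*.
    kleisliExt :: (a -> [b]) -> [a] -> [b]
    kleisliExt = concatMap

    -- The multiplication mu_X = id*, flattening a word of words.
    mu :: [[a]] -> [a]
    mu = concat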


Exercise 2.34 Given are the transition systems S, on the states s1, . . . , s6, and T, on the states t1, . . . , t7:

[transition diagrams of S and T]

1. Consider the transition systems S and T as coalgebras for a suitable functor F : Set → Set, X ↦ P(X). Determine the dynamics of the respective coalgebras.

2. Show that there is no coalgebra morphism S → T.

3. Construct a coalgebra morphism T → S.

4. Construct a bisimulation between S and T as a coalgebra on the carrier

{〈s2, t3〉, 〈s2, t4〉, 〈s4, t2〉, 〈s5, t6〉, 〈s5, t7〉, 〈s6, t5〉}.

Exercise 2.35 Characterize this nondeterministic transition system S, on the states s0, . . . , s14, as a coalgebra for a suitable functor F : Set → Set:

[transition diagram of S]

Show that

α := {〈si, si〉 | 0 ≤ i ≤ 14} ∪ {〈s2, s4〉, 〈s4, s2〉, 〈s9, s12〉, 〈s12, s9〉, 〈s13, s14〉, 〈s14, s13〉}

is a bisimulation equivalence on S. Simplify S by giving a coalgebraic characterization of the factor system S/α. Furthermore, determine whether α is the largest bisimulation equivalence on S.

Exercise 2.36 The deterministic finite automata A1, A2 with input and output alphabet {0, 1} and the following transition tables are given:

A1:
state  input  output  next state
s0     0      0       s1
s0     1      1       s0
s1     0      0       s2
s1     1      1       s3
s2     0      1       s4
s2     1      0       s2
s3     0      0       s1
s3     1      1       s3
s4     0      1       s3
s4     1      0       s2

A2:
state  input  output  next state
s′0    0      0       s′0
s′0    1      1       s′1
s′1    0      0       s′0
s′1    1      1       s′2
s′2    0      1       s′3
s′2    1      0       s′2
s′3    0      1       s′4
s′3    1      0       s′2
s′4    0      0       s′5
s′4    1      1       s′4
s′5    0      0       s′2
s′5    1      1       s′4

1. Formalize the automata as coalgebras for a suitable functor F : Set → Set, F(X) = (X × O)^I. (You have to choose I and O first.)

2. Construct a coalgebra morphism from A1 to A2 and use this to find a bisimulation R between A1 and A2. Describe the dynamics of R coalgebraically.

Exercise 2.37 Let P be an effectivity function on X, and define ∂P(A) := X \ P(X \ A). Show that ∂P defines an effectivity function on X. Given an effectivity function Q on Y and a morphism f : P → Q, show that f : ∂P → ∂Q is a morphism as well.

Exercise 2.38 Show that the power set functor P : Set → Set does not preserve pullbacks. (Hint: You can use the fact that in Set the pullback of f : X → Z and g : Y → Z is explicitly given as P := {〈x, y〉 | f(x) = g(y)} with πX and πY the usual projections.)

Exercise 2.39 Suppose F, G : Set → Set are functors.

1. Show that if F and G both preserve weak pullbacks, then the product functor F × G : Set → Set, defined as (F × G)(X) = F(X) × G(X) and (F × G)(f) = F(f) × G(f), preserves weak pullbacks as well.

2. Generalize to arbitrary products, i.e., show the following: if I is a set and for every i ∈ I, Fi : Set → Set is a functor preserving weak pullbacks, then the product functor ∏i∈I Fi : Set → Set preserves weak pullbacks as well.

Use this to show that the exponential functor (−)^A : Set → Set, given by X ↦ X^A = ∏a∈A X and f ↦ f^A = ∏a∈A f, preserves weak pullbacks.

3. Show that if F preserves weak pullbacks and there exist natural transformations η : F → G and ν : G → F, then G preserves weak pullbacks as well.

4. Show that if both F and G preserve weak pullbacks, then F + G : Set → Set, defined as X ↦ F(X) + G(X) and f ↦ F(f) + G(f), preserves weak pullbacks as well. (Hint: Show first that for every morphism f : X → A + B, one has a decomposition X ≅ XA + XB and fA : XA → A, fB : XB → B such that f ≅ (fA ∘ iA) + (fB ∘ iB).)

Exercise 2.40 Consider the modal similarity type τ = (O, ρ), with O := {〈a〉, 〈b〉} and ρ(〈a〉) = ρ(〈b〉) = 1, over the propositional letters p, q. Let furthermore [a], [b] denote the nablas of 〈a〉 and 〈b〉.


Show that the following formula is a tautology, i.e., it holds in every possible τ-model:

(〈a〉p ∨ 〈a〉q ∨ [b](¬p ∨ q)) → (〈a〉(p ∨ q) ∨ ¬[b]p ∨ [b]q)

A frame morphism between frames (X, (R〈a〉, R〈b〉)) and (Y, (S〈a〉, S〈b〉)) is given for this modal similarity type by a map f : X → Y which satisfies the following properties:

• If 〈x, x1〉 ∈ R〈a〉, then 〈f(x), f(x1)〉 ∈ S〈a〉. Moreover, if 〈f(x), y1〉 ∈ S〈a〉, then there exists x1 ∈ X with 〈x, x1〉 ∈ R〈a〉 and y1 = f(x1).

• If 〈x, x1〉 ∈ R〈b〉, then 〈f(x), f(x1)〉 ∈ S〈b〉. Moreover, if 〈f(x), y1〉 ∈ S〈b〉, then there exists x1 ∈ X with 〈x, x1〉 ∈ R〈b〉 and y1 = f(x1).

Give a coalgebraic definition of frame morphisms for this modal similarity type, i.e., find a functor F : Set → Set such that frame morphisms correspond to F-coalgebra morphisms.

Exercise 2.41 Consider the fragment of PDL defined mutually recursively by:

Formulas ϕ ::= p | ϕ1 ∧ ϕ2 | ¬ϕ1 | 〈π〉ϕ (where p ∈ Φ for a set of basic propositions Φ, and π is a program).

Programs π ::= t | π1; π2 | ϕ? (where t ∈ Bas for a set of basic programs Bas and ϕ is a formula).

Suppose you are given the set of basic programs Bas := {init, run, print} and basic propositions Φ := {is_init, did_print}.

We define a model M for this language as follows:

• The basic set of M is X := {−1, 0, 1}.

• The modal formulas for basic programs are interpreted by the relations

Rinit := {〈−1, 0〉, 〈0, 0〉, 〈1, 1〉},
Rrun := {〈−1, −1〉, 〈0, 0〉, 〈1, 1〉},
Rprint := {〈−1, −1〉, 〈0, 1〉, 〈1, 1〉}.

• The modal formulas for composite programs are defined by Rπ1;π2 := Rπ1 ∘ Rπ2 and Rϕ? := {〈x, x〉 | M, x |= ϕ}, as usual.

• The valuation function is given by V(is_init) := {0, 1} and V(did_print) := {1}.

Show the following:

1. M, −1 ⊭ 〈run; print〉did_print,

2. M, x |= 〈init; run; print〉did_print for all x ∈ X,

3. M, x ⊭ 〈(¬is_init)?; print〉did_print for all x ∈ X.

Informally speaking, the model above allows one to determine whether a program composed of initialization (init), doing some kind of work (run), and printing (print) is initialized or has printed something.


Suppose we want to modify the logic by counting how often we have printed, i.e., we extend the set of basic propositional letters by {did_print_n | n ∈ N}. Give an appropriate model for the new logic.


Chapter 3

Topological Spaces

A topology formalizes the notion of an open set; call a set open iff each of its members leaves a little room like a breathing space around it. This gives immediately a hint at the structure of the collection of open sets: they should be closed under finite intersections, but under arbitrary unions, yielding the base for a calculus of observable properties, as outlined in [Smy92, Chapter 1] or in [Vic89]. This development makes use of properties of topological spaces, but puts its emphasis subtly away from the classic approach, e.g., in mathematical analysis or probability theory, by stressing different properties of a space. The traditional approach, for example, stresses separation properties like being able to separate two distinct points through an open set. Such a strong emphasis is not necessarily observed in the computationally oriented use of topologies, where for example pseudometrics for measuring the conceptual distance between objects are important when it comes to finding an approximation between Markov transition systems.

We give in this chapter a brief introduction to some of the main properties of topological spaces, given that we have touched upon topologies already in the context of the Axiom of Choice in Section 1.5.8. The objective is to provide the tools and methods offered by set-theoretic topology to an application oriented reader. Thus we introduce the very basic notions of topology, and hint at applications of these tools. Some connections to logic and set theory are indicated, but as Moschovakis writes: “General (pointset) topology is to set theory like parsley to Greek food: some of it gets in almost every dish, but there are no ‘parsley recipes’ that the good Greek cook needs to know.” [Mos06, 6.27, p. 79]. In this metaphor, we study the parsley here, so that it can get into the dishes which require it.

We discuss the basic notions of a topology and its construction, including bases and subbases. Since compactness has been made available very early, compact spaces serve occasionally as an exercise ground. Continuity is an important topic in this context, as are the basic constructions like products or quotients which are enabled by it. Since some interesting and important topological constructions are tied to filters, we study filters and convergence, comparing in examples the sometimes more easily handled nets to the occasionally more cumbersome filters, which, however, offer some conceptual advantages. Talking about convergence, separation properties suggest themselves; they are studied in detail, providing some classic results like Urysohn's Theorem. It happens so often that one works with a powerful concept, but that this concept requires assumptions which are too strong, hence one has to weaken it in a sensible way. This


is demonstrated in the transition from compactness to local compactness; we discuss locally compact spaces, and we give an example of a compactification. Quantitative aspects enter when one measures openness through a pseudometric; here many concepts are seen in a new, sharper light, and in particular the problem of completeness comes up: you have a sequence the elements of which are eventually very close to each other, and you want to be sure that a limit exists. This is possible in complete spaces, and, even better, if a space is not complete, then you can complete it. Complete spaces have some very special properties, for example the intersection of countably many open dense sets is dense again; this is Baire's Theorem. We show through a Banach-Mazur game played on a topological space that being of first category can be determined through Demon having a winning strategy.

This completes the round trip of basic properties of topological spaces. We then present a small gallery in which topology is in action. The reason for singling out some topics is that we want to demonstrate the techniques developed with topological spaces for some interesting applications. For example, Gödel's Completeness Theorem for (countable) first order logic has been proved by Rasiowa and Sikorski through a combination of Baire's Theorem and Stone's topological representation of Boolean algebras. This topic is discussed. The calculus of observations, which is mentioned above, leads to the notion of topological systems, as demonstrated by Vickers. This hints at an interplay of topology and order, since a topology is after all a complete Heyting algebra. Another important topic is the approximation of continuous functions by a given class of functions, like the polynomials on an interval, leading quickly to the Stone-Weierstraß Theorem on a compact topological space, a topic with a rich history. Finally, the relationship of pseudometric spaces to general topological spaces is reflected upon again; we introduce uniform spaces as an interesting class of spaces which is more general than pseudometric spaces, but less general than their topological cousins. Here we find concepts like completeness or uniform continuity, which are formulated for metric spaces, but which cannot be realized in general topological ones. This gallery could be extended; for example, Polish spaces could be discussed here with considerable relish, but it seemed more adequate to discuss these spaces in the context of their measure theoretic use.

We assume throughout that the Axiom of Choice is valid.

3.1 Defining Topologies

Recall from Section 1.5.8 that a topology τ on a carrier set X is a collection of subsets which contains both ∅ and X, and which is closed under finite intersections and arbitrary unions. The elements of τ are called the open sets. Usually a topology is not written down as one set; rather, it is specified what an open set looks like. This is done through a base or a subbase. Recall that a base β for τ is a subset of τ such that for any G ∈ τ and any x ∈ G there exists B ∈ β with x ∈ B ⊆ G. A subbase is a family of sets whose finite intersections form a base.

Not every family of subsets qualifies as a subbase or a base. Kelley [Kel55, p. 47] gives the following example: Put X := {0, 1, 2}, A := {0, 1} and B := {1, 2}; then β := {X, A, B, ∅} cannot be the base for a topology. Assume it is; then the topology must be β itself, but A ∩ B ∉ β. We have the following characterization of a base.


Proposition 3.1.1 A family β of sets is the base for a topology on X = ⋃β iff given U, V ∈ β and x ∈ U ∩ V, there exists W ∈ β with x ∈ W ⊆ U ∩ V.

We have to be a bit careful in view of Kelley’s example. Let us have a look at the proof.

Proof Checking the properties of a base shows that the condition is certainly necessary. Suppose that the condition holds, and define

τ := {⋃β0 | β0 ⊆ β}.

Then ∅, X ∈ τ, and τ is closed under arbitrary unions, so that we have to check whether τ is closed under finite intersections. In fact, let x ∈ U ∩ V with U, V ∈ τ; then we can find U0, V0 ∈ β with x ∈ U0 ⊆ U and x ∈ V0 ⊆ V. By assumption there exists W ∈ β with x ∈ W ⊆ U0 ∩ V0 ⊆ U ∩ V, so that U ∩ V can be written as a union of elements of β. ⊣

We perceive a base and a subbase, resp., relative to a topology, but it is usually clear what the topology looks like, once a base is given. Let us have a look at some examples to clarify things.

Example 3.1.2 Consider the real numbers R with the Euclidean topology τ. We say that a set G is open iff given x ∈ G, there exists an open interval ]a, b[ with x ∈ ]a, b[ ⊆ G. Hence the set {]a, b[ | a, b ∈ R, a < b} forms a base for τ; actually, we could have chosen a and b as rational numbers, so that we even have a countable base for τ. Note that although we can find a closed interval [v, w] such that x ∈ [v, w] ⊆ ]a, b[ ⊆ G, we could not have used the closed intervals for a description of τ, since otherwise the singleton sets {x} = [x, x] would be open as well. This is both undesirable and counterintuitive: in an open set we expect that each element has some breathing space around it. ,

The next example looks at higher dimensional Euclidean spaces; here we do not have intervals directly at our disposal, but we can measure distances as well, which is a suitable generalization, given that the interval ]x − r, x + r[ equals {y ∈ R | |x − y| < r}.

Example 3.1.3 Consider the three dimensional space R³, and define for x, y ∈ R³ their distance

d(x, y) := |x1 − y1| + |x2 − y2| + |x3 − y3|.

Call G ⊆ R³ open iff given x ∈ G, there exists r > 0 such that {y ∈ R³ | d(x, y) < r} ⊆ G. Then it is clear that the open sets form a topology:

• Both the empty set and R³ are open.

• The union of an arbitrary collection of open sets is open again.

• Let G1, . . . , Gk be open, and x ∈ G1 ∩ . . . ∩ Gk. Take an index i; since x ∈ Gi, there exists ri > 0 such that K(d, x, ri) := {y ∈ R³ | d(x, y) < ri} ⊆ Gi. Let r := min{r1, . . . , rk}; then

{y ∈ R³ | d(x, y) < r} = ⋂i=1,...,k {y ∈ R³ | d(x, y) < ri} ⊆ ⋂i=1,...,k Gi.

Hence the intersection of a finite number of open sets is open again.


This argument would not work with a countable number of open sets, by the way.

We could have used other measures for the distance, e.g.,

d′(x, y) := √(|x1 − y1|² + |x2 − y2|² + |x3 − y3|²),
d′′(x, y) := max1≤i≤3 |xi − yi|.

Then it is not difficult to see that all three describe the same collection of open sets. This is so because we can find for x and r > 0 some r′ > 0 and r′′ > 0 with K(d′, x, r′) ⊆ K(d, x, r) and K(d′′, x, r′′) ⊆ K(d, x, r), and similarly for the other combinations.

It is noted that 3 is not a magical number here; we can safely replace it with any positive n, indicating an arbitrary finite dimension. Hence we have shown that Rⁿ is a topological space in the Euclidean topology for each n ∈ N. ,
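A quick computational illustration (a hedged Haskell sketch; all names are ours): the three distances, together with membership in the ball K(metric, x, r). Since d(x, y) ≤ 3 · d′′(x, y) and d(x, y) ≤ √3 · d′(x, y) on R³, shrinking the radius to r/3 resp. r/√3 embeds a d′′-ball resp. a d′-ball into K(d, x, r).

    type R3 = (Double, Double, Double)

    coords :: R3 -> [Double]
    coords (x1, x2, x3) = [x1, x2, x3]

    -- The three distances of Example 3.1.3.
    d, d', d'' :: R3 -> R3 -> Double
    d   x y = sum     [abs (a - b)     | (a, b) <- zip (coords x) (coords y)]
    d'  x y = sqrt (sum [(a - b) ^ 2   | (a, b) <- zip (coords x) (coords y)])
    d'' x y = maximum [abs (a - b)     | (a, b) <- zip (coords x) (coords y)]

    -- Membership in the open ball K(metric, x, r).
    inBall :: (R3 -> R3 -> Double) -> R3 -> Double -> R3 -> Bool
    inBall metric x r y = metric x y < r

    -- Since d x y <= 3 * d'' x y, inBall d'' x (r / 3) y implies
    -- inBall d x r y, and analogously with d' and r / sqrt 3.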

The next example also uses some notion of distance between two elements, which is given through evaluating real valued functions. Think of f(x) as the numerical value of attribute f for object x; then |f(x) − f(y)| indicates how far apart x and y are with respect to their attribute values.

Example 3.1.4 Let X be an arbitrary non-empty set, and E be a non-empty collection of functions f : X → R. Define for a finite collection F ⊆ E, for r > 0, and for x ∈ X the base set

WF;r(x) := {y ∈ X | |f(x) − f(y)| < r for all f ∈ F}.

We define the base β := {WF;r(x) | x ∈ X, r > 0, F ⊆ E finite}, and hence call G ⊆ X open iff given x ∈ G, there exist F ⊆ E finite and r > 0 such that WF;r(x) ⊆ G.

It is immediate that the finite intersection of open sets is open again. Since the other properties are checked easily as well, we have defined a topology, which is sometimes called the weak topology on X induced by E.

It is clear that in this example the argument would not work if we restricted ourselves to single functions g ∈ E for defining the base, i.e., to sets of the form W{g};r(x). These sets, however, have the property that they form a subbase, since their finite intersections form a base. ,

The next example shows that a topology may be defined on the set of all partial functions from some set to another one. In contrast to the previous example, we do without any numerical evaluations.

Example 3.1.5 Let A and B be non-empty sets, and define

A ⇀ B := {f ⊆ A × B | f is a partial map}.

A set G ⊆ A ⇀ B is called open iff given f ∈ G there exists a finite f0 ∈ A ⇀ B such that

f ∈ N(f0) := {g ∈ A ⇀ B | f0 ⊆ g} ⊆ G.

Thus we can find for f a finite partial map f0 which is extended by f, such that all extensions of f0 are contained in G.


This is in fact a topology. The collection of open sets is certainly closed under arbitrary unions, and both the empty set and the whole set A ⇀ B are open. Let G1, . . . , Gn be open, and f ∈ G := G1 ∩ . . . ∩ Gn; then we can find finite partial maps f1, . . . , fn which are extended by f such that N(fi) ⊆ Gi for 1 ≤ i ≤ n. Since f extends all these maps, f0 := f1 ∪ . . . ∪ fn is a well defined finite partial map which is extended by f, and

f ∈ N(f0) = N(f1) ∩ . . . ∩ N(fn) ⊆ G.

Hence the finite intersection of open sets is open again.

A base for this topology is the set {N(f) | f is finite}; a subbase is the set {N({〈a, b〉}) | a ∈ A, b ∈ B}. ,

The next example deals with a topology which is induced by an order structure. Recall from Section 1.5 that a chain in a partially ordered set is a non-empty totally ordered subset, and that in an inductively ordered set each chain has an upper bound.

Example 3.1.6 Let (P, ≤) be an inductively ordered set. Call G ⊆ P Scott open iff

1. G is upper closed (hence x ∈ G and x ≤ y imply y ∈ G),

2. if S ⊆ P is a chain with sup S ∈ G, then S ∩ G ≠ ∅.

Again, this defines a topology on P. In fact, it is enough to show that G1 ∩ G2 is open if G1 and G2 are. Let S be a chain with sup S ∈ G1 ∩ G2; then we find si ∈ S with si ∈ Gi. Since S is a chain, we may and do assume that s1 ≤ s2, hence s2 ∈ G1, because G1 is upper closed. Thus s2 ∈ S ∩ (G1 ∩ G2). Because G1 and G2 are upper closed, so is G1 ∩ G2.

As an illustration, we show that the set F := {x ∈ P | x ≤ t} is Scott closed for each t ∈ P. Put G := P \ F. Let x ∈ G and x ≤ y; then obviously y ∉ F, so y ∈ G. If S is a chain with sup S ∈ G, then there exists s ∈ S with s ∉ F, for otherwise t would be an upper bound of S, giving sup S ≤ t; hence S ∩ G ≠ ∅. ,

3.1.1 Continuous Functions

A continuous map between topological spaces is compatible with the topological structure. This is familiar from real functions, but we cannot copy the definition directly, since we have no means of measuring the distance between points in a topological space. All we have is the notion of an open set. So the basic idea is to say that, given an open neighborhood U of the image, we want to be able to find an open neighborhood V of the inverse image so that all elements of V are mapped to U. This is a direct translation of the familiar ε-δ-definition from calculus. Since we are concerned with continuity as a global concept (as opposed to one which focuses on a given point), we arrive at this definition, and show in the subsequent example that it is really a faithful translation.

Definition 3.1.7 Let (X, τ) and (Y, ϑ) be topological spaces. A map f : X → Y is called τ-ϑ-continuous iff f⁻¹[H] ∈ τ for all H ∈ ϑ; we write this also as f : (X, τ) → (Y, ϑ).

If the context is clear, we omit the reference to the topologies. Hence we say that the inverse image of an open set under a continuous map is an open set again, see Example 2.1.11.

Let us have a look at real functions.


Example 3.1.8 Endow the reals with the Euclidean topology, and let f : R → R be a map. Then the definition of continuity given above coincides with the usual ε-δ-definition.

1. Assuming the ε-δ-definition, we show that the inverse image of an open set is open. In fact, let G ⊆ R be open, and pick x ∈ f⁻¹[G]. Since f(x) ∈ G, we can find ε > 0 such that ]f(x) − ε, f(x) + ε[ ⊆ G. Pick δ > 0 for this ε; hence x′ ∈ ]x − δ, x + δ[ implies f(x′) ∈ ]f(x) − ε, f(x) + ε[ ⊆ G. Thus x ∈ ]x − δ, x + δ[ ⊆ f⁻¹[G].

2. Assuming that the inverse image of an open set is open, we establish the ε-δ-definition. Given x ∈ R, let ε > 0 be arbitrary; we show that there exists δ > 0 such that |x − x′| < δ implies |f(x) − f(x′)| < ε. Now ]f(x) − ε, f(x) + ε[ is an open set, hence H := f⁻¹[]f(x) − ε, f(x) + ε[] is open by assumption, and x ∈ H. Select δ > 0 with ]x − δ, x + δ[ ⊆ H; then |x − x′| < δ implies x′ ∈ H, hence f(x′) ∈ ]f(x) − ε, f(x) + ε[. ,

Thus we work on familiar ground when it comes to the reals. Continuity may be tested on a subbase:

Lemma 3.1.9 Let (X, τ) and (Y, ϑ) be topological spaces, and f : X → Y be a map. Then f is τ-ϑ-continuous iff f⁻¹[S] ∈ τ for each S ∈ σ, with σ ⊆ ϑ a subbase.

Proof Clearly, the inverse image of each subbase element is open whenever f is continuous. Assume, conversely, that f⁻¹[S] ∈ τ for each S ∈ σ. Then f⁻¹[B] ∈ τ for each element B of the base β generated from σ, because B is the intersection of a finite number of subbase elements. Now, finally, if H ∈ ϑ, then H = ⋃{B | B ∈ β, B ⊆ H}, so that f⁻¹[H] = ⋃{f⁻¹[B] | B ∈ β, B ⊆ H} ∈ τ. Thus the inverse image of an open set is open. ⊣

Example 3.1.10 Take the topology from Example 3.1.5 on the space A ⇀ B of all partial maps. A map q : (A ⇀ B) → (C ⇀ D) is continuous in this topology iff the following condition holds: whenever q(f)(c) = d, then there exists a finite f0 ⊆ f such that q(f0)(c) = d.

In fact, let q be continuous and q(f)(c) = d; then G := q⁻¹[N({〈c, d〉})] is open and contains f, thus there exists a finite f0 ⊆ f with f ∈ N(f0) ⊆ G, in particular q(f0)(c) = d. Conversely, assume that H ⊆ C ⇀ D is open; we want to show that G := q⁻¹[H] ⊆ A ⇀ B is open. Let f ∈ G, thus q(f) ∈ H, hence there exists a finite g0 ⊆ q(f) with q(f) ∈ N(g0) ⊆ H; g0 is finite, say g0 = {〈c1, d1〉, . . . , 〈cn, dn〉}. By assumption there exists a finite f0 ⊆ f with q(f0)(ci) = di for 1 ≤ i ≤ n; then f ∈ N(f0) ⊆ G, so that the latter set is open. ,

This is an easy criterion for continuity with respect to the Scott topology.

Example 3.1.11 Let (P, ≤) and (Q, ≤) be inductively ordered sets; then f : P → Q is Scott continuous (i.e., continuous when both ordered sets carry their respective Scott topology) iff f is monotone and f(sup S) = sup f[S] holds for every chain S.

Assume that f is Scott continuous. If x ≤ x′, then every Scott open set which contains x also contains x′, so x ∈ f⁻¹[H] implies x′ ∈ f⁻¹[H] for every Scott open H ⊆ Q; applying this to H := {q ∈ Q | q ≰ f(x′)}, which is Scott open by Example 3.1.6, shows that f(x) ≤ f(x′), thus f is monotone. If S ⊆ P is a chain, then sup S exists in P, and f(s) ≤ f(sup S) for all s ∈ S, so that sup f[S] ≤ f(sup S). For the other inequality, assume that f(sup S) ≰ sup f[S]. We note that G := f⁻¹[{q ∈ Q | q ≰ sup f[S]}] is open with sup S ∈ G, hence there exists s ∈ S with s ∈ G. But this is impossible, since f(s) ≤ sup f[S] for every s ∈ S.

On the other hand, assume that H ⊆ Q is Scott open; we want to show that G := f⁻¹[H] ⊆ P is Scott open. G is upper closed, since x ∈ G and x ≤ x′ imply f(x) ∈ H and f(x) ≤ f(x′), thus f(x′) ∈ H, so that x′ ∈ G. Let S ⊆ P be a chain with sup S ∈ G, hence f(sup S) ∈ H. Since f[S] is a chain and f(sup S) = sup f[S], we infer that there exists s ∈ S with f(s) ∈ H, hence there is s ∈ S with s ∈ G. Thus G is Scott open in P, and f is Scott continuous. ,

The interpretation of modal logics in a topological space is interesting when we interpret the transition which is associated with the diamond operator through a continuous map. Thus the next step of a transition is uniquely determined, and it depends continuously on its argument.

Example 3.1.12 The syntax of our modal logic is given through

ϕ ::= > | p | ϕ1 ∨ ϕ2 | ϕ1 ∧ ϕ2 | ¬ϕ | ◊ϕ

with p ∈ Φ an atomic proposition. The logic has the usual propositional operators, and ◊ as the modal operator.

For interpreting the logic, we take a topological state space (S, τ) and a continuous map f : S → S, and we associate with each atomic proposition p an open set Vp as the set of all states in which p is true. We want the validity set [[ϕ]] of all those states in which formula ϕ holds to be open, and we define the validity of a formula inductively in the following way.

[[>]] := S
[[p]] := Vp, if p is atomic
[[ϕ1 ∨ ϕ2]] := [[ϕ1]] ∪ [[ϕ2]]
[[ϕ1 ∧ ϕ2]] := [[ϕ1]] ∩ [[ϕ2]]
[[¬ϕ]] := (S \ [[ϕ]])°
[[◊ϕ]] := f⁻¹[[[ϕ]]]

All definitions but the last two are self explanatory. The interpretation of [[◊ϕ]] through f⁻¹[[[ϕ]]] suggests itself when considering the graph of f in the usual interpretation of the diamond in modal logics, see Definition 2.7.15.

Since we want [[¬ϕ]] to be open, we cannot take the complement of [[ϕ]] and declare it the validity set for ¬ϕ, because the complement of an open set is not necessarily open. Instead, we take the largest open set which is contained in S \ [[ϕ]] (this is the best we can do), and assign it to ¬ϕ. One shows easily through induction on the structure of formula ϕ that [[ϕ]] is an open set.

But now look at this. Assume that S := R in the usual topology and Vp = [[p]] = ]0, +∞[; then [[¬p]] = (]−∞, 0])° = ]−∞, 0[, thus [[p ∨ ¬p]] = R \ {0} ≠ [[>]]. Thus the law of the excluded middle does not hold in this model. ,
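On a finite carrier this interpretation is executable; the following hedged Haskell sketch (all names ours) represents a finite topology by the list of its open sets, takes the interior for negation, and takes preimages under the continuous map f for the diamond.

    import Data.List ((\\), nub, sort)

    -- A finite topological model: carrier, topology, the continuous
    -- next-step map f, and a valuation of atomic propositions by opens.
    data Model s p = Model
      { carrier :: [s]
      , opens   :: [[s]]      -- the topology, as a list of open sets
      , step    :: s -> s     -- the continuous map f
      , val     :: p -> [s]   -- V p, assumed to be an open set
      }

    data Form p = Top | Atom p | Or (Form p) (Form p)
                | And (Form p) (Form p) | Neg (Form p) | Dia (Form p)

    -- The interior of a set: the union of all open sets contained in it.
    interior :: Ord s => Model s p -> [s] -> [s]
    interior m a = sort (nub (concat [o | o <- opens m, all (`elem` a) o]))

    -- The validity set [[phi]], following the clauses above.
    sem :: Ord s => Model s p -> Form p -> [s]
    sem m Top       = carrier m
    sem m (Atom p)  = val m p
    sem m (Or p q)  = sort (nub (sem m p ++ sem m q))
    sem m (And p q) = [x | x <- sem m p, x `elem` sem m q]
    sem m (Neg p)   = interior m (carrier m \\ sem m p)
    sem m (Dia p)   = [x | x <- carrier m, step m x `elem` sem m p]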

Returning to the general discussion, the following fundamental property is immediate; we state it here as a proposition for the record, see Example 2.1.11.

Proposition 3.1.13 The identity (X, τ) → (X, τ) is continuous, and continuous maps are closed under composition. Consequently, topological spaces with continuous maps form a category. ⊣


Continuous maps can be used to define topologies.

Definition 3.1.14 Given a family F of maps f : A → Xf, where (Xf, τf) is a topological space for each f ∈ F, the initial topology τin,F on A with respect to F is the smallest topology on A rendering f τin,F-τf-continuous for every f ∈ F. Dually, given a family G of maps g : Xg → Z, where (Xg, τg) is a topological space for each g ∈ G, the final topology τfi,G on Z is the largest topology on Z rendering g τg-τfi,G-continuous for every g ∈ G.

In the case of the initial topology for just one map f : A → Xf, note that the power set P(A) is a topology which renders f continuous, so there exists in fact a smallest topology on A with the desired property; because {f−1[G] | G ∈ τf} is itself a topology that satisfies the requirement, and because each topology rendering f continuous must contain it, it is in fact this smallest one. If we have a family F of maps A → Xf, then each topology making all f ∈ F continuous must contain ξ := ⋃f∈F {f−1[G] | G ∈ τf}, so the initial topology with respect to F is just the smallest topology on A containing ξ. Dually, the final topology with respect to G, being the largest topology rendering each g ∈ G continuous, is the intersection ⋂g∈G {H ⊆ Z | g−1[H] ∈ τg}.
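On a finite carrier the smallest topology containing a given family ξ can be computed by brute force, closing ξ under binary unions and intersections until nothing changes. The following sketch does this for the initial topology of a single map; the concrete four-point set, the three-point space and the map are illustrative assumptions.

    # Sketch: the initial topology of one map f : A -> X on finite carriers.
    # All concrete data are illustrative assumptions.

    from itertools import combinations

    A = frozenset({'a', 'b', 'c', 'd'})
    X = frozenset({1, 2, 3})
    tau_X = [frozenset(), frozenset({1}), frozenset({1, 2}), X]
    f = {'a': 1, 'b': 1, 'c': 2, 'd': 3}

    def generate_topology(carrier, xi):
        """Smallest topology containing xi: close under unions/intersections."""
        sets = {frozenset(), frozenset(carrier)} | set(xi)
        changed = True
        while changed:
            changed = False
            for U, V in list(combinations(sets, 2)):
                for W in (U & V, U | V):
                    if W not in sets:
                        sets.add(W)
                        changed = True
        return sets

    xi = [frozenset(a for a in A if f[a] in U) for U in tau_X]   # preimages
    tau_in = generate_topology(A, xi)
    print(sorted(map(sorted, tau_in)))
    # [[], ['a', 'b'], ['a', 'b', 'c'], ['a', 'b', 'c', 'd']]

Since preimages commute with unions and intersections, the family of preimages is here already a topology, which matches the remark above.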

An easy characterization of the initial resp. the final topology is proposed here:

Proposition 3.1.15 Let (Z, τ) be a topological space, and F be a family of maps A → Xf with (Xf, τf) topological spaces; A is endowed with the initial topology τin,F with respect to F. A map h : Z → A is τ-τin,F-continuous iff f ◦ h : Z → Xf is τ-τf-continuous for every f ∈ F.

Proof 1. Certainly, if h : Z → A is τ-τin,F-continuous, then f ◦ h : Z → Xf is τ-τf-continuous for every f ∈ F by Proposition 3.1.13.

2. Assume, conversely, that f ◦ h is continuous for every f ∈ F; we want to show that h is continuous. Consider

ζ := {G ⊆ A | h−1[G] ∈ τ}.

Because τ is a topology, ζ is one as well; because f ◦ h is continuous, ζ contains the sets {f−1[H] | H ∈ τf} for every f ∈ F. But this implies that ζ contains τin,F, hence h−1[G] ∈ τ for every G ∈ τin,F. This establishes the assertion. ∎

There is a dual characterization for the final topology, see Exercise 3.1.

These are the most popular examples for initial and final topologies.

1. Given a family (Xi, τi)i∈I of topological spaces, let X := ∏i∈I Xi be the Cartesian product of the carrier sets¹. The product topology ∏i∈I τi is the initial topology on X with respect to the projections πi : X → Xi. The product topology has as a base

{∏i∈I Ai | Ai ∈ τi, and Ai ≠ Xi only for finitely many indices}.

2. Let (X, τ) be a topological space, A ⊆ X. The trace (A, τ ∩ A) of τ on A is the initial topology on A with respect to the embedding iA : A → X. It has the open sets {G ∩ A | G ∈ τ}; this is sometimes called the subspace topology, see page 42. We do not assume that A is open.

¹This works only if X ≠ ∅; recall that we assume here that the Axiom of Choice is valid.


3. Given the family of spaces as above, let X := ∑i∈I Xi be the direct sum. The sum topology ∑i∈I τi is the final topology on X with respect to the injections ιi : Xi → X. Its open sets are described through {∑i∈I ιi[Gi] | Gi ∈ τi for all i ∈ I}.

4. Let ρ be an equivalence relation on X with τ a topology on the base space. The factor space X/ρ is equipped with the final topology τ/ρ with respect to the factor map ηρ which sends each element to its ρ-class. This topology is called the quotient topology (with respect to τ and ρ). If a set G ⊆ X/ρ is open, then its inverse image ηρ−1[G] = ⋃G ⊆ X is open in X. But the converse holds as well: assume that ⋃G is open in X for some G ⊆ X/ρ; then G = ηρ[⋃G], and, because ⋃G is a union of equivalence classes, one shows that ηρ−1[G] = ηρ−1[ηρ[⋃G]] = ⋃G. But this means that G is open in X/ρ.

Just to gain some familiarity with the concepts involved, we deal with an induced map on a product space, and with the subspace coming from the image of a map. The properties we find here will be useful later on as well.

The product space first. We will use that a map into a topological product is continuous iff all its projections are; this follows from the characterization of an initial topology. It goes like this.

Lemma 3.1.16 Let M and N be non-empty sets, and f : M → N a map. Equip both [0,1]^M and [0,1]^N with the product topology. Then

f∗ : [0,1]^N → [0,1]^M, g ↦ g ◦ f

is continuous.

Proof Note the reversed order; we have f∗(g)(m) = (g ◦ f)(m) = g(f(m)) for g ∈ [0,1]^N and m ∈ M.

Because f∗ maps [0,1]^N into [0,1]^M, and the latter space carries the initial topology with respect to the projections (πM,m)m∈M with πM,m : q ↦ q(m), it is by Proposition 3.1.15 sufficient to show that πM,m ◦ f∗ : [0,1]^N → [0,1] is continuous for every m ∈ M. But πM,m ◦ f∗ = πN,f(m); this is a projection, which is continuous by definition. Hence f∗ is continuous. ∎

Hence an application of the projections defuses a seemingly complicated map. Note in passing that neither M nor N is assumed to carry a topology; they are simply plain sets.

The next observation displays an example of a subspace topology. Each continuous map f : X → Y of one topological space to another induces a subspace f[X] of Y, which may or may not have interesting properties. In the case considered, it inherits compactness from its source.


Proposition 3.1.17 Let (X, τ) and (Y, ϑ) be topological spaces, f : X → Y be τ-ϑ-continuous. If (X, τ) is compact, so is (f[X], ϑ ∩ f[X]), the subspace of (Y, ϑ) induced by f.

Proof We take an open cover of f[X] and show that it contains a finite cover of this space. So let (Hi)i∈I be an open cover of f[X]. There exist open sets H′i ∈ ϑ such that Hi = H′i ∩ f[X], since (f[X], ϑ ∩ f[X]) carries the subspace topology. Then (f−1[H′i])i∈I is an open cover of X, so there exists a finite subset J ⊆ I such that X = ⋃i∈J f−1[H′i], since X is compact. But then (H′i ∩ f[X])i∈J is an open cover of f[X]. Hence this space is compact. ∎

Before continuing, we introduce the notion of a homeomorphism (as an isomorphism in the category of topological spaces with continuous maps).

Definition 3.1.18 Let X and Y be topological spaces. A bijection f : X → Y is called a homeomorphism iff both f and f−1 are continuous.

It is clear that continuity and bijectivity alone do not make a homeomorphism. Take as a trivial example the identity (R, P(R)) → (R, τ) with τ as the Euclidean topology. It is continuous and bijective, but its inverse is not continuous.

Let us have a look at some examples, first one for the quotient topology.

Example 3.1.19 Let U := [0, 2 · π], and identify the endpoints of the interval, i.e., consider the equivalence relation

ρ := {〈x, x〉 | x ∈ U} ∪ {〈0, 2 · π〉, 〈2 · π, 0〉}.

Let K := U/ρ, and endow K with the quotient topology.

A set G ⊆ K is open iff ηρ−1[G] ⊆ U is open, thus iff we can find an open set H ⊆ R such that ηρ−1[G] = H ∩ U, since U carries the trace of R. Consequently, if [0]ρ ∉ G, we find that ηρ−1[G] = {x ∈ U | [x]ρ ∈ G}, which is open by construction. If, however, [0]ρ ∈ G, then ηρ−1[G] = {x ∈ U | [x]ρ ∈ G} ∪ {0, 2 · π}, which is open in U.

We claim that K and the unit circle S := {〈s, t〉 | s² + t² = 1} are homeomorphic under the map ψ : [x]ρ ↦ 〈sin x, cos x〉. Because 〈sin 0, cos 0〉 = 〈sin 2 · π, cos 2 · π〉, the map is well defined. Since we can write S = {〈sin x, cos x〉 | 0 ≤ x ≤ 2 · π}, it is clear that ψ is onto. The topology on S is inherited from the Cartesian plane, so open arcs are a subbase for it. Because the old Romans Sinus and Cosinus are both continuous, we find that ψ ◦ ηρ is continuous. We infer from Exercise 3.1 that ψ is continuous, since K has the quotient topology, which is final.

We show now that ψ−1 is continuous. The argumentation is geometrical. Given an open arc on S, we may describe it through (P1, P2) with a clockwise movement. If the arc does not contain the critical point P := 〈0, 1〉, we find an open interval I := ]a, b[ with 0 < a < b < 2 · π such that ψ−1[(P1, P2)] = {[x]ρ | x ∈ I}, which is open in K. If, however, P is on this arc, we decompose it into two parts (P1, P) ∪ (P, P2). Then (P1, P) is the image of some interval ]a, 2 · π], and (P, P2) is the image of an interval [0, b[, so that ψ−1[(P1, P2)] = ηρ[[0, b[ ∪ ]a, 2 · π]], which is open in K as well (note that both [0, b[ and ]a, 2 · π] are open in U).

♦

While we have so far described direct methods of defining a topology by saying under which conditions a set is open, we turn now to an observation due to Kuratowski which yields an


indirect way. It describes axiomatically what properties the closure of a set should have. Assume that we have a closure operator, i.e., a map A ↦ Ac on the power set of a set X with these properties:

1. ∅c = ∅ and Xc = X.

2. A ⊆ Ac, and (A ∪B)c = Ac ∪Bc.

3. (Ac)c = Ac.

Thus the operator leaves the empty set and the whole set alone, the closure of a union is the union of the closures, and the operator is idempotent. One sees immediately that the operator which assigns to each set its closure with respect to a given topology is such a closure operator. It is also quite evident that a closure operator is monotone: assume that A ⊆ B; then B = A ∪ (B \ A), so that Bc = Ac ∪ (B \ A)c ⊇ Ac.

Example 3.1.20 Let (D, ≤) be a finite partially ordered set. We put ∅c := ∅ and Dc := D; moreover,

xc := {y ∈ D | y ≤ x}

is defined for x ∈ D, and Ac := ⋃x∈A xc for subsets A of D. Then this is a closure operator. It is enough to check whether (xc)c = xc holds. In fact, we have

z ∈ (xc)c ⇔ z ∈ yc for some y ∈ xc
⇔ there exists y ≤ x with z ≤ y
⇔ z ≤ x
⇔ z ∈ xc.

Thus we associate with each finite partially ordered set a closure operator, which assigns to each A ⊆ D its down set. The map x ↦ xc embeds D into a distributive lattice, see the discussion in Section 1.5.6. ♦
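For a finite poset the down-set operator can be tested mechanically against the three closure axioms. A small sketch; the poset (the divisors of 12 under divisibility) is illustrative data.

    # Sketch: the down-set closure operator of Example 3.1.20,
    # checked against the closure axioms on an illustrative poset.

    from itertools import combinations

    D = [1, 2, 3, 4, 6, 12]            # divisors of 12; y <= x iff y divides x

    def closure(A):
        """A^c: the down set of A."""
        return frozenset(y for y in D for x in A if x % y == 0)

    subsets = [frozenset(c) for r in range(len(D) + 1)
               for c in combinations(D, r)]

    assert closure(frozenset()) == frozenset()
    assert closure(frozenset(D)) == frozenset(D)
    for A in subsets:
        assert A <= closure(A)                        # expansive
        assert closure(closure(A)) == closure(A)      # idempotent
        for B in subsets:
            assert closure(A | B) == closure(A) | closure(B)
    print("all closure axioms hold")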

We will show now that we can obtain a topology by calling open all those sets whose complements remain fixed under the closure operator; in addition, it turns out that the topological closure and the one from the closure operator are the same.

Theorem 3.1.21 Let ·c be a closure operator. Then

1. The set τ := {X \ F | F ⊆ X, Fc = F} is a topology.

2. For each set A ⊆ X we have Aa = Ac, with ·a as the closure in τ.

Proof 1. For establishing that τ is a topology, it is enough to show that τ is closed under arbitrary unions, since the other properties are evident. Let G0 ⊆ τ, and put G := ⋃G0, so we want to know whether (X \ G)c = X \ G. If H ∈ G0, then X \ G ⊆ X \ H, so (X \ G)c ⊆ (X \ H)c = X \ H by monotonicity, thus (X \ G)c ⊆ ⋂H∈G0 (X \ H) = X \ G. Since X \ G ⊆ (X \ G)c always holds, it follows that (X \ G)c = X \ G, hence τ is in fact closed under arbitrary unions, and it is a topology.

2. Given A ⊆ X,

Aa = ⋂{F ⊆ X | F is closed and A ⊆ F},

and Ac takes part in the intersection, so that Aa ⊆ Ac. On the other hand, A ⊆ Aa, thus Ac ⊆ (Aa)c = Aa, since Aa is closed by part 1. Consequently, Aa and Ac are the same. ∎


It is at first sight a bit surprising that a topology can be described by finitary means, although arbitrary unions are involved for the topology. But we should not forget that we have also the subset relation at our disposal. Nevertheless, a residue of surprise remains.

3.1.2 Neighborhood Filters

The last method for describing a topology we discuss here also deals with some order properties. Assume that we assign to each x ∈ X, where X is a given carrier set, a filter U(x) ⊆ P(X) with the property that x ∈ U holds for each U ∈ U(x). Thus U(x) has these properties:

1. x ∈ U for all U ∈ U(x).

2. If U, V ∈ U(x), then U ∩ V ∈ U(x).

3. If U ∈ U(x) and U ⊆ V , then V ∈ U(x).

It is fairly clear that, given a topology τ on X, the neighborhood filter

Uτ(x) := {V ⊆ X | there exists U ∈ τ with x ∈ U and U ⊆ V}

for x has these properties. It has also an additional property, which we will discuss shortly — for dramaturgical reasons.

Such a system of special filters defines a topology. We declare all those sets as open which belong to the neighborhoods of their elements. So if we take all balls in Euclidean R3 as the basis for a filter and assign to each point the balls which it centers, then the sphere of radius 1 around the origin would not be open (intuitively, it does not contain an open ball). So this appears to be an appealing idea. In fact:

Proposition 3.1.22 Let {U(x) | x ∈ X} be a family of filters such that x ∈ U for all U ∈ U(x). Then

τ := {U ⊆ X | U ∈ U(x) whenever x ∈ U}

defines a topology on X.

Proof We have to establish that τ is closed under finite intersections, since the other properties are fairly straightforward. Now, let U and V be open, and take x ∈ U ∩ V. We know that U ∈ U(x), since U is open, and we have V ∈ U(x) for the same reason. Since U(x) is a filter, it is closed under finite intersections, hence U ∩ V ∈ U(x), thus U ∩ V is open. ∎

We cannot, however, be sure that the neighborhood filter Uτ(x) for this new topology is the same as the given one. Intuitively, the reason is that we do not know whether we can find for U ∈ U(x) an open V ∈ U(x) with V ⊆ U such that V ∈ U(y) for all y ∈ V. To illustrate, look at R3, and take the neighborhood filter of, say, 0 in the Euclidean topology. Put for simplicity

‖x‖ := √(x1² + x2² + x3²).

Let U ∈ U(0); then we can find an open ball V ∈ U(0) with V ⊆ U. In fact, assume U = {a | ‖a‖ < q}. Take z ∈ U; then we can find r > 0 such that the ball V := {y | ‖y − z‖ < r} is entirely contained in U (select r < q − ‖z‖; for z = 0 this gives V ∈ U(0)). Now let y ∈ V, and let 0 < t < r − ‖z − y‖; then {a | ‖a − y‖ < t} ⊆ V, since ‖a − z‖ ≤ ‖a − y‖ + ‖z − y‖ < r. Hence V ∈ U(y), and thus U ∈ U(y), for all y ∈ V.

We obtain now as a simple corollary

Corollary 3.1.23 Let {U(x) | x ∈ X} be a family of filters such that x ∈ U for all U ∈ U(x), and assume that for any U ∈ U(x) there exists V ∈ U(x) with V ⊆ U and U ∈ U(y) for all y ∈ V. Then {U(x) | x ∈ X} coincides with the family of neighborhood filters for the topology defined by this family. ∎
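Proposition 3.1.22 is also directly executable on a finite carrier: a set is open iff it belongs to the assigned filter of each of its points. In the sketch below the three-point set and the principal filters are illustrative assumptions; they satisfy the condition of Corollary 3.1.23, so the filters are recovered as neighborhood filters.

    # Sketch for Proposition 3.1.22: open sets are those sets that belong
    # to the filter of each of their points. Data illustrative.

    from itertools import combinations

    X = ['x', 'y', 'z']
    subsets = [frozenset(c) for r in range(len(X) + 1)
               for c in combinations(X, r)]

    # assign to each point the principal filter over a chosen core set
    core = {'x': frozenset({'x'}),
            'y': frozenset({'x', 'y'}),
            'z': frozenset({'x', 'y', 'z'})}
    U = {p: [S for S in subsets if core[p] <= S] for p in X}

    tau = [S for S in subsets if all(S in U[p] for p in S)]
    print(sorted(map(sorted, tau)))
    # [[], ['x'], ['x', 'y'], ['x', 'y', 'z']]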

In what follows, unless otherwise stated, U(x) will denote the neighborhood filter of a point x in a topological space X.

Example 3.1.24 Let L := {1, 2, 3, 6} be the set of all divisors of 6, and define x ≤ y iff x divides y. The Hasse diagram of this order has 1 at the bottom, 2 and 3 in the middle, and 6 at the top.

Let us compute — just for fun — the topology associated with this partial order, and a basis for the neighborhood filters for each element. The topology can be seen from the table below (we have used that Ao = X \ (X \ A)a, see page 42):

set            closure        interior
{1}            {1}            ∅
{2}            {1, 2}         ∅
{3}            {1, 3}         ∅
{6}            {1, 2, 3, 6}   {6}
{1, 2}         {1, 2}         ∅
{1, 3}         {1, 3}         ∅
{1, 6}         {1, 2, 3, 6}   {6}
{2, 3}         {1, 2, 3}      ∅
{2, 6}         {1, 2, 3, 6}   {2, 6}
{3, 6}         {1, 2, 3, 6}   {3, 6}
{1, 2, 3}      {1, 2, 3}      ∅
{1, 2, 6}      {1, 2, 3, 6}   {2, 6}
{1, 3, 6}      {1, 2, 3, 6}   {3, 6}
{2, 3, 6}      {1, 2, 3, 6}   {2, 3, 6}
{1, 2, 3, 6}   {1, 2, 3, 6}   {1, 2, 3, 6}

This is the topology:

τ = {∅, {6}, {2, 6}, {3, 6}, {2, 3, 6}, {1, 2, 3, 6}}.

A basis for the respective neighborhood filters is given in this table:


element   basis
1         {1, 2, 3, 6}
2         {2, 6}, {1, 2, 3, 6}
3         {3, 6}, {2, 3, 6}, {1, 2, 3, 6}
6         {6}, {2, 6}, {3, 6}, {2, 3, 6}, {1, 2, 3, 6}

♦
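The whole example can be recomputed mechanically; the following sketch derives the topology from the divisibility order and then reads off, for each element, its smallest open neighborhood, which on a finite space always yields a one-element basis for the neighborhood filter.

    # Sketch: recompute Example 3.1.24. Closed sets are the down sets,
    # so the open sets are exactly the up sets of the divisibility order.

    from itertools import combinations

    L = [1, 2, 3, 6]
    down = lambda A: frozenset(y for y in L for x in A if x % y == 0)

    subsets = [frozenset(c) for r in range(len(L) + 1)
               for c in combinations(L, r)]
    tau = [frozenset(L) - F for F in subsets if down(F) == F]
    print(sorted(map(sorted, tau)))
    # [[], [6], [2, 6], [3, 6], [2, 3, 6], [1, 2, 3, 6]]

    for x in L:
        smallest = min((O for O in tau if x in O), key=len)
        print(x, sorted(smallest))
    # 1 [1, 2, 3, 6] / 2 [2, 6] / 3 [3, 6] / 6 [6]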

The next example deals with topological groups, i.e., topological spaces which also carry a group structure rendering the group operations continuous. Here the neighborhood structure is fairly uniform — if you know the neighborhood filter of the neutral element, you know the neighborhood filter of each element, because you obtain them by a left shift or a right shift.

Example 3.1.25 Let (G, ·) be a group, and τ be a topology on G such that the map 〈x, y〉 ↦ xy−1 is continuous. Then (G, ·, τ) is called a topological group. We will write down a topological group as G; the group operations and the topology will not be mentioned. The neutral element is denoted by e; multiplication will usually be omitted. Given a subset U of G, define gU := {gh | h ∈ U} and Ug := {hg | h ∈ U} for g ∈ G.

Let us examine the algebraic operations in a group. Put ζ(x, y) := xy−1; then the map ξ : g ↦ g−1 which maps each group element to its inverse is just g ↦ ζ(e, g), hence the cut of a continuous map, so it is continuous as well. ξ is a bijection with ξ ◦ ξ = idG, so it is in fact a homeomorphism. We obtain multiplication as xy = ζ(x, ξ(y)), so multiplication is also continuous. Fix g ∈ G; then multiplication λg : x ↦ gx from the left and ρg : x ↦ xg from the right are continuous. Now both λg and ρg are bijections whose inverses λg−1 and ρg−1 are continuous as well, thus λg and ρg are homeomorphisms for every g ∈ G.

Thus we have in a topological group this characterization of the neighborhood filter for every g ∈ G:

U(g) = {gU | U ∈ U(e)} = {Ug | U ∈ U(e)}.

In fact, let U be a neighborhood of g; then its inverse image under λg, namely g−1U, is a neighborhood of e, and so is its inverse image Ug−1 under ρg. Conversely, a neighborhood V of e determines the neighborhoods gV (the inverse image of V under λg−1) resp. V g (the inverse image of V under ρg−1) of g. ♦

3.2 Filters and Convergence

The relationship between topologies and filters turns out to be fairly tight, as we saw when discussing the neighborhood filter of a point. We saw also that we can actually grow a topology from a suitable family of neighborhood filters. This relationship is even closer, as we will discuss now when having a look at convergence.

Let (xn)n∈N be a sequence in R which converges to x ∈ R. This means that for any given open neighborhood U of x there exists an index n ∈ N such that {xm | m ≥ n} ⊆ U, so all members of the sequence having an index larger than n are members of U. Now consider the filter F generated by the set {{xm | m ≥ n} | n ∈ N} of tails. The condition above says exactly that U(x) ⊆ F, if you think a bit about it. This leads to the definition of convergence in terms of filters.


Definition 3.2.1 Let X be a topological space, F a filter on X. Then F converges to a limit x ∈ X iff U(x) ⊆ F. This is denoted by F → x.

Plainly, U(x) → x for every x. Note that the definition above does not force the limit to be uniquely determined. If two different points x, y share their neighborhood filter, then F → x iff F → y. Look again at Example 3.1.24. There all neighborhood filters are contained in U(6), so that we have U(6) → t for t ∈ {1, 2, 3, 6}. It may seem that the definition of convergence through a filter is too involved (after all, being a filter should not be taken on a light shoulder!). In fact, sometimes convergence is defined through a net as follows. Let (I, ≤) be a directed set, i.e., ≤ is a partial order such that, given i, j ∈ I, there exists k with i ≤ k and j ≤ k. An I-indexed family (xi)i∈I is said to converge to a point x iff, given a neighborhood U ∈ U(x), there exists k ∈ I such that xi ∈ U for all i ≥ k. This generalizes the concept of convergence from sequences to index sets of arbitrary size. But look at this: the sets {xj | j ≥ i} for i ∈ I form a filter base, because (I, ≤) is directed. The corresponding filter converges to x iff the net converges to x.

But what about the converse? Take a filter F on X; then F1 ≤ F2 iff F2 ⊆ F1 renders (F, ≤) a directed set. In fact, given F1, F2 ∈ F, we have F1 ≤ F1 ∩ F2 and F2 ≤ F1 ∩ F2. Now pick xF ∈ F for each F ∈ F. Then the net (xF)F∈F converges to x iff F → x. Assume that F → x; take U ∈ U(x), then U ∈ F, thus if F ∈ F with F ≥ U, then F ⊆ U, hence xF ∈ U for all such F. Conversely, if each net (xF)F∈F derived from F converges to x, then for a given U ∈ U(x) there exists F0 such that xF ∈ U whenever F ≥ F0. Since xF has been chosen arbitrarily from F, this can only hold if F ⊆ U for all F ≥ F0; in particular F0 ⊆ U, so that U ∈ F. Because U ∈ U(x) was arbitrary, we conclude U(x) ⊆ F.

Hence we find that filters offer a uniform generalization.

The argument above shows that we may select the elements xF from a base for F. If the filter has a countable base, we construct in this way a sequence; conversely, the filter constructed from a sequence has a countable base. Thus the convergence of sequences and the convergence of filters with a countable base are equivalent concepts.

We investigate the characterization of the topological closure in terms of filters. In order to do this, we need to be able to restrict a filter to a set, i.e., to look at the footstep the filter leaves on the set, hence at

F ∩ A := {F ∩ A | F ∈ F}.

This is what we will do now.

Lemma 3.2.2 Let X be a set, and F be a filter on X. Then F ∩ A is a filter on A iff F ∩ A ≠ ∅ for all F ∈ F.

Proof Since a filter must not contain the empty set, the condition is necessary. But it is also sufficient, because it makes sure that the laws of a filter are satisfied. ∎

Looking at F ∩ A for an ultrafilter F, we know that either A ∈ F or X \ A ∈ F; so if F ∩ A ≠ ∅ holds for all F ∈ F, then this implies that A ∈ F. Thus we obtain


Corollary 3.2.3 Let X be a set, and F be an ultrafilter on X. Then F ∩ A is a filter iff A ∈ F. Moreover, in this case F ∩ A is an ultrafilter on A.

Proof It remains to show that F ∩ A is an ultrafilter on A, provided F ∩ A is a filter. Let B ∉ F ∩ A for some subset B ⊆ A. Since A ∈ F, we conclude B ∉ F, thus X \ B ∈ F, since F is an ultrafilter. Thus (X \ B) ∩ A = A \ B ∈ F ∩ A, so F ∩ A is an ultrafilter by Lemma 1.5.21. ∎

From Lemma 3.2.2 we obtain a simple and elegant characterization of the topological closure of a set.

Proposition 3.2.4 Let X be a topological space, A ⊆ X. Then x ∈ Aa iff U(x) ∩ A is a filter on A. Thus x ∈ Aa iff there exists a filter F on A with F → x.

Proof We know from the definition of Aa that x ∈ Aa iff U ∩ A ≠ ∅ for all U ∈ U(x). This is by Lemma 3.2.2 equivalent to U(x) ∩ A being a filter on A. ∎

We know from calculus that continuous functions preserve convergence, i.e., if xn → x and f is continuous, then f(xn) → f(x). We want to carry this over to the world of filters. For this, we have to define the image of a filter. Let F be a filter on a set X, and f : X → Y a map; then

f(F) := {B ⊆ Y | f−1[B] ∈ F}

is a filter on Y. In fact, ∅ ∉ f(F), and, since f−1 preserves the Boolean operations, f(F) is closed under finite intersections. Let B ∈ f(F) and B ⊆ B′. Since f−1[B] ∈ F and f−1[B] ⊆ f−1[B′], we conclude f−1[B′] ∈ F, so that B′ ∈ f(F). Hence f(F) is also upward closed, so that it is in fact a filter.

This is an easy representation through the direct image.

Lemma 3.2.5 Let f : X → Y be a map, F a filter on X; then f(F) equals the filter generated by {f[A] | A ∈ F}.

Proof Because f[A1 ∩ A2] ⊆ f[A1] ∩ f[A2], the set G0 := {f[A] | A ∈ F} is a filter base. Denote by G the filter generated by G0.

We claim that f(F) = G.

“⊆”: Assume that B ∈ f(F), hence f−1[B] ∈ F. Since f[f−1[B]] ⊆ B, we conclude that B is contained in the filter generated by G0, hence in G.

“⊇”: If B ∈ G0, we find A ∈ F with B = f[A], hence A ⊆ f−1[f[A]] = f−1[B] ∈ F, so that B ∈ f(F). This implies the desired inclusion, since f(F) is a filter.

This establishes the desired equality and proves the claim. ∎
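For finite carriers both descriptions of the image filter can be compared exhaustively. A small sketch with made-up data; the filter F below is the principal filter of {1, 2}.

    # Sketch for Lemma 3.2.5: {B | f^{-1}[B] in F} coincides with the
    # filter generated by the direct images {f[A] | A in F}. Data made up.

    from itertools import combinations

    X, Y = [1, 2, 3, 4], ['a', 'b', 'c']
    f = {1: 'a', 2: 'a', 3: 'b', 4: 'c'}

    def powerset(S):
        return [frozenset(c) for r in range(len(S) + 1)
                for c in combinations(S, r)]

    F = [A for A in powerset(X) if frozenset({1, 2}) <= A]   # principal filter

    pre = lambda B: frozenset(x for x in X if f[x] in B)
    image_filter = {B for B in powerset(Y) if pre(B) in F}

    direct = [frozenset(f[x] for x in A) for A in F]         # the filter base G_0
    generated = {B for B in powerset(Y) if any(G <= B for G in direct)}

    assert image_filter == generated
    print(sorted(map(sorted, image_filter)))   # the principal filter of {'a'}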

We see also that not only the filter property is transported through maps, but also the property of being an ultrafilter.

Lemma 3.2.6 Let f : X → Y be a map, F an ultrafilter on X. Then f(F) is an ultrafilter on Y.

Proof It is by Lemma 1.5.21 enough to show that if f(F) does not contain a set, it will contain its complement, since f(F) is already known to be a filter. In fact, assume that


H ∉ f(F), so that f−1[H] ∉ F. Since F is an ultrafilter, we know that X \ f−1[H] ∈ F; but X \ f−1[H] = f−1[Y \ H], so that Y \ H ∈ f(F). ∎

Example 3.2.7 Let X be the product of the topological spaces (Xi)i∈I with projections πi : X → Xi. For a filter F on X, we have πj(F) = {Aj ⊆ Xj | Aj × ∏i≠j Xi ∈ F}. ♦

Continuity preserves convergence:

Proposition 3.2.8 Let X and Y be topological spaces, and f : X → Y a map.

1. If f is continuous, and F a filter on X, then F→ x implies f(F)→ f(x) for all x ∈ X.

2. If F→ x implies f(F)→ f(x) for all x ∈ X and all filters F on X, then f is continuous.

Proof Let V ∈ U(f(x)); then there exists an open U ∈ U(f(x)) with U ⊆ V. Since f−1[U] ∈ U(x) ⊆ F, we conclude U ∈ f(F), hence V ∈ f(F). Thus U(f(x)) ⊆ f(F), which means that f(F) → f(x) indeed. This establishes the first part.

Now assume that F → x implies f(F) → f(x) for all x ∈ X and an arbitrary filter F on X. Let V ⊆ Y be open. Given x ∈ f−1[V], we find an open set U with x ∈ U ⊆ f−1[V] in the following way. Because x ∈ f−1[V], we know f(x) ∈ V. Since U(x) → x, we obtain from the assumption that f(U(x)) → f(x), thus U(f(x)) ⊆ f(U(x)). Because V ∈ U(f(x)), it follows that f−1[V] ∈ U(x), hence we find an open set U with x ∈ U ⊆ f−1[V]. Consequently, f−1[V] is open in X. ∎

Thus continuity and filters cooperate in a friendly manner.

Proposition 3.2.9 Assume that X carries the initial topology with respect to a family (fi : X → Xi)i∈I of functions. Then F → x iff fi(F) → fi(x) for all i ∈ I.

Proof Proposition 3.2.8 shows that the condition is necessary. Assume that fi(F) → fi(x) for every i ∈ I, and let τi be the topology on Xi. The sets

{fi1−1[Gi1] ∩ … ∩ fik−1[Gik] | i1, …, ik ∈ I, fi1(x) ∈ Gi1 ∈ τi1, …, fik(x) ∈ Gik ∈ τik, k ∈ N}

form a base for the neighborhood filter of x in the initial topology. Thus, given an open neighborhood U of x, we have fi1−1[Gi1] ∩ … ∩ fik−1[Gik] ⊆ U for a suitable finite set of indices. Since fij(F) → fij(x), we infer Gij ∈ fij(F), hence fij−1[Gij] ∈ F for 1 ≤ j ≤ k, thus U ∈ F. This means U(x) ⊆ F. Hence F → x, as asserted. ∎

We know that in a product a sequence converges iff its components converge. This is the counterpart for filters:

Corollary 3.2.10 Let X = ∏i∈I Xi be the product of the topological spaces (Xi)i∈I. Then F → (xi)i∈I in X iff Fi → xi in Xi for all i ∈ I, where Fi is the i-th projection πi(F) of F. ∎

The next observation further tightens the connection between topological properties and filters. It requires the existence of ultrafilters, so recall that we assume that the Axiom of Choice holds.

Theorem 3.2.11 Let X be a topological space. Then X is compact iff each ultrafilter converges.

Thus we tie compactness, i.e., the possibility to extract from each cover a finite subcover, to the convergence of ultrafilters. Hence an ultrafilter in a compact space cannot but converge.


The proof of Alexander’s Subbase Theorem 1.5.57 indicates already that there is a fairly close connection between the Axiom of Choice and topological compactness. This connection is tightened here.

Proof 1. Assume that X is compact, but that we find an ultrafilter F which fails to converge. Hence we can find for each x ∈ X an open neighborhood Ux of x which is not contained in F. Since F is an ultrafilter, X \ Ux ∈ F. Thus {X \ Ux | x ∈ X} ⊆ F is a collection of closed sets with ⋂x∈X (X \ Ux) = ∅. Since X is compact, we find a finite subset F ⊆ X such that ⋂x∈F (X \ Ux) = ∅. But X \ Ux ∈ F, and F is closed under finite intersections, hence ∅ ∈ F. This is a contradiction.

2. Assume that each ultrafilter converges. It is sufficient to show that each family H of closed sets, every finite subfamily of which has a non-empty intersection, has a non-empty intersection itself. Now the set {⋂H0 | H0 ⊆ H finite} of all finite intersections forms the base for a filter F0, which may be extended to an ultrafilter F by Theorem 1.5.43. By assumption, F → x for some x, hence U(x) ⊆ F. The point x is a candidate for being a member of the intersection. Assume the contrary. Then there exists H ∈ H with x ∉ H, so that x ∈ X \ H, which is open. Thus X \ H ∈ U(x) ⊆ F. On the other hand, H ∈ F0 ⊆ F, so that ∅ ∈ F. Thus we arrive at a contradiction, and x ∈ ⋂H. Hence ⋂H ≠ ∅. ∎

From Theorem 3.2.11 we obtain Tihonov’s celebrated theorem² as an easy consequence.

Theorem 3.2.12 (Tihonov’s Theorem) The product ∏i∈I Xi of topological spaces with Xi ≠ ∅ for all i ∈ I is compact iff each space Xi is compact.

Proof If the product X := ∏i∈I Xi is compact, then πi[X] = Xi is compact by Proposition 3.1.17. Let, conversely, F be an ultrafilter on X, and assume all Xi are compact. Then πi(F) is by Lemma 3.2.6 an ultrafilter on Xi for all i ∈ I, which converges to some xi by Theorem 3.2.11. Hence F → (xi)i∈I by Corollary 3.2.10. This implies the compactness of X by another application of Theorem 3.2.11. ∎

According to [Eng89, p. 146], Tihonov established the theorem for a product of an arbitrary number of closed and bounded intervals of the real line (we know from the Heine-Borel Theorem 1.5.46 that these intervals are compact). Kelley [Kel55, p. 143] gives a proof of the non-trivial implication of the theorem which relies on Alexander’s Subbase Theorem 1.5.57. It goes like this. It is sufficient to establish that, whenever we have a family of subbase elements no finite subfamily of which covers X, the whole family does not cover X either. The sets

{πi−1[U] | U ⊆ Xi open, i ∈ I}

form a subbase for the product topology of X. Let S be a family of sets taken from this subbase such that no finite subfamily of S covers X. Put Si := {U ⊆ Xi open | πi−1[U] ∈ S}; then Si is a family of open sets in Xi. Suppose Si contains sets U1, …, Uk which cover Xi; then πi−1[U1], …, πi−1[Uk] are elements of S which cover X. This is impossible, hence Si fails to contain a finite subfamily which covers Xi. Since Xi is compact, Si cannot cover Xi at all, so there exists a point xi ∈ Xi with xi ∉ ⋃Si. But then x := (xi)i∈I cannot be a member of ⋃S. Hence S does not cover X. This completes the proof.

Both proofs rely heavily on the Axiom of Choice, the first one through the existence of an ultrafilter extending a given filter, the second one through Alexander’s Subbase Theorem.

²“The Tychonoff Product Theorem concerning the stability of compactness under formation of topological products may well be regarded as the single most important theorem of general topology” according to H. Herrlich and G. E. Strecker, quoted from [Her06, p. 85].


The relationship of Tihonov’s Theorem to the Axiom of Choice is even closer: it can actually be shown that the theorem and the Axiom of Choice are equivalent [Her06, Theorem 4.68]; this requires, however, establishing the existence of topological products without any recourse to the infinite Cartesian product as a carrier.

We have defined above the concept of a limit point of a filter. A weaker concept is that of an accumulation point. Speaking in terms of sequences, an accumulation point of a sequence has the property that each neighborhood of the point contains infinitely many elements of the sequence. This carries over to filters in the following way.

Definition 3.2.13 Given a topological space X, the point x ∈ X is called an accumulation point of filter F iff U ∩ F ≠ ∅ for every U ∈ U(x) and every F ∈ F.

If F → x, then U(x) ⊆ F, so it is clear that x is an accumulation point of F. But a filter may fail to have an accumulation point at all. Consider the filter F over R which is generated by the filter base {]a, ∞[ | a ∈ R}; it is immediate that F does not have an accumulation point.

Let us have a look at a sequence (xn)n∈N, and the filter F generated by the infinite tails {{xm | m ≥ n} | n ∈ N}. If x is an accumulation point of the sequence, U ∩ {xm | m ≥ n} ≠ ∅ for every neighborhood U of x and every n, thus U ∩ F ≠ ∅ for all F ∈ F and all such U. Conversely, if x is an accumulation point for filter F, it is clear that the defining property holds also for the elements of the base of the filter, thus x is an accumulation point of the sequence. Hence we have found the “right” generalization from sequences to filters.

An easy characterization of the set of all accumulation points goes like this.

Lemma 3.2.14 The set of all accumulation points of filter F is exactly ⋂F∈F Fa.

Proof This follows immediately from the observation that x ∈ Aa iff U ∩ A ≠ ∅ for each neighborhood U ∈ U(x). ∎

The lemma has an interesting consequence for the characterization of compact spaces through filters:

Corollary 3.2.15 X is compact iff each filter on X has an accumulation point.

Proof Let F be a filter in a compact space X, and assume that F does not have an accumulation point. Lemma 3.2.14 implies that ⋂F∈F Fa = ∅. Since X is compact, we find F1, …, Fn ∈ F with F1a ∩ … ∩ Fna = ∅. Thus F1 ∩ … ∩ Fn = ∅. But this set is a member of F, a contradiction.

Now assume that each filter has an accumulation point. It is by Theorem 3.2.11 enough to show that every ultrafilter F converges. An accumulation point x of an ultrafilter F is in fact a limit: assume that F does not converge to x; then there exists V ∈ U(x) with V ∉ F, hence X \ V ∈ F. But V ∩ F ≠ ∅ for all F ∈ F, since x is an accumulation point; taking F := X \ V yields a contradiction. ∎

This is a characterization of accumulation points in terms of converging filters.

Lemma 3.2.16 In a topological space X, the point x ∈ X is an accumulation point of filter F iff there exists a filter F0 with F ⊆ F0 and F0 → x.



Proof Let x be an accumulation point of F; then {U ∩ F | U ∈ U(x), F ∈ F} is a filter base. Let F0 be the filter generated by this base; then F ⊆ F0, and certainly U(x) ⊆ F0, thus F0 → x.

Conversely, let F ⊆ F0 → x. Since U(x) ⊆ F0 follows, we conclude U ∩ F ≠ ∅ for all neighborhoods U and all elements F ∈ F, for otherwise we would have ∅ = U ∩ F ∈ F0 for some U ∈ U(x) and F ∈ F, which contradicts ∅ ∉ F0. Thus x is indeed an accumulation point of F. ∎

3.3 Separation Properties

We see from Example 3.1.24 that a filter may converge to more than one point. This may be undesirable. Think of a filter which is based on a sequence, where each element of the sequence indicates an approximation step. Then you want the approximation to converge, but the result of this approximation process should be unique. We will have a look at this question, and we will see that this is actually a special case of a separation property.

Proposition 3.3.1 Given a topological space X, the following properties are equivalent:

1. If x ≠ y are different points in X, there exist U ∈ U(x) and V ∈ U(y) with U ∩ V = ∅.

2. The limit of a converging filter is uniquely determined.

3. {x} = ⋂{Ua | U ∈ U(x)} for all points x.

4. The diagonal ∆ := {〈x, x〉 | x ∈ X} is closed in X × X.

Proof

1 ⇒ 2: If F → x and F → y with x ≠ y, we have U ∩ V ∈ F for all U ∈ U(x) and V ∈ U(y), hence ∅ ∈ F. This is a contradiction.

2 ⇒ 3: Let y ∈ ⋂{Ua | U ∈ U(x)}; then y is an accumulation point of U(x) by Lemma 3.2.14. Hence there exists a filter F with U(x) ⊆ F → y by Lemma 3.2.16. Since also F → x, uniqueness of the limit yields x = y.

3 ⇒ 4: Let 〈x, y〉 ∉ ∆; then by (3) there exists a closed neighborhood W of x with y ∉ W. Let U ∈ U(x) be open with U ⊆ W, and put V := X \ W; then 〈x, y〉 ∈ U × V, (U × V) ∩ ∆ = ∅, and U × V is open in X × X.

4 ⇒ 1: If 〈x, y〉 ∈ (X × X) \ ∆, there exist open sets U ∈ U(x) and V ∈ U(y) with (U × V) ∩ ∆ = ∅, hence U ∩ V = ∅. ∎

Looking at the proposition, we see that having a unique limit for a filter is tantamount to being able to separate two different points through disjoint open neighborhoods. Because these spaces are important, they deserve a special name.

Definition 3.3.2 A topological space is called a Hausdorff space iff any two different points in X can be separated by disjoint open neighborhoods, i.e., iff condition (1) in Proposition 3.3.1 holds. Hausdorff spaces are also called T2-spaces.

Example 3.3.3 Let X := R, and define a topology through the base {[a, b[ | a, b ∈ R, a < b}. Then this is a Hausdorff space. This space is sometimes called the Sorgenfrey line. ♦


Being Hausdorff can be determined from neighborhood filters:

Lemma 3.3.4 Let X be a topological space. Then X is a Hausdorff space iff each x ∈ X has a base U0(x) for its neighborhood filter such that for any x ≠ y there exist U ∈ U0(x) and V ∈ U0(y) with U ∩ V = ∅. ∎

A first and easy consequence for maps into a Hausdorff space follows, viz., that the set of arguments on which two continuous maps coincide is closed.

Corollary 3.3.5 Let X, Y be topological spaces, and f, g : X → Y continuous maps. If Y is a Hausdorff space, then {x ∈ X | f(x) = g(x)} is closed.

Proof The map t : x ↦ 〈f(x), g(x)〉 is a continuous map X → Y × Y. Since ∆ ⊆ Y × Y is closed by Proposition 3.3.1, the set t−1[∆] is closed. But this is just the set in question. ∎

The reason for calling a Hausdorff space a T2-space³ will become clear once we have discussed other ways of separating points and sets; then T2 will be a point in a spectrum denoting separation properties. For the moment, we introduce two other separation properties which deal with the possibility of distinguishing two different points through open sets. Let for this X be a topological space.

T0-space: X is called a T0-space iff, given two different points x and y, there exists an open set U which contains exactly one of them.

T1-space: X is called a T1-space iff, given two different points x and y, there exist open neighborhoods U of x and V of y with y ∉ U and x ∉ V.

The following examples demonstrate these spaces.

Example 3.3.6 Let X := R, and define a topology on the real numbers through

τ< := {∅, R} ∪ {]−∞, a[ | a ∈ R}.

Then τ< is a T0-topology: given x < y, the open set ]−∞, y[ contains x but not y. It is not T1, however, since every non-empty open set containing y also contains every x < y. (The family τ≤ := {∅, R} ∪ {]−∞, a] | a ∈ R}, by the way, is not even a topology, since it is not closed under arbitrary unions: ⋃a<0 ]−∞, a] = ]−∞, 0[ ∉ τ≤.) ♦

This is an easy characterization of T1-spaces.

Proposition 3.3.7 A topological space X is a T1-space iff {x} is closed for all x ∈ X.

Proof Let y ∈ {x}a; then every open neighborhood of y contains x. But this can happen in a T1-space only if x = y. Conversely, if singletons are closed and y ≠ x, then X \ {y} is an open neighborhood of x which does not contain y, and x is not in the open set X \ {x}, which contains y. ∎

Example 3.3.8 Let X be a set with at least two points, and let x0 ∈ X be fixed. Put ∅c := ∅ and Ac := A ∪ {x0} for A ≠ ∅. Then ·c is a closure operator; we look at the associated topology. Since {x} is open for x ≠ x0, X is a T0-space, and since {x} is not closed for x ≠ x0, X is not T1. ♦

³T stands for the German Trennung, i.e., separation.


Example 3.3.9 Let (D, ≤) be a partially ordered set. The topology associated with the closure operator for this order according to Example 3.1.20 is T1 iff y ≤ x ⇔ x = y, because this is what xc = {x} says. ♦

Example 3.3.10 Let X := N, and τ := {A ⊆ N | A is cofinite} ∪ {∅}. Recall that a cofinite set is defined as having a finite complement. Then τ is a topology on X such that X \ {x} is open for each x ∈ X. Hence X is a T1-space. But X is not Hausdorff: if x ≠ y and U is an open neighborhood of x, then X \ U is finite. Thus if V is disjoint from U, we have V ⊆ X \ U. But then V cannot be a non-empty open set with y ∈ V. ♦

While the properties discussed so far deal with the relationship of two different points to each other, the next group of axioms looks at closed sets; given a closed set F, we call an open set U with F ⊆ U an open neighborhood of F. Let again X be a topological space.

T3-space: X is a T3-space iff given a point x and a closed set F which does not contain x, there exist disjoint open neighborhoods of x and of F.

T3½-space: X is a T3½-space iff given a point x and a closed set F with x ∉ F there exists a continuous function f : X → R with f(x) = 1 and f(y) = 0 for all y ∈ F.

T4-space: X is a T4-space iff two disjoint closed sets have disjoint open neighborhoods.

T3 and T4 deal with the possibility of separating a closed set from a point resp. another closed set; T3½ is squeezed in between these axioms. Because {x ∈ X | f(x) < 1/2} and {x ∈ X | f(x) > 1/2} are disjoint open sets, it is clear that each T3½-space is a T3-space. It is also clear that the defining property of T3½ is a special property of T4, provided singletons are closed. The relationship and further properties will be explored now.

It might be noted that continuous functions now play an important role in separating objects. T3½ entails among other things that there are “enough” continuous functions. Engelking [Eng89, p. 29 and 2.7.17] mentions that there are spaces which satisfy T3 but have only constant continuous functions, and comments “they are, however, fairly complicated ...” (p. 29); Kuratowski [Kur66, p. 121] makes a similar remark. So we will leave it at that and direct the reader who wants to know more to these sources and the papers quoted there.

We look at some examples.

Example 3.3.11 Let X := {1, 2, 3, 4}.

1. With the indiscrete topology {∅, X}, X is a T3-space, but it is neither T2 nor T1.

2. Take the topology {{1}, {1, 2}, {1, 3}, {1, 2, 3}, X, ∅}; then two closed sets are disjoint only when one of them is empty, because all of them contain the point 4 (with the exception of ∅, of course). Thus the space is T4. The point 1 and the closed set {4} cannot be separated by open sets, thus the space is not T3.

♦
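Finite examples like these can be checked exhaustively. A small sketch testing the second space of Example 3.3.11; the helper names are ad hoc.

    # Sketch: testing T1, T2, T3 and T4 for the finite space of
    # Example 3.3.11(2). Helper names are ad hoc.

    X = frozenset({1, 2, 3, 4})
    tau = [frozenset(s) for s in
           [(), (1,), (1, 2), (1, 3), (1, 2, 3), (1, 2, 3, 4)]]
    closed = [X - U for U in tau]

    def separated(A, B):
        """Are there disjoint open sets around A and B?"""
        return any(A <= U and B <= V and not (U & V)
                   for U in tau for V in tau)

    T1 = all(frozenset({x}) in closed for x in X)
    T2 = all(separated({x}, {y}) for x in X for y in X if x != y)
    T3 = all(separated({x}, F) for x in X for F in closed if x not in F)
    T4 = all(separated(F, G) for F in closed for G in closed if not (F & G))
    print(T1, T2, T3, T4)   # False False False True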

The next example displays a space which is T2 but not T3.

Example 3.3.12 Let X := R, and put Z := {1/n | n ∈ N}. Define in addition for x ∈ R and i ∈ N the sets Bi(x) := ]x − 1/i, x + 1/i[. Then U0(x) := {Bi(x) | i ∈ N} for x ≠ 0, and


U0(0) := {Bi(0) \ Z | i ∈ N} define neighborhood filters for a Hausdorff space by Lemma 3.3.4. But this is not a T3-space. One notes first that Z is closed: if x ∉ Z and x ∉ [0, 1], one certainly finds i ∈ N with Bi(x) ∩ Z = ∅, and if 0 < x < 1 with x ∉ Z, there exists k with 1/(k + 1) < x < 1/k, so taking 1/i less than the minimal distance of x to 1/k and 1/(k + 1), one has Bi(x) ∩ Z = ∅. If x = 0, each neighborhood contains an open set which is disjoint from Z. Now 0 lies in the closure of each open set U which contains Z, so we cannot separate 0 and Z by disjoint open sets. ♦

Just one positive message: the reals satisfy T3½.

Example 3.3.13 Let F ⊆ R be non-empty and closed; then

f(t) := inf{|t − y| / (1 + |t − y|) | y ∈ F}

defines a continuous function f : R → [0, 1] with z ∈ F ⇔ f(z) = 0. Thus, if x ∉ F, we have f(x) > 0, so that y ↦ f(y)/f(x) is a continuous function with the desired properties. Thus the reals with the usual topology are a T3½-space. ♦
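A quick numerical sanity check of this separating function; the closed set F and the sample points are chosen for illustration only, with the interval part of F approximated by a fine grid.

    # Sketch: the separating function of Example 3.3.13 for the
    # illustrative closed set F = [2, 3] united with {5}.

    def f(t, points):
        # r -> r/(1+r) is monotone, so the infimum is attained at the
        # point of F closest to t
        d = min(abs(t - y) for y in points)
        return d / (1 + d)

    F = [2 + k / 1000 for k in range(1001)] + [5.0]

    for t in (2.5, 5.0, 0.0, 4.0):
        print(t, round(f(t, F), 4))
    # 2.5 0.0 / 5.0 0.0 / 0.0 0.6667 / 4.0 0.5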

The next proposition is a characterization of T3-spaces in terms of open neighborhoods, motivated by the following observation. Take a point x ∈ R and an open set G ⊆ R with x ∈ G. Then there exists r > 0 such that the open interval ]x − r, x + r[ is entirely contained in G. But we can say more: by making this open interval a little bit smaller, we can actually fit a closed interval around x into the given neighborhood as well, so, for example, x ∈ ]x − r/2, x + r/2[ ⊆ [x − r/2, x + r/2] ⊆ ]x − r, x + r[ ⊆ G. Thus we find for the given neighborhood another neighborhood the closure of which is entirely contained in it.

Proposition 3.3.14 Let X be a topological space. Then the following are equivalent.

1. X is a T3-space.

2. For every point x and every open neighborhood U of x there exists an open neighborhood V of x with Va ⊆ U.

Proof 1 ⇒ 2: Let U be an open neighborhood of x; then x is not contained in the closed set X \ U, so by T3 we find disjoint open sets U1, U2 with x ∈ U1 and X \ U ⊆ U2, hence X \ U2 ⊆ U. Because U1 ⊆ X \ U2 ⊆ U, and X \ U2 is closed, we conclude U1a ⊆ U.

2 ⇒ 1: Assume that we have a point x and a closed set F with x ∉ F. Then x ∈ X \ F, so that X \ F is an open neighborhood of x. By assumption, there exists an open neighborhood V of x with x ∈ V ⊆ Va ⊆ X \ F; then V and X \ (Va) are disjoint open neighborhoods of x resp. F. ∎

This characterization can be generalized to T4-spaces (roughly, by replacing the point through a closed set) in the following way.

Proposition 3.3.15 Let X be a topological space. Then the following are equivalent.

1. X is a T4-space.

2. For every closed set F and every open neighborhood U of F there exists an open neighborhood V of F with F ⊆ V ⊆ Va ⊆ U.

The proof of this proposition is actually nearly a copy of the preceding one, mutatis mutandis.


Proof 1 ⇒ 2: Let U be an open neighborhood of the closed set F; then the closed set F′ := X \ U is disjoint from F, so that we can find disjoint open neighborhoods U1 of F and U2 of F′. Thus U1 ⊆ X \ U2 ⊆ X \ F′ = U, and since X \ U2 is closed, U1a ⊆ U; so V := U1 is the open neighborhood we are looking for.

2 ⇒ 1: Let F and F′ be disjoint closed sets; then X \ F′ is an open neighborhood of F. Let V be an open neighborhood of F with F ⊆ V ⊆ Va ⊆ X \ F′; then V and U := X \ (Va) are disjoint open neighborhoods of F and F′. ∎

We mentioned above that the separation axiom T3½ makes sure that there are enough continuous functions on the space. Actually, the continuous functions even determine the topology in this case, as the following characterization shows.

Proposition 3.3.16 Let X be a topological space; then the following statements are equivalent.

1. X is a T3½-space.

2. β := {f−1[U] | f : X → R is continuous, U ⊆ R is open} constitutes a base for the topology of X.

Proof The elements of β are open sets, since they are inverse images of open sets under continuous functions.

1 ⇒ 2: Let G ⊆ X be an open set with x ∈ G. We show that we can find B ∈ β with x ∈ B ⊆ G. In fact, since X is T3½, there exists a continuous function f : X → R with f(x) = 1 and f(y) = 0 for y ∈ X \ G. Then B := {z ∈ X | f(z) > 1/2} = f−1[]1/2, +∞[] is a suitable element of β (note that f(z) > 1/2 forces z ∈ G).

2 ⇒ 1: Take x ∈ X and a closed set F with x ∉ F. Then U := X \ F is an open neighborhood of x, so we can find G ⊆ R open and f : X → R continuous with x ∈ f−1[G] ⊆ U. Since G is the union of open intervals, we find an open interval I := ]a, b[ ⊆ G with f(x) ∈ I. Let g : R → R be continuous with g(f(x)) = 1 and g(t) = 0 if t ∉ I; such a function exists since R is a T3½-space (Example 3.3.13). Then g ◦ f is a continuous function with the desired properties. Consequently, X is a T3½-space. ∎

The separation axioms give rise to names for classes of spaces. We will introduce their traditional names now.

Definition 3.3.17 Let X be a topological space; then X is called

• regular iff X satisfies T1 and T3,

• completely regular iff X satisfies T1 and T3½,

• normal iff X satisfies T1 and T4.

The reason T1 is always included is that one wants to have every singleton as a closed set, which, as the examples above show, is not always the case. Each regular space is a Hausdorff space, each completely regular space is regular, and each normal space is regular. We will obtain as a consequence of Urysohn’s Lemma that each normal space is completely regular as well (Corollary 3.3.24).


In a completely regular space we can separate a point x from a closed set not containing x through a continuous function. It turns out that normal spaces have an analogous property: given two disjoint closed sets, we can separate these sets through a continuous function. This is what Urysohn’s Lemma says, a famous result from the beginnings of set-theoretic topology. To be precise:

Theorem 3.3.18 (Urysohn) Let X be a normal space. Given disjoint closed sets F0 and F1, there exists a continuous function f : X → R such that f(x) = 0 for x ∈ F0 and f(x) = 1 for x ∈ F1.

We need some technical preparations for proving Theorem 3.3.18; this gives also the opportunity to introduce the concept of a dense set.

Definition 3.3.19 A subset D ⊆ X of a topological space X is called dense iff Da = X.

Dense sets are fairly practical when it comes to comparing continuous functions for equality: it suffices that the functions coincide on a dense set; then they will be equal. Just for the record:

Lemma 3.3.20 Let f, g : X → Y be continuous maps with Y Hausdorff, and assume that D ⊆ X is dense. Then f = g iff f(x) = g(x) for all x ∈ D.

Proof Clearly, if f = g, then f(x) = g(x) for all x ∈ D. So we have to establish the other direction.

Because Y is a Hausdorff space, ∆Y := {〈y, y〉 | y ∈ Y} is closed (Proposition 3.3.1), and because f × g : X × X → Y × Y is continuous, (f × g)−1[∆Y] ⊆ X × X is closed as well. The latter set contains ∆D := {〈x, x〉 | x ∈ D}, hence also its closure, which in turn contains ∆X; thus f(x) = g(x) for all x ∈ X. ∎

It is immediate that if D is dense, then U ∩ D ≠ ∅ for each non-empty open set U, so in particular each neighborhood of a point meets the dense set D. To provide an easy example, both Q and R \ Q are dense subsets of R in the usual topology. Note that Q is countable, so R even has a countable dense set.

The first lemma has a family of subsets, indexed by a dense subset of R, exhaust a given set, and provides a useful real function.

Lemma 3.3.21 Let M be a set, D ⊆ R+ be dense, and (Et)t∈D be a family of subsets of M with these properties:

• if t < s, then Et ⊆ Es,

• M = ⋃t∈D Et.

Put f(m) := inf{t ∈ D | m ∈ Et}; then we have for all s ∈ R

1. {m | f(m) < s} = ⋃{Et | t ∈ D, t < s},

2. {m | f(m) ≤ s} = ⋂{Et | t ∈ D, t > s}.

Proof 1. Let us work on the first equality. If f(m) < s, there exists t ∈ D with t < s and m ∈ Et. Conversely, if m ∈ Et for some t < s, then f(m) = inf{r ∈ D | m ∈ Er} ≤ t < s.

2. For the second equality, assume f(m) ≤ s; then we can find for each r ∈ D with r > s some t < r with m ∈ Et ⊆ Er. To establish the other inclusion, assume that m ∈ Et, hence f(m) ≤ t, for all t ∈ D with t > s. If f(m) = r > s, we can find some t′ ∈ D with r > t′ > s, hence f(m) ≤ t′ < r. This is a contradiction, hence f(m) ≤ s. ∎

This lemma, which does not assume a topology on M, but requires only a plain set, is extended now for the topological scenario in which we will use it. We assume that each set Et is open, and that Et contains the closures of its predecessors. Then it will turn out that the function we have just defined is continuous; specifically:

Lemma 3.3.22 Let X be a topological space, D ⊆ R+ a dense subset, and assume that (Et)t∈D is a family of open sets with these properties:

• if t < s, then Eta ⊆ Es,

• X = ⋃t∈D Et.

Then f : x ↦ inf{t ∈ D | x ∈ Et} defines a continuous function on X.

Proof 0. Because a subbase for the topology on R is comprised of the intervals ]−∞, s[ resp. ]s, +∞[, we see from Lemma 3.1.9 that it is sufficient to show that for any s ∈ R the sets {x ∈ X | f(x) < s} and {x ∈ X | f(x) > s} are open, since they are the corresponding inverse images under f. For the latter set we show that its complement {x ∈ X | f(x) ≤ s} is closed. Fix s ∈ R.

1. We obtain from Lemma 3.3.21 that {x ∈ X | f(x) < s} equals ⋃{Et | t ∈ D, t < s}; since all sets Et are open, their union is open as well. Hence {x ∈ X | f(x) < s} is open.

2. We obtain again from Lemma 3.3.21 that {x ∈ X | f(x) ≤ s} equals ⋂{Et | t ∈ D, t > s}, so if we can show that ⋂{Et | t ∈ D, t > s} = ⋂{Eta | t ∈ D, t > s}, we are done. In fact, the left hand side is contained in the right hand side, so assume that x is an element of the right hand side. If x is not contained in the left hand side, we find t′ ∈ D with t′ > s such that x ∉ Et′. Because D is dense, we find some r ∈ D with s < r < t′, so that Era ⊆ Et′. But then x ∉ Era, hence x ∉ ⋂{Eta | t ∈ D, t > s}, a contradiction. Thus both sets are equal, so that {x ∈ X | f(x) ≤ s} is closed. ∎

We are now in a position to establish Urysohn’s Lemma. The idea of the proof rests on this observation for a T4-space X: suppose that we have open sets A and B with A ⊆ Aa ⊆ B. Then we can find an open set C such that Aa ⊆ C ⊆ Ca ⊆ B, see Proposition 3.3.15. Denote, just for this proof, for open sets A, B the fact that Aa ⊆ B by A ⊑∗ B. Then we may express the idea above by saying that A ⊑∗ B implies the existence of an open set C with A ⊑∗ C ⊑∗ B, so C may be squeezed in. But now we have A ⊑∗ C and C ⊑∗ B, so we find open sets E and F with A ⊑∗ E ⊑∗ C and C ⊑∗ F ⊑∗ B, arriving at the chain A ⊑∗ E ⊑∗ C ⊑∗ F ⊑∗ B. But why stop here?

The proof makes this argument systematic and constructs in this way a continuous function.

Proof (of Theorem 3.3.18) 1. Let D := {p/2^q | p, q non-negative integers}. These are the dyadic numbers, which are dense in R+. We are about to construct a family (Et)t∈D of open sets Et indexed by D in the following way.


2. Put Et := X for t > 1, and let E1 := X \ F1; moreover, let E0 be an open set containing F0 with E0 ⊑∗ E1 (such a set exists by Proposition 3.3.15, since E1 is an open neighborhood of the closed set F0). We now construct open sets Ep/2^n by induction on n in the following way. Assume that we have already constructed open sets

E0 ⊑∗ E1/2^(n−1) ⊑∗ E2/2^(n−1) ⊑∗ … ⊑∗ E(2^(n−1)−1)/2^(n−1) ⊑∗ E1.

Let t = (2m+1)/2^n; then we find an open set Et with E2m/2^n ⊑∗ Et ⊑∗ E(2m+2)/2^n; we do this for all m with 0 ≤ m ≤ 2^(n−1) − 1.

3. Look as an illustration at the case n = 3. We have already found the open sets E0 ⊑∗ E1/4 ⊑∗ E1/2 ⊑∗ E3/4 ⊑∗ E1. Then the construction goes on with finding open sets E1/8, E3/8, E5/8 and E7/8 such that after the step is completed, we obtain this chain:

E0 ⊑∗ E1/8 ⊑∗ E1/4 ⊑∗ E3/8 ⊑∗ E1/2 ⊑∗ E5/8 ⊑∗ E3/4 ⊑∗ E7/8 ⊑∗ E1.

4. In this way we construct a family (Et)t∈D with the properties requested by Lemma 3.3.22. It yields a continuous function f : X → R with f(x) = 0 for all x ∈ F0 and f(x) = 1 for all x ∈ F1. ∎

Urysohn’s Lemma is used to prove the Tietze Extension Theorem, which we will only state, but not prove.

Theorem 3.3.23 Let X be a T4-space, and f : A → R be a function which is continuous on a closed subset A of X. Then f can be extended to a continuous function f∗ on all of X. ∎

We obtain as an immediate consequence of Urysohn’s Lemma

Corollary 3.3.24 A normal space is completely regular.

We have obtained a hierarchy of spaces through gradually tightening the separation properties, and found that continuous functions help with the separation. The question arises how compactness fits into this hierarchy. It turns out that a compact Hausdorff space is normal; the converse obviously does not hold: the reals with the Euclidean topology are normal, but by no means compact.

We call a subset K of a topological space X compact iff it is compact as a subspace, i.e., a compact topological space in its own right. This is a first and fairly straightforward observation.

Lemma 3.3.25 A closed subset F of a compact space X is compact.

Proof Let (Gi ∩ F)i∈I be an open cover of F with Gi ⊆ X open; then {X \ F} ∪ {Gi | i ∈ I} is an open cover of X, so we can find a finite subset J ⊆ I such that {X \ F} ∪ {Gj | j ∈ J} covers X, hence {Gi ∩ F | i ∈ J} covers F. ∎

In a Hausdorff space, the converse holds as well:

Lemma 3.3.26 Let X be a Hausdorff space, and K ⊆ X compact; then

1. Given x ∉ K, there exist disjoint open neighborhoods U of x and V of K.

2. K is closed.


Proof Given x 6∈ K, we want to find U ∈ U(x) with U ∩ K = ∅ and V ⊇ K open withU ∩ V = ∅.

Let’s see how to do that. Since X is Hausdorff, there exist for x and any element y ∈ K disjoint open neighborhoods Uy ∈ U(x) and Wy ∈ U(y). Then (Wy)y∈K is an open cover of K, hence by compactness there exists a finite subset K0 ⊆ K such that {Wy | y ∈ K0} covers K. But then ⋂_{y∈K0} Uy is an open neighborhood of x which is disjoint from V := ⋃_{y∈K0} Wy, hence from K. V is the open neighborhood of K we are looking for. This establishes the first part; the second follows as an immediate consequence. a

Look at the reals as an illustrative example.

Corollary 3.3.27 A ⊆ R is compact iff it is closed and bounded.

Proof If A ⊆ R is compact, then it is closed by Lemma 3.3.26, since R is a Hausdorff space. A is also bounded: the open intervals ]−n, n[ for n ∈ N cover A, and a finite subcover yields a bound. If, conversely, A ⊆ R is closed and bounded, then we can find a closed interval [a, b] such that A ⊆ [a, b]. We know from the Heine-Borel Theorem 1.5.46 that this interval is compact, and a closed subset of a compact space is compact by Lemma 3.3.25. a

This has yet another, frequently used consequence, viz., that a continuous real valued function on a compact space assumes its minimal and its maximal value. Just for the record:

Corollary 3.3.28 Let X be a compact Hausdorff space, f : X → R a continuous map. Then there exist x_∗, x^∗ ∈ X with f(x_∗) = min f[X] and f(x^∗) = max f[X]. a

But — after travelling an interesting side path — let us return to the problem of establishing that a compact Hausdorff space is normal. We know now that we can separate a point from a compact subset through disjoint open neighborhoods. From here it is but a small step to the solution of the above problem.

Proposition 3.3.29 A compact Hausdorff space is normal.

Proof Let X be compact, A and B disjoint closed subsets. Then A and B are compact as well by Lemma 3.3.25. Now the rest is an easy application of Lemma 3.3.26: given x ∈ B, there exist disjoint open neighborhoods Ux ∈ U(x) of x and Vx of A. Since B is compact, we find a finite subset B0 ⊆ B such that U := ⋃{Ux | x ∈ B0} covers B; then V := ⋂{Vx | x ∈ B0} is an open neighborhood of A, and U and V are disjoint. a

From the point of view of separation, to be compact is a stronger property than being normal for a topological space. The example R shows that this is a strictly stronger property. We will show now that R is just one point apart from being compact by investigating locally compact spaces.

3.4 Local Compactness and Compactification

We restrict ourselves in this section to Hausdorff spaces. Sometimes a space is not compact but has enough compact subsets, because each point has a compact neighborhood. These spaces are called locally compact, and we investigate properties they share with compact spaces and properties that distinguish them from compact spaces. We show also that a locally compact space



misses being compact by just one point. Adding this point will make it compact, so we have an example here where we embed a space into one with a desired property. While we are compactifying spaces, we also provide another compactification, named after Stone and Cech, which requires the basic space to be completely regular. We establish also another classic, the Baire Theorem, which states that in a locally compact T3-space the intersection of a countable number of open dense sets is dense again; applications will later on capitalize on this observation.

Definition 3.4.1 Let X be a Hausdorff space. X is called locally compact iff for each x ∈ X and each open neighborhood U ∈ U(x) there exists a neighborhood V ∈ U(x) such that Vᵃ is compact and Vᵃ ⊆ U.

Thus the compact neighborhoods form a basis for the neighborhood filter of each point. This implies that we can find for each compact subset an open neighborhood with compact closure. The proof of this property gives an indication of how to argue in locally compact spaces.

Proposition 3.4.2 Let X be a locally compact space, K a compact subset. Then there exists an open neighborhood U of K and a compact set K′ with K ⊆ U ⊆ K′.

Proof Let x ∈ K, then we find an open neighborhood Ux ∈ U(x) with Uxᵃ compact. Then (Ux)x∈K is a cover of K, and there exists a finite subset K0 ⊆ K such that (Ux)x∈K0 covers K. Put U := ⋃_{x∈K0} Ux, and note that this open set has a compact closure. a

So this is not too bad: we have plenty of compact sets in a locally compact space. Such a space is very nearly compact. We add to X just one point, traditionally called ∞, and define the neighborhoods of ∞ in such a way that the resulting space is compact. The obvious way to do that is to make all complements of compact sets a neighborhood of ∞, because it will then be fairly easy to construct from a cover of the new space a finite subcover. This is what the compactification which we discuss now will do for you. We carry out the construction in a sequence of lemmas, just in order to render the process a bit more transparent.

Lemma 3.4.3 Let X be a Hausdorff space with topology τ, and let ∞ ∉ X be a distinguished new point. Put X∗ := X ∪ {∞}, and define


τ∗ := {U ⊆ X∗ | U ∩ X ∈ τ} ∪ {U ⊆ X∗ | ∞ ∈ U, X \ U is compact}.

Then τ∗ is a topology on X∗, and the inclusion iX : X → X∗ is τ-τ∗-continuous.

Proof ∅ and X∗ are obviously members of τ∗; note that X \ U being compact entails U ∩ X being open. Let U1, U2 ∈ τ∗. If ∞ ∈ U1 ∩ U2, then X \ (U1 ∩ U2) is the union of two compact sets in X, hence is compact. If ∞ ∉ U1 ∩ U2, then X ∩ (U1 ∩ U2) is open in X. Thus τ∗ is closed under finite intersections. Let (Ui)i∈I be a family of elements of τ∗. The critical case is that ∞ ∈ U := ⋃_{i∈I} Ui, say, ∞ ∈ Uj. But then X \ U ⊆ X \ Uj; since X \ U is closed in X and X \ Uj is compact, X \ U is compact as well, so that U ∈ τ∗. Continuity of iX is now immediate. a

We find X in this new construction as a subspace.

Corollary 3.4.4 (X, τ) is a dense subspace of (X∗, τ∗).

Proof We have to show that τ = {U ∩ X | U ∈ τ∗}, which is obvious from the definition of τ∗. For density, note that an open set containing ∞ has a compact complement in X; unless X itself is compact (in which case ∞ is an isolated point), such a set must intersect X, so X is dense in X∗. a



Now we can state and prove the result which has been announced above.

Theorem 3.4.5 Given a Hausdorff space X, the one point extension X∗ is a compact space in which X is dense. If X is locally compact, X∗ is a Hausdorff space.

Proof It remains to show that X∗ is compact, and that it is a Hausdorff space whenever X is locally compact.

Let (Ui)i∈I be an open cover of X∗, then ∞ ∈ Uj for some j ∈ I, thus X \ Uj is compact and is covered by (Ui)i∈I,i≠j. Select a finite subset J ⊆ I such that (Ui)i∈J covers X \ Uj; then — voila — we have found a finite cover (Ui)i∈J∪{j} of X∗.

Since the given space is Hausdorff, we have to separate the new point ∞ from a given point x ∈ X, provided X is locally compact. But take a compact neighborhood U of x; then the interior of U and X∗ \ U are disjoint open neighborhoods of x and ∞, respectively. a

X∗ is called the Alexandrov one point compactification of X. The new point is sometimes called the infinite point. It is not difficult to show that two different one point compactifications are homeomorphic, so we may talk about the (rather than a) one-point compactification.
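For a concrete instance of the construction, take X = N with the discrete topology: X is locally compact, and the compact subsets are exactly the finite ones, so openness in X∗ can be decided from finite data. A minimal Python sketch (our illustration, with an ad-hoc encoding of subsets of N ∪ {∞}):

INF = "∞"  # the new point

# Subsets of N ∪ {∞} are coded as ('finite', S): exactly the finite set S,
# or ('cofinite', E): everything except the finite set E.
def is_open_star(U):
    kind, data = U
    if kind == 'finite':
        # every subset of the discrete space N is open; a finite set
        # containing ∞ has an infinite, hence non-compact, complement in N
        return INF not in data
    # a cofinite set is open: either it misses ∞ and is open in N,
    # or it contains ∞ and its complement in N is finite, hence compact
    return True

print(is_open_star(('finite', {0, 1, 2})))   # True
print(is_open_star(('finite', {0, INF})))    # False
print(is_open_star(('cofinite', {0, 5})))    # True: a neighborhood of ∞

In this space X∗ the sequence 0, 1, 2, . . . converges to ∞, since every open set containing ∞ is cofinite.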

Looking at the map iX : X → X∗, which permits looking at elements of X as elements of X∗, we see that iX is injective and has the property that iX[G] is an open set in the image iX[X] of X in X∗, whenever G ⊆ X is open. These properties will be used for characterizing compactifications. Let us first define embeddings, which are of interest independently of this process.

Definition 3.4.6 The continuous map f : X → Y between the topological spaces X and Y is called an embedding iff

• f is injective,

• f[G] is open in f[X], whenever G ⊆ X is open.

So if f : X → Y is an embedding, we may recover a true image of X from its image f[X], so that f : X → f[X] is a homeomorphism.

Consider again the map [0, 1]^N → [0, 1]^M which is induced by a map f : M → N, and which we dealt with in Lemma 3.1.16. We will put this map to good use in a moment, so it is helpful to analyze it a bit more closely.

Example 3.4.7 Let f : M → N be a surjective map. Then f∗ : [0, 1]^N → [0, 1]^M, which sends g : N → [0, 1] to g ∘ f : M → [0, 1], is an embedding. We have to show that f∗ is injective, and that it maps open sets into open sets in the image. This is done in two steps:

f∗ is injective: In fact, if g1 ≠ g2, we find n ∈ N with g1(n) ≠ g2(n), and because f is onto, we find m with n = f(m), hence f∗(g1)(m) = g1(f(m)) ≠ g2(f(m)) = f∗(g2)(m). Thus f∗(g1) ≠ f∗(g2) (an alternative proof is proposed in Lemma 2.1.29 in a more general scenario).

Open sets are mapped to open sets: We know already from Lemma 3.1.16 that f∗ is continuous, so we have to show that the image f∗[G] of an open set G ⊆ [0, 1]^N is open in the subspace f∗[[0, 1]^N]. Let h ∈ f∗[G], hence h = f∗(g) for some g ∈ G. G is open, thus we can find a base element H of the product topology with g ∈ H ⊆ G, say, H = ⋂_{i=1}^k π_{N,n_i}^{-1}[Hi] for some n1, . . . , nk ∈ N and some open subsets H1, . . . , Hk of [0, 1]. Since f is onto, n1 = f(m1), . . . , nk = f(mk) for some m1, . . . , mk ∈ M. Since g ∈ π_{N,n_i}^{-1}[Hi] iff f∗(g) ∈ π_{M,m_i}^{-1}[Hi], we obtain

h = f∗(g) ∈ f∗[⋂_{i=1}^k π_{N,n_i}^{-1}[Hi]] = (⋂_{i=1}^k π_{M,m_i}^{-1}[Hi]) ∩ f∗[[0, 1]^N].

The latter set is open in the image of [0, 1]^N under f∗, so we have shown that the image of an open set is open relative to the subspace topology of the image.

These proofs will serve as patterns later on. ,

Given an embedding, we define the compactification of a space.

Definition 3.4.8 A pair (e, Y) is said to be a compactification of a topological space X iff Y is a compact topological space, and if e : X → Y is an embedding.

The pair (iX, X∗) constructed as the Alexandrov one-point compactification is a compactification in the sense of Definition 3.4.8, provided the space X is locally compact. We are about to construct another important compactification for a completely regular space X. Define for X the space βX as follows⁴: let F(X) be the set of all continuous maps X → [0, 1], and map x to its evaluations from F(X), so construct eX : X ∋ x ↦ (f(x))_{f∈F(X)} ∈ [0, 1]^{F(X)}. Then βX := (eX[X])ᵃ, the closure being taken in the compact space [0, 1]^{F(X)}. We claim that (eX, βX) is a compactification of X.

Before delving into the proof, we note that we want to have a completely regular space, since in these spaces we have enough continuous functions, e.g., to separate points, as will become clear shortly. We will first show that this is a compactification indeed, and then investigate an interesting property of it.

Proposition 3.4.9 (eX , βX) is a compactification of the completely regular space X.

Proof 1. We take the closure in the Hausdorff space [0, 1]^{F(X)}, which is compact by Tihonov’s Theorem 3.2.12. Hence βX is a compact Hausdorff space by Lemma 3.3.25.

2. eX is continuous, because we have πf ∘ eX = f for f ∈ F(X), and each f is continuous. eX is also injective, because we can find for x ≠ x′ a map f ∈ F(X) such that f(x) ≠ f(x′); this translates into eX(x)(f) ≠ eX(x′)(f), hence eX(x) ≠ eX(x′).

3. The image of an open set in X is open in the image. In fact, let G ⊆ X be open, and take x ∈ G. Since X is completely regular, we find f ∈ F(X) and an open set U ⊆ [0, 1] with x ∈ f^{-1}[U] ⊆ G; this is so because the inverse images of the open sets of [0, 1] under continuous functions form a basis for the topology (Proposition 3.3.16). But x ∈ f^{-1}[U] ⊆ G is equivalent to x ∈ (πf ∘ eX)^{-1}[U] ⊆ G. Because eX : X → eX[X] is a bijection, this implies eX(x) ∈ π_f^{-1}[U] ∩ eX[X] ⊆ eX[G]. Hence eX[G] contains a relatively open neighborhood of each of its points, so it is open in eX[X]. a

If the space we started from is already compact, then we obtain nothing new:

Corollary 3.4.10 If X is a compact Hausdorff space, eX : X → βX is a homeomorphism.

⁴It is a bit unfortunate that there appears to be an ambiguity in notation, since we denote the basis of a topological space by β as well. But tradition demands this compactification to be called βX, and from the context it should be clear what we have in mind.



Proof A compact Hausdorff space is normal, hence completely regular by Proposition 3.3.29 and Corollary 3.3.24, so we can construct the space βX for X compact. The assertion then follows from Exercise 3.10. a

This kind of compactification is important, so it deserves a distinguishing name.

Definition 3.4.11 The compactification (eX, βX) is called the Stone-Cech compactification of the completely regular space X.

This compactification permits the extension of continuous maps in the following sense: suppose that f : X → Y is continuous with Y compact, then there exists a continuous extension βX → Y. This statement is slightly imprecise, because f is not defined on βX, so what we really want to extend is f ∘ eX^{-1} : eX[X] → Y — since eX is a homeomorphism from X onto its image, one tends to identify both spaces.

Theorem 3.4.12 Let (eX, βX) be the Stone-Cech compactification of the completely regular space X. Then, given a continuous map f : X → Y with Y compact, there exists a continuous extension f! : βX → Y of f ∘ eX^{-1}.

The idea of the proof is to capitalize on the compactness of the target space Y, because Y and βY are homeomorphic. This means that Y has a topologically identical copy in [0, 1]^{F(Y)}, which may be used in a suitable fashion. The proof is adapted from [Kel55, p. 153]; Kelley calls it a “mildly intricate calculation”.

Proof 1. Define ϕf : F(Y) → F(X) through h ↦ h ∘ f; then this map induces a map ϕ∗f : [0, 1]^{F(X)} → [0, 1]^{F(Y)} by sending t : F(X) → [0, 1] to t ∘ ϕf. Then ϕ∗f is continuous according to Lemma 3.1.16.

2. Consider the diagram whose top row consists of the maps eX[X] ⊆ [0, 1]^{F(X)} and ϕ∗f : [0, 1]^{F(X)} → [0, 1]^{F(Y)} with βY ⊆ [0, 1]^{F(Y)}, whose bottom row is f : X → Y, and whose vertical arrows are eX : X → eX[X] and eY : Y → βY.

We claim that ϕ∗f ∘ eX = eY ∘ f. In fact, take x ∈ X and h ∈ F(Y); then

(ϕ∗f ∘ eX)(x)(h) = (eX(x) ∘ ϕf)(h) = eX(x)(h ∘ f) = (h ∘ f)(x) = eY(f(x))(h) = (eY ∘ f)(x)(h).

3. Because Y is compact, eY is a homeomorphism by Exercise 3.10, and since ϕ∗f is continuous, we have

ϕ∗f[βX] = ϕ∗f[(eX[X])ᵃ] ⊆ (ϕ∗f[eX[X]])ᵃ ⊆ βY.

Thus f! := eY^{-1} ∘ ϕ∗f is a continuous extension of f ∘ eX^{-1}. a

It is immediate from Theorem 3.4.12 that a Stone-Cech compactification is uniquely determined, up to homeomorphism. This justifies the probably a bit prematurely used characterization as the Stone-Cech compactification above.

Baire’s Theorem, which we will establish now, states a property of locally compact spaces which has a surprising range of applications — it states that the intersection of dense open



sets in a locally compact T3-space is dense again. This applies of course to compact Hausdorff spaces as well. The theorem has a counterpart for complete pseudometric spaces, as we will see below. For stating and proving the theorem we lift the assumption of working in a Hausdorff space, because it is really not necessary here.

Theorem 3.4.13 Let X be a locally compact T3-space. Then the intersection of a sequence of dense open sets is dense.

Proof 0. The idea of the proof is to construct for ∅ ≠ G open a decreasing sequence (Vn)n∈N of sets with V_{n+1}ᵃ ⊆ Dn ∩ Vn, where (Dn)n∈N is the given sequence of dense open sets, and where V1 is chosen so that V1ᵃ ⊆ D1 ∩ G. Then we will conclude from the finite intersection property for compact sets that G contains a point of the intersection ⋂_{n∈N} Dn.

1. Fix a non-empty open set G; then we have to show that G ∩ ⋂_{n∈N} Dn ≠ ∅. Now D1 is dense and open, hence we find an open set V1 such that V1ᵃ is compact and V1ᵃ ⊆ D1 ∩ G by Proposition 3.3.14, since X is a T3-space. We select inductively in this way a sequence of open sets (Vn)n∈N with compact closure such that V_{n+1}ᵃ ⊆ Dn ∩ Vn. This is possible since Dn is open and dense for each n ∈ N.

2. Hence we have a decreasing sequence V2ᵃ ⊇ . . . ⊇ Vnᵃ ⊇ . . . of closed sets in the compact set V1ᵃ, thus ⋂_{n∈N} Vnᵃ = ⋂_{n∈N} Vn is not empty, which entails G ∩ ⋂_{n∈N} Dn not being empty. a

Just for the record:

Corollary 3.4.14 The intersection of a sequence of dense open sets in a compact Hausdorff space is dense.

Proof A compact Hausdorff space is normal by Proposition 3.3.29, hence regular by Proposition 3.3.15, thus the assertion follows from Theorem 3.4.13. a

We give an example from Boolean algebras.

Example 3.4.15 Let B be a Boolean algebra with ℘B as the set of all prime ideals. Let Xa := {I ∈ ℘B | a ∉ I} be the set of all prime ideals which do not contain a given element a ∈ B; then {Xa | a ∈ B} is the basis for a compact Hausdorff topology on ℘B, and a ↦ Xa is a Boolean algebra isomorphism, see the proof of Theorem 1.5.45.

Assume that we have a countable family S of elements of B with a = sup S ∈ B; then we say that the prime ideal I preserves the supremum of S iff [a]_{∼I} = sup_{s∈S} [s]_{∼I} holds. Here ∼I is the equivalence relation induced by I, i.e., b ∼I b′ ⇔ b ∆ b′ ∈ I, with ∆ as the symmetric difference in B (Lemma 1.5.40).

We claim that the set R of all prime ideals which do not preserve the supremum of this family is closed and has an empty interior. Write S = {ak | k ∈ K} with K countable; then R = Xa \ ⋃_{k∈K} X_{ak}. Because the sets Xa and X_{ak} are clopen, R is closed. Assume that the interior of R is not empty; then we find b ∈ B with ∅ ≠ Xb ⊆ R, so that X_{ak} ⊆ Xa \ Xb = X_{a∧−b} for all k ∈ K. Since a ↦ Xa is an isomorphism, this means ak ≤ a ∧ −b for all k ∈ K, hence a = sup_{k∈K} ak ≤ a ∧ −b, thus a ≤ −b. But then Xb ⊆ Xa ⊆ X_{−b}, which is certainly a contradiction. Consequently, the set of all prime ideals preserving this particular supremum is open and dense in ℘B.

If we are given for each n ∈ N a family Sn ⊆ B and a0 ∈ B such that



• a0 ≠ ⊤, the maximal element of B,

• an := sup_{s∈Sn} s is an element of B for each n ∈ N,

then we claim that there exists a prime ideal I which contains a0 and which preserves all the suprema of the Sn for n ∈ N.

Let P be the set of all prime ideals which preserve all the suprema of the families above; then

P = ⋂_{n∈N} Pn,

where Pn is the set of all prime ideals which preserve the supremum of Sn, which is dense and open by the discussion above. Hence P is dense by Baire’s Theorem (Corollary 3.4.14). Since X_{−a0} = ℘B \ X_{a0} is open and not empty, we infer that P ∩ X_{−a0} is not empty, because P is dense. Thus we can select an arbitrary prime ideal from this set. ,

This example, which is taken from [RS50, Sect. 5], will help in establishing Gödel’s Completeness Theorem, see Section 3.6.1. The approach is typical for an application of Baire’s Theorem — it is used to show that a set P, which is obtained from an intersection of countably many open and dense sets in a compact space, is dense, and that the object of one’s desire lies in the intersection of P with a suitable non-empty open set, hence this object must exist.

Having been carried away by Baire’s Theorem, let us return to the main stream of the discussion and make some general remarks. We see that local compactness is a somewhat weaker property than compactness. Other notions of compactness have been studied; an incomplete list for a Hausdorff space X includes

countably compact: X is called countably compact iff each countable open cover contains a finite subcover.

Lindelöf space: X is a Lindelöf space iff each open cover contains a countable subcover.

paracompactness: X is said to be paracompact iff each open cover has a locally finite refinement. This explains it:

• An open cover B is a refinement of an open cover A iff each member of B is a subset of a member of A.

• An open cover A is called locally finite iff each point has a neighborhood which intersects only a finite number of elements of A.

sequentially compact: X is called sequentially compact iff each sequence has a convergent subsequence (we will deal with this when discussing compact pseudometric spaces, see Proposition 3.5.31).

The reader is referred to [Eng89, Chapter 3] for a penetrating study.

3.5 Pseudometric and Metric Spaces

We turn now to a class of spaces in which we can determine the distance between any two points numerically. This gives rise to a topology, declaring a set as open iff we can construct



for each of its points an open ball which is entirely contained in this set. It is clear that this defines a topology, and it is also clear that having such a metric gives the space some special properties which are not shared by general topological spaces. It also adds a sense of visual clearness, since an open ball is conceptually easier to visualize than an abstract open set. We will study the topological properties of these spaces starting with pseudometrics, with which we may measure the distance between two objects; but if the distance is zero, we cannot necessarily conclude that the objects are identical. This is a situation which occurs quite frequently when modelling an application, so it is sometimes more adequate to deal with pseudometric rather than metric spaces.

Definition 3.5.1 A map d : X × X → R⁺ is called a pseudometric on X iff these conditions hold:

identity: d(x, x) = 0 for all x ∈ X,

symmetry: d(x, y) = d(y, x) for all x, y ∈ X,

triangle inequality: d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ X.

Then (X, d) is called a pseudometric space. If, in addition, we have

d(x, y) = 0 ⇔ x = y,

then d is called a metric on X; accordingly, (X, d) is called a metric space.

The non-negative real number d(x, y) is called the distance of the elements x and y in a pseudometric space (X, d). It is clear that one wants a point to have distance 0 to itself, and that the distance between two points is determined in a symmetric fashion. The triangle inequality is intuitively clear as well: going from x directly to y is never more expensive than taking a detour through a third point z.

Before proceeding, let us have a look at some examples. Some of them will be discussed later on in greater detail.

Example 3.5.2 1. Define for x, y ∈ R the distance as |x − y|, hence as the absolute value of their difference. Then this defines a metric. Define, similarly,

d(x, y) := |x − y| / (1 + |x − y|);

then d also defines a metric on R (the triangle inequality follows from the observation that a ≤ b ⇔ a/(1 + a) ≤ b/(1 + b) holds for non-negative numbers a and b).



2. Given x, y ∈ Rⁿ for n ∈ N, then

d1(x, y) := max_{1≤i≤n} |xi − yi|,

d2(x, y) := ∑_{i=1}^n |xi − yi|,

d3(x, y) := √(∑_{i=1}^n (xi − yi)²)

all define metrics on Rⁿ (a small numerical sketch of these metrics follows this example). Metric d1 measures the maximal distance between the components, d2 gives the sum of the distances, and d3 yields the Euclidean, i.e., the geometric, distance of the given points. The crucial property to be established is in each case the triangle inequality. It follows for d1 and d2 from the triangle inequality for the absolute value, and for d3 by direct computation.

3. Given a set X, define

d(x, y) := 0, if x = y; 1, otherwise.

Then (X, d) is a metric space; d is called the discrete metric. Different points are assigned the distance 1, while each point has distance 0 to itself.

4. Let X be a set, D(X) be the set of all bounded maps X → R. Define

d(f, g) := sup_{x∈X} |f(x) − g(x)|.

Then (D(X), d) is a metric space; the distance between functions f and g is just their maximal difference.

5. Similarly, given a set X, take a set F ⊆ D(X) of bounded real valued functions as a set of evaluations, and determine the distance of two points in terms of their evaluations:

e(x, y) := sup_{f∈F} |f(x) − f(y)|.

So two points are similar if their evaluations in terms of all elements of F are close. This is a pseudometric on X, which is not a metric if F does not separate points.

6. Denote by C([0, 1]) the set of all continuous real valued functions [0, 1] → R, and measure the distance between f, g ∈ C([0, 1]) through

d(f, g) := sup_{0≤x≤1} |f(x) − g(x)|.

Because a continuous function on a compact space is bounded, d(f, g) is always finite, and since for each x ∈ [0, 1] the inequality |f(x) − g(x)| ≤ |f(x) − h(x)| + |h(x) − g(x)| holds, the triangle inequality is satisfied. Then (C([0, 1]), d) is a metric space, because C([0, 1]) separates points.



7. Define for the Borel sets B([0, 1]) on the unit interval this distance:

d(A, B) := λ(A∆B)

with λ as Lebesgue measure. Then λ(A∆B) = λ((A∆C)∆(C∆B)) ≤ λ(A∆C) + λ(C∆B) implies the triangle inequality, so that (B([0, 1]), d) is a pseudometric space. It is no metric space, however, because λ(Q ∩ [0, 1]) = 0, hence d(∅, Q ∩ [0, 1]) = 0, but the latter set is not empty.

8. Given a non-empty set X and a ranking function r : X → N, define the closeness c(A, B) of two subsets A, B of X as

c(A, B) := +∞, if A = B; inf{r(w) | w ∈ A∆B}, otherwise.

If w ∈ A∆B, then w can be interpreted as a witness that A and B are different, and the closeness of A and B is just the minimal rank of a witness. We observe these properties:

• c(A, A) = +∞, and c(A, B) = +∞ iff A = B (because A = B iff A∆B = ∅),

• c(A, B) = c(B, A),

• c(A, C) ≥ min{c(A, B), c(B, C)}. If A = C, this is obvious; assume otherwise that b ∈ A∆C is a witness of minimal rank. Since A∆C = (A∆B)∆(B∆C), b must be either in A∆B or in B∆C, so that r(b) ≥ c(A, B) or r(b) ≥ c(B, C).

Now put d(A, B) := 2^{−c(A,B)} (with 2^{−∞} := 0). Then d is a metric on P(X). This metric satisfies even d(A, B) ≤ max{d(A, C), d(B, C)} for an arbitrary C, hence d is an ultrametric, see Exercise 3.18.

9. A similar construction is possible with a decreasing sequence of equivalence relations on a set X. In fact, let (ρn)n∈N be such a sequence, and put ρ0 := X × X. Define

c(x, y) := +∞, if 〈x, y〉 ∈ ⋂_{n∈N} ρn; max{n | 〈x, y〉 ∈ ρn}, otherwise.

Then it is immediate that c(x, y) ≥ min{c(x, z), c(z, y)}. Intuitively, c(x, y) gives the degree of similarity of x and y — the larger this value, the more similar x and y are. Then

d(x, y) := 0, if c(x, y) = ∞; 2^{−c(x,y)}, otherwise

defines a pseudometric, which is a metric iff ⋂_{n∈N} ρn = {〈x, x〉 | x ∈ X}.

,
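Here is the numerical sketch promised in item 2 (our Python illustration, not part of the example itself): it spot-checks the triangle inequality for d1, d2, d3 on random points of R⁴, and it implements the ultrametric of item 8 for the ranking function r(w) = w on subsets of N.

import math, random

def d1(x, y):  # maximal distance of the components
    return max(abs(a - b) for a, b in zip(x, y))

def d2(x, y):  # sum of the component distances
    return sum(abs(a - b) for a, b in zip(x, y))

def d3(x, y):  # Euclidean distance
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

random.seed(0)
for d in (d1, d2, d3):
    for _ in range(1000):
        x, y, z = ([random.uniform(-5, 5) for _ in range(4)] for _ in range(3))
        assert d(x, y) <= d(x, z) + d(z, y) + 1e-12

def closeness(A, B):          # c(A, B) for the ranking function r(w) = w
    diff = A ^ B              # symmetric difference
    return math.inf if not diff else min(diff)

def d_rank(A, B):
    return 2.0 ** -closeness(A, B)   # with 2^{-inf} = 0

print(d_rank({1, 2, 3}, {1, 2, 4}))  # 0.125 = 2^{-3}: the cheapest witness is 3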

Given a pseudometric space (X, d), define for x ∈ X and r > 0 the open ball B(x, r) with center x and radius r as

B(x, r) := {y ∈ X | d(x, y) < r}.

The closed ball S(x, r) is defined similarly as

S(x, r) := {y ∈ X | d(x, y) ≤ r}.



If necessary, we indicate the pseudometric explicitly with B and S. Note that B(x, r) is open, and S(x, r) is closed, but that the closure B(x, r)ᵃ of B(x, r) may be properly contained in the closed ball S(x, r) (let d be the discrete metric, then B(x, 1) = {x} = B(x, 1)ᵃ, but S(x, 1) = X, so both closed sets do not coincide if X has more than one point).

Call G ⊆ X open iff we can find for each x ∈ G some r > 0 such that B(x, r) ⊆ G. Then this defines the pseudometric topology on X. It has the set β := {B(x, r) | x ∈ X, r > 0} of open balls as a basis. Let us have a look at the properties a base is supposed to have. Assume that x ∈ B(x1, r1) ∩ B(x2, r2), and select r with 0 < r < min{r1 − d(x, x1), r2 − d(x, x2)}. Then B(x, r) ⊆ B(x1, r1) ∩ B(x2, r2), because we have for z ∈ B(x, r)

d(z, x1) ≤ d(z, x) + d(x, x1) < r + d(x, x1) ≤ (r1 − d(x, x1)) + d(x, x1) = r1,   (3.1)

by the triangle inequality; similarly, d(z, x2) < r2. Thus it follows from Proposition 3.1.1 that β is in fact a base.

Call two pseudometrics on X equivalent iff they generate the same topology. An equivalent formulation goes like this: let τi be the topology generated from the pseudometric di for i = 1, 2; then d1 and d2 are equivalent iff the identity (X, τ1) → (X, τ2) is a homeomorphism. The following lemma gives two common methods to construct equivalent pseudometrics.

Lemma 3.5.3 Let (X, d) be a pseudometric space. Then

d1(x, y) := min{d(x, y), 1},

d2(x, y) := d(x, y) / (1 + d(x, y))

both define pseudometrics which are equivalent to d.

Proof It is clear that both d1 and d2 are pseudometrics (for d2, compare Example 3.5.2). Let τ, τ1, τ2 be the respective topologies; then it is immediate that (X, τ) and (X, τ1) are homeomorphic. Since d2(x, y) < r iff d(x, y) < r/(1 − r), provided 0 < r < 1, we obtain also that (X, τ) and (X, τ2) are homeomorphic. a

These pseudometrics have the advantage that they are bounded, which is sometimes quite practical for establishing topological properties. Just as a case in point:

Proposition 3.5.4 Let (Xn, dn) be pseudometric spaces with associated topologies τn for n ∈ N. Then the topological product ∏_{n∈N}(Xn, τn) is a pseudometric space again.

Proof 1. We may assume that each dn is bounded by 1, otherwise we select an equivalent pseudometric with this property (Lemma 3.5.3). Put

d((xn)n∈N, (yn)n∈N) := ∑_{n∈N} 2^{−n} · dn(xn, yn).

We claim that the product topology is the topology induced by the pseudometric d (it is obvious that d is one).

2. Let Gi ⊆ Xi be open for 1 ≤ i ≤ k, and assume that x ∈ G := G1 × . . . × Gk × ∏_{n>k} Xn. We can find for xi ∈ Gi some positive ri with B_{di}(xi, ri) ⊆ Gi. Put r := min{2^{−i} · ri | 1 ≤ i ≤ k}; then certainly Bd(x, r) ⊆ G, since d(x, y) < r implies 2^{−i} · di(xi, yi) < r ≤ 2^{−i} · ri, hence di(xi, yi) < ri, for 1 ≤ i ≤ k. This implies that each element of the base for the product topology is open with respect to d.

3. Given the sequence x and r > 0, take y ∈ Bd(x, r). Put t := r − d(x, y) > 0. Select m ∈ N with ∑_{n>m} 2^{−n} < t/2, and let Gn := B_{dn}(yn, t/2) for n ≤ m. If z ∈ U := G1 × . . . × Gm × ∏_{k>m} Xk, then

d(x, z) ≤ d(x, y) + d(y, z) ≤ r − t + ∑_{n=1}^m 2^{−n} · dn(yn, zn) + ∑_{n>m} 2^{−n} < r − t + t/2 + t/2 = r,

so that U ⊆ Bd(x, r). Thus each open ball is open in the product topology. a

One sees immediately that the pseudometric d constructed above is a metric, provided each dn is one. Thus:

Corollary 3.5.5 The countable product of metric spaces is a metric space in the product topology. a
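Since each dn is bounded by 1, the series defining d can be evaluated with a controlled truncation error: the tail beyond index m contributes at most 2^{−m}. A minimal Python sketch (our illustration; representing points of the product as functions n ↦ xn is an ad-hoc choice):

def product_distance(x, y, dn, m=40):
    # truncated version of d((x_n), (y_n)) = sum 2^{-n} d_n(x_n, y_n);
    # each d_n is capped at 1, so the result is accurate up to 2^{-m}
    return sum(2.0 ** -n * min(dn(n)(x(n), y(n)), 1.0) for n in range(1, m + 1))

d_abs = lambda n: (lambda a, b: abs(a - b))   # the same metric in every factor
x = lambda n: 0.0                             # the constant-0 sequence
y = lambda n: 1.0 if n == 1 else 0.0          # differs only in coordinate 1
print(product_distance(x, y, d_abs))          # 0.5 = 2^{-1} * |0 - 1|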

One expects that each pseudometric space can be made a metric space by identifying those elements which cannot be separated by the pseudometric. Let’s try:

Proposition 3.5.6 Let (X, d) be a pseudometric space, and define x ∼ y iff d(x, y) = 0 for x, y ∈ X. Then the factor space X/∼ is a metric space with metric D([x]∼, [y]∼) := d(x, y).

Proof 1. It is clear that ∼ is an equivalence relation, since x ∼ y and y ∼ z implies d(x, z) ≤ d(x, y) + d(y, z) = 0, hence x ∼ z follows.

Because d(x, x′) = 0 and d(y, y′) = 0 implies d(x, y) = d(x′, y′), D is well-defined, and it is clear that it has all the properties of a pseudometric. D is also a metric, since D([x]∼, [y]∼) = 0 is equivalent to d(x, y) = 0, hence to x ∼ y, thus to [x]∼ = [y]∼.

2. The metric topology is the final topology with respect to the factor map η∼. To establish this, take a map f : X/∼ → Y with Y a topological space. Assume that (f ∘ η∼)^{-1}[G] is open for G ⊆ Y open. If [x]∼ ∈ f^{-1}[G], we have x ∈ (f ∘ η∼)^{-1}[G], thus there exists r > 0 with Bd(x, r) ⊆ η∼^{-1}[f^{-1}[G]]. But this means that BD([x]∼, r) ⊆ f^{-1}[G], so that the latter set is open. Thus if f ∘ η∼ is continuous, then f is. The converse is established in the same way. This implies that the metric topology is final with respect to the factor map η∼, cp. Proposition 3.1.15. a

We want to show that a pseudometric space satisfies the T4-axiom (hence that a metric space is normal). So we take two disjoint closed sets and need to produce two disjoint open sets, each of which contains one of the closed sets. The following construction is helpful.

Lemma 3.5.7 Let (X, d) be a pseudometric space. Define the distance of a point x ∈ X to ∅ ≠ A ⊆ X through

d(x, A) := inf_{y∈A} d(x, y).

Then d(·, A) is continuous.



Proof Let x, z ∈ X, and y ∈ A; then d(x, y) ≤ d(x, z) + d(z, y). Now take lower bounds, then d(x, A) ≤ d(x, z) + d(z, A). This yields d(x, A) − d(z, A) ≤ d(x, z). Interchanging the roles of x and z yields d(z, A) − d(x, A) ≤ d(z, x), thus |d(x, A) − d(z, A)| ≤ d(x, z). This implies continuity of d(·, A). a

Given a closed set A ⊆ X, we find that A = {x ∈ X | d(x, A) = 0}; we can say a bit more:

Corollary 3.5.8 Let X, A be as above, then Aᵃ = {x ∈ X | d(x, A) = 0}.

Proof Since {x ∈ X | d(x, A) = 0} is closed, we infer that Aᵃ is contained in this set. If, on the other hand, x ∉ Aᵃ, we find r > 0 such that B(x, r) ∩ A = ∅, hence d(x, A) ≥ r. Thus the other inclusion holds as well. a

Armed with this observation, we can establish now

Proposition 3.5.9 A pseudometric space (X, d) is a T4-space.

Proof Let F1 and F2 be disjoint closed subsets of X. Define

f(x) := d(x, F1) / (d(x, F1) + d(x, F2));

then Lemma 3.5.7 shows that f is continuous, and Corollary 3.5.8 indicates that the denominator will not vanish, since F1 and F2 are disjoint. It is immediate that F1 is contained in the open set {x | f(x) < 1/2}, that F2 ⊆ {x | f(x) > 1/2}, and that these open sets are disjoint. a
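The separating function f is easy to evaluate in concrete cases. The following Python sketch (our illustration) computes it on X = R for the disjoint closed sets F1 = [0, 1] and F2 = [2, 3], using the explicit formula for the distance from a point to an interval:

def dist_interval(x, lo, hi):          # d(x, [lo, hi])
    return max(lo - x, 0.0, x - hi)

def f(x):
    a = dist_interval(x, 0.0, 1.0)     # d(x, F1)
    b = dist_interval(x, 2.0, 3.0)     # d(x, F2)
    return a / (a + b)                 # denominator > 0 since F1, F2 are disjoint

print(f(0.5), f(1.5), f(2.5))          # 0.0  0.5  1.0

The sets {x | f(x) < 1/2} and {x | f(x) > 1/2} are then the desired disjoint open neighborhoods.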

Note that a pseudometric T1-space is already a metric space (Exercise 3.15).

Define for r > 0 the r-neighborhood Aʳ of a set A ⊆ X as

Aʳ := {x ∈ X | d(x, A) < r}.

This of course makes sense only if d(x, A) is finite. Using the triangle inequality, one calculates (Aʳ)ˢ ⊆ Aʳ⁺ˢ. This observation will be helpful when we look at the next example.

Example 3.5.10 Let (X, d) be a pseudometric space, and let

C(X) := {C ⊆ X | C is compact and not empty}

be the set of all compact and not empty subsets of X. Define

δH(C, D) := max{max_{x∈C} d(x, D), max_{x∈D} d(x, C)}

for C, D ∈ C(X). We claim that δH is a pseudometric on C(X), which is a metric if d is a metric on X.

One notes first that

δH(C, D) = inf{r > 0 | C ⊆ Dʳ, D ⊆ Cʳ}.

This follows easily from the observation that C ⊆ Dʳ iff max_{x∈C} d(x, D) < r. Hence we obtain that δH(C, D) ≤ r and δH(D, E) ≤ s together imply δH(C, E) ≤ r + s, which implies the triangle inequality. The other laws for a pseudometric are obvious. δH is called the Hausdorff pseudometric.



Now assume that d is a metric, and assume δH(C, D) = 0. Thus C ⊆ ⋂_{n∈N} D^{1/n} and D ⊆ ⋂_{n∈N} C^{1/n}. Because C and D are closed, and d is a metric, we obtain C = D from Corollary 3.5.8, thus δH is a metric, which is accordingly called the Hausdorff metric. ,
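For finite subsets of the plane the maxima in the definition of δH are maxima over finite sets, so the Hausdorff metric can be computed directly. A small Python sketch (our illustration):

def d(p, q):                          # Euclidean metric on R^2
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def dist_to_set(p, C):                # d(p, C) for finite C
    return min(d(p, c) for c in C)

def hausdorff(C, D):
    return max(max(dist_to_set(p, D) for p in C),
               max(dist_to_set(q, C) for q in D))

C = {(0.0, 0.0), (1.0, 0.0)}
D = {(0.0, 0.0), (3.0, 0.0)}
print(hausdorff(C, D))                # 2.0: the point (3, 0) is 2 away from C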

Let us take a magnifying glass and have a look at what happens locally in a point of a pseudometric space. Given U ∈ U(x), we find an open ball B(x, r) which is contained in U; hence we find even a rational number q with B(x, q) ⊆ B(x, r). But this means that the open balls with rational radii form a basis for the neighborhood filter of x. This is sometimes also the case in more general topological spaces, so we define this property and two of its cousins for topological, rather than pseudometric, spaces.

Definition 3.5.11 A topological space

1. satisfies the first axiom of countability (and the space is called in this case first countable) iff the neighborhood filter of each point has a countable base of open sets,

2. satisfies the second axiom of countability (the space is called in this case second countable) iff the topology has a countable base,

3. is separable iff it has a countable dense subset.

The standard example for a separable topological space is of course R, where the rational numbers Q form a countable dense subset.

This is a trivial consequence of the observation just made.

Proposition 3.5.12 A pseudometric space is first countable. a

In a pseudometric space, separability and satisfying the second axiom of countability coincide, as the following observation shows.

Proposition 3.5.13 A pseudometric space (X, d) is second countable iff it has a countable dense subset.

Proof 1. Let D be a countable dense subset; then

β := {B(x, r) | x ∈ D, 0 < r ∈ Q}

is a countable base for the topology. For, given U ⊆ X open and x ∈ U, there exists a rational r > 0 with B(x, r) ⊆ U, and an element y ∈ D with d(x, y) < r/2; then x ∈ B(y, r/2) ⊆ U. One shows exactly as in the argumentation leading to Eq. (3.1) that β is a base.

2. Assume that β is a countable base for the topology, and pick from each B ∈ β an element xB. Then {xB | B ∈ β} is dense: given a non-empty open U, we find B ∈ β with B ⊆ U, hence xB ∈ U. This argument does not require X being a pseudometric space (but the Axiom of Choice). a

We know from Exercise 3.8 that a point x in a topological space is in the closure of a set A iff there exists a filter F with iA(F) → x, with iA as the injection A → X. In a first countable space, in particular in a pseudometric space, we can work with sequences rather than filters, which is sometimes more convenient.

Proposition 3.5.14 Let X be a first countable topological space, A ⊆ X. Then x ∈ Aᵃ iff there exists a sequence (xn)n∈N in A with xn → x.



Proof If there exists a sequence (xn)n∈N which converges to x such that xn ∈ A for all n ∈ N, then the corresponding filter converges to x, so it remains to establish the converse statement.

Now let (Un)n∈N be the basis of the neighborhood filter of x ∈ Aᵃ, and F be a filter with iA(F) → x. Put Vn := U1 ∩ . . . ∩ Un; then Vn ∩ A ∈ iA(F). The sequence (Vn)n∈N decreases, and forms a basis for the neighborhood filter of x. Pick from each Vn an element xn ∈ A, and take a neighborhood U ∈ U(x). Since there exists n with Vn ⊆ U, we infer that xm ∈ U for all m ≥ n, hence xn → x. a

A second countable normal space X permits the following remarkable construction. Let β be a countable base for X, and define A := {〈U, V〉 | U, V ∈ β, Uᵃ ⊆ V}. Then A is countable as well, and we can find for each pair 〈U, V〉 ∈ A a continuous map f : X → [0, 1] with f(x) = 0 for all x ∈ U and f(x) = 1 for all x ∈ X \ V. This is a consequence of Urysohn’s Lemma (Theorem 3.3.18). The collection F of all these functions is countable, because A is countable. Now define the embedding map

e : X → [0, 1]^F, x ↦ (f(x))_{f∈F}.

We endow the space [0, 1]^F with the product topology, i.e., with the initial topology with respect to all projections πf. Then we observe these properties:

1. The map e is continuous. This is so because πf ∘ e = f, and f is continuous, hence we may infer continuity from Proposition 3.1.15.

2. The map e is injective. This follows from Urysohn’s Lemma (Theorem 3.3.18), since two distinct points constitute two disjoint closed sets.

3. If G ⊆ X is open, e[G] is open in e[X]. In fact, let e(x) ∈ e[G]. We find an open neighborhood H of e(x) in [0, 1]^F such that e[X] ∩ H ⊆ e[G] in the following way: from the construction we infer that we can find a map f ∈ F such that f(x) = 0 and f(y) = 1 for all y ∈ X \ G, hence f(x) ∉ (f[X \ G])ᵃ; thus the set H := {y ∈ [0, 1]^F | y_f ∉ (f[X \ G])ᵃ} is open in [0, 1]^F, and H ∩ e[X] is contained in e[G].

4. [0, 1]^F is a metric space by Corollary 3.5.5, because the unit interval [0, 1] is a metric space, and because F is countable.

Summarizing, X is homeomorphic to a subspace of [0, 1]^F. This is what Urysohn’s Metrization Theorem says.

Proposition 3.5.15 A second countable normal topological space is metrizable. a

The problem of metrization of topological spaces is non-trivial, as one can see from Proposition 3.5.15. The reader who wants to learn more about it may wish to consult Kelley’s textbook [Kel55, p. 124 f] or Engelking’s treatise [Eng89, 4.5, 5.4].

3.5.1 Completeness

Fix in this section a pseudometric space (X, d). A Cauchy sequence (xn)n∈N is defined in X just as in R: given ε > 0, there exists an index n ∈ N such that d(xm, xm′) < ε holds for all m, m′ ≥ n.

Thus we have a Cauchy sequence when we know that eventually the members of the sequence will be arbitrarily close; a converging sequence is evidently a Cauchy sequence. But a sequence which converges requires the knowledge of its limit; this is sometimes a disadvantage in applications. It would be helpful if we could conclude from the fact that we have a Cauchy sequence that there exists a point to which it converges. Spaces for which this is always guaranteed are called complete; they will be introduced next. Examples show that there are spaces which are not complete; note, however, that we can complete each pseudometric space. This will be considered in some detail later on.

Definition 3.5.16 The pseudometric space is said to be complete iff each Cauchy sequence has a limit.

It is well known that the rational numbers are not complete, which is usually demonstrated by showing that √2 is not rational. Another instructive example, proposed by Bourbaki [Bou89, II.3.3], is the following.

Example 3.5.17 The rational numbers Q are not complete in the usual metric. Take

xn := ∑_{i=0}^n 2^{−i·(i+1)/2}.

Then (xn)n∈N is a Cauchy sequence in Q: if m > n, then |xm − xn| ≤ 2^{−n·(n+3)/2} (this is shown easily through the well known identity ∑_{i=0}^p i = p · (p + 1)/2). Now assume that the sequence converges to a/b ∈ Q; then we can find an integer hn such that

|a/b − hn/2^{n·(n+1)/2}| ≤ 1/2^{n·(n+3)/2},

yielding

|a · 2^{n·(n+1)/2} − b · hn| ≤ b/2^n

for all n ∈ N. The left hand side of this inequality is a whole number; the right hand side is smaller than 1 once n > n0, with n0 so large that b < 2^{n0}. This means that the left hand side must be zero, so that a/b = xn for all n > n0. This is a contradiction. ,
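The estimates in this example can be checked in exact rational arithmetic. A Python sketch (our illustration, using the standard fractions module):

from fractions import Fraction

def x(n):   # the partial sums x_n of the example
    return sum(Fraction(1, 2 ** (i * (i + 1) // 2)) for i in range(n + 1))

for n in (2, 4, 6):
    print(n, x(n), float(x(n)))

# one gap compared with the bound |x_m - x_n| <= 2^{-n(n+3)/2} for n = 6:
print(float(x(10) - x(6)), 2.0 ** -(6 * 9 / 2))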

We know that R is complete with the usual metric; the rationals are not. But there is a catch: if we change the metric, completeness may be lost.

Example 3.5.18 The half open interval ]0, 1] is not complete under the usual metric d(x, y) := |x − y|. But take the metric

d′(x, y) := |1/x − 1/y|.

Because a < x < b iff 1/b < 1/x < 1/a holds for 0 < a ≤ b ≤ 1, the metrics d and d′ are equivalent on ]0, 1]. Let (xn)n∈N be a d′-Cauchy sequence; then (1/xn)n∈N is a Cauchy sequence in (R, | · |), hence it converges, so that (xn)n∈N is d′-convergent in ]0, 1].

The trick here is to make sure that a Cauchy sequence avoids the region around the critical value 0. ,
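A quick numerical illustration (our Python sketch): the sequence xn = 1/n is d-Cauchy, but its consecutive d′-distances are constantly 1, so it is not d′-Cauchy; the completeness proof above excludes exactly such sequences.

d  = lambda x, y: abs(x - y)
dp = lambda x, y: abs(1.0 / x - 1.0 / y)

xs = [1.0 / n for n in range(1, 8)]
print([round(d(xs[i], xs[i + 1]), 4) for i in range(6)])    # tends to 0
print([round(dp(xs[i], xs[i + 1]), 4) for i in range(6)])   # constantly 1.0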



Thus we have to carefully stick to the given metric; changing the metric entails checking completeness properties for the new metric.

Example 3.5.19 Endow the set C([0, 1]) of continuous functions on the unit interval with the metric d(f, g) := sup_{0≤x≤1} |f(x) − g(x)|, see Example 3.5.2. We claim that this metric space is complete. In fact, let (fn)n∈N be a d-Cauchy sequence in C([0, 1]). Because we have for each x ∈ [0, 1] the inequality |fn(x) − fm(x)| ≤ d(fn, fm), we conclude that (fn(x))n∈N is a Cauchy sequence for each x ∈ [0, 1], which converges to some f(x), since R is complete. We have to show that f is continuous, and that d(f, fn) → 0.

Let ε > 0 be given; then there exists n ∈ N such that d(fm, fm′) < ε/3 for m, m′ ≥ n. Letting m′ → ∞ in |fm(x) − fm′(x)| ≤ d(fm, fm′), we obtain |fm(x) − f(x)| ≤ ε/3 for all x ∈ [0, 1] and all m ≥ n; in particular d(f, fm) ≤ ε/3 < ε for m ≥ n, hence d(f, fn) → 0. For continuity, take x′ ∈ [0, 1] and choose δ > 0 so that |x − x′| < δ implies |fn(x) − fn(x′)| < ε/3; then |x − x′| < δ implies

|f(x) − f(x′)| ≤ |f(x) − fn(x)| + |fn(x) − fn(x′)| + |fn(x′) − f(x′)| ≤ ε.

Hence f is continuous. ,

The next example is suggested by an observation in [MPS86].

Example 3.5.20 Let r : X → N be a ranking function, and denote the (ultra-)metric on P(X) constructed from it by d, see Example 3.5.2. Then (P(X), d) is complete. In fact, let (An)n∈N be a Cauchy sequence; thus we find for each m ∈ N an index n ∈ N such that c(Ak, Aℓ) ≥ m whenever k, ℓ ≥ n. We claim that the sequence converges to

A := ⋃_{n∈N} ⋂_{k≥n} Ak,

which is the set of all elements of X which are contained in all but a finite number of sequence elements. Given m, fix n as above; we show that c(A, Ak) ≥ m whenever k > n. Take an element x ∈ A∆Ak of minimal rank.

• If x ∈ A, then there exists ℓ such that x ∈ At for all t ≥ ℓ, so take t ≥ max{ℓ, n}; then x ∈ At∆Ak, hence c(A, Ak) = r(x) ≥ c(At, Ak) ≥ m.

• If, however, x ∉ A, we conclude that x ∉ At for infinitely many t, so x ∉ At for some t > n. But since x ∈ A∆Ak, we conclude x ∈ Ak, hence x ∈ Ak∆At, thus c(A, Ak) = r(x) ≥ c(Ak, At) ≥ m.

Hence An → A in (P(X), d). ,

The following observation is trivial, but sometimes helpful.

Lemma 3.5.21 A closed subset of a complete pseudometric space is complete. a

If we encounter a pseudometric space which is not complete, we may complete it through the following construction. Before discussing it, we need a simple auxiliary statement, which says that we can check completeness already on a dense subset.

Lemma 3.5.22 Let D ⊆ X be dense. Then the space is complete iff each Cauchy sequence on D converges.



Proof If each Cauchy sequence from X converges, so does each such sequence from D, so we have to establish the converse. Let (xn)n∈N be a Cauchy sequence on X. Given n ∈ N, there exists for xn an element yn ∈ D such that d(xn, yn) < 1/n. Because (xn)n∈N is a Cauchy sequence, (yn)n∈N is one as well, which converges by assumption to some x ∈ X. The triangle inequality shows that (xn)n∈N converges to x as well. a

This helps in establishing that each pseudometric space can be embedded into a complete pseudometric space. The approach may be described as Charlie Brown’s device — “If you can’t beat them, join them”. So we take all Cauchy sequences as our space into which we embed X; intuitively, a Cauchy sequence of such sequences has a diagonal sequence which is a Cauchy sequence as well and serves as a limit of the given one. This sounds more complicated than it is, however, because fortunately Lemma 3.5.22 makes life easier when it comes to establishing completeness. Here we go.

Proposition 3.5.23 There exists a complete pseudometric space (X∗, d∗) into which (X, d) may be embedded isometrically as a dense subset.

Proof 0. This is the line of attack: we define X∗ and d∗, show that we can embed X isometrically into it as a dense subset, and then we establish completeness with the help of Lemma 3.5.22.

1. Define

X∗ := {(xn)n∈N | (xn)n∈N is a d-Cauchy sequence in X},

and put

d∗((xn)n∈N, (yn)n∈N) := lim_{n→∞} d(xn, yn).

Before proceeding, we should make sure that the limit in question exists. In fact, given ε > 0, there exists n ∈ N such that d(xm′, xm) < ε/2 and d(ym′, ym) < ε/2 for m, m′ ≥ n; thus, if m, m′ ≥ n, we obtain

d(xm, ym) ≤ d(xm, xm′) + d(xm′, ym′) + d(ym′, ym) < d(xm′, ym′) + ε;

interchanging the roles of m and m′ yields

|d(xm, ym) − d(xm′, ym′)| < ε

for m, m′ ≥ n. Hence (d(xn, yn))n∈N is a Cauchy sequence in R, which converges by completeness of R.

2. Given x ∈ X, the constant sequence (x)n∈N is a Cauchy sequence, so it offers itself as the image of x; let e : X → X∗ be the corresponding map, which is injective, and which preserves the pseudometric. Hence e is continuous. We show that e[X] is dense in X∗: take a Cauchy sequence (xn)n∈N and ε > 0. Let n ∈ N be selected for ε, and assume m ≥ n. Then

d∗((xn)n∈N, e(xm)) = lim_{k→∞} d(xk, xm) ≤ ε.

3. The crucial point is completeness. An appeal to Lemma 3.5.22 shows that it is sufficient to show that a Cauchy sequence in e[X] converges in (X∗, d∗), because e[X] is dense. But this is nearly trivial: a d∗-Cauchy sequence (e(xk))k∈N comes from a d-Cauchy sequence (xk)k∈N in X, since e is isometric; this sequence is itself an element of X∗, and (e(xk))k∈N converges to it. a



Having the completion X∗ of a pseudometric space X at our disposal, we might be tempted to extend a continuous map X → Y to a continuous map X∗ → Y, for example in the case that Y is complete. This is usually not possible; for example, not every continuous function Q → R has a continuous extension. We will deal with this problem when discussing uniform continuity below, but we will state and prove here a condition which is sometimes helpful when one wants to extend a function not to the whole completion, but to a domain which is somewhat larger than the given one. Define as in Section 1.7.2 the diameter diam(A) of a set A as

diam(A) := sup{d(x, y) | x, y ∈ A}

(note that the diameter may be infinite). It is easy to see that diam(A) = diam(Aᵃ) using Proposition 3.5.14. Now assume that f : A → Y is given; then we measure the discontinuity of f at a point x through the oscillation ∅f(x) of f at x ∈ Aᵃ. This is defined as the smallest diameter of the image of an open neighborhood of x, formally,

∅f(x) := inf{diam(f[A ∩ V]) | x ∈ V, V open}.

If f is continuous on A, we have ∅f(x) = 0 for each element x of A. In fact, let ε > 0 be given; then there exists δ > 0 such that diam(f[A ∩ V]) < ε whenever V is a neighborhood of x of diameter less than δ. Thus ∅f(x) < ε; since ε > 0 was arbitrary, the claim follows.

Lemma 3.5.24 Let Y be a complete metric space, X a pseudometric space; then a continuous map f : A → Y can be extended to a continuous map f∗ : G → Y, where G := {x ∈ Aᵃ | ∅f(x) = 0} has these properties:

1. A ⊆ G ⊆ Aᵃ,

2. G can be written as the intersection of countably many open sets.

The basic idea for the proof is rather straightforward. Take an element in the closure of A; then there exists a sequence in A converging to this point. If the oscillation at that point is zero, the images of the sequence elements must form a Cauchy sequence, so we extend the map by forming the limit of this sequence. Then we have to show that this map is well defined and continuous.

Proof 1. We may and do assume that the complete metric on Y is bounded by 1. Define G as above; then A ⊆ G ⊆ Aᵃ, and G can be written as the intersection of a sequence of open sets. In fact, represent G as

G = ⋂_{n∈N} {x ∈ Aᵃ | ∅f(x) < 1/n},

so we have to show that {x ∈ Aᵃ | ∅f(x) < q} is open in Aᵃ for any q > 0. But we have

{x ∈ Aᵃ | ∅f(x) < q} = ⋃{V ∩ Aᵃ | V open, diam(f[V ∩ A]) < q}.



This is the union of sets open in Aᵃ, hence is an open set itself.

2. Now take an element x ∈ G ⊆ Aᵃ. Then there exists a sequence (xn)n∈N of elements xn ∈ A with xn → x. Given ε > 0, we find a neighborhood V of x with diam(f[A ∩ V]) < ε, since the oscillation of f at x is 0. Because xn → x, we know that we can find an index nε ∈ N such that xm ∈ V ∩ A for all m > nε. This implies that the sequence (f(xn))n∈N is a Cauchy sequence in Y. It converges because Y is complete. Put

f∗(x) := lim_{n→∞} f(xn).

3. We have to show now that

• f∗ is well-defined,

• f∗ extends f,

• f∗ is continuous.

Assume that we can find x ∈ G such that (xn)n∈N and (x′n)n∈N are sequences in A with xn → x and x′n → x, but lim_{n→∞} f(xn) ≠ lim_{n→∞} f(x′n). Then we find some η > 0 such that d(f(xn), f(x′n)) ≥ η infinitely often, so the oscillation of f at x is at least η > 0, a contradiction. This implies that f∗ is well-defined, and it implies also that f∗ extends f. Now let x ∈ G. If ε > 0 is given, we find an open neighborhood V of x with diam(f[A ∩ V]) < ε. Thus, if x′ ∈ G ∩ V, then d(f∗(x), f∗(x′)) ≤ ε. Hence f∗ is continuous. a

We will encounter later on sets which can be written as the countable intersection of open sets. They are called Gδ-sets. Rephrasing Lemma 3.5.24, f can be extended from A to a Gδ-set containing A and contained in Aᵃ.

A characterization of complete spaces in terms of sequences of closed sets with decreasing diameters is given below.

Proposition 3.5.25 These statements are equivalent:

1. X is complete.

2. For each decreasing sequence (An)n∈N of non-empty closed sets the diameter of which tends to zero there exists x ∈ X such that ⋂_{n∈N} An = {x}ᵃ.

In particular, if X is a metric space, then X is complete iff each decreasing sequence of non-empty closed sets the diameter of which tends to zero has exactly one point in common.

Proof The assertion for the metric case follows immediately from the general case, because {x}ᵃ = {x}, and because there can be not more than one element in the intersection.

1 ⇒ 2: Let (An)n∈N be a decreasing sequence of non-empty closed sets with diam(An) → 0; then we have to show that ⋂_{n∈N} An = {x}ᵃ for some x ∈ X. Pick from each An an element xn; then (xn)n∈N is a Cauchy sequence, which converges to some x, since X is complete. Each An is closed and contains xm for all m ≥ n, hence x ∈ An for all n; and since diam(An) → 0, every element of ⋂_{n∈N} An has distance 0 from x. We conclude ⋂_{n∈N} An = {x}ᵃ.

2 ⇒ 1: Take a Cauchy sequence (xn)n∈N; then An := {xm | m ≥ n}ᵃ is a decreasing sequence of closed sets the diameter of which tends to zero. In fact, given ε > 0 there exists n ∈ N such that d(xm, xm′) < ε for all m, m′ ≥ n, hence diam(An) ≤ ε, and the same holds for all Ak with k ≥ n. Then it is obvious that xn → x whenever x ∈ ⋂_{n∈N} An. a

We mention all too briefly a property of complete spaces which renders them most attractive, viz., Banach’s Fixpoint Theorem.

Definition 3.5.26 Call f : X → X a contraction iff there exists γ with 0 < γ < 1 such that d(f(x), f(y)) ≤ γ · d(x, y) holds for all x, y ∈ X.

This is the celebrated Banach Fixpoint Theorem.

Theorem 3.5.27 Let f : X → X be a contraction with X complete. Then there exists x ∈ X with f(x) = x. If f(y) = y holds as well, then d(x, y) = 0. In particular, if X is a metric space, then there exists a unique fixed point for f.


The idea is just to start with an arbitrary element of X, and to iterate f on it. This yields a sequence of elements of X. Because the elements become closer and closer, completeness kicks in and makes sure that there exists a limit. This limit is independent of the starting point.

Proof Define the n-th iteration fⁿ of f through f¹ := f and f^{n+1} := fⁿ ∘ f. Now let x0 be an arbitrary element of X, and define xn := fⁿ(x0). Then d(xn, xn+m) ≤ γⁿ · d(x0, xm) ≤ γⁿ · d(x0, x1)/(1 − γ), since d(x0, xm) ≤ ∑_{i<m} d(xi, xi+1) ≤ d(x0, x1) · ∑_{i<m} γ^i. Hence (xn)n∈N is a Cauchy sequence, which converges to some x ∈ X, and f(x) = x, because f is continuous. If f(y) = y, we have d(x, y) = d(f(x), f(y)) ≤ γ · d(x, y), thus d(x, y) = 0. This implies uniqueness of the fixed point as well. a
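The proof is literally an algorithm, so let us run it once. A Python sketch (our illustration): the map f(x) = (x + 2/x)/2 maps the complete space [1, 2] into itself and is a contraction there, since |f′(x)| = |1/2 − 1/x²| ≤ 1/2 on [1, 2]; its unique fixed point is √2.

def iterate(f, x0, n):
    x = x0
    for _ in range(n):     # the sequence x_n = f^n(x_0) of the proof
        x = f(x)
    return x

f = lambda x: (x + 2.0 / x) / 2.0
print(iterate(f, 1.0, 6))  # 1.41421356..., independent of the starting point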

The Banach Fixed Point Theorem has a wide range of applications; it is used for iteratively approximating the solution of equations, e.g., for implicit functions. The following example permits a glance at Google’s page rank algorithm; it follows [Rou10] (the linear algebra behind it is explored in, e.g., [LM05, Kee93]).

Example 3.5.28 Let S := {〈x_1, ..., x_n〉 | x_i ≥ 0, x_1 + ... + x_n = 1} be the set of all discrete probability distributions over n objects, and let P : ℝ^n → ℝ^n be a stochastic matrix; this means that P has non-negative entries and that all rows add up to 1. The set {1, ..., n} is usually interpreted as the state space of some random experiment; the entry p_{i,j} is then interpreted as the probability for the change from state i to state j. We have in particular P : S → S, so a probability distribution is transformed into another probability distribution. We assume that P has an eigenvector v_1 ∈ S for the eigenvalue 1, and that the other eigenvalues are in absolute value not greater than 1 (this is what the classic Perron-Frobenius Theorem says, see [LM05, Kee93]); moreover, we assume that we can find a basis v_1, ..., v_n of eigenvectors, all of which may be assumed to be in S. Let λ_i be the eigenvalue for v_i; then λ_1 = 1 and |λ_i| ≤ 1 for i ≥ 2. Such a matrix is called a regular transition matrix; these matrices are investigated in the context of stability of finite Markov transition chains.

Define for the distributions p = ∑_{i=1}^n p_i · v_i and q = ∑_{i=1}^n q_i · v_i their distance through

d(p, q) := (1/2) · ∑_{i=1}^n |p_i − q_i|.

Because v_1, ..., v_n are linearly independent, d is a metric. Because this set forms a basis, it is obtained from the basis of unit vectors through a bijective linear map; and because the Euclidean metric is complete, d is complete as well.

Now define f(x) := P · x; then this is a contraction S → S:

d(P · x, P · y) = (1/2) · ∑_{i=1}^n |x_i · P(v_i) − y_i · P(v_i)| ≤ (1/2) · ∑_{i=1}^n |λ_i · (x_i − y_i)| ≤ (1/2) · d(x, y).

Thus f has a fixed point, which must be v1 by uniqueness.

Now assume that we have a (very little) Web universe with only five pages. The links are given as in the diagram.

[Diagram: the link graph on the five pages 1, ..., 5; its arrows correspond exactly to the positive entries of the transition matrix P below.]

The transitions between pages are made at random; the matrix below describes such a random walk:

P := \begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
1/2 & 0 & 1/2 & 0 & 0 \\
1/3 & 1/3 & 0 & 0 & 1/3 \\
1 & 0 & 0 & 0 & 0 \\
0 & 1/3 & 1/3 & 1/3 & 0
\end{pmatrix}

It says that we make a transition from state 2 to state 1 with p_{2,1} = 1/2, and with p_{2,3} = 1/2 the transition from state 2 to state 3. From state 1 one goes with probability one to state 2, because p_{1,2} = 1. Iterating P quite a few times will yield a solution which does not change much after 32 steps; one obtains

P^{32} = \begin{pmatrix}
.293 & .390 & .220 & .024 & .073 \\
.293 & .390 & .220 & .024 & .073 \\
.293 & .390 & .220 & .024 & .073 \\
.293 & .390 & .220 & .024 & .073 \\
.293 & .390 & .220 & .024 & .073
\end{pmatrix}

The eigenvector p for the eigenvalue 1 looks like this: p = 〈.293, .390, .220, .024, .073〉, so this yields a stationary distribution.

In terms of web searches, the importance of the pages is ordered according to this stationary distribution as 2, 1, 3, 5, 4, so this is the ranking one would associate with these pages.

This is the basic idea behind Google's page ranking algorithm. Of course, there are many practical considerations which have been eliminated from this toy example. It may be that the matrix does not satisfy the assumptions above, so that it has to be modified accordingly in a preprocessing step. Size is a problem, of course, since handling the extremely large matrices occurring in web searches may become quite intricate.
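The stationary distribution of the example can be reproduced by iterated matrix multiplication, exactly as described above. A small sketch (our own illustration, using numpy, assumed available; the matrix is the P displayed above, and the stationary p is a left eigenvector, i.e., p · P = p):

```python
import numpy as np

# The transition matrix of the five-page web universe from Example 3.5.28.
P = np.array([
    [0,   1,   0,   0,   0  ],
    [1/2, 0,   1/2, 0,   0  ],
    [1/3, 1/3, 0,   0,   1/3],
    [1,   0,   0,   0,   0  ],
    [0,   1/3, 1/3, 1/3, 0  ],
])

# Start from the uniform distribution and iterate p |-> p.P
# (p is a row vector, so this is the left action of P).
p = np.full(5, 1 / 5)
for _ in range(32):
    p = p @ P

print(np.round(p, 3))      # -> [0.293 0.39  0.22  0.024 0.073]
print(np.argsort(-p) + 1)  # ranking of the pages: [2 1 3 5 4]
```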


Compact pseudometric spaces are complete. This will be a byproduct of a more general characterization of compact spaces. We show first that compactness and sequential compactness are the same for these spaces. This is sometimes helpful in those situations in which a sequence is easier to handle than an open cover or an ultrafilter.

Before discussing this, we introduce ε-nets: an ε-net is a cover of X through a finite family {B(x, ε) | x ∈ A} of open balls of radius ε. X may or may not have an ε-net for any given ε > 0. For example, ℝ does not have an ε-net for any ε > 0, in contrast to [0, 1] or ]0, 1].

Definition 3.5.29 The pseudometric space X is totally bounded iff there exists for each ε > 0 an ε-net for X. A subset of a pseudometric space is totally bounded iff it is a totally bounded subspace.

Thus A ⊆ X is totally bounded iff A^a ⊆ X is totally bounded.
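A finite set of points is trivially totally bounded, and an ε-net for it can be constructed greedily: every point not yet covered becomes a new center. The sketch below is our own illustration of this idea for points in the Euclidean plane (not part of the text):

```python
import math
import random

def eps_net(points, eps):
    """Greedily pick centers so that every point lies in some open ball
    B(c, eps); for a finite set this terminates with a finite eps-net."""
    centers = []
    for p in points:
        if all(math.dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers

random.seed(1)
cloud = [(random.random(), random.random()) for _ in range(1000)]
net = eps_net(cloud, eps=0.2)
print(len(net), "centers suffice for eps = 0.2")
```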

We see immediately

Lemma 3.5.30 A compact pseudometric space is totally bounded. ⊣

Now we are in a position to establish this equivalence, which will help characterize compact pseudometric spaces.

Proposition 3.5.31 The following properties are equivalent for the pseudometric space X:

1. X is compact.

2. X is sequentially compact, i.e., each sequence has a convergent subsequence.

Proof 1 ⇒ 2: Assume that the sequence (x_n)_{n∈ℕ} does not have a convergent subsequence, and consider the set F := {x_n | n ∈ ℕ}. This set is closed since, if y_n → y and y_n ∈ F for all n ∈ ℕ, then y ∈ F, the sequence (y_n)_{n∈ℕ} being eventually constant. F is also discrete since, if we could find for some z ∈ F for each n ∈ ℕ an element in F ∩ B(z, 1/n) different from z, we would have a convergent subsequence. Hence F is a closed discrete subspace of X which contains infinitely many elements, which is impossible. This contradiction shows that each sequence has a convergent subsequence.

2 ⇒ 1: Before we enter into the second and harder part of the proof, we have a look at its plan. Given an open cover of the sequentially compact space X, we have to construct a finite cover from it. If we succeed in constructing for each ε > 0 a finite net so that we can fit each ball into some element of the cover, we are done, because in this case we may take just these elements of the cover, obtaining a finite cover. That this fitting in is possible is shown in the first part of the proof: assuming that it is not possible, we construct a sequence which has a converging subsequence, and the limit of this subsequence will be used as a kind of flyswatter.

The second part of the proof is then just a simple application of the net so constructed.

Fix (U_i)_{i∈I} as a cover of X. We claim that we can find for this cover some ε > 0 such that, whenever diam(A) < ε, there exists i ∈ I with A ⊆ U_i. Assume that this is wrong; then we find for each n ∈ ℕ some A_n ⊆ X with diam(A_n) < 1/n which is not contained in one single U_i. Pick from each A_n an element x_n; then (x_n)_{n∈ℕ} has a convergent subsequence, say (y_n)_{n∈ℕ}, with y_n → y. There exists a member U of the cover with y ∈ U, and there exists r > 0 with B(y, r) ⊆ U. Now we catch the fly. Choose ℓ ∈ ℕ with 1/ℓ < r/2; then y_m ∈ B(y, r/2) for m ≥ n_0 for some suitably chosen n_0 ∈ ℕ, hence, because (y_n)_{n∈ℕ} is a subsequence of (x_n)_{n∈ℕ}, there are infinitely many x_k contained in B(y, r/2). But since diam(A_ℓ) < 1/ℓ, this implies A_ℓ ⊆ B(y, r) ⊆ U, which is a contradiction.

Now select ε > 0 as above for the cover (U_i)_{i∈I}, and let the finite set A be the set of centers of an ε/2-net, say A = {a_1, ..., a_k}. Then we can find for each a_j ∈ A some member U_{i_j} of this cover with B(a_j, ε/2) ⊆ U_{i_j} (note that diam(B(x, r)) ≤ 2 · r). This yields a finite cover {U_{i_j} | 1 ≤ j ≤ k} of X. ⊣

This proof was conceptually a little complicated, since we had to make the step from a sequence (with a converging subsequence) to a cover (with the goal of finding a finite cover); the two are not immediately related. The missing link turned out to be measuring the size of a set through its diameter, and capturing limits through suitable sets.

Using the last equivalence, we are in a position to characterize compact pseudometric spaces.

Theorem 3.5.32 A pseudometric space is compact iff it is totally bounded and complete.

Proof 1. Let X be compact. We know already from Lemma 3.5.30 that a compact pseudometric space is totally bounded. Let (x_n)_{n∈ℕ} be a Cauchy sequence; by Proposition 3.5.31 it has a converging subsequence, and a Cauchy sequence with a converging subsequence converges itself.

2. Assume that X is totally bounded and complete. In view of Proposition 3.5.31 it is enough to show that X is sequentially compact. Let (x_n)_{n∈ℕ} be a sequence in X. Since X is totally bounded, we find a subsequence (x_{n_1}) which is entirely contained in an open ball of radius less than 1. Then we may extract from this sequence a subsequence (x_{n_2}) which is contained in an open ball of radius less than 1/2. Continuing inductively, we find a subsequence (x_{n_{k+1}}) of (x_{n_k}) the members of which are completely contained in an open ball of radius less than 2^{−(k+1)}. Now define y_n := x_{n_n}; hence (y_n)_{n∈ℕ} is the diagonal sequence in this scheme.

We claim that (y_n)_{n∈ℕ} is a Cauchy sequence. In fact, let ε > 0 be given; then there exists n ∈ ℕ such that ∑_{ℓ>n} 2^{−ℓ} < ε/2. Then we have for m > n

d(y_n, y_m) ≤ 2 · ∑_{ℓ=n}^{m} 2^{−ℓ} < ε.

By completeness, y_n → y for some y ∈ X. Hence we have found a converging subsequence of the given sequence (x_n)_{n∈ℕ}, so that X is sequentially compact. ⊣

It might be noteworthy to observe the shift of emphasis between finding a finite cover for a given cover, and admitting an ε-net for each ε > 0. While we have to select a finite cover from an arbitrarily given cover beyond our control, we can construct in the case of a totally bounded space for each ε > 0 a cover of a certain size, hence we may be in a position to influence the shape of this special cover. Consequently, the characterization of compact spaces in Theorem 3.5.32 is very helpful and handy, but, alas, it works only in the restricted class of pseudometric spaces.


We apply this characterization to (C(X), δ_H), the space of all non-empty compact subsets of (X, d) with the Hausdorff metric δ_H, see Example 3.5.10.

Proposition 3.5.33 (C(X), δH) is complete, if X is a complete pseudometric space.

Proof We fix for the proof a Cauchy sequence (Cn)n∈N of elements of C(X).

0. Let us pause a moment and discuss the approach to the proof first. We show in a first step that (⋃_{n∈ℕ} C_n)^a is compact by showing that it is totally bounded and complete. Completeness is trivial, since the space is complete and we are dealing with a closed subset, so we focus on showing that the set is totally bounded. Actually, it is sufficient to show that ⋃_{n∈ℕ} C_n is totally bounded, because a set is totally bounded iff its closure is.

Then compactness of (⋃_{n∈ℕ} C_n)^a implies that C := ⋂_{n∈ℕ} (⋃_{k≥n} C_k)^a is compact as well; moreover, we will argue that C must be non-empty. Then it is shown that C_n → C in the Hausdorff metric.

1. Let D := ⋃_{n∈ℕ} C_n, and let ε > 0 be given. We will construct an ε-net for D. Because (C_n)_{n∈ℕ} is Cauchy, we find for ε an index ℓ so that δ_H(C_n, C_m) < ε/2 for n, m ≥ ℓ. When n ≥ ℓ is fixed, this means in particular that C_m ⊆ C_n^{ε/2} for all m ≥ ℓ, thus d(x, C_n) < ε/2 for all x ∈ C_m and all m ≥ ℓ. We will use this observation in a moment.

Let {x_1, ..., x_t} be an ε/2-net for ⋃_{j=1}^{n} C_j; we claim that this is an ε-net for D. In fact, let x ∈ D. If x ∈ ⋃_{j=1}^{n} C_j, then there exists some k with d(x, x_k) < ε/2. If x ∈ C_m for some m > n, then x ∈ C_n^{ε/2}, so that we find x' ∈ C_n with d(x, x') < ε/2, and for x' we find k such that d(x_k, x') < ε/2. Hence we have d(x, x_k) < ε, so that we have shown that {x_1, ..., x_t} is an ε-net for D. Thus D^a is totally bounded, hence compact.

2. From the first part it follows that (⋃_{k≥n} C_k)^a is compact for each n ∈ ℕ. Since these sets form a decreasing sequence of non-empty closed subsets of the compact set given by n = 1, their intersection cannot be empty; hence C := ⋂_{n∈ℕ} (⋃_{k≥n} C_k)^a is compact and non-empty, hence a member of C(X).

We claim that δ_H(C_n, C) → 0 as n → ∞. Let ε > 0 be given; then we find ℓ ∈ ℕ such that δ_H(C_m, C_n) ≤ ε/2 whenever n, m ≥ ℓ. We show that δ_H(C_n, C) < ε for all n ≥ ℓ. Let n ≥ ℓ. The proof is subdivided into showing that C ⊆ C_n^ε and C_n ⊆ C^ε.

• Let us work on the first inclusion. Because D := (⋃_{i≥n} C_i)^a is totally bounded, there exists an ε/2-net, say {x_1, ..., x_t}, for D. If x ∈ C ⊆ D, then there exists j such that d(x, x_j) < ε/2, so that we can find y ∈ C_n with d(y, x_j) < ε/2. Consequently, we find for x ∈ C some y ∈ C_n with d(x, y) < ε. Hence C ⊆ C_n^ε.

• Now for the second inclusion. Take x ∈ C_n. Since δ_H(C_m, C_n) < ε/2 for m ≥ ℓ, we have C_n ⊆ C_m^{ε/2}, hence we find x_m ∈ C_m with d(x, x_m) < ε/2. The sequence (x_k)_{k≥m} consists of members of the compact set D, so it has a converging subsequence, which converges to some y ∈ D. But it actually follows from the construction that y ∈ C, and d(x, y) ≤ d(x, x_m) + d(x_m, y) < ε for m taken sufficiently large from the subsequence. This yields x ∈ C^ε.

Taking these inclusions together, they imply δ_H(C_n, C) < ε for n > ℓ. This shows that (C(X), δ_H) is a complete pseudometric space, if (X, d) is one. ⊣
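For finite subsets of the plane the Hausdorff metric can be computed directly from its definition, δ_H(A, B) = max(sup_{a∈A} d(a, B), sup_{b∈B} d(b, A)). The following sketch is a small illustration of this formula (our own toy data, not part of the text):

```python
import math

def hausdorff(A, B):
    """Hausdorff distance of finite non-empty point sets A and B: the larger
    of the one-sided deviations sup_a d(a, B) and sup_b d(b, A)."""
    dist_to = lambda p, S: min(math.dist(p, q) for q in S)
    return max(max(dist_to(a, B) for a in A),
               max(dist_to(b, A) for b in B))

A = [(0, 0), (1, 0)]
B = [(0, 0), (1, 1)]
print(hausdorff(A, B))  # 1.0, realized by the point (1, 1)
```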

The topology induced by the Hausdorff metric can be defined in a way which permits a generalization to arbitrary topological spaces, where it is called the Vietoris topology. It has been studied with respect to finding continuous selections, e.g., by Michael [Mic51], see also [JR02, CV77]. The reader is also referred to [Kur66, §33] and to [Eng89, p. 120] for a study of topologies on subsets.

We will now introduce uniform continuity and discuss this concept briefly here; uniform spaces will turn out to be the proper scenario for the more extended discussion in Section 3.6.4. As a motivating example, assume that the pseudometric d on X is bounded, take a subset A ⊆ X and look at the function x ↦ d(x, A). Since

|d(x, A) − d(y, A)| ≤ d(x, y),

we know that this map is continuous: given x ∈ X and ε > 0, there exists δ > 0 such that d(x, x') < δ implies |d(x, A) − d(x', A)| < ε. We see from the inequality above that the choice of δ depends only on ε, but not on x. Compare this with the function x ↦ 1/x on ]0, 1]. This function is continuous as well, but the choice of δ depends on the point x under consideration: whenever 0 < δ < ε · x²/(1 + ε · x), we may conclude that |x' − x| ≤ δ implies |1/x' − 1/x| ≤ ε. In fact, we may easily infer from the graph of the function that a uniform choice of δ for a given ε is not possible.

This leads to the definition of uniform continuity in a pseudometric space: the choice of δ for a given ε does not depend on a particular point, but is rather, well, uniform.

Definition 3.5.34 The map f : X → Y into the pseudometric space (Y, d') is called uniformly continuous iff given ε > 0 there exists δ > 0 such that d'(f(x), f(x')) < ε whenever d(x, x') < δ.

Doing a game of quantifiers, let us just point out the difference between uniform continuity and continuity.

1. Continuity says

∀ε > 0 ∀x ∈ X ∃δ > 0 ∀x' ∈ X : d(x, x') < δ ⇒ d'(f(x), f(x')) < ε.

2. Uniform continuity says

∀ε > 0 ∃δ > 0 ∀x ∈ X ∀x' ∈ X : d(x, x') < δ ⇒ d'(f(x), f(x')) < ε.

The formulation suggests that uniform continuity depends on the chosen metric. In contrast to continuity, which is a property depending on the topology of the underlying spaces, uniform continuity is a property of the underlying uniform space, which will be discussed below. We note that the composition of uniformly continuous maps is uniformly continuous again.

A uniformly continuous map is continuous. The converse is not true, however.

Example 3.5.35 Consider the map f : x ↦ x², which is certainly continuous on ℝ. Assume that f is uniformly continuous, and fix ε > 0; then there exists δ > 0 such that |x − y| < δ always implies |x² − y²| < ε. Thus we have for all x, and for all r with 0 < r ≤ δ, that |x² − (x + r)²| = |2 · x · r + r²| < ε by Binomi's celebrated theorem. But this would mean |2 · x + r| < ε/r for all x, which is not possible. In general, a very similar argument shows that polynomials ∑_{i=1}^n a_i · x^i with n > 1 and a_n ≠ 0 are not uniformly continuous.
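The failure of uniform continuity can also be observed numerically: for a fixed δ we have |x² − (x + δ)²| = δ · (2x + δ), which grows without bound in x, so no single δ can serve a given ε at every point. A purely illustrative sketch:

```python
delta = 1e-3
for x in [1.0, 10.0, 100.0, 1000.0]:
    # |x^2 - (x + delta)^2| = delta * (2x + delta) grows linearly in x.
    print(x, abs(x**2 - (x + delta)**2))
```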

A continuous function on a compact pseudometric space, however, is uniformly continuous. This is established through an argument constructing a cover of the space; compactness will then permit us to extract a finite cover, from which we will infer uniform continuity.

Proposition 3.5.36 Let f : X → Y be a continuous map from the compact pseudometric space X to the pseudometric space (Y, d'). Then f is uniformly continuous.

Proof Given ε > 0, there exists for each x ∈ X a positive δ_x such that f[B(x, δ_x)] ⊆ B_{d'}(f(x), ε/3). Since {B(x, δ_x/3) | x ∈ X} is an open cover of X, and since X is compact, we find x_1, ..., x_n ∈ X such that B(x_1, δ_{x_1}/3), ..., B(x_n, δ_{x_n}/3) cover X. Let δ be the smallest among δ_{x_1}, ..., δ_{x_n}. If d(x, x') < δ/3, then there exist x_i, x_j with d(x, x_i) < δ/3 and d(x', x_j) < δ/3, so that d(x_i, x_j) ≤ d(x_i, x) + d(x, x') + d(x', x_j) < δ, hence d'(f(x_i), f(x_j)) < ε/3, thus

d'(f(x), f(x')) ≤ d'(f(x), f(x_i)) + d'(f(x_i), f(x_j)) + d'(f(x_j), f(x')) < 3 · ε/3 = ε.

This establishes uniform continuity. ⊣

One of the most attractive features of uniform continuity is that it permits certain extensions: given a uniformly continuous map f : D → Y with D ⊆ X dense and Y a complete metric space, we can extend f to a uniformly continuous map F on the whole space. This extension is necessarily unique (see Lemma 3.3.20). The basic idea is to define F(x) := lim_{n→∞} f(x_n), whenever (x_n)_{n∈ℕ} is a sequence in D which converges to x. This requires that the limit exists, and that it is in this case unique; hence it demands the range to be a metric space which is complete.

Proposition 3.5.37 Let D ⊆ X be a dense subset, and assume that f : D → Y is uniformly continuous, where (Y, d') is a complete metric space. Then there exists a unique uniformly continuous map F : X → Y which extends f.

Proof 0. We have already argued that an extension must be unique, if it exists. So we have to construct it, and to show that it is uniformly continuous. We will generalize the argument from above referring to a limit by considering the oscillation at each point. A glimpse at the proof of Lemma 3.5.24 shows that we indeed argue with a limit here, and that we are able to take into view the whole set of points which makes this possible.

1. Let us have a look at the oscillation osc_f(x) of f at a point x ∈ X (see page 252); we may assume that x ∉ D. We claim that osc_f(x) = 0. In fact, given ε > 0, there exists δ > 0 such that d(x', x'') < δ implies d'(f(x'), f(x'')) < ε/3, whenever x', x'' ∈ D. Thus, if y', y'' ∈ f[D ∩ B(x, δ/2)], we find x', x'' ∈ D with f(x') = y', f(x'') = y'' and d(x', x'') ≤ d(x', x) + d(x, x'') < δ, hence d'(y', y'') = d'(f(x'), f(x'')) < ε/3. This means that diam(f[D ∩ B(x, δ/2)]) < ε.

2. Lemma 3.5.24 tells us that there exists a continuous extension F of f to the set {x ∈ X | osc_f(x) = 0} = X. Hence it remains to show that F is uniformly continuous. Given ε > 0, we choose the same δ as above, which did not depend on the choice of the points we were considering. Let x_1, x_2 ∈ X with d(x_1, x_2) < δ/2; then there exist v_1, v_2 ∈ D such that d(x_1, v_1) < δ/4 with d'(F(x_1), f(v_1)) ≤ ε/3 and d(x_2, v_2) < δ/4 with d'(F(x_2), f(v_2)) ≤ ε/3. We see as above that d(v_1, v_2) < δ, thus d'(f(v_1), f(v_2)) < ε/3; consequently,

d'(F(x_1), F(x_2)) ≤ d'(F(x_1), f(v_1)) + d'(f(v_1), f(v_2)) + d'(f(v_2), F(x_2)) < 3 · ε/3 = ε.

But this means that F is uniformly continuous. ⊣

Looking at x ↦ 1/x on ]0, 1] shows that uniform continuity is indeed necessary to obtain a continuous extension.

3.5.2 Baire’s Theorem and a Banach-Mazur Game

The technique of constructing a shrinking sequence of closed sets with diameters tending to zero, used for establishing Proposition 3.5.25, is helpful in establishing Baire's Theorem 3.4.13 also for complete pseudometric spaces; completeness then makes sure that the intersection is not empty. The proof is essentially a blend of this idea with the proof given above (page 239). We will then give an interpretation of Baire's Theorem in terms of the game Angel vs. Demon introduced in Section 1.7: we show that Demon has a winning strategy iff the space is the countable union of nowhere dense sets (the space is then said to be of the first category). This is done for a subset of the real line, but can easily be generalized.

This is the version of Baire’s Theorem in a complete pseudometric space. We mimic the proofof Theorem 3.4.13, having the diameter of a set at our disposal.

Theorem 3.5.38 Let X be a complete pseudometric space; then the intersection of a sequence of dense open sets is dense again.

Baire’sTheorem

Proof Let (D_n)_{n∈ℕ} be a sequence of dense open sets. Fix a non-empty open set G; then we have to show that G ∩ ⋂_{n∈ℕ} D_n ≠ ∅. Now D_1 is dense and open, hence we find an open set V_1 and r > 0 such that diam(V_1^a) ≤ r and V_1^a ⊆ D_1 ∩ G. We select inductively in this way a sequence of open sets (V_n)_{n∈ℕ} with diam(V_n^a) < r/n such that V_{n+1}^a ⊆ D_{n+1} ∩ V_n. This is possible since D_n is open and dense for each n ∈ ℕ.

Hence we have in the complete space X a decreasing sequence V_1^a ⊇ ... ⊇ V_n^a ⊇ ... of closed sets with diameters tending to 0. Thus ⋂_{n∈ℕ} V_n^a = ⋂_{n∈ℕ} V_n is not empty by Proposition 3.5.25, which entails that G ∩ ⋂_{n∈ℕ} D_n is not empty. ⊣

Kelley [Kel55, p. 201] remarks that there is a slight incongruence with this theorem, since the assumption of completeness is non-topological in nature (hence a property which may get lost when switching to another pseudometric, see Example 3.5.18), while we draw a topological conclusion. He suggests that the assumption on the space X should be reworded to X being a topological space for which there exists a complete pseudometric. But, alas, the formulation above is the usual one, because it is pretty suggestive after all.

Definition 3.5.39 Call a set A ⊆ X nowhere dense iff (A^a)^o = ∅, i.e., the interior of the closure is empty; equivalently, iff the open set X \ A^a is dense. The space X is said to be of the first category iff it can be written as the countable union of nowhere dense sets.


Then Baire’s Theorem can be reworded that the countable union of nowhere dense sets in acomplete pseudometric space is nowhere dense. This constitutes an important example for anowhere dense set:

Example 3.5.40 Cantor’s ternary set C from Example 1.6.4 can be written asCantor’sternary set

C = ∞∑i=1

ai3−i | ai ∈ 0, 2 for all i ∈ N

.

This is seen as follows: Define [a, b]′ := [a + (b − a)/3] ∪ [a + 2 · (b − a)/3] for an interval[a, b], and (A1 ∪ . . . ∪ A`)′ := A′1 ∪ . . . ∪ A′`, then C =

⋂n∈NCn with the inductive definition

C1 := [0, 1]′ and Cn+1 := C ′n. It is shown easily by induction that

Cn = ∞∑i=1

ai · 3−i | ai ∈ 0, 2 for i ≤ n and ai ∈ 0, 1, 2 for i > n.

The representation above implies that the interior of C is empty, so that C is in fact nowheredense in the unit interval. ,
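The construction of the approximating sets C_n can be carried out mechanically. The sketch below (our own illustration, with exact rational arithmetic) produces the 2^n closed intervals making up C_n; their total length (2/3)^n tends to zero, which reflects the fact that C has empty interior:

```python
from fractions import Fraction

def split(interval):
    """The operation [a, b]' from Example 3.5.40: keep the two outer thirds."""
    a, b = interval
    third = (b - a) / 3
    return [(a, a + third), (b - third, b)]

def cantor_level(n):
    """The 2^n closed intervals whose union is C_n."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = [piece for iv in intervals for piece in split(iv)]
    return intervals

C3 = cantor_level(3)
print(len(C3), C3[0], C3[-1])  # 8 intervals, each of length 1/27
```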

Cantor’s ternary set is a helpful device in investigating the structure of complete metric spaceswhich have a countable dense subset, i.e., in Polish spaces.

We will now give a game theoretic interpretation of spaces of the first category through a game which is attributed to Banach and Mazur, tying the existence of a winning strategy for Demon to spaces of the first category. For simplicity, we discuss it for a closed interval of the real line. We do not assume that the game is determined; determinacy is not necessary here (and its assumption would bring us into serious difficulties with the assumption of the validity of the Axiom of Choice, see Section 1.7.1).

Let a subset S of a closed interval L_0 ⊆ ℝ be given; this set is assigned to Angel, and its adversary Demon is assigned the complement T := L_0 \ S. The game is played in this way:

• Angel chooses a closed interval L_1 ⊆ L_0,

• Demon reacts with choosing a closed interval L_2 ⊆ L_1,

• Angel then chooses, knowing the moves L_1 and L_2, a closed interval L_3 ⊆ L_2,

• and so on: Demon chooses the intervals with even numbers, Angel selects the intervals with odd numbers; each interval is closed and contained in the previous one, and both Angel and Demon have complete information about the game's history when making a move.

Angel wins iff ⋂_{n∈ℕ} L_n ∩ S ≠ ∅; otherwise Demon wins.

We focus on Demon’s behavior. Its strategy for the n-th move is modelled as a map fn whichis defined on 2 · n-tuples 〈L0, . . . , L2·n−1〉 of closed intervals with L0 ⊇ L1 ⊇ . . . ⊇ L2·n−1,taking a closed interval L2·n as a value with

L2·n = fn(L0, . . . , L2·n−1) ⊆ L2·n−1.


The sequence (f_n)_{n∈ℕ} will be a winning strategy for Demon iff ⋂_{n∈ℕ} L_n ⊆ T whenever (L_n)_{n∈ℕ} is chosen according to these rules.

The following theorem relates the existence of a winning strategy for Demon to S being of the first category.

Theorem 3.5.41 There exists a strategy for Demon to win iff S is of the first category.

We divide the proof into two parts: we show first that we can find a strategy for Demon if S is of the first category. The converse is technically somewhat more complicated, so we delay it and do the necessary constructions first.

Proof (First part) Assume that S is of the first category, so that we can write S = ⋃_{n∈ℕ} S_n with S_n nowhere dense for each n ∈ ℕ. Angel starts with a closed interval L_1; then Demon has to choose a closed interval L_2, and the choice will be such that L_2 ⊆ L_1 \ S_1. We have to be sure that such a choice is possible; our assumption implies that L_1 \ S_1^a is open and dense in L_1, thus it contains an open interval, and hence a closed one. In the inductive step, assume that Angel has chosen the closed interval L_{2n−1} such that L_{2n−1} ⊆ ... ⊆ L_2 ⊆ L_1 ⊆ L_0. Then Demon will select an interval L_{2n} ⊆ L_{2n−1} \ (S_1 ∪ ... ∪ S_n). For the same reason as above, the latter set contains an open interval. This constitutes Demon's strategy, and evidently ⋂_{n∈ℕ} L_n ∩ S = ∅, so Demon wins. ⊣
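Demon's strategy can be played out concretely when S is, for instance, the set of rationals in L_0 = [0, 1], written as the countable union of its singletons, each of which is nowhere dense. The sketch below (our own concrete instance, with exact rational arithmetic) shrinks the current interval in each move so as to avoid the next rational; any point of the intersection then avoids S:

```python
from fractions import Fraction

def rationals():
    """Enumerate the rationals in [0, 1]; each singleton is nowhere dense."""
    yield Fraction(0)
    yield Fraction(1)
    q = 2
    while True:
        for p in range(1, q):
            yield Fraction(p, q)
        q += 1

def demon_move(interval, bad_point):
    """Choose a closed subinterval of interval avoiding bad_point."""
    a, b = interval
    if not (a <= bad_point <= b):
        return (a, (a + b) / 2)
    # keep the larger of the two pieces cut off by bad_point, shrunk a bit
    if bad_point - a >= b - bad_point:
        return (a + (bad_point - a) / 3, a + 2 * (bad_point - a) / 3)
    return (bad_point + (b - bad_point) / 3, bad_point + 2 * (b - bad_point) / 3)

interval = (Fraction(0), Fraction(1))
for s, _ in zip(rationals(), range(20)):
    interval = demon_move(interval, s)  # Angel's choices are irrelevant here
print(interval)  # a shrinking interval avoiding the first 20 rationals
```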

The proof of the second part requires some technical constructions. We assume that f_n assigns to each 2n-tuple of closed intervals I_1 ⊇ I_2 ⊇ ... ⊇ I_{2n} a closed interval f_n(I_1, ..., I_{2n}) ⊆ I_{2n}, but we do not make any further assumptions, for the time being, that is. We are given a closed interval L_0 and a subset S ⊆ L_0.

In a first step we define a sequence (J_n)_{n∈ℕ} of closed intervals with these properties:

• J_n ⊆ L_0 for all n ∈ ℕ,

• K_n := f_1(L_0, J_n) defines a sequence (K_n)_{n∈ℕ} of mutually disjoint closed intervals,

• ⋃_{n∈ℕ} K_n^o is dense in L_0.

Let’s see how to do this. Define F as the sequence of all closed intervals with rationalendpoints that are contained in Lo0. Take J1 as the first element of F . Put K1 := f1(L0, J1),then K1 is a closed interval with K1 ⊆ J1 by assumption on f1. Let J2 be the first elementin F which is contained in L0 \K1, put K2 := f1(L0, J2). Inductively, select Ji+1 as the firstelement of F which is contained in L0 \

⋃it=1Kt, and set Ki+1 := f1(L0, Ji+1). It is clear

from the construction that (Kn)n∈N forms a sequence of mutually disjoint closed intervalswith Kn ⊆ Jn ⊆ L0 for each n ∈ N. Assume that

⋃n∈NK

on is not dense in L0, then we

find x ∈ L0 which is not contained in this union, hence we find an interval T with rationalendpoints which contains x but T ∩

⋃n∈NK

on = ∅. So T occurs somewhere in F , but it is

never the first interval to be considered in the selection process. Since this is impossible, wearrive at a contradiction.

We repeat this process with K_i^o in place of L_0 for each i; hence we define a sequence (J_{i,n})_{n∈ℕ} of closed intervals J_{i,n} with these properties:

• J_{i,n} ⊆ K_i^o for all n ∈ ℕ,

• K_{i,n} := f_2(L_0, J_i, K_i, J_{i,n}) defines a sequence (K_{i,n})_{n∈ℕ} of mutually disjoint closed intervals,

• ⋃_{n∈ℕ} K_{i,n}^o is dense in K_i.

It is immediate that ⋃_{i,j} K_{i,j}^o is dense in L_0.

Continuing inductively, we find for each ℓ ∈ ℕ two families J_{i_1,...,i_ℓ} and K_{i_1,...,i_ℓ} of closed intervals with these properties:

• K_{i_1,...,i_ℓ} = f_ℓ(L_0, J_{i_1}, K_{i_1}, J_{i_1,i_2}, K_{i_1,i_2}, ..., J_{i_1,...,i_ℓ}),

• J_{i_1,...,i_{ℓ+1}} ⊆ K_{i_1,...,i_ℓ}^o,

• the intervals (K_{i_1,...,i_{ℓ−1},i_ℓ})_{i_ℓ∈ℕ} are mutually disjoint for each choice of i_1, ..., i_{ℓ−1},

• ⋃{K_{i_1,...,i_ℓ}^o | 〈i_1, ..., i_ℓ〉 ∈ ℕ^ℓ} is dense in L_0.

Note that this system of intervals depends on the chosen sequence (f_n)_{n∈ℕ} of functions that represents the strategy for Demon.

Proof (Second part) Now assume that Demon has a winning strategy (f_n)_{n∈ℕ}; hence no matter how Angel plays, Demon will win. For proving the assertion, we have to construct a sequence of nowhere dense subsets whose union covers S. In the first move, Angel chooses a closed interval L_1 := J_{i_1} ⊆ L_0 (we refer here to the enumeration given by F above, so the interval chosen by Angel has index i_1). Demon's answer is then

L_2 := K_{i_1} := f_1(L_0, L_1) = f_1(L_0, J_{i_1}),

as constructed above. In the next step, Angel selects L_3 := J_{i_1,i_2} among those closed intervals which are eligible, i.e., which are contained in K_{i_1}^o and have rational endpoints; Demon's countermove is

L_4 := K_{i_1,i_2} := f_2(L_0, L_1, L_2, L_3) = f_2(L_0, J_{i_1}, K_{i_1}, J_{i_1,i_2}).

In the n-th step, Angel selects L_{2n−1} := J_{i_1,...,i_n} and Demon selects L_{2n} := K_{i_1,...,i_n}. Then we see that the sequence L_0 ⊇ L_1 ⊇ ... ⊇ L_{2n−1} ⊇ L_{2n} ⊇ ... decreases and that L_{2n} = f_n(L_0, L_1, ..., L_{2n−1}) holds, as required.

Put T := L_0 \ S for convenience; then ⋂_{n∈ℕ} L_n ⊆ T by assumption (after all, we assume that Demon wins). Put

G_n := ⋃_{〈i_1,...,i_n〉∈ℕ^n} K_{i_1,...,i_n}^o.

Then G_n is open. Let E := ⋂_{n∈ℕ} G_n. Given x ∈ E, there exists a unique sequence (i_n)_{n∈ℕ} such that x ∈ K_{i_1,...,i_n} for each n ∈ ℕ. Hence x ∈ ⋂_{n∈ℕ} L_n ⊆ T for the play determined by this sequence, so that E ⊆ T. But then we can write

S = L_0 \ T ⊆ L_0 \ E = ⋃_{n∈ℕ} (L_0 \ G_n).

Because ⋃{K_{i_1,...,i_n}^o | 〈i_1, ..., i_n〉 ∈ ℕ^n} is dense in L_0 for each n ∈ ℕ by construction, we conclude that L_0 \ G_n is nowhere dense, so S is of the first category. ⊣


Games are an interesting tool for proofs, as we can see in this example; we have shown already that games may be used for other purposes, e.g., demonstrating that each subset of [0, 1] is Lebesgue measurable under the Axiom of Determinacy, see Section 1.7.2. Further examples for using games to derive properties of a metric space can be found, e.g., in Kechris' book [Kec94].

3.6 A Gallery of Spaces and Techniques

The discussion of the basic properties and techniques suggests that we now have a powerful collection of methods at our disposal. Indeed, we set up a small gallery of show cases in which we demonstrate some approaches and methods.

We first look at the use of topologies in logic from two different angles. The more conventional one is a direct application of the important Baire Theorem, which permits the construction of a model in a countable language of first order logic; here the application of the theorem lies at the heart of the matter, which is a proof of Gödel's Completeness Theorem. The other vantage point starts from a calculus of observations and develops the concept of topological systems from it, stressing an order theoretic point of view by perceiving topologies as complete Heyting algebras, considering them as partially ordered subsets of the power set of their carrier. Since partial orders may generate topologies on the set they are based on, this yields an interesting interplay between order and topology, which is reflected here in the Hofmann-Mislove Theorem.

Then we return to the green pastures of classic applications and give a proof of the Stone-Weierstraß Theorem, one of the true classics. It states that a subring of the space of continuous functions on a compact Hausdorff space which contains the constants and separates points is dense in the topology of uniform convergence. We actually give two proofs for this. One is based on a covering argument in a general space; it has a wide range of applications, of course. The second proof is no less interesting. It is essentially based on Weierstraß' original proof and deals with polynomials over [0, 1] only; here concepts like elementary integration and uniform continuity are applied in a very concise and beautiful way.

Finally, we deal with uniform spaces; they are a generalization of pseudometric spaces, but more specific than topological spaces. We argue that the central concept is closeness of points, which is, however, formulated in conceptual rather than quantitative terms. It is shown that many concepts which appear specific to the metric approach, like uniform continuity or completeness, may be carried into this context. Nevertheless, uniform spaces are topological spaces, and the assumption of having a uniformity available has some consequences for the associated topology.

The reader probably misses Polish spaces in this little gallery. We deal with these spaces in depth, but since most of our applications of them are measure theoretic in nature, I decided to discuss them in the context of a discussion of measures as a kind of natural habitat, see Chapter 4.


3.6.1 Godel’s Completeness Theorem

Godel’s Completeness Theorem states that a set of sentences of first order logic is consistent iffit has a model. The crucial part is the construction of a model for a consistent set of sentences.This is usually done through Henkin’s approach, see, e.g., [Sho67, 4.2], [CK90, Chapter 2]or [Sri08, 5.1]. Rasiowa and Sikorski [RS50] followed a completely different path in theirtopological proof by making use of Baire’s Category Theorem and using the observation thatin a compact topological space the intersection of a sequence of open and dense sets is denseagain. The compact space is provided by the clopen sets of a Boolean algebra which in turnis constructed from the formulas of the first order language upon factoring. The equivalencerelation is induced by the consistent set under consideration.

We present the fundamental ideas of their proof in this section, since it is an unexpected application of a combination of the topological version of Stone's Representation Theorem for Boolean algebras with Baire's Theorem, hinted at already in Example 3.4.15. Since we assume that the reader is familiar with the semantics of first order languages, we do not want to motivate every definition for this area in detail; we sketch the definitions, indicate the deduction rules, say what a model is, and rather focus on the construction of the model. The references given above may be used to fill in any gaps.

A slightly informal description of the first order language L with identity with which we will be working is given first. For this, we assume that we have a countable set {x_n | n ∈ ℕ} of variables and countably many constants. Moreover, we assume countably many function symbols and countably many predicate symbols. In particular, we have a binary relation ==, the identity. Each function and each predicate symbol has a positive arity.

These are the components of our language L.

Terms. A variable is a term, and a constant symbol is a term. If f is a function symbol of arity n and t_1, ..., t_n are terms, then f(t_1, ..., t_n) is a term. Nothing else is a term.

Atomic Formulas. If t_1 and t_2 are terms, then t_1 == t_2 is an atomic formula. If p is a predicate symbol of arity n and t_1, ..., t_n are terms, then p(t_1, ..., t_n) is an atomic formula.

Formulas. An atomic formula is a formula. If ϕ and ψ are formulas, then ϕ ∧ ψ and ¬ϕ are formulas. If x is a variable and ϕ is a formula, then ∀x.ϕ is a formula. Nothing else is a formula.

Because there are countably many variables resp. constants, the language has countably many formulas.
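The grammar above translates directly into an inductive datatype. The following sketch (our own illustration; all names are hypothetical) represents terms and formulas as finite trees over countably many symbols, which also makes the countability of the set of formulas plausible:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Fun:        # f(t1, ..., tn)
    name: str
    args: Tuple

@dataclass(frozen=True)
class Eq:         # t1 == t2
    left: object
    right: object

@dataclass(frozen=True)
class Pred:       # p(t1, ..., tn)
    name: str
    args: Tuple

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Not:
    body: object

@dataclass(frozen=True)
class Forall:
    var: Var
    body: object

def exists(x, phi):
    """Derived connective: ∃x.φ abbreviates ¬∀x.¬φ."""
    return Not(Forall(x, Not(phi)))

# Example: ∀x1.∃x2. f(x1) == x2
phi = Forall(Var("x1"), exists(Var("x2"), Eq(Fun("f", (Var("x1"),)), Var("x2"))))
print(phi)
```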

One usually adds parentheses to the logical symbols, but we do without them, using them, however, freely when necessary. We will also use disjunction (ϕ ∨ ψ abbreviates ¬(¬ϕ ∧ ¬ψ)), implication (ϕ → ψ for ¬ϕ ∨ ψ), logical equivalence (ϕ ↔ ψ for (ϕ → ψ) ∧ (ψ → ϕ)) and existential quantification (∃x.ϕ for ¬(∀x.¬ϕ)). Conjunction and disjunction are associative.

We need logical axioms and inference rules as well. We have four groups of axioms:

Propositional Axioms. Each propositional tautology is an axiom.

Identity Axioms. x == x, when x is a variable.


Equality Axioms. y_1 == z_1 → ... → y_n == z_n → f(y_1, ..., y_n) == f(z_1, ..., z_n), whenever f is a function symbol of arity n, and y_1 == z_1 → ... → y_n == z_n → p(y_1, ..., y_n) → p(z_1, ..., z_n) for a predicate symbol p of arity n.

Substitution Axiom. If ϕ is a formula and ϕ_x[t] is obtained from ϕ by freely substituting all free occurrences of the variable x by the term t, then ϕ_x[t] → ∃x.ϕ is an axiom.

These are the inference rules.

Modus Ponens. From ϕ and ϕ→ ψ infer ψ.

Generalization Rule. From ϕ infer ∀x.ϕ.

A sentence is a formula without free variables. Let Σ be a set of sentences and ϕ a formula; then we denote by Σ ⊢ ϕ that ϕ is deducible from Σ, i.e., that there is a proof for ϕ from Σ. Σ is called inconsistent iff Σ ⊢ ⊥ or, equivalently, iff each formula can be deduced from Σ. If Σ is not inconsistent, then Σ is called consistent, or a theory.

Fix a theory T, and define

ϕ ∼ ψ iff T ⊢ ϕ ↔ ψ

for formulas ϕ and ψ; then this defines an equivalence relation on the set of all formulas. Let B_T be the set of all equivalence classes [ϕ], and define

[ϕ] ∧ [ψ] := [ϕ ∧ ψ],
[ϕ] ∨ [ψ] := [ϕ ∨ ψ],
−[ϕ] := [¬ϕ].

This defines a Boolean algebra structure on B_T, the Lindenbaum algebra of T. The maximal element ⊤ of B_T is {ϕ | T ⊢ ϕ}, its minimal element ⊥ is {ϕ | T ⊢ ¬ϕ}. The proof that B_T is a Boolean algebra follows the lines of Lemma 1.5.40 closely, hence it can safely be omitted. It might be noted, however, that the individual steps in the proof require additional properties of ⊢; for example, one has to show that T ⊢ ϕ and T ⊢ ψ together imply T ⊢ ϕ ∧ ψ. We trust that the reader is in a position to recognize and accomplish this; [Sri08, Chapter 4] provides a comprehensive catalog of useful derivation rules with their proofs.

Let ϕ be a formula, then denote by ϕ(k/p) the formula obtained in this way:

• all bound occurrences of x_p are replaced by x_ℓ, where x_ℓ is the first variable among x_1, x_2, ... which does not occur in ϕ,

• all free occurrences of xk are replaced by xp.

This construction depends on the integer ℓ, so the formula ϕ(k/p) is not uniquely determined, but its class is. We have these representations in the Lindenbaum algebra for existentially resp. universally quantified formulas.

Lemma 3.6.1 Let ϕ be a formula in L; then we have for every k ∈ ℕ:

1. sup_{p∈ℕ} [ϕ(k/p)] = [∃x_k.ϕ],

2. inf_{p∈ℕ} [ϕ(k/p)] = [∀x_k.ϕ].

Proof 1. Fix k ∈ ℕ; then we have T ⊢ ϕ(k/p) → ∃x_k.ϕ for each p ∈ ℕ by the ∃-introduction rule. This implies [ϕ(k/p)] ≤ [∃x_k.ϕ] for all p ∈ ℕ, thus [∃x_k.ϕ] is an upper bound for {[ϕ(k/p)] | p ∈ ℕ} in the Lindenbaum algebra. We have to show that it is also the least upper bound, so take a formula ψ such that [ϕ(k/p)] ≤ [ψ] for all p ∈ ℕ. Let q be an index such that x_q does not occur free in ψ; then we conclude from T ⊢ ϕ(k/p) → ψ for all p that T ⊢ ∃x_q.ϕ(k/q) → ψ. But T ⊢ ∃x_k.ϕ ↔ ∃x_q.ϕ(k/q), hence T ⊢ ∃x_k.ϕ → ψ. This means that [∃x_k.ϕ] is the least upper bound of {[ϕ(k/p)] | p ∈ ℕ}, proving the first equality.

2. The second equality is established in a very similar way. ⊣

These representations motivate

Definition 3.6.2 Let F be an ultrafilter on the Lindenbaum algebra B_T, and S ⊆ B_T.

1. F preserves the supremum of S iff sup S ∈ F ⇔ s ∈ F for some s ∈ S.

2. F preserves the infimum of S iff inf S ∈ F ⇔ s ∈ F for all s ∈ S.

Preserving the supremum of a set is similar to being inaccessible by joins (see Definition 3.6.29), but inaccessibility refers to directed sets, while we are not making any assumption on S except, of course, that its supremum exists in the Boolean algebra. Note also that one of the characteristic properties of an ultrafilter is that the join of two elements is in the ultrafilter iff it contains at least one of them. Preserving the supremum of a set strengthens this property for this particular set only.

The de Morgan laws and F being an ultrafilter make it clear that F preserves inf S iff it preserves sup{−s | s ∈ S}, resp. that F preserves sup S iff it preserves inf{−s | s ∈ S}. This cuts our work in half.

Proposition 3.6.3 Let (S_n)_{n∈ℕ} be a sequence of subsets S_n ⊆ B_T such that sup S_n exists in B_T. Then there exists an ultrafilter F such that F preserves the supremum of S_n for all n ∈ ℕ.

Proof This is an application of Baire's Category Theorem 3.4.13 and is discussed in Example 3.4.15; we find there a suitable prime ideal. Since the complement of a prime ideal in a Boolean algebra is an ultrafilter, see Lemma 1.5.36 and Lemma 1.5.37, the assertion follows. ⊣

So much for the syntactic side of our language L. We will leave the ultrafilter F alone for a little while and turn to the semantics of the logic.

An interpretation of L is given by a carrier set A: each constant c is interpreted through an element c_A of A, each function symbol f of arity n is assigned a map f_A : A^n → A, and each n-ary predicate symbol p is interpreted through an n-ary relation p_A ⊆ A^n; finally, the binary predicate == is interpreted through equality on A. We also fix a sequence (w_n)_{n∈ℕ} of elements of A for the interpretation of variables, set A := (A, (w_n)_{n∈ℕ}), and call A a model for the first order language. We then proceed inductively:


Terms. The variable x_i is interpreted by w_i. Assume that the term f(t_1, ..., t_n) is given. If the terms t_1, ..., t_n are interpreted through the respective elements t_{A,1}, ..., t_{A,n} of A, then f(t_1, ..., t_n) is interpreted through f_A(t_{A,1}, ..., t_{A,n}).

Atomic Formulas. The atomic formula t_1 == t_2 is interpreted through t_{A,1} = t_{A,2}. If the n-ary predicate p is assigned p_A ⊆ A^n, then p(t_1, ..., t_n) is interpreted as 〈t_{A,1}, ..., t_{A,n}〉 ∈ p_A.

We denote by A |= ϕ that the interpretation of the atomic formula ϕ yields the value true. We say that ϕ holds in A.

Formulas. Let ϕ and ψ be formulas; then A |= ϕ ∧ ψ iff A |= ϕ and A |= ψ, and A |= ¬ϕ iff A |= ϕ is false. Let ϕ be the formula ∀x_i.ψ; then A |= ϕ iff A |= ψ_{x_i|a} for every a ∈ A, where ψ_{x|a} is the formula ψ with each free occurrence of x replaced by a.

Now construct the ultrafilter F from Proposition 3.6.3 for all possible suprema arising from existentially quantified formulas according to Lemma 3.6.1; there are countably many such suprema, because the number of formulas is countable. This ultrafilter and the Lindenbaum algebra B_T will now be used for the construction of a model A for T (so that A |= ϕ holds for all ϕ ∈ T).

We first need to define the carrier set A. Define for the variables x_i and x_j the equivalence relation ≈ through x_i ≈ x_j iff [x_i == x_j] ∈ F; denote by x̄_i the ≈-equivalence class of x_i. The carrier set A is defined as {x̄_n | n ∈ ℕ}.

Let us take care of the constants now. Given a constant c, we know that ⊢ ∃x_i.c == x_i by substitution. Thus [∃x_i.c == x_i] = ⊤ ∈ F. But [∃x_i.c == x_i] = sup_{i∈ℕ} [c == x_i], and F preserves suprema, so we conclude that there exists i with [c == x_i] ∈ F. We pick this i and define c_A := x̄_i. Note that it does not matter which i we choose: if [c == x_i] ∈ F and [c == x_j] ∈ F, then [c == x_i ∧ c == x_j] ∈ F, hence [x_i == x_j] ∈ F, so the class is well defined.

Coming to terms, let t be a variable or a constant, so that it has an interpretation already, and assume that f is a unary function symbol. Then ⊢ ∃x_i.f(t) == x_i, so that [∃x_i.f(t) == x_i] ∈ F, hence there exists i such that [f(t) == x_i] ∈ F; then put f_A(t_A) := x̄_i. Again, if [f(t) == x_i] ∈ F and [f(t) == x_j] ∈ F, then [x_i == x_j] ∈ F, so that f_A(t_A) is well defined. The argument for the general case is very similar: assume that the terms t_1, ..., t_n have their interpretations already, and f is a function symbol of arity n; then ⊢ ∃x_i.f(t_1, ..., t_n) == x_i, hence we find j with [f(t_1, ..., t_n) == x_j] ∈ F, so put f_A(t_{A,1}, ..., t_{A,n}) := x̄_j. The same argument as above shows that this is well defined.

Having defined the interpretation t_A for each term t, we define for the n-ary relation symbol p the relation p_A ⊆ A^n by

〈t_{A,1}, ..., t_{A,n}〉 ∈ p_A ⇔ [p(t_1, ..., t_n)] ∈ F.

Then p_A is well defined by the equality axioms.

Thus A |= ϕ is defined for each formula ϕ; hence we know how to interpret each formula in terms of the Lindenbaum algebra of T (and the ultrafilter F). We can show now that a formula is valid in this model iff its class is contained in the ultrafilter F.


Proposition 3.6.4 A |= ϕ iff [ϕ] ∈ F holds for each formula ϕ of L.

Proof The proof is done by induction on the structure of the formula ϕ and is straightforward, using the properties of an ultrafilter. For example,

A |= ϕ ∧ ψ ⇔ A |= ϕ and A |= ψ (definition)
⇔ [ϕ] ∈ F and [ψ] ∈ F (induction hypothesis)
⇔ [ϕ ∧ ψ] ∈ F (F is an ultrafilter).

For establishing the equivalence for universally quantified formulas ∀x_i.ψ, assume that x_i is a free variable in ψ such that A |= ψ_{x_i|a} ⇔ [ψ_{x_i|a}] ∈ F has been established for all a ∈ A. Then

A |= ∀x_i.ψ ⇔ A |= ψ_{x_i|a} for all a ∈ A (definition)
⇔ [ψ_{x_i|a}] ∈ F for all a ∈ A (induction hypothesis)
⇔ inf_{a∈A} [ψ_{x_i|a}] ∈ F (F preserves the infimum)
⇔ [∀x_i.ψ] ∈ F (by Lemma 3.6.1).

This completes the proof. ⊣

As a consequence, we have established this version of Gödel's Completeness Theorem:

Corollary 3.6.5 A is a model for the consistent set T of formulas. ⊣

This approach demonstrates how a topological argument is used at the center of a construction in logic. It should be noted, however, that the argument is only effective since the universe in which we work is countable: the Baire Theorem, which enables the construction of the ultrafilter, works for a countable family of open and dense sets. If, however, we work in an uncountable language L, this instrument is no longer available ([CK90, Exercise 2.1.24] points to a possible generalization).

But even in the countable case one cannot help but note that the construction above depends on the Axiom of Choice, because we require an ultrafilter. The approaches in [CK90, Exercise 2.1.22] resp. [Kop89, Theorem 2.21] suggest constructing a filter without the help of a topology, but, alas, this filter is then extended to an ultrafilter, and here the dreaded axiom is needed again.

3.6.2 Topological Systems or: Topology Via Logic

This section investigates topological systems. They abstract from topologies being sets of subsets and concentrate on the order structure imposed by a topology instead. We focus on the interplay between a topology and the base space by considering these objects separately: a topology is considered a complete Heyting algebra, the carrier set is, well, a set of points, and both are related through a validity relation |= which mimics the ∈-relation between a set and its elements. This leads to the definition of a topological system, and the question is whether this separation really bears fruit. It does; for example, we may replace the point set by the morphisms from the Heyting algebra to the two-element algebra 𝟚, giving sober spaces, and we show that, e.g., a Hausdorff space is isomorphic to such a structure.


The interplay of the order structure of a topology and its topological obligations will be investigated through the Scott topology on a dcpo, a directed complete partial order, leading to the Hofmann-Mislove Theorem, which characterizes in terms of Scott open filters those compact sets that are represented as the intersection of the open sets containing them.

Before we enter into a technical discussion, however, we put the following definitions on record.

Definition 3.6.6 A partially ordered set P is called a complete Heyting algebra iff

1. each finite subset S has a meet ⋀S,

2. each subset S has a join ⋁S,

3. finite meets distribute over arbitrary joins, i.e.,

a ∧ ⋁S = ⋁{a ∧ s | s ∈ S}

holds for a ∈ P, S ⊆ P.

A morphism f between the complete Heyting algebras P and Q is a map f : P → Q such that

1. f(⋀S) = ⋀f[S] holds for finite S ⊆ P,

2. f(⋁S) = ⋁f[S] holds for arbitrary S ⊆ P.

‖P, Q‖ denotes the set of all morphisms P → Q.

The definition of a complete Heyting algebra is a bit redundant, but never mind. Because the meet and the join of the empty set are members of such an algebra, it contains a largest element ⊤ and a smallest element ⊥, and f(⊥) = ⊥ and f(⊤) = ⊤ follow. A topology is a complete Heyting algebra with inclusion as the partial order, as announced already in Exercise 1.27. Sometimes complete Heyting algebras are called frames; but since the structures underlying the interpretation of modal logics are also called frames, we stick here to the longer name.

Example 3.6.7 Call a lattice V pseudo-complemented iff given a, b ∈ V, there exists c ∈ V such that x ≤ c iff x ∧ a ≤ b; this c is usually denoted by a → b. A complete Heyting algebra is pseudo-complemented. In fact, let c := ⋁{x ∈ V | x ∧ a ≤ b}; then

c ∧ a = ⋁{x ∈ V | x ∧ a ≤ b} ∧ a = ⋁{x ∧ a | x ∈ V, x ∧ a ≤ b} ≤ b

by the general distributive law, hence x ≤ c implies x ∧ a ≤ b. Conversely, if x ∧ a ≤ b, then x ≤ c follows.
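In the complete Heyting algebra of open sets of a topological space, the pseudo-complement has a concrete description: U → V is the interior of (X \ U) ∪ V, the largest open W with W ∩ U ⊆ V. A sketch for a small finite topology (our own toy example, not from the text):

```python
def interior(S, topology):
    """Largest open set contained in S: the union of all opens inside S."""
    pts = set()
    for U in topology:
        if U <= S:
            pts |= U
    return frozenset(pts)

X = frozenset({1, 2, 3})
# A topology on X (closed under finite intersections and arbitrary unions).
tau = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

def heyting_impl(U, V):
    """U -> V := int((X \\ U) ∪ V), the pseudo-complement in the frame tau."""
    return interior((X - U) | V, tau)

U, V = frozenset({1, 2}), frozenset({1})
W = heyting_impl(U, V)
print(W)  # frozenset({1})
# Characteristic property: for every open W', W' <= U -> V iff W' ∩ U <= V.
print(all((Wp <= W) == (Wp & U <= V) for Wp in tau))  # True
```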

Example 3.6.8 Assume that we have a complete lattice V which is pseudo-complemented. Then the lattice satisfies the general distributive law. In fact, given a ∈ V and S ⊆ V, we have s ∧ a ≤ ⋁{a ∧ b | b ∈ S}, thus s ≤ a → ⋁{a ∧ b | b ∈ S} for all s ∈ S, from which we obtain ⋁S ≤ a → ⋁{a ∧ b | b ∈ S}, which in turn gives a ∧ ⋁S ≤ ⋁{a ∧ b | b ∈ S}. On the other hand, ⋁{a ∧ b | b ∈ S} ≤ ⋁S and ⋁{a ∧ b | b ∈ S} ≤ a, so that we obtain ⋁{a ∧ b | b ∈ S} ≤ a ∧ ⋁S.


We note

Corollary 3.6.9 A complete Heyting algebra is a complete distributive lattice. ⊣

Quite apart from investigating what can be said if open sets are replaced by the elements of a complete Heyting algebra, thus focussing on the order structure, one can argue as follows. Suppose we have observers and events; say, X is the set of observers, and A is the set of events. The observers are not assumed to have any structure; the events have a partial order making them a distributive lattice. An observation may be incomplete, so a ≤ b indicates that observing event b yields more information than observing event a. If observer x ∈ X observes event a ∈ A, we denote this by x |= a. The lattice structure should be compatible with the observations; that is, we want to have for S ⊆ A that

x |= ⋀S iff x |= a for all a ∈ S, for S finite,
x |= ⋁S iff x |= a for some a ∈ S, for S arbitrary

(recall ⋀∅ = ⊤ and ⋁∅ = ⊥). Thus our observations should be closed under finite conjunctions and arbitrary disjunctions; replacing conjunctions by intersections and disjunctions by unions, this shows a somewhat topological face. We define accordingly:

Definition 3.6.10 A topological system (X♭, X♯, |=) has a set X♭ of points, a complete Heyting algebra X♯ of observations, and a satisfaction relation |= ⊆ X♭ × X♯ (written as x |= a for 〈x, a〉 ∈ |=) such that we have for all x ∈ X♭:

• If S ⊆ X♯ is finite, then x |= ⋀S iff x |= a for all a ∈ S.

• For S ⊆ X♯ arbitrary, x |= ⋁S iff x |= a for some a ∈ S.

The elements of X♭ are called points, the elements of X♯ are called opens.

We will usually denote a topological system X = (X♭, X♯) without writing down the satisfaction relation, which is either explicitly defined or understood from the context.

Example 3.6.11 1. The obvious example for a topological system D is a topological space (X, τ) with D♭ := X and D♯ := τ, ordered through inclusion. The satisfaction relation |= is given by the containment relation ∈, so that we have x |= G iff x ∈ G for x ∈ D♭ and G ∈ D♯.

2. But it works the other way around as well. Given a topological system X, define for the open a ∈ X♯ its extent

(|a|) := {x ∈ X♭ | x |= a}.

Then τ := {(|a|) | a ∈ X♯} is a topology on X♭. In fact, ∅ = (|⊥|), X♭ = (|⊤|), and if S ⊆ τ is finite, say S = {(|a_1|), ..., (|a_n|)}, then ⋂S = (|⋀_{i=1}^n a_i|). Similarly, if S = {(|a_i|) | i ∈ I} ⊆ τ is an arbitrary subset of τ, then ⋃S = (|⋁_{i∈I} a_i|). This follows directly from the laws of a topological system.

3. Put 𝟚 := {⊥, ⊤}; then this is a complete Heyting algebra. Let X♯ := A be another complete Heyting algebra, and put X♭ := ‖X♯, 𝟚‖; defining x |= a iff x(a) = ⊤ then yields a topological system. Thus a point in this topological system is a morphism X♯ → 𝟚, and a point satisfies the open a iff it assigns ⊤ to it.
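Part 2 of the example can be checked mechanically for a finite system. The sketch below (a toy instance of our own) takes the concrete system of part 1, where opens are subsets and |= is membership, computes the extents (|a|), and verifies that they are closed under finite intersections and unions:

```python
from itertools import combinations

# A concrete topological system as in part 1: points, opens, and
# |= given by membership (so the Heyting algebra of opens is a topology).
points = {1, 2, 3}
opens = {frozenset(), frozenset({1}), frozenset({2, 3}), frozenset(points)}

def sat(x, a):   # x |= a  iff  x ∈ a
    return x in a

def extent(a):   # (|a|) := { x ∈ X♭ | x |= a }
    return frozenset(x for x in points if sat(x, a))

extents = {extent(a) for a in opens}

# The extents form a topology: closed under finite intersections and unions.
inter_closed = all(a & b in extents for a, b in combinations(extents, 2))
union_closed = all(a | b in extents for a, b in combinations(extents, 2))
print(extents == opens, inter_closed, union_closed)  # True True True
```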

Next, we want to define morphisms between topological systems. Before we do that, we have another look at topological spaces and continuous maps. Recall that a map f : X → Y between topological spaces (X, τ) and (Y, ϑ) is τ-ϑ-continuous iff f^{−1}[H] ∈ τ for all H ∈ ϑ. Thus f spawns a map f^{−1} : ϑ → τ (note the opposite direction). We have x ∈ f^{−1}[H] iff f(x) ∈ H, accounting for containment.

This leads to the definition of a morphism as a pair of maps, one working in the opposite direction of the other, such that the satisfaction relation is maintained; formally:

Definition 3.6.12 Let X and Y be topological systems. Then f : X → Y is a c-morphism iff

1. f is a pair of maps f = (f♭, f♯) with f♭ : X♭ → Y♭, and f♯ ∈ ‖Y♯, X♯‖ is a morphism of the underlying algebras,

2. f♭(x) |=_Y b iff x |=_X f♯(b) for all x ∈ X♭ and all b ∈ Y♯.

We have indicated above for the reader's convenience in which system the satisfaction relation is considered. It is evident that the notion of continuity is copied from topological spaces, taking the slightly different scenario into account.

Example 3.6.13 Let X and Y be topological systems with f : X → Y a c-morphism. Let (X♭, τ_{X♭}) and (Y♭, τ_{Y♭}) be the topological spaces generated from these systems through the extents of the respective opens, as in Example 3.6.11, part 2. Then f♭ : X♭ → Y♭ is τ_{X♭}-τ_{Y♭}-continuous. In fact, let b ∈ Y♯; then

x ∈ (f♭)⁻¹[(|b|)] ⇔ f♭(x) ∈ (|b|) ⇔ f♭(x) ⊨ b ⇔ x ⊨ f♯(b),

thus (f♭)⁻¹[(|b|)] = (|f♯(b)|) ∈ τ_{X♭}.

This shows that continuity of topological spaces is a special case of c-morphisms between topological systems, in the same way as topological spaces are special cases of topological systems.

Let f : X → Y and g : Y → Z be c-morphisms; then their composition is defined as g ∘ f := (g♭ ∘ f♭, f♯ ∘ g♯). The identity idX : X → X is defined through idX := (id_{X♭}, id_{X♯}). If, given the c-morphism f : X → Y, there is a c-morphism g : Y → X with g ∘ f = idX and f ∘ g = idY, then f is called a homeomorphism.

Corollary 3.6.14 Topological systems form a category TS, the objects of which are topological systems, with c-morphisms as morphisms. ∎

Given a topological system X, the topological space (X♭, τ_{X♭}) with τ_{X♭} := {(|a|) | a ∈ X♯} is called the spatialization of X and denoted by SP(X). We want to make SP a (covariant) functor TS → Top, the latter denoting the category of topological spaces with continuous maps as morphisms. Thus we have to define the image SP(f) of a c-morphism f : X → Y. But this is fairly straightforward: we have shown in Example 3.6.13 that f induces a continuous map (X♭, τ_{X♭}) → (Y♭, τ_{Y♭}), so we put SP(f) := f♭. It is clear now that SP : TS → Top is a covariant functor. On the other hand, part 1 of Example 3.6.11 shows that we have a forgetful functor V : Top → TS with V(X, τ) := (X♭, X♯), where X♭ := X and X♯ := τ, and V(f) := (f, f⁻¹). These functors are related.

Proposition 3.6.15 SP is right adjoint to V .

Proof 0. Given a topological space X and a topological system A, we have to find a bijection ϕ_{X,A} : hom_{TS}(V(X), A) → hom_{Top}(X, SP(A)) rendering these diagrams commutative:

    hom_{TS}(V(X), A) --ϕ_{X,A}--> hom_{Top}(X, SP(A))
            |                              |
           F∗                         (SP(F))∗
            v                              v
    hom_{TS}(V(X), B) --ϕ_{X,B}--> hom_{Top}(X, SP(B))

and

    hom_{TS}(V(X), A) --ϕ_{X,A}--> hom_{Top}(X, SP(A))
            |                              |
        (V(G))∗                           G∗
            v                              v
    hom_{TS}(V(Y), A) --ϕ_{Y,A}--> hom_{Top}(Y, SP(A))

where F∗ := hom_{TS}(V(X), F) : f ↦ F ∘ f for F : A → B in TS, and G∗ := hom_{Top}(G, SP(A)) : g ↦ g ∘ G for G : Y → X in Top, see Section 2.5.

We define ϕ_{X,A}(f♭, f♯) := f♭; hence we focus on the component of a c-morphism which maps points to points.

1. Let us work on the first diagram. Take f = (f♭, f♯) : V(X) → A as a morphism in TS, and let F : A → B be a c-morphism, F = (F♭, F♯); then ϕ_{X,B}(F∗(f)) = ϕ_{X,B}(F ∘ f) = F♭ ∘ f♭, and (SP(F))∗(ϕ_{X,A}(f)) = SP(F) ∘ f♭ = F♭ ∘ f♭.

2. Similarly, chasing f through the second diagram for some continuous map G : Y → X yields

ϕ_{Y,A}((V(G))∗(f)) = ϕ_{Y,A}((f♭, f♯) ∘ (G, G⁻¹)) = f♭ ∘ G = G∗(f♭) = G∗(ϕ_{X,A}(f)).

This completes the proof. ∎

Constructing SP, we went from a topological space to its associated topological system by exploiting the observation that a topology τ is a complete Heyting algebra. But we can travel in the other direction as well, as we will show now.

Given a complete Heyting algebra A, we take the elements of A as opens, and take all morphisms in ‖A, 𝟚‖ as points, defining the relation ⊨ which connects the components through

x ⊨ a ⇔ x(a) = ⊤.

This construction was announced already in Example 3.6.11, part 3. In order to extract a functor from this construction, we have to cater for morphisms. In fact, let ψ ∈ ‖B, A‖ be a morphism B → A of the complete Heyting algebras B and A, and p ∈ ‖A, 𝟚‖ a point of A; then p ∘ ψ ∈ ‖B, 𝟚‖ is a point of B. Let cHA be the category of all complete Heyting algebras with hom_{cHA}(A, B) := ‖B, A‖; then we define the functor Loc : cHA → TS through Loc(A) := (‖A, 𝟚‖, A) and Loc(ψ) := (ψ∗, ψ) for ψ ∈ hom_{cHA}(A, B) with ψ∗(p) := p ∘ ψ. Thus Loc(ψ) : Loc(A) → Loc(B) if ψ : A → B in cHA. In fact, let f := Loc(ψ) and p ∈ ‖A, 𝟚‖ a point of Loc(A); then we obtain for b ∈ B

f♭(p) ⊨ b ⇔ f♭(p)(b) = ⊤
        ⇔ (p ∘ ψ)(b) = ⊤   (since f♭(p) = p ∘ ψ)
        ⇔ p ⊨ ψ(b)
        ⇔ p ⊨ f♯(b)        (since f♯ = ψ).

This shows that Loc(ψ) is a morphism in TS. Loc(A) is called the localization of the complete Heyting algebra A. A topological system is called localic iff it is homeomorphic to the localization of a complete Heyting algebra.
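For a finite complete Heyting algebra the points of Loc(A) can be enumerated mechanically: arbitrary joins are then finite, so a point is just a map into 𝟚 preserving all meets and joins. A small sketch (Python; the algebra chosen, the four-element Boolean algebra realized as a field of sets, is merely illustrative):

    from itertools import chain, combinations, product

    A = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]
    top, bot = frozenset({1, 2}), frozenset()

    def subsets(xs):
        return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

    def meet(S): return top if not S else frozenset.intersection(*S)
    def join(S): return bot if not S else frozenset.union(*S)

    points = []
    for values in product([0, 1], repeat=len(A)):      # every map A -> {bottom, top}
        f = dict(zip(A, values))
        if all(f[meet(S)] == min((f[a] for a in S), default=1) and
               f[join(S)] == max((f[a] for a in S), default=0)
               for S in map(list, subsets(A))):
            points.append(f)

    print(len(points), "points")    # prints 2: the two projections of 2 x 2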

We also have here a forgetful functor V : TS → cHA, and with a proof very similar to the one for Proposition 3.6.15 one shows

Proposition 3.6.16 Loc is left adjoint to the forgetful functor V. ∎

In a localic system the points, being morphisms, evidently enjoy much more structure than flat, featureless points in an abstract set.

Before exploiting this wondrous remark, recall these notations, where ≤ is a reflexive and transitive relation on P:

↑p := {q ∈ P | q ≥ p},    ↓p := {q ∈ P | q ≤ p}.

The following properties are stated just for the record.

Lemma 3.6.17 Let a ∈ A with A a complete Heyting algebra. Then ↑a is a filter, and ↓a is an ideal in A. ∎

Definition 3.6.18 Let A be a complete Heyting algebra.

1. a ∈ A is called a prime element iff ↓a is a prime ideal.

2. The filter F ⊆ A is called completely prime iff ⋁S ∈ F implies s ∈ F for some s ∈ S, where S ⊆ A is arbitrary.

Thus a ∈ A is a prime element iff we may conclude from ⋀S ≤ a that there exists s ∈ S with s ≤ a, provided S ⊆ A is finite. Note that a prime filter has the stipulated property for finite S ⊆ A, so a completely prime filter is a prime filter by implication.

Example 3.6.19 Let (X, τ) be a topological space and x ∈ X; then

Gx := {G ∈ τ | x ∈ G}

is a completely prime filter in τ. It is clear that Gx is a filter in τ, since it is closed under finite intersections, and G ∈ Gx and G ⊆ H imply H ∈ Gx for H ∈ τ. Now let ⋃_{i∈I} Si ∈ Gx with Si ∈ τ for all i ∈ I; then there exists j ∈ I such that x ∈ Sj, hence Sj ∈ Gx.
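In a finite space this can be checked exhaustively. A small sketch (Python, illustrative): we list the filters Gx and test the completely prime property against all unions of opens.

    from itertools import chain, combinations

    X = {0, 1, 2}
    tau = [frozenset(), frozenset({0}), frozenset({0, 1}), frozenset(X)]

    def subsets(xs):
        return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

    def completely_prime(F):
        # whenever a union of opens lies in F, one of the opens does already
        for S in map(list, subsets(tau)):
            union = frozenset().union(*S) if S else frozenset()
            if union in F and not any(G in F for G in S):
                return False
        return True

    for x in X:
        Gx = {G for G in tau if x in G}
        print(x, [sorted(G) for G in Gx], completely_prime(Gx))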

Filters in a complete Heyting algebra which are inaccessible by directed suprema have this useful property: if we have an element which is not in the filter, then we can find a prime element not in the filter dominating the given one. The proof of this property requires the Axiom of Choice through Zorn's Lemma.

Proposition 3.6.20 Let F ⊆ A be a filter in the complete Heyting algebra A which is inaccessible by directed suprema, i.e., sup D ∈ F implies D ∩ F ≠ ∅ for every directed D ⊆ A (such filters will be called Scott open in Definition 3.6.29). If a ∉ F, then there exists a prime element p ∈ A with a ≤ p and p ∉ F.

Proof Let Z := {b ∈ A | a ≤ b and b ∉ F}; then Z ≠ ∅, since a ∈ Z. We want to show that Z is inductively ordered. Hence take a chain C ⊆ Z; then c := sup C ∈ A, since A is a complete lattice. Clearly, a ≤ c. Suppose c ∈ F; then, since a chain is directed and F is inaccessible by directed suprema, we find c′ ∈ C with c′ ∈ F, which contradicts the assumption that C ⊆ Z. Hence c ∈ Z, so Z contains a maximal element p by Zorn's Lemma.

Since p ∈ Z, we have a ≤ p and p ∉ F, so we have to show that p is a prime element. Assume that x ∧ y ≤ p; then one of x ∨ p or y ∨ p is not in F: if both are in F, we have by distributivity (x ∨ p) ∧ (y ∨ p) = (x ∧ y) ∨ p = p, so p ∈ F, since F is a filter; this is a contradiction. Assume that x ∨ p ∉ F; then a ≤ x ∨ p, since a ≤ p, hence even x ∨ p ∈ Z. Since p is maximal, we conclude x ∨ p ≤ p, which entails x ≤ p. Thus p is a prime element. ∎

The reader might wish to compare this statement to an argument used in the proof of Stone's Representation Theorem, see Section 1.5.7. There it is established that in a Boolean algebra we may find for each ideal a prime ideal which contains it. The argumentation is fairly similar, but, alas, there one works in a Boolean algebra, and not in a complete Heyting algebra, as we do presently.

Next comes a characterization of completely prime filters and prime elements in a complete Heyting algebra in terms of morphisms into 𝟚. We will use this characterization later on.

Lemma 3.6.21 Let A be a complete Heyting algebra; then

1. F ⊆ A is a completely prime filter iff F = f⁻¹(⊤) := f⁻¹[{⊤}] for some f ∈ ‖A, 𝟚‖.

2. I = f⁻¹(⊥) for some f ∈ ‖A, 𝟚‖ iff I = ↓p for some prime element p ∈ A.

Proof 1. Let F ⊆ A be a completely prime filter, and define

f(a) := ⊤ if a ∈ F, and f(a) := ⊥ if a ∉ F.

Then f : A → 𝟚 is a morphism for the complete Heyting algebras A and 𝟚. Since F is a filter, we have f(⋀S) = ⋀_{s∈S} f(s) for S ⊆ A finite. Let S ⊆ A; then

⋁_{s∈S} f(s) = ⊤ ⇔ f(s) = ⊤ for some s ∈ S ⇔ f(⋁S) = ⊤,

since F is completely prime. Thus f ∈ ‖A, 𝟚‖ and F = f⁻¹(⊤). Conversely, given f ∈ ‖A, 𝟚‖, the filter f⁻¹(⊤) is certainly completely prime.


2. Assume that I = f⁻¹(⊥) for some f ∈ ‖A, 𝟚‖, and put

p := ⋁{a ∈ A | f(a) = ⊥}.

Since A is complete, we have p ∈ A, and f(p) = ⊥, because f preserves arbitrary joins; hence a ≤ p implies f(a) = ⊥. Conversely, if f(a) = ⊥, then a ≤ p, so that I = ↓p. Moreover, I is a prime ideal, for f(a) ∧ f(b) = ⊥ iff f(a) = ⊥ or f(b) = ⊥, thus a ∧ b ∈ I implies a ∈ I or b ∈ I. Consequently, p is a prime element.

Let, conversely, the prime element p be given; then one shows as in part 1 that

f(a) := ⊥ if a ≤ p, and f(a) := ⊤ otherwise

defines a member of ‖A, 𝟚‖ with ↓p = f⁻¹(⊥). ∎

Continuing Example 3.6.19, we see that for a topological space X := (X, τ) there exists for each x ∈ X an element fx ∈ ‖τ, 𝟚‖ such that fx(G) = ⊤ iff x ∈ G. Define the map ΦX : X → ‖τ, 𝟚‖ through ΦX(x) := fx (so that ΦX(x) = fx iff Gx = fx⁻¹(⊤)). We will now examine ΦX in a little greater detail.

Lemma 3.6.22 ΦX is injective iff X is a T0-space.

Proof Let ΦX be injective and x ≠ y; then Gx ≠ Gy, hence there exists an open set G which contains one of x, y, but not the other. If, conversely, X is a T0-space, then by the same argumentation Gx ≠ Gy for all x, y with x ≠ y, so that ΦX is injective. ∎

Well, that's not too bad, because the representation of elements into ‖τ, 𝟚‖ is reflected by a (very basic) separation axiom. Let us turn to surjectivity. For this, we need to transfer irreducibility to the level of open or closed sets; since this is formulated most concisely for closed sets, we use this alternative. A closed set is called irreducible iff each of its covers by closed sets implies its being covered already by one of them.

Definition 3.6.23 A closed set F ⊆ X is called irreducible iff F ⊆ ⋃_{i∈I} Fi implies F ⊆ Fi for some i ∈ I, for any family (Fi)_{i∈I} of closed sets.

Thus a closed set F is irreducible iff the open set X \ F is a prime element in τ. Let's see: assume that F is irreducible, and let ⋂_{i∈I} Gi ⊆ X \ F for some open sets (Gi)_{i∈I}. Then F ⊆ ⋃_{i∈I}(X \ Gi) with X \ Gi closed, thus there exists j ∈ I with F ⊆ X \ Gj, hence Gj ⊆ X \ F. Thus ↓(X \ F) is a prime ideal in τ. One argues in exactly the same way for showing that if ↓(X \ F) is a prime ideal in τ, then F is irreducible.

Now we have this characterization of surjectivity of our map ΦX through irreducible closed sets.

Lemma 3.6.24 ΦX is onto iff for each irreducible closed set F there exists x ∈ X such that F = {x}ᵃ.

Proof 1. Let ΦX be onto and F ⊆ X be irreducible. By the argumentation above, X \ F is a prime element in τ, thus we find f ∈ ‖τ, 𝟚‖ with ↓(X \ F) = f⁻¹(⊥). Since ΦX is onto, we find x ∈ X such that f = ΦX(x); hence we have x ∉ G ⇔ f(G) = ⊥ for all open G ⊆ X. It is then elementary to show that F = {x}ᵃ.


2. Let f ∈ ‖τ, 𝟚‖; then we know that f⁻¹(⊥) = ↓G for some prime open G. Put F := X \ G; then F is irreducible and closed, hence F = {x}ᵃ for some x ∈ X. Then we infer f(H) = ⊤ ⇔ x ∈ H for each open set H, so we have indeed f = ΦX(x). Hence ΦX is onto. ∎

Thus, if ΦX is a bijection, we can recover (the topology on) X from the set ‖τ, 𝟚‖ of morphisms on the complete Heyting algebra τ.

Definition 3.6.25 A topological space (X, τ) is called sober⁵ iff ΦX : X → ‖τ, 𝟚‖ is a bijection.

Thus we obtain as a consequence this characterization.

Corollary 3.6.26 Given a topological space X, the following conditions are equivalent:

• X is sober.

• X is a T0-space and for each irreducible closed set F there exists x ∈ X with F = {x}ᵃ.

Exercise 3.31 shows that each Hausdorff space is sober. This property is, however, seldom made use of in the context of classic applications of Hausdorff spaces in, say, analysis.

Before continuing, we generalize the Scott topology, which has been defined in Example 3.1.6 for inductively ordered sets. The crucial property is closedness under joins; we stated this property in a linearly ordered set by saying that, if the supremum of a set S is in a Scott open set G, then we should find an element s ∈ S with s ∈ G. This will have to be relaxed somewhat. Let us analyze the argument why the intersection G1 ∩ G2 of two Scott open sets (old version) G1 and G2 is open by taking a set S such that ⋁S ∈ G1 ∩ G2. Because Gi is Scott open, we find si ∈ S with si ∈ Gi (i = 1, 2), and because we work in a linearly ordered set, we know that either s1 ≤ s2 or s2 ≤ s1. Assuming s1 ≤ s2, we conclude that s2 ∈ G1, because open sets are upper closed, so that G1 ∩ G2 is indeed open. The crucial ingredient here is evidently that we can find for two elements of S an element which dominates both, and this is the key to the generalization.

We want to be sure that each directed set has an upper bound; this is the case, e.g., when we are working in a complete Heyting algebra. The structure we are defining now, however, is considerably weaker, but makes sure that we can do what we have in mind.

Definition 3.6.27 A partially ordered set in which every directed subset has an upper bound is called a directed complete partial order, abbreviated as dcpo.

Evidently, complete Heyting algebras are dcpos; in particular, topologies are dcpos under inclusion. Sober topological spaces with the specialization order induced by the open sets, which is introduced in the next example, furnish another example for a dcpo.

Example 3.6.28 Let X = (X, τ) be a sober topological space; hence the points in X and the morphisms in ‖τ, 𝟚‖ are in a bijective correspondence. Define for x, x′ ∈ X the relation x ⊑ x′ iff we have x ⊨ G ⇒ x′ ⊨ G for all open sets G (thus x ∈ G implies x′ ∈ G). If we think that being contained in more open sets means having better information, x ⊑ x′ is then interpreted as x′ being better informed than x; ⊑ is sometimes called the specialization order.

⁵The rumors in the domain theory community that a certain Johann Heinrich-Wilhelm Sober was a skat partner of Hilbert's gardener at Göttingen could not be confirmed; anyway, what about the third man?


Then (X, ⊑) is a partially ordered set, antisymmetry following from the observation that a sober space is a T0-space. But (X, ⊑) is also a dcpo. Let S ⊆ X be a directed set; then L := ΦX[S] is directed in ‖τ, 𝟚‖. Define

p(G) := ⊤ if there exists ℓ ∈ L with ℓ(G) = ⊤, and p(G) := ⊥ otherwise.

We claim that p ∈ ‖τ, 𝟚‖. It is clear that p(⋁W) = ⋁_{w∈W} p(w) for W ⊆ τ. Now let W ⊆ τ be finite, and assume that ⋀p[W] = ⊤, hence p(w) = ⊤ for all w ∈ W. Thus we find for each w ∈ W some ℓw ∈ L with ℓw(w) = ⊤. Because L is directed and W is finite, we find an upper bound ℓ ∈ L to {ℓw | w ∈ W}, hence ℓ(w) = ⊤ for all w ∈ W, so that ℓ(⋀W) = ⊤, hence p(⋀W) = ⊤. This implies ⋀p[W] = p(⋀W). Thus p ∈ ‖τ, 𝟚‖, so that there exists x ∈ X with ΦX(x) = p. Clearly, x is an upper bound to S.

Definition 3.6.29 Let (P,≤) be a dcpo, then U ⊆ P is called Scott open iff

1. U is upper closed.

2. If sup S ∈ U for some directed set S, then there exists s ∈ S with s ∈ U .

The second property can be described as inaccessibility through directed joins: if U contains the directed join of a set, it must already contain one of its elements. The following example is taken from [GHK+03, p. 136].

Example 3.6.30 The powerset P(X) of a set X is a dcpo under inclusion. The families ℱ ⊆ P(X) of weakly finite character are Scott open (ℱ ⊆ P(X) is of weakly finite character iff this condition holds: F ∈ ℱ iff some finite subset of F is in ℱ). Let ℱ be of weakly finite character. Then ℱ is certainly upper closed. Now let S := ⋃𝒮 ∈ ℱ for some directed set 𝒮 ⊆ P(X); thus there exists a finite subset F ⊆ S with F ∈ ℱ. Because 𝒮 is directed, we find S0 ∈ 𝒮 with F ⊆ S0, so that S0 ∈ ℱ.

In a topological space, each compact set gives rise to a Scott open filter as a subset of the topology.

Lemma 3.6.31 Let (X, τ) be a topological space and C ⊆ X compact; then

H(C) := {U ∈ τ | C ⊆ U}

is a Scott open filter.

Proof Since H(C) is upper closed and a filter, we have to establish that it is not accessible by directed joins. In fact, let 𝒮 be a directed subset of τ such that ⋃𝒮 ∈ H(C). Then 𝒮 forms a cover of the compact set C, hence there exists a finite 𝒮0 ⊆ 𝒮 such that C ⊆ ⋃𝒮0. But 𝒮 is directed, so 𝒮0 has an upper bound S ∈ 𝒮; then C ⊆ ⋃𝒮0 ⊆ S, thus S ∈ H(C). ∎

Scott opens form in fact a topology, and continuous functions are characterized in a fashion similar to Example 3.1.11. We just state and prove these properties for completeness, before entering into a discussion of the Hofmann-Mislove Theorem.

Proposition 3.6.32 Let (P, ≤) be a dcpo.

1. {U ⊆ P | U is Scott open} is a topology on P, the Scott topology of P.


2. F ⊆ P is Scott closed iff F is downward closed (x ≤ y and y ∈ F imply x ∈ F) and closed with respect to suprema of directed subsets.

3. Given a dcpo (Q, ≤), a map f : P → Q is continuous with respect to the corresponding Scott topologies iff f preserves directed joins (i.e., if S ⊆ P is directed, then f[S] ⊆ Q is directed and sup f[S] = f(sup S)).

Proof 1. Let U1, U2 be Scott open, and sup S ∈ U1 ∩ U2 for the directed set S. Then there exist si ∈ S with si ∈ Ui for i = 1, 2. Since S is directed, we find s ∈ S with s ≥ s1 and s ≥ s2, and since U1 and U2 both are upper closed, we conclude s ∈ U1 ∩ U2. Because U1 ∩ U2 is plainly upper closed, we conclude that U1 ∩ U2 is Scott open, hence the set of Scott opens is closed under finite intersections. The other properties of a topology are evidently satisfied. This establishes the first part.

2. The characterization of closed sets follows directly from the one for open sets by taking complements.

3. Let f : P → Q be Scott continuous. Then f is monotone: if x ≤ x′, then x′ is contained in the closed set f⁻¹[↓f(x′)], thus x ∈ f⁻¹[↓f(x′)], hence f(x) ≤ f(x′). Now let S ⊆ P be directed; then f[S] ⊆ Q is directed by monotonicity, and S ⊆ f⁻¹[↓(sup_{s∈S} f(s))]. Since the latter set is closed, we conclude that it contains sup S, hence f(sup S) ≤ sup f[S]. On the other hand, since f is monotone, we know that f(sup S) ≥ sup f[S]. Thus f preserves directed joins.

Assume, conversely, that f preserves directed joins. Then, if x ≤ x′,

f(x′) = f(sup {x, x′}) = sup {f(x), f(x′)}

follows, hence f is monotone. Now let H ⊆ Q be Scott open; then f⁻¹[H] is upper closed. Let S ⊆ P be directed, and assume that sup S ∈ f⁻¹[H]; since f(sup S) = sup f[S] ∈ H and f[S] is directed, there exists s ∈ S with f(s) ∈ H, hence s ∈ f⁻¹[H], which therefore is Scott open. Hence f is Scott continuous. ∎

Following [GHK+03, Chapter II-1], we show that in a sober space there is an order preserving correspondence between Scott open filters and certain compact subsets. Preparing for this, we observe that in a sober space every open subset which contains the intersection of a Scott open filter is already an element of the filter. This will turn out to be a consequence of the existence of prime elements not contained in the filter, as stated in Proposition 3.6.20.

Lemma 3.6.33 Let F ⊆ τ be a Scott open filter of open subsets in a sober topological space (X, τ). If ⋂F ⊆ U for the open set U, then U ∈ F.

Proof 0. The plan of the proof goes like this: we assume that there exists an open set which contains the intersection ⋂F but which is not in F. Since the Scott open filter F is inaccessible by directed suprema, this is exactly the situation of Proposition 3.6.20, so there exists an open set which is maximal with respect to not being a member of F, and which is prime; hence we may represent this set as f⁻¹(⊥) for some f ∈ ‖τ, 𝟚‖. But now sobriety kicks in, and we represent f through an element x ∈ X. This will then lead us to the desired contradiction.

1. Let G := ⋂F, and assume that U is open with G ⊆ U (note that we do not know whether or not G is empty). Assume that U ∉ F; then we obtain from Proposition 3.6.20 (and its proof) a prime open set V which is not in F, which contains U, and which is maximal with these properties. Since V is prime, there exists f ∈ ‖τ, 𝟚‖ such that {H ∈ τ | f(H) = ⊥} = ↓V by Lemma 3.6.21. Since X is sober, we find x ∈ X such that ΦX(x) = f, hence X \ V = {x}ᵃ.

2. We claim that x ∈ G. If this is not the case, we have x ∉ H for some H ∈ F. Because H is open, this entails {x}ᵃ ∩ H = ∅ (any z ∈ {x}ᵃ ∩ H would force x ∈ H, since H is an open set meeting {x}ᵃ), thus H ⊆ X \ {x}ᵃ = V. Since F is upper closed, this implies V ∈ F, which is not possible. Thus x ∈ G ⊆ U ⊆ V = X \ {x}ᵃ, which contradicts x ∈ {x}ᵃ. Hence U ∈ F. ∎

This is a fairly surprising and strong statement, because for an arbitrary filter F we usually cannot conclude from ⋂F ⊆ U that U ∈ F holds. But we work here under stronger assumptions: the underlying space is sober, so each point is given by a morphism for the underlying complete Heyting algebra and vice versa. In addition we deal with Scott open filters, which have the pleasant property of being inaccessible by directed suprema.

But we may even say more, viz., that the intersection of such a filter is compact. For, if we have an open cover of the intersection, the union of this cover is open, thus must be an element of the filter by the previous lemma. We may write the union as the union of a directed set of open sets, which then lets us apply the assumption that the filter is inaccessible.

Corollary 3.6.34 Let X be sober, and F be a Scott open filter. Then ⋂F is compact and nonempty.

Proof Let K := ⋂F, and let 𝒮 be an open cover of K. Thus U := ⋃𝒮 is open with K ⊆ U, hence U ∈ F by Lemma 3.6.33. But ⋃𝒮 = ⋃{⋃𝒮0 | 𝒮0 ⊆ 𝒮 finite}, and the latter collection is directed, so there exists a finite 𝒮0 ⊆ 𝒮 with ⋃𝒮0 ∈ F. Then K = ⋂F ⊆ ⋃𝒮0, so 𝒮0 is a finite subcover of K, which consequently is compact. If K were empty, we would have ∅ ∈ F by Lemma 3.6.33, which is impossible. ∎

This gives a complete characterization of the Scott open filters in a sober space. The characterization involves compact sets which are represented as the intersections of these filters. But we can represent only those compact sets C which are upper sets in the specialization order, i.e., for which x ∈ C and x ⊑ x′ implies x′ ∈ C. These sets are called saturated. Recall that x ⊑ x′ means x ∈ G ⇒ x′ ∈ G for all open sets G; hence a set is saturated iff it equals the intersection of all open sets containing it. With this in mind, we state the Hofmann-Mislove Theorem.

Theorem 3.6.35 Let X be a sober space. Then the Scott open filters are in one-to-one and order preserving correspondence with the nonempty saturated compact subsets of X via F ↦ ⋂F.

Proof We have shown in Corollary 3.6.34 that the intersection of a Scott open filter is compact and nonempty; it is saturated as an intersection of open sets. Conversely, Lemma 3.6.31 shows that we obtain from a compact and saturated subset C of X the Scott open filter H(C), whose intersection is C, since C is saturated; and Lemma 3.6.33 shows that every Scott open filter F equals H(⋂F), so the two assignments are inverse to each other. It is clear that the correspondence is order preserving. ∎
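For a finite space the correspondence can be verified by brute force: there every subset is compact, every proper filter of opens is Scott open, and saturation is the intersection of all opens containing a set. A sketch (Python; the small sober topology below is illustrative):

    from itertools import chain, combinations

    X = frozenset({0, 1, 2})
    tau = [frozenset(), frozenset({0}), frozenset({0, 1}), X]

    def nonempty_subsets(xs):
        xs = list(xs)
        return chain.from_iterable(combinations(xs, r) for r in range(1, len(xs) + 1))

    def is_filter(F):      # proper filter of opens: no empty set, upper and meet closed
        Fs = set(F)
        if frozenset() in Fs:
            return False
        upper = all(H in Fs for G in Fs for H in tau if G <= H)
        meets = all((G & H) in Fs for G in Fs for H in Fs)
        return upper and meets

    filters = [list(F) for F in nonempty_subsets(tau) if is_filter(list(F))]

    def saturation(A):
        return frozenset.intersection(*[G for G in tau if A <= G])

    saturated = {A for A in map(frozenset, nonempty_subsets(X)) if A == saturation(A)}

    images = [frozenset.intersection(*F) for F in filters]
    assert set(images) == saturated and len(images) == len(set(images))
    print(len(filters), "Scott open filters <->", len(saturated), "saturated compact sets")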

It is quite important for the proof of Lemma 3.6.33 that the underlying space is sober. Hence it does not come as a surprise that the Hofmann-Mislove Theorem can be used for a characterization of sober spaces as well [GHK+03, Theorem II-1.21].

Proposition 3.6.36 Let X be a T0-space. Then the following statements are equivalent.


1. X is sober.

2. Each Scott open filter F on τ consists of all open sets containing ⋂F.

Proof 1 ⇒ 2: This follows from Lemma 3.6.33.

2 ⇒ 1: Corollary 3.6.26 tells us that it is sufficient to show that each irreducible closed set is the closure of one point.

Let A ⊆ X be irreducible and closed. Then F := {G open | G ∩ A ≠ ∅} is closed under finite intersections, since A is irreducible. In fact, let G and H be open sets with G ∩ A ≠ ∅ and H ∩ A ≠ ∅, and suppose (G ∩ H) ∩ A = ∅. Then A ⊆ (X \ G) ∪ (X \ H), so A is a subset of one of these closed sets, say X \ G; but then A ∩ G = ∅, which is a contradiction. This implies that F is a filter, and F is obviously Scott open.

Assume that A cannot be represented as {x}ᵃ for some x. Given x ∈ A, we have {x}ᵃ ⊆ A but {x}ᵃ ≠ A, so X \ {x}ᵃ is an open set whose intersection with A is not empty, hence X \ {x}ᵃ ∈ F. Consequently K := ⋂F ⊆ ⋂_{x∈A}(X \ {x}ᵃ) ⊆ X \ A, and X \ A is open, so we obtain from the assumption that X \ A ∈ F. But then A ∩ (X \ A) ≠ ∅ by the definition of F, which is a contradiction.

Thus there exists x ∈ X such that A = {x}ᵃ. Hence X is sober. ∎

These are the first and rather elementary discussions of the interplay between topology and order, considered in a systematic fashion in domain theory. The reader is directed to [GHK+03] or to [AJ94] for further information.

3.6.3 The Stone-Weierstraß Theorem

This section will present the classic Stone-Weierstraß Theorem on the approximation of continuous functions on a compact topological space. We need for this a ring of continuous functions, and show that, under suitable conditions, this ring is dense. This requires some preliminary considerations on the space of continuous functions, because density evidently refers to a topology.

Denote for a topological space X by C(X) the space of all continuous and bounded functions f : X → R.

The structure of C(X) is algebraically fairly rich; just for the record:

Proposition 3.6.37 C(X) is a real vector space which contains the constants and is closed under multiplication and under the lattice operations. ∎

This describes the algebraic properties, but we need a topology on this space, which is provided by the supremum norm. Define for f ∈ C(X)

‖f‖ := sup_{x∈X} |f(x)|.

Then (C(X), ‖ · ‖) is an example of a normed linear (or vector) space.

Definition 3.6.38 Let V be a real vector space. A norm ‖ · ‖ : V → R+ assigns to each vector v a non-negative real number ‖v‖ with these properties:


1. ‖v‖ ≥ 0, and ‖v‖ = 0 iff v = 0.

2. ‖α · v‖ = |α| · ‖v‖ for all α ∈ R and all v ∈ V .

3. ‖x+ y‖ ≤ ‖x‖+ ‖y‖ for all x, y ∈ V .

A vector space with a norm is called a normed space.

It is immediate that a normed space is a metric space upon putting d(v, w) := ‖v − w‖. It is also immediate that f ↦ ‖f‖ defines a norm on C(X). But we can actually say a bit more: with this definition of a metric, C(X) is a complete metric space; we have established this for the compact interval [0, 1] in Example 3.5.19 already. Let us have a look at the general case.

Lemma 3.6.39 C(X) is complete in the metric induced by the supremum norm.

Proof Let (fn)_{n∈N} be a ‖ · ‖-Cauchy sequence in C(X); then (fn(x))_{n∈N} is a Cauchy sequence in R for each x, hence bounded, and f(x) := lim_{n→∞} fn(x) exists for each x ∈ X. Let ε > 0 be given; then we find n0 ∈ N such that ‖fn − fm‖ < ε for all n, m ≥ n0, thus |f(x) − fn(x)| ≤ ε for n ≥ n0 upon letting m → ∞. This inequality holds for each x ∈ X, so that we obtain ‖f − fn‖ ≤ ε for n ≥ n0. Being a uniform limit of bounded continuous functions, f is an element of C(X). ∎

Normed spaces for which the associated metric space is complete are special, so they deserve their own name.

Definition 3.6.40 A normed space (V, ‖ · ‖) which is complete in the metric associated with ‖ · ‖ is called a Banach space.

The topology induced by the supremum norm is called the topology of uniform convergence, so that we may restate Lemma 3.6.39 by saying that C(X) is closed under uniform convergence. A helpful tool is Dini's Theorem on uniform convergence in C(X) for compact X. It gives a criterion for uniform convergence, provided we know already that the limit is continuous.

Proposition 3.6.41 Let X be a compact topological space, and assume that (fn)_{n∈N} is a sequence of continuous functions which increases monotonically to a continuous function f. Then (fn)_{n∈N} converges uniformly to f.

Proof We know that fn(x) ≤ fn+1(x) holds for all n ∈ N and all x ∈ X, and that f(x) := sup_{n∈N} fn(x) is continuous. Let ε > 0 be given; then Fn := {x ∈ X | f(x) ≥ fn(x) + ε} defines a closed set with ⋂_{n∈N} Fn = ∅; moreover, the sequence (Fn)_{n∈N} decreases. By compactness we thus find n0 ∈ N with Fn = ∅ for n ≥ n0, hence ‖f − fn‖ ≤ ε for n ≥ n0. ∎

The goal of this section is to show that, given the compact topological space X, we can approximate each continuous real function uniformly through elements of a subspace of C(X). It is plain that this subspace has to satisfy some requirements: it should

• be a vector space itself,

• contain the constant functions,

• separate points,

• be closed under multiplication.


Hence it is in particular a subring of the ring C(X). Let A be such a subset; then we want to show that the closure Aᵃ with respect to uniform convergence equals C(X). We will show first that Aᵃ is closed under the lattice operations, because we will represent an approximating function as a finite supremum of finite infima of simpler approximations. So the first goal will be to establish closure under inf and sup. Recall that

f ∧ g = (1/2) · (f + g − |f − g|),
f ∨ g = (1/2) · (f + g + |f − g|).

Now it is easy to see that Aᵃ is closed under the vector space operations, if A is. Our first step boils down to showing that |f| ∈ Aᵃ if f ∈ A. Thus, given f ∈ A, we have to find a sequence (fn)_{n∈N} such that |f| is the uniform limit of this sequence. Since |f| = √(f²) holds, it is actually enough to show that t ↦ √t can be approximated uniformly on the unit interval [0, 1]; why it suffices to do this on the unit interval we will see below.

Lemma 3.6.42 There exists a sequence (fn)_{n∈N} in C([0, 1]) which converges uniformly to the function t ↦ √t.

Proof Define inductively for t ∈ [0, 1]

f0(t) := 0,
fn+1(t) := fn(t) + (1/2) · (t − fn(t)²).

We show by induction that fn(t) ≤ √t holds. This is clear for n = 0. If we know already that the assertion holds for n, then we write

√t − fn+1(t) = √t − fn(t) − (1/2) · (t − fn(t)²) = (√t − fn(t)) · (1 − (1/2) · (√t + fn(t))).

Because t ∈ [0, 1] and by the induction hypothesis, we have √t + fn(t) ≤ 2 · √t ≤ 2, so that √t − fn+1(t) ≥ 0.

Thus fn(t) ≤ √t for all t ∈ [0, 1] and all n; in particular t − fn(t)² ≥ 0, so the sequence (fn(t))_{n∈N} increases monotonically, and its limit g(t) satisfies g(t) = g(t) + (1/2) · (t − g(t)²), hence g(t) = √t. From Dini's Proposition 3.6.41 we now infer that the convergence is uniform. ∎
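The iteration is easily watched numerically. The following sketch (Python, illustrative) tracks the sup-norm error of fn against √t on a grid; in accordance with the uniform convergence just proved, it shrinks, if slowly:

    ts = [i / 200 for i in range(201)]        # grid on [0, 1]
    f = [0.0] * len(ts)                       # f_0 = 0

    for n in range(1, 61):
        f = [fn + 0.5 * (t - fn * fn) for fn, t in zip(f, ts)]
        if n % 20 == 0:
            err = max(abs(t ** 0.5 - fn) for fn, t in zip(f, ts))
            print(f"n = {n:2d}   sup-norm error = {err:.4f}")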

This is the desired consequence of this construction.

Corollary 3.6.43 Let X be a compact topological space, and let A ⊆ C(X) be a ring of continuous functions which contains the constants and is closed under uniform convergence. Then A is a lattice.

Proof It is enough to show that A is closed under taking absolute values. Let f ∈ A with f ≠ 0; upon replacing f by f/‖f‖ (which is an element of A as well, since the constants belong to A, and |f| = ‖f‖ · |f/‖f‖|), we may and do assume that −1 ≤ f ≤ 1 holds, so that f² takes its values in [0, 1]. Because |f| = √(f²), and the approximating functions of Lemma 3.6.42 are polynomials, their composition with f² lies in A, so |f| is a uniform limit of elements of A; hence |f| ∈ A, which entails A being closed under the lattice operations. ∎

We are now in a position to establish the classic Stone-Weierstraß Theorem, which permits us to conclude that a ring of bounded continuous functions on a compact topological space X is dense with respect to uniform convergence in C(X), provided it contains the constants and separates points. The latter condition is obviously necessary, but has not been used in the argumentation so far. It is clear, however, that we cannot do without this condition, because C(X) separates points, and it is difficult to see how a function which separates points could be approximated from a collection which does not.

The polynomials on a compact interval of the reals are an example of a ring which satisfies all these assumptions. This collection shows also that we cannot extend the result to a non-compact base space like the reals. Take x ↦ sin x, for example; this function cannot be approximated uniformly over R by polynomials. For assume that, given ε > 0, there exists a polynomial p such that sup_{x∈R} |p(x) − sin x| < ε; then we would have −ε − 1 < p(x) < 1 + ε for all x ∈ R, which is impossible: a non-constant polynomial is unbounded, and a constant p satisfies sup_{x∈R} |p(x) − sin x| ≥ 1.

Here, then, is the Stone-Weierstraß Theorem for compact topological spaces.

Theorem 3.6.44 Let X be a compact topological space, and A ⊆ C(X) be a ring of functions which separates points, and which contains all constant functions. Then A is dense in C(X) with respect to uniform convergence, i.e., Aᵃ = C(X).

Proof 0. Our goal is to find for some given f ∈ C(X) and an arbitrary ε > 0 a function F ∈ Aᵃ such that ‖f − F‖ < ε. Since X is compact, we will find F through a refined covering argument in the following way. If a, b ∈ X are given, we find a continuous function fa,b ∈ A with fa,b(a) = f(a) and fa,b(b) = f(b). From this we construct a cover, using sets like {x | fa,b(x) < f(x) + ε} and {x | fa,b(x) > f(x) − ε}, extract finite subcovers, and construct from the corresponding functions the desired function through suitable lattice operations.

1. Fix f ∈ C(X) and ε > 0. Given distinct points a ≠ b, we find a function h ∈ A with h(a) ≠ h(b); thus

g(x) := (h(x) − h(a)) / (h(b) − h(a))

defines a function g ∈ A with g(a) = 0 and g(b) = 1. Then

fa,b(x) := (f(b) − f(a)) · g(x) + f(a)

is also an element of A with fa,b(a) = f(a) and fa,b(b) = f(b). Now define

Ua,b := {x ∈ X | fa,b(x) < f(x) + ε},
Va,b := {x ∈ X | fa,b(x) > f(x) − ε};

then Ua,b and Va,b are open sets containing a and b.

2. Fix b; then {Ua,b | a ∈ X} is an open cover of X (note that a ∈ Ua,b), so by compactness we can find points a1, . . . , ak such that {Ua1,b, . . . , Uak,b} is an open cover of X. Thus

fb := ⋀_{i=1}^{k} fai,b

defines an element of Aᵃ by Corollary 3.6.43. We have fb(x) < f(x) + ε for all x ∈ X, and we know that fb(x) > f(x) − ε for all x ∈ Vb := ⋂_{i=1}^{k} Vai,b. The set Vb is an open neighborhood of b, so from the open cover {Vb | b ∈ X} we find b1, . . . , bℓ such that X is covered by Vb1, . . . , Vbℓ. Put

F := ⋁_{i=1}^{ℓ} fbi;

then F ∈ Aᵃ and ‖f − F‖ < ε. ∎

This is the example already discussed above.

Example 3.6.45 Let X := [0, 1] be the closed unit interval, and let A consist of all polynomials ∑_{i=0}^{n} ai · xⁱ for n ∈ N and a0, . . . , an ∈ R. Polynomials are continuous, they form a vector space which is closed under multiplication, they contain the constants, and they separate points (the identity x ↦ x already does). Thus we obtain from the Stone-Weierstraß Theorem 3.6.44 that every continuous function on [0, 1] can be uniformly approximated through a sequence of polynomials.

The polynomials discussed in Example 3.6.45 are given algebraically as the ring generated by the functions {1, id_{[0,1]}} in the ring of all continuous functions over [0, 1]. In general, when a subset A ⊆ C(X) is given, the smallest ring R(A) containing A together with the constants is

R(A) = {∑_{i=1}^{n} ai · gi | n ∈ N, a1, . . . , an ∈ R, each gi a finite (possibly empty) product of elements of A}.

This is so because the set above is a ring containing A and the constants, and it must be contained in every such ring. We obtain from this consideration

Corollary 3.6.46 Let (X, d) be a compact metric space; then (C(X), ‖ · ‖) is a separable Banach space.

Proof Let G be a countable basis for the topology of X; then the countable set A0 := {d(·, X \ G) | G ∈ G} of continuous functions separates points, hence R(A0 ∪ R) is dense in C(X) by the Stone-Weierstraß Theorem 3.6.44. Given f = ∑_{i=1}^{n} ai · gi with each gi a finite product of elements of A0, and given ε > 0, there exists f′ = ∑_{i=1}^{n} a′i · gi with a′i ∈ Q for 1 ≤ i ≤ n and ‖f − f′‖ ≤ ε, so that the elements of R(A0 ∪ R) with rational coefficients form a countable dense subset as well. ∎

It is said that Oscar Wilde could resist everything but a temptation. The author concurs.

Here is a classic proof of the Weierstraß Approximation Theorem, the original form of Theorem 3.6.44, which deals with polynomials on [0, 1] only, and establishes the statement given in Example 3.6.45. We will give this proof now, based on the discussion in the classic [CH67, § II.4.1]. This proof is elegant and based on the manipulation of specific functions (we are all too often focussed on our pretty little garden of beautiful abstract structures, all too often in danger of losing contact with concrete mathematics, and our roots).

As a preliminary consideration, we will show that

lim_{n→∞} (∫_δ^1 (1 − v²)ⁿ dv) / (∫_0^1 (1 − v²)ⁿ dv) = 0

for every δ ∈ ]0, 1[. Define for this

Jn := ∫_0^1 (1 − v²)ⁿ dv,    J*n := ∫_δ^1 (1 − v²)ⁿ dv

(we will keep these notations for later use). We have

Jn > ∫_0^1 (1 − v)ⁿ dv = 1/(n + 1)

and

J*n = ∫_δ^1 (1 − v²)ⁿ dv < (1 − δ²)ⁿ · (1 − δ) < (1 − δ²)ⁿ.

Thus

J*n/Jn < (n + 1) · (1 − δ²)ⁿ → 0.

This establishes the claim.

Let f : [0, 1] → R be continuous. Given ε > 0, there exists δ > 0 such that |x − y| < δ implies |f(x) − f(y)| < ε for all x, y ∈ [0, 1], since f is uniformly continuous by Proposition 3.5.36. Thus |v| < δ implies |f(x + v) − f(x)| < ε whenever x and x + v lie in [0, 1].

Put

Qn(x) := ∫_0^1 f(u) · (1 − (u − x)²)ⁿ du,
Pn(x) := Qn(x) / (2 · Jn).

We will show that Pn converges to f in the topology of uniform convergence.

We note first that Qn is a polynomial of degree at most 2n in x. In fact, put

Aj := ∫_0^1 f(u) · uʲ du

for j ≥ 0; expanding yields the formidable representation

Qn(x) = ∑_{k=0}^{n} ∑_{j=0}^{2k} C(n, k) · C(2k, j) · (−1)^{k+j} · Aj · x^{2k−j},

where C(m, i) denotes the binomial coefficient.

Let us work on the approximation. We fix x ∈ [0, 1] and note that the inequalities derived below do not depend on the specific choice of x; hence they will provide a uniform approximation.

Substituting u = v + x in Qn yields

∫_0^1 f(u) · (1 − (u − x)²)ⁿ du = ∫_{−x}^{1−x} f(v + x) · (1 − v²)ⁿ dv = I1 + I2 + I3

with

I1 := ∫_{−x}^{−δ} f(v + x) · (1 − v²)ⁿ dv,
I2 := ∫_{−δ}^{+δ} f(v + x) · (1 − v²)ⁿ dv,
I3 := ∫_{+δ}^{1−x} f(v + x) · (1 − v²)ⁿ dv

(we treat the typical case δ ≤ x ≤ 1 − δ; the boundary cases are handled analogously).

We work on these integrals separately. Let M := max_{0≤x≤1} |f(x)|; then

|I1| ≤ M · ∫_{−1}^{−δ} (1 − v²)ⁿ dv = M · J*n

and

|I3| ≤ M · ∫_{δ}^{1} (1 − v²)ⁿ dv = M · J*n.

We can rewrite I2 as follows:

I2 = f(x) · ∫_{−δ}^{+δ} (1 − v²)ⁿ dv + ∫_{−δ}^{+δ} (f(x + v) − f(x)) · (1 − v²)ⁿ dv
   = 2 · f(x) · (Jn − J*n) + ∫_{−δ}^{+δ} (f(x + v) − f(x)) · (1 − v²)ⁿ dv.

From the choice of δ for ε we obtain

|∫_{−δ}^{+δ} (f(x + v) − f(x)) · (1 − v²)ⁿ dv| ≤ ε · ∫_{−δ}^{+δ} (1 − v²)ⁿ dv < ε · ∫_{−1}^{+1} (1 − v²)ⁿ dv = 2ε · Jn.

Combining these inequalities, we obtain

|Pn(x) − f(x)| < 2M · J*n/Jn + ε.

Hence the difference can be made arbitrarily small, which means that f can be approximated uniformly through polynomials.
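The construction is concrete enough to be run. The following sketch (Python, illustrative) approximates the integrals by Riemann sums and samples the error on an interior subinterval, where the kernel argument stays inside the integration range; the error decreases with n as the estimate predicts.

    import math

    def riemann(g, a, b, m=2000):             # midpoint rule, good enough here
        h = (b - a) / m
        return sum(g(a + (k + 0.5) * h) for k in range(m)) * h

    f = math.sin                              # the function to be approximated

    for n in (10, 40, 160):
        Jn = riemann(lambda v: (1 - v * v) ** n, 0.0, 1.0)
        def Pn(x, n=n, Jn=Jn):
            Qn = riemann(lambda u: f(u) * (1 - (u - x) ** 2) ** n, 0.0, 1.0)
            return Qn / (2 * Jn)
        grid = [0.2 + 0.01 * i for i in range(61)]
        err = max(abs(Pn(x) - f(x)) for x in grid)
        print(f"n = {n:3d}   sup error on [0.2, 0.8] = {err:.4f}")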

The two approaches presented are structurally very different; it would be difficult to recognize the latter as a precursor of the former. While both make substantial use of uniform continuity, the first one is an existential proof, constructing two covers from which to choose a finite subcover each, and from this deriving the existence of an approximating function. It is non-constructive because it would be difficult to extract an approximating function from it, even if the ring of approximating functions is given by a base for the underlying vector space. The second one starts also from uniform continuity and uses this property to find, through integration, a suitable bound for the difference between the approximating polynomial and the function proper. The representation of Qn above shows what the constructed polynomial looks like, and the coefficients of the polynomial may be computed (in principle, at least). And, finally, the abstract situation gives us a greater degree of freedom, since we deal with a ring of continuous functions observing certain properties, while the original proof works for the class of polynomials only.


3.6.4 Uniform Spaces

This section will give a brief introduction to uniform spaces. The objective is to demonstrate that the notion of a metric space can be generalized in meaningful ways without arriving at the full generality of topological spaces, while retaining useful properties like completeness or uniform continuity. While pseudometric spaces formulate the concept of two points being close to each other through a numeric value, and general topological spaces use the concept of an open neighborhood, uniform spaces formulate neighborhoods on the Cartesian product. This concept is truly in the middle: each pseudometric generates such neighborhoods, and from a neighborhood we may obtain the neighborhood filter for a point.

For motivation and illustration, we consider a pseudometric space (X, d) and say that two points are neighbors iff their distance is smaller than r for some fixed r > 0; the degree of neighborhood evidently depends on r. The set

V_{d,r} := Vr := {〈x, y〉 | d(x, y) < r}

is then the collection of all pairs of neighbors. We may obtain from Vr the neighborhood B(x, r) for some point x upon extracting all y such that 〈x, y〉 ∈ Vr, thus

B(x, r) = Vr[x] := {y ∈ X | 〈x, y〉 ∈ Vr}.

The collection of all these neighborhoods observes these properties.

1. The diagonal ∆ := ∆X is contained in Vr for all r > 0, because d(x, x) = 0.

2. Vr is, as a relation on X, symmetric: 〈x, y〉 ∈ Vr iff 〈y, x〉 ∈ Vr, thus Vr⁻¹ = Vr. This property reflects the symmetry of d.

3. Vr ∘ Vs ⊆ V_{r+s} for r, s > 0; this property is inherited from the triangle inequality for d.

4. V_{r1} ∩ V_{r2} = V_{min{r1,r2}}, hence this collection is closed under finite intersections.

It is convenient to consider not only these immediate neighborhoods but rather the filter generated by them on X × X (which is possible because the empty set is not contained in this collection, and the properties above show that it forms the base for a filter indeed). This leads to the following definition of a uniformity. It focusses on the properties of the neighborhoods rather than on those of a pseudometric, so we formulate it for a set in general.

Definition 3.6.47 Let X be a set. A filter u of subsets of X × X is called a uniformity on X iff these properties are satisfied:

1. ∆ ⊆ U for all U ∈ u.

2. If U ∈ u, then U⁻¹ ∈ u.

3. If U ∈ u, there exists V ∈ u such that V ∘ V ⊆ U.

4. u is closed under finite intersections.

5. If U ∈ u and U ⊆ W, then W ∈ u.

The pair (X, u) is called a uniform space. The elements of u are called u-neighborhoods.


U ∘ V := {〈x, z〉 | ∃y : 〈x, y〉 ∈ U, 〈y, z〉 ∈ V}
U⁻¹ := {〈y, x〉 | 〈x, y〉 ∈ U}
U[M] := {y | ∃x ∈ M : 〈x, y〉 ∈ U}
U[x] := U[{x}]
U is symmetric :⇔ U⁻¹ = U

(U ∘ V) ∘ W = U ∘ (V ∘ W)
(U ∘ V)⁻¹ = V⁻¹ ∘ U⁻¹
(U ∘ V)[M] = V[U[M]]
V ∘ U ∘ V = ⋃_{〈x,y〉∈U} V[x] × V[y]    (V symmetric)

Here U, V, W ⊆ X × X and M ⊆ X.

Figure 3.1: Some Relational Identities
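The operations in Figure 3.1 translate directly into code; a sketch (Python, illustrative) spot-checks three of the identities on random finite relations.

    import random

    def compose(U, V):        # U o V, reading: first U, then V, as in Figure 3.1
        return {(x, z) for (x, y1) in U for (y2, z) in V if y1 == y2}

    def inverse(U):
        return {(y, x) for (x, y) in U}

    def image(U, M):          # U[M]
        return {y for (x, y) in U if x in M}

    pts = range(5)
    rand_rel = lambda: {(x, y) for x in pts for y in pts if random.random() < 0.4}
    U, V, W, M = rand_rel(), rand_rel(), rand_rel(), {0, 1}

    assert compose(compose(U, V), W) == compose(U, compose(V, W))
    assert inverse(compose(U, V)) == compose(inverse(V), inverse(U))
    assert image(compose(U, V), M) == image(V, image(U, M))
    print("relational identities verified")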

The first three properties are gleaned from those of the pseudometric neighborhoods above; the last two are properties of a filter, which have been listed here just for completeness.

We will omit u when talking about a uniform space if this does not yield ambiguities. The term “neighborhood” is used for elements of a uniformity and for the neighborhoods of a point. There should be no ambiguity, because the point is always attached when talking about a neighborhood in the latter, topological, sense. Bourbaki uses the term entourage for a neighborhood in the uniform sense; the German word for this is Nachbarschaft (while the term for a neighborhood of a point is Umgebung).

We will need some relational identities; they are listed in Figure 3.1 for the reader's convenience.

We will proceed here as we do in the case of topologies, where we do not always specify the entire topology, but occasionally make use of the possibility to define it through a base. We have this characterization for the base of a uniformity.

Proposition 3.6.48 A family ∅ ≠ b ⊆ P(X × X) is the base for a uniformity iff it has the following properties:

1. Each member of b contains the diagonal of X.

2. For U ∈ b there exists V ∈ b with V ⊆ U⁻¹.

3. For U ∈ b there exists V ∈ b with V ∘ V ⊆ U.

4. For U, V ∈ b there exists W ∈ b with W ⊆ U ∩ V .

Proof Recall that the filter generated by a filter base b is {F | U ⊆ F for some U ∈ b}. With this in mind, the proof is straightforward. ∎


This permits a description of a uniformity in terms of a base, which is usually easier than giving the uniformity as a whole. Let us look at some examples.

Example 3.6.49 1. The uniformity {X × X} is called the indiscrete uniformity; the uniformity {A ⊆ X × X | ∆ ⊆ A} is called the discrete uniformity on X.

2. Let Vr := {〈x, y〉 | x, y ∈ R, |x − y| < r}; then {Vr | r > 0} is a base for a uniformity on R. Since it makes use of the structure of (R, +) as an additive group, it is called the additive uniformity on R.

3. Put V_E := {〈x, y〉 ∈ R² | x/y ∈ E} for a neighborhood E of 1 ∈ R \ {0}. Then the filter generated by {V_E | E is a neighborhood of 1} is a uniformity; this is so because the logarithm function is continuous on the positive reals. This uniformity is nourished by the multiplicative group (R \ {0}, ·), so it is called the multiplicative uniformity on R \ {0}. This is discussed in greater generality in part 9.

4. A partition π of a set X is a collection of non-empty and mutually disjoint subsets of X which covers X. It generates an equivalence relation on X by rendering two elements of X equivalent iff they are in the same partition element. Define Vπ := ⋃_{i=1}^{k}(Pi × Pi) for a finite partition π = {P1, . . . , Pk}. Then

b := {Vπ | π is a finite partition of X}

is the base for a uniformity. Let π be a finite partition, and denote the equivalence relation generated by π by |π|, hence x |π| y iff x and y are in the same element of π.

• ∆ ⊆ Vπ is obvious, since |π| is reflexive.

• Vπ⁻¹ = Vπ, since |π| is symmetric.

• Because |π| is transitive, we have Vπ ∘ Vπ ⊆ Vπ.

• Let π′ be another finite partition; then {A ∩ B | A ∈ π, B ∈ π′, A ∩ B ≠ ∅} defines a partition π′′ such that V_{π′′} ⊆ Vπ ∩ V_{π′}.

Thus b is the base for a uniformity, which is, you guessed it, called the uniformity of finite partitions (a computational check of these base properties follows this example).

5. Let ∅ ≠ I ⊆ P(X) be an ideal (Definition 1.5.31), and define

A_E := {〈A, B〉 | A∆B ⊆ E} for E ∈ I,
b := {A_E | E ∈ I}.

Then b is a base for a uniformity on P(X). In fact, it is clear that ∆_{P(X)} ⊆ A_E always holds, and that each member of b is symmetric. Let A∆B ⊆ E and B∆C ⊆ F; then A∆C = (A∆B)∆(B∆C) ⊆ (A∆B) ∪ (B∆C) ⊆ E ∪ F, thus A_E ∘ A_F ⊆ A_{E∪F}, and finally A_E ∩ A_F = A_{E∩F}. Because I is an ideal, it is closed under finite intersections and finite unions, and the assertion follows.

6. Let p be a prime, and put W_k := {〈x, y〉 | x, y ∈ Z, pᵏ divides x − y}. Then W_k ∘ W_ℓ ⊆ W_{min{k,ℓ}} and W_k ∩ W_ℓ = W_{max{k,ℓ}}; moreover each W_k is symmetric and contains the diagonal. Thus b := {W_k | k ∈ N} is the base for a uniformity u_p on Z, the p-adic uniformity.


7. Let A be a set, (X, u) a uniform space, and let F(A, X) be the set of all maps A → X. We will define a uniformity on F(A, X); the approach is similar to the one in Example 3.1.4. Define for U ∈ u the set

U_F := {〈f, g〉 ∈ F(A, X) × F(A, X) | 〈f(x), g(x)〉 ∈ U for all x ∈ A}.

Thus two maps are close with respect to U_F iff all their images are close with respect to U. It is immediate that {U_F | U ∈ u} forms the base for a uniformity on F(A, X), and that {U_F | U ∈ b} is a base for a uniformity as well, provided b is a base for the uniformity u.

If X = R is endowed with the additive uniformity, a typical set of the base is given for ε > 0 through

{〈f, g〉 ∈ F(A, R) × F(A, R) | sup_{a∈A} |f(a) − g(a)| < ε},

hence the images of f and of g have to be uniformly close to each other.

8. Call a map f : R → R affine iff it can be written as f(x) = a · x + b with a ≠ 0; let fa,b be the affine map characterized by the parameters a and b, and let X := {fa,b | a, b ∈ R, a ≠ 0} be the set of all affine maps. Note that an affine map is bijective, and that its inverse is an affine map again with fa,b⁻¹ = f_{1/a,−b/a}; the composition of affine maps is an affine map as well, since fa,b ∘ fc,d = f_{ac,ad+b}. Define for ε > 0, δ > 0 the ε,δ-neighborhood U_{ε,δ} by

U_{ε,δ} := {fa,b ∈ X | |a − 1| < ε, |b| < δ}.

Put

U^L_{ε,δ} := {〈fx,y, fa,b〉 ∈ X × X | fx,y ∘ fa,b⁻¹ ∈ U_{ε,δ}},    b_L := {U^L_{ε,δ} | ε > 0, δ > 0},
U^R_{ε,δ} := {〈fx,y, fa,b〉 ∈ X × X | fx,y⁻¹ ∘ fa,b ∈ U_{ε,δ}},    b_R := {U^R_{ε,δ} | ε > 0, δ > 0}.

Then b_L resp. b_R is the base for a uniformity u_L resp. u_R on X. Let us check this for b_R. Given positive ε, δ, we want to find positive r, s such that 〈fm,n, fp,q〉 ∈ U^R_{r,s} implies 〈fp,q, fm,n〉 ∈ U^R_{ε,δ}. Now we can find for ε > 0 and δ > 0 some r > 0 and s > 0 so that

|p/m − 1| < r ⇒ |m/p − 1| < ε,
|q/m − n/m| < s ⇒ |n/p − q/p| < δ

holds, which is just what we want, since it translates into U^R_{r,s} ⊆ (U^R_{ε,δ})⁻¹. The other properties of a base are easily seen to be satisfied. One argues similarly for b_L.

Note that (X, ∘) is a topological group with the sets {U_{ε,δ} | ε > 0, δ > 0} as a base for the neighborhood filter of the neutral element f1,0 (topological groups are introduced in Example 3.1.25 on page 220).


9. Let, in general, G be a topological group with neutral element e. Define for U ∈ U(e) the sets

U_L := {〈x, y〉 | xy⁻¹ ∈ U},
U_R := {〈x, y〉 | x⁻¹y ∈ U},
U_B := U_L ∩ U_R.

Then {U_L | U ∈ U(e)}, {U_R | U ∈ U(e)} and {U_B | U ∈ U(e)} define bases for uniformities on G; it can be shown that they do not necessarily coincide, see Example 3.6.74.
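As announced in part 4, the base properties for the uniformity of finite partitions can be checked mechanically against Proposition 3.6.48 on a small carrier set; a sketch (Python, illustrative):

    X = {0, 1, 2, 3}
    partitions = [[{0, 1}, {2, 3}], [{0}, {1, 2}, {3}]]

    def V(pi):                               # V_pi = union of the squares P x P
        return {(x, y) for P in pi for x in P for y in P}

    def compose(U, W):
        return {(x, z) for (x, y1) in U for (y2, z) in W if y1 == y2}

    def refinement(pi, rho):                 # common refinement of two partitions
        return [A & B for A in pi for B in rho if A & B]

    for pi in partitions:
        U = V(pi)
        assert all((x, x) in U for x in X)           # contains the diagonal
        assert U == {(y, x) for (x, y) in U}         # symmetric
        assert compose(U, U) <= U                    # transitivity: V o V within V

    pi, rho = partitions
    assert V(refinement(pi, rho)) <= V(pi) & V(rho)  # closed under refinement
    print("base axioms verified for the uniformity of finite partitions")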

Before we show that a uniformity generates a topology, we derive a sufficient criterion for a family of subsets of X × X to be a subbase for a uniformity.

Lemma 3.6.50 Let s ⊆ P(X × X); then s is the subbase for a uniformity on X, provided the following conditions hold.

1. ∆ ⊆ S for each S ∈ s.

2. Given U ∈ s, there exists V ∈ s such that V ⊆ U⁻¹.

3. For each U ∈ s there exists V ∈ s such that V ∘ V ⊆ U.

Proof We have to show that

b := {U1 ∩ . . . ∩ Un | U1, . . . , Un ∈ s for some n ∈ N}

constitutes a base for a uniformity. It is clear that every element of b contains the diagonal. Let U = ⋂_{i=1}^{n} Ui ∈ b with Ui ∈ s for i = 1, . . . , n; choose Vi ∈ s with Vi ⊆ Ui⁻¹ for all i; then V := ⋂_{i=1}^{n} Vi ∈ b and V ⊆ U⁻¹. If we select Wi ∈ s with Wi ∘ Wi ⊆ Ui, then W := ⋂_{i=1}^{n} Wi ∈ b and W ∘ W ⊆ U. The last condition of Proposition 3.6.48 is trivially satisfied for b, since b is closed under finite intersections. Thus we conclude that b is a base for a uniformity on X by Proposition 3.6.48, which in turn entails that s is a subbase. ∎

The Topology Generated by a Uniformity

A pseudometric space (X, d) generates a topology by declaring a set G open iff there exists for each x ∈ G some r > 0 with B(x, r) ⊆ G; from this we obtain the neighborhood filter U(x) for a point x. Note that in the uniformity associated with the pseudometric the identity

B(x, r) = Vr[x]

holds. Encouraged by this, we approach the topology for a uniform space in the same way. Given a uniform space (X, u), a subset G ⊆ X is called open iff we can find for each x ∈ G some neighborhood U ∈ u such that U[x] ⊆ G. The following proposition investigates this construction.


Proposition 3.6.51 Given a uniform space (X, u), for each x ∈ X the family

u[x] := {U[x] | U ∈ u}

is the neighborhood filter of x for a topology τ_u, which is called the uniform topology. The neighborhoods of x in τ_u are thus just u[x].

Proof It follows from Proposition 3.1.22 that u[x] defines a topology τ_u; it remains to show that the neighborhoods of this topology are just u[x]. We have to show that for U ∈ u there exists V ∈ u with V[x] ⊆ U[x] and U[x] ∈ u[y] for all y ∈ V[x]; then the assertion will follow from Corollary 3.1.23. For U ∈ u there exists V ∈ u with V ∘ V ⊆ U, thus 〈x, y〉 ∈ V and 〈y, z〉 ∈ V implies 〈x, z〉 ∈ U. Now let y ∈ V[x] and z ∈ V[y]; then z ∈ U[x], hence V[y] ⊆ U[x], and this means U[x] ∈ u[y] for all y ∈ V[x]. Hence the assertion follows. ∎

Here are some illustrative examples. They indicate also that different uniformities can generate the same topology.

Example 3.6.52 1. The topology obtained from the additive uniformity on R is the usual topology. The same holds for the multiplicative uniformity on R \ {0}.

2. The topology induced by the discrete uniformity is the discrete topology, in which each singleton {x} is open. Since {{x}, X \ {x}} forms a finite partition of X, the discrete topology is induced also by the uniformity defined by the finite partitions.

3. Let F(A, R) be endowed with the uniformity defined by the sets {〈f, g〉 | sup_{a∈A} |f(a) − g(a)| < ε}, see Example 3.6.49. The corresponding topology yields for each f ∈ F(A, R) the neighborhoods {g ∈ F(A, R) | sup_{a∈A} |f(a) − g(a)| < ε}. This is the topology of uniform convergence.

4. Let u_p for a prime p be the p-adic uniformity on Z, see Example 3.6.49, part 6. The corresponding topology τ_p is called the p-adic topology. A base for the neighborhoods of 0 is given by the sets V_k := {x ∈ Z | pᵏ divides x}. Because pᵐ ∈ V_k for m ≥ k, we see that lim_{n→∞} pⁿ = 0 in τ_p, but not in τ_q for a prime q ≠ p. Thus the topologies τ_p and τ_q differ, hence so do the uniformities u_p and u_q; a small numerical illustration follows this example.

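As announced in part 4, here is a tiny numerical illustration (Python): convergence pⁿ → 0 in τ_p amounts to the p-adic valuation of pⁿ growing without bound, while for a different prime q the q-adic valuation of pⁿ stays at 0.

    def valuation(p, x):          # the largest k with p^k dividing x, for x != 0
        k = 0
        while x % p == 0:
            x //= p
            k += 1
        return k

    p, q = 3, 5
    for n in (1, 4, 16):
        print(f"v_3(3^{n}) = {valuation(p, p ** n):2d}   v_5(3^{n}) = {valuation(q, p ** n)}")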

Now that we know that each uniformity yields a topology on the same space, some questionsare immediate:

• Do the open resp. the closed sets play a particular role in describing the uniformity?

• Does the topology have particular properties, e.g., in terms of separation axioms?

• What about metric spaces — can we determine from the uniformity that the topologyis metrizable?

• Can we find a pseudometric for a given uniformity?

• Is the product topology on X ×X somehow related to u, which is defined on X ×X,after all?

We will give answers to some of these questions; some will be treated only lightly, with an in-depth treatment to be found in the ample literature on uniform spaces, see the Bibliographic Notes in Section 3.7.


Fix a uniform space X with uniformity u and associated topology τ. References to neighborhoods and open sets are always to u resp. τ, unless otherwise stated.

This is a first characterization of the interior of an arbitrary set. Recall that in a pseudometric space x is an interior point of A iff B(x, r) ⊆ A for some r > 0; the same description applies here as well, mutatis mutandis (of course, this “mutatis mutandis” part is the interesting one).

Lemma 3.6.53 Given A ⊆ X, x ∈ Aᵒ iff there exists a neighborhood U with U[x] ⊆ A.

Proof Assume that x ∈ Aᵒ = ⋃{G | G open and G ⊆ A}; then it follows from the definition of an open set that we must be able to find a neighborhood U with U[x] ⊆ A.

Conversely, we show that the set B := {x ∈ X | U[x] ⊆ A for some neighborhood U} is open; then this must be the largest open set which is contained in A, hence B = Aᵒ. Let x ∈ B, thus U[x] ⊆ A for some neighborhood U, and we now have to find a neighborhood V such that V[x] ⊆ B. We find a neighborhood V with V ∘ V ⊆ U. Let's see whether V is suitable: if y ∈ V[x], then V[y] ⊆ (V ∘ V)[x] (this is so because 〈x, y〉 ∈ V, and if z ∈ V[y], then 〈y, z〉 ∈ V; this implies 〈x, z〉 ∈ V ∘ V, hence z ∈ (V ∘ V)[x]). But this yields V[y] ⊆ U[x] ⊆ A, hence y ∈ B. This means V[x] ⊆ B, so that B is open. ∎
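
The objects this proof manipulates — sections U[x], inverses, compositions — are easy to compute on a finite carrier. The following Haskell sketch is our own illustration (type and function names are made up); entourages are represented naively as lists of pairs.

  type Rel a = [(a, a)]

  -- the section U[x] = { y | <x, y> ∈ U }
  section :: Eq a => Rel a -> a -> [a]
  section u x = [ y | (x', y) <- u, x' == x ]

  -- the inverse relation U⁻¹
  inverse :: Rel a -> Rel a
  inverse u = [ (y, x) | (x, y) <- u ]

  -- composition: <x, z> ∈ comp u v iff <x, y> ∈ u and <y, z> ∈ v for some y
  comp :: Eq a => Rel a -> Rel a -> Rel a
  comp u v = [ (x, z) | (x, y) <- u, (y', z) <- v, y == y' ]

For instance, over a finite carrier xs the set B from the proof is simply [ x | x <- xs, all (`elem` a) (section u x) ] for a set a and an entourage u.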

The observation above gives us a handy way of describing the base for a neighborhood filter for a point in X. It states that we may restrict our attention to the members of a base or of a subbase when we want to work with the neighborhood filter for a particular element.

Corollary 3.6.54 If u has base or subbase b, then {U[x] | U ∈ b} is a base resp. subbase for the neighborhood filter for x.

Proof This follows immediately from Lemma 3.6.53 together with Proposition 3.6.48 resp. Lemma 3.6.50. ∎

Let us have a look at the topology on X × X induced by τ. Since the open rectangles generate this topology, and since we can describe the open rectangles in terms of the sets U[x] × V[y], we can expect that these open sets can also be related to the uniformity proper. In fact:

Proposition 3.6.55 If U ∈ u, then both Uᵒ ∈ u and Uᵃ ∈ u.

Proof 1. Let G ⊆ X × X be open, then 〈x, y〉 ∈ G iff there exist neighborhoods U, V ∈ u with U[x] × V[y] ⊆ G, and because U ∩ V ∈ u, we may even find some W ∈ u such that W[x] × W[y] ⊆ G. Thus

G = ⋃{W[x] × W[y] | 〈x, y〉 ∈ G, W ∈ u}.

2. Let W ∈ u, then there exists a symmetric V ∈ u with V ∘ V ∘ V ⊆ W, and by the identities in Figure 3.1 we may write

V ∘ V ∘ V = ⋃_{〈x,y〉∈V} V[x] × V[y].

Hence 〈x, y〉 ∈ Wᵒ for every 〈x, y〉 ∈ V, so V ⊆ Wᵒ, and since V ∈ u, we conclude Wᵒ ∈ u from u being upper closed.

3. Because u is a filter, and U ⊆ Uᵃ, we infer Uᵃ ∈ u. ∎


The closure of a subset of X and the closure of a subset of X × X may be described directly through the uniformity u as well. These are truly remarkable representations.

Proposition 3.6.56 Aᵃ = ⋂{U[A] | U ∈ u} for A ⊆ X, and Mᵃ = ⋂{U ∘ M ∘ U | U ∈ u} for M ⊆ X × X.

Proof 1. We use the characterization of a point x in the closure through its neighborhood filter from Lemma 1.5.52: x ∈ Aᵃ iff U[x] ∩ A ≠ ∅ for all symmetric U ∈ u, because the symmetric neighborhoods form a base for u. Now z ∈ U[x] ∩ A iff z ∈ A and 〈x, z〉 ∈ U iff x ∈ U[z] and z ∈ A, hence U[x] ∩ A ≠ ∅ iff x ∈ U[A], because U is symmetric. But this means Aᵃ = ⋂{U[A] | U ∈ u}.

2. Let 〈x, y〉 ∈ Mᵃ, then (U[x] × U[y]) ∩ M ≠ ∅ for all symmetric neighborhoods U ∈ u, so that 〈x, y〉 ∈ U ∘ M ∘ U for all symmetric neighborhoods. This accounts for the inclusion from left to right. If 〈x, y〉 ∈ U ∘ M ∘ U for all neighborhoods U, then for every U ∈ u there exists 〈a, b〉 ∈ M with 〈a, b〉 ∈ U[x] × (U⁻¹)[y], thus 〈x, y〉 ∈ Mᵃ. ∎

Hence

Corollary 3.6.57 The closed symmetric neighborhoods form a base for the uniformity.

Proof Let U ∈ u, then there exists a symmetric V ∈ u with V ∘ V ∘ V ⊆ U and V ⊆ Vᵃ ⊆ V ∘ V ∘ V by Proposition 3.3.1. Hence W := Vᵃ ∩ (Vᵃ)⁻¹ is a closed symmetric member of u which is contained in U. ∎

Proposition 3.6.56 has also an interesting consequence when looking at the characterization of Hausdorff spaces in Proposition 3.3.1. Putting M = ∆, we obtain ∆ᵃ = ⋂{U ∘ U | U ∈ u}, so that the associated topological space is Hausdorff iff the intersection of all neighborhoods is the diagonal ∆. Uniform spaces with ⋂u = ∆ are called separated.

Pseudometrization

We will see shortly that the topology for a separated uniform space is completely regular. First, however, we will show that we can generate pseudometrics from the uniformity by the following idea: suppose that we have a neighborhood V; then there exists a neighborhood V₂ with V₂ ∘ V₂ ∘ V₂ ⊆ V₁ := V; continuing in this fashion, we find for the neighborhood Vₙ a neighborhood Vₙ₊₁ with Vₙ₊₁ ∘ Vₙ₊₁ ∘ Vₙ₊₁ ⊆ Vₙ, and finally put V₀ := X × X. Given a pair 〈x, y〉 ∈ X × X, this sequence (Vₙ)ₙ∈N is now used as a witness to determine how far apart these points are: put f_V(x, y) := 2⁻ⁿ iff 〈x, y〉 ∈ Vₙ \ Vₙ₊₁, and f_V(x, y) := 0 iff 〈x, y〉 ∈ ⋂_{n∈N} Vₙ. Then f_V will give rise to a pseudometric d_V, the pseudometric associated with V, as we will show below.

This means that many pseudometric spaces are hidden deep inside a uniform space! Moreover, if we need a pseudometric, we construct one from a neighborhood. These observations will turn out to be fairly practical later on. But before we are in a position to make use of them, we have to do some work.

Proposition 3.6.58 Assume that (Vₙ)ₙ∈N is a sequence of symmetric subsets of X × X with these properties for all n ∈ N:


• ∆ ⊆ Vₙ,

• Vₙ₊₁ ∘ Vₙ₊₁ ∘ Vₙ₊₁ ⊆ Vₙ.

Put V₀ := X × X. Then there exists a pseudometric d with

Vₙ ⊆ {〈x, y〉 | d(x, y) < 2⁻ⁿ} ⊆ Vₙ₋₁

for all n ∈ N.

Proof 0. The proof uses the idea outlined above. The main effort will be showing that we can squeeze {〈x, y〉 | d(x, y) < 2⁻ⁿ} between Vₙ and Vₙ₋₁.

1. Put f(x, y) := 2⁻ⁿ iff 〈x, y〉 ∈ Vₙ \ Vₙ₊₁, and let f(x, y) := 0 iff 〈x, y〉 ∈ ⋂_{n∈N} Vₙ. Then f(x, x) = 0, and f(x, y) = f(y, x), because each Vₙ is symmetric. Define

d(x, y) := inf{∑_{i=0}^{k} f(xᵢ, xᵢ₊₁) | x₀, . . . , x_{k+1} ∈ X with x₀ = x, x_{k+1} = y, k ∈ N}.

So we look at all paths leading from x to y, sum the weights of all their edges, and take the smallest value. Since we may concatenate a path from x to y with a path from y to z to obtain one from x to z, the triangle inequality holds for d, and since d(x, y) ≤ f(x, y), we know that Vₙ ⊆ {〈x, y〉 | d(x, y) < 2⁻ⁿ}. The latter set is contained in Vₙ₋₁; to show this is a bit tricky and requires an intermediary step.

2. We show by induction on n that

f(x₀, xₙ₊₁) ≤ 2 · ∑_{i=0}^{n} f(xᵢ, xᵢ₊₁),

so if we have a path of length n, then the weight of the edge connecting its endpoints cannot be greater than twice the total weight of the path. If n = 1, there is nothing to show. So assume the assertion is proved for all paths with less than n edges. We take a path from x₀ to xₙ₊₁ with n edges 〈xᵢ, xᵢ₊₁〉. Let w be the weight of the path from x₀ to xₙ₊₁, and let k be the largest integer such that the weight of the path from x₀ to xₖ is at most w/2. Then the path from xₖ₊₁ to xₙ₊₁ has a weight of at most w/2 as well. Now f(x₀, xₖ) ≤ w and f(xₖ₊₁, xₙ₊₁) ≤ w by the induction hypothesis, and f(xₖ, xₖ₊₁) ≤ w. Let m ∈ N be the smallest integer with 2⁻ᵐ ≤ w; then we have 〈x₀, xₖ〉, 〈xₖ, xₖ₊₁〉, 〈xₖ₊₁, xₙ₊₁〉 ∈ Vₘ, thus 〈x₀, xₙ₊₁〉 ∈ Vₘ₋₁. This implies f(x₀, xₙ₊₁) ≤ 2^{−(m−1)} ≤ 2 · w = 2 · ∑_{i=0}^{n} f(xᵢ, xᵢ₊₁).

3. Now let d(x, y) < 2⁻ⁿ; then f(x, y) ≤ 2^{−(n−1)} by part 2, and hence 〈x, y〉 ∈ Vₙ₋₁. ∎
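
On a finite carrier the infimum over all paths in this proof is just an all-pairs shortest path computation. The following Haskell sketch is our own toy illustration (not from the text): a Floyd–Warshall style relaxation turning a symmetric weight function f with f(x, x) = 0 into the chain pseudometric d.

  import Data.List (foldl')

  -- d(x, y): infimum of summed weights over all finite chains from x to y.
  -- Each relaxation step allows one more intermediate point z; after folding
  -- over the whole carrier, every chain has been taken into account.
  -- (No memoisation, so only suitable for very small carriers.)
  chainDist :: [a] -> (a -> a -> Double) -> a -> a -> Double
  chainDist pts f0 = foldl' relax f0 pts
    where relax f z x y = min (f x y) (f x z + f z y)

By construction chainDist pts f lies below f pointwise and satisfies the triangle inequality, matching part 1 of the proof.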

This has a — somewhat unexpected — consequence, because it permits characterizing those uniformities which are generated by a pseudometric.

Proposition 3.6.59 The uniformity u of X is generated by a pseudometric iff u has a countable base.

Proof Let u be generated by a pseudometric d; then the sets {V_{d,r} | 0 < r ∈ Q} are a countable base. Let, conversely, b := {Uₙ | n ∈ N} be a countable base for u. Put V₀ := X × X and V₁ := U₁, and construct inductively a sequence (Vₙ)ₙ∈N of symmetric elements of u with Vₙ ∘ Vₙ ∘ Vₙ ⊆ Vₙ₋₁ and Vₙ ⊆ Uₙ for n ∈ N. Then {Vₙ | n ∈ N} is a base for u. In fact, given U ∈ u, there exists Uₙ ∈ b with Uₙ ⊆ U, hence Vₙ ⊆ U as well. Construct d for this sequence as above; then we have Vₙ ⊆ {〈x, y〉 | d(x, y) < 2⁻ⁿ} ⊆ Vₙ₋₁. Thus the sets V_{d,r} are a base for the uniformity. ∎

Note that this does not translate into a statement about metrizability of the underlying topological space: this space may carry a metric although the uniform space from which it is derived is not generated by a pseudometric.

Example 3.6.60 Let X be an uncountable set, and let u be the uniformity given by the finite partitions, see Example 3.6.49. Then we have seen in Example 3.6.52 that the topology induced by u on X is the discrete topology, which is metrizable.

Assume that u is generated by a pseudometric; then Proposition 3.6.59 implies that u has a countable base, thus, given a finite partition π, there exists a finite partition π∗ such that V_{π∗} ⊆ V_π, and V_{π∗} is an element of this base. Here V_{P₁,...,Pₙ} := ⋃_{i=1}^{n} (Pᵢ × Pᵢ) is the basic neighborhood for u associated with the partition {P₁, . . . , Pₙ}. But for any given partition π∗ we can only form a finite number of other partitions π with V_{π∗} ⊆ V_π, so that we would have only a countable number of finite partitions on X — a contradiction, since the uncountable set X admits uncountably many finite partitions.

This is another consequence of Proposition 3.6.58: each uniform space satisfies the separation axiom T_{3½}. For establishing this claim, we take a closed set F ⊆ X and a point x₀ ∉ F; then we have to produce a continuous function f : X → [0, 1] with f(x₀) = 1 and f(y) = 0 for y ∈ F. This is how to do it. Since X \ F is open, we find a neighborhood U ∈ u with U[x₀] ⊆ X \ F. Let d_U be the pseudometric associated with U, then {〈x, y〉 | d_U(x, y) < 1/2} ⊆ U. Clearly, x ↦ d_U(x, x₀) is a continuous function on X, hence

f(x) := max{0, 1 − 2 · d_U(x, x₀)}

is continuous with f(x₀) = 1 and f(y) = 0 for y ∈ F, thus f has the required properties. Thus we have shown

Proposition 3.6.61 A uniform space is a T_{3½}-space; a separated uniform space is completely regular. ∎

Cauchy Filters

We now generalize the notion of a Cauchy sequence to uniform spaces. We do this in order to obtain a notion of convergence which includes convergence in topological spaces, and which carries the salient features of a Cauchy sequence with it.

First, we note that filters are a generalization of sequences. So let us have a look at what can be said when we construct the filter F for a Cauchy sequence (xₙ)ₙ∈N in a pseudometric space (X, d). F has the sets c := {Bₙ | n ∈ N} with Bₙ := {xₘ | m ≥ n} as a base. Being a Cauchy sequence says that for each ε > 0 there exists n ∈ N such that Bₙ × Bₙ ⊆ V_{d,ε}; this inclusion then holds for all Bₘ with m ≥ n as well. Because c is the base for F, and the sets V_{d,r} are a base for the uniformity, we may reformulate this: F is a Cauchy filter iff for each neighborhood U there exists B ∈ F such that B × B ⊆ U. Now this looks like a property which may be formulated for general uniform spaces.


Fix the uniform space (X, u). Given U ∈ u, the set M ⊆ X is called U-small iff M × M ⊆ U. A collection F of sets is said to contain small sets iff given U ∈ u there exists A ∈ F which is U-small, or, equivalently, iff given U ∈ u there exist A ∈ F and x ∈ X with A ⊆ U[x].

This helps in formulating the notion of a Cauchy filter.

Definition 3.6.62 A filter F is called a Cauchy filter iff it contains small sets.

In this sense, a Cauchy sequence induces a Cauchy filter. Convergent filters are Cauchy filters as well:

Lemma 3.6.63 If F → x for some x ∈ X, then F is a Cauchy filter.

Proof Let U ∈ u, then there exists a symmetric V ∈ u with V ∘ V ⊆ U. Because U(x) ⊆ F, we conclude V[x] ∈ F, and V[x] × V[x] ⊆ U, thus V[x] is a U-small member of F. ∎

But the converse does not hold, as the following example shows.

Example 3.6.64 Let u be the uniformity induced by the finite partitions with X infinite. We claim that each ultrafilter F is a Cauchy filter. In fact, let π = {A₁, . . . , Aₙ} be a finite partition with corresponding neighborhood V_π = ⋃_{i=1}^{n} Aᵢ × Aᵢ; then there exists i∗ with A_{i∗} ∈ F. This is so since an ultrafilter which contains a finite union of sets must contain one of them. A_{i∗} is V_π-small.

The topology induced by this uniformity is the discrete topology, see Example 3.6.52. This topology is not compact, since X is infinite. By Theorem 3.2.11 there are ultrafilters which do not converge.

If x is an accumulation point of a Cauchy sequence in a pseudometric space, then we know that xₙ → x; this is fairly easy to show. A similar observation can be made for Cauchy filters, so that we have a partial converse to Lemma 3.6.63.

Lemma 3.6.65 Let x be an accumulation point of the Cauchy filter F; then F → x.

Proof Let V ∈ u be a closed neighborhood; in view of Corollary 3.6.57 it is sufficient to show that V[x] ∈ F, for then it will follow that U(x) ⊆ F. Because F is a Cauchy filter, we find F ∈ F with F × F ⊆ V; because V is closed, we may assume that F is closed as well (otherwise we replace it by its closure). Because F is closed and x is an accumulation point of F, we know from Lemma 3.2.14 that x ∈ F, hence F ⊆ V[x]. This implies U(x) ⊆ F. ∎

Definition 3.6.66 The uniform space (X, u) is called complete iff each Cauchy filter converges.

Each Cauchy sequence converges in a complete uniform space, because the associated filter is a Cauchy filter.

A slight reformulation is given in the following proposition, which is the uniform counterpart to the characterization of complete pseudometric spaces in Proposition 3.5.25. Recall that a collection of sets is said to have the finite intersection property iff each finite subfamily has a non-empty intersection.

Proposition 3.6.67 The uniform space (X, u) is complete iff each family of closed sets which has the finite intersection property and which contains small sets has a non-void intersection.


Proof This is essentially a reformulation of the definition, but let’s see.

1. Assume that (X, u) is complete, and let A be a family of closed sets with the finite intersection property which contains small sets. Hence F₀ := {F₁ ∩ . . . ∩ Fₙ | n ∈ N, F₁, . . . , Fₙ ∈ A} is a filter base. Let F be the corresponding filter; then F is a Cauchy filter, for A, hence F₀, contains small sets. Thus F → x for some x, so that U(x) ⊆ F, thus x ∈ ⋂_{F∈F} Fᵃ ⊆ ⋂_{A∈A} A.

2. Conversely, let F be a Cauchy filter. Since {Fᵃ | F ∈ F} is a family of closed sets with the finite intersection property which contains small sets, the assumption says that ⋂_{F∈F} Fᵃ is not empty and contains some x. But then x is an accumulation point of F by Lemma 3.2.14, so F → x by Lemma 3.6.65. ∎

As in the case of pseudometric spaces, compactness of the topology entails completeness of the uniformity.

Lemma 3.6.68 Let (X, u) be a uniform space so that the topology associated with the uniformity is compact. Then the uniform space (X, u) is complete.

Proof In fact, let F be a Cauchy filter on X. Since the topology for X is compact, the filter has an accumulation point x by Corollary 3.2.15. But Lemma 3.6.65 tells us then that F → x. Hence each Cauchy filter converges. ∎

The uniform space which is derived from an ideal on the power set of a set, as defined in Example 3.6.49 (part 5), is complete. We establish this first for Cauchy nets as the natural generalization of Cauchy sequences, and then translate the proof to Cauchy filters. This will permit an instructive comparison of the handling of these two concepts.

Example 3.6.69 Recall the definition of a net on page 221. A net (xᵢ)ᵢ∈N in the uniform space X is called a Cauchy net iff, given a neighborhood U ∈ u, there exists i ∈ N such that 〈xⱼ, xₖ〉 ∈ U for all j, k ∈ N with j, k ≥ i. The net converges to x iff given a neighborhood U there exists i ∈ N such that 〈xⱼ, x〉 ∈ U for j ≥ i.

Now assume that 𝓘 ⊆ P(X) is an ideal; part 5 of Example 3.6.49 defines a uniformity u_𝓘 on P(X) which has the sets V_I := {〈A, B〉 | A, B ∈ P(X), A∆B ⊆ I} as a base, as I runs through 𝓘. We claim that each Cauchy net (Fᵢ)ᵢ∈N converges to F := ⋃_{i∈N} ⋂_{j≥i} Fⱼ.

In fact, let a neighborhood U be given; we may assume that U = V_I for some I ∈ 𝓘. Thus there exists i ∈ N such that 〈Fⱼ, Fₖ〉 ∈ V_I for all j, k ≥ i, hence Fⱼ∆Fₖ ⊆ I for all these j, k. Let x ∈ F∆Fⱼ for j ≥ i.

• If x ∈ F, we find i₀ ∈ N such that x ∈ Fₖ for all k ≥ i₀. Fix k ∈ N so that k ≥ i and k ≥ i₀, which is possible since N is directed. Then x ∈ Fₖ∆Fⱼ ⊆ I.

• If x ∉ F, we find for each i₀ ∈ N some k ≥ i₀ with x ∉ Fₖ. Pick such a k with k ≥ i; then x ∉ Fₖ and x ∈ Fⱼ, hence x ∈ Fₖ∆Fⱼ ⊆ I.

Thus 〈F, Fⱼ〉 ∈ V_I for j ≥ i, hence the net converges to F.
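
The base entourages V_I are easy to test on finite list representations. A small Haskell sketch, again our own illustration: the symmetric difference decides membership in V_I.

  import Data.List ((\\))

  -- the symmetric difference A Δ B
  symDiff :: Eq a => [a] -> [a] -> [a]
  symDiff a b = (a \\ b) ++ (b \\ a)

  -- <A, B> ∈ V_I iff A Δ B ⊆ I
  inVI :: Eq a => [a] -> [a] -> [a] -> Bool
  inVI i a b = all (`elem` i) (symDiff a b)

  -- e.g. inVI [3,4] [1,2,3] [1,2,4] == True, but inVI [3] [1,2,3] [1,2,4] == False

The lists are read as finite sets without repetitions; for the ideal uniformity one lets the first argument range over (representations of) the members of 𝓘.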

Now let's investigate convergence of a Cauchy filter. One obvious obstacle in a direct translation seems to be the definition of the limit set, because this appears to be bound to the net's indices. But look at this. If (xᵢ)ᵢ∈N is a net, then the sets Bᵢ := {xⱼ | j ≥ i} form a filter base B, as i runs through the directed set N (see the discussion on page 221). Thus we have defined F in terms of this base, viz., F = ⋃_{B∈B} ⋂B. This gives an idea for the filter based case.

Example 3.6.70 Let u_𝓘 be the uniformity on P(X) discussed in Example 3.6.69. Then each Cauchy filter F converges. In fact, let B be a base for F; then F → F with F := ⋃_{B∈B} ⋂B.

Let U be a neighborhood in u_𝓘; we may assume that U = V_I for some I ∈ 𝓘. Since F is a Cauchy filter, we find a V_I-small 𝔉 ∈ F, hence G∆G′ ⊆ I for all G, G′ ∈ 𝔉. Let F₀ ∈ 𝔉, and consider x ∈ F∆F₀; we show that x ∈ I by distinguishing these cases:

• If x ∈ F, then there exists B ∈ B such that x ∈ ⋂B. Because B is an element of the base B, and because F is a filter, B ∩ 𝔉 ≠ ∅, so we find G ∈ B with G ∈ 𝔉; in particular x ∈ G. Consequently x ∈ G∆F₀ ⊆ I, since 𝔉 is V_I-small.

• If x ∉ F, we find for each B ∈ B some G ∈ B with x ∉ G. Since B is a base for F, there exists B ∈ B with B ⊆ 𝔉, so there exists G ∈ 𝔉 with x ∉ G. Hence x ∈ G∆F₀ ⊆ I.

Thus F∆F₀ ⊆ I, hence 〈F, F₀〉 ∈ V_I. This means 𝔉 ⊆ V_I[F], which in turn implies U(F) ⊆ F, or, equivalently, F → F.

For further investigations of uniform spaces, we define uniform continuity as the brand of continuity which is adapted to uniform spaces.

Uniform Continuity

Let f : X → X′ be a uniformly continuous map between the pseudometric spaces (X, d) and (X′, d′). This means that given ε > 0 there exists δ > 0 such that d(x, y) < δ implies d′(f(x), f(y)) < ε. In terms of neighborhoods, this means V_{d,δ} ⊆ (f × f)⁻¹[V_{d′,ε}], or, equivalently, that (f × f)⁻¹[V] is a neighborhood in X whenever V is a neighborhood in X′. We use this formulation, which is based only on neighborhoods, and not on pseudometrics, for a formulation of uniform continuity.

Definition 3.6.71 Let (X, u) and (Y, v) be uniform spaces. Then f : X → Y is called uniformly continuous iff (f × f)⁻¹[V] ∈ u for all V ∈ v.
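
On finite carriers this condition can be checked mechanically. Here is a hypothetical Haskell sketch (names are ours), reusing the Rel type from the sketch above: we compute (f × f)⁻¹[V] and test whether it contains some element of a base of u — since u is upper closed, this already gives (f × f)⁻¹[V] ∈ u.

  -- (f × f)⁻¹[v] over a finite carrier xs
  preimageRel :: Eq b => [a] -> (a -> b) -> Rel b -> Rel a
  preimageRel xs f v = [ (x, y) | x <- xs, y <- xs, (f x, f y) `elem` v ]

  -- check uniform continuity against a base bu of u and a base bv of v
  uniformlyContinuous :: (Eq a, Eq b) => [a] -> [Rel a] -> [Rel b] -> (a -> b) -> Bool
  uniformlyContinuous xs bu bv f =
    all (\v -> any (\u -> all (`elem` preimageRel xs f v) u) bu) bv

By Lemma 3.6.78 below it even suffices to let bv run over a subbase of v.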

Proposition 3.6.72 Uniform spaces form a category with uniformly continuous maps as morphisms.

Proof The identity is uniformly continuous, and, since (g × g) ∘ (f × f) = (g ∘ f) × (g ∘ f), the composition of uniformly continuous maps is uniformly continuous again. ∎

We want to know what happens in the underlying topological space. But here nothing unexpected will happen: a uniformly continuous map is continuous with respect to the underlying topologies. Formally:

Proposition 3.6.73 If f : (X, u) → (Y, v) is uniformly continuous, then f : (X, τ_u) → (Y, τ_v) is continuous.

Proof Let H ⊆ Y be open and x ∈ f⁻¹[H]. Since f(x) ∈ H, there exists a neighborhood V ∈ v such that V[f(x)] ⊆ H. Since U := (f × f)⁻¹[V] is a neighborhood in X, and U[x] ⊆ f⁻¹[H], it follows that f⁻¹[H] is open in X. ∎


The converse is not true, however, as Example 3.5.35 shows.

Before proceeding, we briefly discuss two uniformities on the same topological group which display quite different behavior, so that the identity is not uniformly continuous.

Example 3.6.74 Let X := {f_{a,b} | a, b ∈ R, a ≠ 0} be the set of all affine maps f_{a,b} : R → R with the separated uniformities u_R and u_L, as discussed in Example 3.6.49, part 8.

Let aₙ := dₙ := 1/n, bₙ := −1/n and cₙ := n. Put gₙ := f_{aₙ,bₙ} and hₙ := f_{cₙ,dₙ}, jₙ := hₙ⁻¹.

Now gₙ ∘ hₙ = f_{1, 1/n²−1/n} → f_{1,0}, and hₙ ∘ gₙ = f_{1, −1+1/n} → f_{1,−1}. Now assume that u_R = u_L. Given U ∈ U(e), there exists a symmetric V ∈ U(e) such that V^R ⊆ U^L. Since gₙ ∘ hₙ → f_{1,0}, there exists for V some n₀ such that gₙ ∘ hₙ ∈ V for n ≥ n₀, hence 〈gₙ, jₙ〉 ∈ V^R, thus 〈jₙ, gₙ〉 ∈ V^R ⊆ U^L, which means that hₙ ∘ gₙ ∈ U for n ≥ n₀. Since U ∈ U(e) is arbitrary, this means that hₙ ∘ gₙ → e, which is a contradiction.

Thus we find that the left and the right uniformity on a topological group can be different, although they are derived from the same topology. In particular, the identity (X, u_R) → (X, u_L) is not uniformly continuous.
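
The two limits can be checked numerically. A small Haskell sketch of our own: affine maps as coefficient pairs, with composition (f_{a,b} ∘ f_{c,d})(x) = a(cx + d) + b = f_{ac, ad+b}(x).

  -- f_{a,b}(x) = a * x + b, represented by its coefficients (a, b)
  type Affine = (Double, Double)

  o :: Affine -> Affine -> Affine
  (a, b) `o` (c, d) = (a * c, a * d + b)

  g, h :: Double -> Affine
  g n = (1 / n, -1 / n)   -- the maps g_n above
  h n = (n, 1 / n)        -- the maps h_n above

  -- [ g n `o` h n | n <- [10, 100, 1000] ] tends to (1.0,  0.0)
  -- [ h n `o` g n | n <- [10, 100, 1000] ] tends to (1.0, -1.0)

For n = 1000, say, g n `o` h n = (1.0, −0.000999) while h n `o` g n = (1.0, −0.999), illustrating the two distinct limits f_{1,0} and f_{1,−1}.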

We will now construct the initial uniformity for a family of maps. The approach is similar to the one observed for the initial topology (see Definition 3.1.14), but since a uniformity is in particular a filter with certain properties, we have to make sure that the construction can be carried out as intended. Let F be a family of functions f : X → Y_f, where (Y_f, v_f) is a uniform space. We want to construct a uniformity u on X rendering all f uniformly continuous, so u should contain

s := ⋃_{f∈F} {(f × f)⁻¹[V] | V ∈ v_f},

and it should be the smallest uniformity on X with this property. For this to work, it is necessary for s to be a subbase. We check this along the properties from Lemma 3.6.50:

1. Let f ∈ F and V ∈ v_f, then ∆_{Y_f} ⊆ V. Since ∆_X ⊆ (f × f)⁻¹[∆_{Y_f}], we conclude ∆_X ⊆ (f × f)⁻¹[V]. Thus each element of s contains the diagonal of X.

2. Because ((f × f)⁻¹[V])⁻¹ = (f × f)⁻¹[V⁻¹], we find that, given U ∈ s, there exists V ∈ s with V ⊆ U⁻¹.

3. Let U ∈ s, so that U = (f × f)⁻¹[V] for some f ∈ F and V ∈ v_f. We find W ∈ v_f with W ∘ W ⊆ V; put W₀ := (f × f)⁻¹[W], then W₀ ∘ W₀ ⊆ (f × f)⁻¹[W ∘ W] ⊆ (f × f)⁻¹[V] = U, so that we find for U ∈ s an element W₀ ∈ s with W₀ ∘ W₀ ⊆ U.

Thus s is the subbase for a uniformity on X, and we have established

Proposition 3.6.75 Let F be a family of maps X → Y_f with (Y_f, v_f) a uniform space; then there exists a smallest uniformity u_F on X rendering all f ∈ F uniformly continuous. u_F is called the initial uniformity on X with respect to F.

Proof We know that s := ⋃_{f∈F} {(f × f)⁻¹[V] | V ∈ v_f} is a subbase for a uniformity u, which is evidently the smallest uniformity so that each f ∈ F is uniformly continuous. So u_F := u is the uniformity we are looking for. ∎

Having this tool at our disposal, we can now — in the same way as we did with topologies — define


Product The product uniformity for the uniform spaces (Xᵢ, uᵢ)ᵢ∈I is the initial uniformity on X := ∏_{i∈I} Xᵢ with respect to the projections πᵢ : X → Xᵢ.

Subspace The subspace uniformity u_A is the initial uniformity on A ⊆ X with respect to the embedding i_A : x ↦ x.

We can dually construct a final uniformity on Y with respect to a family F of maps f : X_f → Y with uniform spaces (X_f, u_f), for example when investigating quotients. The reader is referred to [Bou89, II.2] or to [Eng89, 8.2].

The following is a little finger exercise for the use of a product uniformity. It takes a pseudometric and shows what you would expect: the pseudometric is uniformly continuous iff it generates neighborhoods. We do not assume here that d generates u; rather, it is just an arbitrary pseudometric, of which there may be many.

Proposition 3.6.76 Let (X, u) be a uniform space, d : X × X → R+ a pseudometric. Then d is uniformly continuous with respect to the product uniformity on X × X iff V_{d,r} ∈ u for all r > 0.

Proof 1. Assume first that d is uniformly continuous; thus we find for each r > 0 some neighborhood W on X × X such that 〈〈x, y〉, 〈u, v〉〉 ∈ W implies |d(x, y) − d(u, v)| < r. We find a symmetric neighborhood U on X such that U₁ ∩ U₂ ⊆ W, where Uᵢ := (πᵢ × πᵢ)⁻¹[U] for i = 1, 2, and

U₁ = {〈〈x, y〉, 〈u, v〉〉 | 〈x, u〉 ∈ U},
U₂ = {〈〈x, y〉, 〈u, v〉〉 | 〈y, v〉 ∈ U}.

Thus if 〈x, y〉 ∈ U, we have 〈〈x, y〉, 〈y, y〉〉 ∈ W, hence d(x, y) < r, so that U ⊆ V_{d,r}, thus V_{d,r} ∈ u.

2. Assume that V_{d,r} ∈ u for all r > 0, and we want to show that d is uniformly continuous in the product. If 〈x, u〉, 〈y, v〉 ∈ V_{d,r}, then

d(x, y) ≤ d(x, u) + d(u, v) + d(v, y),
d(u, v) ≤ d(u, x) + d(x, y) + d(y, v),

hence |d(x, y) − d(u, v)| < 2 · r. Thus W := (π₁ × π₁)⁻¹[V_{d,r}] ∩ (π₂ × π₂)⁻¹[V_{d,r}] is a neighborhood on X × X such that 〈〈x, y〉, 〈u, v〉〉 ∈ W implies |d(x, y) − d(u, v)| < 2 · r. ∎

Combining Proposition 3.6.58 with the observation from Proposition 3.6.76, we have established this characterization of a uniformity through pseudometrics.

Proposition 3.6.77 The uniformity u is the smallest uniformity which is generated by all pseudometrics which are uniformly continuous on X × X, i.e., u is the smallest uniformity containing V_{d,r} for all such d and all r > 0. ∎

We fix for the rest of this section the uniform spaces (X, u) and (Y, v). Note that for checking uniform continuity it is enough to look at a subbase. The proof is straightforward and hence omitted.


Lemma 3.6.78 Let f : X → Y be a map. Then f is uniformly continuous iff (f × f)⁻¹[V] ∈ u for all elements V of a subbase for v. ∎

Cauchy filters are preserved through uniformly continuous maps (the image of a filter is defined on page 222).

Proposition 3.6.79 Let f : X → Y be uniformly continuous and F a Cauchy filter on X. Then f(F) is a Cauchy filter.

Proof Let V ∈ v be a neighborhood in Y, then U := (f × f)⁻¹[V] is a neighborhood in X, so that there exists F ∈ F which is U-small, hence F × F ⊆ U, hence (f × f)[F × F] = f[F] × f[F] ⊆ V. Since f[F] ∈ f(F) by Lemma 3.2.5, the image filter contains a V-small member. ∎

A first consequence of Proposition 3.6.79 shows that the subspaces induced by closed sets in a complete uniform space are complete again.

Proposition 3.6.80 If X is separated, then a complete subspace is closed. Let A ⊆ X be closed and X be complete; then the subspace A is complete.

Note that the first part does not assume that X is complete, and that the second part does not assume that X is separated.

Proof 1. Assume that X is a Hausdorff space and A a complete subspace of X. We show ∂A ⊆ A, from which it will follow that A is closed. Let b ∈ ∂A, then U ∩ A ≠ ∅ for all open neighborhoods U of b. The trace U(b) ∩ A of the neighborhood filter U(b) on A is a Cauchy filter. In fact, given a neighborhood W′ ∈ u for X, choose a symmetric W ∈ u with W ∘ W ⊆ W′; then (W[b] ∩ A) × (W[b] ∩ A) ⊆ (W ∘ W) ∩ (A × A) ⊆ W′ ∩ (A × A), which means that W[b] ∩ A is W′ ∩ (A × A)-small. Thus U(b) ∩ A is a Cauchy filter on A, hence it converges to, say, c ∈ A. The filter on X generated by U(b) ∩ A refines U(b), so it converges to b and to c alike, which means that b = c, since X is Hausdorff as a topological space. Thus b ∈ A, and A is closed by Proposition 3.2.4.

2. Now assume that A ⊆ X is closed, and that X is complete. Let F be a Cauchy filter on A, then i_A(F) is a Cauchy filter on X by Proposition 3.6.79. Thus i_A(F) → x for some x ∈ X, and since A is closed, x ∈ A follows, so that F → x in the subspace A. ∎

We show that a uniformly continuous map on a dense subset into a complete and separated uniform space can be extended uniquely to a uniformly continuous map on the whole space. This was established in Proposition 3.5.37 for pseudometric spaces; having a look at the proof displays the heavy use of pseudometric machinery such as the oscillation, and the pseudometric itself. This is not available in the present situation, so we have to restrict ourselves to the tools at our disposal, viz., neighborhoods and filters, in particular Cauchy filters for a complete space. We follow Kelley's elegant proof [Kel55, p. 195].

Theorem 3.6.81 Let A ⊆ X be a dense subset of the uniform space (X, u), and (Y, v) be a complete and separated uniform space. Then a uniformly continuous map f : A → Y can be extended uniquely to a uniformly continuous F : X → Y.

Proof 0. The proof starts from the graph {〈a, f(a)〉 | a ∈ A} of f and investigates the properties of its closure in X × Y. It is shown that the closure is a relation which has Aᵃ = X as its domain, and which is the graph of a map, since the topology of Y is Hausdorff. This map is an extension F of f, and it is shown that F is uniformly continuous. We also use the observation that the image of a converging filter under a uniformly continuous map is a Cauchy filter, so that completeness of Y kicks in when needed. We do not have to establish uniqueness separately, because this follows directly from Lemma 3.3.20.

1. Let G_f := graph(f) = {〈a, f(a)〉 | a ∈ A} be the graph of f. We claim that the closure of the domain of f is the domain of the closure of G_f. Let x be in the domain of the closure of G_f, then there exists y ∈ Y with 〈x, y〉 ∈ G_fᵃ, thus we find a filter F on G_f with F → 〈x, y〉. Thus π₁(F) → x, so that x is in the closure of the domain of f. Conversely, if x is in the closure of the domain of G_f, we find a filter F on the domain of G_f with F → x. Since f is uniformly continuous, we know that f(F) generates a Cauchy filter G on Y, which converges to some y. The product filter F × G converges to 〈x, y〉 (see Exercise 3.35), thus x is in the domain of the closure of G_f.

2. Now let W ∈ v; we show that there exists a neighborhood U ∈ u with this property: if 〈x, y〉, 〈u, v〉 ∈ G_fᵃ, then x ∈ U[u] implies y ∈ W[v]. After having established this, we know

• G_fᵃ is the graph of a function F. This is so because Y is separated, hence its topology is Hausdorff. For, assume there exist for some x ∈ X points y₁, y₂ ∈ Y with y₁ ≠ y₂ and 〈x, y₁〉, 〈x, y₂〉 ∈ G_fᵃ. Choose W ∈ v with y₂ ∉ W[y₁], and consider U as above. Then x ∈ U[x], hence y₂ ∈ W[y₁], contradicting the choice of W.

• F is uniformly continuous. The property above translates to finding for W ∈ v a neighborhood U ∈ u with U ⊆ (F × F)⁻¹[W].

So we are done after having established the statement above.

3. Assume that W ∈ v is given, and choose V ∈ v closed and symmetric with V ∘ V ⊆ W. This is possible by Corollary 3.6.57. There exists U ∈ u open and symmetric with f[U[x] ∩ A] ⊆ V[f(x)] for every x ∈ A, since f is uniformly continuous. If 〈x, y〉, 〈u, v〉 ∈ G_fᵃ and x ∈ U[u], then U[x] ∩ U[u] is open (since U is open), and there exists a ∈ A with x, u ∈ U[a], since A is dense. We claim y ∈ (f[U[a] ∩ A])ᵃ. Let H be an open neighborhood of y, then, since U[a] is a neighborhood of x, U[a] × H is a neighborhood of 〈x, y〉, thus G_f ∩ (U[a] × H) ≠ ∅. Hence we find a′ ∈ U[a] ∩ A with f(a′) ∈ H, which entails H ∩ f[U[a] ∩ A] ≠ ∅. Similarly, v ∈ (f[U[a] ∩ A])ᵃ; note (f[U[a] ∩ A])ᵃ ⊆ V[f(a)]. But now 〈y, v〉 ∈ V ∘ V ⊆ W, hence y ∈ W[v]. This establishes the claim above, and finishes the proof. ∎

Let us just have a look at the idea lest it gets lost. If x ∈ X, we find a filter F on A with i_A(F) → x. Then f(F) is a Cauchy filter, hence it converges to some y ∈ Y, which we define as F(x). Then it has to be shown that F is well defined; it clearly extends f. It finally has to be shown that F is uniformly continuous. So there is a lot of technical ground to be covered.

We note in closing that the completion of pseudometric spaces can also be translated into the realm of uniform spaces. Here, naturally, the Cauchy filters defined on the space play an important role, and things get very technical. The interplay between compactness and uniformities yields interesting results as well; here the reader is referred to [Bou89, Chapter II] or to [Jam87].


3.7 Bibliographic Notes

The towering references in this area are [Bou89, Eng89, Kur66, Kel55]; the author had the pleasure of taking a course on topology from one of the authors of [Que01], so this text has been an important source, too. The delightful Lecture Note [Her06] by Herrlich has a chapter “Disasters without Choice” which discusses among others the relationship of the Axiom of Choice and various topological constructions. The discussion of the game related to Baire's Theorem in Section 3.5.2 is taken from Oxtoby's textbook [Oxt80, Sec. 6] on the duality between measure and category (category in the topological sense introduced on page 261 above); he attributes the game to Banach and Mazur. Other instances of proofs by games for metric spaces are given, e.g., in [Kec94, 8.H, 21]. The uniformity discussed in Example 3.6.70 has been considered in [Dob89] in greater detail. The section on topological systems follows fairly closely the textbook [Vic89] by Vickers, but see also [GHK+03, AJ94], and for the discussion of dualities and the connection to intuitionistic logics, [Joh82, Gol06]. The discussion of Gödel's Completeness Theorem in Section 3.6.1 is based on the original paper by Rasiowa and Sikorski [RS50] together with occasional glimpses at [RS63], [CK90, Chapter 2.1], [Sri08, Chapter 4] and [Kop89, Chapter 1.2]. Uniform spaces are discussed in [Bou89, Eng89, Kel55, Que01]; special treatises include [Jam87] and [Isb64], the latter one emphasizing an early categorical point of view.

3.8 Exercises

Exercise 3.1 Formulate and prove an analogue of Proposition 3.1.15 for the final topology for a family of maps.

Exercise 3.2 The Euclidean topology on Rⁿ is the same as the product topology on ∏_{i=1}^{n} R.

Exercise 3.3 Recall that the topological space (X, τ) is called discrete iff τ = P(X). Show that the product ∏_{i∈I}({0, 1}, P({0, 1})) is discrete iff the index set I is finite.

Exercise 3.4 Let L := {(xₙ)ₙ∈N ∈ Rᴺ | ∑_{n∈N} |xₙ| < ∞} be the set of all sequences of real numbers which are absolutely summable. τ₁ is defined as the trace on L of the product topology on ∏_{n∈N} R; τ₂ is defined in the following way: a set G is τ₂-open iff given x ∈ G, there exists r > 0 such that {y ∈ L | ∑_{n∈N} |xₙ − yₙ| < r} ⊆ G. Investigate whether the identity maps (L, τ₁) → (L, τ₂) and (L, τ₂) → (L, τ₁) are continuous.

Exercise 3.5 Define for x, y ∈ R the equivalence relation x ∼ y iff x − y ∈ Z. Show that R/∼ is homeomorphic to the unit circle. Hint: Example 3.1.19.

Exercise 3.6 Let A be a countable set. Show that a map q : (A B) → (C D) is continuous in the topology taken from Example 3.1.5 iff it is continuous when A B as well as C D are equipped with the Scott topology.

Exercise 3.7 Let D₂₄ be the set of all divisors of 24, including 1, and define an order ⊑ on D₂₄ through x ⊑ y iff x divides y. The topology on D₂₄ is given through the closure operator as in Example 3.1.20. Write a Haskell program listing all closed subsets of D₂₄, and determining all filters F with F → 1. Hint: It is helpful to define a type Set with appropriate operations first, see [Dob12a, 4.2.2].
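
A possible scaffold for this exercise could look as follows (a sketch under assumptions: the closure operator itself has to be filled in from Example 3.1.20, which is not reproduced here, and all names are ours).

  import Data.List (subsequences, sort)

  divisors :: Int -> [Int]
  divisors n = [ d | d <- [1 .. n], n `mod` d == 0 ]

  d24 :: [Int]
  d24 = divisors 24   -- [1,2,3,4,6,8,12,24]

  -- x ⊑ y iff x divides y
  below :: Int -> Int -> Bool
  below x y = y `mod` x == 0

  -- all subsets of D24, as sorted lists
  subsetsD24 :: [[Int]]
  subsetsD24 = map sort (subsequences d24)

  -- a set is closed iff it equals its closure; cl must implement the
  -- closure operator of Example 3.1.20
  closedSets :: ([Int] -> [Int]) -> [[Int]]
  closedSets cl = [ a | a <- subsetsD24, sort (cl a) == a ]

With cl supplied, closedSets enumerates the closed sets, against which convergence of candidate filters to 1 can then be checked.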


Exercise 3.8 Let X be a topological space, A ⊆ X, and i_A : A → X the injection. Show that x ∈ Aᵃ iff there exists a filter F on A such that i_A(F) → x.

Exercise 3.9 Show by expanding Example 3.3.13 that R with its usual topology is a T4-space.

Exercise 3.10 Given a continuous bijection f : X → Y with the Hausdorff spaces X and Y, show that f is a homeomorphism if X is compact.

Exercise 3.11 Let A be a subspace of a topological space X.

1. If X is a T1-, T2-, T3-, or T_{3½}-space, so is A.

2. If A is closed, and X is a T4-space, then so is A.

Exercise 3.12 A function f : X → R is called lower semicontinuous iff for each c ∈ R the set {x ∈ X | f(x) > c} is open; if {x ∈ X | f(x) < c} is open for each c ∈ R, then f is called upper semicontinuous. Show that if X is compact, then an upper semicontinuous map assumes its maximum on X, and a lower semicontinuous map assumes its minimum.

Exercise 3.13 Let X := ∏_{i∈I} Xᵢ be the product of the Hausdorff spaces (Xᵢ)ᵢ∈I. Show that X is locally compact in the product topology iff Xᵢ is locally compact for all i ∈ I, and all but a finite number of the Xᵢ are compact.

Exercise 3.14 Given x, y ∈ R², define

D(x, y) := |x₂ − y₂|, if x₁ = y₁,
D(x, y) := |x₂| + |y₂| + |x₁ − y₁| otherwise.

Show that this defines a metric on the plane R². Draw the open ball {y | D(y, 0) < 1} of radius 1 with the origin as center.

Exercise 3.15 Let (X, d) be a pseudometric space such that the induced topology is T1. Then d is a metric.

Exercise 3.16 Let X and Y be two first countable topological spaces. Show that a map f : X → Y is continuous iff xₙ → x always implies f(xₙ) → f(x) for each sequence (xₙ)ₙ∈N in X.

Exercise 3.17 Consider the set C([0, 1]) of all continuous functions on the unit interval, and define

e(f, g) := ∫₀¹ |f(x) − g(x)| dx.

Show that

1. e is a metric on C([0, 1]).

2. C([0, 1]) is not complete with this metric.

3. The metric d on C([0, 1]) from Example 3.5.2 and e are not equivalent.

Exercise 3.18 Let (X, d) be an ultrametric space, hence d(x, z) ≤ max{d(x, y), d(y, z)} (see Example 3.5.2). Show that

• If d(x, y) ≠ d(y, z), then d(x, z) = max{d(x, y), d(y, z)}.

• Any open ball B(x, r) is both open and closed, and B(x, r) = B(y, r) whenever y ∈ B(x, r).

• Any closed ball S(x, r) is both open and closed, and S(x, r) = S(y, r) whenever y ∈ S(x, r).

• Assume that B(x, r) ∩ B(x′, r′) ≠ ∅; then B(x, r) ⊆ B(x′, r′) or B(x′, r′) ⊆ B(x, r).

Exercise 3.19 Let (X, d) be a metric space. Show that X is compact iff each continuous real-valued function on X is bounded.

Exercise 3.20 Show that the set of all nowhere dense sets in a topological space X forms an ideal. Define a set A ⊆ X as open modulo nowhere dense sets iff there exists an open set G such that the symmetric difference A∆G is nowhere dense (hence both A \ G and G \ A are nowhere dense). Show that the open sets modulo nowhere dense sets form a σ-algebra.

Exercise 3.21 Consider the game formulated in Section 3.5.2; we use the notation from there. Show that there exists a winning strategy for Angel iff L₁ ∩ B is of first category for some interval L₁ ⊆ L₀.

Exercise 3.22 Let u be the additive uniformity on R from Example 3.6.49. Show that {〈x, y〉 | |x − y| < 1/(1 + |y|)} is not a member of u.

Exercise 3.23 Show that V ∘ U ∘ V = ⋃_{〈x,y〉∈U} V[x] × V[y] for symmetric V ⊆ X × X and arbitrary U ⊆ X × X.

Exercise 3.24 Given a base b for a uniformity u, show that

b′ := {B ∩ B⁻¹ | B ∈ b},
b″ := {Bⁿ | B ∈ b}

are also bases for u, where n ∈ N (recall B¹ := B and Bⁿ⁺¹ := B ∘ Bⁿ).

Exercise 3.25 Show that the uniformities on a set X form a complete lattice with respect to inclusion. Characterize the initial and the final uniformity on X for a family of functions in terms of this lattice.

Exercise 3.26 If two subsets A and B in a uniform space (X, u) are V-small, then A ∪ B is V ∘ V-small, provided A ∩ B ≠ ∅.

Exercise 3.27 Show that a discrete uniform space is complete. Hint: A Cauchy filter is an ultrafilter based on a point.

Exercise 3.28 Let F be a family of maps X → Y_f with uniform spaces (Y_f, v_f). Show that the initial topology on X with respect to F is the topology induced by the initial uniformity.

Exercise 3.29 Equip the product X := ∏_{i∈I} Xᵢ with the product uniformity for the uniform spaces ((Xᵢ, uᵢ))ᵢ∈I, and let (Y, v) be a uniform space. A map f : Y → X is uniformly continuous iff πᵢ ∘ f : Y → Xᵢ is uniformly continuous for each i ∈ I.

Exercise 3.30 Let X be a topological system. Show that the following statements are equivalent:

1. X is homeomorphic to SP(Y) for some topological system Y.


2. For all a, b ∈ X^♯, a = b holds, provided we have x ⊨ a ⇔ x ⊨ b for all x ∈ X^♭.

3. For all a, b ∈ X^♯, a ≤ b holds, provided we have x ⊨ a ⇒ x ⊨ b for all x ∈ X^♭.

Exercise 3.31 Show that a Hausdorff space is sober.

Exercise 3.32 Let X and Y be compact topological spaces with their Banach spaces C(X) resp. C(Y) of real continuous maps. Let f : X → Y be a continuous map; then

f∗ : C(Y) → C(X), g ↦ g ∘ f

defines a continuous map (with respect to the respective norm topologies). f∗ is onto iff f is an injection. f is onto iff f∗ is an isomorphism of C(Y) onto a ring A ⊆ C(X) which contains the constants.

Exercise 3.33 Let L be a language for propositional logic with constants C and V as the set of propositional variables. Prove that a consistent theory T has a model, hence a map h : V → 𝟚 such that each formula in T is assigned the value ⊤. Hint: Fix an ultrafilter on the Lindenbaum algebra of T and consider the corresponding morphism into 𝟚.

Exercise 3.34 Let G be a topological group, see Example 3.1.25. Given F ⊆ G closed, show that

1. gF and Fg are closed,

2. F⁻¹ is closed,

3. MF and FM are closed, provided M is finite,

4. if A ⊆ G, then Aᵃ = ⋂_{U∈U(e)} AU = ⋂_{U∈U(e)} UA = ⋂_{U∈τ, e∈U} AU = ⋂_{U∈τ, e∈U} UA.

Exercise 3.35 Let (X, u) and (Y, v) be uniform spaces with Cauchy filters F and G on X resp. Y. Define F × G as the smallest filter on X × Y which contains {A × B | A ∈ F, B ∈ G}. Show that F × G is a Cauchy filter on X × Y with F × G → 〈x, y〉 iff F → x and G → y.


Chapter 4

Measures for Probabilistic Systems

Markov transition systems are based on transition probabilities on a measurable space. This is a generalization of discrete spaces, where certain sets are declared to be measurable. So, in contrast to assuming that we know the probability for the transition between two states, we have to model the probability of a transition going from one state to a set of states: point-to-point probabilities are no longer available due to working in a comparatively large space. Measurable spaces are the domains of the probabilities involved. This approach has the advantage of being more general than finite or countable spaces, but now one deals with a fairly involved mathematical structure; all of a sudden the dictionary has to be extended with words like “universally measurable” or “sub-σ-algebra”. Measure theory becomes an area where one has to find answers to questions which did not appear to be particularly involved before, in the much simpler world of discrete measures (the impression should not arise that I consider discrete measures as kiddie stuff, they are difficult enough to handle; the continuous case, as it is called sometimes, offers questions, however, which simply do not arise in the discrete context). Many arguments in this area are of a measure theoretic nature, and I want to introduce the reader to the necessary tools and techniques.

It starts off with a discussion of σ-algebras, which have already been met in Section 1.6. We look at the structure of σ-algebras, in particular at their generators; it turns out that the underlying space has something to say about it. In particular we will deal with Polish spaces and their brethren. Two aspects deserve to be singled out. The σ-algebra on the base space determines a σ-algebra on the space of all finite measures, and, if this space has a topology, it determines also a topology, the Alexandrov topology. These constructions are studied, since they also affect the applications in logic, and for transition systems, in which measures are vital. Second, we show that we can construct measurable selections, which then enable constructions which are interesting from a categorical point of view.

After having laid the groundwork with a discussion of σ-algebras as the domains of measures, we show that the integral of a measurable function can be constructed through an approximation process, very much in the tradition of the Riemann integral, but with a larger scope. We also go the other way: given an integral, we construct a measure from it. This is the elegant way P. J. Daniell proposed for constructing measures, and it can be brought to fruit in this context for a direct and elegant proof of the Riesz Representation Theorem on compact metric spaces.

Having all these tools at our disposal, we look at product measures, which can be introduced now through a kind of line sweeping — if you want to measure an area in the plane, you measure the line length as you sweep over the area; this produces a function of the abscissa, which then yields the area through integration. One of the main tools here is Fubini's Theorem. The product measure is not confined to two factors; we discuss the general case. This includes a discussion of projective systems, which may be considered as a generalization of sequences of products. A case study shows that projective systems arise easily in the study of continuous time stochastic logics.

Now that integrals are available, we turn back and have a look at topologies on spaces of measures; one suggests itself — the weak topology which is induced by the continuous functions. This is related to the Alexandrov topology. It is shown that there is a particularly handy metric for the weak topology, and that the space of all finite measures is complete with this metric, so that we now have a Polish space. This is capitalized on when discussing selections for set-valued maps into it, which are helpful in showing that Polish spaces are closed under bisimulations. We use measurable selections for an investigation into the structure of quotients in the Kleisli monad, providing another example for the interplay of arguments from measure theory and categories.

This interplay is stressed also for the investigation of stochastic effectivity functions, which leads to an interpretation of game logics. Since it is known in the world of relations that non-deterministic Kripke models are inadequate for the study of game logics, and that effectivity functions serve this purpose well, we develop an approach to stochastic effectivity functions and apply these functions to an interpretation of game logics. This serves as an example for stochastic modelling in logics; it demonstrates the close interaction of algebraic and probabilistic reasoning, indicating the importance of structural arguments, which go beyond the mere discussion of probabilities.

Finally, we take up a true classic: Lp-spaces. We start from Hilbert spaces, apply the representation of linear functionals on L2 to obtain the Radon-Nikodym Theorem through von Neumann's ingenious proof, and derive from it the representation of the dual spaces. This is applied to disintegration, where we show that a measure on a product can be decomposed into a projection and a transition kernel; on the surface this does not look like an application area for Lp-spaces; the relationship derives from the Radon-Nikodym Theorem.

Because we are driven by applications to Markov transition systems and similar objects, we did not strive for the most general approach to measure and integral. In particular, we usually formulate the results for finite or σ-finite measures, leaving the more general cases outside of our focus. This means also that I do not deal with complex measures (and the associated linear spaces over the complex numbers), but things are discussed in the realm of real numbers; we show, however, in which way one could start to deal with complex measures when the occasion arises. Of course, a lot of things had to be left out, among them a careful study of the Borel hierarchy and applications to descriptive set theory, as well as martingales.


4.1 Measurable Sets and Functions

This section contains a systematic study of measurable spaces and measurable functions with a view towards later developments. A brief overview is in order, and a preview indicates why the study of measurable sets is important, nay, fundamental for discussing many of the applications we have in mind.

The measurable structure is lifted to the space of finite measures, which forms a measurable space under the weak-*-σ-algebra. This is studied in Section 4.1.2. If the underlying space carries a topology, the topological structure is handed down to finite measures through the Alexandrov topology. We will have a look at it in Section 4.1.4. The measurable functions from a measurable space to the reals form a vector space, which is also a lattice, and we will show that the step functions, i.e., those functions which take only a finite number of values, are dense with respect to pointwise convergence. This mode of convergence is relaxed in the presence of a measure in various ways to almost uniform convergence, convergence almost everywhere, and to convergence in measure (Sections 4.2.1 and 4.2.2), from which also various (pseudo-) metrics and norms may be derived.

If the underlying measurable space is given by the Borel sets of a metric space, and if the metric space has a countable dense set, then the Borel sets are countably generated as well. But the irritating observation is that being countably generated is not hereditary — a sub-σ-algebra of a countably generated σ-algebra need not be countably generated. So countably generated σ-algebras deserve a separate look, which is what we will do in Section 4.3. The very important class of Polish spaces will be studied in this context as well, and we will show how to manipulate a Polish topology into making certain measurable functions continuous. This is one of the reasons why Polish spaces did not go into the gallery of topological spaces in Section 3.6. Polish spaces generalize to analytic spaces in a most natural manner, for example, when taking the factor of a countably generated equivalence relation in a Polish space; we will study the relationship in Section 4.3.1. The most important tool here is Souslin's Separation Theorem. This discussion leads quickly to a discussion of the abstract Souslin operation in Section 4.5, through which analytic sets may be generated in a Polish space. From there it is but a small step to introducing universally measurable sets in Section 4.6, which turn out to be closed under Souslin's operation in general measurable spaces.

Two applications of these techniques are given: Lubin's Theorem extends a measure from a countably generated sub-σ-algebra of the Borel sets of an analytic space to the Borel sets proper, and a transition kernel is extended to the universal completion (Sections 4.6.1 and 4.6.2). Lubin's Theorem is established through von Neumann's Selection Theorem, which provides a universally measurable right inverse to a surjective measurable map from an analytic space to a separable measurable space. The topic of selections is taken up in Section 4.7, where the selection theorem due to Kuratowski and Ryll-Nardzewski is in the center of attention. It gives conditions under which a map which takes values in the closed non-empty subsets of a Polish space has a measurable selector. This is of interest, e.g., when it comes to establishing the existence of bisimulations for Markov transition systems, or for identifying the quotient structure of transition kernels.


4.1.1 Measurable Sets

Recall from Example 2.1.12 that a measurable space (X, A) consists of a set X with a σ-algebra A, which is a Boolean algebra of subsets of X that is closed under countable unions (hence countable intersections or countable disjoint unions). If A₀ is a family of subsets of X, then

σ(A₀) = ⋂{B | B is a σ-algebra on X with A₀ ⊆ B}

is the smallest σ-algebra on X which contains A₀. This construction works since the power set P(X) is a σ-algebra on X. Take for example as a generator I all open intervals in the real numbers R, then σ(I) is the σ-algebra of real Borel sets. These Borel sets are denoted by B(R), and, since each open subset of R can be represented as a countable union of open intervals, B(R) is the smallest σ-algebra which contains the open sets of R. Unless otherwise stated, the real numbers are equipped with the σ-algebra B(R).
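
On a finite carrier, countable unions reduce to finite ones, so the generated σ-algebra can be computed by a simple fixpoint iteration. The following Haskell sketch is our own toy illustration (names are made up); subsets are represented as sorted lists of Ints.

  import Data.List (nub, sort, union, (\\))

  type Set = [Int]   -- a sorted list standing for a subset of the carrier

  -- one closure step: adjoin complements and pairwise unions
  step :: Set -> [Set] -> [Set]
  step x as = nub (as ++ [ sort (x \\ a) | a <- as ]
                      ++ [ sort (a `union` b) | a <- as, b <- as ])

  -- the algebra generated by as0 on the carrier x: iterate until nothing new appears
  sigma :: Set -> [Set] -> [Set]
  sigma x as0 = go (nub (map sort ([] : x : as0)))
    where go as = let as' = step x as
                  in if length as' == length as then as else go as'

For instance, sigma [1,2,3,4] [[1],[2]] returns the eight unions of the atoms {1}, {2} and {3, 4}; on finite carriers this is exactly σ(A₀).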

In general, if (X, τ) is a topological space, the members of the σ-algebra B(τ) := σ(τ) are called its Borel sets. They will be discussed extensively in the context of Polish spaces. This is, however, not the only σ-algebra of interest on a topological space.

Example 4.1.1 Call F ⊆ X functionally closed iff F = f⁻¹[{0}] for some continuous function f : X → R; G ⊆ X is called functionally open iff G = X \ F with F functionally closed. The Baire sets Ba(τ) of (X, τ) are the σ-algebra generated by the functionally closed sets of the space. We write sometimes also Ba(X), if the context is clear.

Let F ⊆ X be a closed subset of a metric space (X, d), then d(x, F) := inf{d(x, y) | y ∈ F} is the distance of x to F with x ∈ F iff d(x, F) = 0, see Lemma 3.5.7. Moreover, d(·, F) is continuous, thus F = d(·, F)⁻¹[{0}] is functionally closed, hence the Baire and the Borel sets coincide for metric spaces.

Note that |d(x, F) − d(y, F)| ≤ d(x, y), so that d(·, F) is even uniformly continuous.

The next example constructs a σ-algebra which comes up quite naturally in the study of stochastic nondeterminism.

Example 4.1.2 Let A ⊆ P(X), for some set X, be the family of hit sets, and let G be a distinguished subset of P(X). Define the hit-σ-algebra H_A(G) as the smallest σ-algebra on G which contains all the sets H_A with A ∈ A, where H_A is the hit set associated with A, i.e., H_A := {B ∈ G | B ∩ A ≠ ∅}.

Rather than working with a closure operation σ(·), one can sometimes adjoin additional elements to obtain a σ-algebra from a given one, see also Exercise 4.5. This is demonstrated for a σ-ideal through the following construction, which will be helpful when completing a measure space. Recall that N ⊆ P(X) is a σ-ideal iff it is an order ideal which is closed under countable unions (Definition 1.6.10).

Lemma 4.1.3 Let A be a σ-algebra on a set X, N ⊆ P(X) a σ-ideal. Then

A_N := {A ∆ N | A ∈ A, N ∈ N}

is the smallest σ-algebra containing both A and N.


Proof It is sufficient to demonstrate that A_N is a σ-algebra. Because

X \ (A ∆ N) = X ∆ (A ∆ N) = (X ∆ A) ∆ N = (X \ A) ∆ N,

we see that A_N is closed under complementation. Now let (Aₙ ∆ Nₙ)ₙ∈ℕ be a sequence of sets with (Aₙ)ₙ∈ℕ in A and (Nₙ)ₙ∈ℕ in N; we have

⋃_{n∈ℕ} (Aₙ ∆ Nₙ) = (⋃_{n∈ℕ} Aₙ) ∆ N

with

N = (⋃_{n∈ℕ} (Aₙ ∆ Nₙ)) ∆ (⋃_{n∈ℕ} Aₙ) ⊆ ⋃_{n∈ℕ} ((Aₙ ∆ Nₙ) ∆ Aₙ) = ⋃_{n∈ℕ} Nₙ,

using Exercise 4.10 for the inclusion. Because N is a σ-ideal, we conclude that N ∈ N. Thus A_N is also closed under countable unions. Since ∅, X ∈ A_N, it follows that this set is indeed a σ-algebra. □

If (Y, B) is another measurable space, then a map f : X → Y is called A-B-measurable iff the inverse image under f of each set in B is a member of A, hence iff f⁻¹[G] ∈ A holds for all G ∈ B; this is discussed in Example 2.1.12.

Checking measurability is made easier by the observation that it suffices for the inverse images of a generator to be measurable sets (see Exercise 2.7).

Lemma 4.1.4 Let (X, A) and (Y, B) be measurable spaces, and assume that B = σ(B₀) is generated by a family B₀ of subsets of Y. Then f : X → Y is A-B-measurable iff f⁻¹[G] ∈ A holds for all G ∈ B₀.

Proof Clearly, if f is A-B-measurable, then f⁻¹[G] ∈ A holds for all G ∈ B₀.

Conversely, suppose f⁻¹[G] ∈ A holds for all G ∈ B₀; we need to show that f⁻¹[G] ∈ A for all G ∈ B. We will use the principle of good sets (see page 62) for the proof. In fact, consider the family 𝒢 of sets for which the assertion is true,

𝒢 := {G ∈ B | f⁻¹[G] ∈ A}.

An elementary calculation shows that the empty set and Y are both members of 𝒢, and since f⁻¹[Y \ G] = X \ f⁻¹[G], 𝒢 is closed under complementation. Because

f⁻¹[⋃_{i∈I} Gᵢ] = ⋃_{i∈I} f⁻¹[Gᵢ]

holds for any index set I, 𝒢 is closed under finite and countable unions. Thus 𝒢 is a σ-algebra, so that σ(𝒢) = 𝒢 holds. By assumption, B₀ ⊆ 𝒢, so that

B = σ(B₀) ⊆ σ(𝒢) = 𝒢 ⊆ B

is inferred. Thus all elements of B have their inverse image in A. □
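On finite spaces this lemma becomes an effective test: measurability only has to be verified on a generator. A minimal Python sketch under that assumption (the names preimage and is_measurable are ours):

    def preimage(f, X, G):
        # f^-1[G] as a subset of the finite set X
        return frozenset(x for x in X if f(x) in G)

    def is_measurable(f, X, A, B0):
        # Lemma 4.1.4 for finite data: f is measurable iff the preimage of
        # every *generator* set in B0 belongs to the sigma-algebra A on X.
        return all(preimage(f, X, G) in A for G in B0)

    # Example: A has the atoms {1,2} and {3,4}; the parity map is not
    # measurable, since the preimage of {0} is {2,4}, which is not in A.
    X = {1, 2, 3, 4}
    A = {frozenset(), frozenset(X), frozenset({1, 2}), frozenset({3, 4})}
    print(is_measurable(lambda x: x % 2, X, A, [{0}]))   # False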

An example is furnished by a real-valued function f : X → R on X, which is A-B(R)-measurable iff {x ∈ X | f(x) ⋈ t} ∈ A holds for each t ∈ R; here the relation ⋈ may be taken from {<, ≤, ≥, >}. We infer in particular that a function f on a topological space (X, τ) which is upper or lower semicontinuous (i.e., for which in the upper semicontinuous case the set {x ∈ X | f(x) < c} is open, and in the lower semicontinuous case the set {x ∈ X | f(x) > c} is open, c ∈ R being arbitrary) is Borel measurable. Hence a continuous function is Borel measurable. A continuous function f : X → Y into a metric space Y is Baire measurable (Exercise 4.2).

These observations will be used frequently.

The proof's strategy is worth repeating, since we will use it over and over again. It consists in having a look at all objects that have the desired property, and showing that this set of good guys is a σ-algebra. This is similar to showing in a proof by induction that the set of all natural numbers having a certain property is closed under constructing the successor. Then we show that the generator of the σ-algebra is contained in the good guys, which is rather similar to beginning the induction. Taking both steps together then yields the desired property.

An example is furnished by the equivalence relation induced by a family of sets.

Example 4.1.5 Given a subset C ⊆ P(X) for a set X, define the equivalence relation ≡_C on X upon setting

x ≡_C x′ iff ∀C ∈ C : x ∈ C ⇔ x′ ∈ C.

Thus x ≡_C x′ iff C cannot separate the elements x and x′; call ≡_C the equivalence relation generated by C.

Now let A be a σ-algebra on X with A = σ(A₀). Then A and A₀ generate the same equivalence relation, i.e., ≡_A = ≡_{A₀}. In fact, define for x, x′ ∈ X with x ≡_{A₀} x′ the family

B := {A ∈ A | x ∈ A ⇔ x′ ∈ A}.

Then B is a σ-algebra with A₀ ⊆ B, hence σ(A₀) ⊆ B ⊆ A, so that B = A. Thus x ≡_{A₀} x′ implies x ≡_A x′; since the reverse implication is obvious, the claim is established.
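On a finite set the relation ≡_C is computable by comparing membership signatures; a small Python sketch (names ours):

    def classes(X, C):
        # Partition of X into the classes of the equivalence relation generated
        # by the family C: x and x' are equivalent iff no set in C separates
        # them, i.e. iff their membership signatures agree.
        sig = {}
        for x in X:
            key = tuple(x in c for c in C)
            sig.setdefault(key, set()).add(x)
        return list(sig.values())

    # Example: C = [{1, 2}] cannot separate 1 from 2, nor 3 from 4.
    print(classes({1, 2, 3, 4}, [{1, 2}]))   # [{1, 2}, {3, 4}]

By the example just given, running this on a generator A₀ yields the same partition as running it on the full σ-algebra σ(A₀).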

Let us just briefly discuss initial and final σ-algebras again. The spirit of this is very much similar to defining initial and final topologies, see Section 3.1.1 and Definition 2.6.42. If (X, A) is a measurable space and f : X → Y is a map, then

B := {D ⊆ Y | f⁻¹[D] ∈ A}

is the largest σ-algebra B₀ on Y that renders f A-B₀-measurable; B is called the final σ-algebra with respect to f. In fact, because the inverse set operator f⁻¹ is compatible with the Boolean operations, it is immediate that B is closed under the operations of a σ-algebra, and a moment's reflection shows that this is also the largest σ-algebra with this property.
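For a finite target the final σ-algebra can be enumerated directly from this characterization; a minimal Python sketch (helper names ours):

    from itertools import chain, combinations

    def powerset(Y):
        Y = list(Y)
        return (frozenset(s) for s in
                chain.from_iterable(combinations(Y, r) for r in range(len(Y) + 1)))

    def final_sigma_algebra(f, X, A, Y):
        # Largest sigma-algebra on the finite set Y making f measurable:
        # exactly those D with f^-1[D] in A.
        return {D for D in powerset(Y)
                if frozenset(x for x in X if f(x) in D) in A}

    # Example: f identifies precisely the atoms of A, so every subset of the
    # two-point quotient is measurable.
    X = {1, 2, 3, 4}
    A = {frozenset(), frozenset(X), frozenset({1, 2}), frozenset({3, 4})}
    print(final_sigma_algebra(lambda x: "a" if x <= 2 else "b", X, A, {"a", "b"}))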

Symmetrically, let g : P → X be a map; then

g⁻¹[A] := {g⁻¹[E] | E ∈ A}

is the smallest σ-algebra P₀ on P that renders g P₀-A-measurable; accordingly, g⁻¹[A] is called initial with respect to g. Again, g⁻¹[A] is a σ-algebra, and it is fairly clear that this is the smallest one with the desired property. In particular, the inclusion i_Q : Q → X becomes measurable for a subset Q ⊆ X when Q is endowed with the σ-algebra {Q ∩ B | B ∈ A}. It is called the trace of A on Q and is denoted — in a slight abuse of notation — by A ∩ Q.

Initial and final σ-algebras generalize in an obvious way to families of maps. For example, σ(⋃_{i∈I} gᵢ⁻¹[Aᵢ]) is the smallest σ-algebra P₀ on P which makes all the maps gᵢ : P → Xᵢ P₀-Aᵢ-measurable for a family ((Xᵢ, Aᵢ))_{i∈I} of measurable spaces.

This is an intrinsic, universal characterization of the final σ-algebra for a single map.

Lemma 4.1.6 Let (X, A) be a measurable space and f : X → Y be a map. The following conditions are equivalent:

1. The σ-algebra B on Y is final with respect to f.

2. If (P, P) is a measurable space, and g : Y → P is a map, then the A-P-measurability of g ∘ f implies the B-P-measurability of g.

Proof 1. Taking care of 1 ⇒ 2, we note that

(g ∘ f)⁻¹[P] = f⁻¹[g⁻¹[P]] ⊆ A.

Consequently, g⁻¹[P] is one of the σ-algebras B₀ with f⁻¹[B₀] ⊆ A. Since B is the largest of them, we have g⁻¹[P] ⊆ B. Hence g is B-P-measurable.

2. In order to establish the implication 2 ⇒ 1, we have to show that B₀ ⊆ B whenever B₀ is a σ-algebra on Y with f⁻¹[B₀] ⊆ A. Put (P, P) := (Y, B₀), and let g be the identity id_Y. Because f⁻¹[B₀] ⊆ A, we see that id_Y ∘ f is A-B₀-measurable. Thus id_Y is B-B₀-measurable. But this means B₀ ⊆ B. □

We will use the final σ-algebra mainly for factoring through an equivalence relation. In fact, let α be an equivalence relation on a set X, where (X, A) is a measurable space. Then the factor map

η_α : X → X/α, x ↦ [x]_α

that maps each element to its class can be made measurable by taking as the σ-algebra on X/α the final σ-algebra A/α with respect to η_α and A.

Dually to Lemma 4.1.6, the initial σ-algebra is characterized as follows.

Lemma 4.1.7 Let (Y, B) be a measurable space and f : X → Y be a map. The following conditions are equivalent:

1. The σ-algebra A on X is initial with respect to f.

2. If (P, P) is a measurable space, and g : P → X is a map, then the P-B-measurability of f ∘ g implies the P-A-measurability of g. □

Let ((Xᵢ, Aᵢ))_{i∈I} be a family of measurable spaces; then the product σ-algebra ⊗_{i∈I} Aᵢ denotes the initial σ-algebra on ∏_{i∈I} Xᵢ for the projections

π_j : ⟨xᵢ | i ∈ I⟩ ↦ x_j.


It is not difficult to see that ⊗_{i∈I} Aᵢ = σ(Z) with

Z := {∏_{i∈I} Eᵢ | ∀i ∈ I : Eᵢ ∈ Aᵢ, Eᵢ = Xᵢ for almost all indices}

as the collection of cylinder sets (use Theorem 1.6.30 and the observation that Z is closed under intersection).

For I = {1, 2}, the σ-algebra A₁ ⊗ A₂ is generated by the set of measurable rectangles

{E₁ × E₂ | E₁ ∈ A₁, E₂ ∈ A₂}.

This is discussed in Example 2.2.4 as the example of a product in the category of measurable spaces.

Dually, the sum (X₁ + X₂, A₁ + A₂) of the measurable spaces (X₁, A₁) and (X₂, A₂) is defined through the final σ-algebra on the sum X₁ + X₂ for the injections Xᵢ → X₁ + X₂. This is the special case of the coproduct ⊕_{i∈I} (Xᵢ, Aᵢ), where the σ-algebra ⊕_{i∈I} Aᵢ is final with respect to the injections. This is discussed in a general context in Example 2.2.16.

We will construct horizontal and vertical cuts from subsets of a Cartesian product, e.g., when defining the product measure, and when investigating measurability properties. We define for Q ⊆ X × Y the horizontal cut

Q_x := {y ∈ Y | ⟨x, y⟩ ∈ Q}

and the vertical cut

Q^y := {x ∈ X | ⟨x, y⟩ ∈ Q}.

Lemma 4.1.8 Let (X, A) and (Y, B) be measurable spaces. If Q ∈ A ⊗ B, then Q_x ∈ B and Q^y ∈ A hold for all x ∈ X, y ∈ Y.

Proof Take the horizontal cut Q_x and consider the set

𝒬 := {Q ∈ A ⊗ B | Q_x ∈ B}.

Then A × B ∈ 𝒬 whenever A ∈ A, B ∈ B; this is so since the set of all measurable rectangles forms a generator of the product σ-algebra which is closed under finite intersections. Because ((X × Y) \ Q)_x = Y \ Q_x, we infer that 𝒬 is closed under complementation, and because (⋃_{n∈ℕ} Qₙ)_x = ⋃_{n∈ℕ} (Qₙ)_x, we conclude that 𝒬 is closed under countable disjoint unions. Hence 𝒬 = A ⊗ B by the π-λ-Theorem 1.6.30. □

The converse does not hold: we cannot conclude that a subset S ⊆ X × Y is product measurable whenever all its cuts are measurable; this follows from Example 4.3.7.
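For finite products the cuts are immediate to compute; a tiny Python sketch (function names ours):

    def horizontal_cut(Q, x):
        # Q_x = {y | (x, y) in Q}
        return {y for (a, y) in Q if a == x}

    def vertical_cut(Q, y):
        # Q^y = {x | (x, y) in Q}
        return {x for (x, b) in Q if b == y}

    Q = {(1, "u"), (1, "v"), (2, "v")}
    print(horizontal_cut(Q, 1))    # {'u', 'v'}
    print(vertical_cut(Q, "v"))    # {1, 2}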

4.1.2 A σ-Algebra On Spaces Of Measures

We will now introduce a σ-algebra on the space of all σ-finite measures. It is induced by evaluating measures at fixed events. Note the inversion: instead of observing a measure assigning a real number to a set, we take a set and have it act on measures. This approach is fairly natural for many applications.

In addition to S resp. P, the functors which assign to each measurable space its subprobabilities and its probabilities (see Example 2.3.12), we introduce the spaces of finite resp. σ-finite measures.

Denote by M(X, A) the set of all finite measures on (X, A); the set of all σ-finite measures is denoted by M_σ(X, A). Each set A ∈ A gives rise to the evaluation map ev_A : µ ↦ µ(A); the weak-*-σ-algebra ℘(X, A) on M(X, A) is the initial σ-algebra with respect to the family {ev_A | A ∈ A} (actually, it suffices to consider a generator A₀ of A, see Exercise 4.1). This is just an extension of the definitions given in Example 2.1.14. It is clear that we have

℘(X, A) = σ({β_A(A, ⋈ q) | A ∈ A, q ∈ R₊})

when we define

β_A(A, ⋈ q) := {µ ∈ M(X, A) | µ(A) ⋈ q}.

Here ⋈ is one of the relational operators ≤, <, ≥, >, and it is apparent that q may be taken from the rationals. We will use the same symbol β_A when we refer to probabilities or subprobabilities, if no confusion arises. Thus the base space from which the weak-*-σ-algebra is constructed should be clear from the context. We will also write ℘(X) or ℘(A) for ℘(X, A), as the situation demands.

Let (Y, B) be another measurable space, and let f : X → Y be A-B-measurable. Define

M(f)(µ)(B) := µ(f⁻¹[B])

for µ ∈ M(X, A) and for B ∈ B; then M(f)(µ) ∈ M(Y, B), hence M(f) : M(X, A) → M(Y, B) is a map, and since

(M(f))⁻¹[β_B(B, ⋈ q)] = β_A(f⁻¹[B], ⋈ q),

this map is ℘(A)-℘(B)-measurable, see Exercise 2.8. Thus M is an endofunctor on the category of measurable spaces.

Measurable maps into Mσ(·) deserve special attention.

Definition 4.1.9 Given measurable spaces (X, A) and (Y, B), an A-℘(B)-measurable map K : X → M_σ(Y, B) is called a transition kernel and denoted by K : (X, A) ⇝ (Y, B). A transition kernel with values in S(Y, B) is also called a stochastic relation.

A transition kernel K : (X, A) ⇝ (Y, B) associates with each x ∈ X a σ-finite measure K(x) on (Y, B). In a probabilistic setting, this may be interpreted as the system reacting on input x with K(x) as the probability distribution of its responses. For example, if (X, A) = (Y, B) is the state space of a probabilistic transition system, then K(x)(B) is often interpreted as the probability that the next state is a member of the measurable set B after a transition from x.
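On a finite state space such a kernel is nothing but a table of subprobability vectors; a minimal Python sketch (state names and the helper kernel_prob are ours):

    # A transition kernel on a finite state space: each state is mapped to a
    # subprobability distribution over the successor states.
    K = {
        "s0": {"s0": 0.5, "s1": 0.5},
        "s1": {"s1": 0.3},   # subprobability: the missing mass 0.7 models deadlock
    }

    def kernel_prob(K, x, B):
        # K(x)(B): probability that the next state after x lies in the set B
        return sum(p for (y, p) in K[x].items() if y in B)

    print(kernel_prob(K, "s0", {"s1"}))          # 0.5
    print(kernel_prob(K, "s1", {"s0", "s1"}))    # 0.3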

This is an immediate characterization of transition kernels.


Lemma 4.1.10 K : (X, A) ⇝ (Y, B) is a transition kernel iff these conditions are satisfied:

1. K(x) is a σ-finite measure on (Y, B) for each x ∈ X.

2. x ↦ K(x)(B) is a measurable function for each B ∈ B.

Proof If K : (X, A) ⇝ (Y, B), then K(x) is a σ-finite measure on (Y, B), and

{x ∈ X | K(x)(B) > q} = K⁻¹[β_B(B, > q)] ∈ A.

Thus x ↦ K(x)(B) is measurable for all B ∈ B. Conversely, if x ↦ K(x)(B) is measurable for each B ∈ B, then the above equation shows that K⁻¹[β_B(B, > q)] ∈ A, so K : (X, A) → M_σ(Y, B) is A-℘(B)-measurable by Lemma 4.1.4. □

A special case of transition kernels is given by the Markov kernels, sometimes also called stochastic relations. These are kernels whose image is in S or in P, whatever the case may be. We encountered these Markov kernels already in Example 2.4.8 as the Kleisli morphisms for the Giry monad.

Example 4.1.11 Transition kernels may be used for interpreting modal logics. Consider this grammar for formulas:

ϕ ::= ⊤ | ϕ₁ ∧ ϕ₂ | ◊_q ϕ

with q ∈ ℚ, q ≥ 0. The informal interpretation in a probabilistic transition system is that ⊤ always holds, and that ◊_q ϕ holds with probability not smaller than q after a transition into a state in which formula ϕ holds. Now let M : (X, A) ⇝ (X, A) be a transition kernel, and define inductively

[[⊤]]_M := X,
[[ϕ₁ ∧ ϕ₂]]_M := [[ϕ₁]]_M ∩ [[ϕ₂]]_M,
[[◊_q ϕ]]_M := {x ∈ X | M(x)([[ϕ]]_M) ≥ q} = M⁻¹[β_A([[ϕ]]_M, ≥ q)].

It is easy to show by induction on the structure of the formula that the sets [[ϕ]]_M are measurable, since M is a transition kernel. For a generalization, see Example 4.2.7.
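Over a finite kernel this semantics is directly executable. The sketch below uses an encoding of our own choosing (formulas as nested tuples, with ("dia", q, phi) standing for ◊_q ϕ) and computes the validity sets:

    def sat(K, states, phi):
        # Validity sets for the grammar of Example 4.1.11 over a finite kernel K.
        if phi == "top":
            return set(states)
        if phi[0] == "and":
            return sat(K, states, phi[1]) & sat(K, states, phi[2])
        if phi[0] == "dia":
            _, q, sub = phi
            target = sat(K, states, sub)
            # [[dia_q sub]] = {x | K(x)([[sub]]) >= q}
            return {x for x in states
                    if sum(p for (y, p) in K[x].items() if y in target) >= q}
        raise ValueError("unknown formula: %r" % (phi,))

    K = {"s0": {"s0": 0.5, "s1": 0.5}, "s1": {"s1": 0.3}}
    print(sat(K, {"s0", "s1"}, ("dia", 0.4, "top")))   # {'s0'}: s1 reaches only mass 0.3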

4.1.3 Case Study: Stochastic Effectivity Functions

This section will discuss stochastic effectivity functions as a generalization of stochastic transition kernels resp. stochastic relations. The reasons for this discussion are as follows:

Nondeterminism Stochastic effectivity functions are the probabilistic versions of effectivity functions, which are well understood (for a historic overview, the reader is referred to [vdHP07, Section 9] or to [Dob14, Section 1]). Since these functions model nondeterminism fairly well, their stochastic extensions are also a prime candidate for modelling applications which incorporate stochastic nondeterminism. This argument is studied in depth in [DT41].

Confluence The confluence of categorical and stochastic reasoning can be studied in this instance, for example when introducing morphisms and congruences. This cooperation is the more interesting as these stochastic effectivity functions may be thought of as the composition of monads, but seem not to be the functorial part of a monad themselves. Hence it becomes mandatory to fine-tune the arguments which are already available in the context of, say, Kleisli morphisms. Extending this argument, the approach proposed here is an exercise in stochastic modelling.

Tool Stochastic effectivity functions are proposed as the tool for interpreting game logic stochastically. It is argued below that Kripke models are not suitable for interpreting this logic in the usual, non-stochastic realm, hence we have to look for other stochastic methods to interpret this logic. A first proposal in this direction will be discussed in Section 4.9.4. Thus the discussion below may also be perceived as sharpening the tools which later on will be put to use in the context of game logics.

For an interpretation of game logic as formulated in Example 2.7.5, Parikh and Pauly [Par85] use effectivity functions which are closely related to neighborhood relations [Pau00]; this is sketched in Section 2.7.1, in particular in Example 2.7.22. It is shown there that Kripke models are not adequate for interpreting game logics. These arguments should be taken care of for a stochastic interpretation as well, so we need an extension of stochastic Kripke models, i.e., of Kripke models which are based on stochastic transition systems. Since effectivity functions have been shown to be useful as the formalism underlying neighborhood models, we will formulate here stochastic effectivity functions as their stochastic counterpart, and for further development.

When constructing such a probabilistic interpretation, we will take distributions over the state space into account — thus, rather than working with states directly, we will work with probabilities over them. As with the interpretation of modal logics through stochastic Kripke models, it may well be that some information gets lost, so we choose to work with subprobabilities rather than probabilities. Taking into account that Angel may be able to bring about a specific distribution of the new states when playing game γ in state s, we propose to model Angel's effectivity by a set of distributions; this is remotely similar to the idea of gambling houses in [DS65]. For example, Angel may have a strategy for achieving a normal distribution N(s, σ²) centered at s ∈ R such that the standard deviation varies in an interval I, yielding {N(s, σ²) | σ ∈ I} as a set of distributions effective for Angel in that situation.

But we cannot do with just arbitrary subsets of the set of all subprobabilities on the state space S. We also want to characterize possible outcomes, i.e., sets of distributions over the state space, for composite games. Hence we will want to average over intermediate states. This in turn requires measurability of the functions involved, both for the integrand and for the measure used for integration. Consequently we require measurable sets of subprobabilities as possible outcomes. We also impose a condition of measurability on the interplay between distributions on states and the reals measuring the probabilities of sets of states. This leads to the definition of a stochastic effectivity function.

Modeling all this requires some preparations, fixing the range of a stochastic effectivity function. Put for a measurable space (S, A)

E(S, A) := {V ⊆ ℘(S, A) | V is upper closed};

thus if V ∈ E(S, A), then A ∈ V and A ⊆ B together imply B ∈ V. Note that E = V′ ∘ S with V′ as the restriction of V to the weak-*-measurable sets, V being defined in Example 2.3.13. Recall that ℘(S, A) is the weak-*-σ-algebra on the set of finite measures on the measurable space (S, A), i.e., the initial σ-algebra with respect to evaluation, see Section 4.1.2, page 318. A measurable map f : (S, A) → (T, B) induces a map E(f) : E(S, A) → E(T, B) upon setting

E(f)(V) := {W ∈ ℘(T, B) | (Sf)⁻¹[W] ∈ V}

for V ∈ E(S); then clearly E(f)(V) ∈ E(T).

Note that E(S, A) has not been equipped with a σ-algebra, so the usual notion of measurability between measurable spaces cannot be applied. In particular, E is not an endofunctor on the category of measurable spaces. We will not discuss functorial aspects of E here, but rather refer the reader to the discussion in Section 2.3.1, in particular Example 2.3.13.

We need some measurability properties for dealing with the composition of distributions when discussing composite games. Let H ⊆ S(S) × [0, 1] be a measurable subset indicating a quantitative assessment of subprobabilities; a typical example could be the set {⟨µ, q⟩ | µ ∈ β_A(A, > q), q ∈ C} for some A ∈ A and some C ∈ B([0, 1]). Fix some real q and consider the cut of H at q, viz.,

H_q = {µ | ⟨µ, q⟩ ∈ H},

for example,

β_A(A, > q) = {⟨µ, r⟩ | µ(A) > r}_q

(see Example 4.2.6). We ask for all states s such that this set H_q is effective for s. These states should come from a measurable subset of S. It turns out that this is not enough: we also require the real components to be captured through a measurable set as well — after all, the real component will later be averaged, i.e., integrated, over, so it should behave decently. This idea is formulated in the following definition.

Definition 4.1.12 Call a map P : S → E(S) t-measurable iff {⟨s, q⟩ | H_q ∈ P(s)} ∈ A ⊗ B([0, 1]) whenever H ∈ ℘(S) ⊗ B([0, 1]).

Summarizing, we are led to the notion of a stochastic effectivity function.

Definition 4.1.13 A stochastic effectivity function P on a measurable space S is a t-measurable map P : S → E(S).

In order to distinguish between sets of states and sets of state distributions, we call the latter ones portfolios; thus P(s) is a set of measurable portfolios. This will render some discussions below easier. By the way, stochastic effectivity functions between measurable spaces S and T could be defined in a similar way, but this added generality is not of interest in the present context. Note that an effectivity function is not given by an endofunctor on the category of measurable spaces, so we will have to assemble what we need from this context without being able to directly refer to coalgebras or similar constructions.

We show that a finite transition system can be converted into a stochastic effectivity function.


Example 4.1.14 Let S := {1, …, n} for some n ∈ ℕ, and take the power set as a σ-algebra. Then S(S) can be identified with the compact convex set

Π_n := {⟨x₁, …, x_n⟩ | xᵢ ≥ 0 for 1 ≤ i ≤ n, ∑_{i=1}^n xᵢ ≤ 1}.

Geometrically, Π_n is the positive convex hull of the unit vectors e_i, 1 ≤ i ≤ n, and the zero vector; here e_i(i) = 1 and e_i(j) = 0 for i ≠ j, so e_i is the i-th n-dimensional unit vector. The weak-*-σ-algebra ℘(S) is the Borel σ-algebra B(Π_n) for the Euclidean topology on Π_n.

Assume we have a transition system →_S on S, hence a relation →_S ⊆ S × S. Let succ(s) := {s′ ∈ S | s →_S s′} be the set of successor states for state s, and define for s ∈ S the set of weighted successors

κ(s) := {∑_{s′∈succ(s)} α_{s′} · e_{s′} | ℚ ∋ α_{s′} ≥ 0 for s′ ∈ succ(s), ∑_{s′∈succ(s)} α_{s′} ≤ 1}

and the upper closed set

P(s) := {A ∈ B(Π_n) | κ(s) ⊆ A}.

A set A is in the portfolio for P in state s iff A contains all rational subprobability distributions on the successor states of s. We will restrict our attention to these rational distributions.

We claim that P is an effectivity function on S. If P(s) = ∅, there is nothing to show, so we assume that always P(s) ≠ ∅. Let H ∈ B(Π_n) ⊗ B([0, 1]) = B(Π_n × [0, 1]), the latter equality holding by Proposition 4.3.16. Then

{⟨s, q⟩ | H_q ∈ P(s)} = ⋃_{1≤s≤n} {s} × {q ∈ [0, 1] | H_q ∈ P(s)}.

Fix s ∈ S, and let succ(s) = {s₁, …, s_m}. Put

Ω_m := {⟨α₁, …, α_m⟩ ∈ ℚ^m | αᵢ ≥ 0, ∑ᵢ αᵢ ≤ 1},

hence Ω_m is countable, and

{q ∈ [0, 1] | H_q ∈ P(s)} = {q ∈ [0, 1] | κ(s) ⊆ H_q} = ⋂_{⟨α₁,…,α_m⟩∈Ω_m} {q ∈ [0, 1] | ∑ᵢ αᵢ · e_{sᵢ} ∈ H_q}.

Now fix α := ⟨α₁, …, α_m⟩ ∈ Ω_m. The map ζ_α : [0, 1]^{m·n} → [0, 1]^n which maps ⟨v₁, …, v_m⟩ to ∑_{i=1}^m αᵢ · vᵢ is continuous, hence measurable, and so is ξ := ζ_α × id_[0,1] : [0, 1]^{m·n} × [0, 1] → [0, 1]^n × [0, 1]. Hence I := ξ⁻¹[H] ∈ B([0, 1]^{m·n} × [0, 1]), and ∑_{i=1}^m αᵢ · e_{sᵢ} ∈ H_q iff ⟨e_{s₁}, …, e_{s_m}, q⟩ ∈ I. Consequently,

{q ∈ [0, 1] | ∑ᵢ αᵢ · e_{sᵢ} ∈ H_q} = I_{⟨e_{s₁},…,e_{s_m}⟩} ∈ B([0, 1]).

But this implies that

{q ∈ [0, 1] | H_q ∈ P(s)} = ⋂_{α∈Ω_m} {q ∈ [0, 1] | ∑ᵢ αᵢ · e_{sᵢ} ∈ H_q} ∈ B([0, 1])


for the fixed state s ∈ S. Collecting states, we obtain

{⟨s, q⟩ ∈ S × [0, 1] | H_q ∈ P(s)} ∈ P(S) ⊗ B([0, 1]).

Thus we have converted a finite transition system into a stochastic effectivity function by constructing all subprobabilities over the respective successor sets with rational coefficients.

One might ask whether the restriction to rational coefficients is really necessary. Taking the convex closure with real coefficients might, however, result in losing measurability, see [Kec94, p. 216].
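Since κ(s) is infinite, any executable illustration of this construction can only approximate it. The following Python sketch (all names ours) samples κ(s) through the rational coefficient vectors with a fixed denominator N and tests a portfolio predicate on that sample; under this stated restriction it yields a necessary condition for A ∈ P(s), not the full construction:

    from fractions import Fraction
    from itertools import product

    def kappa_sample(succ, N):
        # All vectors sum_{s' in succ} (k_{s'}/N) e_{s'} with sum k_{s'} <= N:
        # a finite sample of the countable set kappa(s).
        m = len(succ)
        for ks in product(range(N + 1), repeat=m):
            if sum(ks) <= N:
                yield dict(zip(succ, (Fraction(k, N) for k in ks)))

    def in_portfolio(A, succ, N=4):
        # Necessary condition for A in P(s): the predicate A holds on the
        # sampled part of kappa(s).
        return all(A(v) for v in kappa_sample(succ, N))

    succ = ["s1", "s2"]                    # successors of the state s
    A = lambda v: sum(v.values()) <= 1     # the whole of Pi_n: always a portfolio
    print(in_portfolio(A, succ))           # True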

The next example shows that a stochastic effectivity function can be used for interpreting a simple modal logic.

Example 4.1.15 Let Φ be a set of atomic propositions, and define the formulas of a logic through this grammar:

ϕ ::= ⊤ | p | ϕ₁ ∧ ϕ₂ | ◊_q ϕ

with p ∈ Φ an atomic proposition and q ∈ [0, 1] a threshold value. Intuitively, ◊_q ϕ is true in a state s iff there can be a move in s to a state in which ϕ holds with probability not smaller than q.

This logic is interpreted over the measurable space (S, A); assume that we are given a map e : Φ → A assigning to each atomic proposition a measurable set as its validity set. Let P be a stochastic effectivity function on S, then define inductively

[[⊤]] := S,
[[p]] := e(p), for p ∈ Φ,
[[ϕ₁ ∧ ϕ₂]] := [[ϕ₁]] ∩ [[ϕ₂]],
[[◊_q ϕ]] := {s ∈ S | β_A([[ϕ]], > q) ∈ P(s)}.

The interesting line is of course the last one. It assigns to ◊_q ϕ all states s such that β_A([[ϕ]], > q) is in the portfolio of P(s). These are all states for which the collection of all measures yielding an evaluation on [[ϕ]] greater than q can be achieved.

Then t-measurability of P and the assumption on e make sure that these sets are measurable. This is shown by an easy induction on the structure of the formulas, and by observing that [[◊_q ϕ]] = {⟨s, r⟩ ∈ S × [0, 1] | β_A([[ϕ]], > r) ∈ P(s)}_q.

We fix for the time being a measurable space (S, A); if a second measurable space enters the discussion, it will be T as carrier with σ-algebra B.

The relationship of stochastic relations and stochastic effectivity functions is of considerable interest; we will discuss stochastic Kripke models and general models for game logic in Section 4.9.4. Each stochastic relation K : S ⇝ S yields a stochastic effectivity function P_K in a natural way upon setting

P_K(s) := {A ∈ ℘(S) | K(s) ∈ A}.                                  (4.1)

Thus a portfolio in P_K(s) is a measurable subset of S(S) which contains K(s). We observe:


Lemma 4.1.16 P_K : S → E(S) is t-measurable whenever K : S ⇝ S is a stochastic relation.

Proof Clearly, P_K(s) is upper closed for each s ∈ S. Put T_H := {⟨s, q⟩ | H_q ∈ P_K(s)} for H ⊆ S(S) × [0, 1]. Thus

⟨s, q⟩ ∈ T_H ⇔ K(s) ∈ H_q ⇔ ⟨K(s), q⟩ ∈ H ⇔ ⟨s, q⟩ ∈ (K × id_[0,1])⁻¹[H].

Because K × id_[0,1] : S × [0, 1] → S(S) × [0, 1] is a measurable function, H ∈ ℘(S) ⊗ B([0, 1]) implies T_H ∈ A ⊗ B([0, 1]). Hence P_K is t-measurable. □

This argument extends easily to countable families of stochastic relations.

Corollary 4.1.17 Assume F = {Kₙ | n ∈ ℕ} is a countable family of stochastic relations Kₙ : S ⇝ S. Then

s ↦ {A ∈ ℘(S) | Kₙ(s) ∈ A for some n ∈ ℕ},
s ↦ {A ∈ ℘(S) | Kₙ(s) ∈ A for all n ∈ ℕ}

define stochastic effectivity functions on S. □

The following example will be of use later on. It shows that we always have a stochastic effectivity function at our disposal, albeit a fairly trivial one.

Example 4.1.18 Let D : S ⇝ S be the Dirac relation x ↦ δ_x with δ_x as the Dirac measure on x, see Example 2.4.8; then I_D := P_D defines an effectivity function, the Dirac effectivity function. Consequently we have W ∈ I_D(s) iff δ_s ∈ W for W ∈ ℘(S). This is akin to assigning to each element of a set the ultrafilter based on it.

The Dirac effectivity function will be useful for characterizing the effect of the empty game ε; it will also help in modelling the effects of the test games associated with a formula ϕ, see Section 4.9.4.

We will use the construction indicated through (4.1) for generating a model from a stochastic Kripke model, indicating that the models considered here are more general than Kripke models. The converse construction is of course of interest as well: given a model, can we determine whether or not it comes from a Kripke model? This boils down to the question under which conditions a stochastic effectivity function is generated through a stochastic relation. We will deal with this problem now. The tools for investigating the converse to Lemma 4.1.16 come from the investigation of deduction systems for probabilistic logics.

Characteristic Relations

In fact, we are given a set of portfolios and want to know under which conditions this set is generated from a single subprobability. The situation is roughly similar to the one observed with deduction systems, where a set of formulas is given, and one wants to know whether this set can be obtained as the set of formulas valid under a suitable model. Because of the similarity, we may take some inspiration from the work on deduction systems, and we adapt here the approach proposed by R. Goldblatt [Gol10]. Goldblatt works with sets of formulas while we are interested foremost in families of sets; this permits a technically somewhat lighter approach in the present scenario.


We first have a look at a relation R ⊆ [0, 1] × A which models bounding probabilities from below. Intuitively, ⟨r, A⟩ ∈ R is intended to characterize a member of the set β_A(A, ≥ r), i.e., a measure µ with µ(A) ≥ r.

Definition 4.1.19 R ⊆ [0, 1] × A is called a characteristic relation on S iff these conditions are satisfied:

(1) ⟨r, A⟩ ∈ R, A ⊆ B ⟹ ⟨r, B⟩ ∈ R
(2) ⟨r, A⟩ ∈ R, r ≥ s ⟹ ⟨s, A⟩ ∈ R
(3) ⟨r, A⟩ ∉ R, ⟨s, B⟩ ∉ R, r + s ≤ 1 ⟹ ⟨r + s, A ∪ B⟩ ∉ R
(4) ⟨r, A ∩ B⟩ ∈ R, ⟨s, A ∩ (S \ B)⟩ ∈ R, r + s ≤ 1 ⟹ ⟨r + s, A⟩ ∈ R
(5) ⟨r, A⟩ ∈ R, r + s > 1 ⟹ ⟨s, S \ A⟩ ∉ R
(6) ⟨r, ∅⟩ ∈ R ⟹ r = 0
(7) A₁ ⊇ A₂ ⊇ …, ∀n ∈ ℕ : ⟨r, Aₙ⟩ ∈ R ⟹ ⟨r, ⋂_{n≥1} Aₙ⟩ ∈ R

The notation, adopted here for convenience and for conciseness, is that of deduction systems: the conditions to the left of the arrow jointly yield the condition to its right. For example, condition (1) expresses that ⟨r, A⟩ ∈ R and A ⊆ B together imply ⟨r, B⟩ ∈ R. The interpretations are as follows. Conditions (1) and (2) make sure that bounding from below is monotone both in its numeric and in its set-valued component. By (3) and (4) we cater for sub- and superadditivity of the characteristic relation, condition (6) sees to the fact that the probability of the impossible event cannot be bounded from below except through 0, and finally (7) makes sure that if the members of a decreasing sequence of sets are uniformly bounded from below, then so is their intersection.

We show that each characteristic relation defines a subprobability measure; the proof follows mutatis mutandis the proof of [Gol10, Theorem 5.4].

Proposition 4.1.20 Let R ⊆ [0, 1] × A be a characteristic relation on S, and define for A ∈ A

µ_R(A) := sup{r ∈ [0, 1] | ⟨r, A⟩ ∈ R}.

Then µ_R is a subprobability measure on A.

Proof 1. Condition (6) implies that µ_R(∅) = 0, and µ_R is monotone because of (1). It is also clear that µ_R(S) ≤ 1. We obtain from (2) that ⟨s, A⟩ ∉ R whenever s ≥ r with ⟨r, A⟩ ∉ R.

2. Let A₁, A₂ ∈ A be arbitrary. Then

µ_R(A₁ ∪ A₂) ≤ µ_R(A₁) + µ_R(A₂).

In fact, if µ_R(A₁) + µ_R(A₂) < q₁ + q₂ ≤ µ_R(A₁ ∪ A₂) with µ_R(Aᵢ) < qᵢ (i = 1, 2), then ⟨qᵢ, Aᵢ⟩ ∉ R for i = 1, 2. Because q₁ + q₂ ≤ 1, we obtain from (3) that ⟨q₁ + q₂, A₁ ∪ A₂⟩ ∉ R. By (2) this yields µ_R(A₁ ∪ A₂) < q₁ + q₂, contradicting the assumption.

3. If A₁ and A₂ are disjoint, we observe first that µ_R(A₁) + µ_R(A₂) ≤ 1. Assume otherwise that we can find qᵢ ≤ µ_R(Aᵢ) for i = 1, 2 with q₁ + q₂ > 1. Because ⟨q₁, A₁⟩ ∈ R we conclude from (5) that ⟨q₂, S \ A₁⟩ ∉ R, hence ⟨q₂, A₂⟩ ∉ R by (1), contradicting q₂ ≤ µ_R(A₂).

This implies that

µ_R(A₁) + µ_R(A₂) ≤ µ_R(A₁ ∪ A₂).

Assuming this to be false, we find q₁ ≤ µ_R(A₁), q₂ ≤ µ_R(A₂) with

µ_R(A₁ ∪ A₂) < q₁ + q₂ ≤ µ_R(A₁) + µ_R(A₂).

Because ⟨q₁, A₁⟩ ∈ R, we find ⟨q₁, (A₁ ∪ A₂) ∩ A₁⟩ ∈ R; because ⟨q₂, A₂⟩ ∈ R we see that ⟨q₂, (A₁ ∪ A₂) ∩ (S \ A₁)⟩ ∈ R (note that (A₁ ∪ A₂) ∩ A₁ = A₁ and (A₁ ∪ A₂) ∩ (S \ A₁) = A₂, since A₁ ∩ A₂ = ∅). From (4) we infer that ⟨q₁ + q₂, A₁ ∪ A₂⟩ ∈ R, so that q₁ + q₂ ≤ µ_R(A₁ ∪ A₂), which is a contradiction.

Thus we have shown that µ_R is additive.

4. From (7) it is obvious that

µ_R(A) = inf_{n∈ℕ} µ_R(Aₙ)

whenever A = ⋂_{n∈ℕ} Aₙ for a decreasing sequence (Aₙ)ₙ∈ℕ in A; together with additivity this yields σ-additivity. □
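For a finite σ-algebra the passage from R to µ_R is a one-line computation; a small Python sketch (the encoding of R as pairs (r, frozenset) and the name mu_R are ours, and it suffices to list representative bounds):

    def mu_R(R, A):
        # mu_R(A) = sup { r | (r, A) in R }, with sup over the empty set read as 0
        A = frozenset(A)
        return max((r for (r, B) in R if B == A), default=0.0)

    # Bounds consistent with the measure 0.7 * delta_1 on X = {1, 2}:
    R = {(0.0, frozenset()), (0.7, frozenset({1})),
         (0.0, frozenset({2})), (0.7, frozenset({1, 2}))}
    print(mu_R(R, {1}))       # 0.7
    print(mu_R(R, {1, 2}))    # 0.7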

Thus we are now in a position to relationally characterize a subprobability on A. But we can say even more when taking subsets of ℘(S) into account. Note that A ↦ β_A(A, > q) is monotone for each q. So if we fix an upper closed subset Q ⊆ ℘(S), then we know that β_A(A, ≥ q) ∈ Q implies β_A(B, ≥ q) ∈ Q, provided A ⊆ B. We relate Q to a characteristic relation R on S by comparing β_A(A, ≥ q) ∈ Q with ⟨q, A⟩ ∈ R, imposing a syntactic and a semantic condition. They will be shown to be equivalent.

Definition 4.1.21 The upper closed subset Q of ℘(S) is said to satisfy the characteristic relation R on S (written Q ⊢ R) iff we have

⟨q, A⟩ ∈ R ⇔ β_A(A, ≥ q) ∈ Q

for any q ∈ [0, 1] and any A ∈ A.

This is a syntactic notion: we look at R and determine from the knowledge of ⟨q, A⟩ ∈ R whether the set of all evaluations on the measurable set A exceeding q is contained in Q. Its semantic counterpart reads like this:

Definition 4.1.22 Q is said to implement µ ∈ S(S) (written Q ⊨ µ) iff

µ(A) ≥ q ⇔ β_A(A, ≥ q) ∈ Q

for any q ∈ [0, 1] and any A ∈ A.

Thus we actually evaluate µ at A and determine from this value the membership of β_A(A, ≥ q) in Q.

Note that Q ⊨ µ and Q ⊨ µ′ together imply

∀A ∈ A ∀q ≥ 0 : µ(A) ≥ q ⇔ µ′(A) ≥ q.

Consequently, µ = µ′, so that the measure implemented by Q is uniquely determined.

We will now show that the syntactic and the semantic notion are equivalent: Q satisfies a characteristic relation if and only if it implements the corresponding measure.


Proposition 4.1.23 Q ⊢ R iff Q ⊨ µ_R.

Proof Q ⊢ R ⇒ Q ⊨ µ_R: Assume that Q ⊢ R holds. It is then immediate that µ_R(A) ≥ r iff β_A(A, ≥ r) ∈ Q.

Q ⊨ µ_R ⇒ Q ⊢ R: If Q ⊨ µ_R for the relation R ⊆ [0, 1] × A, we show that the conditions given in Definition 4.1.19 are satisfied.

1. Let β_A(A, ≥ r) ∈ Q and A ⊆ B, thus µ_R(A) ≥ r, hence µ_R(B) ≥ r, which in turn implies β_A(B, ≥ r) ∈ Q. Hence (1) holds. (2) is established similarly.

2. If µ_R(A) < r and µ_R(B) < s with r + s ≤ 1, then µ_R(A ∪ B) = µ_R(A) + µ_R(B) − µ_R(A ∩ B) ≤ µ_R(A) + µ_R(B) < r + s, which implies (3).

3. If µ_R(A ∩ B) ≥ r and µ_R(A ∩ (S \ B)) ≥ s, then µ_R(A) = µ_R(A ∩ B) + µ_R(A ∩ (S \ B)) ≥ r + s, hence (4).

4. Assume µ_R(A) ≥ r and r + s > 1, then µ_R(S \ A) = µ_R(S) − µ_R(A) < s, thus (5) holds.

5. If µ_R(∅) ≥ r, then r = 0, yielding (6).

6. Finally, if (Aₙ)ₙ∈ℕ is decreasing with µ_R(Aₙ) ≥ r for each n ∈ ℕ, then it is plain that µ_R(⋂_{n∈ℕ} Aₙ) ≥ r. This implies (7). □

This permits a complete characterization of those stochastic effectivity functions which are generated through stochastic relations.

Proposition 4.1.24 Let P be a stochastic effectivity function on state space S. Then these conditions are equivalent:

1. There exists a stochastic relation K : S ⇝ S such that P = P_K.

2. R(s) := {⟨r, A⟩ | β_A(A, ≥ r) ∈ P(s)} defines a characteristic relation on S with P(s) ⊢ R(s) for each state s ∈ S.

Proof 1 ⇒ 2: Fix s ∈ S. Because β_A(A, ≥ r) ∈ P_K(s) iff K(s)(A) ≥ r, we see that P(s) ⊨ K(s), hence by Proposition 4.1.23 P(s) ⊢ R(s).

2 ⇒ 1: Define K(s) := µ_{R(s)} for s ∈ S; then K(s) is a subprobability measure on A. We show that K : S ⇝ S. Let G ⊆ S(S) be a measurable set, then G × [0, 1] ∈ ℘(S) ⊗ B([0, 1]), hence the measurability condition on P yields that

K⁻¹[G] = {s ∈ S | K(s) ∈ G} = {s ∈ S | G ∈ P(s)}

is a measurable subset of S, because

{⟨s, q⟩ | (G × [0, 1])_q ∈ P(s)} = {s ∈ S | G ∈ P(s)} × [0, 1] ∈ A ⊗ B([0, 1]).

This establishes measurability. □


Morphisms and Congruences

Morphisms for stochastic effectivity functions are defined in a way very similar to the definition of morphisms for the functor V in Example 2.3.13, or to the definition of neighborhood frames, see Definition 2.7.33. We have, however, in comparison to these examples, to take into consideration again that we are dealing with the respective subprobabilities as an intermediate layer. Hence, if we consider a measurable map f : S → T, we must use the induced map S(f) : S(S) → S(T) as an intermediary. This leads to the following definition.

Definition 4.1.25 Given stochastic effectivity functions P on S and Q on T, a measurable map f : S → T is called a morphism of effectivity functions f : P → Q iff Q ∘ f = E(f) ∘ P, hence iff this diagram commutes:

            f
     S ----------> T
     |             |
     P             Q
     v             v
    E(S) -------> E(T)
           E(f)

Thus we have

W ∈ Q(f(s)) ⇔ (Sf)⁻¹[W] ∈ P(s)                                  (4.2)

for all states s ∈ S and for all W ∈ ℘(T).

In comparison, recall from Definition 2.6.43 that a measurable map f : S → T is a morphism of stochastic relations f : K → L for the stochastic relations K : S ⇝ S and L : T ⇝ T iff L ∘ f = S(f) ∘ K, hence iff this diagram commutes:

            f
     S ----------> T
     |             |
     K             L
     v             v
    S(S) -------> S(T)
           S(f)

Thus

L(f(s))(B) = S(f)(K(s))(B) = K(s)(f⁻¹[B])                        (4.3)

for each state s ∈ S and each measurable set B ⊆ T.

These notions of morphisms are compatible: each morphism for stochastic relations turns into a morphism for the associated effectivity functions.

Proposition 4.1.26 A morphism f : K → L for stochastic relations K and L induces a morphism f : P_K → P_L for the associated stochastic effectivity functions.

Proof Fix a state s ∈ S. Then W ∈ P_L(f(s)) iff L(f(s)) ∈ W. Because f : K → L is a morphism, this is equivalent to S(f)(K(s)) ∈ W, hence to K(s) ∈ (Sf)⁻¹[W], thus to (Sf)⁻¹[W] ∈ P_K(s). □

We investigate congruences for stochastic effectivity functions next. Since we have introduced a quantitative component into the argumentation, we want to deal not only with equivalent elements of the set S on which the effectivity function is defined, but we also have to take the elements of [0, 1] into account. The most straightforward equivalence relation on [0, 1] is the identity ∆ := ∆_[0,1]. Given an equivalence relation ρ on S, denote by

Σ_ρ(A) := {A ∈ A | A is ρ-invariant}

the set of all ρ-invariant measurable subsets of S, i.e., of all A ∈ A which are unions of ρ-equivalence classes.

Definition 4.1.27 Call the equivalence relation ρ on S tame iff

Σ_{ρ×∆}(A ⊗ B([0, 1])) = Σ_ρ(A) ⊗ B([0, 1])

holds.

Because Σ_∆(B([0, 1])) = B([0, 1]), we may rephrase the definition: ρ is tame iff

Σ_{ρ×∆}(A ⊗ B([0, 1])) = Σ_ρ(A) ⊗ Σ_∆(B([0, 1])).

Thus being tame means for an equivalence relation that it cooperates well with the identity on [0, 1]. From a structural point of view, tameness permits us to find a Borel isomorphism between S/ρ ⊗ [0, 1] and (S ⊗ [0, 1])/(ρ × ∆), which gives further insight into the cooperation of ρ and ∆ on S × [0, 1]. Here we go:

Lemma 4.1.28 Assume that ρ is a tame equivalence relation on S. Then the measurable spaces (S ⊗ [0, 1])/(ρ × ∆) and S/ρ ⊗ [0, 1] are Borel isomorphic.

Proof 0. We define the Borel isomorphism ϑ through [⟨x, q⟩]_{ρ×∆} ↦ ⟨[x]_ρ, q⟩, which is the obvious choice. It is not difficult to see that ϑ is a bijection, and that it is measurable. The converse direction is a bit more cumbersome and will be dealt with through the principle of good sets via the π-λ-Theorem and the defining property of tameness for ρ.

1. We show that

ϑ : (S ⊗ [0, 1])/(ρ × ∆) → S/ρ ⊗ [0, 1], [⟨s, t⟩]_{ρ×∆} ↦ ⟨[s]_ρ, t⟩

defines a measurable map. Let A × B ⊆ S/ρ ⊗ [0, 1] be a measurable rectangle with η_ρ⁻¹[A] ∈ A and B ∈ B([0, 1]); then η_{ρ×∆}⁻¹[ϑ⁻¹[A × B]] = η_ρ⁻¹[A] × B ∈ A ⊗ B([0, 1]). Hence the inverse image of a generator of the σ-algebra on S/ρ ⊗ [0, 1] is a measurable subset of the factor space of A ⊗ B([0, 1]) under the equivalence relation ρ × ∆, so that ϑ is measurable by Lemma 4.1.4.

2. For establishing that ϑ⁻¹ is measurable, one notes first that B ∈ Σ_ρ(S) ⊗ B([0, 1]) implies that (η_ρ × id)[B] is a measurable subset of S/ρ × [0, 1]. This is so because the set of all B for which the assertion is true is closed under complementation and countable disjoint unions, and it contains all measurable rectangles, so the assertion is established by Theorem 1.6.30.

Now take H ⊆ (S ⊗ [0, 1])/(ρ × ∆) measurable, then

η_{ρ×∆}⁻¹[H] ∈ Σ_{ρ×∆}(S ⊗ [0, 1]) = Σ_ρ(S) ⊗ B([0, 1]),

since ρ is tame. Hence (η_ρ × id)⁻¹[ϑ[H]] ∈ Σ_ρ(S) ⊗ B([0, 1]), so the assertion follows from

ϑ[H] = (η_ρ × id)[(η_ρ × id)⁻¹[ϑ[H]]]. □

Final maps (Definition 2.6.42) provide a rich source for tame relations.

Lemma 4.1.29 Let f : S → T be surjective and measurable such that f × id : S × [0, 1] → T × [0, 1] is final. Then ker(f) is tame.

Proof Because f × id is surjective and final, we infer from Lemma 2.6.44 that (f × id)⁻¹[B ⊗ B([0, 1])] equals Σ_{ker(f)×∆}(A ⊗ B([0, 1])). But since f × id is final, f is final as well, hence f⁻¹[B] = Σ_{ker(f)}(A), again by Lemma 2.6.44. Thus (f × id)⁻¹[B ⊗ B([0, 1])] = f⁻¹[B] ⊗ B([0, 1]). This establishes the claim. □

Morphisms and congruences are, as usual, quite closely related, so we are now in a position to characterize congruences as those equivalence relations which are related to the structure of the underlying system.

Let P be a stochastic effectivity function on S. Congruences are defined as usual through morphisms and factorization.

Definition 4.1.30 The equivalence relation ρ on S is called a congruence for P iff there exists an effectivity function P_ρ on S/ρ which renders this diagram commutative:

            η_ρ
     S ----------> S/ρ
     |             |
     P             P_ρ
     v             v
    E(S) -------> E(S/ρ)
           E(η_ρ)

Because η_ρ is onto, P_ρ is uniquely determined. The next proposition provides a criterion for an equivalence relation to be a congruence. It requires the equivalence relation to be tame, so that the quantitative aspects are taken care of. The factor space S/ρ will be equipped with the final σ-algebra with respect to the factor map η_ρ, on which we will have to consider the weak-*-σ-algebra ℘(S/ρ).

Proposition 4.1.31 Let ρ be a tame equivalence relation on S. Then these statements are equivalent:

1. ρ is a congruence for P.

2. Whenever s ρ s′, we have (Sη_ρ)⁻¹[A] ∈ P(s) iff (Sη_ρ)⁻¹[A] ∈ P(s′) for every A ∈ ℘(S/ρ).

The second property can be read off the diagram above, so one might ask why this property is singled out as characteristic. Well, we have to define a stochastic effectivity function on the factor set, and for this to work we need t-measurability. The proof shows where the crucial point is.

Proof 1 ⇒ 2: This follows immediately from the definition of a morphism, see (4.2).


2 ⇒ 1: Define for s ∈ S

Q([s]_ρ) := {A ∈ ℘(S/ρ) | (Sη_ρ)⁻¹[A] ∈ P(s)},

then Q is well defined by the assumption, and it is clear that Q([s]_ρ) is an upper closed set of subsets of ℘(S/ρ) for each s ∈ S. It remains to be shown that Q is a stochastic effectivity function, i.e., that Q is t-measurable. This is the crucial property. In fact, let H ∈ ℘(S/ρ) ⊗ B([0, 1]) be a test set, and let G := (S(η_ρ) × id_[0,1])⁻¹[H] be its inverse image under S(η_ρ) × id_[0,1]; then

{⟨t, q⟩ ∈ S/ρ × [0, 1] | H_q ∈ Q(t)} = (η_ρ × id_[0,1])[Z]

with Z := {⟨s, q⟩ ∈ S × [0, 1] | G_q ∈ P(s)}. By Lemma 4.1.28 it is enough to show that Z is contained in Σ_ρ(S) ⊗ B([0, 1]). Because P is t-measurable, we infer Z ∈ A ⊗ B([0, 1]), and because Z is (ρ × ∆)-invariant, we conclude that Z ∈ Σ_{ρ×∆}(S ⊗ [0, 1]), the latter σ-algebra being equal to Σ_ρ(S) ⊗ B([0, 1]) by the definition of tameness. □

The condition on subsets of ℘(S/ρ) imposed above asks for measurable subsets of S(S/ρ), so factorization is done "behind the curtain" of the functor S. It would be more convenient if the space of all subprobabilities could be factored and the corresponding measurable sets gained from the latter space.

The following sketch provides an alternative. Lift the equivalence ρ on S to an equivalence ρ̄ on S(S) upon setting

µ ρ̄ µ′ iff ∀C ∈ Σ_ρ(S) : µ(C) = µ′(C),

so measures are considered ρ̄-equivalent iff they coincide on the ρ-invariant measurable sets. Define the map

∂_ρ : S(S)/ρ̄ → S(S/ρ), [µ]_ρ̄ ↦ λG. S(η_ρ)(µ)(G).

Then ∂_ρ ∘ η_ρ̄ = S(η_ρ), so that ∂_ρ is measurable by finality of η_ρ̄. If S is a Polish space, and ρ is countably generated (thus Σ_ρ(S) = σ({Aₙ | n ∈ ℕ}) for some sequence (Aₙ)ₙ∈ℕ of measurable sets), then it can be shown through Souslin's Theorem 4.4.8 that ∂_ρ is an isomorphism. Hence it is in this case sufficient to focus on the sets η_ρ̄⁻¹[W] with W a measurable subset of S(S)/ρ̄. But, as said above, this is a sketch in which many things have to be filled in.

The relationship of morphisms and congruences through the kernel of the morphism is characterized in the following proposition. It assumes the morphism, combined with the identity on [0, 1], to be final. This is a technical condition rendering the kernel of the morphism a tame equivalence relation.

Proposition 4.1.32 Let f : P → Q be a morphism for the effectivity functions P and Q over the state spaces S resp. T. If f × id_[0,1] : S × [0, 1] → T × [0, 1] is final, then ker(f) is a congruence for P.


Proof 0. We have to show that the condition in Proposition 4.1.31 is satisfied. The key idea is that we can find for a given H₀ ∈ ℘(S/ker(f)) a set H ∈ ℘(T) which helps representing H₀. But we do not take the representation through Sf into account; rather, we factor the space first through ker(f) and use the corresponding factorization f = f̄ ∘ η_{ker(f)}. Using f̄ is helpful because the factor f̄ transports measurable sets in a somewhat advantageous manner. The set H is then used as a stand-in for H₀, and because we know where it comes from, we can exploit its properties.

1. We know from Lemma 4.1.29 that ker(f) is a tame equivalence relation.

2. Given H₀ ∈ ℘(S/ker(f)), we claim that we can find H ∈ ℘(T) such that H₀ = S(f̄)⁻¹[H], f = f̄ ∘ η_{ker(f)} being the decomposition of f according to Exercise 2.25. In fact, put

Z := {H₀ ∈ ℘(S/ker(f)) | ∃H ∈ ℘(T) : H₀ = S(f̄)⁻¹[H]}.

Then Z is a σ-algebra, because ∅ ∈ Z and Z is closed under the countable Boolean operations as well as under complementation. Take an element of the basis for the weak-*-σ-algebra, say, β_{S/ker(f)}(A, ≥ q), with A ⊆ S/ker(f) measurable. Because f̄ is a bijection, there exists B ⊆ T with A = f̄⁻¹[B], and we know that B ∈ B(T), because f̄ is final. Consequently, µ(A) = S(f̄)(µ)(B) for any µ ∈ S(S/ker(f)). This implies

β_{S/ker(f)}(A, ≥ q) = S(f̄)⁻¹[β_B(B, ≥ q)] ∈ Z;

consequently,

℘(S/ker(f)) = σ({β_{S/ker(f)}(A, ≥ q) | A ⊆ S/ker(f) measurable, q ≥ 0}) ⊆ Z,

thus ℘(S/ker(f)) = Z.

3. Now let f(s) = f(s′), and take H₀ ∈ ℘(S/ker(f)); choose H ∈ ℘(T) according to part 2 for H₀. Then

(Sη_{ker(f)})⁻¹[H₀] ∈ P(s) ⇔ (Sf)⁻¹[H] ∈ P(s) ⇔ H ∈ Q(f(s)) = Q(f(s′))
⇔ (Sf)⁻¹[H] ∈ P(s′) ⇔ (Sη_{ker(f)})⁻¹[H₀] ∈ P(s′),

because f : P → Q is a morphism. □

Let us turn to the case of stochastic relations and indicate the relationship between congruences and the effectivity functions generated through the factor relation. A congruence ρ for a stochastic relation K : S ⇝ S is an equivalence relation with this property: there exists a (unique) stochastic relation K_ρ : S/ρ ⇝ S/ρ such that this diagram commutes:

            η_ρ
     S ----------> S/ρ
     |             |
     K             K_ρ
     v             v
    S(S) -------> S(S/ρ)
           S(η_ρ)

This is the direct translation of Definition 2.6.40 in Section 2.6.2. The diagram translates to

K_ρ([s]_ρ)(B) = K(s)(η_ρ⁻¹[B])


for all s ∈ S and all B ⊆ S/ρ measurable.

We obtain from Proposition 4.1.26:

Corollary 4.1.33 A congruence ρ for a stochastic relation K : S ⇝ S is also a congruence for the associated effectivity function P_K. Moreover, P_{K_ρ} = (P_K)_ρ, so the effectivity function associated with the factor relation K_ρ is the factor of P_K with respect to ρ. □

It is noted that we do not require additional assumptions on a congruence for the stochastic relation for it to be a congruence for the associated effectivity function. This indicates that the condition of tameness captures the general class of effectivity functions, but that subclasses may impose their own conditions. It indicates also that the condition of being a congruence for a stochastic relation is itself a fairly strong one when assessed by the rules pertaining to stochastic effectivity functions.

4.1.4 The Alexandrov Topology On Spaces of Measures

Given a topological space (X, τ), the Borel sets B(X) = σ(τ) and the Baire sets Ba(X) come for free as measurable structures: B(τ) is the smallest σ-algebra on X that contains the open sets; measurability of maps with respect to the Borel sets is referred to as Borel measurability. Ba(X) is the smallest σ-algebra on X which contains the functionally closed sets; the Baire sets provide yet another measurable structure on (X, τ), this time involving the continuous real-valued functions. Since B(X) = Ba(X) for a metric space X by Example 4.1.1, the distinction between these σ-algebras vanishes there, and the Borel sets as the σ-algebra generated by the open sets dominate the scene.

We will now define a topology on spaces of measures over a topological space in a similar way, and relate this topology to the weak-*-σ-algebra, for the time being in a special case. Fix a Hausdorff space (X, τ); the space will be specialized as the discussion proceeds. Define for functionally open sets G₁, …, Gₙ, functionally closed sets F₁, …, Fₙ, ε > 0 and µ₀ ∈ M(X, Ba(X)) the sets

W_{G₁,…,Gₙ,ε}(µ₀) := {µ ∈ M(X, Ba(X)) | µ(Gᵢ) > µ₀(Gᵢ) − ε for 1 ≤ i ≤ n, |µ(X) − µ₀(X)| < ε},
W_{F₁,…,Fₙ,ε}(µ₀) := {µ ∈ M(X, Ba(X)) | µ(Fᵢ) < µ₀(Fᵢ) + ε for 1 ≤ i ≤ n, |µ(X) − µ₀(X)| < ε}.

The topology which has the sets W_{G₁,…,Gₙ,ε}(µ₀) as a basis is called the Alexandrov topology or A-topology [Bog07, 8.10 (iv)]. The A-topology is defined in terms of the Baire sets rather than the Borel sets of (X, τ). This is so because the Baire sets provide a scenario which takes the continuous functions on (X, τ) directly into account. This is in general not the case with the Borel sets, which are defined purely in terms of set-theoretic operations. But the distinction vanishes when we turn to metric spaces, see Example 4.1.1. Note also that we deal with finite measures here.

Lemma 4.1.34 The A-topology on M(X, Ba(X)) is Hausdorff.

Proof The family of functionally closed sets of X is closed under finite intersections, hence if two measures coincide on the functionally closed sets, they must coincide on the Baire sets Ba(X) of X by the π-λ-Theorem 1.6.30. □


Convergence in the A-topology is easily characterized in terms of functionally open or functionally closed sets. Recall that for a sequence (cₙ)ₙ∈ℕ of real numbers the statement lim supₙ→∞ cₙ ≤ c is equivalent to inf_{n∈ℕ} sup_{k≥n} cₖ ≤ c, which in turn is equivalent to

∀ε > 0 ∃n ∈ ℕ ∀k ≥ n : cₖ < c + ε.

Similarly for lim infₙ→∞ cₙ. This proves:

Proposition 4.1.35 Let (µₙ)ₙ∈ℕ be a sequence of measures in M(X, Ba(X)); then the following statements are equivalent.

1. µₙ → µ in the A-topology.

2. lim supₙ→∞ µₙ(F) ≤ µ(F) for each functionally closed set F, and µₙ(X) → µ(X).

3. lim infₙ→∞ µₙ(G) ≥ µ(G) for each functionally open set G, and µₙ(X) → µ(X). □

This criterion is sometimes a little impractical, since it deals with inequalities. We could have equality in the limit for all those sets whose boundary has µ-measure zero, but, alas, the boundary may fail to be Baire measurable. So we try an approximation — we approximate a Baire set from within by a functionally open set (corresponding to the interior) and from the outside by a closed set (corresponding to the closure). This is discussed in some detail now.

Given µ ∈ M(X, Ba(X)), denote by R_µ all those Baire sets which have a functional boundary of vanishing µ-measure, formally

R_µ := {E ∈ Ba(X) | G ⊆ E ⊆ F, µ(F \ G) = 0, G functionally open, F functionally closed}.

Hence if X is a metric space, E ∈ R_µ iff µ(∂E) = 0 for the boundary ∂E of E.

This is another criterion for convergence in the A-topology.

Corollary 4.1.36 Let (µn)n∈N be a sequence of Baire measures. Then µn → µ in the A-topology iff µn(E)→ µ(E) for all E ∈ Rµ.

Proof The condition is necessary by Proposition 4.1.35. Assume, on the other hand, that µₙ(E) → µ(E) for all E ∈ R_µ, and take a functionally open set G. We find f : X → R continuous such that G = {x ∈ X | f(x) > 0}. Fix ε > 0, then we can find c > 0 such that

µ(G) < µ({x ∈ X | f(x) > c}) + ε,
µ({x ∈ X | f(x) > c}) = µ({x ∈ X | f(x) ≥ c}).

Hence E := {x ∈ X | f(x) > c} ∈ R_µ, since E is open and F := {x ∈ X | f(x) ≥ c} is closed with µ(F \ E) = 0. So µₙ(E) → µ(E) by assumption, and

lim infₙ→∞ µₙ(G) ≥ limₙ→∞ µₙ(E) = µ(E) > µ(G) − ε.

Since ε > 0 was arbitrary, we infer lim infₙ→∞ µₙ(G) ≥ µ(G). Because G was an arbitrary functionally open set, we infer from Proposition 4.1.35 that (µₙ)ₙ∈ℕ converges to µ in the A-topology. □


The family R_µ has some interesting properties which will be of use later on because, as we will show in a moment, it contains a base for the topology if the space is completely regular. This holds whenever there are enough continuous functions to separate points from closed sets not containing them. Before we state this property, which will be helpful in the analysis of the A-topology below, we introduce µ-atoms, which are of interest in their own right (we will later, in Definition 4.3.13, define atoms on a strictly order-theoretic basis, without reference to measures).

Definition 4.1.37 A set A ∈ A is called a µ-atom iff µ(A) > 0, and if µ(B) ∈ {0, µ(A)} for every B ∈ A with B ⊆ A.

Thus a µ-atom does not permit values other than 0 and µ(A) for its measurable subsets, so two different µ-atoms A and A′ are essentially disjoint, since µ(A ∩ A′) = 0.

Lemma 4.1.38 For the finite measure space (X, A, µ) there exists an at most countable set {A_i | i ∈ I} of µ-atoms such that X \ ⋃_{i∈I} A_i is free of µ-atoms.

Proof If we do not have any atoms, we are done. Otherwise, let A_1 be an arbitrary atom. This is the beginning. Proceeding inductively, assume that the atoms A_1, . . . , A_n are already selected, and let 𝒜_n := {A ∈ A | A ⊆ X \ ⋃_{i=1}^{n} A_i is an atom}. If 𝒜_n = ∅, we are done. Otherwise select an atom A_{n+1} ∈ 𝒜_n with µ(A_{n+1}) ≥ (1/2) · sup_{A∈𝒜_n} µ(A). Observe that A_1, . . . , A_{n+1} are mutually disjoint.

Let {A_i | i ∈ I} be the set of atoms selected in this way, after the selection has terminated. Assume that A ⊆ X \ ⋃_{i∈I} A_i is an atom; then the index set I must be infinite, and A ∈ 𝒜_n for every n, so that µ(A_{n+1}) ≥ (1/2) · µ(A) for all n. But since ∑_{i∈I} µ(A_i) ≤ µ(X) < ∞, we conclude that µ(A_i) → 0; consequently, µ(A) = 0, hence A cannot be a µ-atom. ∎

This has a useful consequence.

Corollary 4.1.39 Let f : X → R be a continuous function. Then there are at most countably many r ∈ R such that µ({x ∈ X | f(x) = r}) > 0.

Proof Consider the image measure M(f)(µ) : B ↦ µ(f^{-1}[B]) on B(R). If µ({x ∈ X | f(x) = r}) > 0, then {r} is an M(f)(µ)-atom. By Lemma 4.1.38 there are only countably many M(f)(µ)-atoms. ∎

Returning to R_µ, we are now in a position to take a closer look at its structure.

Proposition 4.1.40 R_µ is a Boolean algebra. If (X, τ) is completely regular, then R_µ contains a basis for the topology τ.

Proof It is immediate that R_µ is closed under complementation, and it is easy to see that it is closed under finite unions.

Let f : X → R be continuous, and define U(f, r) := {x ∈ X | f(x) > r}; then U(f, r) is open, and ∂U(f, r) ⊆ {x ∈ X | f(x) = r}, thus M_f := {r ∈ R | µ(∂U(f, r)) > 0} is at most countable by Corollary 4.1.39, so that U(f, r) ∈ R_µ whenever r ∉ M_f.

Now let x ∈ X and G be an open neighborhood of x. Because X is completely regular (see Definition 3.3.17), we can find a continuous f : X → [0, 1] such that f(x) = 1 and f(y) = 0 for all y ∉ G. Since M_f is at most countable, we can find r ∉ M_f with 0 < r < 1, so that x ∈ U(f, r) ⊆ G. So R_µ does in fact contain a basis for the topology. ∎


Under the conditions above, R_µ contains a base for τ; we lift this base to M(X, Ba(X)) in the hope of obtaining a base for the A-topology. This works, as we will show now.

Corollary 4.1.41 Let X be a completely regular topological space. Then the A-topology has a basis consisting of sets of the form

Q_{A_1,...,A_n,ε}(µ) := {ν ∈ M(X, Ba(X)) | |µ(A_i) − ν(A_i)| < ε for i = 1, . . . , n}

with ε > 0, n ∈ N and A_1, . . . , A_n ∈ R_µ.

Proof Let W_{G_1,...,G_n,ε}(µ) with functionally open sets G_1, . . . , G_n and ε > 0 be given. Select A_i ∈ R_µ functionally open with A_i ⊆ G_i and µ(A_i) > µ(G_i) − ε/2; then it is easy to see that Q_{A_1,...,A_n,ε/2}(µ) ⊆ W_{G_1,...,G_n,ε}(µ). ∎

We will now specialize the discussion to metric spaces. Fix a metric space (X, d), the metric of which we assume to be bounded; otherwise we would switch to the equivalent metric ⟨x, y⟩ ↦ d(x, y)/(1 + d(x, y)). Recall that the ε-neighborhood B^ε of a set B ⊆ X is defined as B^ε := {x ∈ X | d(x, B) < ε}; thus B^ε is always an open set. Since the Baire and the Borel sets coincide in a metric space (see Example 4.1.1), the A-topology is defined on M(X, B(X)), and we will relate it to a metric now.

Define the Lévy-Prohorov distance d_P(µ, ν) of the measures µ, ν ∈ M(X, B(X)) through

d_P(µ, ν) := inf{ε > 0 | ν(B) ≤ µ(B^ε) + ε and µ(B) ≤ ν(B^ε) + ε for all B ∈ B(X)}.

We note first that d_P defines a metric, and that we can find a metrically exact copy of the base space X in the space M(X, B(X)).

Lemma 4.1.42 d_P is a metric on M(X, B(X)), and X is isometrically isomorphic to the set {δ_x | x ∈ X} of Dirac measures.

Proof It is clear that d_P(µ, ν) = d_P(ν, µ). Let d_P(µ, ν) = 0; then µ(F) ≤ ν(F^{1/n}) + 1/n and ν(F) ≤ µ(F^{1/n}) + 1/n for each closed set F ⊆ X, hence ν(F) = µ(F) (note that F^1 ⊇ F^{1/2} ⊇ F^{1/3} ⊇ . . . and F = ⋂_{n∈N} F^{1/n}). Thus µ = ν. If we have for all B ∈ B(X) that

ν(B) ≤ µ(B^ε) + ε, µ(B) ≤ ν(B^ε) + ε and µ(B) ≤ ρ(B^δ) + δ, ρ(B) ≤ µ(B^δ) + δ,

then

ν(B) ≤ ρ(B^{ε+δ}) + ε + δ and ρ(B) ≤ ν(B^{ε+δ}) + ε + δ,

thus d_P(ν, ρ) ≤ d_P(ν, µ) + d_P(µ, ρ). We also have d_P(δ_x, δ_y) = d(x, y), from which the isometry derives. ∎
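For measures with finite support, d_P can be computed by brute force: it suffices to check the two defining inequalities for every subset B of the (finite) space and to search for the smallest feasible ε. The following Python sketch is our own illustration; all names in it are ours, and the binary search works because feasibility is monotone in ε.

    from itertools import chain, combinations

    def levy_prohorov(points, d, mu, nu, tol=1e-9):
        """mu, nu: dicts point -> mass on the finite metric space (points, d)."""
        def mass(m, B):
            return sum(m[x] for x in B)
        def ball(B, eps):                       # the eps-neighborhood B^eps
            return [x for x in points if any(d(x, b) < eps for b in B)]
        def feasible(eps):                      # both inequalities, all B
            subsets = chain.from_iterable(
                combinations(points, r) for r in range(1, len(points) + 1))
            return all(mass(nu, B) <= mass(mu, ball(B, eps)) + eps + tol and
                       mass(mu, B) <= mass(nu, ball(B, eps)) + eps + tol
                       for B in subsets)
        lo, hi = 0.0, 1.0 + max(d(x, y) for x in points for y in points)
        while hi - lo > tol:
            mid = (lo + hi) / 2
            lo, hi = (lo, mid) if feasible(mid) else (mid, hi)
        return hi

    pts, dist = [0.0, 0.3], lambda x, y: abs(x - y)
    print(levy_prohorov(pts, dist, {0.0: 1, 0.3: 0}, {0.0: 0, 0.3: 1}))

The printed value is approximately 0.3 = d(0, 0.3), in accordance with the isometry statement of Lemma 4.1.42.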

We will now relate the metric topology to the A-topology. Without additional assumptions, the following relationship is established:

Proposition 4.1.43 Each open set in the A-topology is also metrically open; hence the A-topology is coarser than the metric topology.


Proof Let W_{F_1,...,F_n,ε}(µ) be an open basic neighborhood of µ in the A-topology with F_1, . . . , F_n closed. We want to find an open metric neighborhood with center µ which is contained in this A-neighborhood.

Because (F^{1/n})_{n∈N} is a decreasing sequence with inf_{n∈N} µ(F^{1/n}) = µ(F) whenever F is closed, we find δ > 0 with 0 < δ < ε/2 such that µ(F_i^δ) < µ(F_i) + ε/2 for 1 ≤ i ≤ n. Thus, if d_P(µ, ν) < δ, we have for i = 1, . . . , n that ν(F_i) < µ(F_i^δ) + δ < µ(F_i) + ε. But this means that ν ∈ W_{F_1,...,F_n,ε}(µ).

Thus each neighborhood in the A-topology contains in fact an open ball for the d_P-metric. ∎

The converse of Proposition 4.1.43 can only be established under additional conditions, which, however, are met for separable metric spaces. The key concept, τ-regularity, is a generalization of σ-continuity: while the latter deals with sequences of sets, τ-regularity deals with the more general notion of directed families of open sets (recall that a family M of sets is called directed iff, given M_1, M_2 ∈ M, there exists M′ ∈ M with M_1 ∪ M_2 ⊆ M′).

Definition 4.1.44 A measure µ ∈ M(X, B(X)) is called τ-regular iff

µ(⋃𝒢) = sup_{G∈𝒢} µ(G)

for each directed family 𝒢 of open sets.

It is clear that we restrict our attention to open sets, because the union of a directed family of arbitrary measurable sets is not necessarily measurable. It is also clear that the condition above is satisfied for countable increasing sequences of open sets, so that τ-regularity generalizes σ-continuity.

It turns out that finite measures on separable metric spaces are τ-regular. Roughly speaking, this is due to the fact that countably many open sets determine the family of open sets, so that the space cannot be too large when looked at as a measure space.

Lemma 4.1.45 Let (X, d) be a separable metric space; then each µ ∈ M(X, B(X)) is τ-regular.

Proof Let 𝒢_0 be a countable basis for the metric topology, and let 𝒢 be a directed family of open sets. Each x ∈ ⋃𝒢 lies in some G ∈ 𝒢, hence in some basis element H ∈ 𝒢_0 with x ∈ H ⊆ G. Thus ⋃𝒢 = ⋃_{n∈N} H_n, where (H_n)_{n∈N} enumerates the countably many basis elements contained in some member of 𝒢. By σ-continuity, µ(⋃𝒢) = sup_{k∈N} µ(H_1 ∪ . . . ∪ H_k), and since 𝒢 is directed, H_1 ∪ . . . ∪ H_k ⊆ G′ for some G′ ∈ 𝒢. Hence

µ(⋃𝒢) ≤ sup_{G∈𝒢} µ(G) ≤ µ(⋃𝒢),

so equality holds. ∎

As a trivial consequence it is observed that µ(⋃𝒢) = 0, where 𝒢 is the family of all open sets G with µ(G) = 0.

The important observation for our purposes is that a τ-regular measure is supported by a closed set which, in terms of µ, can be chosen to fit as tightly as possible.

Lemma 4.1.46 Let (X, d) be a separable metric space. Given µ ∈ M(X, B(X)) with µ(X) > 0, there exists a smallest closed set C_µ such that µ(C_µ) = µ(X). The set C_µ is called the support of µ and is denoted by supp(µ).


We have used the support already for discrete measures; in this case the support is just the set of points which are assigned positive mass, see Example 2.3.11.

Proof Let ℱ be the family of all closed sets F with µ(F) = µ(X); then {X \ F | F ∈ ℱ} is a directed family of open sets, hence µ(⋂ℱ) = inf_{F∈ℱ} µ(F) = µ(X). Define supp(µ) := ⋂ℱ; then supp(µ) is closed with µ(supp(µ)) = µ(X). If F ⊆ X is a closed set with µ(F) = µ(X), then F ∈ ℱ, hence supp(µ) ⊆ F. ∎

We may characterize the support of µ also in terms of open sets; this is but a simple consequence of Lemma 4.1.46.

Corollary 4.1.47 Under the assumptions of Lemma 4.1.46 we have x ∈ supp(µ) iff µ(U) > 0 for each open neighborhood U of x. ∎

After all these preparations (with some interesting vistas onto the landscape of measures), we are in a position to show that the metric topology on M(X, B(X)) coincides with the A-topology for X a separable metric space. The following lemma is the central statement; it is formulated and proved separately, because its proof is somewhat technical. Recall that the diameter diam(Q) of Q ⊆ X is defined on page 252 as

diam(Q) := sup{d(x_1, x_2) | x_1, x_2 ∈ Q}.

Lemma 4.1.48 Every d_P-ball with center µ ∈ M(X, B(X)) contains a neighborhood of µ in the A-topology, provided (X, d) is separable metric.

Proof Fix µ ∈ M(X, B(X)) and ε > 0, and pick δ > 0 with 4 · δ < ε; it is no loss of generality to assume that µ(X) = 1. Because X is separable metric, the support S := supp(µ) is defined by Lemma 4.1.46. We can cover S by countably many open sets (V_n)_{n∈N} whose diameters are less than δ and which satisfy µ(∂V_n) = 0, by Proposition 4.1.40. Define

A_1 := V_1,    A_n := ⋃_{i=1}^{n} V_i \ ⋃_{j=1}^{n−1} V_j for n > 1;

then (A_n)_{n∈N} is a mutually disjoint family of sets which covers S, and µ(∂A_n) = 0 holds for all n ∈ N. We can find an index k such that µ(⋃_{i=1}^{k} A_i) > 1 − δ. Let T_1, . . . , T_ℓ be all sets which are a union of some of the sets A_1, . . . , A_k; note that they belong to R_µ, as does X (its boundary is empty). Then

W := Q_{T_1,...,T_ℓ,X,δ}(µ)

is a neighborhood of µ in the A-topology by Corollary 4.1.41. We claim that d_P(µ, ν) < ε for all ν ∈ W. In fact, let B ∈ B(X) be arbitrary, and put

A := ⋃{A_i | 1 ≤ i ≤ k, A_i ∩ B ≠ ∅};

then A is among the T's just constructed, and B ∩ S ⊆ A ∪ ⋃_{i=k+1}^{∞} A_i. Moreover, we know that A ⊆ B^δ, because each A_i has a diameter less than δ. This yields

µ(B) = µ(B ∩ S) ≤ µ(A) + δ < ν(A) + 2 · δ ≤ ν(B^δ) + 2 · δ.


On the other hand, we have

ν(B) = ν(B ∩ ⋃_{i=1}^{k} A_i) + ν(B \ ⋃_{i=1}^{k} A_i) ≤ ν(A) + ν(X \ ⋃_{i=1}^{k} A_i) ≤ ν(A) + 3 · δ ≤ µ(A) + 4 · δ ≤ µ(B^δ) + 4 · δ,

since ν(X) ≤ µ(X) + δ and ν(⋃_{i=1}^{k} A_i) > µ(⋃_{i=1}^{k} A_i) − δ > 1 − 2 · δ together give ν(X \ ⋃_{i=1}^{k} A_i) ≤ 3 · δ. Hence d_P(µ, ν) ≤ 4 · δ < ε. Thus W is contained in the open ball centered at µ with radius ε. ∎

We have established

Theorem 4.1.49 The A-topology on M(X, B(X)) is metrizable through the Lévy-Prohorov metric d_P, provided (X, d) is a separable metric space. ∎

We will see later that d_P is not the only metric for this topology, and that the corresponding metric space has interesting and helpful properties. Some of these properties are best derived through an integral representation, for which a careful study of real-valued functions is required. This is what we are going to investigate in Section 4.2. But before doing this, we have a brief and tentative look at the relation between the Borel sets of the A-topology and the weak-*-σ-algebra.

Lemma 4.1.50 Let X be a metric space; then the weak-*-σ-algebra is contained in the Borel sets of the A-topology. If the A-topology has a countable basis, both σ-algebras are equal.

Proof Denote by C the Borel sets of the A-topology on M(X, B(X)).

1. Since X is metric, the Baire sets and the Borel sets coincide. For each closed set F, the evaluation map ev_F : µ ↦ µ(F) is upper semi-continuous by Proposition 4.1.35, hence C-measurable, so that the set

G := {A ∈ B(X) | ev_A is C-measurable}

contains all closed sets. Because G is closed under complementation and countable disjoint unions, we conclude that G contains B(X). Hence ℘(B(X)) ⊆ C by minimality of ℘(B(X)).

2. Assume that the A-topology has a countable basis; then each open set is represented as a countable union of sets of the form W_{G_1,...,G_n,ε}(µ_0) with G_1, . . . , G_n open. But W_{G_1,...,G_n,ε}(µ_0) ∈ ℘(B(X)), so that each open set is a member of ℘(B(X)). This implies the other inclusion. ∎

We will investigate the A-topology further in Section 4.10 and turn to real-valued functions now.

4.2 Real-Valued Functions

We discuss now the set of all measurable and bounded functions into the real line. We show first that the set of all these functions is closed under the usual algebraic operations, so that it is a vector space, and that it is also closed under finite infima and suprema, rendering it a distributive lattice; in fact, algebraic operations and order are compatible. Then we show that the measurable step functions are dense with respect to pointwise convergence. This is an important observation, which will help us later on to transfer relevant properties from indicator functions (a.k.a. measurable sets) to general measurable functions. This prepares the stage for discussing convergence of functions in the presence of a measure; we will deal with convergence almost everywhere, which neglects a set of measure zero for the purposes of convergence, and convergence in measure, which is defined in terms of a pseudometric, but surprisingly turns out to be related to convergence almost everywhere through subsequences of subsequences (this sounds a bit mysterious, so carry on).

Lemma 4.2.1 Let f, g : X → R be A-B(R)-measurable functions for the measurable space (X, A). Then f ∧ g, f ∨ g and α · f + β · g are A-B(R)-measurable for α, β ∈ R.

Proof If f is measurable, α · f is. This follows immediately from Lemma 4.1.4. From

{x ∈ X | f(x) + g(x) < q} = ⋃_{r_1,r_2∈Q, r_1+r_2≤q} ({x | f(x) < r_1} ∩ {x | g(x) < r_2})

we conclude that the sum of measurable functions is measurable again. Since

{x ∈ X | (f ∧ g)(x) < q} = {x | f(x) < q} ∪ {x | g(x) < q},
{x ∈ X | (f ∨ g)(x) < q} = {x | f(x) < q} ∩ {x | g(x) < q},

we see that both f ∧ g and f ∨ g are measurable. ∎

Corollary 4.2.2 If f : X → R is A-B(R)-measurable, so is |f |.

Proof Write |f| = f⁺ + f⁻ with f⁺ := f ∨ 0 and f⁻ := (−f) ∨ 0, and apply Lemma 4.2.1. ∎

The consequence is that for a measurable space (X, A) the set

F(X, A) := {f : X → R | f is A-B(R)-measurable}

is both a vector space and a distributive lattice; in fact, it is what will be called a vector lattice later, see Definition 4.8.10 on page 399. Assume that (f_n)_{n∈N} ⊆ F(X, A) is a sequence of bounded measurable functions such that f : x ↦ lim inf_{n→∞} f_n(x) is a bounded function; then f ∈ F(X, A). This is so because

{x ∈ X | lim inf_{n→∞} f_n(x) ≤ q} = {x | sup_{n∈N} inf_{k≥n} f_k(x) ≤ q}
  = ⋂_{n∈N} {x | inf_{k≥n} f_k(x) ≤ q}
  = ⋂_{n∈N} ⋂_{ℓ∈N} {x | inf_{k≥n} f_k(x) < q + 1/ℓ}
  = ⋂_{n∈N} ⋂_{ℓ∈N} ⋃_{k≥n} {x | f_k(x) < q + 1/ℓ}.

Similarly, if x ↦ lim sup_{n→∞} f_n(x) defines a bounded function, then it is measurable as well. Consequently, if the sequence (f_n(x))_{n∈N} converges to a bounded function f, then f ∈ F(X, A).

Hence we have shown


Proposition 4.2.3 Let (f_n)_{n∈N} ⊆ F(X, A) be a sequence of bounded measurable functions. Then

• if f_*(x) := lim inf_{n→∞} f_n(x) defines a bounded function, then f_* ∈ F(X, A);

• if f^*(x) := lim sup_{n→∞} f_n(x) defines a bounded function, then f^* ∈ F(X, A).

∎

We use occasionally the representation of sets through indicator functions. Recall for A ⊆ X its indicator function

χ_A(x) := 1 if x ∈ A, and χ_A(x) := 0 if x ∉ A.

Clearly, if A is a σ-algebra on X, then A ∈ A iff χ_A is an A-B(R)-measurable function. This is so since we have for the inverse image of an interval under χ_A

χ_A^{-1}[[0, q]] = ∅ if q < 0,   χ_A^{-1}[[0, q]] = X \ A if 0 ≤ q < 1,   χ_A^{-1}[[0, q]] = X if q ≥ 1.

A measurable step function

f = ∑_{i=1}^{n} α_i · χ_{A_i}

is a linear combination of indicator functions with A_i ∈ A. Since χ_A ∈ F(X, A) for A ∈ A, measurable step functions are indeed measurable functions.

Proposition 4.2.4 Let (X, A) be a measurable space. Then

1. For f ∈ F(X, A) with f ≥ 0 there exists an increasing sequence (f_n)_{n∈N} of step functions f_n ∈ F(X, A) with

f(x) = sup_{n∈N} f_n(x)

for all x ∈ X.

2. For f ∈ F(X, A) there exists a sequence (f_n)_{n∈N} of step functions f_n ∈ F(X, A) with

f(x) = lim_{n→∞} f_n(x)

for all x ∈ X.

Proof 1. Take f ≥ 0, and assume without loss of generality that f ≤ 1 (otherwise, if 0 ≤ f ≤ m, consider f/m). Put

A_{i,n} := {x ∈ X | i/n ≤ f(x) < (i + 1)/n}

for n ∈ N, 0 ≤ i ≤ n; then A_{i,n} ∈ A, since f is measurable. Define

f_n(x) := ∑_{0≤i<2^n} i · 2^{-n} · χ_{A_{i,2^n}}(x).


Then f_n is a measurable step function with f_n ≤ f; moreover, (f_n)_{n∈N} is increasing. This is so because given n ∈ N and x ∈ X, we can find i such that x ∈ A_{i,2^n} = A_{2i,2^{n+1}} ∪ A_{2i+1,2^{n+1}}. If f(x) < (2i + 1)/2^{n+1}, we have x ∈ A_{2i,2^{n+1}} with f_n(x) = f_{n+1}(x); if, however, (2i + 1)/2^{n+1} ≤ f(x), we have f_n(x) < f_{n+1}(x).

Given ε > 0, choose n_0 ∈ N with 2^{-n} < ε for n ≥ n_0. Let x ∈ X and n ≥ n_0; then x ∈ A_{i,2^n} for some i, hence |f_n(x) − f(x)| = f(x) − i · 2^{-n} < 2^{-n} < ε. Thus f = sup_{n∈N} f_n.

2. Given f ∈ F(X, A), write f_1 := f ∧ 0 and f_2 := f ∨ 0; then f = f_1 + f_2 with f_1 ≤ 0 and f_2 ≥ 0 as measurable functions. By part 1, f_2 = sup_{n∈N} g_n = lim_{n→∞} g_n and −f_1 = sup_{n∈N} h_n = lim_{n→∞} h_n for increasing sequences of step functions (g_n)_{n∈N} and (h_n)_{n∈N}. Thus f = lim_{n→∞}(g_n − h_n), and g_n − h_n is a step function for each n ∈ N. ∎

Given f : X → R with f ≥ 0, the set {⟨x, q⟩ ∈ X × R | 0 ≤ q ≤ f(x)} can be visualized as the area between the X-axis and the graph of the function. We obtain as a consequence that this set is measurable, provided f is measurable. This gives an example of a product measurable set. To be specific:

Corollary 4.2.5 Let f : X → R with f ≥ 0 be a bounded measurable function for a measurable space (X, A), and define

C_⋈(f) := {⟨x, q⟩ | 0 ≤ q ⋈ f(x)} ⊆ X × R

for the relational operator ⋈ taken from {≤, <, =, ≠, >, ≥}. Then C_⋈(f) ∈ A ⊗ B(R).

Proof We prove the assertion for C(f) := C_<(f), from which the other cases may easily be derived, e.g.,

C_≤(f) = ⋂_{k∈N} {⟨x, q⟩ | 0 ≤ q < f(x) + 1/k} = ⋂_{k∈N} C_<(f + 1/k).

Consider these cases.

• If f = χ_A with A ∈ A, then C(f) = A × [0, 1[ ∈ A ⊗ B(R).

• If f is represented as a step function with a finite number of mutually disjoint steps, say, f = ∑_{i=1}^{k} r_i · χ_{A_i} with r_i ≥ 0 and all A_i ∈ A, then

C(f) = ⋃_{i=1}^{k} A_i × [0, r_i[ ∈ A ⊗ B(R).

• If f is represented as a monotone limit of step functions (f_n)_{n∈N} with f_n ≥ 0 according to Proposition 4.2.4, then C(f) = ⋃_{n∈N} C(f_n), thus C(f) ∈ A ⊗ B(R).

∎

This is a simple first application. We look at the evaluation of a measure at a certain set and want the value to be not smaller than a given threshold. The pairs of measures and corresponding thresholds constitute a product measurable set; to be specific:

Example 4.2.6 Given a measurable space (X, A) and a measurable set A ∈ A, the set {⟨µ, r⟩ ∈ M(X, A) × R | µ(A) ⋈ r} is a member of ℘(X, A) ⊗ B(R). This is so by Corollary 4.2.5, since ev_A is ℘(X, A)-B(R)-measurable. ♦


We also obtain measurability of the validity sets for the simple modal logic discussed above.

Example 4.2.7 Consider the simple modal logic in Example 4.1.11, interpreted through a transition kernel M : (X, A) ⇝ (X, A). Given a formula ϕ, the set {⟨x, r⟩ | M(x)([[ϕ]]_M) ≥ r} is a member of A ⊗ B(R), from which [[◇_q ϕ]]_M may be extracted through a horizontal cut at q (see page 318). Hence this observation generalizes measurability of [[·]]_M, one of the cornerstones for interpreting modal logics probabilistically. ♦

We will turn now to the interplay of measurable functions and measures, and have a look at different modes of convergence for sequences of measurable functions in the presence of a (finite) measure.

4.2.1 Essentially Bounded Functions

Fix for this section a finite measure space (X, A, µ). We say that a measurable property holds µ-almost everywhere (abbreviated as µ-a.e.) iff the set on which the property does not hold has µ-measure zero.

The measurable function f ∈ F(X, A) is called µ-essentially bounded iff

||f||_∞^µ := inf{a ∈ R | |f| ≤_µ a} < ∞,

where f ≤_µ a indicates that f ≤ a holds µ-a.e. Thus a µ-essentially bounded function may occasionally take arbitrarily large values, but the set on which it does so must be negligible in terms of µ.

The set

L_∞(µ) := L_∞(X, A, µ) := {f ∈ F(X, A) | ||f||_∞^µ < ∞}

of all µ-essentially bounded functions is a real vector space, and we have for ||·||_∞^µ the following properties.

Lemma 4.2.8 Let f, g ∈ F(X, A) be essentially bounded and α ∈ R. Then ||·||_∞^µ is a pseudo-norm on L_∞(X, A, µ), i.e.,

1. if ||f||_∞^µ = 0, then f =_µ 0;

2. ||α · f||_∞^µ = |α| · ||f||_∞^µ;

3. ||f + g||_∞^µ ≤ ||f||_∞^µ + ||g||_∞^µ.

Proof If ||f||_∞^µ = 0, we have |f| ≤_µ 1/n for all n ∈ N, so that

{x ∈ X | |f(x)| ≠ 0} ⊆ ⋃_{n∈N} {x ∈ X | |f(x)| > 1/n},

consequently f =_µ 0. The converse is trivial. The second property follows from |f| ≤_µ a iff |α · f| ≤_µ |α| · a; the third one from the observation that |f| ≤_µ a and |g| ≤_µ b imply |f + g| ≤ |f| + |g| ≤_µ a + b. ∎


So ||·||_∞^µ is nearly a norm; but the crucial property that the norm of a vector is zero only if the vector is zero is missing. We factor L_∞(X, A, µ) with respect to the equivalence relation =_µ, so that the set

L_∞(µ) := L_∞(X, A, µ) := {[f] | f ∈ L_∞(X, A, µ)}

of all equivalence classes [f] of µ-essentially bounded measurable functions is a vector space again. This is so because f =_µ g and f′ =_µ g′ together imply f + f′ =_µ g + g′, and f =_µ g implies α · f =_µ α · g for all α ∈ R. Moreover,

||[f]||_∞^µ := ||f||_∞^µ

defines a norm on this space. For easier reading we will identify in the sequel f with its class [f].

We obtain in this way a normed vector space, which is complete with respect to this norm; see Definition 3.6.40 on page 283.

Proposition 4.2.9 (L_∞(µ), ||·||_∞^µ) is a Banach space.

Proof Let (f_n)_{n∈N} be a Cauchy sequence in L_∞(X, A, µ), and define

N := ⋃_{n_1,n_2∈N} {x ∈ X | |f_{n_1}(x) − f_{n_2}(x)| > ||f_{n_1} − f_{n_2}||_∞^µ};

then µ(N) = 0. Put g_n := χ_{X\N} · f_n; then (g_n)_{n∈N} converges uniformly with respect to the supremum norm ||·||_∞ to some element g ∈ F(X, A), hence also ||f_n − g||_∞^µ → 0. Clearly, g is bounded. ∎

This is the first instance of a vector space intimately connected with a measure space. We will encounter several of these spaces in Section 4.11 and discuss them in greater detail, when integration is at our disposal.

The convergence of a sequence of measurable functions into R in the presence of a finite measure is discussed now. Without a measure, we may use pointwise or uniform convergence for modelling approximations. Recall that pointwise convergence of a sequence (f_n)_{n∈N} of functions to a function f is given by

∀x ∈ X : lim_{n→∞} f_n(x) = f(x),     (4.4)

and the stronger form of uniform convergence through

lim_{n→∞} ||f_n − f||_∞ = 0,

with ||·||_∞ as the supremum norm, given by

||f||_∞ := sup_{x∈X} |f(x)|.

We will weaken the first condition (4.4) so that it holds not everywhere but almost everywhere, i.e., so that the set on which it does not hold will be a set of measure zero. This leads to the notion of convergence almost everywhere, which will turn out to be quite close to uniform convergence, as we will see when discussing Egorov's Theorem. Convergence almost everywhere will be weakened to convergence in measure, for which we will define a pseudometric. This in turn gives rise to another Banach space upon factoring.


4.2.2 Convergence almost everywhere and in measure

Recall that we work in a finite measure space (X, A, µ). The sequence (f_n)_{n∈N} of measurable functions f_n ∈ F(X, A) is said to converge almost everywhere to a function f ∈ F(X, A) (written as f_n →_{a.e.} f) iff the sequence (f_n(x))_{n∈N} converges to f(x) for every x outside a set of measure zero. Thus we have µ(X \ K) = 0, where K := {x ∈ X | f_n(x) → f(x)}. Because

K = ⋂_{n∈N} ⋃_{m∈N} ⋂_{ℓ≥m} {x ∈ X | |f_ℓ(x) − f(x)| < 1/n},

K is a measurable set. It is clear that f_n →_{a.e.} f and f_n →_{a.e.} f′ together imply f =_µ f′.

The next lemma shows that convergence almost everywhere is compatible with the common algebraic operations on F(X, A) like addition, scalar multiplication and the lattice operations. Since these operations can be represented as continuous functions of several variables, we formulate this closure property abstractly in terms of compositions with continuous functions.

Lemma 4.2.10 Let f_{i,n} →_{a.e.} f_i for 1 ≤ i ≤ k, and assume that g : R^k → R is continuous. Then g ∘ (f_{1,n}, . . . , f_{k,n}) →_{a.e.} g ∘ (f_1, . . . , f_k).

Proof Put h_n := g ∘ (f_{1,n}, . . . , f_{k,n}). Since g is continuous, we have

{x ∈ X | (h_n(x))_{n∈N} does not converge} ⊆ ⋃_{j=1}^{k} {x ∈ X | (f_{j,n}(x))_{n∈N} does not converge},

hence the set on the left-hand side has measure zero. ∎

Intuitively, convergence almost everywhere means that the measure of the set

⋃_{n≥k} {x ∈ X | |f_n(x) − f(x)| > ε}

tends to zero as k → ∞, so we are coming closer and closer to the limit function, albeit on a set the measure of which becomes smaller and smaller. We show that this intuitive understanding yields an adequate model for this kind of convergence.

Lemma 4.2.11 Let (f_n)_{n∈N} be a sequence of functions in F(X, A) and f ∈ F(X, A). Then the following conditions are equivalent:

1. f_n →_{a.e.} f.

2. lim_{k→∞} µ(⋃_{n≥k} {x ∈ X | |f_n(x) − f(x)| > ε}) = 0 for every ε > 0.

Proof 0. Let us first write down what the limit in property 2 really means; then the proof will be nearly straightforward. Because the sets ⋃_{n≥k} {x ∈ X | |f_n(x) − f(x)| > ε} decrease as k grows, and µ is finite, we have

lim_{k→∞} µ(⋃_{n≥k} {x ∈ X | |f_n(x) − f(x)| > ε}) = µ(⋂_{k∈N} ⋃_{n≥k} {x ∈ X | |f_n(x) − f(x)| > ε}).   (∗)

1. Assume f_n →_{a.e.} f, and let ε > 0 be given. If x lies in the intersection on the right-hand side of (∗), then |f_n(x) − f(x)| > ε for infinitely many n, so (f_n(x))_{n∈N} does not converge to f(x). Hence the right-hand side of (∗) is dominated by µ({x ∈ X | (f_n(x))_{n∈N} does not converge to f(x)}) = 0, which gives property 2.

2. Conversely, assume property 2, and put for m ∈ N

N_m := ⋂_{k∈N} ⋃_{n≥k} {x ∈ X | |f_n(x) − f(x)| > 1/m};

then µ(N_m) = 0 by (∗). But {x ∈ X | (f_n(x))_{n∈N} does not converge to f(x)} = ⋃_{m∈N} N_m, hence this set has measure zero as well, i.e., f_n →_{a.e.} f. ∎

Note that the statement above requires a finite measure space, because the measure of a decreasing sequence of sets is the infimum of the individual measures; this was used in the equation marked (∗). It is not necessarily valid for non-finite measure spaces.

The characterization implies that a.e.-Cauchy sequences converge.

Corollary 4.2.12 Let (f_n)_{n∈N} be an a.e.-Cauchy sequence in F(X, A). Then (f_n)_{n∈N} converges almost everywhere.

Proof Because (f_n)_{n∈N} is an a.e.-Cauchy sequence, we have µ(K_ε) = 0 for every ε > 0, where

K_ε := ⋂_{k∈N} ⋃_{n,m≥k} {x ∈ X | |f_n(x) − f_m(x)| > ε}.

Put

N := ⋃_{k∈N} K_{1/k},   g_n := f_n · χ_{X\N};

then µ(N) = 0, and (g_n(x))_{n∈N} is a Cauchy sequence in R for every x ∈ X, so (g_n)_{n∈N} converges pointwise to some f ∈ F(X, A). Since µ(N) = 0, f_n →_{a.e.} f follows. ∎

Convergence a.e. is very nearly uniform convergence, where “very nearly” serves to indicate that the set on which uniform convergence is violated can be made arbitrarily small. To be specific, we find for each threshold a set whose complement has measure smaller than this bound, and on which convergence is uniform. This is what Egorov's Theorem says.

Proposition 4.2.13 (Egorov's Theorem) Let f_n →_{a.e.} f for f_n, f ∈ F(X, A). Given ε > 0, there exists A ∈ A such that

1. sup_{x∈A} |f_n(x) − f(x)| → 0 as n → ∞,

2. µ(X \ A) < ε.

The idea of the proof is to investigate the set of all x for which uniform convergence is spoiled by 1/k. This set can be made arbitrarily small in terms of µ, so the countable union of all these sets can be made as small as we want. Outside this set we have uniform convergence. Let's look at a more formal treatment now.

Proof Fix ε > 0. By Lemma 4.2.11 there exists for each k ∈ N an index n_k ∈ N such that µ(B_k) < ε/2^{k+1}, where

B_k := ⋃_{m≥n_k} {x ∈ X | |f_m(x) − f(x)| > 1/k}.

Now put A := ⋂_{k∈N} (X \ B_k); then

µ(X \ A) ≤ ∑_{k∈N} µ(B_k) ≤ ε,


and we have for all k ∈ N

sup_{x∈A} |f_n(x) − f(x)| ≤ sup_{x∉B_k} |f_n(x) − f(x)| ≤ 1/k

for n ≥ n_k. Thus

lim_{n→∞} sup_{x∈A} |f_n(x) − f(x)| = 0,

as claimed. ∎

Convergence almost everywhere makes sure that the set on which a sequence of functions does not converge has measure zero, and Egorov's Theorem shows that this is almost uniform convergence.

Convergence in measure for a finite measure space (X, A, µ) takes another approach: fix ε > 0, and consider the set {x ∈ X | |f_n(x) − f(x)| > ε}. If the measure of this set (for a fixed, but arbitrary ε) tends to zero as n → ∞, then we say that (f_n)_{n∈N} converges in measure to f, and we write f_n →_{i.m.} f. In order to have a closer look at this notion of convergence, we note that it is invariant under equality almost everywhere: if f_n =_µ g_n and f =_µ g, then f_n →_{i.m.} f implies g_n →_{i.m.} g, and vice versa.

We will first introduce a pseudometric δ on F(X, A):

δ(f, g) := inf{ε > 0 | µ({x ∈ X | |f(x) − g(x)| > ε}) ≤ ε}.

These are some elementary properties of δ:

Lemma 4.2.14 Let f, g, h ∈ F(X,A), then we have

1. δ(f, g) = 0 iff f =µ g,

2. δ(f, g) = δ(g, f),

3. δ(f, g) ≤ δ(f, h) + δ(h, g).

Proof If δ(f, g) = 0 but f ≠_µ g, there exists k with µ({x ∈ X | |f(x) − g(x)| > 1/k}) > 1/k; this is a contradiction. The other direction is trivial. Symmetry of δ is also trivial, so the triangle inequality remains to be shown. If |f(x) − g(x)| > ε_1 + ε_2, then |f(x) − h(x)| > ε_1 or |h(x) − g(x)| > ε_2, thus

µ({x ∈ X | |f(x) − g(x)| > ε_1 + ε_2}) ≤ µ({x ∈ X | |f(x) − h(x)| > ε_1}) + µ({x ∈ X | |h(x) − g(x)| > ε_2}).

This implies the third property. ∎

This, then, is the formal definition of convergence in measure:

Definition 4.2.15 The sequence (f_n)_{n∈N} in F(X, A) is said to converge in measure to f ∈ F(X, A) (written as f_n →_{i.m.} f) iff δ(f_n, f) → 0 as n → ∞.

We can express convergence in measure in terms of convergence almost everywhere.


Proposition 4.2.16 (f_n)_{n∈N} converges in measure to f iff each subsequence of (f_n)_{n∈N} contains a subsequence (h_n)_{n∈N} with h_n →_{a.e.} f.

Proof The proposal, singling out a subsequence from a subsequence rather than from the sequence proper, appears strange; the proof will show that we need a subsequence to “prime the pump”, i.e., to get going.

“⇒”: Assume f_n →_{i.m.} f, and let (g_n)_{n∈N} be an arbitrary subsequence of (f_n)_{n∈N}. We find a sequence of indices n_1 < n_2 < . . . such that µ({x ∈ X | |g_{n_k}(x) − f(x)| > 1/k}) < 1/k². Let h_k := g_{n_k}. Given ε > 0, we have for all ℓ with 1/ℓ < ε

µ(⋃_{k≥ℓ} {x ∈ X | |h_k − f| > ε}) ≤ ∑_{k≥ℓ} 1/k² → 0

as ℓ → ∞. Hence h_k →_{a.e.} f by Lemma 4.2.11.

“⇐”: If δ(f_n, f) does not tend to 0, we can find a subsequence (f_{n_k})_{k∈N} and r > 0 such that µ({x ∈ X | |f_{n_k}(x) − f(x)| > r}) > r for all k ∈ N. By assumption this subsequence contains a subsequence (g_n)_{n∈N} with g_n →_{a.e.} f, so that

lim_{n→∞} µ({x ∈ X | |g_n − f| > r}) ≤ lim_{n→∞} µ(⋃_{m≥n} {x ∈ X | |g_m − f| > r}) = 0

by Lemma 4.2.11. This is a contradiction. ∎

Hence convergence almost everywhere implies convergence in measure. Just for the record:

Corollary 4.2.17 If (f_n)_{n∈N} converges almost everywhere to f, then the sequence also converges to f in measure. ∎

The converse relationship is a bit more involved: a sequence which converges in measure need not converge almost everywhere.

Example 4.2.18 Let A_{i,n} := [(i − 1)/n, i/n] for n ∈ N and 1 ≤ i ≤ n, and consider the sequence

(f_n)_{n∈N} := ⟨χ_{A_{1,1}}, χ_{A_{1,2}}, χ_{A_{2,2}}, χ_{A_{1,3}}, χ_{A_{2,3}}, χ_{A_{3,3}}, . . .⟩,

so that in general χ_{A_{1,n}}, . . . , χ_{A_{n,n}} is followed by χ_{A_{1,n+1}}, . . . , χ_{A_{n+1,n+1}}.

Let µ be the Lebesgue measure λ on B([0, 1]). Given ε > 0, λ({x ∈ [0, 1] | f_n(x) > ε}) becomes arbitrarily small as n → ∞, hence f_n →_{i.m.} 0. On the other hand, (f_n(x))_{n∈N} fails to converge for any x ∈ [0, 1], so f_n →_{a.e.} 0 is false. ♦

We have, however, this observation, which draws atoms into our game.

Proposition 4.2.19 Let (A_i)_{i∈I} be the at most countable collection of µ-atoms according to Lemma 4.1.38 such that B := X \ ⋃_{i∈I} A_i does not contain any atoms. Then these conditions are equivalent:


1. Convergence in measure implies convergence almost everywhere.

2. µ(B) = 0.

Proof 1 ⇒ 2: Assume that µ(B) > 0. Then we know that for each k ∈ N there exist mutually disjoint measurable subsets B_{1,k}, . . . , B_{k,k} of B such that µ(B_{i,k}) = (1/k) · µ(B) and B = ⋃_{1≤i≤k} B_{i,k}; this is so because B does not contain any atoms. Define, as in Example 4.2.18,

(f_n)_{n∈N} := ⟨χ_{B_{1,1}}, χ_{B_{1,2}}, χ_{B_{2,2}}, χ_{B_{1,3}}, χ_{B_{2,3}}, χ_{B_{3,3}}, . . .⟩,

so that in general χ_{B_{1,n}}, . . . , χ_{B_{n,n}} is followed by χ_{B_{1,n+1}}, . . . , χ_{B_{n+1,n+1}}. Because µ({x ∈ X | f_n(x) > ε}) can be made arbitrarily small for any positive ε, we find f_n →_{i.m.} 0. If we assume that convergence in measure implies convergence almost everywhere, we have f_n →_{a.e.} 0; but this is false, because lim inf_{n→∞} f_n = 0 and lim sup_{n→∞} f_n = χ_B. Thus we arrive at a contradiction.

2 ⇒ 1: Let (f_n)_{n∈N} be a sequence with f_n →_{i.m.} f. Fix an atom A_i; then µ({x ∈ A_i | |f_n(x) − f(x)| > 1/k}) = 0 for all n ≥ n_k with n_k suitably chosen. This is so because A_i is an atom, hence its measurable subsets take only the measures 0 and µ(A_i), and the measures of these sets tend to zero. Put

g := inf_{n∈N} sup_{n_1,n_2≥n} |f_{n_1} − f_{n_2}|;

then g(x) ≠ 0 iff (f_n(x))_{n∈N} does not converge. We infer µ({x ∈ A_i | g(x) ≥ 2/k}) = 0. Because the family (A_i)_{i∈I} is mutually disjoint and at most countable, and µ(B) = 0, we conclude that µ({x ∈ X | g(x) ≥ 2/k}) = 0 for all k ∈ N. But now

µ({x ∈ X | lim inf_{n→∞} f_n(x) < lim sup_{n→∞} f_n(x)}) = µ({x ∈ X | g(x) > 0}) = 0.

Consequently, (f_n)_{n∈N} converges almost everywhere; its limit must be f, since f_n →_{i.m.} f, so f_n →_{a.e.} f. ∎

Again we want to be sure that convergence in measure is preserved by the usual algebraic operations like addition or taking the infimum, so we state this counterpart to Lemma 4.2.10; it is an easy consequence of Proposition 4.2.16.

Lemma 4.2.20 Let f_{i,n} →_{i.m.} f_i for 1 ≤ i ≤ k, and assume that g : R^k → R is continuous. Then g ∘ (f_{1,n}, . . . , f_{k,n}) →_{i.m.} g ∘ (f_1, . . . , f_k).

Proof Given a subsequence, we can, by iteratively selecting subsequences, find a common subsequence along which h_{i,n} →_{a.e.} f_i as n → ∞ for 1 ≤ i ≤ k. Then apply Lemma 4.2.10 and Proposition 4.2.16. ∎

We are now in a position to establish that convergence in measure actually yields a Banach space. But we have to be careful with functions which differ on a set of measure zero; they render the resulting space non-Hausdorff. Since functions which are equal except on a set of measure zero may be considered equal, we simply factor them out, obtaining the


factor space F(X, A)/=_µ of the space F(X, A) of all measurable functions with respect to =_µ. This is a real vector space again, because the algebraic operations on the equivalence classes are well defined. Note that we have δ(f, g) = δ(f′, g′), provided f =_µ f′ and g =_µ g′. We identify again the class [f]_{=_µ} with f. Define

||f|| := δ(f, 0)

for f ∈ F(X, A)/=_µ.

Proposition 4.2.21 (F(X, A)/=_µ, ||·||) is a Banach space.

Proof 1. It follows from Lemma 4.2.14 and the observation δ(f, 0) = 0 ⇔ f =_µ 0 that ||·|| is a norm, so we have to show that F(X, A)/=_µ is complete with respect to it.

2. Let (f_n)_{n∈N} be a Cauchy sequence in F(X, A)/=_µ; then we can find a strictly increasing sequence (ℓ_n)_{n∈N} of integers such that δ(f_{ℓ_n}, f_{ℓ_{n+1}}) ≤ 1/n², hence

µ({x ∈ X | |f_{ℓ_n}(x) − f_{ℓ_{n+1}}(x)| > 1/n²}) ≤ 1/n².

Let ε > 0 be given; then there exists r ∈ N with ∑_{n≥r} 1/n² < ε, hence we have

⋂_{n∈N} ⋃_{m,m'≥n} {x ∈ X | |f_{ℓ_m}(x) − f_{ℓ_{m'}}(x)| > ε} ⊆ ⋃_{n≥k} {x ∈ X | |f_{ℓ_n}(x) − f_{ℓ_{n+1}}(x)| > 1/n²}

whenever k ≥ r. Thus

µ(⋂_{n∈N} ⋃_{m,m'≥n} {x ∈ X | |f_{ℓ_m}(x) − f_{ℓ_{m'}}(x)| > ε}) ≤ ∑_{n≥k} 1/n² → 0

as k → ∞. Hence (f_{ℓ_n})_{n∈N} is an a.e.-Cauchy sequence which converges a.e. to some f ∈ F(X, A); by Corollary 4.2.17, f_{ℓ_n} →_{i.m.} f, and since (f_n)_{n∈N} is δ-Cauchy, f_n →_{i.m.} f follows. ∎

A consequence of (F(X, A)/=_µ, ||·||) being a Banach space is that F(X, A) is complete with respect to convergence in measure for any finite measure µ on A. Thus for any sequence (f_n)_{n∈N} of functions such that for any given ε > 0 there exists n_0 with µ({x ∈ X | |f_n(x) − f_m(x)| > ε}) < ε for all n, m ≥ n_0, we can find f ∈ F(X, A) such that f_n →_{i.m.} f with respect to µ.

We will deal with measurable real-valued functions again and in greater detail in Section 4.11; then we will have integration as a powerful tool at our disposal, and we will know more about Hilbert spaces.

Now we turn to the study of σ-algebras and focus on those which have a countable set as their generator.

4.3 Countably Generated σ-Algebras

Fix a measurable space (X, A). The σ-algebra A is said to be countably generated iff there exists a countable A_0 ⊆ A such that A = σ(A_0).


Example 4.3.1 Let (X, τ) be a topological space with a countable basis. Then B(X) is countably generated. In fact, if τ_0 is the countable basis for τ, then each open set G can be written as G = ⋃_{n∈N} G_n with (G_n)_{n∈N} ⊆ τ_0; thus each open set is an element of σ(τ_0), consequently B(X) = σ(τ_0). ♦

The observation in Example 4.3.1 implies that the Borel sets of a separable metric space, in particular of the Polish spaces soon to be introduced, are countably generated.

Given a countable dense subset of a metric space, we can use the corresponding base for a fairly helpful characterization of the Borel sets. The next lemma says that the Borel sets are in this case generated by a countable collection of open balls.

Lemma 4.3.2 Let X be a separable metric space with metric d, and let B(x, r) denote the open ball with radius r and center x. Then

B(X) = σ({B(x, r) | r > 0 rational, x ∈ D}),

where D is countable and dense.

Proof Because an open ball is an open set, we infer that

σ({B(x, r) | r > 0 rational, x ∈ D}) ⊆ B(X).

Conversely, let G be open. Then there exists a sequence (B_n)_{n∈N} of open balls with rational radii and centers in D such that ⋃_{n∈N} B_n = G, accounting for the other inclusion. ∎

The characterization of the Borel sets of a metric space as the closure of the open (closed) sets under countable unions and countable intersections will also be occasionally helpful.

Lemma 4.3.3 The Borel sets of a metric space X form the smallest collection of sets that contains the open (closed) sets and that is closed under countable unions and countable intersections.

Proof The smallest collection G of sets that contains the open sets and that is closed under countable unions and countable intersections is closed under complementation. This is so since each closed set F can be written as the countable intersection ⋂_{n∈N} {x ∈ X | d(x, F) < 1/n} of open sets; in other words, F is a Gδ-set (see page 253). Thus B(X) ⊆ G; on the other hand, G ⊆ B(X) by construction. ∎

The property of being countably generated is, however, not hereditary for a σ-algebra: a sub-σ-algebra of a countably generated σ-algebra is not necessarily countably generated. This is demonstrated by the following example. Incidentally, we will see in Example 4.4.28 that the intersection of two countably generated σ-algebras need not be countably generated again. This indicates that having a countable generator is a fickle property which has to be observed closely.

Example 4.3.4 Let

C := {A ⊆ R | A or R \ A is countable}.

This σ-algebra is usually referred to as the countable-cocountable σ-algebra. Clearly, C ⊆ B(R), and B(R) is countably generated by Example 4.3.1. But C is not countably generated. Assume


that it is, and let C_0 be a countable generator of C; we may assume that every element of C_0 is countable (otherwise pass to its complement). Put A := ⋃C_0; then A ∈ C, since A is countable. But

D := {B ⊆ R | B ⊆ A or R \ B ⊆ A}

is a σ-algebra which contains C_0, hence σ(C_0) ⊆ D. On the other hand there exists a ∈ R with a ∉ A; thus A ∪ {a} ∈ C = σ(C_0), but A ∪ {a} ∉ D, a contradiction. ♦

Although the entire σ-algebra may not be countably generated, we can find for each of its elements a countable generator:

Lemma 4.3.5 Let A be a σ-algebra on a set X which is generated by a family G of subsets. Then we can find for each A ∈ A a countable subset G_0 ⊆ G such that A ∈ σ(G_0).

Proof Let D be the set of all A ∈ A for which the assertion is true; then D is closed under complements, and G ⊆ D. Moreover, D is closed under countable unions, since the union of a countable family of countable sets is countable again. Hence D is a σ-algebra which contains G, so it contains A = σ(G). ∎

This has a fairly interesting and somewhat unexpected consequence, which will be of use later on. Recall from page 318 or from Example 2.2.4 that, for measurable spaces (X, A) and (Y, B), the product A ⊗ B is the smallest σ-algebra on X × Y which contains all measurable rectangles A × B with A ∈ A and B ∈ B. In particular, P(X) ⊗ P(X) is generated by {A × B | A, B ⊆ X}. One may be tempted to assume that this σ-algebra is the same as P(X × X), but this is not always the case, because we have:

Proposition 4.3.6 Denote by ∆_X the diagonal {⟨x, x⟩ | x ∈ X} for a set X. Then ∆_X ∈ P(X) ⊗ P(X) implies that the cardinality of X does not exceed that of P(N).

Proof Assume ∆_X ∈ P(X) ⊗ P(X); then there exists a countable family C ⊆ P(X) such that ∆_X ∈ σ({A × B | A, B ∈ C}). The map q : x ↦ {C ∈ C | x ∈ C} from X to P(C) is injective. In fact, suppose it is not; then there exist x ≠ x′ with x ∈ C ⇔ x′ ∈ C for all C ∈ C, so we have for all C ∈ C that either {x, x′} ⊆ C or {x, x′} ∩ C = ∅, so that the pairs ⟨x, x⟩ and ⟨x′, x′⟩ never occur alone in any A × B with A, B ∈ C. Hence ∆_X cannot be a member of σ({A × B | A, B ∈ C}), a contradiction. As a consequence, X cannot have more elements than P(C), hence no more than P(N). ∎

Now we are in a position to show that we cannot conclude, for a subset S ⊆ X × Y, that S is product measurable whenever all its cuts are measurable; compare Lemma 4.1.8.

Example 4.3.7 Let X be a set whose cardinality is greater than that of P(N), and let ∆ := {⟨x, x⟩ | x ∈ X} ⊆ X × X be the diagonal of X. Then ∆_x = {x} = ∆^x for all x ∈ X, thus all cuts ∆_x and ∆^x are members of P(X), but ∆ ∉ P(X) ⊗ P(X) by Proposition 4.3.6. ♦

Among the countably generated measurable spaces, those are of interest which permit separating points, so that if x ≠ x′, we can find a measurable set A with x ∈ A and x′ ∉ A; they are called separable. Formally:

Definition 4.3.8 The σ-algebra A is called separable iff it is countably generated, and if forany two different elements of X there exists a measurable set A ∈ A which contains exactlyone of them. The measurable space (X,A) is called separable iff its σ-algebra A is separable.

The argument from Proposition 4.3.6 yields:


Corollary 4.3.9 Let A be a separable σ-algebra over the set X with A = σ(A_0) for a countable A_0. Then A_0 separates points, and ∆_X ∈ A ⊗ A.

Proof Because A separates points, we obtain from Example 4.1.5 that ≡_{A_0} = ∆_X, where ≡_{A_0} is the equivalence relation defined by A_0. So A_0 separates points. The representation

X × X \ ∆_X = ⋃_{A∈A_0} (A × (X \ A)) ∪ ((X \ A) × A)

now yields ∆_X ∈ A ⊗ A. ∎

In fact, we can say even more.

Proposition 4.3.10 A separable measurable space (X, A) is isomorphic to (X, B(X)) with the Borel sets coming from a metric d on X such that (X, d) is a separable metric space.

Proof 1. Let A_0 = {A_n | n ∈ N} be the countable generator of A which separates points. Define

(M, 𝓜) := ∏_{n∈N} ({0, 1}, P({0, 1}))

as the product of countably many copies of the discrete space ({0, 1}, P({0, 1})). Then M has as a basis the cylinder sets Z_v for v ∈ {0, 1}^k, k ∈ N, with Z_v := {(t_n)_{n∈N} ∈ M | ⟨t_1, . . . , t_k⟩ = v}; see page 318. Define f : X → M through f(x) := (χ_{A_n}(x))_{n∈N}; then f is injective, because A_0 separates points. Put Q := f[X], and let 𝓠 := {B ∩ Q | B ∈ 𝓜} be the trace of 𝓜 on Q.

Now let Y_v := Z_v ∩ Q be an element of the generator of 𝓠 with v = ⟨m_1, . . . , m_k⟩; then f^{-1}[Y_v] = ⋂_{j=1}^{k} C_j with C_j := A_j if m_j = 1, and C_j := X \ A_j otherwise. Consequently, f : X → Q is A-𝓠-measurable.

2. Put for x, y ∈ X

d(x, y) := ∑_{n∈N} 2^{-n} · |χ_{A_n}(x) − χ_{A_n}(y)|;

then d is a metric on X which has

G := {⋂_{j∈F} B_j | B_j ∈ A_0 or X \ B_j ∈ A_0 for j ∈ F, F ⊆ N finite}

as a countable basis. In fact, let G ⊆ X be open; given x ∈ G, there exists ε > 0 such that the open ball B_ε(x) := {x′ ∈ X | d(x, x′) < ε} with center x and radius ε is contained in G. Now choose k with 2^{-k} < ε, and put B_j := A_j if x ∈ A_j, and B_j := X \ A_j otherwise (1 ≤ j ≤ k); then x ∈ ⋂_{j=1}^{k} B_j ⊆ B_ε(x). This argument shows also that A = B(X).

3. Because (X, d) has a countable basis, it is a separable metric space. The map f : X → Q is a bijection which is measurable, and f^{-1} is measurable as well. This is so because {A ∈ A | f[A] ∈ 𝓠} is a σ-algebra which contains the basis G. ∎
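The embedding and the metric of this proof can be played with directly by truncating to finitely many generators. This is a toy version of our own making; with only finitely many generators d is merely a pseudometric, and the full countable, point-separating family of the proposition is what turns it into a genuine metric.

    generators = [lambda x: x % 2 == 0,     # A_1: even numbers
                  lambda x: x % 3 == 0,     # A_2: multiples of 3
                  lambda x: x < 10]         # A_3: an initial segment

    def embed(x):                           # x |-> (chi_{A_n}(x))_n, truncated
        return tuple(1 if A(x) else 0 for A in generators)

    def d(x, y):                            # sum_n 2^-n |chi_{A_n}(x) - chi_{A_n}(y)|
        return sum(2.0 ** -(n + 1) * abs(a - b)
                   for n, (a, b) in enumerate(zip(embed(x), embed(y))))

    print(embed(6), embed(7), d(6, 7))      # (1, 1, 1) (0, 0, 1) 0.75
    # Distance 0 means: no generator separates the two points.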

This representation is due to Mackey. It represents separable measurable spaces as subspaces of the countable product of the discrete space ({0, 1}, P({0, 1})). This space is also a compact metric space, so we may say that a separable measurable space is isomorphic to a subspace of a compact metric space. We will make use of this observation later on.


By the way, this innocent-looking statement has some remarkable consequences for our context. Just as an appetizer:

Corollary 4.3.11 Let (X, A) be a separable measurable space. If f_i : X_i → X is A_i-A-measurable, where (X_i, A_i) is a measurable space (i = 1, 2), then

f_1^{-1}[A] ⊗ f_2^{-1}[A] = (f_1 × f_2)^{-1}[A ⊗ A]

holds.

Proof The product σ-algebra A ⊗ A is generated by the rectangles B_1 × B_2 with B_1, B_2 taken from some generator of A. Since (f_1 × f_2)^{-1}[B_1 × B_2] = f_1^{-1}[B_1] × f_2^{-1}[B_2], we see that (f_1 × f_2)^{-1}[A ⊗ A] ⊆ f_1^{-1}[A] ⊗ f_2^{-1}[A]. This is true without the assumption of separability. Now let τ be a second countable metric topology on X with A = B(τ), which exists by Proposition 4.3.10, and let τ_0 be a countable base for this topology. Then

β_p := {T_1 × T_2 | T_1, T_2 ∈ τ_0}

is a countable base for the product topology τ ⊗ τ, and (this is the crucial property)

A ⊗ A = B(X × X, τ ⊗ τ)

holds: because the projections from X × X to the factors are continuous, hence measurable, we observe A ⊗ A ⊆ B(X × X, τ ⊗ τ); because β_p is a countable base for the product topology τ ⊗ τ, we infer the other inclusion.

Since for T_1, T_2 ∈ τ_0 clearly

f_1^{-1}[T_1] × f_2^{-1}[T_2] ∈ (f_1 × f_2)^{-1}[β_p] ⊆ (f_1 × f_2)^{-1}[A ⊗ A]

holds, the nontrivial inclusion is inferred from the fact that the smallest σ-algebra containing {f_1^{-1}[T_1] × f_2^{-1}[T_2] | T_1, T_2 ∈ τ_0} equals f_1^{-1}[A] ⊗ f_2^{-1}[A]. ∎

Given a measurable function into a separable measurable space, we find that its kernel (defined on page 90) is a measurable subset of the product of its domain with itself. We will use the kernel for many a construction, so this little observation is quite helpful.

Corollary 4.3.12 Let f : X → Y be an A-B-measurable map, where (X, A) and (Y, B) are measurable spaces, the latter being separable. Then the kernel ker(f) of f is a member of A ⊗ A.

Proof Exercise 4.8. ∎

The observation, made in the proof of Proposition 4.3.6, that it may not always be possible to separate two different elements of a measurable space through a measurable set led there to a contradiction. Nevertheless it also leads to an interesting notion.

Definition 4.3.13 The non-empty set A ∈ A is called an atom of A iff B ⊆ A implies B = ∅ or B = A for all B ∈ A.

For example, each singleton set {x} is an atom for the σ-algebra P(X). Clearly, being an atom depends also on the σ-algebra. If A is an atom, we have for every B ∈ A either A ⊆ B or B ∩ A = ∅; this is more radical than being a µ-atom, which merely restricts the values of µ(B) for measurable B ⊆ A to 0 or µ(A). Certainly, if A is an atom and µ(A) > 0, then A is a µ-atom.

For a countably generated σ-algebra, atoms are easily identified.

Proposition 4.3.14 Let A_0 = {A_n | n ∈ N} be a countable generator of A, and define

A_α := ⋂_{n∈N} A_n^{α_n}

for α ∈ {0, 1}^N, where A^0 := A, A^1 := X \ A. Then {A_α | α ∈ {0, 1}^N, A_α ≠ ∅} is the set of all atoms of A.

Proof Assume that some non-empty A_α is not an atom; then there exists a non-empty B ∈ A with B ⊊ A_α. Take y_1 ∈ B and y_2 ∈ A_α \ B. Then y_1 ≡_{A_0} y_2, but y_1 ≢_A y_2, contradicting the observation in Example 4.1.5. Hence A_α is an atom. Conversely, let A be an atom, and take x ∈ A; then A_α, for the α determined by x, is the equivalence class of x with respect to ≡_{A_0}, hence with respect to A. Since this class is contained in every measurable set containing x, we have ∅ ≠ A_α ⊆ A, and since A is an atom, A = A_α. Thus each atom is given by some A_α. ∎
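With finitely many generators the atoms A_α can be enumerated directly. The sketch below is our own toy version (the example sets are ours); it computes the non-empty A_α for a small example and exhibits them as the blocks of a partition.

    from itertools import product

    X = set(range(12))
    generators = [{0, 1, 2, 3, 4, 5}, {0, 2, 4, 6, 8, 10}, {0, 1, 2}]

    atoms = []
    for alpha in product((0, 1), repeat=len(generators)):
        block = set(X)
        for bit, A in zip(alpha, generators):
            block &= A if bit == 0 else X - A   # A^0 := A, A^1 := X \ A
        if block:                               # only non-empty A_alpha are atoms
            atoms.append(sorted(block))

    print(atoms)   # a partition of X; every generated set is a union of atoms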

Incidentally, this gives another proof that the countable-cocountable σ-algebra over R is not countably generated. Assume it is generated by {A_n | n ∈ N}; then

H := ⋂{A_n | A_n is cocountable} ∩ ⋂{R \ A_n | A_n is countable}

is an atom by Proposition 4.3.14. But H is also cocountable, hence uncountable, and each of its singleton subsets is measurable. This contradicts H being an atom.

We relate atoms to measurable maps:

Lemma 4.3.15 Let f : X → R be A-B(R)-measurable. If A ∈ A is an atom of A, then f isconstant on A.

Proof Assume that we can find x_1, x_2 ∈ A with f(x_1) ≠ f(x_2), say f(x_1) < c < f(x_2). Then {x ∈ A | f(x) < c} and {x ∈ A | f(x) > c} are two non-empty disjoint measurable subsets of A. This contradicts A being an atom. ∎

We will now specialize our view of measurable spaces to the Borel sets of Polish spaces and their more general cousins, viz., analytic spaces. Before we do that, however, we show that in important cases the Borel sets B(X × Y) of X × Y coincide with the product B(X) ⊗ B(Y). We know from Proposition 4.3.6 that this is not always the case.

Proposition 4.3.16 Let X and Y be topological spaces such that Y has a countable base.Then B(X × Y ) = B(X)⊗ B(Y ).

Proof 1. We show first that each open set G ⊆ X × Y is a member of B(X) ⊗ B(Y); this will show that B(X × Y) ⊆ B(X) ⊗ B(Y). If {V_n | n ∈ N} is a countable base of Y, we can write G = ⋃ U_α × V_m for suitable open sets U_α ⊆ X and V_m ⊆ Y taken from the base. Now define for fixed m ∈ N the set W_m := ⋃{U_α | U_α × V_m ⊆ G}; then W_m is open, and G = ⋃_{m∈N} W_m × V_m. This is a countable union of elements of B(X) ⊗ B(Y), showing that each open set in X × Y is contained in B(X) ⊗ B(Y).

2. We claim that A × Y ∈ B(X × Y) for all A ∈ B(X), and that X × B ∈ B(X × Y) for all B ∈ B(Y). Assume that we have established these claims; then we infer that A × B = (A × Y) ∩ (X × B) ∈ B(X × Y), from which

B(X) ⊗ B(Y) = σ({A × B | A ∈ B(X), B ∈ B(Y)}) ⊆ B(X × Y)

would follow.


We use the principle of good sets for proving the first assertion; the second is established in exactly the same way, so we will not bother with it. In fact, put

G := {A ∈ B(X) | A × Y ∈ B(X × Y)}.

Then G is a σ-algebra which contains the open sets. This is so because for an open set H ⊆ X the set H × Y is open in X × Y, thus a Borel set in X × Y. But then G contains the σ-algebra generated by the open sets, hence G = B(X), and we are done. ∎

The proof shows that B(X) ⊗ B(Y) ⊆ B(X × Y) always holds; it is the converse inclusion which is sometimes critical. We will apply Proposition 4.3.16, for example, when one factor is a separable metric or even a Polish space.

4.3.1 Borel Sets in Polish and Analytic Spaces

General measurable spaces and even separable metric spaces are sometimes too general to support specific structures. We deal with Polish and analytic spaces, which are general enough to support interesting applications, but have specific properties which help establish vital results. We remind the reader first of some basic facts and then provide some helpful tools for working with Polish spaces and their more general cousins, analytic spaces.

An immediate consequence of Lemma 4.1.4 is that continuity implies Borel measurability.

Lemma 4.3.17 Let (X_1, τ_1) and (X_2, τ_2) be topological spaces. Then f : X_1 → X_2 is B(τ_1)-B(τ_2)-measurable, provided f is τ_1-τ_2-continuous. ∎

We note for later use that the limit of a sequence of measurable functions into a metric space is measurable again; see Exercise 4.13.

Proposition 4.3.18 Let (X, A) be a measurable space, (Y, d) a metric space, and (f_n)_{n∈N} a sequence of A-B(Y)-measurable functions f_n : X → Y. Then

• the set C := {x ∈ X | (f_n(x))_{n∈N} converges} is measurable, and

• f(x) := lim_{n→∞} f_n(x) defines an A ∩ C-B(Y)-measurable map f : C → Y.

∎

Neither general topological spaces nor metric spaces offer a structure rich enough for the study of the transition systems that we will enter into. We need to restrict the class of topological spaces to a particularly interesting class of spaces that are traditionally called Polish.

As far as notation goes, we will write down a topological or a metric space without its adornment through a topology or a metric, unless this becomes really necessary.

Remember that a metric space (X, d) is called complete iff each d-Cauchy sequence has a limit. Recall also that completeness is really a property of the metric rather than of the underlying topological space, so a metrizable space may be complete with one metric and incomplete with another one; see Example 3.5.18. In contrast, having a countable base is a topological property which is invariant under the different metrics the topology may admit.


Definition 4.3.19 A Polish space X is a topological space the topology of which is metrizable through a complete metric, and which has a countable base or, equivalently, a countable dense subset.

Familiar spaces are Polish, as these examples show.

Example 4.3.20 The reals R with their usual topology, which is induced by the open intervals, form a Polish space. The open unit interval ]0, 1[ with the usual topology induced by the open intervals forms a Polish space as well.

The latter probably comes as a surprise, because ]0, 1[ is known not to be complete with the usual metric. But all we need is a countable dense subset (here we take of course the rationals Q ∩ ]0, 1[) and a complete metric that generates the topology. Define

d(x, y) := |ln(x/(1 − x)) − ln(y/(1 − y))|;

then this is a complete metric for ]0, 1[. This is so since x ↦ ln(x/(1 − x)) is a continuous bijection from ]0, 1[ to R, and its inverse y ↦ e^y/(1 + e^y) is continuous as well. ♦

Lemma 4.3.21 Let X be a Polish space, and assume that F ⊆ X is closed; then the subspace F is Polish as well.

Proof F is complete by Lemma 3.5.21. The topology that F inherits from X has a countable base and is metrizable, so F has a countable dense subset, too. ∎

Lemma 4.3.22 Let (X_n)_{n∈N} be a sequence of Polish spaces; then their product and their coproduct are Polish spaces.

Proof Assume that the topology τn on Xn is metrized through metric dn, where it may beassumed that dn ≤ 1 holds (otherwise use for τn the complete metric dn(x, y)/(1 + dn(x, y))).Then (see Proposition 3.5.4)

d((xn)n∈N, (yn)n∈N) :=∑n∈N

2−ndn(xn, yn)

is a complete metric for the product topology∏n∈N τn. For the coproduct, define the complete

metric

d(x, y) :=

2, if x ∈ Xn, y ∈ Xm, n 6= m

dn(x, y), if x, y ∈ Xn.

All this is established through standard arguments. a

Example 4.3.23 The set N of natural numbers with the discrete topology is a Polish spaceon account of being the topological sum of its elements. Thus the set N∞ of all infinitesequences is a Polish space. The sets

Θα := τ ∈ N∞ | α is an initial piece of τ

for α ∈ N∗, the free monoid generated by N, constitute a base for the product topology. ,Θα

Page 371: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.3. COUNTABLY GENERATED σ-ALGEBRAS 359

This last example will be discussed in much greater detail later on. It permits occasionallyreducing the discussion of properties for general Polish spaces to an investigation of thecorresponding properties of N∞, the structure of the latter space being more easily accessiblethan that of a general space. We apply Example 4.3.23 directly to show that all open subsetsof a metric space X with a countable base can be represented through a single closed set inN∞ ×X.

Proposition 4.3.24 Let X be a separable metric space. Then there exists an open set U ⊆N∞ ×X and a closed set F ⊆ N∞ ×X with these properties:

a. For each open set G ⊆ X there exists t ∈ N∞ such that G = Ut.

b. For each closed set C ⊆ X there exists t ∈ N∞ such that C = Ft.

Proof 0. It is enough to establish the property for open sets; taking complements will proveit for closed ones.

1. Let (Vn)n∈N be a basis for the open sets in X with Vn 6= ∅ for all n ∈ N. Define

U := 〈t, x〉 | t ∈ N∞, x ∈⋃n∈N

Vtn,

then U ⊆ N∞ ×X is open. In fact, let 〈t, x〉 ∈ U , then there exists n ∈ N with x ∈ Vn, thus〈t, x〉 ∈ Θn × Vn ⊆ U, and Θn × Vn is open in the product.

2. Let G ⊆ X be open. Because (Vn)n∈N is a basis for the topology, there exists a sequencet ∈ N∞ with G =

⋃n∈N Vtn = Ut. a

The set U is usually called a universally open set, similar for F , which is universally closed.Universally

openThese universal sets will be used rather heavily when we discuss analytic sets.

We have seen that a closed subset of a Polish space is a Polish space in its own right; a similarargument shows that an open subset of a Polish space is Polish as well. Both observationsturn out to be special cases of the characterization of Polish subspaces through Gδ-sets.

We recall for this characterization an auxiliary statement which permits the extension of acontinuous map from a subspace to a Gδ-set containing it — just far enough to be interestingto us. This has been given in Lemma 3.5.24, to which we refer.

This technical Lemma is an important step in establishing a far reaching characterization ofsubspaces of Polish spaces that are Polish in their own right. A subset X of a Polish spaceis a Polish space in its own right iff it is a Gδ-set. We will present Kuratowski’s proof forit. It is not difficult to show that X must be a Gδ-set, using Lemma 3.5.24. This is done inLemma 4.3.25.

The tricky part, however, is the converse, and at its very center is the following idea: assumethat we have represented X =

⋂k∈NGk with each Gk open, and assume that we have a

Cauchy sequence (xn)n∈N ⊆ X with xn → x. How do we prevent x from being outside X?Well, what we will do is to set up Kuratowski’s trap, preventing the sequence to wander off.

Kuratowski’strap

The trap is a new complete and equivalent metric D, which is makes it impossible for the

Page 372: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

360 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

sequence to behave in a bad way. So if x is trapped to be an element of X, we may concludethat X is complete, and the assertion may be established.

Before we begin with the easier half, we fix a Polish space Y and a complete metric d onY .

Lemma 4.3.25 If X ⊆ Y is a Polish space, then X is a Gδ-set.

Proof X is complete, hence closed in Y . The identity idX : X → Y can be extendedcontinuously by Lemma 3.5.24 to a Gδ-set G with X ⊆ G ⊆ Xa, thus G = X, so X is aGδ-set. a

Now let X =⋂k∈NGk with Gk open for all k ∈ N. In order to prepare for Kuratowski’s trap,

we define

fk(x, x′) :=

∣∣ 1

d(x, Y \Gk)− 1

d(x′, Y \Gk)∣∣

for x, x′ ∈ X. Because Gk is open, we have x ∈ Gk iff d(x, Y \Gk) > 0, so fk is a finite andcontinuous function on X ×X. Now let

Fk(x, x′) :=

fk(x, x′)

1 + fk(x, x′),

D(x, x′) := d(x, x′) +∑k∈N

2−k · Fk(x, x′).

for x, x′ ∈ X.

Then D is a metric on X, and the metrics d and D are equivalent on X. Because d(x, x′) ≤D(x, x′), it is clear that the identity id : (X,D)→ (X, d) is continuous, so it remains to showthat id : (X, d) → (X,D) is continuous. Let x ∈ X be given, and let ε > 0, then we find` ∈ N such that

∑k>` 2−j · Fk(x, x′) < ε/3 for all x′ ∈ X. For k = 1, . . . , ` there exists δj

such that Fj(x, x′) < ε/(3 · `), whenever d(x, x′) < δj , since x 7→ d(x, Y \Gj) is positive and

continuous. Thus define δ := minε/3, δ1, . . . , δ`, then d(x, x′) < δ implies

D(x, x′) ≤ d(x, x′) +∑k=1

2−j · Fj(x, x′) +ε

3<ε

3+∑k=1

ε

3 · `+ε

3= ε.

Thus (X, d) and (X,D) have in fact the same open sets.

When establishing that (X,D) is complete, we spring Kuratowski’s trap. Let (xn)n∈N be aD-Cauchy sequence. Then this sequence is also a d-Cauchy sequence, thus we find x ∈ Y suchthat xn → x, because (Y, d) is complete. We claim that x ∈ X. In fact, if x 6∈ X, we find G`with x 6∈ G`, so that we can find for each ε > 0 some index nε ∈ N with F`(xn, xm) ≥ 1 − εfor n,m ≥ nε. But then D(xn, xm) ≥ (1 − ε)/2` for n,m ≥ nε, so that (xn)n∈N cannot be aD-Cauchy sequence. Consequently, X is complete, hence closed.

Thus we have established:

Theorem 4.3.26 Let Y be a Polish space. Then the subspace X ⊆ Y is a Polish space iff Xis a Gδ-set. a

In particular, open and closed subsets of Polish spaces are Polish spaces in their subspacetopology. Conversely, each Polish space can be represented as a Gδ-set in the Hilbert cube

Page 373: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.3. COUNTABLY GENERATED σ-ALGEBRAS 361

[0, 1]∞; this is the famous and very useful characterization of Polish spaces due to Alexan-drov [Kur66, III.33.VI].

Theorem 4.3.27 (Alexandrov) Let X be a separable metric space, then X is homeomorphicto a subspace of the Hilbert cube. If X is Polish, this subspace is a Gδ.

Proof

0. The idea is to take a countable and dense subset D of X, and to map each element x ∈ X Ideato the distance it has to each element of D. Since we may assume without loss of generalitythat the metric is bounded, say, by 1, this yields an embedding into the cube [0, 1]∞, whichis compact by Tihonov’s Theorem 3.2.12. This map is investigated, and Theorem 4.3.26 isapplied.

1. We may and do assume again that the metric d is bounded by 1. Let (xn)n∈N be acountable and dense subset of X, and put

f(x) := 〈d(x, x1), d(x, x2), . . .〉.

Then f is injective and continuous. Define g : f[X]→ X as f−1, then g is continuous as

well: assume that f(ym) → f(y) for some y, hence limm→∞ d(ym, xn) = d(y, xn) for eachn ∈ N. Since (xn)n∈N is dense, we find for a given ε > 0 an index n with d(y, xn) < ε;by construction we find for n an index m0 with d(ym, xn) < ε whenever m > m0. Thusd(ym, y) < 2 · ε for m > m0, so that ym → y. This demonstrates that g is continuous, thus fis a homeomorphism.

2. If X is Polish, f[X]⊆ [0, 1]∞ is Polish as well. Thus the second assertion follows from

Theorem 4.3.26. a

Compact metric spaces are Polish. It is inferred from Tihonov’s Theorem that the Hilbertcube [0, 1]∞ is compact, because the unit interval [0, 1] is compact by the Heine-Borel Theo-rem 1.5.46 . Thus Alexandrov’s Theorem 4.3.27 embeds a Polish space as a Gδ into a compactmetric space, the closure of which will be compact.

4.3.2 Manipulating Polish Topologies

We will show now that a Borel map between Polish spaces can be turned into a continuousmap. Specifically, we will show that, given a measurable map between Polish spaces, we canfind on the domain a finer Polish topology with the same Borel sets which renders the mapcontinuous. This will be established through a sequence of auxiliary statements, each of whichwill be of interest and of use in its own right.

We fix for the discussion to follow a Polish space X with topology τ . Recall that a set isclopen in a topological space iff it is both closed and open.

Lemma 4.3.28 Let F be a closed set in X. Then there exists a Polish topology τ ′ such thatτ ⊆ τ ′ (hence τ ′ is finer than τ), F is clopen in τ ′, and B(τ) = B(τ ′).

Proof Both F and X \F are Polish by Theorem 4.3.26, so the topological sum of these Polishspaces is Polish again by Lemma 4.3.22. The sum topology is the desired topology. a

Page 374: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

362 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

We will now add a sequence of certain Borel sets to the topology; this will happen step bystep, so we should know how to manipulate a sequence of Polish topologies. This is explainednow.

Lemma 4.3.29 Let (τn)n∈N be a sequence of Polish topologies τn with τ ⊆ τn.

1. The topology τ∞ generated by⋃n∈N τn is Polish.

2. If τn ⊆ B(τ), then B(τ∞) = B(τ).

Proof 1. The product∏n∈N(Xn, τn) is by Lemma 4.3.22 a Polish space, where Xn = X

for all n. Define the map f : X →∏n∈NXn through x 7→ 〈x, x, . . .〉, then f is τ∞-

∏n∈N τn-

continuous by construction. One infers that f[X]

is a closed subset of∏n∈NXn: if (xn)n∈N /∈

f[X], take xi 6= xj with i < j, and let Gi and Gj be disjoint open neighborhoods of xi resp.

xj . Then ∏`<i

X` ×Gi ×∏i<`<j

X` ×Gj ×∏`>j

X`

is an open neighborhood of (xn)n∈N that is disjoint from f[X]. By Lemma 4.3.21, the latter

set is Polish. On the other hand, f is a homeomorphism between (X, τ∞) and f[X], which

establishes part 1.

2. τn has a countable basis Ui,n | i ∈ N, with Ui,n ∈ B(τ), since τn ⊆ B(τ). This impliesthat τ∞ has Ui,n | i, n ∈ N as a countable basis, which entails B(τ∞) ⊆ B(τ). The otherinclusion is obvious, giving part 2. a

As a consequence, we may add a Borel set to a Polish topology as a clopen set withoutdestroying the property of the space to be Polish or changing the Borel sets. This is extendednow to sequences of Borel sets, as we will see now.

Proposition 4.3.30 If (Bn)n∈N is a sequence of Borel sets in X, then there exists a Polishtopology τ0 on X such that τ0 is finer than τ , τ and τ0 have the same Borel sets, and eachBn is clopen in τ0.

Proof 1. We show first that we may add just one Borel set to the topology without changingthe Borel sets. In fact, call a Borel set B ∈ B(τ) neat if there exists a Polish topology τBthat is finer than τ such that B is clopen with respect to τB, and B(τ) = B(τB). Put

H := B ∈ B(τ) | B is neat.

Then τ ⊆ H, and each closed set is a member ofH by Lemma 4.3.28. Furthermore, H is closedunder complements by construction, and closed under countable unions by Lemma 4.3.29.Thus we may now infer that H = B(τ), so that each Borel set is neat.

2. Now construct inductively Polish topologies τn that are finer than τ with B(τ) = B(τn).Start with τ0 := τ . Adding Bn+1 to the Polish topology τn according to the first partyields a finer Polish topology τn+1 with the same Borel sets. Thus the assertion follows fromLemma 4.3.29. a

We are in a position now which permits turning a Borel map into a continuous one, wheneverthe domain is Polish and the range is a second countable metric space.

Page 375: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.3. COUNTABLY GENERATED σ-ALGEBRAS 363

Proposition 4.3.31 Let Y a separable metric space with topology ϑ. If f : X → Y is aB(τ)-B(ϑ)-Borel measurable map, then there exists a Polish topology τ ′ on X such that τ ′ isfiner than τ , τ and τ ′ have the same Borel sets, and f is τ ′-ϑ continuous.

MakeBorel f

continuous

Proof The metric topology ϑ is generated from the countable basis (Hn)n∈N. Constructfrom the Borel sets f−1

[Hn

]and from τ a Polish topology τ ′ according to Proposition 4.3.30.

Because f−1[Hn

]∈ τ ′ for all n ∈ N, the inverse image of each open set from ϑ is τ ′-open,

hence f is τ ′-ϑ continuous. The construction entails τ and τ ′ having the same Borel sets.a

This property is most useful, because it permits rendering measurable maps continuous, whenthey go into a second countable metric space (thus in particular into a Polish space).

As a preparation for dealing with analytic sets, we will show now that each Borel subsetof the Polish space X is the continuous image of N∞. We begin with a reduction of theproblem space: it is sufficient to establish this property for closed sets. This is justified by thefollowing observation. The proof is sketched by indicating the technical arguments withoutgiving, however, the somewhat laborious but not particularly nutritious details.

Lemma 4.3.32 Assume that each closed set in the Polish space X is a continuous image ofN∞. Then each Borel set of X is a continuous image of N∞.

Proof (Sketch) 0. The plan of the proof is to extend the assumption that each closed Planset to be the image of N∞ to all Borel sets; this assumption is verified in the subsequentProposition 4.3.33; note that a closed set is a Polish space in its own right by Theorem 4.3.26.The extension is done by applying the principle of good sets.

1. LetG := B ∈ B(X) | B = f

[N∞]

for f : N∞ → X continuous

be the set of all good guys. Then G contains by assumption all closed sets. We show that Gis closed under countable unions and countable intersections. Then the assertion will followfrom Lemma 4.3.3.

2. Suppose Bn = fn[N∞]

for the continuous map fn, then

M := 〈t1, t2, . . .〉 | f1(t1) = f2(t2) = . . .

is a closed subset of (N∞)∞, and defining f : 〈t1, t2, . . .〉 7→ f1(t1) yields a continuous mapf : M → X with f

[M]

=⋂n∈NBn. M is homeomorphic to N∞. Thus G is closed under

countable intersections.

3. We show that G is closed also under countable unions. In fact, let Bn ∈ G such thatBn = fn

[N∞]

with fn : N∞ → X continuous. Define

f :

N∞ → X

〈n, t1, t2, . . . , 〉 7→ fn(t1, t2, . . .).

Thusf[N∞]

=⋃n∈N

fn[N∞]

=⋃n∈N

Bn.

Page 376: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

364 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

Moreover, f is continuous. If G ⊆ X is open, we have f−1[G]

=⋃n∈Nn × f−1

n

[G]. Since

f−1n

[G]

is open for each n ∈ N, we conclude that f−1[G]

is open, so that f is indeed contin-uous. Thus G is closed under countable unions, and the assertion follows from Lemma 4.3.3.a

Thus it is sufficient to show that each closed subset of a Polish space is the continuous imageon N∞. And since a closed subset of a Polish space is Polish itself by Theorem 4.3.26, wemay restrict our attention to Polish spaces proper.

Proposition 4.3.33 For Polish X there exists a continuous map f : N∞ → X with f[N∞]

=X.

Proof 0. We will define recursively a sequence of closed sets indexed by elements of N∗ thatwill enable us to define a continuous map on N∞.

1. Let d be a metric that makes X complete. Represent X as⋃n∈NAn with closed sets An 6= ∅

such that the diameter diam(An) < 1 for each n ∈ N. Assume that for a word α ∈ N∗ of lengthk the closed set Aα 6= ∅ is defined, and write Aα =

⋃n∈NAαn with closed sets Aαn 6= ∅ such

that diam(Aαn) < 1/(k+1) for n ∈ N. This yields for every t = 〈n1, n2, . . .〉 ∈ N∞ a sequenceof nonempty closed sets (An1n2..nk)k∈N with diameter diam(An1n2..nk) < 1/k. Because themetric is complete,

⋂k∈NAn1n2..nk contains exactly one point, which is defined to be f(t).

This construction renders f : N∞ → X well defined.

2. Because we can find for each x ∈ X an index n′1 ∈ N with x ∈ An′1 , an index n′2 withx ∈ An′1n′2 , and so on. The map just defined is onto, so that f(〈n′1, n′2, n′3, . . .〉) = x forsome t′ := 〈n′1, n′2, n′3, . . .〉 ∈ N∞. Suppose ε > 0 is given. Since the diameters of the sets(An1n2...nk)k∈N tend to 0, we can find k0 ∈ N with diam(An′1n′2..n′k) < ε for all k > k0. Put

α′ := n′1n′2..n

′k0

, then Θα′ is an open neighborhood of t′ with f[Θα′]⊆ Bε,d(f(t′)). Thus

we find for an arbitrary open neighborhood V of f(t′) an open neighborhood U of t′ withf[U]⊆ V , equivalently, U ⊆ f−1

[V]. Thus f is continuous. a

Proposition 4.3.33 permits sometimes the transfer of arguments pertaining to Polish spaces toarguments using infinite sequences. Thus a specific space is studied instead of an abstractlygiven one, the former permitting some rather special constructions. This will be capitalizedon for the investigation of some astonishing properties of analytic sets which we will studynow.

4.4 Analytic Sets and Spaces

We will deal now systematically with analytic sets and spaces. One of the core results of thissection will be the Lusin Separation Theorem, which permits to separate two disjoint analyticsets through disjoint Borel sets, and its immediate consequence, the Souslin Theorem, whichsays that a set which is both analytic and co-analytic is Borel. These beautiful results turnout to be very helpful, e.g., in the investigation of Markov transition systems. In addition,they permit to state and prove a weak form of Kuratowski’s Isomorphism Theorem, statingthat a measurable bijection between two Polish spaces is an isomorphisms (hence its inverseis measurable as well).

But first the definition of analytic and co-analytic sets for a Polish space X.

Page 377: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.4. ANALYTIC SETS AND SPACES 365

Definition 4.4.1 An analytic set in X is the projection of a Borel subset of X × X. Thecomplement of an analytic set is called a co-analytic set.

One may wonder whether these projections are Borel sets, but we will show in a momentthat there are strictly more analytic sets than Borel sets, whenever the underlying Polishspace is uncountable. Thus analytic sets are a proper extension to Borel sets. On the otherhand, analytic sets arise fairly naturally, for example from factoring Polish spaces throughequivalence relations that are generated from a countable collection of Borel sets. We will seethis in Proposition 4.4.22. Consequently it is sometimes more adequate to consider analyticsets rather than their Borel cousins, e.g., when the equivalence of states in a transition systemis at stake.

This is a first characterization of analytic sets (using πX for the projection to X).

Proposition 4.4.2 Let X be a Polish space. Then the following statements are equivalentfor A ⊆ X:

1. A is analytic.

2. There exists a Polish space Y and a Borel set B ⊆ X × Y with A = πX[B].

3. There exists a continuous map f : N∞ → X with f[N∞]

= A.

4. A = πX[C]

for a closed subset C ⊆ X × N∞.

Proof The implication 1 ⇒ 2 is trivial, 2 ⇒ 3 follows from Proposition 4.3.33: B = g[N∞]

for some continuous map g : N∞ → X × Y , so put f := πX g. We obtain 3 ⇒ 4 fromthe observation that the graph 〈t, f(t)〉 | t ∈ N∞ of f is a closed subset of N∞ × X, thefirst projection of which equals A. Finally, 4 ⇒ 1 is obtained again from Proposition 4.3.33.a

As an immediate consequence we obtain that a Borel set is analytic. Just for the record:

Corollary 4.4.3 Each Borel set in a Polish space is analytic.

Proof Proposition 4.4.2 together with Proposition 4.3.33. a

The converse does not hold, as we will show now. This statement is not only of interest inits own right. Historically it initiated the study of analytic and co-analytic sets as a separatediscipline in set theory (what is called now descriptive set theory).

Proposition 4.4.4 Let X be an uncountable Polish space. Then there exists an analytic setthat is not Borel.

We show as a preparation for the proof of Proposition 4.4.4 that analytic sets are closed undercountable unions, intersections, direct and inverse images of Borel maps. Before doing that,we establish a simple but useful property of the graphs of measurable maps.

Lemma 4.4.5 Let (M,M) be a measurable space, f : M → Z be aM-B(Z)-measurable map,where Z is a separable metric space. The graph graph(f) of f is a member if M⊗B(Z).

Proof Exercise 4.9. a

Analytic sets have closure properties that are similar to those of Borel sets, but not quite thesame. Suspiciously missing from the list below is the closure under complementation (which

Page 378: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

366 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

will give rise to Souslin’s Theorem). This, of course is different from Borel sets.

Proposition 4.4.6 Analytic sets in a Polish space X are closed under countable unions andcountable intersections. If Y is another Polish space, with analytic sets A ⊆ X and B ⊆ Y ,and f : X → Y is a Borel map, then f

[A]⊆ Y is analytic in Y , and f−1

[B]

is analytic inX.

Proof 1. Using the characterization of analytic sets in Proposition 4.4.2, it is shown exactlyas in the proof to Lemma 4.3.32 that analytic sets are closed under countable unions and undercountable intersections. We trust that the reader will reproduce those arguments here.

2. Note first that for A ⊆ X the set Y × A is analytic in the Polish space Y × X byProposition 4.4.2. In fact, A = πX

[B]

with B ⊆ X × X Borel by the first part, henceY ×A = πY×X

[Y ×B

]with Y ×B ⊆ Y ×X×X Borel, which is analytic by the second part.

Since y ∈ f[A]

iff 〈x, y〉 ∈ graph(f) for some x ∈ A, we write

f[A]

= πY[Y ×A ∩ 〈y, x〉 | 〈x, y〉 ∈ graph(f)

].

The set 〈y, x〉 | 〈x, y〉 ∈ graph(f) is Borel in Y ×X by Lemma 4.4.5, so the assertion followsfor the direct image. The assertion is proved in exactly the same way for the inverse image.a

Again, the proof for Proposition 4.4.4 will be sketched only, delegating the very technicaldetails to Srivastava’s book [Sri98, Section 2.5]. We give, however, the argument for the casethat the space under consideration is our prototypical space N∞ through a pretty diagonalargument using an universal set. From this and the structural arguments used so far the readerhas no difficulties filling in the details under the leadership of the text mentioned.

Proof (of Proposition 4.4.4) 1. We will deal with the case X = N∞ first, and apply a diagonalargument. Let F ⊆ N∞×(N∞×N∞) be a universal closed set according to Proposition 4.3.24.Thus each closed set C ⊆ N∞ × N∞ can be represented as C = Ft for some t ∈ N∞. Takingfirst projections, we conclude that there exists a universal analytic set U ⊆ N∞ × N∞ suchthat each analytic set A ⊆ N∞ can be represented as Ut for some t ∈ N∞. In fact, we canwrite A =

(π′N∞×N∞

[F])t

with π′N∞×N∞ as the first projection of (N∞ × N∞)× N∞.

Now set

A := ζ | 〈ζ, ζ〉 ∈ U.

Because analytic sets are closed under inverse images f Borel maps by Proposition 4.4.6, A isan analytic set. Suppose that A is a Borel set, then N∞ \A is also a Borel set, hence analytic.Thus we find ξ ∈ N∞ such that N∞ \A = Uξ. But now

ξ ∈ A⇔ 〈ξ, ξ〉 ∈ U ⇔ ξ ∈ Uξ ⇔ ξ ∈ N∞ \A.

This is a contradiction.

2. The general case is reduced to the one treated above by observing that an uncountablePolish space contains a homeomorphic copy on N∞. But since we are interested mainly inshowing that analytic sets are strictly more general than Borel sets, we refrain from a verytechnical discussion of this case and refer the reader to [Sri98, Remark 2.6.5]. a

Page 379: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.4. ANALYTIC SETS AND SPACES 367

4.4.1 Souslin’s Separation Theorem

The representation of an analytic set through a continuous map on N∞ has the remarkableconsequence that we can separate two disjoint analytic sets by disjoint Borel sets (Lusin’sSeparation Theorem). This in turn implies a beautiful characterization of Borel sets due toSouslin which says that an analytic set is Borel iff its complement is analytic as well. Sincethe latter characterization will be most valuable to us, we will discuss it in greater detailnow.

We start with Lusin’s Separation Theorem.

Theorem 4.4.7 Given disjoint analytic sets A and B in a Polish space X, there exist disjointBorel sets E and F with A ⊆ E and B ⊆ F .

Proof

0. We investigate first what it means for two analytic sets to be separated by Borel sets, andThis is the

planshow that this property carries over to sequences of analytic sets. From this observation weargue by contradiction what it means that two analytic sets cannot be separated in a waywhich we want them to. Here the representation of analytic sets as continuous images on N∞is used. We construct in this manner a decreasing sequence of open sets with smaller andsmaller diameters, arriving eventually at a contradiction.

1. Call two analytic sets A and B separated by Borel sets iff there exist disjoint Borel setsE and F A ⊆ E and B ⊆ F . Observe that if two sequences (An)n∈N and (Bn)n∈N have theproperty that Am and Bn can be separated by Borel sets for all m,n ∈ N, then

⋃n∈NAn

and⋃m∈NBm can also be separated by Borel sets. In fact, if Em,n and Fm,n separate An

and Bm, then E :=⋂m∈N

⋃n∈NEm,n and F :=

⋃m∈N

⋂n∈N Fm,n separate

⋃n∈NAn and⋃

m∈NBm.

2. Now suppose that A = f[N∞]

and B = g[N∞]

cannot be separated by Borel sets,where f, g : N∞ → X are continuous and chosen according to Proposition 4.4.2. BecauseN∞ =

⋃j∈NΘj , (Θα is defined in Example 4.3.23 for α ∈ N∗), we find indices k1 and `1

such that f[Θj1]

and g[Θ`1]

cannot be separated by Borel sets. For the same reason, thereexist indices k2 and `2 such that f

[Θj1j2

]and g

[Θ`1`2

]cannot be separated by Borel sets.

Continuing in this way, we define infinite sequences κ := 〈k1, k2, . . .〉 and λ := 〈`1, `2, . . .〉such that for each n ∈ N the sets f

[Θj1j2...jn

]and g

[Θ`1`2...`n

]cannot be separated by Borel

sets.

Because f(κ) ∈ A and g(λ) ∈ B, we know f(κ) 6= g(λ), so we find ε > 0 with d(f(κ), g(λ)) <2 · ε. But we may choose n large enough so that both f

[Θj1j2...jn

]and g

[Θ`1`2...`n

]have a

diameter smaller than ε each. This is a contradiction since we now have separated these setsby open balls. a

We obtain as a consequence Souslin’s Theorem.

Theorem 4.4.8 (Souslin) Let A be an analytic set in a Polish space. If X \A is analytic,Souslin’sTheorem

then A is a Borel set.

Page 380: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

368 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

Proof Let A and X \A be analytic, then they can be separated by disjoint Borel sets E withA ⊆ E and F with X \A ⊆ F by Lusin’s Theorem 4.4.7. Thus A = E is a Borel set. a

Souslin’s Theorem is important when one wants to show that a set is a Borel set that isgiven for example through the image of another Borel set. A typical scenario for its useis establishing for a Borel set A and a Borel map f : X → Y that both C = f

[A]

andY \ C = f

[X \ A

]hold. Then one infers from Proposition 4.4.6 that both C and Y \ C are

analytic, and from Souslin’s Theorem that A is a Borel set. This is a first simple example(see also Lemma 2.6.44):

Proposition 4.4.9 Let f : X → Y be surjective and Borel measurable, where X and Yare Polish. Assume that A ∈ ΣΣΣker(f)(B(X)), hence A ∈ B(X) is ker (f)-invariant. Thenf[A]∈ B(Y ).

Proof Put C := f[A], D := f

[X \ A

], then both C and D are analytic sets by Proposi-

tion 4.4.6. Clearly Y \ C ⊆ D. For establishing the other inclusion, let y ∈ D, hence thereexists x 6∈ A with y = f(x). But y 6∈ C, for otherwise there exists x′ ∈ A with y = f(x′),which implies x ∈ A. Thus y ∈ Y \ C, hence D ⊆ Y \ C, so that we have shown D = Y \ C.We infer f

[A]∈ B(Y ) now from Theorem 4.4.8. a

This yields the following observation as an immediate consequence. It will be extended toanalytic spaces in Proposition 4.4.13 with essentially the same argument.

Corollary 4.4.10 Let f : X → Y be measurable and bijective with X, Y Polish. Then f isa Borel isomorphism. a

We state finally Kuratowski’s Isomorphism Theorem.

Theorem 4.4.11 Any two Borel sets of the same cardinality contained in Polish spaces areBorel isomorphic. a

The proof requires a reduction to the Cantor ternary set, using the tools we have discussedhere so far. Since giving the proof would lead us fairly deep into the Wonderland of DescriptiveSet Theory, we do not give it here and refer rather to [Sri98, Theorem 3.3.13], [Kec94, Section15.B] or [KM76, p. 442].

We make the properties of analytic sets a bit more widely available by introducing analyticspaces. Roughly, an analytic space is Borel isomorphic to an analytic set in a Polish space;to be more precise:

Definition 4.4.12 A measurable space (M,M) is called an analytic space iff there existsa Polish space X and an analytic set A in X such that the measurable spaces (M,M) and(A,B(X)∩A) are Borel isomorphic. The elements of M are then called the Borel sets of M .M is denoted by B(M).

We will omit the σ-algebra from the notation of an analytic space.

Analytic spaces share many favorable properties with analytic sets, and with Polish spaces,but they are a wee bit more general: whereas an analytic set lives in a Polish space, an analyticspace does only require a Polish space to sit in the background somewhere and to be Borelisomorphic to it. This makes life considerably easier, since we are for this reason not obligedto present a Polish space directly when dealing with properties of analytic spaces. We willdemonstrate the use (and power) of the structure theorems studied above for investigating

Page 381: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.4. ANALYTIC SETS AND SPACES 369

properties of analytic spaces and their σ-algebras. The most helpful of these theorems willturn out to be Souslin’s Theorem, which can be applied for showing that a set is a Borel setby showing that it is an analytic set, and that its complement is analytic as well.

Take a Borel measurable bijection between two Polish spaces. It is not a priori clear whether ornot this map is an isomorphism. Souslin’s Theorem gives a helpful hand here as well. We willneed this property in a moment for a characterization of countably generated sub-σ-algebrasof Borel sets, but it appears to be interesting in its own right.

Proposition 4.4.13 Let X and Y be analytic spaces and f : X → Y be a bijection that isBorel measurable. Then f is a Borel isomorphism.

Proof 1. It is no loss of generality to assume that we can find Polish spaces P and Q suchthat X and Y are subsets of P resp. Q. We want to show that f

[X ∩B

]is a Borel set in Y ,

whenever B ∈ B(P ) is a Borel set. For this we need to find a Borel set G ∈ B(Q) such thatf[X ∩B

]= G ∩Q.

2. Clearly, both f[X∩B

]and f

[X\B

]are analytic sets inQ by Proposition 4.4.6, and because

f is injective, they are disjoint. Thus we can find a Borel set G ∈ B(Q) with f[X∩B

]⊆ G∩Y ,

and f[X \ B

]⊆ Q \ G ∩ Y . Because f is surjective, we have f

[X ∩ B

]∪ f[X \ B

], thus

f[X ∩B

]= G ∩ Y a

Separable measurable spaces are characterized through subsets of Polish spaces.

Lemma 4.4.14 The measurable space (M,M) is separable iff there exists a Polish space Xand a subset P ⊆ X such that the measurable spaces (M,M) and (P,B(X) ∩ P ) are Borelisomorphic.

It should be noted that we do not assume P to be a measurable subset of X.

Proof 1. Because B(X) is countably generated for a Polish space X by Lemma 4.3.2, theσ-algebra B(X) ∩ P is countably generated. Since this property is not destroyed by Borelisomorphisms, the condition above is sufficient.

2. It is also necessary by Proposition 4.3.10, because∏n∈N(0, 1,P (0, 1)

)is a Polish space

by Lemma 4.3.22. a

Thus analytic spaces are separable measurable spaces, see Definition 4.3.8.

Corollary 4.4.15 An analytic space is a separable measurable space. a

Let us have a brief look at countably generated sub-σ-algebras of an analytic space. This willhelp establishing for example that the factor space for a particularly interesting and importantclass of equivalence relations is an analytic space. The following statement, which is sometimesreferred to as the Unique Structure Theorem [Arv76, Theorem 3.3.5], says essentially that theBorel sets of an analytic space are uniquely determined by being countably generated and byseparating points. It comes as a consequence of our discussion of Borel isomorphisms.

Proposition 4.4.16 Let X be an analytic space, B0 a countably generated sub-σ-algebra ofB(X) that separates points. Then B0 = B(X).

Proof 1. (X,B0) is a separable measurable space, so there exists a Polish space P and asubset Y ⊆ P of P such that (X,B0) is Borel isomorphic to P,B(P ) ∩ Y by Lemma 4.4.14.Let f be this isomorphism, then B0 = f−1

[B(P ) ∩ Y

].

Page 382: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

370 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

2. f is a Borel map from (X,B(X)) to (Y,B(P ) ∩ Y ), thus Y is an analytic set with B(Y ) =B(X) ∩ P by Proposition 4.4.14. By Proposition 4.4.6, f is an isomorphism, hence B(X) =f−1

[B(P ) ∩ Y

]. But this establishes the assertion. a

This gives an interesting characterization of measurable spaces to be analytic, provided theyhave a separating sequence of sets. Note that the sequence of sets in the following statementis required to separate points, but we do not assume that it generates the σ-algebra for theunderlying space. The statement says that it does, actually.

Lemma 4.4.17 Let X be analytic, f : X → Y be B(X)-B-measurable and onto for a mea-surable space (Y,B), which has a sequence of sets in B that separate points. Then (Y,B) isanalytic.

Proof 1. The idea is to show that an arbitrary measurable set is contained in the σ-algebraPlan

generated by the sequence in question. Thus let (Bn)n∈N be the sequence of sets that separatespoints, take an arbitrary set N ∈ B and define the σ-algebra B0 := σ(Bn | n ∈ N ∪ N).We want to show that N ∈ σ(Bn | n ∈ N), and we show this in a roundabout way byshowing that B = B(Y ) = B0. Here is, how.

2. (Y,B0) is a separable measurable space, so by Lemma 4.4.14 we can find a Polish spaceP with Y ⊆ P and B0 as the trace of B(P ) on Y . Proposition 4.4.6 tells us that Y = f

[X]

is analytic with B0 = B(Y ), and from Proposition 4.4.16 it follows that B(Y ) = σ(Bn |n ∈ N). Thus N ∈ B(Y ), and since N ∈ B is arbitrary, we conclude B ⊆ B(Y ), thusB ⊆ B(Y ) = σ(Bn | n ∈ N) ⊆ B. a

4.4.2 Smooth Equivalence Relations

We will use Lemma 4.4.17 for demonstrating that factoring an analytic space through asmooth equivalence relation yields an analytic space again. This class of relations will bedefined now and briefly characterized here. We give a definition in terms of a determiningsequence of Borel sets and relate other characterizations of smoothness in Lemma 4.4.21.

Definition 4.4.18 Let X be an analytic space and ρ an equivalence relation on X. Then ρis called smooth iff there exists a sequence (An)n∈N of Borel sets such that

Smoothequivalencerelation

x ρ x′ ⇔ ∀n ∈ N : [x ∈ An ⇔ x′ ∈ An].

(An)n∈N is said to determine the relation ρ.

Example 4.4.19 Given an analytic space X, let M : X X be a transition kernel whichinterprets the modal logic presented in Example 4.1.11. Define for a formula ϕ and an elementof x the relation M,x |= ϕ iff x ∈ [[ϕ]]M , thus M,x |= ϕ indicates that formula ϕ is valid instate x. Define the equivalence relation ∼ on X through

x ∼ x′ ⇐⇒ ∀ϕ : [M,x |= ϕ iff M,x′ |= ϕ]

Thus x and x′ cannot be separated through a formula of the logic. Because the logichas only countably many formulas, the relation is smooth with the countable set [[ϕ]]M |ϕ is a formula as determining relation ∼. ,

Page 383: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.4. ANALYTIC SETS AND SPACES 371

We obtain immediately from the definition that a smooth equivalence relation — seen as asubset of the Cartesian product — is a Borel set:

Corollary 4.4.20 Let ρ be a smooth equivalence relation on the analytic space X, then ρ isa Borel subset of X ×X.

Proof Suppose that (An)n∈N determines ρ. Since x ρ x′ is false iff there exists n ∈ N with〈x, x′〉 ∈ (An × (X \An)) ∪ ((X \An)×An) , we obtain

(X ×X) \ ρ =⋃n∈N

(An × (X \An)) ∪ ((X \An)×An) .

This is clearly a Borel set in X ×X. a

The following characterization of smooth equivalence relations is sometimes helpful and showsthat it is not necessary to focus on sequences of sets. It indicates that the kernels of Borelmeasurable maps and smooth relations are intimately related.

Lemma 4.4.21 Let ρ be an equivalence relation on an analytic set X. Then these conditionsare equivalent:

1. ρ is smooth.

2. There exists a sequence (fn)n∈N of Borel maps fn : X → Z into an analytic space Zsuch that ρ =

⋂n∈N ker (fn) .

3. There exists a Borel map f : X → Y into an analytic space Y with ρ = ker (f) .

Proof The proof is essentially an expansion of the definition of smoothness, and the observa-Guide

throughthe proof

tion, that the kernel of a Borel map into an analytic is determined through the inverse imagesof a countable generator. Here we go:

1 ⇒ 2: Let (An)n∈N determine ρ, then

x ρ x′ ⇔ ∀n ∈ N : [x ∈ An ⇔ x′ ∈ An]

⇔ ∀n ∈ N : χAn(x) = χAn(x′).

Thus take Z = 0, 1 and fn := χAn .

2 ⇒ 3: Put Y := Z∞. This is an analytic space in the product σ-algebra, and

f :

X → Y

x 7→ (fn(x))n∈N

is Borel measurable with f(x) = f(x′) iff ∀n ∈ N : fn(x) = fn(x′).

3 ⇒ 1: Since Y is analytic, it is separable; hence the Borel sets are generated through asequence (Bn)n∈N which separates points. Put An := f−1

[Bn], then (An)n∈N is a sequence

of Borel sets, because the base sets Bn are Borel in Y , and because f is Borel measurable.We claim that (An)n∈N determines ρ:

f(x) = f(x′) ⇔ ∀n ∈ N : [f(x) ∈ Bn ⇔ f(x′) ∈ Bn]

(since (Bn)n∈N separates points in Z)

⇔ ∀n ∈ N : [x ∈ An ⇔ x′ ∈ An].

Page 384: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

372 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

Thus 〈x, x′〉 ∈ ker (f) is equivalent to being determined by a sequence of measurable sets.a

Thus each smooth equivalence relation may be represented as the kernel of a Borel map, andvice versa. This is an important property which we will put to use frequently.

The interest in analytic spaces comes from the fact that factoring an analytic space througha smooth equivalence relation will result in an analytic space again. This requires first andforemost the definition of a measurable structure induced by the relation. The natural choiceis the structure imposed by the factor map. The final σ-algebra on X/ρ with respect to theBorel sets on X and the factor map ηρ will be chosen; it is denoted by B(X)/ρ. Recall thatB(X)/ρ is the largest σ-algebra C on X/ρ rendering ηρ a B(X)-C-measurable map. Then itturns out that B(X/ρ) coincides with B(X)/ρ :

Proposition 4.4.22 Let X be an analytic space, and assume that α is a smooth equivalencerelation on X. Then X/α is an analytic space.

Proof In accordance with the characterization of smooth relations in Lemma 4.4.21 weassume that α is given through a sequence (fn)n∈N of measurable maps fn : X → R. Thefactor map is measurable and onto. Put En,r := [x]α | x ∈ X, fn(x) < r, then E := En,r |n ∈ N, r ∈ Q is a countable set of element of the factor σ-algebra that separates points. Theassertion now follows without difficulties from Lemma 4.4.17. a

Let us have a look at invariant sets for an equivalence relation α. Recall that a subset A ⊆ Xis invariant for the equivalence relation α on X iff A is the union of α-equivalence classes,see page 330. Thus A ⊆ X is α-invariant iff x ∈ A and x α x′ implies x′ ∈ A. For example,if α = ker (f), then A is α-invariant iff A is what we called f -invariant on page 163, i.e., iffx ∈ A and f(x) = f(x′) imply x′ ∈ A.

Denote by

A∇ :=⋃[x]α | x ∈ A

the smallest α-invariant set containing A, then we have the representationA∇

A∇ = π2

[α ∩ (X ×A)

],

because x′ ∈ A∇ iff there exists x with 〈x′, x〉 ∈ X ×A.

An equivalence relation on X is called analytic resp. closed iff it constitutes an analytic resp.closed subset of the Cartesian product X ×X.

If X is a Polish space, we know that the smooth equivalence relation α ⊆ X ×X is a Borelsubset by Corollary 4.4.20. We want to show that, conversely, each closed equivalence relationα ⊆ X ×X is smooth. This requires the identification of a countable set which generates therelation, and for this we require the following auxiliary statement. It may be called separationthrough invariant sets.

Lemma 4.4.23 Let ρ ⊆ X × X be an analytic equivalence relation on the Polish space Xwith two disjoint analytic sets A and B. If B is ρ-invariant, then there exists a ρ-invariantBorel set C with A ⊆ C and B ∩ C = ∅.

Page 385: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.4. ANALYTIC SETS AND SPACES 373

Proof 0. This is the plan for the proof. If D is an analytic set, D∇ is; this follows from theApproachrepresentation of D∇ above, and from Proposition 4.4.6. It is fundamental for the rest of theproof. We construct a sequence (An)n∈N of invariant analytic sets, and a sequence (Bn)n∈Nof Borel sets with these properties: An ⊆ Bn ⊆ An+1, hence Bn is sandwiched betweenconsecutive elements of the first sequence, A ⊆ A1, and B ∩Bn = ∅ for all n ∈ N.

1. Define A1 := A∇, then A ⊆ A1, and A1 is ρ-invariant. Since B is ρ-invariant as well,we conclude A1 ∩ B = ∅: if x ∈ A1 ∩ B, we find x′ ∈ A with x ρ x′, hence x′ ∈ B,a contradiction. Proceeding inductively, assume that we have already chosen An and Bnwith the properties described above, then put An+1 := B∇n , then An+1 is ρ-invariant andanalytic, also An+1∩B = ∅ by the argument above. Hence we can find a Borel set Bn+1 withAn+1 ⊆ Bn+1 and Bn+1 ∩B = ∅.

2. Now put C :=⋃n∈NBn. Thus C ∈ B(X) and C ∩B = ∅, so it remains to show that C is

ρ-invariant. Let x ∈ C and x ρ x′. Since x ∈ Bn ⊆ B∇n ⊆ Bn+1, we conclude x′ ∈ Bn+1 ⊆ C,and we are done. a

We use this observation now for a closed equivalence relation. Note that the assumption onbeing analytic in the proof above was made use of in order to establish that the invariant hullA∇ of an analytic set A is analytic again.

Proposition 4.4.24 A closed equivalence relation on a Polish space is smooth.

Proof 0. Let X be a Polish space, and α ⊆ X × X be a closed equivalence relation. WeThis is

how theproofworks

have to find a sequence (An)n∈N of Borel sets which determines α. This will be constructedthrough a countable base for the topology in a somewhat roundabout manner.

1. Since X is Polish, it has a countable basis G. Because α is closed, we can write

(X ×X) \ α =⋃Un × Um | Un, Um ∈ G0, Un ∩ Um = ∅

for some countable subset G0 ⊆ G. Fix Un and Um, then also U∇n and Um are disjoint.Select the invariant Borel set An such that Un ⊆ An and An ∩ Um = ∅; this is possible byLemma 4.4.23.

2. We claim that(X ×X) \ α =

⋃n∈N

(An × (X \An).

In fact, if 〈x, x′〉 6∈ α, select Un and Um with 〈x, x′〉 ∈ Un×Um ⊆ An×(X \An). If, conversely,〈x, x′〉 ∈ An × (X \ An), then 〈x, x′〉 ∈ α implies by the invariance of An that x′ ∈ An, acontradiction. a

The Blackwell-Mackey-Theorem analyzes those Borel sets that are unions of A-atoms for asub-σ-algebra A ⊆ B(X). If A is countably generated by, say, (An)n∈N, then it is not difficultto see that an atom in A can be represented as⋂

i∈TAi ∩

⋂i∈N\T

(X \Ai)

for a suitable subset T ⊆ N, see Proposition 4.3.14. It constructs a measurable map f so thatthe set under consideration is ker (f)-invariant, which will be helpful in an the application ofthe Souslin Theorem. But let’s see.

Page 386: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

374 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

Theorem 4.4.25 (Blackwell-Mackey) Let X be an analytic space and A ⊆ B(X) be acountably generated sub-σ-algebra of the Borel sets of X. If B ⊆ X is a Borel set that is aunion of atoms of A, then B ∈ A.

The idea of the proof is to show that f[B]

and f[X \ B

]are disjoint analytic sets for the

measurable map f : X → 0, 1∞, and to conclude that B = f−1[C]

for some Borel set C,Idea of theproof

which will be supplied to us through Souslin’s Theorem. Using 0, 1∞ is suggested throughthe countable base for the σ-algebra, because we can then use the indicator functions ofthe base elements. The space 0, 1∞ is compact, and has well-known properties, so it is apleasant enough choice.

Proof Let A be generated by (An)n∈N, and define

f : X → 0, 1∞

throughx 7→ 〈χA1(x), χA2(x), χA3(x), . . .〉.

Then f is A-B(0, 1∞)-measurable. We claim that f[B]

and f[X \B

]are disjoint. Suppose

not, then we find t ∈ 0, 1∞ with t = f(x) = f(x′) for some x ∈ B, x′ ∈ X \ B. Because Bis the union of atoms, we find a subset T ⊆ N with x ∈ An, provided n ∈ T , and x /∈ An,provided n /∈ T . But since f(x) = f(x′), the same holds for x′ as well, which means thatx′ ∈ B, contradicting the choice of x′.

Because f[B]

and f[X\B

]are disjoint analytic sets, we find through Souslin’s Theorem 4.4.8

a Borel set C withf[B]⊆ C, f

[X \B

]∩ C = ∅.

Thus f[B]

= C, so that f−1[f[B]]

= f−1[C]∈ A. We show that f−1

[f[B]]

= B. It isclear that B ⊆ f−1

[f[B]]

, so assume that f(b) ∈ f[B], so f(b) = f(b′) for some b′ ∈ B.

By construction, this means b ∈ B, since B is an union of atoms, hence f−1[f[B]]⊆ B.

Consequently, B = f−1[C]∈ A. a

When investigating modal logics, one wants to be able to identify the σ-algebra which isdefined by the validity sets of the formulas. This can be done through the Blackwell-Mackey-Theorem and is formulated for general smooth equivalence relations with the proviso of beingused for the logics later on.

Proposition 4.4.26 Let ρ be a smooth equivalence relation on the Polish space X, and as-sume that (An)n∈N generates ρ. Then

1. σ(An | n ∈ N) is the σ-algebra ΣΣΣρ(X) of ρ-invariant Borel sets,

2. B(X/ρ) = σ(ηρ[An]| n ∈ N.

Proof 1. The σ-algebra ΣΣΣρ(X) = ΣΣΣρ(B(X)) of ρ-invariant Borel sets is introduced onpage 330. We have to show that ΣΣΣρ(X) = σ(An | n ∈ N.

“⊇”: Each An is a ρ-invariant Borel set.

“⊆”: Let B be an ρ-invariant Borel set, then B =⋃b∈B [b]ρ. Each class [b]ρ can be written

as[b]ρ =

⋂b∈An

An ∩⋂b 6∈An

(X \An),

Page 387: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.4. ANALYTIC SETS AND SPACES 375

thus [b]ρ ∈ σ(An | n ∈ N. Moreover, it is easy to see that the classes are the atomsof this σ-algebra, since we cannot find a proper non-empty ρ-invariant subset of anequivalence class. Thus the Blackwell-Mackey Theorem 4.4.25 shows that B ∈ σ(An |n ∈ N).

2. Now let E := σ(ηρ[An]| n ∈ N, and let g : X/ρ→ P be E-P-measurable for an arbitrary

measurable space (P,P). Thus we have for all C ∈ P

g−1[C]∈ E ⇔ η−1

ρ

[g−1[C]]∈ σ(An | n ∈ N) (since An = η−1

ρ

[ηρ[An]]

)

⇔ η−1ρ

[g−1[C]]∈ I (part 1.)

⇔ η−1ρ

[g−1[C]]∈ B(X)

Thus E is the final σ-algebra with respect to ηρ, hence equals B(X/ρ). a

The following example shows that the equivalence relation generated by a σ-algebra need notreturn this σ-algebra as its invariant sets, if the given σ-algebra is not countably generated.Proposition 4.4.26 assures us that this cannot happen in the countably generated case.

Example 4.4.27 Let C be the countable-cocountable σ-algebra on R. The equivalence rela-tion ≡C generated by C according to Example 4.1.5 is the identity. Hence is it smooth. Theσ-algebra of ≡C-invariant Borel sets equals the Borel set B(R), which is a proper superset ofC. ,

The next example is a somewhat surprising application of the Blackwell-Mackey Theorem,taken from [RR81, Proposition 57]. It shows that the set of countably generated σ-algebrasis not closed under finite intersections, hence fails to be a lattice under inclusion.

Example 4.4.28 There exist two countably generated σ-algebras the intersection of whichis not countably generated. In fact, let A ⊆ [0, 1] be an analytic set which is not Borel,then B(A) is countably generated by Corollary 4.4.15. Let f : [0, 1] → A be a bijection, andconsider C := f−1

[B(A)

], which is countably generated as well. Then D := B([0, 1])∩C is a σ-

algebra which has all singletons in [0, 1] as atoms. Assume that D is countably generated, thenD = B([0, 1]) by the Blackwell-Mackey-Theorem 4.4.25. But this means that C = B([0, 1]), sothat f : [0, 1] → A is a Borel isomorphism, hence A is a Borel set in [0, 1], contradicting theassumption. ,

Among the consequences of Example 4.4.28 is the observation that the set of smooth equiv-alence relations of a Polish space does not form a lattice under inclusion, but is usuallyonly a ∩-semilattice, as the following example shows. Another consequence is mentioned inExercise 4.18.

Example 4.4.29 The intersection α1 ∩ α2 of two smooth equivalence relations α1 and α2 issmooth again: if αi is generated by the Borel sets Ai,n | n ∈ N for i = 1, 2, then α1 ∩ α2 isgenerated by the Borel sets Ai,n | i = 1, 2, n ∈ N. But now take two countably generatedσ-algebras Ai, and let αi be the equivalence relations determined by them, see Example 4.1.5.Then the σ-algebra α1 ∪ α2 is generated by A1 ∩ A2, which is by assumption not countablygenerated. Hence α1 ∪ α2 is not smooth. ,

We digress briefly and establish tameness (Definition 4.1.27) for smooth equivalence rela-tions.

Page 388: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

376 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

Proposition 4.4.30 If ρ is smooth, and S is Polish, then ρ is tame.

Proof 0. There are many σ-algebras around, so let us see what we have to do. We want toProof out-line

show that

ΣΣΣρ(B(X))⊗ B([0, 1]) = ΣΣΣρ×∆(B(X ⊗ [0, 1])) (4.5)

holds. Since X is Polish and ρ is smooth, X/ρ is an analytic space, so we know from Propo-sition 4.3.16 that B(X/ρ ⊗ [0, 1]) = B(X/ρ) ⊗ B([0, 1]). In order to establish the inclusionfrom left to right in equation (4.5), we show that each member of the left hand set is ρ×∆-invariant. For the converse direction, we show that each member of the set on the right handside can be represented as the inverse image under ηρ×∆ of a set in the factor space.

1. Let G ×D ⊆ X × [0, 1] be a generator of ΣΣΣρ(B(X)) ⊗ B([0, 1]) such that G ∈ ΣΣΣρ(B(X))and D ∈ B([0, 1]). Then G×D is ρ×∆-invariant. But this means

ΣΣΣρ(B(X))⊗ B([0, 1]) = σ(G×D | G ∈ΣΣΣρ(B(X)), D ∈ B([0, 1])) ⊆ΣΣΣρ×∆(B(X ⊗ [0, 1]))

2. Now let H ∈ ΣΣΣρ×∆(B(X) ⊗ [0, 1]), then we find H0 ∈ B((/X × [0, 1])ρ×∆) such thatH = η−1

ρ×∆[H0

]. But

B(X × [0, 1]/ρ×∆) = B(X/ρ⊗ [0, 1]) = B(X/ρ)⊗ B([0, 1]),

the first equality following from (X × [0, 1])/ρ×∆ = X/ρ×[0, 1], and the second from Propo-sition 4.3.16. This is so since ρ is smooth, hence X/ρ is an analytic space, and [0, 1] is a Polishspace, hence has a countable basis. On the other hand, ηρ×∆ = ηρ × id, hence we have foundthat H = (ηρ × id)−1

[H0

]with H0 ∈ B(X/ρ)⊗B([0, 1]). This establishes the other inclusion.

a

This shows that tame equivalence relations constitute a generalization of smooth ones for thecase that we do not work in a Polish environment. —

Sometimes one starts not with a topological space and its Borel sets but rather with a mea-surable space: A standard Borel space (X,A) is a measurable space such that the σ-algebraA equals B(τ) for some Polish topology τ on X. We will not dwell on this distinction.

4.5 The Souslin Operation

The collection of analytic sets is closed under Souslin’s operation A , which we will introducenow. We will also see that complete measure spaces are another important class of measurablespaces which are closed under this operation. Each measurable space can be completed withrespect to its finite measures, so that we do not even need a topology for carrying out theconstructions ahead.

Let N+ be the set of all finite and non-empty sequences of natural numbers. Denote fort = (xn)n∈N ∈ N∞ by t|k = 〈x1, . . . , xk〉 its first k elements. Given a subset C ⊆ P (X),put

A (C) := ⋃t∈N∞

⋂k∈N

At|k | Av ∈ C for all v ∈ N+

Page 389: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.5. THE SOUSLIN OPERATION 377

Note that the outer union may be taken of more than countably many sets. A family (Av)v∈N+

is called a Souslin scheme, which is called regular if Aw ⊆ Av whenever v is an initial pieceof w. Because ⋃

t∈N∞

⋂k∈N

At|k =⋃t∈N∞

⋂k∈N

( ⋂1≤j≤k

At|j),

we can and will restrict our attention to regular Souslin schemes whenever C is closed underfinite intersections.

We will see now that each analytic set can be represented through a Souslin scheme with aspecial shape. This has some interesting consequences, among others that analytic sets areclosed under the Souslin operation.

Proposition 4.5.1 Let X be a Polish space and (Av)v∈N+ be a regular Souslin scheme ofclosed sets such that diam(Av)→ 0, as the length of v goes to infinity. Then

E :=⋃t∈N∞

⋂k∈N

At|k

is an analytic set in X. Conversely, each analytic set can be represented in this way.

Proof 1. Assume E is given through a Souslin scheme, then we represent E = f[F]

withF ⊆ N∞ a closed set and f : F → X continuous.

In fact, put

F := t ∈ N∞ | At|k 6= ∅ for all k.

Then F is a closed subset of N∞: take s ∈ N∞ \ F , then we can find k′ ∈ N with As|k′ = ∅,so that G := t ∈ N∞ | t|k′ = s|k′ is open in N∞, contains s and is disjoint to F . Nowlet t ∈ F , then there exists exactly one point f(t) ∈

⋂k∈NAt|k, since X is complete and the

diameters of the sets involved tend to zero, see Proposition 3.5.25. Then E = f[F]

followsfrom this construction, and f is continuous.

Let t ∈ F and ε > 0 be given, take x := f(t) and let B be the ball with center x and radius ε.Then we can find an index k such that At|k′ ⊆ S for all k′ ≥ k, hence U := s ∈ F | t|k = s|kis an open neighborhood of t with f

[U]⊆ B.

2. Let E be an analytic set, then E = f[N∞]

with f continuous by Proposition 4.4.2. DefineAv as the closure of the set f

[t ∈ N∞ | t|k = v

], if the length of v is k. Then clearly

E =⋃t∈N∞

⋂k∈N

At|k,

since f is continuous. It is also clear that (Av)v∈N+ is regular with diameter tending to zero.a

Before we can enter into the — fairly technical — demonstration that the Souslin operationis idempotent, we need some auxiliary statements.

The first one is readily verified.

Lemma 4.5.2 b(m,n) := 2m−1(2n − 1) defines a bijective map N × N → N. Moreover,m ≤ b(m,n) and n < n′ implies b(m,n) < b(m,n′) for all n, n′,m ∈ N. a

Page 390: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

378 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

Given k ∈ N, there exists a unique pair 〈`(k), r(k)〉 ∈ N × N with b(`(k), r(k)) = k. We willneed the functions `, r : N→ N later on. The next function is considerably more complicated,since it caters for a more involved set of parameters.

Lemma 4.5.3 Given z = (zn)n∈N ∈ (NN)N with zn = (zn,m)m∈N and t ∈ NN, define thesequence B(t, z) ∈ NN through

B(t, z)k := b(t(k), z`(k),r(k))

(k ∈ N). Then B : NN × (NN)N → NN is a bijection.

Proof 1. We show first that B is injective. Let 〈t, z〉 6= 〈t′, z′〉. If t 6= t′, we find k with t(k) 6=t′(k), so that b(t(k), z`(k),r(k)) 6= b(t′(k), z′`(k),r(k)) follows, because b is injective. Now assume

that t = t′, but z 6= z′, so we can find i, j ∈ N with zi,j 6= z′i,j . Let k := b(i, j), so that `(k) = iand r(k) = j, hence 〈t(k), z`(k),r(k)〉 6= 〈t(k), z′`(k),r(k)〉, so that B(t, z)k 6= B(t′, z′)k.

2. Now let s ∈ NN, and define t ∈ NN and z ∈ (NN)N

tk := `(sk),

zn,m := r(sb(n,m)).

Then we have for k ∈ N

B(t, z)k = b(tk, z`(k),r(k)) = b(`(sk), r(sb(`(k),r(k)))

)= b(`(sk), r(sk)

)= sk.

a

We construct maps ϕ,ψ from the maps b and B now with special properties. They will bemade use of in the proof that the Souslin operation is idempotent.

Lemma 4.5.4 There exist maps ϕ,ψ : N+ → N+ with this property: let w = B(t, z)|b(n,m),then ϕ(w) = t|m and ψ(w) = zm|n.

Proof Fix v = 〈x1, . . . , xk〉, then define for m := `(k) and n := r(k)

ϕ(v) := 〈`(x1), . . . , `(xm)〉,ψ(v) := 〈r(xb(m,1), . . . , r(xb(m,n))〉

We see from Lemma 4.5.2 that these definitions are possible.

Given t ∈ NN and z ∈ (NN)N, we put k := b(m,n) and v := B(t, z)|k, then we obtain fromthe definition of ϕ resp. ψ

ϕ(v) = 〈`(v1), . . . , `(vm)〉 = t|m

andψ(v) = 〈r(vb(m,0)), . . . , r(vb(m,n))〉 = zm|n

a

The construction shows that A (C) is always closed under countable unions and countableintersections. We are now in a position to prove a much more general observation od theSouslin operation.

Page 391: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.5. THE SOUSLIN OPERATION 379

Theorem 4.5.5 A (A (C)) = A (C).

Proof It is clear that C ⊆ A (C), so we have to establish the other inclusion. Let Dv,w |w ∈ N+ be a Souslin scheme for each v ∈ N+, and put Av :=

⋃s∈N∞

⋂m∈NDv,s|m. Then we

have

A :=⋃t∈N∞

⋂k∈N

At|k

=⋃t∈N∞

⋂k∈N

⋃s∈N∞

⋂m∈N

Dv,s|m

=⋃t∈N∞

⋃(zn)n∈N∈(N∞)N

⋂m∈N

⋂k∈N

Dt|m,zm|n

(∗)=

⋃s∈N∞

⋂k∈N

Cs|k

withCv := Dϕ(v),ψ(v)

for v ∈ N+. So we have to establish the equality marked (∗).

“⊆”: Given x ∈ A, there exists t ∈ N∞ and z ∈ (N∞)N such that x ∈ Dt|m,zm|n . Put

s := B(t, z). Let k ∈ N be arbitrary, then there exists a pair 〈m,n〉 ∈ N × N withk = b(m,n) by Lemma 4.5.2. Thus we have t|m = ϕ(s|k) and zm|n = ψ(s|k). byLemma 4.5.4, from which x ∈ Dt|m,zm|n = Cs|k follows.

“⊇”: Let s ∈ N∞ such that x ∈ Cs|k for all k ∈ N. We can find by Lemma 4.5.3 some t ∈ N∞

and z ∈ (NN)N with B(t, z) = s. Given k, there exist m,n ∈ N with k = b(m,n), henceCs|k = Dt|m,zm|n. Thus x ∈ A.

a

We obtain as an immediate consequence that analytic sets in a Polish space X are closedunder the Souslin operation. This is so because we have seen that the collection of analyticsets is contained in A

(F ⊆ X | F is closed

), so an application of Theorem 4.5.5 proves the

claim. But we can say even more.

Proposition 4.5.6 Assume that the complement of each set in C belongs to A (C), and ∅ ∈ C.Then σ(C) ⊆ A (C). In particular, analytic sets in a Polish space X are closed under theSouslin operation.

Proof We apply for establishing the general statement the principle of good sets. De-fine

G := A ∈ A (C) | X \A ∈ A (C).

Then G is closed under complementation. If (An)n∈N is a sequence in G, then⋂n∈NAn ∈ G,

because A (C) is closed under countable unions. Similarly,⋃n∈NAn ∈ G. Since ∅ ∈ G, we

may conclude that G is a σ-algebra, which contains C by assumption. Hence σ(C) ⊆ σ(G) =G ⊆ A (C). The assertion concerning analytic sets follows from Proposition 4.5.1, because aclosed set in a Gδ-set. a

With complete measure spaces we will meet an important class of measurable spaces, which isclosed under the Souslin operation. As a preparation for this we state and prove an interesting

Page 392: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

380 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

criterion for being closed under this operation. This requires the definition of a particularkind of cover.

Definition 4.5.7 Given a measurable space (X,A) and a subset A ⊆ X, we call A↑ ∈ A anA-cover of A iff

1. A ⊆ A↑.

2. For every B ∈ A with A ⊆ B, P(A↑ \B

)⊆ A.

Thus A↑ ∈ A covers A in the sense that A ⊆ A↑, and if we have another set B isA which coversA as well, then all the sets which make out the difference between A↑ and B are measurable.This last condition indicates that the “breathing space” between A and its cover A↑ is anelement of A. In a technical sense, this will give us considerable room for maneuvering, whenapplying this concept.

In addition it follows that if A ⊆ A′ ⊆ A↑ and A′ ∈ A, then A′ is also an A-cover. Thisconcept sounds fairly artificial and somewhat far fetched, but we will see that arises in anatural way when completing measure spaces. The surprising observation is that a space isclosed under the Souslin operation whenever each subset has an A-cover.

Proposition 4.5.8 Let (X,A) be a measurable space such that each subset of X has an Acover. Then (X,A) is closed under the Souslin operation.

Proof 0. The proof is a bit tricky. We first construct from a regular Souslin scheme a sequenceOutline ofthe proof

of sets which are indexed by the words over N such that each set Bw can be represented asthe union of

(Bwn

)n∈N. For each set Bw there exists an A-cover, which due to the properties

of Bw can be easier manipulated than a cover for the sets in the Souslin scheme proper, inparticular we can look at the difference between the A-cover for Bn and those for Bwn, sothat we can move backward, from longer words to shorter ones. It will then turn out that wecan represent the set defined by the given Souslin scheme through the A-cover given for theempty word ε.

1. LetA :=

⋃t∈N∞

⋂k∈N

Aa|k

with (Av)v∈N+ a regular Souslin scheme in A. Define

Bw :=⋃⋂n∈N

At|n | t ∈ N∞, w is a prefix of t.

for w ∈ N∗ = N+ ∪ ε. Then Bε = A, Bw =⋃n∈NBwn, and Bw ⊆ Aw if w 6= ε.

By assumption, there exists a minimal A-cover B↑w for Bw. We may and do assume thatB↑w ⊆ Aw, and that (B↑w)w∈N∗ is regular (we otherwise force this condition by considering the

A-cover(⋂

v prefix of w(B↑v ∩ Av))w∈N∗ instead). Now put Dw := B↑w \

⋃n∈NB

↑wn for w ∈ N∗.

We obtain from this construction Bw ⊆ B↑w =⋃n∈NB

↑wn ∈ A, hence that every subset of Dw

is in A, since B↑w is an A-cover. Thus every subset of D :=⋃w∈N∗ Dw is in A.

2. We claim that B↑ε \D ⊆ A. In fact, let x ∈ B↑ε \D, then x 6∈ Dε, so we can find k1 ∈ Nwith x ∈ B↑k1 , but x 6∈ Dn1 . Since x 6∈ Dk1 , we find k2 with x ∈ B↑k1,k2 such that x 6∈ Dk1,k2 .

Page 393: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.6. UNIVERSALLY MEASURABLE SETS 381

So we inductively define a sequence t := (kn)n∈N so that x ∈ B↑t|k for all k ∈ N. Because

B↑t|k ⊆ At|k, we conclude that x ∈ A.

3. Hence we obtain B↑ε \ A ⊆ D, and since every subset of D is in A, we conclude that

B↑ε \A ∈ A, which means that A = B↑ε \ (B↑ε \A) ∈ A. a

The concept of being closed under the Souslin operation will now be applied to universallymeasurable sets, in particular to analytic sets in a Polish space.

4.6 Universally Measurable Sets

After these technical preparations we are poised to enter the interesting world of universallymeasurable sets with the closure operations that are associated with them. We define com-plete measure spaces and show that an arbitrary (σ-)finite measure space can be completed,uniquely extending the measure as we go. This leads also to completions with respect to fam-ilies of finite measures, and we show that the resulting measurable spaces are closed underthe Souslin operation.

Two applications are discussed. The first one demonstrates that a measure defined on acountably generated sub-σ-algebra of the Borel sets of an analytic space can be extended tothe Borel sets, albeit not necessarily in a unique way. This result due to Lubin rests on theimportant von Neumann Selection Theorem, giving a universally right inverse to a measurablemap from an analytic to a separable space. Another application of von Neumann’s result isthe observation that under suitable topological assumptions for a surjective map f the liftedmap M(f) is surjective as well. The second application shows that a transition kernel canbe extended to the universal closures of the measurable spaces involved, provided the targetspace is separable. This, however, does not require a selection.

A σ-finite measure space (X,A, µ) is called complete iff µ(A) = 0 with A ∈ A and B ⊆ AComplete

measureimplies B ∈ A. Thus if we have two sets A,A′ ∈ A with A ⊆ A′ and µ(A) = µ(A′), then weknow that each set which can be sandwiched between the two will be measurable as well. Thisis partly anticipated in the discussion in Section 1.6.4, where a similar extension problem isconsidered, but starting from an outer measure. We will discuss the completion of a measurespace and investigate some properties. We first note that it is sufficient to discuss finitemeasure spaces; in fact, assume that we have a collection of mutually disjoint sets (Gn)n∈Nwith Gn ∈ A such that 0 < µ(Gn) <∞ and

⋃n∈NGn = X, consider the measure

µ′(B) :=∑n∈N

µ(B ∩Gn)

2nµ(Gn),

then µ is complete iff µ′ is complete, and µ′ is a probability measure.

We fix for the time being a finite measure µ on a measurable space (X,A).

Page 394: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

382 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

The outer measure µ∗ is defined through

µ∗(C) := inf∑n∈N

µ(An) | C ⊆⋃n∈N

An, An ∈ A for all n ∈ N

= infµ(A) | C ⊆ A,A ∈ A

for any subset C of X, see page 57.µ∗

Definition 4.6.1 Call N ⊆ X a µ-null set iff µ∗(N) = 0. Define Nµ as the set of all µ-nullsets.

Because µ∗ is countable subadditive, we obtain

Lemma 4.6.2 Nµ is a σ-ideal. a

Now assume that we have sets A,A′ ∈ A and N,N ′ ∈ Nµ such that their symmetric differenceswith A∆N = A′∆N ′, so their symmetric differences are the same. Then we may infer µ(A) =µ(A′), because A∆A′ = A∆

(A∆(N∆N ′)

)= N∆N ′ ⊆ N ∪ N ′ ∈ Nµ, and |µ(A) − µ(A′)| ≤

µ(A∆A′). Thus we may construct an extension of µ to the σ-algebra generated by A and Nµin the obvious way. Let us have at some properties of this construction.

Proposition 4.6.3 Define Aµ := σ(A ∪ Nµ) and µ(A∆N) := µ(A) for A ∈ A, N ∈ Nµ.Then

1. Aµ = A∆N | A ∈ A, N ∈ Nµ, and A ∈ Aµ iff there exist sets A′, A′′ ∈ A withA′ ⊆ A ⊆ A′′ and µ∗(A′′ \A′) = 0.

2. µ is a finite measure, and the unique extension of µ to Aµ.

3. The measure space (X,Aµ, µ) is complete. It is called the µ-completion of (X,A, µ).

Proof 0. The proof is a fairly straightforward check of the properties. It rests essentiallyon the observation that Nµ is a σ-ideal, so that the construction of the σ-algebra underconsideration has already been studied.

1. Since Nµ is a σ-ideal, we infer from Lemma 4.1.3 that A ∈ Aµ iff there exists B ∈ A andN ∈ Nµ with A = B∆N . Now consider

C := A ∈ Aµ | ∃A′, A′′ ∈ A : A′ ⊆ A ⊆ A′′, µ∗(A′′ \A′) = 0.

Then C is a σ-algebra which contains A ∪Nµ, thus C = Aµ.

From the observation made just before stating the proposition it becomes clear that µ iswell defined on Aµ. Since µ∗ coincides with µ on Aµ and the outer measure is countablysubadditive by Lemma 1.6.23, we have to show that µ is additive on Aµ. This followsimmediately from the first part. If ν is another extension to µ on Aµ, Nν = Nµ follows, sothat µ(A∆N) = µ(A) = ν(A) = ν(A∆N) whenever A∆N ∈ Aµ.

2. Completeness of (X,Aµ, µ) follows now immediately from the construction. a

Surprisingly, we have received more than we have shopped for, since complete measure spacesare closed under the Souslin operation. This is remarkable because the Souslin operationevidently bears no hint at all at measures which are defined on the base space. In addition,

Page 395: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.6. UNIVERSALLY MEASURABLE SETS 383

measures are defined through countable operations, while the Souslin operation makes use ofthe uncountable space NN.

Proposition 4.6.4 A complete measure space is closed under the Souslin operation.

The proof simply combines the pieces we have constructed already into a sensible picture.

Proof Let (X,A, µ) be complete, then it is enough to show that each B ⊆ X has an A-cover(Definition 4.5.7); then the assertion will follow from Proposition 4.5.8. In fact, given B,construct B∗ ∈ A such that µ(B∗) = µ∗(B), see Lemma 1.6.35. Whenever C ∈ A withB ⊆ C, we evidently have every subset of B∗ \ C in A by completeness. a

These constructions work also for σ-finite measure spaces, as indicated above. Now let Mbe a non-empty set of σ-finite measures on the measurable space (X,A), then define the

M -completion AM and the universal completion A of the σ-algebra A throughAM ,A

AM :=⋂µ∈MAµ,

A :=⋂Aµ | µ is a σ-finite measure on A.

As an immediate consequence this yields that the analytic sets in a Polish space are containedin the universal completion of the Borel sets, specifically

Corollary 4.6.5 Let X be a Polish space and µ be a finite measure on B(X). Then allanalytic sets are contained in B(X)

Proof Proposition 4.5.1 shows that each analytic set can be represented through a Souslinscheme based on closed sets, and Proposition 4.6.4 shows that B(X) is closed under theSouslin operation. a

Just for the record:

Corollary 4.6.6 The universal closure of a measurable space is closed under the Souslinoperation. a

Measurability of maps is preserved when passing to the universal closure.

Lemma 4.6.7 Let f : X → Y be A-B- measurable, then f is A-B measurable.

Proof Let D ∈ B be a universally measurable subset of Y , then we have to show thatE := f−1

[D]

is universally measurable in X. So we have to show that for every finitemeasure µ on A there exists E′, E′′ ∈ A with E′ ⊆ E ⊆ E′′ and µ(E′ \ E′′) = 0.

Define ν as the image of µ under f , so that ν(B) = µ(f−1[B]) for each B ∈ B, then we know

that there exists D′, D′′ ∈ B with D′ ⊆ D ⊆ D′′ such that ν(D′′ \D′) = 0, hence we have forthe measurable sets E′ := f−1

[D′], E′′ := f−1

[D′′]

µ(E′′ \ E′) = µ(f−1[D′′ \D′

]) = ν(D′′ \D′) = 0.

Thus f−1[D]∈ A. a

We will give now two applications of this construction. The first will show that a finite measureon a countably generated sub-σ-algebra of the Borel sets of an analytic space has always an

Page 396: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

384 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

extension to the Borel sets, the second will construct an extension of a stochastic relationK : (X,A) (Y,B) to a stochastic relation K : (X,A) (Y,B), provided the target space(Y,B) is separable. This first application is derived from von Neumann’s Selection Theorem,which is established here as well. It is shown also that a measurable surjection can be liftedto a measurable map between finite measure spaces, provided the target space is a separablemetric space.

4.6.1 Lubin’s Extension through von Neumann’s Selectors

Let X be an analytic space, and B be a countably generated sub-σ-algebra of B(X) we willshow that each finite measure defined on B has at least one extension to a measure on B(X).This is established through a surprising selection argument, as we will see.

As a preparation, we require a universally measurable right inverse of a measurable surjectivemap f : X → Y . We know from the Axiom of Choice that we can find for each y ∈ Y somex ∈ X with f(x) = y, because f−1

[y]| y ∈ Y is a partition of X into non-empty sets, see

Proposition 1.0.1 . Set g(y) := x. Selecting an inverse image in this way will not guarantee,however, that g has any favorable properties, even if, say, both X and Y are compact metricand f is continuous. Hence we will have to proceed in a more systematic way.

We will use the observation that each analytic set in a Polish space can be represented asthe continuous image of N∞, as discussed in Proposition 4.4.2. Again, the strategy of usinga particular space as a reference point pays off. We move the problem to N∞, where we caneasily define a total order, which is then used for solving the problem. Then we port thesolution back to the space from which it originated. We will first formulate a sequence ofauxiliary statements that deal with finding for a given surjective map f : X → Y a mapg : Y → X such that f g = idY . This map g should have some sufficiently pleasantproperties.

In order to make the first step it turns out to be helpful focusing the attention to analyticsets being the continuous images of N∞. This looks a bit far fetched, because we wantto deal with universally measurable sets, but remember that analytic sets are universallymeasurable.

We order N∞ lexicographically by saying that (tn)n∈N (t′n)n∈N iff there exists k ∈ N such(tn) (t′n)that tk ≤ t′k, and tj = t′j for all ` with 1 ≤ j < k. Then defines a total order on N∞. Wewill capitalize on this order, to be more precise, on the interplay between the order and thetopology. Let us briefly look into the order structure of N∞.

Lemma 4.6.8 Each nonempty closed set F ⊆ N∞ has a minimal element in the lexicographicorder.

Proof Let n1 be the minimal first component of all elements of F , n2 be the minimalsecond component of those elements of F that start with n1, etc. This defines an elementt := 〈n1, n2, . . .〉. We claim that t ∈ F . Let U be an open neighborhood of t, then thereexists k ∈ N such that t ∈ Θn1...nk ⊆ U (Θα is defined on page 358). By construction,Θn1...nk ∩ F 6= ∅, thus each open neighborhood of t contains an element of F . Hence t is

Page 397: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.6. UNIVERSALLY MEASURABLE SETS 385

an accumulation point of F , and since f is closed, it contains all its accumulation points.Consequently, t ∈ F . a

We know for f : N∞ → X continuous that the inverse images f−1[y]

with y ∈ f[N∞]

areclosed. Thus me may pick for each y ∈ f

[N∞]

this smallest element. This turns out to be asuitable choice, as the following statement shows.

Lemma 4.6.9 Let X be Polish, Y ⊆ X analytic with Y = f[N∞]

for some continuousf : N∞ → X. Then there exists g : Y → N∞ such that

1. f g = idY ,

2. g is B(Y )-B(N∞)-measurable.

Proof 1. Since f is continuous, the inverse image f−1[y]

for each y ∈ Y is a closed andnonempty set in N∞. Thus this set contains a minimal element g(y) in the lexicographic order by Lemma 4.6.8. It is clear that f(g(y)) = y holds for all y ∈ Y .

2. Denote by A(t′) := t ∈ N∞ | t ≺ t′, then A(t′) is open: let (`n)n∈N = t ≺ t′ and k be thefirst component in which t differs from t′, then Θ`1...`k−1

is an open neighborhood of t that isentirely contained in A(t′). It is easy to see that A(t′) | t′ ∈ N∞ is a generator for the Borelsets of N∞.

3. We claim that g−1[A(t′)

]= f

[A(t′)

]holds. In fact, let y ∈ g−1

[A(t′)

], so that g(y) ∈ A(t′),

then y = f(g(y)) ∈ f[A(t′)

]. If, on the other hand, y = f(t) with t ≺ t′, then by construction

t ∈ f−1[y], thus g(y) t ≺ t′, settling the other inclusion.

This equality implies that g−1[A(t′)

]is an analytic set, because it is the image of an open set

under a continuous map. Consequently, g−1[A(t′)

]is universally measurable for each A(t′)

by Corollary 4.6.5. Thus g is a universally measurable map. a

This statement is the work horse for establishing that a right inverse exists for surjective Borelmaps between an analytic space and a separable measurable space. All we need to do now isto massage things into a shape that will render this result applicable in the desired context.The following theorem is usually attributed to von Neumann.

Theorem 4.6.10 Let X be an analytic space, (Y,B) a separable measurable space and f :X → Y a surjective measurable map. Then there exists g : Y → X with these properties:

VonNeumann’s

SelectionTheorem

1. f g = idY ,

2. g is B-B(X)-measurable.

Proof 0. Lemma 4.6.9 gives the technical link which permits to use N∞ as an intermediaryfor which we have already a partial solution.

1. We may and do assume by Lemma 4.4.17 that Y is an analytic subset of a Polish spaceQ, and that X is an analytic subset of a Polish space P . x 7→ 〈x, f(x)〉 is a bijective Borelmap from X to the graph of f , so graph(f) is an analytic set by Proposition 4.4.6. Thus wecan find a continuous map F : N∞ → P ×Q with F

[N∞]

= graph(f). Consequently, πQ Fis a continuous map from N∞ to Q with

(πQ F )[N∞]

= πQ[graph(f)

]= Y.

Page 398: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

386 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

Now let G : Y → N∞ be chosen according to Lemma 4.6.9 for πQ F . Then g := πP F G :Y → X is the map we are looking for:

• g is universally measurable, because G is, and because πP F are continuous, henceuniversally measurable as well,

• f g = f (πP F G) = (f πP ) F G = πQ F G = idY , so g is right inverse to f .

a

Due to its generality, the von Neumann Selection Theorem has many applications in diverseareas, many of them surprising. The art in applying it is to reformulate the problem in asuitable manner so that the requirements of this selection theorem are satisfied. We pick twoapplications, viz., showing that the image M(f) of a surjective Borel map f yields a surjectiveBorel map again, and Lubin’s measure extension.

Proposition 4.6.11 Let X be an analytic space, Y a second countable metric space. Iff : X → Y is a surjective Borel map, so is M(f) : M(X)→M(Y ).

Proof 1. From Theorem 4.6.10 we find a map g : Y → X such that f g = idY and g isB(Y ) − B(X)-measurable.

2. Let ν ∈ M(Y ), and define µ := M(g)(ν), then µ ∈ M(X,B(X)) by construction. Restrictµ to the Borel sets on X, obtaining µ0 ∈M(X,B(X)). Since we have for each set B ⊆ Y theequality g−1

[f−1[B]

]= B, we see that for each B ∈ B(Y )

M(f)(µ0)(B) = µ0(f−1[B]) = µ(f−1

[B]) = ν(g−1

[f−1

[B]]

) = ν(B)

holds. a

This has as a consequence that M is an endofunctor on the category of Polish or analyticspaces with surjective Borel maps as morphisms; it displays a pretty interaction of reasoningin measurable spaces and arguing in categories.

The following extension theorem due to Lubin shows that one can extend a finite measurefrom a countably generated sub-σ-algebra to the Borel sets of an analytic space. In contrastto classical extension theorems like Theorem 1.6.29 it does not permit to conclude that theextension is uniquely determined.

Theorem 4.6.12 Let X be an analytic space, and µ be a finite measure on a countablygenerated sub-σ-algebra A ⊆ B(X). Then there exists an extension of µ to a finite measure νon B(X).

Proof Let (An)n∈N be the generator of A, and define the map f : X → 0, 1N throughx 7→

(χAn(x)

)n∈N. Then M := f

[X]

is an analytic space, and f is B(X)-B(M) measurableby Proposition 4.3.10 and Proposition 4.4.6. Moreover,

A = f−1[C]| C ∈ B(M). (4.6)

By von Neumann’s Selection Theorem 4.6.10 there exists g : M → X with f g = idM whichis B(M)-B(X)-measurable. Define

ν(B) := µ((g f)−1

[B])

Page 399: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.6. UNIVERSALLY MEASURABLE SETS 387

for B ∈ B(X) with µ as the completion of µ on A. Since we have g−1[B]∈ B(M) for

B ∈ B(X), we may conclude from (4.6) that f−1[g−1[B]]∈ A. ν is an extension to µ. In

fact, given A ∈ A, we know that A = f−1[C]

for some C ∈ B(M), so that we obtain

ν(A) = µ((g f)−1

[f−1

[C]])

= µ(f−1 g−1 f−1[C]

)(∗)= µ

(f−1

[C])

= µ(A)= µ(A).

(∗) holds, since f g = idM . This completes the proof. a

Lubin’s Theorem can be rephrased in a slightly different way as follows. The identity idA :(X,B(X))→ (X,A) is measurable, because A is a sub-σ-algebra of B(X). Hence it induces ameasurable map S(idA) : S(X,B(X))→ S(X,A). Lubin’s Theorem then implies that S(idA)is surjective. This is so since S(idA)(µ) is just the restriction of µ to the sub-σ-algebra A fora given µ ∈ S(X,B(X)).

4.6.2 Completing a Transition Kernel

In some probabilistic models for modal logics it becomes necessary to assume that the statespace is closed under Souslin’s operation, see for example [Dob12b] or Section 4.9.4, on theother hand one may not always assume that a complete measure space is given. Henceone needs to complete it, but it is then also mandatory to extend the transition law to thecompletion as well. This means that an extension of the transition law to the completionbecomes necessary. This problem will be studied now.

The completion of a measure space is described in terms of null sets and using inner and outerapproximations, see Proposition 4.6.3. We will use the latter here, fixing measurable spaces(X;A) and (Y,B). Denote by SX the smallest σ-algebra on X which contains A and whichis closed under the Souslin operation, hence SX ⊆ A by Corollary 4.6.6.

Fix K : (X,A) (Y,B) as a transition kernel, and assume first that B = B(Y ) is the σ-algebra of Borel sets for a second countable metric space. Hence the topology τ of Y has acountable base τ0, which in turn implies that G =

⋃H ∈ τ0 | H ⊆ G for each open set

G ∈ τ .

For each x ∈ X we have a finite measure K(x) through the transition kernel K. We associateto K(x) an outer measure

(K(x)

)∗on the power set of X. We want to show that the

mapx 7→

(K(x)

)∗(A)

is SX -measurable for each A ⊆ Y ; define for convenience

K∗(x) :=(K(x)

)∗.

Establishing measurability is broken into a sequence of steps.

We need the following regularity argument (but compare Exercise 4.12 for the non-metriccase)

Lemma 4.6.13 Let µ be a finite measure on (Y,B(Y )), B ∈ B(Y ). Then we can find foreach ε > 0 an open set G ⊆ Y with B ⊆ G and a closed set F ⊇ B such that µ(G \ F ) < ε.

Page 400: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

388 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

Proof 0. Let

G := B ∈ B(Y ) | the assertion is true for B.

We will use a variant of the principle of good sets by showing that G has these properties:Outline ofthe proof

• G is closed under complementation.

• The open sets (and, by implication, the closed sets) are contained in G.

• G is closed under taking disjoint countable unions.

This will permit applying the π-λ-Theorem, because the open sets are a ∩-closed generatorof the Borel sets.

1. That G is closed under complementation trivial. G contains the open as well as the closedsets. If F ⊆ Y is closed, we can represent F =

⋂n∈NGn with (Gn)n∈N as a decreasing sequence

of open sets, hence µ(F ) = infn∈N µ(Fn) = limn→∞ µ(Fn), so that G also contains the closedsets; one arguments similarly for the open sets as increasing unions of open sets.

2. Now let (Bn)n∈N be a sequence of mutually disjoint sets in G, select Gn open for Bn andε/2−(n+1), then G :=

⋃n∈NGn is open with B :=

⋃n∈NBn ⊆ G and µ(G \B) ≤ ε. Similarly,

select the sequence (Fn)n∈N with Fn ⊆ Bn and µ(Bn \ Fn) < ε/2−(n+1) for all n ∈ N, putF :=

⋃n∈N Fn and select m ∈ N with µ(F \

⋃mn=1 Fn) < ε/2, then F ′ :=

⋃mn=1 Fn is closed,

F ′ ⊆ B, and µ(B \ F ′) < ε.

3. Hence G is closed under complementation as well as countable disjoint unions. This impliesG = B(Y ) by the π-λ Theorem 1.6.30, since G contains the open sets. a

Fix A ⊂ Y for the moment. We claim that

K∗(x)(A) = infK(x)(G) | A ⊆ G open

holds for each x ∈ X. In fact, given ε > 0, there exists A ⊆ A0 ∈ B(Y ) with K(x)(A0) −K∗(x)(A) < ε/2.Applying Lemma 4.6.13 toK(x), we find an open setG ⊇ A0 withK(x)(G)−K(x)(A0) < ε/2, thus K(x)(G)−K∗(x)(A) < ε.

Since τ0 is a countable base for the open sets, which we may assume to be closed under finiteunions (because otherwise G1 ∪ . . . ∪Gk | k ∈ N, G1, . . . , Gk ∈ τ0 is a countable base whichhas this property), we obtain

K∗(x)(A) = infsupn∈N

K(x)(Gn) | A ⊆⋃n∈N

Gn, (Gn)n∈N ⊆ τ0 increases. (4.7)

Let

GA := (Gn)n∈N ⊆ τ0 | (Gn)n∈N increases and A ⊆⋃n∈N

Gn

be the set of all increasing sequences from base τ0 which cover A. Partition GA into thesets

NA := g ∈ GA | g contains only a finite number of sets,MA := GA \ NA.

Page 401: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.6. UNIVERSALLY MEASURABLE SETS 389

Because τ0 is countable, NA is.

We want so show that K∗, suitably restricted, is the extension we are looking for. In orderto establish this, we build a tree with basic open sets as nodes. The offsprings of node G ∈ τ0

are those open sets G′ ∈ τ0 which contain G. Thus each node has at most countably manyoffsprings. This tree will be used to construct a Souslin scheme.

Lemma 4.6.14 There exists an injective map Φ :MA → NN such that g | k = g′ | k impliesΦ(g) | k = Φ(g′) | k for all k ∈ N.

Proof 1. Build an infinite tree in this way: the root is the empty set, a node G at level k hasall elements G′ from τ0 with G ⊆ G′ as offsprings. Remove from the tree all paths H1, H2, . . .such that A 6⊆

⋃n∈NHn. Call the resulting tree T .

2. Put G0 := ∅, and let T1,G0 be the set of nodes of T on level 1 (hence just the offspringsof the root G0), then there exists an injective map Φ1,G0 : T1,G0 → N. If G1, . . . , Gk is afinite path to inner node Gk in T , denote by Tk+1,G1,...,Gk the set of all offsprings of Gk, andlet

Φk+1,G1,...,Gk : Tk+1,G1,...,Gk → N

be an injective map. Define

Φ :

MA → NN,

(Gn)n∈N 7→(Φn,G1,...,Gn−1(Gn)

)n∈N.

3. Assume Φ(g) = Φ(g′), then an inductive reasoning shows that g = g′. In fact, G1 =G′1, since Φ1,∅ is injective. If g | k = g′ | k has already been established, we know thatΦk+1,G1,...,Gk = Φk+1,G′1,...,G

′k

is injective, so that Gk+1 = G′k+1 follows. A similar inductiveargument shows that Φ(g) | k = Φ(g′) | k, provided g | k = g′ | k for each k ∈ N holds.a

The following lemmata collect some helpful properties.

Lemma 4.6.15 g = g′ iff Φ(g) | k = Φ(g′) | k for all k ∈ N, whenever g, g′ ∈MA. a

The argument for establishing the following statement uses the tree and the maps associatedwith it for constructing a suitable Souslin scheme.

Lemma 4.6.16 Denote by Jk := α | k | α ∈ Φ[MA

] all initial pieces of sequences in the

image of Φ. Then α ∈ Φ[MA

]iff α | k ∈ Jk for all k ∈ N.

Proof Assume that α = Φ(g) ∈ Φ[MA

]with g = (Cn)n∈N ∈ MA and α | k ∈ Jk for

all k ∈ N, so for given k there exists g(k) = (C(k)n)n∈N ∈ MA with α | k = Φ(g(k)) | k.

Because Φ1 is injective, we obtain C1 = C(1)1 . Assume for the induction step that Gi = G

(j)i

has been shown for 1 ≤ i, j ≤ k. Then we obtain from Φ(g) | k + 1 = Φ(g(k+1)) | k + 1

that G1 = G(k+1)1 , . . . , Gk = G

(k+1)k . Since Φk+1,G1,...,Gk is injective, the equality above implies

Gk+1 = G(k+1)k+1 . Hence g = g(k) for all k ∈ N, and α ∈ Φ

[MA

]is established. The reverse

implication is trivial. a

Lemma 4.6.17 Er := x ∈ X | K∗(x)(A) ≤ r ∈ SX for r ∈ R+.

Page 402: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

390 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

Proof The set Er can be written as

Er =⋃

g∈NA

x ∈ X | K(x)(⋃

g)≤ r ∪

⋃g∈MA

x ∈ X | K(x)(⋃

g)≤ r

Because NA is countable, and K : X Y is a transition kernel, we infer⋃g∈NA

x ∈ X | K(x)(⋃

g)≤ r ∈ B(X)

Put for v ∈ N+

Dv :=

∅, if v /∈

⋃k∈N Jk,

x ∈ X | K(x)(Gn) ≤ r, if v = Φ((Gn)n∈N

)| n.

Lemma 4.6.15 and Lemma 4.6.16 show that Dv ∈ B(X) is well defined. Because⋃g∈MA

x ∈ X | K(x)(⋃

g)≤ r =

⋃α∈NN

⋂n∈N

Dα|n, (4.8)

and because SX is closed under the Souslin operation and contains B(X), we conclude thatEr ∈ SX . a

Proposition 4.6.18 Let K : (X;A) (Y,B) be a transition kernel, and assume that Y isa separable metric space. Let SX be the smallest σ-algebra which contains A and which isclosed under the Souslin operation. Then there exists a unique transition kernel

K : (X,SX) (Y,B(Y )K(x)|x∈X

)

extending K.

Proof 1. Put K(x)(A) := K∗(x)(A) for x ∈ X and A ∈ B(Y )K(x)|x∈X

. Because A is anelement of the K(x)-completion of B(Y ), we know that K (x) = K(x) defines a subprobability

on B(Y )K(x)|x∈X

. It is clear thatK(x) is the unique extension ofK(x) to the latter σ-algebra.It remains to be shown that K is a transition kernel.

2. Fix A ∈ B(Y )K(x)|x∈X

and q ∈ [0, 1], then

x ∈ X | K∗(x)(A) < q =⋃`∈N

⋃g∈GA

x ∈ X | K(x)(⋃

g)≤ q − 1

`

The latter set is a member of SX by Lemma 4.6.17. a

Separability of the target space is required because it is this property which makes sure thatthe measure for each Borel set can be approximated arbitrarily well from within by closedsets, and from the outside by open sets Lemma 4.6.13.

Before discussing consequences, a mild generalization to separable measurable spaces shouldbe mentioned. Proposition 4.6.18 yields as an immediate consequence:

Page 403: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.7. MEASURABLE SELECTIONS 391

Corollary 4.6.19 Let K : (X;A) (Y,B) be a transition kernel such that (Y,B) is aseparable measurable space. Assume that X is a σ-algebra on X which is closed under the

Souslin operation with SX ⊆ X , and that Y is a σ-algebra on X with B ⊆ Y ⊆ BK(x)|x∈X.

Then there exists a unique extension (X,X ) (Y,Y) to K. In particular K has a uniqueextension to a transition kernel K : (X,A) (Y,B).

Proof This follows from Proposition 4.6.18 and the characterization of separable measurablespaces in Proposition 4.3.10. a

4.7 Measurable Selections

Looking again at von Neumann’s Selection Theorem 4.6.10, we found for a given surjectionf : X → Y a universally measurable map g : Y → X with f g = idY . This can be rephrased:we have g(y) ∈ f−1

[y]

for each y ∈ Y , so g may be considered a universally measurableselection for the set valued map y 7→ f−1

[y].

We will consider constructing a selection from a slightly different angle by assuming that(X,A) is a measurable, Y is a Polish space, and that we have a set valued map F : X →P (Y ) \ ∅ for which a measurable selection is to be constructed, i.e., a measurable (notmerely universally measurable) map g : X → Y such that g(x) ∈ F (x) for all x ∈ X. Clearly,the Axiom of Choice guarantees the existence of a map which picks an element from F (x) foreach x, but again this is not enough.

We assume that F (x) ⊆ Y , F (x) is closed, and F (x) 6= ∅ for all x ∈ X and that it is measur-able. Since F does not necessarily take single values only, we have to define measurability inthis case. Denote by F(Y ) the set of all closed and non-empty subsets of Y .

Definition 4.7.1 A map F : X → F(Y ) from a measurable space (X,A) to the closed non-empty subsets of a Polish space Y is called measurable (or a measurable relation) iff Fw

Fw(G) := x ∈ X | F (x) ∩G 6= ∅ ∈ A

for every open subset G ⊆ Y . The map s : X → Y is called a measurable selector for F iff sis A-B(Y )-measurable such that s(x) ∈ F (x) for all x ∈ X.

Since f(x)∩G 6= ∅ iff f(x) ∈ G, measurability as defined in this definition is a generalizationof measurability for point valued maps f : X → Y .

The selection theorem due to Kuratowski and Ryll-Nardzewski tell us that a measurableselection exists for a measurable closed valued map, provided Y is Polish. To be specific:

Kuratowskiand Ryll-

Nardzewskiselectiontheorem

Theorem 4.7.2 Given a measurable space (X,A) and a Polish space Y , a measurable mapF : X → F(Y ) has a measurable selector.

Proof 0. Fix a complete metric d on Y . As usual, B(y, r) is the open ball with centery ∈ Y and radius r > 0. Recall that the distance of an element y to a closed set C isd(y, C) := infd(y, y′) | y′ ∈ C, hence d(y, C) = 0 iff y ∈ C. The idea of the proof is to define

Idea

Page 404: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

392 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

a sequence (fn)n∈N of measurable maps such that fn(x) comes closer and closer to F (x),measured in terms of the distance d(fn(x), F (x)) of fn(x) to F (x), and that

(fn(x)

)n∈N is a

Cauchy sequence in the complete space Y for each x.

1. Let (yn)n∈N be dense, and define f1(x) := yn, if n is the smallest index k so that F (x) ∩B(yk, 1) 6= ∅. Then f1 : X → Y is A-B(Y ) measurable, because the map takes only acountable number of values and

x ∈ X | f1(x) = yn = Fw(B(yn, 1)) \n−1⋃k=1

Fw(B(yk, 1)).

Proceeding inductively, assume that we have defined measurable maps f1, . . . , fn such that

d(fj(x), fj+1(x)) < 2−(j−1), 1 ≤ j < nd(fj(x), F (x)) < 2−j , 1 ≤ j ≤ n

Put Xk := x ∈ X | fn(x) = yk, and define fk+1(x) := y` for x ∈ Xk, where ` is thesmallest index m such that F (x) ∩ B(yk, 2

−n) ∩ B(ym, 2−(n+1)) 6= ∅. Moreover, there exists

y′ ∈ B(yk, 2−n) ∩B(ym, 2

−(n+1), thus

d(fn(x), fn+1(x)) ≤ d(fn(x), y′) + f(fn+1(x), y′) < 2−n + 2−(n+1)

The argumentation from above shows that fn+1 takes only countably many values, and weknow that d(fn+1(x), F (x)) < 2−(n+1).

2. Thus (fn(x))n∈N) is a Cauchy sequence for each x ∈ X. Since (Y, d) is complete, the limitf(x) := limn→∞ fn(x) exists with d(f(x), F (x)) = 0, hence f(x) ∈ F (x), because F (x) isclosed. Moreover, as a pointwise limit of a sequence of measurable functions, f is measurable,so f is the desired measurable selector. a

It is possible to weaken the conditions on F and on A, see Exercise 4.21. This theorem hasan interesting consequence, viz., that we can find a sequence of dense selectors for F .

Corollary 4.7.3 Under the assumptions of Theorem 4.7.2, a measurable map F : X → F(Y )has a sequence (fn)n∈N of measurable selectors such that fn(x) | n ∈ N is dense in F (x) foreach x ∈ X.

Proof 1. We use notations from above. Let again (yn)n∈N be a dense sequence in Y , anddefine for n,m ∈ N the map

Fn,m(x) :=

F (x) ∩B(yn, 2

−m), if x ∈ Fw(B(yn, 2−m))

F (x), otherwise

Denote by Hn,m(x) the closure of Fn,m(x).

2. Hn,m : X → F(Y ) is measurable. In fact, put A1 := Fw(B(yn, 2−m)), A2 := X \ A1, then

A1, A2 ∈ A, because F is measurable and B(yn, 2−m) is open. But then we have for an open

set G ⊆ Y

x ∈ X | Hn,m ∩G 6= ∅ = x ∈ X | Fn,m ∩G 6= ∅= x ∈ A1 | F (x) ∩G ∩B(yn, 2

−m) 6= ∅ ∪ x ∈ A2 | F (x) ∩G 6= ∅,

Page 405: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.8. INTEGRATION 393

thus Hwn,m(G) ∈ A.

3. We can find a measurable selector sn,m for Hn,m by Theorem 4.7.2, so we have to showthat sn,m(x) | n,m ∈ N is dense in F (x) for each x ∈ X. Let y ∈ F (x). Given ε > 0,select m with 2−m < ε/2; there exists yn with d(y, yn) < 2−m. Thus x ∈ Hw

n,m(B(yn, 2−m)),

and sn,m(x) is a member of the closure of B(yn, 2−m), which means d(y, sn,m(x)) < ε. Now

arrange sn,m(x) | n,m ∈ N as a sequence, then the assertion follows. a

This is a first application of measurable selections.

Example 4.7.4 Call a map h : X → B(Y ) for the Polish space Y hit-measurable iff h ismeasurable with respect to A and HG(B(Y )), where G is the set of all open sets in Y , seeExample 4.1.2. Thus h is hit-measurable iff x ∈ X | h(x) ∩ U 6= ∅ ∈ A for each open setU ⊆ Y . If h is image finite (i.e., h(x) is always non-empty and finite), then there exists asequence (fn)n∈N of measurable maps fn : X → Y such that h(x) = fn(x) | n ∈ N for eachx ∈ X. This is so because h : X → F(Y ) is measurable, hence Corollary 4.7.3 is applicable.,

Transition kernels into Polish spaces induce a measurable closed valued map, for which selec-tors exist.

Example 4.7.5 Let under the assumptions of Theorem 4.7.2 K : (X,A) (Y,B(Y )) bea transition kernel with K(x)(Y ) > 0 for all x ∈ X. Then there exists a measurable mapf : X → Y such that K(x)(U) > 0, whenever U is an open neighborhood of f(x).

In fact, F : x 7→ supp(K(x)) takes non-empty and closed values by Lemma 4.1.46. If G ⊆ Yis open, then

Fw(G) = x ∈ X | supp(K(x)) ∩G 6= ∅ = x ∈ X | K(x)(G) > 0 ∈ A.

Thus F has a measurable selector f by Theorem 4.7.2. The assertion now follows fromCorollary 4.1.47

Perceiving a stochastic relationK : (X,A) (Y,B(Y )) as a probabilistic model for transitionssuch that K(x)(B) is the probability for making a transition from x to B (with K(x)(Y ) ≤ 1),we may interpret the selection f as one possible deterministic version for a transition: thestate f(x) is possible, since f(x) ∈ supp(K(x)), which entails K(x)(U) > 0 for every openneighborhood U of f(x). There exists a sequence (fn)n∈N of measurable selectors for F suchthat fn(x) | n ∈ N is dense in F (x); this may be interpreted as a form of stochasticnondeterminism. ,

4.8 Integration

After having studied the structure of measurable sets under various conditions on the under-lying space with an occasional side glance at real-valued measurable functions, we will discussintegration now. This is a fundamental operation associated with measures. The integral ofa function with respect to a measure will be what you expect it to be, viz., for non-negativefunctions the area between the curve and the x-axis. This view will be confirmed later on,when Fubini’s Theorem will be available for computing measures in Cartesian products. For

Page 406: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

394 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

the time being, we build up the integral in a fairly straightforward way through an approxi-mation by step functions, obtaining a linear map with some favorable properties, for examplethe Lebesgue Dominated Convergence Theorem. All the necessary constructions are given inthis section, offering more than one occasion to exercise the well-known ε-δ-arguments, whichare necessary, but not particularly entertaining. But that’s life.

The second part of this section offers a complementary view — it starts from a positive linearmap with some additional continuity properties and develops a measure from it. This isDaniell’s approach, suggesting that measure and integral are really most of the time two sidesof the same coin. We show that this duality comes to life especially when we are dealingwith a compact metric space: Here the celebrated Riesz Representation Theorem gives abijection between probability measures on the Borel sets and normed positive linear functionson the continuous real-valued functions. We formulate and prove this theorem here; it shouldbe mentioned that this is not the most general version available, as with most other resultsdiscussed here (but probably there is no such thing as a most general version, since thedevelopment did branch out into wildly different directions).

This section will be fundamental for the discussions and results later in this chapter. Mostresults are formulated for finite or σ-finite measures, and usually no attempt has been madeto find the boundary delineating a development.

4.8.1 From Measure to Integral

We fix a measure space (X,A, µ). Denote for the moment by T (X,A) the set of all measurablestep functions, and by T+(X,A) the non-negative step functions; similarly, F+(X,A) are theT (X,A)non-negative measurable functions. Note that T (X,A) is a vector space under the usualoperations, and that it is a lattice under finite or countable pointwise suprema and infima.We know from Proposition 4.2.4 that we can approximate each bounded measurable functionby a sequence of step functions from below.F(X,A)

Define ∫X

n∑i=1

αi · χAi dµ :=

n∑i=1

αi · µ(Ai) (4.9)

as the integral with respect to µ for the step function∑n

i=1 αi · χAi ∈ T (X,A). Exercise 4.22tells us that the integral is well defined: if f, g ∈ T (X,A) with f = g, then∑

α∈Rα · µ(x ∈ X | f(x) = α) =

∑β∈R

β · µ(x ∈ X | g(x) = β).

Thus the definition (4.9) yields the same value for the integral. These are some elementaryproperties of the integral for step functions.

Lemma 4.8.1 Let f, g ∈ T (X,A) be step functions, α ∈ R. Then

1.∫X α · f dµ = α ·

∫X f dµ,

2.∫X(f + g) dµ =

∫X f dµ+

∫X g dµ,

Page 407: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.8. INTEGRATION 395

3. if f ≥ 0, then∫X f dµ ≥ 0, in particular, the map f 7→

∫X f dµ is monotone,

4.∫X χA dµ = µ(A) for A ∈ A,

5. |∫X f dµ| ≤

∫X |f | dµ.

Moreover the map A 7→∫A f dµ :=

∫X f · χA dµ is additive on A whenever f ∈ T+(X,A). a

We know from Proposition 4.2.4 that we can find for f ∈ F+(X,A) a sequence (fn)n∈N inT+(X,A) such that f1 ≤ f2 ≤ . . . and supn∈N fn = f . This observation is used for thedefinition of the integral for f . We define∫

Xf dµ := sup

∫Xg dµ | g ≤ f and g ∈ T+(X,A)

Note that the right hand side may be infinite; we will discuss this shortly.

The central observation is formulated in Levi’s Theorem:

Theorem 4.8.2 Let (fn)n∈N be an increasing sequence of functions in F+(X,A) with limitf , then the limit of

(∫X fn dµ

)n∈N exists and equals

∫X f dµ.

Levi’sTheorem

Proof 1. Because the integral is monotone in the integrand by Lemma 4.8.1, the limit

` := limn→∞

∫Xfn dµ

exists (possibly in R ∪ ∞), and we know from monotonicity that ` ≤∫X f dµ.

2. Let f = c > 0 be a constant, and let 0 < d < c. Then supn∈N d · χx∈X|fn(x)≥d = d, hencewe obtain ∫

Xf dµ ≥

∫Xfn dµ ≥

∫x∈X|fn(x)≥d

fn dµ ≥ d · µ(x ∈ X | fn(x) ≥ d)

for every n ∈ N, thus ∫Xf dµ ≥ d · µ(X).

Letting d approaching c, we see that∫Xf dµ ≥ lim

n→∞

∫Xfn dµ ≥ c · µ(X) =

∫Xf dµ.

This gives the desired equality.

3. If f = c · χA with A ∈ A, we restrict the measure space to (A,A ∩ A,µ), so the result istrue also for step functions based on one single set.

4. Let f =∑n

i=1 αi ·χAi be a step function, then we may assume that the sets A1, . . . , An aremutually disjoint. Consider fi := f · χAi = αi · χAi and apply the previous step to fi, takingadditivity from Lemma 4.8.1, part 2. into account.

5. Now consider the general case. Select step functions (gn)n∈N with gn ∈ T+(X,A) such thatgn ≤ fn and |

∫X fn dµ −

∫X gn dµ| < 1/n. We may and do assume that g1 ≤ g2 ≤ . . ., for

Page 408: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

396 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

we otherwise may pass to the step function hn := supg1, . . . , gn. Let 0 ≤ g ≤ f be a stepfunction, then limn→∞(gn ∧ g) = g, so that we obtain from the previous step∫

Xg dµ = lim

n→∞

∫Xgn ∧ g dµ ≤ lim

n→∞

∫Xgn dµ ≤ lim

n→∞

∫Xfn dµ

Because∫X g dµ may be chosen arbitrarily close to `, we finally obtain

limn→∞

∫Xfn dµ ≤

∫Xf dµ ≤ lim

n→∞

∫Xfn dµ,

which implies the assertion for arbitrary f ∈ F+(X,A). a

Since we can approximate each non-negative measurable function from below and from aboveby step functions (Proposition 4.2.4 and Exercise 4.7), we obtain from Levi’s Theorem forf ∈ F+(X,A) the representation

sup∫

Xg dµ | T+(X,A) 3 g ≤ f

=

∫Xf dµ = inf

∫Xg dµ | f ≤ g ∈ T+(X,A)

.

This strongly resembles — and generalizes — the familiar construction of the Riemann integralfor a continuous function f over a bounded interval by sandwiching it between lower and uppersums of step functions.

Compatibility of the integral with scalar multiplication and with addition is now an easyconsequence of Levi’s Theorem:

Corollary 4.8.3 Let a ≥ 0 and b ≥ 0 be non-negative real numbers, then∫Xa · f + b · g dµ = a ·

∫Xf dµ+ b ·

∫Xg dµ

for f, g ∈ F+(X,A).

Proof Let (fn)n∈N and (gn)n∈N be sequences of step functions which converge monotonicallyto f resp. g. Then (a ·fn+ b ·gn)n∈N is a sequence of step functions converging monotonicallyto a · f + b · g. Apply Levi’s Theorem 4.8.2 and the linearity of the integral on step functionsfrom Lemma 4.8.1 to obtain the assertion. a

Given an arbitrary f ∈ F(X,A), we can decompose f into a positive and a negative partf+ := f ∨ 0 resp. f− := (−f) ∨ 0, so that f = f+ − f− and |f | = f+ + f−.

A function f ∈ F(X,A) is called integrable (with respect to µ) iffIntegrable ∫X|f | dµ <∞,

in this case we set ∫Xf dµ :=

∫Xf+ dµ−

∫Xf− dµ.

In fact, because f+ ≤ f , we obtain from Lemma 4.8.1 that∫X f

+ dµ < ∞, similarly we seethat

∫X f− dµ < ∞. The integral is well defined, because if f = f1 − f2 with f1, f2 ≥ 0, we

Page 409: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.8. INTEGRATION 397

conclude f1 ≤ f ≤ |f |, hence∫X f1 dµ < ∞, and f2 ≤ |f |, so that

∫X f2 dµ < ∞, which

implies∫X f

+ dµ +∫X f2 dµ =

∫X f− dµ +

∫X f1 dµ by Corollary 4.8.3. Thus we obtain in

fact∫X f

+ dµ−∫X f− dµ =

∫X f1 dµ−

∫X f2 dµ.

This special case is also of interest: let A ∈ A, define for f integrable∫Af dµ :=

∫Xf · χA dµ

(note that |f · χA| ≤ |f |). We emphasize occasionally the integration variable by writing∫X f(x) dµ(x) instead of

∫X f dµ.

Collecting some useful and a.e. used properties, we state

Proposition 4.8.4 Let f, g ∈ F(X,A) be measurable functions, then

1. If f ≥µ 0, then∫X f dµ = 0 iff f =µ 0.

2. If f is integrable, and |g| ≤µ |f |, then g is integrable.

3. If f and g are integrable, then so are a ·f+b ·g for all a, b ∈ R, and∫X a · f + b · g dµ =

a ·∫X f dµ+ b ·

∫X g dµ.

4. If f , and g are integrable, and f ≤µ g, then∫X g dµ ≤

∫X f dµ.

5. If f is integrable, then |∫X f dµ| ≤

∫X |f | dµ.

a

We now state and prove some statements which relate sequences of functions to their integrals.The first one is traditionally called Fatou’s Lemma.

Proposition 4.8.5 Let (fn)n∈N be a sequence in F+(X,A). ThenFatou’sLemma

∫X

lim infn→∞

fn dµ ≤ lim infn→∞

∫Xfn dµ

Proof Since (infm≥n fm)n∈N is an increasing sequence of measurable functions in F+(X,A),we obtain from Levi’s Theorem 4.8.2∫

Xf dµ = lim

n→∞

∫X

infm≥n

fm dµ = supn∈N

∫X

infm≥n

fm dµ.

Because we plainly have by monotonicity∫X infm≥n fm dµ ≤ infm≥n

∫X fm dµ, the assertion

follows. a

The Lebesgue Dominated Convergence Theorem is a very important and eagerly used tool; itcan be derived now easily from Fatou’s Lemma.

Theorem 4.8.6 Let (fn)n∈N be a sequence of measurable functions with fna.e.−→ f for some

LebesgueDominated

Conver-gence

Theorem

measurable function f , and |fn| ≤µ g for all n ∈ N and an integrable function g. Then fnand f are integrable, and

limn→∞

∫Xfn dµ =

∫Xf dµ and lim

n→∞

∫X|fn − f | dµ = 0.

Page 410: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

398 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

Proof 1. It is no loss of generality to assume that fn → f and ∀n ∈ N : fn ≤ g pointwise(otherwise modify the fn, f and g on a set of µ-measure zero). Because |fn| ≤ g, we concludefrom Proposition 4.8.4 that fn is integrable, and since f ≤ g holds as well, we infer that f isintegrable as well.

2. Put gn := |f | + g − |fn − f |, then gn ≥ 0, and gn is integrable for all n ∈ N. We obtainfrom Fatou’s Lemma∫

X|f |+ g dµ =

∫X

lim infn→∞

gn dµ

≤ lim infn→∞

∫Xgn dµ

=

∫X|f |+ g dµ− lim sup

n→∞

∫X|fn − f | dµ.

Hence we obtain lim supn→∞∫X |fn − f | dµ = 0, thus limn→∞

∫X |fn − f | dµ = 0.

3. We finally note that∣∣∫Xfn dµ−

∫Xf dµ

∣∣ =∣∣∫X

(fn − f) dµ∣∣ ≤ ∫

X|fn − f | dµ,

which completes the proof. a

The following is an immediate consequence of the Lebesgue Theorem. We know from Calculusthat interchanging integration and infinite summation may be dangerous, so we gain a goodcriterion here permitting this operation.

Corollary 4.8.7 Let (fn)n∈N be a sequence of measurable functions, g integrable, such that|∑n

k=1 fk| ≤µ g for all n ∈ N. Then all fn as well as f :=∑

n∈N fn are integrable, and∫X f dµ =

∑n∈N

∫X fn dµ. a

Moreover, we conclude that each nonnegative measurable function begets a finite measure.This observation will be fruitful for the discussion of Lp-spaces in Section 4.11.

Corollary 4.8.8 Let f ≥µ 0 be an integrable function, then A 7→∫A f dµ defines a finite

measure on A.

Proof All the properties of a measure are immediate, σ-additivity follows from Corol-lary 4.8.7. a

Integration with respect to an image measure is also available right away. It yields the fairlyhelpful change of variables formula for image measures.

Corollary 4.8.9 Let (Y,B) a measurable space and g : X → Y be A-B-measurable. Thenh ∈ F(Y,B) is M(g)(µ) integrable iff g h is µ-integrable, and in this case we have

Change ofvariables ∫

Yh dM(g)(µ) =

∫Xh g dµ. (4.10)

Proof We show first that formula (4.10) is true for step functions. In fact, if h = χB with ameasurable set B, then we obtain from the definition∫

YχB dM(g)(µ) = M(g)(µ)(B) = µ(g−1

[B]) =

∫XχB g dµ

Page 411: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.8. INTEGRATION 399

(since χB(g(x)) = 1 iff x ∈ g−1[B]). This observation extends by linearity to step functions,

so that we obtain for h =∑n

i=1 bi · χBi∫Yh dM(g)(µ) =

n∑i=1

bi ·∫XχBi g dµ =

∫Xh g dµ

Thus the assertion now follows from Levi’s Theorem 4.8.2. a

The reader is probably familiar with the change of variables formula in classical calculus.It deals with k-dimensional Lebesgue measure λk, and a differentiable and injective mapT : V → W from an open set V ⊆ Rk to a bounded set W ⊆ Rk. T is assumed to have acontinuous inverse. Then the integral of a measurable and bounded function f : T

[V]→ R

can be expressed in terms of the integral over V of f T and the Jacobian JT of T . To bespecific ∫

T[V] f dλk =

∫V

(f T ) · |JT | dλk.

Recall that the Jacobian JT of T is the determinant of the partial derivatives of T , i.e.,

JT (x) = det((∂Ti(x)

∂xj)).

This representation can be derived from the representation for the integral with respect to theimage measure from Corollary 4.8.9 and from the Radon-Nikodym Theorem 4.11.26 througha somewhat lengthy application of results from fairly elementary linear algebra. We do notwant to develop this apparatus in the present presentation, we will, however, provide a glimpseat the one-dimensional situation in Proposition 4.11.29. The reader is referred for the generalcase rather to Rudin’s exposition [Rud74, p. 181 - 188] or to Stromberg’s more elementarydiscussion in [Str81, p. 385 - 392]; if you read German, Elstrodt’s derivation [Els99, § V.4]should not be missed.

4.8.2 The Daniell Integral and Riesz’s Representation Theorem

The previous section developed the integral from a finite or σ-finite measure; the result was alinear functional on a subspace of measurable functions, which will be investigated in greaterdetail later on. This section will demonstrate that it is possible to obtain a measure froma linear functional on a well behaved space of functions. This approach was proposed byP. J. Daniell ca. 1920, it is called in his honor the Daniell integral. It is useful when a linearfunctional is given, and one wants to show that this functional is actually defined by a measure,which then permits putting the machinery of measure theory into action. We will encountersuch a situation, e.g., when studying linear functionals on spaces of integrable functions.Specifically, we derive the Riesz Representation Theorem, which shows that there is a one-to-one correspondence between probability measures and normed positive linear functionalson the vector lattice of continuous real valued functions on a compact metric space.

Let us fix a set X throughout. We will also fix a set F of functions X → R which is assumed to be a vector space (as always, over the reals) with a special property.

Definition 4.8.10 A vector space F ⊆ R^X is called a vector lattice iff |f| ∈ F whenever f ∈ F.


Now fix the vector lattice F. Each vector lattice is indeed a lattice: define

f ∨ g := (|f − g| + f + g)/2,
f ∧ g := −((−f) ∨ (−g)),
f ≤ g :⇔ f ∨ g = g ⇔ f ∧ g = f.

Thus F contains with f and g also f ∧ g and f ∨ g, and it is easy to see that ≤ defines a partial order on F such that sup{f, g} = f ∨ g and inf{f, g} = f ∧ g. Note that we have max{α, β} = (|α − β| + α + β)/2 for α, β ∈ R; thus we conclude that f ≤ g iff f(x) ≤ g(x) for all x ∈ X.
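As a quick plausibility check of these identities, the following sketch (with hypothetical sample vectors, outside the formal text) verifies pointwise that the formulas for f ∨ g and f ∧ g reproduce the pointwise maximum and minimum, and that f ∧ g + f ∨ g = f + g.

    import numpy as np

    # f and g are sampled function values; the lattice identities hold pointwise.
    f = np.array([0.0, 1.5, -2.0, 3.0])
    g = np.array([1.0, -0.5, -1.0, 3.0])
    vee = (np.abs(f - g) + f + g) / 2                     # f ∨ g
    wedge = -((np.abs((-f) - (-g)) + (-f) + (-g)) / 2)    # f ∧ g = −((−f) ∨ (−g))
    assert np.allclose(vee, np.maximum(f, g))
    assert np.allclose(wedge, np.minimum(f, g))
    assert np.allclose(wedge + vee, f + g)                # f ∧ g + f ∨ g = f + g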

We will find these properties helpful; they will be used silently below.

Lemma 4.8.11 If 0 ≤ α ≤ β ∈ R and f ∈ F with f ≥ 0, then α·f ≤ β·f. If f, g ∈ F with f ≤ g, then f + h ≤ g + h for all h ∈ F. Also, f ∧ g + f ∨ g = f + g.

Proof Because f ≥ 0, we obtain from α ≤ β

2·((α·f) ∨ (β·f)) = (|α − β| + α + β)·f = 2·(α ∨ β)·f = 2·β·f.

This establishes the first claim. The second one follows from

2·((f + h) ∨ (g + h)) = |f − g| + f + g + 2·h = 2·(g + h).

The third one is established through the observation that the identity holds pointwise, and from the observation that f ≤ g iff f(x) ≤ g(x) for all x ∈ X. a

We assume that 1 ∈ F, and that a function L : F → R is given which has these properties:

• L(α·f + β·g) = α·L(f) + β·L(g), so that L is linear,

• if f ≥ 0, then L(f) ≥ 0, so that L is positive,

• L(1) = 1, so that L is normed,

• if (f_n)_{n∈N} is a sequence in F which decreases monotonically to 0, then lim_{n→∞} L(f_n) = 0, so that L is continuous from above at 0.

Here are some immediate consequences of the properties of L.

Lemma 4.8.12 If f, g ∈ F, then L(f ∧ g) + L(f ∨ g) = L(f) + L(g). If (f_n)_{n∈N} and (g_n)_{n∈N} are increasing sequences of non-negative functions in F with lim_{n→∞} f_n ≤ lim_{n→∞} g_n, then lim_{n→∞} L(f_n) ≤ lim_{n→∞} L(g_n).

Proof The first property follows from f ∧ g + f ∨ g = f + g (Lemma 4.8.11) and the linearity of L. For the second one, we observe that lim_{k→∞}(f_n ∧ g_k) = f_n ∈ F, the latter sequence being increasing. Consequently, we have

L(f_n) ≤ lim_{k→∞} L(f_n ∧ g_k) ≤ lim_{k→∞} L(g_k)

for all n ∈ N, which implies the assertion. a


F determines a σ-algebra A on X, viz., the smallest σ-algebra which renders each f ∈ F measurable. We will show now that L determines a unique probability measure µ on A such that

L(f) = ∫_X f dµ

holds for all f ∈ F.

This will be done in a sequence of steps. A brief outline looks like this: we will first show that L can be extended to the set L+ of all bounded monotone limits from the non-negative elements of F, and that the extension respects monotone limits. From L+ we extract via indicator functions an algebra of sets, and from the extended functional an outer measure. This will then turn out to yield the desired probability measure.

Define

L+ := {f : X → R | f is bounded, and there exists an increasing sequence 0 ≤ f_n ∈ F with f = lim_{n→∞} f_n}.

Define L(f) := lim_{n→∞} L(f_n) for f ∈ L+, whenever f = lim_{n→∞} f_n for an increasing sequence (f_n)_{n∈N} ⊆ F. Then we obtain from Lemma 4.8.12 that this extension of L to L+ is well defined, and it is clear that L(f) ≥ 0, and that L(α·f + β·g) = α·L(f) + β·L(g) whenever f, g ∈ L+ and α, β ∈ R+. We see also that f, g ∈ L+ implies f ∧ g, f ∨ g ∈ L+ with L(f ∧ g) + L(f ∨ g) = L(f) + L(g). It turns out that L also respects the limits of increasing sequences.

Lemma 4.8.13 Let (f_n)_{n∈N} ⊆ L+ be an increasing and uniformly bounded sequence; then L(lim_{n→∞} f_n) = lim_{n→∞} L(f_n).

Proof Because f_n ∈ L+, we know that there exists for each n ∈ N an increasing sequence (f_{m,n})_{m∈N} of elements f_{m,n} ∈ F such that f_n = lim_{m→∞} f_{m,n}. Define

g_m := sup_{n≤m} f_{m,n}.

Then (g_m)_{m∈N} is an increasing sequence in F with f_{m,n} ≤ g_m, and g_m ≤ f_1 ∨ f_2 ∨ ... ∨ f_m = f_m, so that g_m is sandwiched between f_{m,n} and f_m for all m ∈ N and n ≤ m. This yields L(f_{m,n}) ≤ L(g_m) ≤ L(f_m) for these n, m. Thus lim_{n→∞} f_n = lim_{m→∞} g_m, and hence

lim_{n→∞} L(f_n) = lim_{m→∞} L(g_m) = L(lim_{m→∞} g_m) = L(lim_{n→∞} f_n).

Thus we have shown that lim_{n→∞} f_n can be obtained as the limit of an increasing sequence of functions from F; because (f_n)_{n∈N} is uniformly bounded, this limit is an element of L+. a

Now define

G := {G ⊆ X | χ_G ∈ L+},
µ(G) := L(χ_G) for G ∈ G.

Then G is closed under finite intersections and finite unions by the remarks made before Lemma 4.8.13. Moreover, G is closed under countable unions with µ(⋃_{n∈N} G_n) = lim_{n→∞} µ(G_n),


if (G_n)_{n∈N} is an increasing sequence in G. Also µ(X) = 1. Now define, as in the Caratheodory approach in Section 1.6.3,

µ*(A) := inf{µ(G) | G ∈ G, A ⊆ G},
B := {B ⊆ X | µ*(B) + µ*(X \ B) = 1}.

We obtain from the Caratheodory extension process (Theorem 1.6.29):

Proposition 4.8.14 B is a σ-algebra, and µ* is countably additive on B. a

Put µ(B) := µ*(B) for B ∈ B; then (X, B, µ) is a measure space, and µ is a probability measure on (X, B).

In order to carry out the programme sketched above, we need a σ-algebra. We have on one hand the σ-algebra A generated by F, and on the other hand B, gleaned from the Caratheodory extension. It is not immediately clear how these σ-algebras are related to each other. And then we also have G as an intermediate family of sets, obtained from L+. The following bird's-eye view lists the objects we will discuss, together with a shorthand indication of the respective relationships:

F → A via σ(·); L+ → G via indicator functions; G → B via the Caratheodory extension; and σ(G) = A ⊆ B.

We investigate the relationship of A and G first.

Lemma 4.8.15 A = σ(G).

Proof 1. Because A is the smallest σ-algebra rendering all elements of F measurable, and because each element of L+ is the limit of a sequence of elements of F, we obtain A-measurability for each element of L+. Thus G ⊆ A.

2. Let f ∈ L+ and c ∈ R+; then f_n := 1 ∧ (n · sup{f − c, 0}) ∈ L+, and χ_{{x∈X | f(x)>c}} = lim_{n→∞} f_n. This is a monotone limit. Hence {x ∈ X | f(x) > c} ∈ G, thus in particular each element of F is σ(G)-measurable. This implies that A ⊆ σ(G) holds. a

The relationship between B and G is a bit more difficult to establish.

Lemma 4.8.16 G ⊆ B.

Proof We have to show that µ*(G) + µ*(X \ G) = 1 for all G ∈ G. Fix G ∈ G. We obtain from additivity that µ(G) + µ(H) = µ(G ∩ H) + µ(G ∪ H) ≥ µ(X) = 1 holds for any H ∈ G with X \ G ⊆ H, so that µ*(G) + µ*(X \ G) ≤ 1 remains to be shown. The idea is to approximate χ_G for G ∈ G by a suitable sequence from F, and to manipulate this sequence accordingly.

Because G ∈ G, there exists an increasing sequence (f_n)_{n∈N} of elements in F such that χ_G = sup_{n∈N} f_n; consequently, χ_{X\G} = inf_{n∈N}(1 − f_n). Now let n ∈ N and 0 < c ≤ 1; then


X \ G ⊆ U_{n,c} := {x ∈ X | 1 − f_n(x) > c} with U_{n,c} ∈ G. Because χ_{U_{n,c}} ≤ (1 − f_n)/c, we obtain µ*(X \ G) ≤ L(1 − f_n)/c; this inequality holds for all c and all n ∈ N. Letting c → 1 and n → ∞, this yields µ*(X \ G) ≤ 1 − µ*(G).

Consequently, µ∗(G) + µ∗(X \G) = 1 for all G ∈ G, which establishes the claim. a

This yields the desired relationship of A, the σ-algebra generated by the functions in F, and B, the σ-algebra obtained from the extension process.

Corollary 4.8.17 A ⊆ B, and each element of L+ is B-measurable.

Proof We have seen that A = σ(G) and that G ⊆ B, so the first assertion follows from Proposition 4.8.14. The second assertion is immediate from the first one. a

Because µ is countably additive, hence a probability measure on B, and because each element of F is B-measurable, the integral ∫_X f dµ is defined, and we are done.

Theorem 4.8.18 Let F be a vector lattice of functions X → R with 1 ∈ F, and let L : F → R be a linear and monotone functional on F such that L(1) = 1, and L(f_n) → 0 whenever (f_n)_{n∈N} ⊆ F decreases to 0. Then there exists a unique probability measure µ on the σ-algebra A generated by F such that

L(f) = ∫_X f dµ

holds for all f ∈ F.

Proof Let G and B be constructed as above.

Existence: Because A ⊆ B, we may restrict µ to A, obtaining a probability measure. Fix f ∈ F; then f is B-measurable, hence ∫_X f dµ is defined. Assume first that 0 ≤ f ≤ 1, hence f ∈ L+. We can write f = lim_{n→∞} f_n with step functions f_n, the contributing sets being members of G. Hence L(f_n) = ∫_X f_n dµ, since L(χ_G) = µ(G) by construction. Consequently, we obtain from Lemma 4.8.13 and Lebesgue's Dominated Convergence Theorem 4.8.6

L(f) = L(lim_{n→∞} f_n) = lim_{n→∞} L(f_n) = lim_{n→∞} ∫_X f_n dµ = ∫_X lim_{n→∞} f_n dµ = ∫_X f dµ.

This implies the assertion also for bounded f ∈ F with f ≥ 0. If 0 ≤ f is unbounded, write f = sup_{n∈N}(f ∧ n) and apply Levi's Theorem 4.8.2. In the general case, decompose f = f⁺ − f⁻ with f⁺ := f ∨ 0 and f⁻ := (−f) ∨ 0, and apply the foregoing.

Uniqueness: Assume that there exists a probability measure ν on A with L(f) = ∫_X f dν for all f ∈ F; then the construction shows that µ(G) = L(χ_G) = ν(G) for all G ∈ G. Since G is closed under finite intersections, and since A = σ(G), we conclude that ν(A) = µ(A) for all A ∈ A.

This establishes the claim. a

We obtain as a consequence the famous Riesz Representation Theorem, which we formulate for the metric case. Recall from Section 3.6.3 that C(X) is the linear space of all bounded continuous functions X → R on a topological space X. We state the result first for metric spaces and for bounded continuous functions, specializing the result subsequently to the compact metric case.

The reason for not formulating the Riesz Representation Theorem immediately for general topological spaces is that Theorem 4.8.18 works with the σ-algebra generated — in this case — by C(X); this is in general the σ-algebra Ba(X) of Baire sets, which in turn may be properly contained in the Borel sets B(X). Thus one obtains in the general case a Baire measure, which would then have to be extended uniquely to a Borel measure. This is discussed in detail in [Bog07, Sec. 7.3].

Corollary 4.8.19 Let X be a metric space, and let L : C(X) → R be a positive linear functional with lim_{n→∞} L(f_n) = 0 for each sequence (f_n)_{n∈N} ⊆ C(X) which decreases monotonically to 0. Then there exists a unique finite Borel measure µ such that

L(f) = ∫_X f dµ

holds for all f ∈ C(X).

Proof It is clear that C(X) is a vector lattice with 1 ∈ C(X). We may and do assume that L(1) = 1. The result now follows immediately from Theorem 4.8.18. a

If we take a compact metric space, then each continuous map X → R is bounded. We show that the continuity assumption on L then follows from compactness; the underlying fact, namely that a monotonically decreasing sequence of continuous functions with pointwise limit 0 converges uniformly, is usually referred to as Dini's Theorem, see Proposition 3.6.41.

Theorem 4.8.20 (Riesz Representation Theorem) Let X be a compact metric space. Given a positive linear functional L : C(X) → R, there exists a unique finite Borel measure µ such that

L(f) = ∫_X f dµ

holds for all f ∈ C(X).

Proof It is clear that C(X) is a vector lattice which contains 1. Again, we assume that L(1) = 1 holds. In order to apply Theorem 4.8.18, the proof obligation is to show that lim_{n→∞} L(f_n) = 0 whenever (f_n)_{n∈N} ⊆ C(X) decreases monotonically to 0. But since X is compact, we claim that sup_{x∈X} f_n(x) → 0 as n → ∞.

This is so because ({x ∈ X | f_n(x) ≥ c})_{n∈N} is a decreasing family of closed sets with empty intersection for any c > 0, so we find by compactness a finite subfamily with empty intersection. Hence the assumption that sup_{x∈X} f_n(x) ≥ c > 0 for all n ∈ N would lead to a contradiction (note that this is a variation of the argument in the proof of Proposition 3.6.41). Thus the assertion follows from Theorem 4.8.18. a

Because f ↦ ∫_X f dµ defines for each Borel measure µ a positive linear functional on C(X), and because a measure on a metric space is uniquely determined by its integral on the bounded continuous functions, we obtain:

Corollary 4.8.21 For a compact metric space X there is a bijection between positive linear functionals on C(X) and finite Borel measures. a
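The correspondence can be made concrete on a finite space, where every normed positive linear functional is integration against a probability vector; the following sketch (hypothetical four-point space, outside the formal development) recovers the measure from the functional by evaluating it on indicator functions.

    import numpy as np

    rng = np.random.default_rng(0)
    mu = rng.dirichlet(np.ones(4))                  # a probability measure on X = {0,1,2,3}
    L = lambda f: float(f @ mu)                     # L(f) = ∫_X f dµ

    assert np.isclose(L(np.ones(4)), 1.0)           # L is normed: L(1) = 1
    recovered = np.array([L(np.eye(4)[i]) for i in range(4)])   # L(χ_{i}) = µ({i})
    assert np.allclose(recovered, mu)               # the functional determines the measure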


A typical scenario for the application of the Riesz Theorem runs like this: one starts with a probability measure on a metric space X. This space can be embedded into a compact metric space X′, say, and one knows that the integral on the bounded continuous functions on X extends to a positive linear map on the continuous functions on X′. Then the Riesz Representation Theorem kicks in and gives a probability measure on X′. We will see a situation like this when investigating the weak topology on the space of all finite measures on a Polish space in Section 4.10.

4.9 Product Measures

As a first application of integration we show that the product of two finite measures yields a measure again. This will lead to Fubini's Theorem on product integration, which evaluates a product integrable function on a product along its vertical or its horizontal cuts (in this sense it may be compared to a line sweeping algorithm — you traverse the Cartesian product, and in each instance you measure the cut).

We apply this to infinite products, first with a countable index set, then for an arbitrary one. Infinite products are a special case of projective systems, which may be described as sequences of probabilities which are related through projections. We show that such a projective system has a projective limit, i.e., a measure on the set of all sequences such that the projective system proper is obtained through projections. This construction is, however, only feasible in a Polish space, since there a compactness argument is available which ensures that the measure we are looking for is σ-additive.

A small step leads to projective limits for stochastic relations. We demonstrate an application of projective limits through the interpretation of the logic CSL. An interpretation for game logic, i.e., a modal logic whose modalities are given by games, is discussed as well, since now all tools for this quest are provided.

Fix for the time being two finite measure spaces (X,A, µ) and (Y,B, ν). The Cartesianproduct X × Y is endowed with the product σ-algebra A⊗B which is the smallest σ-algebracontaining all measurable rectangles A×B with A ∈ A and B ∈ B, see Section 4.1.1.

Take Q ∈ A ⊗ B; then we know from Lemma 4.1.8 that Q_x and Q^y are measurable sets for x ∈ X and y ∈ Y. Measurability is essential here, because otherwise functions like x ↦ ν(Q_x) and y ↦ µ(Q^y) would not be defined. In fact, we can say more about these functions.

Lemma 4.9.1 Let Q ∈ A ⊗ B be a measurable set; then both ϕ(x) := ν(Q_x) and ψ(y) := µ(Q^y) define bounded measurable functions with

∫_X ν(Q_x) dµ(x) = ∫_Y µ(Q^y) dν(y).

Proof We use the same argument as in the proof of Lemma 4.1.8 for establishing that both ϕ and ψ are measurable functions, noting that ν((A × B)_x) = χ_A(x)·ν(B) and, similarly, µ((A × B)^y) = χ_B(y)·µ(A). In the next step the set of all Q ∈ A ⊗ B for which the assertion holds is shown to satisfy the assumptions of the π-λ-Theorem 1.6.30.


In the same way, the equality of the integrals is finally established, noting that

∫_X ν((A × B)_x) dµ(x) = µ(A)·ν(B) = ∫_Y µ((A × B)^y) dν(y). a

Thus it does not matter which cut to take — integrating with the other measure will yield in any case the same result. Visualize this in the Cartesian plane. You have a figure F ⊆ R²; say, for simplicity, F is compact. For each x ∈ R, F_x is a vertical line, possibly broken, the length ℓ(x) of which you can determine. Then ∫_R ℓ(x) dx yields the area A of F. But you may obtain A also by measuring the — also possibly broken — horizontal line F^y with length r(y) and integrating ∫_R r(y) dy.

Lemma 4.9.1 yields without much ado

Theorem 4.9.2 Given the finite measure spaces (X, A, µ) and (Y, B, ν), there exists a unique finite measure µ ⊗ ν on A ⊗ B such that (µ ⊗ ν)(A × B) = µ(A)·ν(B) for A ∈ A, B ∈ B. Moreover,

(µ ⊗ ν)(Q) = ∫_X ν(Q_x) dµ(x) = ∫_Y µ(Q^y) dν(y)

holds for all Q ∈ A ⊗ B.

Proof 1. We establish the existence of µ ⊗ ν by an appeal to Lemma 4.9.1 and to the properties of the integral according to Proposition 4.8.4. Define

(µ ⊗ ν)(Q) := ∫_X ν(Q_x) dµ(x);

then this defines a finite measure on A ⊗ B:

Monotonicity: Let Q ⊆ Q′; then Q_x ⊆ Q′_x for all x ∈ X, hence ∫_X ν(Q_x) dµ(x) ≤ ∫_X ν(Q′_x) dµ(x). Thus µ ⊗ ν is monotone.

Additivity: If Q and Q′ are disjoint, then Q_x ∩ Q′_x = (Q ∩ Q′)_x = ∅ for all x ∈ X. Thus µ ⊗ ν is additive.

σ-additivity: Let (Q_n)_{n∈N} be a sequence of disjoint measurable sets; then (Q_{n,x})_{n∈N} is disjoint for all x ∈ X, and

∫_X ν(⋃_{n∈N} Q_{n,x}) dµ(x) = ∫_X ∑_{n∈N} ν(Q_{n,x}) dµ(x) = ∑_{n∈N} ∫_X ν(Q_{n,x}) dµ(x)

by Corollary 4.8.7. Thus µ ⊗ ν is σ-additive.

2. It remains to establish uniqueness. Here we essentially repeat the argument from Lemma 1.6.31 on page 62. Suppose that ρ is a finite measure on A ⊗ B with ρ(A × B) = µ(A)·ν(B) for all A ∈ A and all B ∈ B. Then

G := {Q ∈ A ⊗ B | ρ(Q) = (µ ⊗ ν)(Q)}

contains the generator {A × B | A ∈ A, B ∈ B} of A ⊗ B, which is closed under finite intersections. Because both ρ and µ ⊗ ν are measures, G is closed under countable disjoint unions; because both contenders are finite, G is also closed under complementation. The π-λ-Theorem 1.6.30 shows that G = A ⊗ B. Thus µ ⊗ ν is uniquely determined. a
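On finite spaces, Theorem 4.9.2 can be checked directly: the product measure is the outer product of the two probability vectors, and both iterated cut-integrals agree on any set Q. A small sketch (hypothetical spaces and a random Q, outside the formal development):

    import numpy as np

    rng = np.random.default_rng(1)
    mu = rng.dirichlet(np.ones(3))             # measure on X = {0,1,2}
    nu = rng.dirichlet(np.ones(4))             # measure on Y = {0,1,2,3}
    Q = rng.integers(0, 2, size=(3, 4))        # indicator matrix of a set Q ⊆ X × Y

    prod = np.outer(mu, nu)                    # (µ ⊗ ν)({⟨x, y⟩}) = µ({x})·ν({y})
    via_x = mu @ (Q @ nu)                      # ∫_X ν(Q_x) dµ(x)
    via_y = nu @ (Q.T @ mu)                    # ∫_Y µ(Q^y) dν(y)
    assert np.isclose((prod * Q).sum(), via_x)
    assert np.isclose(via_x, via_y)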

Theorem 4.9.2 holds also for σ-finite measures. In fact, assume that the contributing measure spaces are σ-finite, and let (X_n)_{n∈N} resp. (Y_n)_{n∈N} be increasing sequences in A resp. B such that µ(X_n) < ∞ and ν(Y_n) < ∞ for all n ∈ N, and ⋃_{n∈N} X_n = X and ⋃_{n∈N} Y_n = Y. Localize µ and ν to X_n resp. Y_n by defining µ_n(A) := µ(A ∩ X_n), similarly ν_n(B) := ν(B ∩ Y_n); since these measures are finite, we can extend them uniquely to a measure µ_n ⊗ ν_n on A ⊗ B. Since ⋃_{n∈N} X_n × Y_n = X × Y with the increasing sequence (X_n × Y_n)_{n∈N}, we set

(µ ⊗ ν)(Q) := sup_{n∈N} (µ_n ⊗ ν_n)(Q).

Then µ ⊗ ν is a σ-finite measure on A ⊗ B. Now assume that we have another σ-finite measure ρ on A ⊗ B with ρ(A × B) = µ(A)·ν(B) for all A ∈ A and B ∈ B. Define ρ_n(Q) := ρ(Q ∩ (X_n × Y_n)); hence ρ_n = µ_n ⊗ ν_n by uniqueness of the extension to µ_n and ν_n, so that we obtain

ρ(Q) = sup_{n∈N} ρ_n(Q) = sup_{n∈N} (µ_n ⊗ ν_n)(Q) = (µ ⊗ ν)(Q)

for all Q ∈ A ⊗ B. Thus we have shown

Corollary 4.9.3 Given two σ-finite measure spaces (X, A, µ) and (Y, B, ν), there exists a unique σ-finite measure µ ⊗ ν on A ⊗ B such that (µ ⊗ ν)(A × B) = µ(A)·ν(B). We have

(µ ⊗ ν)(Q) = ∫_X ν(Q_x) dµ(x) = ∫_Y µ(Q^y) dν(y)

for all Q ∈ A ⊗ B. a

The construction of the product measure has been done here through integration of cuts. An alternative would have been the canonical approach. It would have investigated the map 〈A, B〉 ↦ µ(A)·ν(B) on the set of all rectangles, and then put the extension machinery developed through the Caratheodory approach into action. It is a matter of taste which approach to prefer.

The following example displays a slight generalization (a finite measure is but a constanttransition kernel).

Example 4.9.4 Let K : (X, A) ⇝ (Y, B) be a transition kernel (see Definition 4.1.9) such that the map x ↦ K(x)(Y) is integrable with respect to the finite measure µ. Then

(µ ⊗ K)(Q) := ∫_X K(x)(Q_x) dµ(x)

defines a finite measure on (X × Y, A ⊗ B). The π-λ-Theorem 1.6.30 tells us that this measure is uniquely determined by the condition (µ ⊗ K)(A × B) = ∫_A K(x)(B) dµ(x) for A ∈ A, B ∈ B.

Interpret in a probabilistic setting K(x)(B) as the probability that an input x ∈ X yields an output in B ∈ B, and assume that µ gives the initial probability with which the system starts; then µ ⊗ K gives the probability of all pairings, i.e., (µ ⊗ K)(Q) is the probability that a pair 〈x, y〉 consisting of an input value x ∈ X and an output value y ∈ Y will be a member of Q ∈ A ⊗ B. ,


This may be further extended, replacing the measure on K's domain by a transition kernel as well.

Example 4.9.5 Consider the scenario of Example 4.9.4 again, but take a third measurable space (Z, C) with a transition kernel L : (Z, C) ⇝ (X, A) into account; assume furthermore that x ↦ K(x)(Y) is integrable for each L(z). Then L(z) ⊗ K defines a finite measure on (X × Y, A ⊗ B) for each z ∈ Z according to Example 4.9.4. We claim that this defines a transition kernel (Z, C) ⇝ (X × Y, A ⊗ B). For this to be true, we have to show that z ↦ ∫_X K(x)(Q_x) dL(z)(x) is measurable for each Q ∈ A ⊗ B. This is a typical application of the principle of good sets through the π-λ-Theorem.

In fact, consider

Q := {Q ∈ A ⊗ B | the assertion is true for Q}.

Then Q is closed under complementation. It is also closed under countable disjoint unions by Corollary 4.8.7. If Q = A × B is a measurable rectangle, we have ∫_X K(x)(Q_x) dL(z)(x) = ∫_A K(x)(B) dL(z)(x). Then Exercise 4.14 shows that this is a measurable function Z → R. Thus Q contains all measurable rectangles, so Q = A ⊗ B by the π-λ-Theorem 1.6.30. This establishes measurability of z ↦ ∫_X K(x)(Q_x) dL(z)(x) and shows that it defines a transition kernel. ,

As a slight modification, the next example shows the composition of transition kernels, usually called convolution.

Example 4.9.6 Let K : (X, A) ⇝ (Y, B) and L : (Y, B) ⇝ (Z, C) be transition kernels, and assume that the map y ↦ L(y)(Z) is integrable with respect to the measure K(x) for an arbitrary x ∈ X. Define for x ∈ X and C ∈ C

(L ∗ K)(x)(C) := ∫_Y L(y)(C) dK(x)(y).

Then L ∗ K : (X, A) ⇝ (Z, C) is a transition kernel. In fact, (L ∗ K)(x) is for fixed x ∈ X a finite measure on C according to Corollary 4.8.7. From Exercise 4.14 we infer that x ↦ ∫_Y L(y)(C) dK(x)(y) is a measurable function, since y ↦ L(y)(C) is measurable for all C ∈ C.

Because transition kernels are the Kleisli morphisms for the endofunctor M on the category of measurable spaces (Example 2.4.8), it is not difficult to see that this defines Kleisli composition; in particular it follows that this composition is associative. ,
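For finite spaces the convolution has a familiar face: a transition kernel is a row-stochastic matrix, and L ∗ K is a matrix product, so associativity of Kleisli composition reduces to associativity of matrix multiplication. A sketch with hypothetical kernel sizes, outside the formal development:

    import numpy as np

    rng = np.random.default_rng(2)
    def kernel(rows, cols):
        return rng.dirichlet(np.ones(cols), size=rows)   # each row is a probability vector

    K = kernel(3, 4)        # K : X ⇝ Y with |X| = 3, |Y| = 4
    L = kernel(4, 2)        # L : Y ⇝ Z with |Z| = 2
    M = kernel(2, 5)        # M : Z ⇝ W with |W| = 5

    LK = K @ L              # (L ∗ K)(x)(z) = Σ_y K(x)(y)·L(y)(z)
    assert np.allclose(LK.sum(axis=1), 1.0)              # rows remain probability vectors
    assert np.allclose((K @ L) @ M, K @ (L @ M))         # composition is associative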

Example 4.9.7 Let f ∈ F+(X, A); then we know that “the area under the graph”, viz.,

C≤(f) := {〈x, r〉 | x ∈ X, 0 ≤ r ≤ f(x)},

is a member of A ⊗ B(R). This was shown in Corollary 4.2.5. Then Corollary 4.9.3 tells us that

(µ ⊗ λ)(C≤(f)) = ∫_X λ((C≤(f))_x) dµ(x),

where λ is Lebesgue measure on B(R). Because

λ((C≤(f))_x) = λ({r | 0 ≤ r ≤ f(x)}) = f(x),


we obtain

(µ ⊗ λ)(C≤(f)) = ∫_X f dµ.

On the other hand,

(µ ⊗ λ)(C≤(f)) = ∫_{R+} µ((C≤(f))^r) dλ(r),

and this gives the integration formula

∫_X f dµ = ∫_0^∞ µ({x ∈ X | f(x) ≥ r}) dr.    (4.11)

In this way, the integral of a non-negative function may be interpreted as measuring the areaunder its graph. ,
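Formula (4.11) lends itself to a direct numerical check on a finite measure space; the following sketch (hypothetical six-point space, outside the formal development) compares ∫_X f dµ with the integral of the tail function r ↦ µ({f ≥ r}).

    import numpy as np

    rng = np.random.default_rng(3)
    mu = rng.dirichlet(np.ones(6))            # probability measure on six points
    f = rng.uniform(0.0, 5.0, size=6)         # a non-negative function

    lhs = f @ mu                              # ∫_X f dµ
    r = np.linspace(0.0, f.max(), 100001)
    tail = np.array([mu[f >= ri].sum() for ri in r])      # r ↦ µ({x | f(x) ≥ r})
    dx = r[1] - r[0]
    rhs = dx * (tail.sum() - 0.5 * (tail[0] + tail[-1]))  # trapezoidal ∫_0^∞ µ({f ≥ r}) dr
    print(lhs, rhs)                           # agree up to quadrature error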

With a similar technique, we show that µ ↦ ∫_0^1 µ(B^r) dr defines a measurable map for B ∈ A ⊗ B([0, 1]); this may be perceived as the average evaluation of a set B ⊆ X × [0, 1]. Moreover, the set of all these evaluations is a measurable subset of S(X, A) × [0, 1].

Example 4.9.8 Let (X, A) be a measurable space; then

Λ_B(µ) := ∫_0^1 µ(B^r) dr

defines a ℘(X, A)-B(R)-measurable map on S(X, A), whenever B ∈ A ⊗ B([0, 1]). This is established through the principle of good sets: put

G := {B ∈ A ⊗ B([0, 1]) | Λ_B is ℘(X, A)-B([0, 1])-measurable}.

Then G contains all sets A × D with A ∈ A, D ∈ B([0, 1]). This is so because

µ((A × D)^r) = µ(A), if r ∈ D, and 0 otherwise.

Hence Λ_{A×D}(µ) = ∫_D µ(A) dλ = µ(A)·λ(D). This is certainly measurable. Thus G contains the generator {A × D | A ∈ A, D ∈ B([0, 1])} of A ⊗ B([0, 1]), which in turn is closed under finite intersections. It is clear that G is closed under complementation and under countable disjoint unions, the latter by Corollary 4.8.7. Hence we obtain from the π-λ-Theorem 1.6.30 that G = A ⊗ B([0, 1]).

We obtain from Corollary 4.2.5 that the set

{〈µ, q〉 ∈ S(X, A) × [0, 1] | ∫_0^1 µ(B^r) dr ≥ q}

is a member of ℘(X, A) ⊗ B([0, 1]), whenever B ⊆ X × [0, 1] is a member of A ⊗ B([0, 1]). ,


4.9.1 Fubini’s Theorem

In order to discuss integration with respect to a product measure, we introduce the cuts of a function f : X × Y → R, defining f_x := λy.f(x, y) and f^y := λx.f(x, y). Thus we have f(x, y) = f_x(y) = f^y(x), the first equality resembling currying.

For the discussion to follow, we will also admit the values −∞, +∞ as function values. So define R̄ := R ∪ {−∞, +∞}, and let B ⊆ R̄ be a Borel set iff B ∩ R ∈ B(R). Measurability of functions extends accordingly: if f : X → R̄ is measurable, then in particular {x ∈ X | f(x) ∈ R} ∈ A, and the set on which f takes the values +∞ or −∞ is a member of A. Denote by F̄(X, A) the set of measurable functions with values in R̄, and by F̄+(X, A) those which take non-negative values. The integral ∫_X f dµ and integrability are defined in the same way as above for f ∈ F̄+(X, A). Then it is clear that f ∈ F̄+(X, A) is integrable iff f·χ_{{x∈X | f(x)∈R}} is integrable and µ({x ∈ X | f(x) = ∞}) = 0.

With this in mind, we tackle the integration of a measurable function f : X × Y → R̄ for the finite measure spaces (X, A, µ) and (Y, B, ν).

Proposition 4.9.9 Let f ∈ F̄+(X × Y, A ⊗ B); then

1. λx.∫_Y f_x dν and λy.∫_X f^y dµ are measurable functions X → R̄ resp. Y → R̄.

2. We have

∫_{X×Y} f d(µ ⊗ ν) = ∫_X (∫_Y f_x dν) dµ(x) = ∫_Y (∫_X f^y dµ) dν(y).

Proof 0. The proof is a bit longish, but, after all, there is a lot to show. It is an application of the machinery developed so far, and shows that it is well oiled. The line of attack: we start off by establishing the claim for step functions, then we go on to apply Levi's Theorem for settling the general case.

1. Let f = ∑_{i=1}^n a_i · χ_{Q_i} be a step function with a_i ≥ 0 and Q_i ∈ A ⊗ B for i = 1, ..., n. Then

∫_Y f_x dν = ∑_{i=1}^n a_i · ν(Q_{i,x}).

This is a measurable function X → R by Lemma 4.9.1. We obtain

∫_{X×Y} f d(µ ⊗ ν) = ∑_{i=1}^n a_i · (µ ⊗ ν)(Q_i)
  = ∑_{i=1}^n a_i · ∫_X ν(Q_{i,x}) dµ(x)
  = ∫_X ∑_{i=1}^n a_i · ν(Q_{i,x}) dµ(x)
  = ∫_X (∫_Y f_x dν) dµ(x)


Interchanging the roles of µ and ν, we obtain the representation of ∫_{X×Y} f d(µ ⊗ ν) in terms of λy.∫_X f^y dµ and ν. Thus the assertion is true for step functions.

2. In the general case we know that we can find an increasing sequence (f_n)_{n∈N} of step functions with f = sup_{n∈N} f_n. Given x ∈ X, we infer that f_x = sup_{n∈N} f_{n,x}, so that

∫_Y f_x dν = sup_{n∈N} ∫_Y f_{n,x} dν

by Levi's Theorem 4.8.2. This implies measurability. Applying Levi's Theorem again to the results from part 1, we have

∫_{X×Y} f d(µ ⊗ ν) = sup_{n∈N} ∫_{X×Y} f_n d(µ ⊗ ν)
  = sup_{n∈N} ∫_X (∫_Y f_{n,x} dν) dµ(x)
  = ∫_X (sup_{n∈N} ∫_Y f_{n,x} dν) dµ(x)
  = ∫_X (∫_Y f_x dν) dµ(x)

Again, interchanging roles yields the symmetric equality. a

This yields as an immediate consequence that the cuts of a product integrable function arealmost everywhere integrable, to be specific:

Corollary 4.9.10 Let f : X × Y → R̄ be µ ⊗ ν-integrable, and put

A := {x ∈ X | f_x is not ν-integrable},
B := {y ∈ Y | f^y is not µ-integrable}.

Then A ∈ A, B ∈ B, and µ(A) = ν(B) = 0.

Proof Because A = {x ∈ X | ∫_Y |f_x| dν = ∞}, we see that A ∈ A. By the additivity of the integral, we have

∫_{X×Y} |f| d(µ ⊗ ν) = ∫_{X\A} (∫_Y |f_x| dν) dµ(x) + ∫_A (∫_Y |f_x| dν) dµ(x) < ∞;

since the inner integral is infinite on A, finiteness forces µ(A) = 0. B is treated in the same way. a

It is helpful to extend our integral in a minor way. Assume that ∫_X |f| dµ < ∞ for f : X → R̄ measurable, and that µ(A) = 0 with A := {x ∈ X | |f(x)| = ∞}. Change f on A to a finite value, obtaining a measurable function f* : X → R, and define

∫_X f dµ := ∫_X f* dµ.

Thus f ↦ ∫_X f dµ does not notice this change on a set of measure zero. In this way, we always assume that an integrable function takes finite values, even if we have to convince it to do so on a set of measure zero.

With this in mind, we obtain


Corollary 4.9.11 Let f : X × Y → R̄ be integrable; then λx.∫_Y f_x dν and λy.∫_X f^y dµ are integrable with respect to µ resp. ν, and

∫_{X×Y} f d(µ ⊗ ν) = ∫_X (∫_Y f_x dν) dµ(x) = ∫_Y (∫_X f^y dµ) dν(y).

Proof After the modification on a set of µ-measure zero, we know that

|∫_Y f_x dν| ≤ ∫_Y |f_x| dν < ∞

for all x ∈ X, so that λx.∫_Y f_x dν is integrable with respect to µ; similarly, λy.∫_X f^y dµ is integrable with respect to ν. We obtain from Proposition 4.9.9 and the linearity of the integral

∫_{X×Y} f d(µ ⊗ ν) = ∫_{X×Y} f⁺ d(µ ⊗ ν) − ∫_{X×Y} f⁻ d(µ ⊗ ν)
  = ∫_X (∫_Y f⁺_x dν) dµ(x) − ∫_X (∫_Y f⁻_x dν) dµ(x)
  = ∫_X (∫_Y f⁺_x dν − ∫_Y f⁻_x dν) dµ(x)
  = ∫_X (∫_Y f_x dν) dµ(x)

The second equation is treated in exactly the same way. a

Now we know how to treat a function which is integrable, but we do not yet have a criterion for integrability. The elegance of Fubini's Theorem shines through the observation that the existence of the iterated integrals yields integrability for the product integral. To be specific:

Theorem 4.9.12 Let f : X × Y → R̄ be measurable. Then these statements are equivalent:

1. ∫_{X×Y} |f| d(µ ⊗ ν) < ∞.

2. ∫_X (∫_Y |f_x| dν) dµ(x) < ∞.

3. ∫_Y (∫_X |f^y| dµ) dν(y) < ∞.

Under one of these conditions, f is µ ⊗ ν-integrable, and (Fubini's Theorem)

∫_{X×Y} f d(µ ⊗ ν) = ∫_X (∫_Y f_x dν) dµ(x) = ∫_Y (∫_X f^y dµ) dν(y).    (4.12)

Proof We discuss only 1 ⇒ 2; the other implications are proved similarly. From Proposition 4.9.9 it is inferred that |f| is integrable, so 2. holds by Corollary 4.9.11, from which we also obtain representation (4.12). a

Product integration and Fubini’s Theorem are an essential extension to our tool kit.
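A numerical illustration of Theorem 4.9.12, outside the formal development and with a hypothetical integrand: for f(x, y) = x·e^{xy} on [0, 1]² with Lebesgue measure, both iterated integrals agree, and the closed form of the double integral is e − 2.

    import numpy as np

    def integrate(vals, dx, axis=-1):
        # trapezoidal rule along one axis of a uniform grid
        return dx * (vals.sum(axis=axis)
                     - 0.5 * (vals.take(0, axis=axis) + vals.take(-1, axis=axis)))

    x = np.linspace(0.0, 1.0, 2001)
    y = np.linspace(0.0, 1.0, 2001)
    X, Y = np.meshgrid(x, y, indexing="ij")
    F = X * np.exp(X * Y)                     # f(x, y) = x·exp(x·y); ∫∫ f = e − 2
    dx = x[1] - x[0]

    inner_y = integrate(F, dx, axis=1)        # x ↦ ∫_Y f_x dλ(y)
    iter_xy = integrate(inner_y, dx)          # ∫_X (∫_Y f_x dy) dx
    inner_x = integrate(F, dx, axis=0)        # y ↦ ∫_X f^y dλ(x)
    iter_yx = integrate(inner_x, dx)          # ∫_Y (∫_X f^y dx) dy
    print(iter_xy, iter_yx, np.e - 2)         # all three agree up to grid error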


4.9.2 Infinite Products and Projective Limits

As a first application of product integration we will now discuss the existence of infinite products of probability measures. Then we will go on to establish the existence of projective limits. These limits will be useful for interpreting a logic which works with continuous time.

Corollary 4.9.3 extends to a finite number of σ-finite measure spaces in a natural way. Let (X_i, A_i, µ_i) be σ-finite measure spaces for 1 ≤ i ≤ n; the uniquely determined product measure on A_1 ⊗ ... ⊗ A_n is denoted by µ_1 ⊗ ... ⊗ µ_n, and we infer from Corollary 4.9.3 that we may write

(µ_1 ⊗ ... ⊗ µ_n)(Q) = ∫_{X_2×...×X_n} µ_1(Q_{x_2,...,x_n}) d(µ_2 ⊗ ... ⊗ µ_n)(x_2, ..., x_n)
  = ∫_{X_1×...×X_{n−1}} µ_n(Q_{x_1,...,x_{n−1}}) d(µ_1 ⊗ ... ⊗ µ_{n−1})(x_1, ..., x_{n−1})

whenever Q ∈ A_1 ⊗ ... ⊗ A_n. This is fairly straightforward.

Infinite Products

We will have a closer look now at infinite products, where we restrict ourselves to probability measures, and here we consider the countable case first. So let (X_n, A_n, ϖ_n) be a measure space with a probability measure ϖ_n on A_n for n ∈ N. We want to construct the infinite product of this sequence.

Let us fix some notation first and then describe the goal in greater detail. Put

X^{(n)} := ∏_{k≥n} X_k,
A^{(n)} := {A × X^{(n+ℓ)} | A ∈ A_n ⊗ ... ⊗ A_{n+ℓ−1} for some ℓ ∈ N}.

The elements of A^{(n)} are the cylinder sets for X^{(n)}. Thus X^{(1)} = ∏_{n∈N} X_n, and ⊗_{n∈N} A_n = σ(A^{(1)}). Given A ∈ A^{(n)}, we can write A as A = C × X^{(n+ℓ)} with C ∈ A_n ⊗ ... ⊗ A_{n+ℓ−1}. So if we set

ϖ^{(n)}(A) := (ϖ_n ⊗ ... ⊗ ϖ_{n+ℓ−1})(C),

then ϖ^{(n)} is well defined on A^{(n)}, and it is readily verified that it is monotone and additive with ϖ^{(n)}(∅) = 0 and ϖ^{(n)}(X^{(n)}) = 1. Moreover, we infer from Theorem 4.9.2 that

ϖ^{(n)}(C) = ∫_{X_{n+1}×...×X_{n+m}} ϖ_n(C_{x_{n+1},...,x_{n+m}}) d(ϖ_{n+1} ⊗ ... ⊗ ϖ_{n+m})(x_{n+1}, ..., x_{n+m})

for all C ∈ A^{(n)}.

The goal is to show that there exists a unique probability measure ϖ on ⊗_{n∈N}(X_n, A_n) such that ϖ(A × X^{(n+1)}) = (ϖ_1 ⊗ ... ⊗ ϖ_n)(A) whenever A ∈ A_1 ⊗ ... ⊗ A_n. If we can show that ϖ^{(1)}


is σ-additive on A^{(1)}, then we can extend ϖ^{(1)} to the desired σ-algebra by Theorem 1.6.29. For this it is sufficient to show that inf_{n∈N} ϖ^{(1)}(A_n) > ε > 0 implies ⋂_{n∈N} A_n ≠ ∅ for any decreasing sequence (A_n)_{n∈N} in A^{(1)}.

The basic idea is to construct a sequence (x_n)_{n∈N} ∈ ⋂_{n∈N} A_n. We do this step by step.

• First we determine an element x_1 ∈ X_1 such that we can expand the — admittedly very short — partial sequence x_1 to a sequence which is contained in all A_n; this means that we must have A_n^{x_1} ≠ ∅ for all n ∈ N, because A_n^{x_1} contains all possible continuations of x_1 into A_n. We conclude that these sets are non-empty because their measure is strictly positive.

• If we have such an x_1, we start working on the second element of the sequence, so we look for some x_2 ∈ X_2 such that we can expand x_1, x_2 to a sequence which is contained in all A_n, so we must have A_n^{x_1,x_2} ≠ ∅ for all n ∈ N. Again, we look for x_2 so that the measure of A_n^{x_1,x_2} is strictly positive for each n.

• Continuing in this fashion, we obtain the desired sequence, which then has to be an element of ⋂_{n∈N} A_n by construction.

This is the plan. Let us explore finding x1. Put

E_1^{(n)} := {x_1 ∈ X_1 | ϖ^{(2)}(A_n^{x_1}) > ε/2}.

Because

ϖ^{(1)}(A_n) = ∫_{X_1} ϖ^{(2)}(A_n^{x_1}) dϖ_1(x_1),

we have

0 < ε < ϖ^{(1)}(A_n) = ∫_{E_1^{(n)}} ϖ^{(2)}(A_n^{x_1}) dϖ_1(x_1) + ∫_{X_1 \ E_1^{(n)}} ϖ^{(2)}(A_n^{x_1}) dϖ_1(x_1)
  ≤ ϖ_1(E_1^{(n)}) + ε/2 · ϖ_1(X_1 \ E_1^{(n)})
  ≤ ϖ_1(E_1^{(n)}) + ε/2.

Thus ϖ_1(E_1^{(n)}) ≥ ε/2 for all n ∈ N. Since A_1 ⊇ A_2 ⊇ ..., we also have E_1^{(1)} ⊇ E_1^{(2)} ⊇ ..., so let E_1 := ⋂_{n∈N} E_1^{(n)}; then E_1 ∈ A_1 with ϖ_1(E_1) ≥ ε/2 > 0. In particular, E_1 ≠ ∅. Pick and fix x_1 ∈ E_1. Then A_n^{x_1} ∈ A^{(2)}, and ϖ^{(2)}(A_n^{x_1}) > ε/2 for all n ∈ N.

Let us have a look at how to find the second element; this is but a small variation of the idea just presented. Put E_2^{(n)} := {x_2 ∈ X_2 | ϖ^{(3)}(A_n^{x_1,x_2}) > ε/4} for n ∈ N. Because

ϖ^{(2)}(A_n^{x_1}) = ∫_{X_2} ϖ^{(3)}(A_n^{x_1,x_2}) dϖ_2(x_2),

we obtain similarly ϖ_2(E_2^{(n)}) ≥ ε/4 for all n ∈ N. Again, we have a decreasing sequence, and putting E_2 := ⋂_{n∈N} E_2^{(n)}, we have ϖ_2(E_2) ≥ ε/4, so that E_2 ≠ ∅. Pick x_2 ∈ E_2; then A_n^{x_1,x_2} ∈ A^{(3)} and ϖ^{(3)}(A_n^{x_1,x_2}) > ε/4 for all n ∈ N.


In this manner we determine inductively for each k ∈ N the finite sequence 〈x_1, ..., x_k〉 ∈ X_1 × ... × X_k such that ϖ^{(k+1)}(A_n^{x_1,...,x_k}) > ε/2^k for all n ∈ N. Consider now the sequence (x_n)_{n∈N}. From the construction it is clear that 〈x_1, x_2, ..., x_k, ...〉 ∈ ⋂_{n∈N} A_n. This shows that ⋂_{n∈N} A_n ≠ ∅, and it implies by the argumentation above that ϖ^{(1)} is σ-additive on the algebra A^{(1)}.

Hence we have established:

Theorem 4.9.13 Let (X_n, A_n, ϖ_n) be probability spaces for all n ∈ N. Then there exists a unique probability measure ϖ on ⊗_{n∈N}(X_n, A_n) such that

ϖ(A × ∏_{k>n} X_k) = (ϖ_1 ⊗ ... ⊗ ϖ_n)(A)

for all A ∈ ⊗_{i=1}^n A_i. a

Define the projection π_n^∞ : (x_k)_{k∈N} ↦ 〈x_1, ..., x_n〉 from ∏_{n∈N} X_n to ∏_{i=1}^n X_i. In terms of image measures, the theorem states that there exists a unique probability measure ϖ on the infinite product such that S(π_n^∞)(ϖ) = ϖ_1 ⊗ ... ⊗ ϖ_n.
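The statement can be illustrated by simulation: sampling the first coordinates of the infinite product and estimating the probability of a cylinder set. The sketch below (hypothetical Bernoulli factors, outside the formal development) compares the Monte-Carlo frequency with the finite product ϖ_1 ⊗ ϖ_2 ⊗ ϖ_3.

    import numpy as np

    rng = np.random.default_rng(4)
    p = [0.3, 0.6, 0.5]                       # ϖ_i = Bernoulli(p_i) on X_i = {0, 1}
    N = 200000
    draws = rng.random((N, 3)) < p            # first three coordinates of ϖ-samples
    # cylinder set A × ∏_{k>3} X_k with A = {⟨1, 1, 0⟩}:
    freq = np.mean(draws[:, 0] & draws[:, 1] & ~draws[:, 2])
    print(freq, 0.3 * 0.6 * (1 - 0.5))        # ≈ (ϖ_1 ⊗ ϖ_2 ⊗ ϖ_3)(A)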

Non-countable index sets. Now let us have a look at the general case, in which the index set is not necessarily countable. Let (X_i, A_i, µ_i) be a family of probability spaces for i ∈ I, put X := ∏_{i∈I} X_i and A := ⊗_{i∈I} A_i. Given J ⊆ I, define π_J : (x_i)_{i∈I} ↦ (x_i)_{i∈J} as the projection X → ∏_{i∈J} X_i. Put A_J := π_J⁻¹[⊗_{j∈J} A_j].

Although the index set I may be large, the measurable sets in A are always determined by a countable subset of the index set:

Lemma 4.9.14 Given A ∈ A, there exists a countable subset J ⊆ I such that χ_A(x) = χ_A(x′) whenever π_J(x) = π_J(x′).

Proof Let G be the set of all A ∈ A for which the assertion is true. Then G is a σ-algebra which contains π_i⁻¹[A_i] for every i ∈ I, hence G = A. a

This yields as an immediate consequence

Corollary 4.9.15 A = ⋃{A_J | J ⊆ I is countable}.

Proof It is enough to show that the set on the right-hand side is a σ-algebra. This follows easily from Lemma 4.9.14. a

We obtain from this observation, and from the previous result for the countable case, that arbitrary products exist.

Theorem 4.9.16 Let (X_i, A_i, µ_i) be a family of probability spaces for i ∈ I. Then there exists a unique probability measure µ on ∏_{i∈I}(X_i, A_i) such that

µ(π_{i_1,...,i_k}⁻¹[C]) = (µ_{i_1} ⊗ ... ⊗ µ_{i_k})(C)    (4.13)

for all C ∈ ⊗_{j=1}^k A_{i_j} and all i_1, ..., i_k ∈ I.

Proof Let A ∈ A; then there exists a countable subset J ⊆ I such that A ∈ A_J. Let µ_J be the corresponding product measure on A_J. Define µ(A) := µ_J(A); then it is easy to see that µ is a well defined measure on A, since the extension to countable products is unique. From the construction it follows also that the desired property (4.13) is satisfied. a


Projective Limits

For the interpretation of some logics the projective limit of a projective family of stochastic relations is helpful; this is the natural extension of a product. It will be discussed now. Denote by X^∞ := ∏_{k∈N} X the countable product of X with itself; recall that P is the probability functor, assigning to each measurable space its probability measures.

Definition 4.9.17 Let X be a Polish space, and (µ_n)_{n∈N} a sequence of probability measures µ_n ∈ P(X^n). This sequence is called a projective system iff

µ_n(A) = µ_{n+1}(A × X)

for all n ∈ N and all Borel sets A ∈ B(X^n). A probability measure µ_∞ ∈ P(X^∞) is called the projective limit of the projective system (µ_n)_{n∈N} iff

µ_n(A) = µ_∞(A × ∏_{j>n} X)

for all n ∈ N and A ∈ B(X^n).

Defining the projections

π_n^{n+1} : 〈x_1, ..., x_{n+1}〉 ↦ 〈x_1, ..., x_n〉 and π_n^∞ : 〈x_1, x_2, ...〉 ↦ 〈x_1, ..., x_n〉,

the projectivity condition on (µ_n)_{n∈N} can be rewritten as

µ_n = P(π_n^{n+1})(µ_{n+1}) for all n ∈ N,

and the condition on µ_∞ to be a projective limit as

µ_n = P(π_n^∞)(µ_∞) for all n ∈ N.

Thus a sequence of measures is a projective system iff each measure is the projection of the next one; its projective limit is characterized through the property that its values on cylinder sets coincide with the values of the members of the sequence, after taking projections. A special case is given by product measures. Assume that µ_n = ν_1 ⊗ ... ⊗ ν_n, where (ν_n)_{n∈N} is a sequence of probability measures on X. Then the projectivity condition is satisfied, and the projective limit is the infinite product constructed above. It should be noted, however, that the projectivity condition does not express µ_{n+1}(A × B) in terms of µ_n(A) for an arbitrary measurable set B ⊆ X, as the product measure does; it only says that µ_n(A) = µ_{n+1}(A × X) holds.

It is not immediately obvious that a projective limit exists in general, given the rather weak dependency among the measures. In general it will not exist, and here is why. The basic idea for the


construction of the infinite product has been to define the limit on the cylinder sets and then to extend this set function — but it has to be established that it is indeed σ-additive, and this is difficult in general. The crucial property in the proof above has been that µ_{n_k}(A_k) → 0 whenever (A_k)_{k∈N} is a sequence of cylinder sets A_k (with at most n_k components that do not equal X) that decreases to ∅. This property has been established above for the case of the infinite product through Fubini's Theorem, but this is not available in the general setting considered here, because we do not deal with an infinite product of measures. We will see, however, that a topological argument will be helpful. This is why we postulated the base space X to be Polish.

We start with an even stronger topological condition, viz., that the space under consideration is compact and metric. The central statement is

Proposition 4.9.18 Let X be a compact metric space. Then the projective system (µ_n)_{n∈N} has a unique projective limit µ_∞.

Proof 0. Let A = A′_k × ∏_{j>k} X be a cylinder set with A′_k ∈ B(X^k). Define µ*(A) := µ_k(A′_k). Then µ* is well defined on the cylinder sets, since the sequence forms a projective system. The idea of the proof: in order to show that µ* is σ-additive on the cylinder sets, we take a decreasing sequence (A_n)_{n∈N} of cylinder sets with ⋂_{n∈N} A_n = ∅ and show that inf_{n∈N} µ*(A_n) = 0. In fact, suppose that (A_n)_{n∈N} is decreasing with µ*(A_n) ≥ δ for all n ∈ N; then we show that ⋂_{n∈N} A_n ≠ ∅ by approximating each A_n from within through a compact set, which is given to us through Lemma 4.6.13. Then we are in a position to apply the observation that compact sets have the finite intersection property: a decreasing sequence of non-empty compact sets cannot have an empty intersection.

1. We can write A_n = A′_n × ∏_{j>k_n} X for some A′_n ∈ B(X^{k_n}). From Lemma 4.6.13 we get for each n a closed, hence compact set K′_n ⊆ A′_n such that µ_{k_n}(A′_n \ K′_n) < δ/2^n. Because X^∞ is compact by Tihonov's Theorem 3.2.12, K′′_n := K′_n × ∏_{j>k_n} X is a compact set, and K_n := ⋂_{j=1}^n K′′_j ⊆ A_n is compact as well, with

µ*(A_n \ K_n) ≤ µ*(⋃_{j=1}^n (A_j \ K′′_j)) ≤ ∑_{j=1}^n µ*(A_j \ K′′_j) = ∑_{j=1}^n µ_{k_j}(A′_j \ K′_j) < ∑_{j=1}^∞ δ/2^j = δ ≤ µ*(A_n),

so that each K_n is non-empty.

Thus (K_n)_{n∈N} is a decreasing sequence of non-empty compact sets. Consequently, by the finite intersection property for compact sets,

∅ ≠ ⋂_{n∈N} K_n ⊆ ⋂_{n∈N} A_n.

2. Since the cylinder sets generate the Borel sets of X^∞, and since µ* is σ-additive, we know that there exists a unique extension µ_∞ ∈ P(X^∞) of it. Clearly, if A ⊆ X^n is a Borel set, then

µ_∞(A × ∏_{j>n} X) = µ*(A × ∏_{j>n} X) = µ_n(A),

so we have constructed a projective limit. Hurray!


3. Uniqueness is established by an appeal to the π-λ-Theorem. Suppose that µ′ is another probability measure in P(X^∞) that has the desired property. Consider

D := {D ∈ B(X^∞) | µ_∞(D) = µ′(D)}.

It is clear that D contains all cylinder sets, that it is closed under complements, and under countable disjoint unions. By the π-λ-Theorem 1.6.30, D contains the σ-algebra generated by the cylinder sets, hence all Borel subsets of X^∞. This establishes uniqueness of the extension. a

The proof makes critical use of the observation that we can approximate the measure of aBorel set arbitrarily well by compact sets from within; see Lemma 4.6.13. It is also importantto observe that compact sets have the finite intersection property: if each finite intersectionof a family of compact sets is nonempty, the intersection of the entire family cannot be empty.Consequently the proof given above works in general Hausdorff spaces, provided the measuresunder consideration have the approximation property mentioned above.

We free ourselves from the restrictive assumption of having a compact metric space using theAlexandrov embedding of a Polish space into a compact metric space.

Proposition 4.9.19 Let X be a Polish space, and (µ_n)_{n∈N} a projective system on X. Then there exists a unique projective limit µ_∞ ∈ P(X^∞) for (µ_n)_{n∈N}.

Proof X is a dense measurable subset of a compact metric space X̃ by Alexandrov's Theorem 4.3.27. Defining µ̃_n(B) := µ_n(B ∩ X^n) for the Borel set B ⊆ X̃^n yields a projective system (µ̃_n)_{n∈N} on X̃ with a projective limit µ̃_∞ by Proposition 4.9.18. Since by construction µ̃_∞(X^∞) = 1, restrict µ̃_∞ to the Borel sets of X^∞; then the assertion follows. a

An interesting application of this construction arises through stochastic relations that form aprojective system. We will show now that there exists a kernel which may be perceived as a(pointwise) projective limit.

Corollary 4.9.20 Let X and Y be Polish spaces, and assume that J^{(n)} : X ⇝ Y^n is a stochastic relation for each n ∈ N such that the sequence (J^{(n)}(x))_{n∈N} forms a projective system on Y for each x ∈ X; in particular J^{(n)}(x)(Y^n) = 1 for all x ∈ X. Then there exists a unique sub-Markov kernel J_∞ on X and Y^∞ such that J_∞(x) is the projective limit of (J^{(n)}(x))_{n∈N} for each x ∈ X.

Proof 0. Let, for x fixed, J_∞(x) be the projective limit of the projective system (J^{(n)}(x))_{n∈N}. By the definition of a stochastic relation we need to show that the map x ↦ J_∞(x)(B) is measurable for every B ∈ B(Y^∞).

1. We apply the principle of good sets, considering

D := {B ∈ B(Y^∞) | x ↦ J_∞(x)(B) is measurable}.

Then the general properties of measurable functions imply that D is a σ-algebra on Y^∞. Take a cylinder set B = B_0 × ∏_{j>k} Y with B_0 ∈ B(Y^k) for some k ∈ N; then, by the properties of the projective limit, we have J_∞(x)(B) = J^{(k)}(x)(B_0). But x ↦ J^{(k)}(x)(B_0) constitutes a measurable function on X. Consequently, B ∈ D, and so D contains the cylinder sets which generate B(Y^∞). Thus measurability is established for each Borel set B ⊆ Y^∞, arguing with the π-λ-Theorem 1.6.30 as in the last part of the proof of Proposition 4.9.18. a


4.9.3 Case Study: Continuous Time Stochastic Logic

We illustrate the construction of the projective limit through the interpretation of a path logic over infinite paths; the logic is called CSL — continuous time stochastic logic. Since the discussion of this application requires some preparations, some of which are of independent interest, we develop the example in a series of steps.

We introduce CSL now and describe it informally first.

Fix P as a countable set of atomic propositions. We define recursively state formulas andpath formulas for CSL:

State formulas are defined through the syntax

ϕ ::= ⊤ | a | ¬ϕ | ϕ ∧ ϕ′ | S_{⋈p}(ϕ) | P_{⋈p}(ψ).

Here a ∈ P is an atomic proposition, ψ is a path formula, ⋈ is one of the relational operators <, ≤, ≥, >, and p ∈ [0, 1] is a rational number.

Path formulas are defined through

ψ ::= X^I ϕ | ϕ U^I ϕ′

with ϕ, ϕ′ as state formulas, and I ⊆ R+ a closed interval of the real numbers with rational bounds (including I = R+).

The operator S_{⋈p}(ϕ) gives the steady-state probability for ϕ to hold with the boundary condition ⋈ p; the formula P replaces quantification: the path-quantifier formula P_{⋈p}(ψ) holds in a state s iff the probability of the set of all paths which start in s and satisfy ψ is as specified by ⋈ p. Thus ψ holds on almost all paths starting from s iff s satisfies P_{≥1}(ψ), a path being an alternating infinite sequence σ = 〈s_0, t_0, s_1, t_1, ...〉 of states s_i and of times t_i. Note that time is made explicit here. The next-operator X^I ϕ is assumed to hold on a path σ iff s_1 satisfies ϕ and t_0 ∈ I holds. Finally, the until-operator ϕ_1 U^I ϕ_2 holds on a path σ iff we can find a point in time t ∈ I such that the state σ@t which σ occupies at time t satisfies ϕ_2, and for all times t′ before that, σ@t′ satisfies ϕ_1.

A Polish state space S is fixed; this space is used for modelling a transition system which also takes time into account. We are not only interested in the next state of a transition but also in the time after which the transition is made. So the basic probabilistic datum will be a stochastic relation M : S ⇝ R+ × S; if we are in state s, we will make a transition to a new state s′ after waiting some specified time t; M(s)(D) gives the probability that the pair 〈t, s′〉 lies in D. We assume that M(s)(R+ × S) = 1 holds for all s ∈ S.

A path σ is an element of the set (S × R+)^∞. A path σ = 〈s_0, t_0, s_1, t_1, ...〉 may be written as s_0 −t_0→ s_1 −t_1→ ... with the interpretation that t_i is the time spent in state s_i. Given i ∈ N, denote by σ[i] := s_i the (i+1)-st state of σ, and let δ(σ, i) := t_i. Let for t ∈ R+ the index j be the smallest index k such that t < ∑_{i=0}^k t_i, and put σ@t := σ[j], if j is defined; set σ@t := #, otherwise (here # is a new symbol not in S ∪ R+). S_# denotes S ∪ {#}; this is a Polish space when endowed with the sum σ-algebra. The definition of σ@t makes sure that for any time t we can find a rational time t′ with σ@t = σ@t′.
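The @-operator is easily computed on a finitely truncated path; the following sketch (hypothetical states and waiting times, outside the formal development) returns None in place of the symbol #.

    from typing import List, Optional, Tuple

    def at(path: List[Tuple[str, float]], t: float) -> Optional[str]:
        """σ@t for path = [(s0, t0), (s1, t1), ...]: the state occupied at time t."""
        acc = 0.0
        for state, wait in path:
            acc += wait
            if t < acc:              # smallest k with t < t_0 + ... + t_k
                return state
        return None                  # plays the role of '#'

    sigma = [("s0", 1.0), ("s1", 0.5), ("s2", 2.0)]
    print(at(sigma, 0.3), at(sigma, 1.2), at(sigma, 10.0))   # s0 s1 None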


We will deal only with infinite paths. This is no loss of generality because events that happen at a certain time with probability 0 will have the effect that the corresponding infinite paths occur only with probability 0. Thus we do not prune the paths; this makes the notation somewhat easier to handle.

The Borel sets B((S × R+)^∞) form the smallest σ-algebra which contains all the cylinder sets

{∏_{j=1}^n (B_j × I_j) × ∏_{j>n} (S × R+) | n ∈ N, I_1, ..., I_n rational intervals, B_1, ..., B_n ∈ B(S)}.

Thus a cylinder set is an infinite product that is determined through the finite product of an interval with a Borel set in S. It will be helpful to remember that the intersection of two cylinder sets is again a cylinder set.

Given M : S ⇝ R+ × S with Polish S, define inductively M^1 := M and

M^{n+1}(s_0)(D) := ∫_{(R+×S)^n} M(s_n)(D_{t_0,s_1,...,t_{n−1},s_n}) dM^n(s_0)(t_0, s_1, ..., t_{n−1}, s_n)

for the Borel set D ⊆ (R+ × S)^{n+1}. Let us illustrate this for n = 1. Given D ∈ B((R+ × S)^2) and s_0 ∈ S as a state to start from, we want to calculate the probability M^2(s_0)(D) that 〈t_0, s_1, t_1, s_2〉 ∈ D. This is the probability for the initial path 〈s_0, t_0, s_1, t_1, s_2〉 (a pathlet), given the initial state s_0. Since 〈t_0, s_1〉 is taken care of in the first step, we fix it and calculate M(s_1)({〈t_1, s_2〉 | 〈t_0, s_1, t_1, s_2〉 ∈ D}) = M(s_1)(D_{t_0,s_1}); by averaging, using the probability provided by M(s_0), we obtain

M^2(s_0)(D) = ∫_{R+×S} M(s_1)(D_{t_0,s_1}) dM(s_0)(t_0, s_1)

In the general case, M^{n+1}(s_0)(D) is the probability that 〈t_0, s_1, ..., t_n, s_{n+1}〉, the initial piece of an infinite path starting in s_0, is a member of D. This probability indicates that we start in s_0, remain in this state for t_0 units of time, then enter state s_1, remain there for t_1 time units, etc., and finally leave state s_n after t_n time units, entering s_{n+1}, all of this happening within D.
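The pathlet measures M^n can be explored by simulation. The sketch below (a hypothetical two-state kernel with exponential waiting times, outside the formal development) draws pathlets sequentially, exactly along the inductive definition of M^{n+1}, and estimates the probability of a set D of pathlets.

    import numpy as np

    rng = np.random.default_rng(5)
    rate = np.array([1.0, 2.0])               # waiting-time rates per state
    P = np.array([[0.3, 0.7],
                  [0.6, 0.4]])                # jump probabilities per state

    def sample_pathlet(s0: int, n: int):
        """Draw ⟨t0, s1, ..., t_{n-1}, s_n⟩ from M^n(s0)."""
        s, out = s0, []
        for _ in range(n):
            t = rng.exponential(1.0 / rate[s])
            s = rng.choice(2, p=P[s])
            out.append((t, s))
        return out

    # estimate M^2(0)(D) for D = {⟨t0, s1, t1, s2⟩ | t0 ≤ 1, s2 = 1}:
    N = 100000
    hits = 0
    for _ in range(N):
        (t0, s1), (t1, s2) = sample_pathlet(0, 2)
        hits += (t0 <= 1.0) and (s2 == 1)
    print(hits / N)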

We claim that (M^n(s))_{n∈N} is a projective system. We first see from Example 4.9.5 that M^n : S ⇝ (R+ × S)^n defines a transition kernel for each n ∈ N. Let D = A × (R+ × S) with A ∈ B((R+ × S)^n); then M(s_n)(D_{t_0,s_1,...,t_{n−1},s_n}) = M(s_n)(R+ × S) = 1 for all 〈t_0, s_1, ..., t_{n−1}, s_n〉 ∈ A, so that we obtain M^{n+1}(s)(A × (R+ × S)) = M^n(s)(A). Consequently, the projectivity condition is satisfied. Hence there exists a unique projective limit, thus a transition kernel

M^∞ : S ⇝ (R+ × S)^∞

with

M^n(s)(A) = M^∞(s)(A × ∏_{k>n} (R+ × S))

for all s ∈ S and for all A ∈ B((R+ × S)^n).


The projective limit indeed displays limiting behavior: suppose B is an infinite measurable cube ∏_{n∈N} B_n with Borel sets B_n ∈ B(R+ × S). Because

B = ⋂_{n∈N} (∏_{1≤j≤n} B_j × ∏_{j>n} (R+ × S))

represents B as the intersection of a monotonically decreasing sequence, we have for all s ∈ S

M^∞(s)(B) = lim_{n→∞} M^∞(s)(∏_{1≤j≤n} B_j × ∏_{j>n} (R+ × S)) = lim_{n→∞} M^n(s)(∏_{1≤j≤n} B_j).

Hence M^∞(s)(B) is the limit of the probabilities M^n(s)(B_1 × ... × B_n) at step n.

In this way models based on a Polish state space S yield stochastic relations S ⇝ (R+ × S)^∞ through projective limits. Without this limit it would be difficult to model the transition behavior on infinite paths; the assumption that we work in Polish spaces makes sure that these limits in fact do exist. To get started, we need to assume that, given a state s ∈ S, there is always a state to change into after a finite amount of time.

We obtain — as an aside — a recursive formulation for the transition law M^∞ : S ⇝ (R+ × S)^∞ as a first consequence of the construction of the projective limit. Interestingly, it reflects the domain equation (R+ × S)^∞ = (R+ × S) × (R+ × S)^∞.

Lemma 4.9.21 If D ∈ B((R+ × S)^∞), then

M^∞(s)(D) = ∫_{R+×S} M^∞(s′)(D_{〈t,s′〉}) M^1(s)(d〈t, s′〉)

holds for all s ∈ S.

Proof Recall that D_{〈t,s′〉} = {τ | 〈t, s′, τ〉 ∈ D}. Let

D = (H_1 × ... × H_{n+1}) × ∏_{j>n} (R+ × S)

be a cylinder set with H_i ∈ B(R+ × S), 1 ≤ i ≤ n+1. The equation in question in this case boils down to

M^{n+1}(s)(H_1 × ... × H_{n+1}) = ∫_{H_1} M^n(s′)(H_2 × ... × H_{n+1}) M^1(s)(d〈t, s′〉).

This may easily be derived from the definition of the projective sequence. Consequently, the equation in question holds for all cylinder sets; thus the π-λ-Theorem 1.6.30 implies that it holds for all Borel subsets of (R+ × S)^∞. a

This decomposition indicates that we may first select in state s a new state and a transition time; with these data the system then works just as if the selected new state had been the initial state. The system does not have a memory but reacts depending on its current state, no matter how it arrived there. Lemma 4.9.21 may accordingly be interpreted as a Markov property for a process whose behavior is independent of the specific step that is undertaken.

We need some information about the @-operator before continuing.


Lemma 4.9.22 〈σ, t〉 ↦ σ@t is a Borel measurable map from (S × R+)^∞ × R+ to S_#. In particular, the set {〈σ, t〉 | σ@t ∈ S} is a measurable subset of (S × R+)^∞ × R+.

Proof 0. Note that we claim joint measurability in both components (which is strictly stronger than measurability in each component separately). Thus we have to show that {〈σ, t〉 | σ@t ∈ A} is a measurable subset of (S × R+)^∞ × R+, whenever A ⊆ S_# is Borel.

1. Because for fixed i ∈ N the map σ ↦ δ(σ, i) is a projection, δ(·, i) is measurable, hence σ ↦ ∑_{i=0}^j δ(σ, i) is. Consequently,

{〈σ, t〉 | σ@t = #} = {〈σ, t〉 | ∀j : t ≥ ∑_{i=0}^j δ(σ, i)} = ⋂_{j≥0} {〈σ, t〉 | t ≥ ∑_{i=0}^j δ(σ, i)}.

This is clearly a measurable set.

2. Put stop(σ, t) := inf{k ≥ 0 | t < ∑_{i=0}^k δ(σ, i)}; thus stop(σ, t) is the smallest index for which the accumulated waiting times exceed t. The set

X_k := {〈σ, t〉 | stop(σ, t) = k} = {〈σ, t〉 | ∑_{i=0}^{k−1} δ(σ, i) ≤ t < ∑_{i=0}^k δ(σ, i)}

is measurable by Corollary 4.2.5. Now let B ∈ B(S) be a Borel set; then

{〈σ, t〉 | σ@t ∈ B} = ⋃_{k≥0} {〈σ, t〉 | σ@t ∈ B, stop(σ, t) = k}
  = ⋃_{k≥0} {〈σ, t〉 | σ[k] ∈ B, stop(σ, t) = k}
  = ⋃_{k∈N} (X_k ∩ (∏_{i<k} (S × R+) × (B × R+) × ∏_{i>k} (S × R+))).

Because X_k is measurable, the latter set is measurable. This establishes measurability of the @-map. a

As a consequence, we establish that some sets and maps, which will be important for the laterdevelopment, are actually measurable. A notational convention for improving readability isConventionproposed: the letter σ will always denote a generic element of (S × R+)∞, and the letter ϑalways a generic element of R+ × (S × R+)∞.

Proposition 4.9.23 We observe the following properties:

1. 〈σ, t〉 | limi→∞ δ(σ, i) = t is a measurable subset of (S × R+)∞ × R+,

2. let N∞ : S (R+ × S)∞ be a stochastic relation, then

s 7→ lim inft→∞

N∞(s)(ϑ | 〈s, ϑ〉@t ∈ A)

s 7→ lim supt→∞

N∞(s)(ϑ | 〈s, ϑ〉@t ∈ A)

constitute measurable maps X → R+ for each Borel set A ⊆ S.

Page 435: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.9. PRODUCT MEASURES 423

Proof 0. The proof makes crucial use of the fact that the real line is a complete metric space,and that the rational numbers are a dense and countable set.

1. In order to establish part 1, write

〈σ, t〉 | limi→∞

δ(σ, i) = t =⋂

Q3ε>0

⋃n∈N

⋂m≥n〈σ, t〉 || δ(σ,m)− t |< ε.

By Lemma 4.9.22, the set

〈σ, t〉 || δ(σ,m)− t |< ε = 〈σ, t〉 | δ(σ,m) > t− ε ∩ 〈σ, t〉 | δ(σ,m) < t+ ε

is a measurable subset of (S × R+)∞ × R+, and since the union and the intersections arecountable, measurability is inferred.

2. From the definition of the @-operator it is immediate that given an infinite path σ and atime t ∈ R+, there exists a rational t′ with σ@t = σ@t′. Thus we obtain for an arbitrary realnumber x, an arbitrary Borel set A ⊆ S and s ∈ S

lim inft→∞

N∞(s)(ϑ | 〈s, ϑ〉@t ∈ A) ≤ x⇔ supt≥0

infr≥t

N∞(s)(ϑ | 〈s, ϑ〉@r ∈ A) ≤ x

⇔ supQ3t≥0

infQ3r≥t

N∞(s)(ϑ | 〈s, ϑ〉@r ∈ A) ≤ x

⇔ s ∈⋂

Q3t≥0

⋃Q3r≥t

Ar,x

withAr,x :=

s′ | N∞(s′)

(ϑ | 〈s′, ϑ〉@r ∈ A

)≤ x

.

We infer that Ar,x is a measurable subset of S from the fact that N∞ is a stochastic relationand from Exercise 4.16. Since a map f : W → R is measurable iff each of the sets w ∈W | f(w) ≤ s is a measurable subset of W , the assertion follows for the first map. Thesecond part is established in exactly the same way, using that f : W → R is measurable iffw ∈W | f(w) ≥ s is a measurable subset of W , and observing

lim supt→∞

N∞(s)(ϑ | 〈x, ϑ〉@t ∈ A) ≥ x⇔ infQ3t≥0

supQ3r≥t

N∞(x)(ϑ | 〈s, ϑ〉@r ∈ A) ≥ x.

a

This has some consequences which will come in useful for the interpretation of CSL. Beforestating them, it is noted that the statement above (and the consequences below) do not makeuse of N∞ being a projective limit; in fact, we assume N∞ : S (R+×S)∞ to be an arbitrarystochastic relation. A glimpse at the proof shows that these statements even hold for finitetransition kernels, but since we will use it for the probabilistic case, we stick to stochasticrelations.

Now for the consequences. As a first consequence we obtain that the set on which the asymp-totic behavior of the transition times is reasonable (in the sense that it tends probabilisticallyto a limit) is well behaved in terms of measurability:

Corollary 4.9.24 Let A ⊆ X be a Borel set, and assume that N∞ : S (R+ × S)∞ is astochastic relation. Then

Page 436: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

424 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

1. the set QA := s ∈ S | limt→∞N∞(s)(ϑ | 〈s, ϑ〉@t ∈ A) exists on which the limitexists is a Borel subset of S,

2. s 7→ limt→∞N∞(s)(ϑ | 〈s, ϑ〉@t ∈ A is a measurable map QA → R+.

Proof Since s ∈ QA iff

lim inft→∞

N∞(x)(ϑ | 〈s, ϑ〉@t ∈ A) = lim supt→∞

N∞(x)(ϑ | 〈s, ϑ〉@t ∈ A),

and since the set on which two Borel measurable maps coincide is a Borel set itself, the firstassertion follows from Proposition 4.9.23, part 2. This implies the second assertion as well.a

When dealing with the semantics of the until operator later, we will also need to establishmeasurability of certain sets. Preparing for that, we state:

Lemma 4.9.25 Assume that A1 and A2 are Borel subsets of S, and let I ⊆ R+ be an interval,then

U(I, A1, A2) := σ | ∃t ∈ I : σ@t ∈ A2 ∧ ∀t′ ∈ [0, t[: σ@t′ ∈ A1

is a measurable set of paths, thus U(I, A1, A2) ∈ B((S × R+)∞).

Proof 0. Remember that, given a path σ and a time t ∈ R+, there exists a rational timetr ≤ t with σ@t = σ@tr. Consequently,

U(I, A1, A2) =⋃

t∈Q∩I

(σ | σ@t ∈ A2 ∩

⋂t′∈Q∩[0,t]

σ | σ@t′ ∈ A1).

The inner intersection is countable and is performed over measurable sets by Lemma 4.9.22,thus forming a measurable set of paths. Intersecting it with a measurable set and forming acountable union yields a measurable set again. a

Interpretation of CSL. Now that a description for the behavior of paths is available, we areready for a probabilistic interpretation of CSL. We have started from the assumption that theone-step behavior is governed through a stochastic relation M : S R+×S with M(s)(R+×S) = 1 for all s ∈ S from which the stochastic relation M∞ : S R+ × (S ×R+)∞ has beenconstructed. The interpretations for the formulas can be established now, and we show thatthe sets of states resp. paths on which formulas are valid are Borel measurable.

To get started on the formal definition of the semantics, we assume that we know for eachatomic proposition which state it is satisfied in. Thus we fix a map ` that maps P to B(S),

` : P →B(S)

assigning each atomic proposition a Borel set of states.

The semantics is described as usual recursively through relation |= between states resp. paths,and formulas. Hence s |= ϕ means that state formula ϕ holds in state s, and σ |= ψ meansthat path formula ψ is true on path σ.

Here we go:

1. s |= > is true for all s ∈ S.s |= ϕ

Page 437: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.9. PRODUCT MEASURES 425

2. s |= a iff s ∈ `(a).

3. s |= ϕ1 ∧ ϕ2 iff s |= ϕ1 and s |= ϕ2.

4. s |= ¬ϕ iff s |= ϕ is false.

5. s |= S./p(ϕ) iff limt→∞M∞(s)(ϑ | 〈s, ϑ〉@t |= ϕ) exists and is ./ p.

6. s |= P./p(ψ) iff M∞(s)(ϑ | 〈s, ϑ〉 |= ψ) ./ p.

7. σ |= X I ϕ iff σ[1] |= ϕ and δ(σ, 0) ∈ I. σ |= ψ

8. σ |= ϕ1 UI ϕ2 iff ∃t ∈ I : σ@t |= ϕ2 and ∀t′ ∈ [0, t[: σ@t′ |= ϕ1.

Most interpretations should be obvious. Given a state s, we say that s |= S./p(ϕ) iff theasymptotic behavior of the paths starting at s gets eventually stable with a limiting probabilitygiven by ./ p. Similarly, s |= P./p(ψ) holds iff the probability that path formula ψ holds forall s-paths is specified through ./ p. For 〈s0, t0, s1, . . . , 〉 |= X I ϕ to hold we require s1 |= ϕafter a waiting time t0 for the transition to be a member of interval I. Finally, σ |= ϕ1 UI ϕ2

holds iff we can find a time point t in the interval I such that the corresponding state σ@tsatisfies ϕ2, and for all states on that path before t, formula ϕ1 is assumed to hold. Thekinship to CTL is obvious.

Denote by [[ϕ]] and [[ψ]] the set of all states for which the state formula ϕ holds, resp. the set [[ϕ]], [[ψ]]of all paths for which the path formula ϕ is valid. We do not distinguish notationally betweenthese sets, as far as the basic domains are concerned, since it should always be clear whetherwe describe a state formula or a path formula.

We show that we are dealing with measurable sets. Most of the work for establishing this hasbeen done already. What remains to be done is to fit in the patterns that we have set up inProposition 4.9.23 and its Corollaries.

Proposition 4.9.26 The set [[ξ]] is Borel, whenever ξ is a state formula or a path formula.

Proof 0. The proof proceeds by induction on the structure of the formula ξ. The inductionstarts with the formula >, for which the assertion is true, and with the atomic propositions, forwhich the assertion follows from the assumption on `: [[a]] = `(a) ∈ B(S). We assume for theinduction step that we have established that [[ϕ]], [[ϕ1]] and [[ϕ2]] are Borel measurable.

1. For the next-operator we write

[[X I ϕ]] = σ | σ[1] ∈ [[ϕ]] and δ(σ, 0) ∈ I.

This is the cylinder set (S × I × [[ϕ]]× R+)× (S × R+)∞, hence is a Borel set.

2. The until-operator may be represented through

[[ϕ1 UI ϕ2]] = U(I, [[ϕ1]], [[ϕ2]]),

which is a Borel set by Lemma 4.9.25.

3. Since M∞ : S (R+ × S)∞ is a stochastic relation, we know that

[[P./p(ψ)]] = s ∈ S |M∞(s)(ϑ | 〈s, ϑ〉 ∈ [[ϕ]]) ./ p

Page 438: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

426 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

is a Borel set.

4. We know from Corollary 4.9.24 that the set

Q[[ϕ]] := s ∈ S | limt→∞

M∞(s)(ϑ | 〈s, ϑ〉@t ∈ [[ϕ]]) exists

is a Borel set, and that

Jϕ : Q[[ϕ]] 3 s 7→ limt→∞

M∞(x) (ϑ | 〈s, ϑ〉@t ∈ [[ϕ]]) ∈ [0, 1]

is a Borel measurable function. Consequently,

[[S./p(ϕ)]] = s ∈ Q[[ϕ]] | Jϕ(s) ./ p

is a Borel set. a

Measurability of the sets on which a given formula is valid constitutes of course a prerequisitefor computing interesting properties. So we can compute, e.g.,

P≥0.5((¬down) U [10,20] S≥0.8(up2 ∨ up3)))

as the set of all states that with probability at least 0.5 will reach a state between 10 and 20time units so that the system is operational (up2, up3 ∈ P ) in a steady state with a probabilityof at least 0.8; prior to reaching this state, the system must be operational continuously(down ∈ P ).

The description of the semantics is just the basis for entering into the investigation of expres-sivity of the models associated with M and with `. We leave CSL here, however, and note thatthe construction of the projective limit is the basic and fundamental ingredient for furtherinvestigations.

4.9.4 Case Study: A Stochastic Interpretation of Game Logic

Game logic is a modal logics the modalities of which are given by games. The grammar forgames has been presented and discussed in Example 2.7.5; here it is again:

g ::= γ | g1 ∪ g2 | g1 ∩ g2 | g1; g2 | gd | g∗ | g× | ϕ?

with γ ∈ Γ and ϕ a formula of the underlying logic; the set Γ is the collection of primitivegames, from which compound games are constructed.

This section will deal with a stochastic interpretation of game logic, based on the discussionfrom Section 2.7 which indicates that Kripke models are not fully adequate for this task. Itresults in an interpretation of game logics through neighborhood models, which in turn arebased on effectivity functions. Since we have stochastic effectivity functions at our disposal,we will give a probabilistic interpretation based on them. We refer also to the discussionabove concerning the observation that the games we are discussing here are different from theBanach-Mazur games which are introduced in Chapter 1 and put to good use in Section 1.7and in Section 3.5.2.

Page 439: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.9. PRODUCT MEASURES 427

The games which we consider here are assumed to be determined, so if one player does nothave a winning strategy, the other player has one (we do not formalize the notion of a strategyhere); determinedness serves here the purpose of relating the players’ behavior to each other.In this sense, determinedness has the consequence that if we model Angels’s portfolio for agame γ ∈ Γ through the effectivity function Pγ , then Demon’s portfolio in state s is given byA | S \A 6∈ Pγ(s), which defines also an effectivity function.

We make the following assumptions (when writing down games, we assume for simplicity thatcomposition binds tighter than angelic or demonic choice):

¶ (gd)d is identical to g.

· Demonic choice can be represented through angelic choice: The game g1 ∩ g2 coincideswith the game (gd1 ∪ gd2)d.

¸ Similarly, demonic iteration can be represented through its angelic counterpart: (g×)d

is equal to (gd)∗,

¹ Composition is right distributive with respect to angelic choice: Making a decision toplay g1 or g2 and then playing g should be the same as deciding to play g1; g or g2; g,thus (g1 ∪ g2); g equals g1; g ∪ g2; g.

Note that left distributivity would mean that a choice between g; g1 and g; g2 is the sameas playing first g then g1∪g2; this is a somewhat restrictive assumption, since the choiceof playing g1 or g2 may be a decision made by Angel only after g is completed. Thuswe do not assume this in general (it will turn out, however, that in Kripke generatedmodels these choices are in fact equivalent, see Proposition 4.9.40).

º We assume similarly that g∗; g0 equals g0 ∪ g∗; g; g0. Hence when playing g∗; g0 Angelmay decide to play g not at all and to continue with g0 right away, or to play g∗ followedby g; g0. Thus g∗; g0 expands to g0 ∪ g; g0 ∪ g; g; g0 ∪ . . . .

» (g1; g2)d is the same as gd1 ; gd2 .

¼ The binary operations (composition, angelic and demonic choice) are commutative andassociative.

We do not discuss for the time being the test operator, since its semantics depends on amodel; we will fit in the operator later in (page 439), when all the necessary operations areavailable.

This is the stage on which we play our game.

Definition 4.9.27 A game frame G =((S,A), (Pγ)γ∈Γ

)has a measurable space (S,A) of

states and a t-measurable map Pγ : S → E(S,A) for each primitive game γ ∈ Γ .

As usual, we will omit the reference to the σ-algebra of the state space S, unless we reallyneed to write it down. Fix a game frame G = (S, (Pγ)γ∈Γ ), the set of primitive games isextended by the empty game ε, and set Pε := D with D as the Dirac effectivity functionaccording to Example 4.1.18. We assume in the sequel that ε ∈ Γ .

We define now recursively the set valued function ΩG(g | A, q) with the informal meaningthat this set describes the set of states so that Angel has a strategy of reaching a state in set

Page 440: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

428 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

A with probability greater than q upon playing game g. Assume that A ∈ A is a measurablesubset of S, and 0 ≤ q < 1, and define for 0 ≤ k ≤ ∞

Q(k)(q) := 〈a1, . . . , ak〉 ∈ Qk | ai ≥ 0 and

k∑i=1

ai ≤ q.

as the set of all non-negative rational k-tuples the sum of which does not exceed q.

À Let γ ∈ Γ , then put

ΩG(γ | A, q) := s ∈ S | βββA(A,> q) ∈ Pγ(s),

in particular ΩG(ε | A, q) = s ∈ S | δs(A) > q = A. Thus s ∈ ΩG(γ | A, q) iff Angelhas βββA(A,> q) in its portfolio when playing γ in state s. This entails that the set of allstate distributions which evaluate at A with a probability greater than q can be effectedby Angel in this situation. If Angel does not play at all, hence if the game γ equals ε,nothing is about to change, which means ΩG(ε | A, q) = s | δs ∈ βββA(A,> q) = A.

Á Let g be a game, then

ΩG(gd | A, q) := S \ΩG(g | S \A, q).

The game is determined, thus Demon can reach a set of states iff Angel does not have astrategy for reaching the complement. Consequently, upon playing g in state s, Demoncan reach a state in A with probability greater than q iff Angel cannot reach a state inS \A with probability greater q.

Illustrating, let us assume for the moment that Pγ = PKγ , i.e., that the effectivityfunction for γ ∈ Γ is generated from a stochastic relation Kγ . Then

s ∈ ΩG(γd | A, q)⇔ s /∈ ΩG(γ | S \A, q)⇔ Kγ(s)(S \A) ≤ q.

In general, s ∈ ΩG(γd | A, q) iff βββA(S \A,> q) /∈ Pγ(s) for γ ∈ Γ . This is exactly whatone would expect in a determined game.

 Assume s is a state such that Angel has a strategy for reaching a state in A when playingthe game g1∪g2 with probability not greater than q. Then Angel should have a strategyin s for reaching a state in A when playing game g1 with probability not greater than a1

and playing game g2 with probability not greater than a2 such that a1 + a2 ≤ q. Thus

ΩG(g1 ∪ g2 | A, q) :=⋂

a∈Q(2)(q)

(ΩG(A | g1, a1) ∪ΩG(A | g2, a2)

).

à Right distributivity of composition over angelic choice translates to this equation.

ΩG((g1 ∪ g2); g | A, q) := ΩG(g1; g ∪ g2; g | A, q).

Ä If γ ∈ Γ , put

ΩG(γ; g | A, q) := s ∈ S | Gg(A, q) ∈ Pγ(s),

Page 441: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.9. PRODUCT MEASURES 429

where

Gg(A, q) := µ ∈ S(S) |∫ 1

0µ(ΩG(g | A, r)) dr > q. (4.14)

Suppose that ΩG(g | A, r) is already defined for each r as the set of states for whichAngel has a strategy to effect a state in A through playing g with probability greaterthan r. Given a distribution µ over the states, the integral

∫ 10 µ(ΩG(g | A, r)) dr is

the expected value for entering a state in A through playing g for µ. The set Gg(A, q)collects all distributions the expected value of which is greater that q. We ask for allstates such that Angel has this set in its portfolio when playing γ in this state. Beingable to select this set from the portfolio means that when playing γ and subsequentlyg a state in A may be reached, with probability greater than q.

Å This is just the translation of assumption º. above with a repeated application of therule  for angelic choice:

ΩG(g∗; g0 | A, q) :=⋂

a∈Q(q)(∞)

⋃n≥0

ΩG(gn; g0 | A, an+1)

with g0 := ε and gn := g; . . . ; g (n times).

One observes that q 7→ ΩG(γ | A, q) is a monotonically decreasing function for γ ∈ Γ , becauseq1 ≥ q2 implies βββA(> q1, A) ⊆ βββA(> q2, A), so that βββA(> q1, A) ∈ Pγ(s) implies βββA(>q2, A) ∈ Pγ(s). The situation changes, however, when Demon enters the stage, because thenq 7→ ΩG(γd | A,Q) is increasing. So q 7→ ΩG(g | A, q) can be guaranteed to be monotonicallydecreasing only if g belongs to the PDL fragment (shattering the hope to simplify somearguments due to monotonicity.

It has to be established for every game g that ΩG(g | A, q) ∈ A, provided A ∈ A. Welook at different cases, but we have do some preparations first. They address the embeddedintegration which is found in case Ä.

Corollary 4.9.28 Let B ∈ A⊗ B([0, 1]), then

〈µ, q〉 ∈ S(S)× [0, 1] |∫ 1

0µ(Br) dr ./ q

defines a measurable subset of S(S)× [0, 1].

Proof 1. We claim that the map µ 7→∫ 1

0 µ(Br) dr is ℘℘℘(S)-B([0, 1])-measurable. The claim Planis established in a step wise manner, first we establish that Br is a measurable set for eachr, then we show that r 7→ µ(Br) gives a measurable function for every µ, and then we arenearly done: The assertion follows from Corollary 4.2.5.

2. Lemma 4.1.8 tells us that Br ∈ A for all B ∈ A ⊗ B([0, 1]), so that µ(Br) is defined foreach r ∈ [0, 1]. The map r 7→ µ(Br) defines a measurable map [0, 1] → R+; this is shownin Lemma 4.9.1, hence

∫ 10 µ(Br) dr is defined. We finally show that the map in question is

measurable by applying the principle of good sets. Let

G := B ∈ A⊗ B([0, 1]) | the assertion is true for B.

Page 442: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

430 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

Then B = A × C ∈ G for A ∈ A and C ∈ B([0, 1]), since∫ 1

0 µ(Br) dr = µ(A) · λ(C), whichdefines by a ℘℘℘(S)-B([0, 1])-measurable function, λ being Lebesgue measure on [0, 1]. G isclosed under complementation and under disjoint countable unions by Corollary 4.8.7. Hencewe obtain G = A⊗ B([0, 1]) from the π-λ-Theorem 1.6.30, because the set of all measurablerectangles generates this σ-algebra and is closed under finite intersection. a

This technical property on measurability has this as an immediate consequence:

Corollary 4.9.29 Let P : S → E(S) be a stochastic effectivity function, and assume B ∈A⊗ B([0, 1]). Put

G := 〈µ, q〉 ∈ S(S)× [0, 1] |∫ 1

0µ(Br) dr ./ q.

Then〈s, q〉 ∈ S × [0, 1] | Gq ∈ P (s) ∈ A ⊗ B([0, 1]).

Proof G ∈ ℘℘℘(S)⊗B([0, 1]) by Corollary 4.9.28, so the assertion follows from t-measurability.a

Perceiving the set ΩG(g | A, r) as the result of transforming set A through game g, we maysay something about the effect of the transformation which is induced by game γ; g withγ ∈ Γ . We need this technical property for making sure that we do not leave the kingdom ofmeasurable sets while playing our games.

Lemma 4.9.30 Let g be a game such that 〈s, r〉 ∈ S × [0, 1] | s ∈ ΩG(g | A, r) is ameasurable subset of S × [0, 1], and assume that γ ∈ Γ . Then

〈s, r〉 ∈ S × [0, 1] | s ∈ ΩG(γ; g | A, r)

is a measurable subset of S × [0, 1].

Proof This follows from Corollary 4.9.29, because Pγ is t-measurable. a

The transformation associated with the indefinite iteration in Å above involves an uncountableintersection, since for q > 0 the set Q(∞)(q) has the cardinality of the continuum. Since σ-algebras are closed only under countable operations, we might in this way generate a setwhich is not measurable at all, unless we do take cautionary measures into account.

If the state space is closed under the Souslin operation, which is discussed in Section 4.5,then the resulting set will still be measurable. A fairly popular class of spaces closed underthis operation is the class of universally complete measurable spaces, see Proposition 4.6.4.The class of analytic sets in a Polish space is closed under the Souslin operation as well,see Proposition 4.5.1 together with Theorem 4.5.5 (but we have to be careful here, becausethe class of analytic sets in not closed under complementation, and demonic argumentationintroduces complements).

In order to deal with indefinite iteration in Å, we establish first that we can encode theelements of Q(∞)(q) conveniently through infinite sequences over N0. This will then serve asan encoding, so that we obtain ΩG(g∗; g0 | A, q) as the result of a Souslin scheme, as definedon page 377.

Recall that ·|n is the truncation operator which takes the first n elements of an infinitesequence or a sequence of length greater than n.

Page 443: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.9. PRODUCT MEASURES 431

Lemma 4.9.31 There exists for q > 0 a bijection f : NN0 → Q(∞)(q) such that α|n = α′|n

implies f(α)|n = f(α′)|n for all n ∈ N and all α, α′ ∈ NN0 .

Proof 0. Let us look at the idea for a proof. Define Dn := a|n | a ∈ Q(∞)(q) as theIdea for a

proofset of all truncated sequences in Q(∞)(q) of length n. Since q > 0, the set D1 = [0, q] ∩ Qis not empty and countable, hence we can find a bijection `1 : D1 → N0. Let us see whathappens when we consider D2. If 〈a1, a2〉 ∈ D2, we consider two cases: a1 + a2 = q ora1+a2 < q. In the former case, we put `2(a1, a2) := 〈`1(a1), 0〉, in the latter case we know that[0, q−(a1+a2)]∩Q 6= ∅ is countable, thus we find a bijection ha1,a2 : [0, q−(a1+a2)]∩Q→ N0,so we put `2(a1, a2) := 〈`1(a1), ha1,a2(a2)〉. This yields a bijection, and the projection of `2equals `1. The inductive step is done through the same argumentation. From this constructionwe piece together a bijection Q(∞)(q) → NN

0 , which will be inverted to give the function weare looking for.

1. We construct first inductively for each n ∈ N a bijection `n : Dn → Nn0 such that`n+1(w)|n = `n(w|n) holds for all w ∈ Dn+1. `1 : D1 = [0, q] ∩ Q → N0 is an arbitrarybijection, and if `n is already defined, put for w = 〈w1, . . . , wn+1〉 ∈ Dn+1

`n+1(w) :=

〈`n(w1, . . . , wn), 0〉, if w1 + . . .+ wn = q,

〈`n(w1, . . . , wn), hw1,...,wn(wn+1)〉, otherwise,

where hw1,...,wn : [0, q − (w1 + . . .+ wn)] ∩Q→ N is a bijection. An easy inductive argumentshows that `n+1 is bijective, if `n is, and that the projectivity condition `n+1(w)|n = `n(w|n)holds for each w ∈ Dn+1.

2. Define `∞(a)|n := `n(a|n) for a ∈ Q(∞)(q) and n ∈ N, then the projectivity condition makessure that `∞ : Q(∞)(q) → NN

0 is well defined. Because each `n is a bijection, `∞ is. Assumethat α|n = α′|n holds for α, α′ ∈ NN

0 and for some n ∈ N, then α = `∞(a), α′ = `∞(a′)for some a, a′ ∈ Q(∞)(q). Thus α|n = `∞(a)|n = `n(a|n), hence a|n = a′|n, since `n isinjective.

3. Now let f be the inverse to `∞, then the assertion follows. a

After this technical preparation, we are in a position to establish this important prop-erty.

Proposition 4.9.32 Let g and g0 be games such that ΩG(gn; g0 | A, r) is a measurable subsetof S for each n ∈ N and each rational r ∈ [0, 1]. Assume that the measurable space S isclosed under the Souslin operation. Then ΩG(g∗; g0 | A, q) is a measurable subset of S foreach n ∈ N and each rational q ∈ [0, 1].

Proof 0. The proof just encodes the representation of ΩG(g∗; g0 | A, q) so that a Souslinscheme arises.

1. Let f : NN0 → Q(∞)(q) from the encoding Lemma 4.9.31. Write

ΩG(g∗; g0 | A, q) =⋂

a∈Q(q)(∞)

⋃n≥0

ΩG(gn; g0 | A, an+1)

=⋂α∈NN

0

⋃n≥0

Cα|n

Page 444: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

432 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

with Cα|n := ΩG(gn; g0 | A, f(α)n+1). a

Thus we know that the construction above does not lead us outside the realm of measurablesets, provided the space is closed under the Souslin operation. We may think of A 7→ ΩG(g |A, q) as a set transformation. But this applies for the time being only to those games thesyntactic pattern is is given by one of the cases above, and we have to investigate now whetherwe can interpret each game that way.

Call the game g interpretable iff ΩG(g | A, q) is defined for each A ∈ A, q ∈ [0, 1] rational soInter-pretablegame

that the setGr(g,A) := 〈s, q〉 ∈ S × [0, 1] | s ∈ ΩG(g | A, q)

is a measurable subset of S × [0, 1]. Note that we exclude for the time being the test opera-tor.

Lemma 4.9.33 Each game g is interpretable, provided the state space is closed under theSouslin operation.

Proof 0. The proof employs a kind of inductive strategy. We show first that all primitiveStrategygames are interpretable, and that the dual of an interpretable game is interpretable again.Then we go through the different cases and have a look at what happens.

1. LetJ := g | g is interpretable.

We show that J contains all games.

2. Let A ∈ A, then

〈µ, q〉 ∈ S(S)× [0, 1] | µ ∈ βββA(A,> q) = 〈µ, q〉 ∈ S(S)× [0, 1] | µ(A) > q

is a measurable subset of S(S)× [0, 1], see Corollary 4.2.5. Thus, if γ ∈ Γ , we have

Gr(γ,A) = 〈s, q〉 ∈ S × [0, 1] | βββA(A,> q),

which is a measurable subset of S × [0, 1], because Pγ is t-measurable. Consequently, weobtain Γ ⊆ J .

3. Clearly, J is closed under demonization and angelic choice, hence under demonic choice aswell. Now let

L := g | g; g1 is interpretable for all interpretable g1.

Then Lemma 4.9.30 implies that Γ ∪ γd | γ ∈ Γ ⊆ L. Moreover, because angelic choicedistributes from the left over composition, L is closed under angelic choice. It is also closedunder demonization: Let g ∈ L and take an interpretable game g1, then g; gd1 is interpretable,thus the interpretation of (g; gd1)d is defined, hence gd; g1 is interpretable. Clearly, L is closedunder composition. Thus g ∈ L implies g∗ ∈ L as well as g× ∈ L, so that L is the set of allgames.

This implies that J is closed under composition, and hence both under angelic and demoniciteration. Consequently, J contains all games. a

Summarizing, we obtain.

Page 445: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.9. PRODUCT MEASURES 433

Proposition 4.9.34 Assume that the state space is closed under the Souslin operation, thenwe have ΩG(g | A, q) ∈ A for all games g, A ∈ A and 0 ≤ q ≤ 1.

Proof We infer for each game g from Lemma 4.9.33 that for A ∈ A the set Gr(g,A) is ameasurable subset of S × [0, 1]. But ΩG(g | A, q) = Gr(g,A)q. a

We relate game frames to each other through morphisms. Suppose that H = (T, (Qγ)γ∈Γ )is another game frame, then f : G → H is a game frame morphism iff f : Pγ → Qγ is amorphism for the associated effectivity functions for all γ ∈ Γ (these morphisms are definedin Definition 4.1.25).

Gameframe

morphism

The transformations above are compatible with frame morphisms.

Proposition 4.9.35 Let f : G → H be a game frame morphism, and assume that ΩG(g | ·, q)and ΩH(g | ·, q) always transforms measurable sets into measurable sets for all games g andall q. Then we have

f−1[ΩH(g | B, q)

]= ΩG(g | f−1

[B], q)

for all games g, all measurable sets B ∈ B(T ) and all q.

Proof 0. The proof proceeds by induction on g. Because f is a morphism, the assertion istrue for g = γ ∈ Γ . Because f−1 is compatible with the Boolean operations on sets, it issufficient to consider the case g = γ; g1 in detail. It wants essentially the computation of animage measure, and it looks really worse than it is.

1. Assume that the assertion is true for game g1, fix B ∈ B, q ≥ 0. Then

Gg1,G(f−1[B], q) = µ ∈ S(S) |

∫ 1

0µ(ΩG(g1 | f−1

[B], r))dr > q

(∗)= µ ∈ S(S) |

∫ 1

0µ(f−1

[ΩG(g1 | B, r)

])dr > q

(⊕)= µ ∈ S(S) |

∫ 1

0Sf(µ)

(ΩH(g1 | B, r)

)dr > q

= (Sf)−1[ν ∈ S(T ) |

∫ 1

0ν(ΩH(g1 | f−1

[B], r))dr > q

]= (Sf)−1

[Gg1,H(B, q)

]The equation (∗) derives from the induction hypothesis, and (⊕) from the definition of(Sf)(µ).

2. Because f : Pγ → Qγ is a morphism, we obtain from the first part

ΩG(γ; g1 | f−1[B], q) = s ∈ S | Gg1,G(f−1

[B], q) ∈ Pγ(s)

= s ∈ S | (Sf)−1[Gg1,H(B, q)

]∈ Pγ(s)

= s ∈ S | Gg1,H(B, q) ∈ Qγ(f(s))= f−1

[ΩH(γ; g1 | B, q)

]This shows that the assertion is also true for g = γ; g1. a

Let us briefly interpret Proposition 4.9.35 in terms of natural transformations, providing somecoalgebraic sfumatura.

Page 446: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

434 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

Example 4.9.36 We extract in this example the measurable sets A from a measurable space(S,A) which serves as a state space for our discussions. Define T (S) := A, then T acts as acontravariant functor from the category of measurable spaces satisfying the Souslin conditionto the category of sets, where the measurable map f : S → T is mapped to T f : T (T )→ T (S)by T f := f−1.

Fix a game g and a real q ∈ [0, 1], then ΩG(g | ·, q) : T (S) → T (S) by Proposition 4.9.34,provided S satisfies the Souslin condition. Then ΩG(g | ·, q) induces a natural transformationT → T , because by Proposition 4.9.35 this diagram commutes:

S

f

T (T )

ΩH(g|·,q)

f−1// T (S)

ΩG(g|·,q)

T T (T )f−1

// T (S)

,

Kripke generated frames.

A stochastic Kripke frame K = (S, (Kγ)γ∈Γ ) is a measurable state space S, each primitivegame γ ∈ Γ is associated with a stochastic relation Kγ : S S, see Example 4.2.7. Mor-phisms carry over in the obvious fashion from stochastic relations to stochastic Kripke framesby applying the defining condition to the stochastic relation associated with each primitivegame, see Section 4.1.3.

We associate with K a game frame GK := (S, (PKγ )γ∈Γ ). Thus the transformations associatedwith games considered above are also applicable to Kripke models. We will discuss this shortly.An application of Proposition 4.1.24 for each γ ∈ Γ states under which conditions a gameframe is generated by a stochastic Kripke frame. Just for the record:

Proposition 4.9.37 Let G = (S, (Pγ)γ∈Γ ) be a game frame. Then these conditions areequivalent

1. There exists a stochastic game frame K with G = GK.

2. Rγ(s) := 〈r,A〉 | βββA(A,≥ r) ∈ Pγ(s) defines a characteristic relation on S such thatPγ(s) ` Rγ(s) for each state s ∈ S, γ ∈ Γ . a

Let K = (S, (Kγ)γ∈Γ ) be a Kripke frame with associated game frame GK. Kγ : S S areKleisli morphisms; their product — sometimes called convolution — is defined through

(Kγ1 ∗Kγ2)(s)(A) :=

∫SKγ2(t)(A) Kγ1(s)(dt),

see Example 4.9.6. Intuitively, this gives the probability of reaching a state in A ∈ A, providedwe start with game γ1 in state s and continue with game γ2, averaging over intermediatestates (here γ1, γ2 ∈ Γ ). The observation that composing stochastic relations models thecomposition of modalities is one of the cornerstones for the interpretation of modal logicsthrough Kripke models [Pan09, Dob09].

Page 447: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.9. PRODUCT MEASURES 435

Let ΩG(A | g, q) be defined as above when working in the game frame associated with Kripkeframe K. It turns out that ΩG(A | γ1; . . . ; γk, q) can be described in terms of the Kleisliproduct for Kγ1 , . . . ,Kγk , provided γ1, . . . , γk ∈ Γ are primitive games.

Proposition 4.9.38 Assume that γ1, . . . , γk ∈ Γ , then this equality holds in the game frameG associated with the Kripke frame K

ΩG(A | γ1; . . . γk, q) = s ∈ S | (Kγ1 ∗ . . . ∗Kγk)(s)(A) > q.

for all A ∈ A, 0 ≤ q < 1.

Proof 1. The proof proceeds by induction on k. If k = 1, we have

s ∈ ΩG(A | γ1, q)⇔ βββA(A,> q) ∈ PK,γ1(s)⇔ Kγ1(s)(A) > q.

2. Assume that the claim is established for k, and let γ0 ∈ Γ . Then, borrowing the notationfrom above,

s ∈ ΩG(A | γ0; γ1; . . . ; γk, q)⇔ Gγ1;...;γk(A, q) ∈ PK,γ0(s)

⇔ Kγ0(s) ∈ Gγ1;...;γk(A, q)

⇔∫ 1

0Kγ0(s)(ΩG(A | γ1; . . . ; γk, r)) dr > q

(†)⇔∫ 1

0Kγ0(s)(t | (Kγ1 ∗ . . . ∗Kγk)(t)(A) > r) dr > q

(‡)⇔∫S

(Kγ1 ∗ . . . ∗Kγk)(t)(A) Kγ0(s)(dt) > q

(‖)⇔ (Kγ0 ∗Kγ1 ∗ . . . ∗Kγk)(s)(A) > q

Here (†) marks the induction hypothesis, (‡) is just the integral representation given ineq. (4.11) in Example 4.9.7, and (‖) is the definition of the Kleisli product. This establishesthe claim for k + 1. a

We note as a consequence that the respective definitions of state transformations throughthe games under consideration coincide for game frames generated by Kripke frames. On theother hand it is noted that the definition of these transformations for general frames basedon effectivity functions extends the one which has been used for Kripke frames for generalmodal logics.

Distributivity in the PDL fragment

The games which are described through the grammar from Example 2.7.4

t ::= ψ | t1 ∪ t2 | t1; t2 | t∗ | ϕ?

with ψ ∈ Ψ := Γ as the set of atomic programs, and ϕ a formula of the underlying modallogic define the PDL fragment of game logic. The corresponding games are called programsfor simplicity. We will show now that in this fragment

ΩG(· | g1; (g2 ∪ g3), ·) = ΩG(· | g1; g2 ∪ g1; g3), ·)

Page 448: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

436 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

holds, provided frame G is generated by a stochastic Kripke frame K.

Recall that Mσ(S,A) is the set of σ-finite measures on (S,A) to the extended non-negativereals. Mσ(S) is closed under addition and under multiplication with non-negative reals (weomit the σ-algebra in the sequel). The set is also closed under countable sums: given (µn)n∈Nwith µn ∈Mσ(S), put (∑

n∈Nµn)(A) := sup

n∈N

∑i≤n

µi(A).

Then∑

n∈N µn is monotone and σ-additive with(∑

n∈N µn)(∅) = 0, hence a member of

Mσ(S).

Call a map N : S → Mσ(S) an extended kernel iff for each A ∈ A the map s 7→ N(s)(A) ismeasurable with the usual conventions regarding measurability to the extended reals R, seeSection 4.9.1. Extended kernels are closed under convolution: Put

(N1 ∗N2)(s)(A) :=

∫SN2(t)(A) N1(s)(dt),

then N1 ∗ N2 is an extended kernel again. This is but the Kleisli composition applied toextended kernels. Thus Mσ(S) is closed under convolution which distributes both from the leftand from the right under addition and under scalar multiplication. Note that the countablesum of extended kernels is an extended kernel as well.

Define recursively for the stochastic relations in the Kripke frame K

Kg1∪g2 := Kg1 +Kg2 ,

Kg1;g2 := Kg1 ∗Kg2 ,

Kg∗ :=∑n≥0

Kgn .

This defines Kg for each g in the PDL-fragment of game logic, similar to the proposalin [Koz85].

Define

L(A | g, r) := S \ΩG(A | g, r),

where G is the game frame associated with the Kripke frame K over state space S, A ∈ A isa measurable set, g is a program, i.e., a member of the PDL fragment, and r ∈ [0, 1]. It ismore convenient to work with these complements, as we will see in a moment.

Lemma 4.9.39 L(A | g, r) = s ∈ S | Kg(s)(A) ≤ r holds for all programs g, all measurablesets A ∈ A and all r ∈ [0, 1].

Proof 1. The proof is fairly straightforward and proceeds by induction on g. Assume thatg = γ ∈ Γ is a primitive program, then

Kγ(s)(A) ≤ r ⇔ Kγ(s) /∈ βββA(A,> r)⇔ βββA(A,> r) /∈ Pγ(s)⇔ s /∈ ΩG(A | g, r)

Page 449: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.9. PRODUCT MEASURES 437

2. Assume that the assertion is true for g1 and g2, then

L(A | g1 ∪ g2, r) =⋂

〈a1,a2〉∈Q(k)(r)

(L(A | g1, a1) ∩ L(A | g2, a2)

)=

⋂〈a1,a2〉∈Q(k)(r)

(s | Kg1(s)(A) ≤ a1 ∩ s | Kg2(s)(A) ≤ a2

)= s ∈ S | (Kg1 +Kg2)(s)(A) ≤ r= s ∈ S | Kg1∪g2(s)(A) ≤ r

3. The proof for angelic iteration g∗ is very similar, observing that∑

n≥0Kgn(s)(A) ≤ r iff

there exists a sequence (an)n∈N ∈ Q(∞)(r) with Kgn(s)(A) ≤ an for all n ∈ N.

4. Finally, assume that the assertion is true for program g, and take γ ∈ Γ . Then, borrowingthe notation from (4.14)

Gg(A, r) /∈ Pγ(s)⇔ Kγ(s) /∈ Gg(A, r)

⇔∫ 1

0Kγ(s)(ΩG(A | g, t)) dt ≤ r

(†)⇔∫ 1

0Kγ(s)(x ∈ S | Kg(x)(A) > t) dt ≤ r

(‡)⇔∫SKg(t)(A) Kγ(s)(dt) ≤ r

(∗)⇔ Kγ;g(s)(A) ≤ r.

Here (†) is the induction hypothesis, (‡) derives from eq. (4.11) in Example 4.9.7 and (∗)comes from the definition of the convolution. a

It follows from this representation that for each program g the set L(A | g∗, r) is a measurablesubset of S, provided A ∈ A. This holds even without the assumption that the state space Sis closed under the Souslin operation.

Proposition 4.9.40 If games g1, g2, g3 are in the PDL fragment, and the game frame G isgenerated by a Kripke frame, then

ΩG(A | g1; (g2 ∪ g3), r) = ΩG(A | g1; g2 ∪ g1; g3, r) (4.15)

ΩG(A | (g1 ∪ g2); g3, r) = ΩG(A | g1; g3 ∪ g2; g3, r) (4.16)

for all A ∈ A, r ≥ 0.

Proof Right distributivity (4.16) is a basic assumption, which is given here for the sake ofcompleteness. It remains to establish left distributivity (4.15). Here it suffices to prove theequality for the respective complements. But this is easily established through Lemma 4.9.39and the observation that Kg1;(g2∪g3) = Kg1;g2 + Kg1;g3 holds, because integration of non-negative functions is additive. a

Page 450: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

438 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

Game Models

We discuss briefly game models, because we need a model for the discussion of the testoperator, which has been delayed so far. A game model describes the semantics of a gamelogic, which in turn is a modal logic where the modalities are given through games. It isdefined through grammar

ϕ = > | p | ϕ1 ∧ ϕ2 | 〈g〉qϕ,

see Example 4.1.11. Here p ∈ Ψ is an atomic proposition, g is a game, and q ∈ [0, 1] is a realnumber. Intuitively, formula 〈g〉qϕ is true in state s if playing game g in state s will result ina state in which formula ϕ holds with a probability greater than q.

Definition 4.9.41 A game model G =((S,A), (Pγ)γ∈Γ , (Vp)p∈Ψ

)over the measurable space

S is given by a game frame((S,A), (Pγ)γ∈Γ

), and by a family (Vp)p∈Ψ ⊆ A of sets which

assigns to each atomic statement a measurable set of state space S. We denote the underlyinggame frame by G as well.

Define the validity sets for each formula recursively as follows:[[ϕ]]G

[[>]]G := S

[[p]]G := Vp, if p ∈ Ψ[[ϕ1 ∧ ϕ2]]G := [[ϕ1]]G ∩ [[ϕ2]]G

[[〈g〉qϕ]]G := ΩG([[ϕ]]G | g, q)

Accordingly, we say that formula ϕ holds in state s, in symbols G, s |= ϕ, iff s ∈ [[ϕ]]G .G, s |= ϕ

The definition of [[〈g〉qϕ]]G has a coalgebraic flavor. Coalgebraic logics define the validityof modal formulas through special predicate liftings associated with the modalities, see Sec-tion 2.7.3. This connection becomes manifest through Example 4.9.36 where ΩG(· | g, q) isshown to be a natural transformation.

Proposition 4.9.42 If state space S is closed under the Souslin operation, [[ϕ]]G is a mea-surable subset for all formulas ϕ. Moreover, 〈s, r〉 | s ∈ [[〈g〉rϕ]]G ∈ A ⊗ [0, 1].

Proof The proof proceeds by induction on the formula ϕ. If ϕ = p ∈ Ψ is an atomicproposition, then the assertion follows from Vp ∈ A. The straightforward induction step usesLemma 4.9.30. a

Stochastic Kripke models are defined similarly to game models: K =((S,A), (Kγ)γ∈Γ , (Vp)p∈Ψ

)is called a stochastic Kripke model iff

((S,A), (Kγ)γ∈Γ

)is a stochastic Kripke frame with

Vp ∈ A for each atomic proposition p ∈ Ψ . Validity of a formula in the state of a stochasticKripke model is defined as validity in the associated game model. Thus we know that forprimitive games γ1, . . . , γn ∈ Γ

s ∈ [[〈γ1; . . . ; γn〉qϕ]]G ⇔ K, s |= 〈γ1; . . . ; γn〉qϕ⇔(Kγ1 ∗ . . . ∗Kγn

)(s)([[ϕ]]K) ≥ q (4.17)

holds (Proposition 4.9.38), and that games are semantically equivalent to their distributivecounterparts (Proposition 4.9.40).

Page 451: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.9. PRODUCT MEASURES 439

Let H = (T, (Qγ)γ∈Γ , (Wp)p∈Ψ ) be a second game model, then a measurable map f : S → Twhich is also a frame morphism f : (S, (Pγ)γ∈Γ )→ (T, (Qγ)γ∈Γ ) is called a model morphismf : G → H iff f−1

[Wp

]= Vp holds for all atomic propositions, i.e., if f(s) ∈ Wp iff s ∈ Vp

Modelmorphism

always holds. Model morphisms are compatible with validity:

Proposition 4.9.43 Let ϕ be a formula of game logic, f : G → H be a model morphism.Then

G, s |= ϕ iff H, f(s) |= ϕ.

Proof The claim is equivalent to saying that

[[ϕ]]G = f−1[[[ϕ]]H

]for all formulas ϕ. This is established through induction on the formula ϕ. Because f is amodel morphism, the assertion holds for atomic proposition. The induction step is establishedthrough Proposition 4.9.35. a

We assume from now on that the respective state spaces of our models are closed under theSouslin operation.

The Test Operator

The test operator ϕ? may be incorporated now. Given a formula ϕ, Angel may test whetheror not the formula is satisfied; this yields the two games ϕ? and ϕ¿. Game ϕ? checks whether ϕ?, ϕ¿formula ϕ is satisfied in the current state; if it is, Angel continues with the next game, if itis not, Angel loses. Similarly for ϕ¿: Angel checks, whether formula ϕ is not satisfied. Notethat we do not have negation in our logic, so we cannot test directly for ¬ϕ. We can test,however, whether a state state s does not satisfy a formula ϕ by evaluating s ∈ S \ [[ϕ]]G ,since complementation is available in the underlying σ-algebra. So we actually extend ourconsiderations by incorporating two test operators, but talk for convenience usually about“the” test operator.

In order to seamlessly integrate these testing games into our models, we define for eachformula two effectivity functions for positive and for negative testing, resp. The followingtechnical observations will be helpful. It helps transporting stochastic effectivity functionsalong measurable maps.

Lemma 4.9.44 Let P be a stochastic effectivity function on S, and assume that F : S(S)→S(S) is measurable, then P ′(s) := W ∈ ℘℘℘(S) | F−1

[W]∈ P (s) defines a stochastic effec-

tivity function on S.

Proof P ′(s) is upper closed, since P (s) is, so t-measurability has to be established. LetH ∈ ℘℘℘(S)⊗ B([0, 1]) be a test set, then Hq ∈ P ′(s)⇔

((F × id[0,1])

−1[H])q ∈ P (s). Since F

is measurable, F ×id[0,1] : S(S)× [0, 1]→ S(S)× [0, 1] is, hence (F × id[0,1])−1[H]

is a memberof ℘℘℘(S)⊗B([0, 1]). Because P is t-measurable, we conclude 〈s, q〉 | Hq ∈ P ′(s) ∈ A⊗ [0, 1].a

Page 452: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

440 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

Lemma 4.9.45 Define for A ∈ A, µ ∈ S(S) and B ∈ A the localization to A as FA(µ)(B) := Localizationµ(A ∩B). Then FA : S(S)→ S(S) is measurable.

Proof This follows from F−1A

[βββA(C, ./ q)

]= βββA(A ∩ C, ./ q). a

FA localizes measures to the set A, because everything outside A is discarded. Now definefor state s and formula ϕ

Pϕ?(s) := W ∈ ℘℘℘(S) | F−1[[ϕ]]G

[W]∈ ID(s),

Pϕ¿(s) := W ∈ ℘℘℘(S) | F−1S\[[ϕ]]G

[W]∈ ID(s),

where ID is the Dirac function defined in Example 4.1.18. Let us decode the definitionfor [[ϕ]]G . We have W ∈ Pϕ?(s) iff δs ∈ F−1

[[ϕ]]G

[W], thus iff F[[ϕ]]G (δs) ∈ W . Specializing to

W = βββA(A,> q) translates the latter condition to F[[ϕ]]G (δs)(A) > q hence to δs([[ϕ]]G∩A) > q,which in turn is equivalent to G, s |= ϕ and s ∈ A. Thus we see

βββA(A,> q) ∈ Pϕ?(s)⇔ G, s |= ϕ and s ∈ AβββA(A,> q) ∈ Pϕ¿(s)⇔ G, s 6|= ϕ and s ∈ A

(the argument for Pϕ¿ is completely analogous). We obtain

Proposition 4.9.46 Pϕ? and Pϕ¿ define a stochastic effectivity function for each formula ϕ.

Proof From Proposition 4.9.42 we infer that [[ϕ]]G ∈ A, consequently, F[[ϕ]]G and FS\[[ϕ]]Gare measurable functions S(S) → S(S) by Lemma 4.9.45. Thus the assertion follows fromLemma 4.9.44. a

Given a stochastic Kripke model, test operators may be defined as well; they serve also for theintegration of the test operators into PDL in a similar way. The definitions for the associatedstochastic relations Kϕ? : S S and Kϕ¿ : S S read

Kϕ?(s) := F[[ϕ]]G (D(s)),

Kϕ¿(s) := FS\[[ϕ]]G (D(s)).

These relations can be defined for formulas of game logic as well, when they are interpretedthrough a Kripke model. Thus we have, e.g.,

Kϕ?(s)(B) =

1, if s ∈ B and G, s |= ϕ

0, otherwise.

This is but a special case, since Pϕ? and Pϕ¿ are generated by these stochastic relations.

Lemma 4.9.47 Let ϕ be a formula of game logic, then Pϕ? = PKϕ? and Pϕ¿ = PKϕ¿ .

Proof The assertions follow from expanding the definitions. a

This extension integrates well into the scenario, because it is compatible with morphisms. Wewill establish this now. Because a model morphism is given by a morphism for the underlyinggame frame, and a morphisms for game frames is determined by morphisms for the underlyingeffectivity functions, it is enough to show that a morphism f : Pγ → Qγ for all γ ∈ Γ is alsoa morphism f : Pϕ? → Qϕ? for all formulas ϕ, similarly for ϕ¿.

Page 453: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.9. PRODUCT MEASURES 441

Proposition 4.9.48 Let G and H be game models over state spaces S and T , resp. Assumethat f : G → H is a morphism of game models; define for each formula ϕ the effectivityfunctions Pϕ? and Pϕ¿ for G resp. Qϕ? and Qϕ¿ for H. Then f is a morphism Pϕ? → Qϕ?

and Pϕ¿ → Qϕ¿ for each formula ϕ.

Proof 0. Fix formula ϕ; we will prove the assertion only for ϕ?, the proof for ϕ¿ is the same.The notation is fairly overloaded. We will use primed quantities when referring to H andstate space T , and unprimed ones when referring to model G with state space S.

The plan for the proof is as follows: we first show that Sf transforms Fϕ?D into F ′ϕ?D′; here The planwe use the assumption that f is a morphisms, so the validity sets are respects by the inverseimage of f . But once we have shown this, the proof proper is just a matter of showing thatthe corresponding diagram commutes by comparing E(f)(Pϕ?(s)) against Qϕ?(f(s)).

Note first that S(f)(Fϕ?(D(s))) = F ′ϕ?(D′(f(s))), because we have for each G ∈ B

Sf(Fϕ?(D(s))

)(G) = Fϕ?(D(s))(f−1

[G]) = D(s)

([[ϕ]]G ∩ f−1

[G])

(†)= D(s)

(f−1

[[[ϕ]]H

]∩ f−1

[G])

= D′(f(s))([[ϕ]]H ∩G

)= F ′ϕ?

(D′(f(s))

)(G).

We have used f−1[[[ϕ]]H

]= [[ϕ]]G in (†), since f is a morphism, see Proposition 4.9.43. But

now we may conclude

W ∈ E(f)(Pϕ?(s))⇔ Sf(Fϕ?(D(s))) ∈W ⇔ F ′ϕ?(D′(f(s))) ∈W ⇔W ∈ Qϕ?(f(s)),

hence E(f) Pϕ? = Qϕ? f is established. a

Example 4.9.49 We compute [[〈p?; g〉qϕ]]G and [[〈p¿; g〉qϕ]]G for a primitive formula p ∈ Ψand an arbitrary game g for the sake of illustration.

First a technical remark: Let λ be Lebesgue measure on the unit interval, then

G, s |= 〈g〉qϕ⇔ λ(r ∈ [0, 1] | G, s |= 〈g〉rϕ) > q (4.18)

In fact, the map r 7→ [[〈g〉rϕ]]G is monotone and decreasing; this is intuitively clear: if Angelwill have a strategy for reaching a state in which formula ϕ holds with probability at leastq > q′, it will have a strategy for reaching such a state with probability at least q′. But thisentails that the set r ∈ [0, 1] | G, s |= 〈g〉rϕ constitutes an interval which contains 0 if it isnot empty. This interval is longer than q (i.e., its Lebesgue measure is greater than q) iff q iscontained in it. From this (4.18) follows.

Now assume G, s |= 〈p?; g〉qϕ. Thus

Gg([[ϕ]]G , q) ∈ Pp?(s)⇔ FVp(D(s)) ∈ Gg([[ϕ]]G , q)⇔∫ 1

0D(s)(Vp ∩ [[〈g〉rϕ]]G) dr > q,

which means

D(s)(Vp) ·∫ 1

0D(s)([[〈g〉rϕ]]G) dr > q.

Page 454: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

442 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

This implies

D(s)(Vp) = 1 and

∫ 1

0D(s)([[〈g〉rϕ]]G) dr > q,

the latter integral being equal to λ(r ∈ [0, 1] | G, s |= 〈g〉rϕ). Hence by (4.18) it follows thatG, s |= 〈g〉qϕ. Thus we have found

G, s |= 〈p?; g〉qϕ⇔ G, s |= p ∧ 〈g〉qϕ.

Replacing in the above argumentation Vp by S \Vp, we see that G, s |= 〈p¿; g〉qϕ is equivalentto

D(s)(S \ Vp) = 1 and

∫ 1

0D(s)([[〈g〉rϕ]]G) dr > q.

Because we do not have negation in our logic, we obtain

G, s |= 〈p¿; g〉qϕ⇔ G, s 6|= p and G, s |= 〈g〉qϕ.

,

We leave this logic now and return to the discussion of topological properties of the space ofall finite measures.

4.10 The Weak Topology

Now that we have integration at our disposal, we will look again at topological issues for thespace of finite measures. We fix in this section (X, d) as a metric space; recall that C(X) isthe space of all bounded continuous functions X → R. This space induces the weak topologyon the space M(X) = M(X,B(X)) of all finite Borel measures on (X,B(X)). This is thesmallest topology which renders the evaluation map

evf : µ 7→∫Xf dµ

continuous for every continuous and bounded map f : X → R, i.e., the initial topologywith respect to (evf )f∈C(X), see Definition 3.1.14. This topology is fairly natural, and itis related to the topologies on M(X) considered so far, the Alexandrov topology and thetopology given by the Levy-Prohorov metric, which are discussed in Section 4.1.4. We willshow that these topologies are the same, provided the underlying space is Polish, and wewill for this case demonstrate that M(X) is a Polish space itself. Somewhat weaker resultsmay be obtained if the base space is only separable metric, and it turns out that tightness,i.e., inner approximability through compact sets, is the property which sets Polish spacesapart. We introduce also a very handy metric for the weak topology due to Hutchinson. Twocase studies on bisimulations of Markov transition systems and on quotients for stochasticrelations demonstrate the interplay of topological considerations with selection arguments,which become available on M(X) once this space is identified as Polish.

Define as the basis for the topology the sets

Uf1,...,fn,ε(µ) := ν ∈M(X) |∣∣∫X fi dν −

∫X fi dµ

∣∣ < ε for 1 ≤ i ≤ n

Page 455: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.10. THE WEAK TOPOLOGY 443

with ε > 0 and f1, . . . , fn ∈ C(X). Call the topology the weak topology on M(X).

With respect to convergence, we have this characterization, which indicates the relationshipbetween the weak topology and the Alexandrov-topology investigated in Section 4.1.4.

Theorem 4.10.1 The following statements are equivalent for a sequence (µn)n∈N ⊆M(X).

1. µn → µ in the weak topology.

2.∫X f dµn →

∫X f dµ for all f ∈ C(X).

3.∫X f dµn →

∫X f dµ for all bounded and uniformly continuous f : X → R.

4. µn → µ in the A-topology.

Proof The implications 1 ⇒ 2 and 2 ⇒ 3 are trivial.

3 ⇒ 4: LetG ⊆ X be open, then fk(x) := 1∧k·d(x,X\G) defines a uniformly continuous map(because |d(x,X \G)−d(x′, X \G)| ≤ d(x, x′)), and 0 ≤ f1 ≤ f2 ≤ . . . with limk→∞ fk = χG.Hence

∫X fk dµ ≤

∫X χG dµ = µ(G), and by monotone convergence

∫X fk dµ→ µ(G). From

the assumption we know that∫X fk dµn →

∫X fk dµ, as n → ∞, so that we obtain for all

k ∈ Nlimn→∞

∫Xfk dµn ≤ lim inf

n→∞µn(G),

with in turn implies µ(G) ≤ lim infn→∞ µn(G).

4 ⇒ 2 We may assume that f ≥ 0, because the integral is linear. By Example 4.9.7,equation (4.11), we can represent the integral through∫

Xf dν =

∫ ∞0

ν(x ∈ X | f(x) > t) dt.

Since f is continuous, the set x ∈ X | f(x) > t is open. By Fatou’s Lemma (Proposi-tion 4.8.5) we obtain from the assumption

lim infn→∞

∫Xf dµn = lim inf

n→∞

∫ ∞0

µn(x ∈ X | f(x) > t) dt

≥∫ ∞

0lim infn→∞

µn(x ∈ X | f(x) > t) dt

≥∫ ∞

0µ(x ∈ X | f(x) > t) dt

=

∫Xf dµ.

Because f ≥ 0 is bounded, we find T ∈ R such that f(x) ≤ T for all x ∈ X, hence g(x) :=T − f(x) defines a non-negative and bounded function. Then by the preceding argumentlim infn→∞

∫X g dµn ≥

∫X g dµ. Since µn(X)→ µ(X), we infer

lim supn→∞

∫Xf dµn ≤

∫Xf dµ,

which implies the desired equality. a

Page 456: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

444 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

Let X be separable, then the A-topology is metrized by the Prohorov metric (Theorem 4.1.49).Thus we have established that the metric topology and the topology of weak convergence arethe same for separable metric spaces. Just for the record:

Theorem 4.10.2 Let X be a separable metric space, then the Prohorov metric is a metricfor the topology of weak convergence. a

It is now easy to find a dense subset in M(X). As one might expect, the measures living ondiscrete subsets are dense. Before stating and proving the corresponding statement, we havea brief look at the embedding of X into M(X).

Example 4.10.3 The base space X is embedded into M(X) as a closed subset throughx 7→ δx. In fact, let (δxn)n∈N be a sequence which converges weakly to µ ∈ M(X). Wehave in particular µ(X) = limn→∞ δxn(X) = 1, hence µ ∈ P (X). Now assume that (xn)n∈Ndoes not converge, hence it does not have a convergent subsequence in X. Then the setS := xn | n ∈ N is closed in X, so are all subsets of S. Take an infinite subset C ⊆ Swith an infinite complement S \ C, then µ(C) ≥ lim supn→∞ δxn(C) = 1, and with the sameargument µ(S \ C) = 1. This contradicts µ(X) = 1. Thus we can find x ∈ X with xn → x,hence δxn → δx, so that the image of X in M(X) is closed. ,

This is what one would expect: the discrete measures form a dense set in the topology ofweak convergence.

Proposition 4.10.4 Let X be a separable metric space. The set

∑k∈N rk · δxk | xk ∈ X, rk ≥ 0

of discrete measures is dense in the topology of weak convergence.

Proof 0. The plan for the proof goes like this: we cover the space with Borel sets of smallPlan forthe proof

diameter, and then take a uniformly continuous function as a witness. Uniform continuitythen makes for uniform deviations on these sets, which establishes the claim.

1. Fix µ ∈ M(X). Cover X for each k ∈ N with open sets (Gn,k)n∈N, each of which has adiameter not less that 1/k. Convert the cover through the first entrance trick to a cover ofmutually disjoint Borel sets An,k ⊆ Gn,k, eliminating all empty sets arising from this process.Select an arbitrary xn,k ∈ An,k. We claim that

µn :=∑k∈N

µ(An,k) · δxn,k

converges weakly to µ.

2. In fact, let f : X → R be a uniformly continuous and bounded map. Since f is uniformlycontinuous,

ηn := supk∈N

(sup

x∈An,kf(x)− inf

x∈An,kf(x)

)

Page 457: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

4.10. THE WEAK TOPOLOGY 445

tends to 0 as n → ∞. Thus
$$\Bigl|\int_X f\,d\mu_n - \int_X f\,d\mu\Bigr| = \Bigl|\sum_{k\in\mathbb N}\Bigl(\int_{A_{n,k}} f\,d\mu_n - \int_{A_{n,k}} f\,d\mu\Bigr)\Bigr| \le \eta_n\cdot\sum_{k\in\mathbb N}\mu(A_{n,k}) \le \eta_n\cdot\mu(X) \to 0.$$

This establishes the claim and proves the assertion. □
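The construction of the proof is easy to replay numerically on X = [0,1]: partition the space into intervals of diameter 1/n, place the mass μ(A_{n,k}) on a representative point, and watch the integrals of a uniformly continuous function converge. The measure (a Beta distribution), the function, and all parameters below are illustrative choices, not part of the text.

```python
import numpy as np
from scipy import stats

# mu_n puts the mass mu(A_{n,k}) of each interval A_{n,k} of diameter 1/n
# onto one representative point x_{n,k}, as in Proposition 4.10.4.
mu = stats.beta(2, 5)                     # an illustrative choice for mu
f = lambda x: np.cos(3.0 * x)             # bounded, uniformly continuous

exact = mu.expect(f)                      # reference value of int f dmu

for n in (2, 8, 32, 128):
    edges = np.linspace(0.0, 1.0, n + 1)  # A_{n,k} = [edges[k], edges[k+1])
    masses = np.diff(mu.cdf(edges))       # mu(A_{n,k})
    reps = edges[:-1]                     # representative x_{n,k}
    approx = np.sum(masses * f(reps))     # int f d(mu_n)
    print(n, approx, abs(approx - exact))
```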

We may arrange the cover in the proof in such a way that the points are taken from a dense set. Hence we obtain immediately

Corollary 4.10.5 If X is a separable metric space, then M(X) is a separable metric space in the topology of weak convergence.

Proof Because $\sum_{k=1}^n r_k\cdot\delta_{x_k} \to \sum_{k\in\mathbb N} r_k\cdot\delta_{x_k}$ as n → ∞ in the weak topology, and because the rationals ℚ are dense in the reals, we obtain from Proposition 4.10.4 that $\{\sum_{k=1}^n r_k\cdot\delta_{x_k} \mid x_k \in D,\ 0 \le r_k \in \mathbb Q,\ n \in \mathbb N\}$ is a countable and dense subset of M(X), whenever D ⊆ X is a countable and dense subset of X. □

Another immediate consequence refers to the weak-*-σ-algebra. We obtain from Lemma 4.1.50 together with Corollary 4.10.5

Corollary 4.10.6 Let X be a metric space; then the weak-*-σ-algebra consists of the Borel sets of the A-topology. □

We will show now that M(X) is a Polish space, provided X is one; thus applying the M-functor to a Polish space does not leave the realm of Polish spaces.

We know from Alexandrov's Theorem 4.3.27 that a separable metrizable space is Polish iff it can be embedded as a Gδ-set into the Hilbert cube. We show first that for compact metric X the space S(X) of all subprobability measures with the topology of weak convergence is itself a compact metric space. This is established by embedding it as a closed subspace into [−1,+1]^∞. But there is nothing special about taking S; the important property is that all measures are uniformly bounded (by 1, in this case). Any other bound would also do.

We require for this the Stone-Weierstraß Theorem, which implies through Corollary 3.6.46 that the unit ball in the space of all bounded continuous functions on a compact metric space is itself separable. The idea of the embedding is to take a countable dense sequence (g_n)_{n∈ℕ} of this unit ball. Since we are dealing with subprobability measures, and since each g_n maps X into the interval [−1,1], we know that $-1 \le \int_X g_n\,d\mu \le 1$ for each μ. This then spawns the desired map, which together with its inverse is shown through the Riesz Representation Theorem to be continuous.

Well, this is the plan of attack for establishing

Proposition 4.10.7 Let X be a compact metric space. Then S(X) is a compact metric space.

Proof 1. The space C(X) of continuous maps into the reals is for compact metric X a separable Banach space under the sup-norm ‖·‖∞ by Corollary 3.6.46. The closed unit ball
$$C_1 := \{f \in C(X) \mid \|f\|_\infty \le 1\}$$
is a separable metric space in its own right, because it is Polish by Theorem 4.3.26. Let (g_n)_{n∈ℕ} be a countable dense subset of C₁, and define
$$\Omega : S(X) \ni \nu \mapsto \Bigl\langle\int_X g_1\,d\nu,\ \int_X g_2\,d\nu,\ \dots\Bigr\rangle \in [-1,1]^\infty.$$
Then Ω is injective, because the sequence (g_n)_{n∈ℕ} is dense.

2. Also, Ω⁻¹ is continuous. In fact, let (μ_n)_{n∈ℕ} be a sequence in S(X) such that (Ω(μ_n))_{n∈ℕ} converges in [−1,1]^∞; put $\alpha_i := \lim_{n\to\infty}\int_X g_i\,d\mu_n$. For each f ∈ C₁ there exists a subsequence (g_{n_k})_{k∈ℕ} such that ‖f − g_{n_k}‖∞ → 0 as k → ∞, because (g_n)_{n∈ℕ} is dense in C₁. Thus
$$L(f) := \lim_{n\to\infty}\int_X f\,d\mu_n$$
exists. Define L(α·f) := α·L(f) for α ∈ ℝ; then it is immediate that L : C(X) → ℝ is linear and that L(f) ≥ 0, provided f ≥ 0. The Riesz Representation Theorem 4.8.20 now gives a unique μ ∈ S(X) with
$$L(f) = \int_X f\,d\mu,$$
and the construction shows that
$$\lim_{n\to\infty}\Omega(\mu_n) = \Bigl\langle\int_X g_1\,d\mu,\ \int_X g_2\,d\mu,\ \dots\Bigr\rangle.$$

3. Consequently, Ω : S(X) → Ω[S(X)] is a homeomorphism, and Ω[S(X)] is closed, hence compact. Thus S(X) is compact. □

We obtain as a first consequence

Proposition 4.10.8 X is compact iff S(X) is, whenever X is a Polish space.

Proof It remains to show that X is compact, provided S(X) is. Choose a complete metric d for X. Then X is isometrically embedded into S(X) by x ↦ δ_x with A := {δ_x | x ∈ X} being closed. We could appeal to Example 4.10.3, but a direct argument is available as well. In fact, if δ_{x_n} → μ in the weak topology, then (x_n)_{n∈ℕ} is a Cauchy sequence in X on account of the isometry. Since (X, d) is complete, x_n → x for some x ∈ X, hence μ = δ_x; thus A is closed, hence compact. □

The next step for showing that M(X) is Polish is nearly canonical. If X is a Polish space, it may be embedded as a Gδ-set into a compact space $\overline{X}$, the subprobabilities of which are topologically a closed subset of [−1,+1]^∞, as we have just seen. We will show now that M(X) is a Gδ in $M(\overline{X})$ as well.

Proposition 4.10.9 Let X be a Polish space. Then M(X) is a Polish space in the topology of weak convergence.

Proof 1. Embed X as a Gδ-subset into a compact metric space $\overline{X}$, hence $X \in \mathcal B(\overline{X})$. Put
$$M_0 := \{\mu \in M(\overline{X}) \mid \mu(\overline{X}\setminus X) = 0\},$$
so M₀ contains exactly those finite measures on $\overline{X}$ that are concentrated on X. Then M₀ is homeomorphic to M(X).

2. Write X as $X = \bigcap_{n\in\mathbb N} G_n$, where (G_n)_{n∈ℕ} is a sequence of open sets in $\overline{X}$. Given r > 0, the set
$$\Gamma_{k,r} := \{\mu \in M(\overline{X}) \mid \mu(\overline{X}\setminus G_k) < r\}$$
is open in $M(\overline{X})$. In fact, if $\mu_n \notin \Gamma_{k,r}$ converges to μ₀ in the weak topology, then
$$\mu_0(\overline{X}\setminus G_k) \ge \limsup_{n\to\infty}\mu_n(\overline{X}\setminus G_k) \ge r$$
by Theorem 4.10.1, since $\overline{X}\setminus G_k$ is closed. Consequently, $\mu_0 \notin \Gamma_{k,r}$. This shows that Γ_{k,r} is open, because its complement is closed. Thus
$$M_0 = \bigcap_{n\in\mathbb N}\bigcap_{k\in\mathbb N}\Gamma_{n,1/k}$$
is a Gδ-set, and the assertion follows. □

Thus we obtain as a consequence

Proposition 4.10.10 M(X) is a Polish space in the topology of weak convergence iff X is.

Proof Let M(X) be Polish. The base space X is embedded into M(X) as a closed subset by Example 4.10.3, hence is a Polish space by Theorem 4.3.26. □

Let μ ∈ M(X) with X Polish. Since X has a countable basis, we know from Lemma 4.1.46 that μ is supported by a closed set, since μ is τ-regular. But in the presence of a complete metric we can say a bit more, viz., that the value of μ(A) may be approximated from within by compact sets to arbitrary precision.

Definition 4.10.11 A finite Borel measure μ is called tight iff
$$\mu(A) = \sup\{\mu(K) \mid K \subseteq A \text{ compact}\}$$
holds for all A ∈ B(X).

Thus tightness means for μ that we can find for any ε > 0 and for any Borel set A ⊆ X a compact set K ⊆ A with μ(A \ K) < ε. Because a finite measure on a separable metric space is regular, i.e., μ(A) can be approximated from within A by closed sets (Lemma 4.6.13), it suffices in this case to consider tightness at X, hence to postulate that there exists for any ε > 0 a compact set K ⊆ X with μ(X \ K) < ε. We know in addition that each finite measure is τ-regular by Lemma 4.1.45, hence the union of a directed family of open sets has the supremum of these open sets as its measure. Capitalizing on this and on completeness, we find

Proposition 4.10.12 Each finite Borel measure on a Polish space X is tight.

Proof 0. We cover the space with open sets which are determined as open neighborhoods of finite sets. From this a directed cover of open sets is easily constructed, and since we know that the measure is τ-regular, we extract suitable finite sets, from which a compact set is manufactured. This set is then shown to be suitable for our purposes. The important properties here are τ-regularity and the observation that a complete and totally bounded set in a metric space is compact.

1. We show first that we can find for each ε > 0 a compact set K ⊆ X with μ(X \ K) < ε. In fact, given a complete metric d, fix n ∈ ℕ and consider
$$\mathcal G_n := \bigl\{\{x \in X \mid d(x, M) < 1/n\} \bigm| M \subseteq X \text{ is finite}\bigr\}.$$
Then $\mathcal G_n$ is a directed collection of open sets with $\bigcup\mathcal G_n = X$, thus we know from τ-regularity of μ that $\mu(X) = \sup\{\mu(G) \mid G \in \mathcal G_n\}$. Consequently, given ε > 0 there exists for each n ∈ ℕ a finite set M_n ⊆ X with $\mu(\{x \in X \mid d(x, M_n) < 1/n\}) > \mu(X) - \varepsilon/2^n$. Now define
$$K := \bigcap_{n\in\mathbb N}\{x \in X \mid d(x, M_n) \le 1/n\}.$$
Then K is closed, and complete (since (X, d) is complete). Because each M_n is finite, K is totally bounded. Thus K is compact by Theorem 3.5.32. We obtain
$$\mu(X\setminus K) \le \sum_{n\in\mathbb N}\mu(\{x \in X \mid d(x, M_n) \ge 1/n\}) \le \sum_{n\in\mathbb N}\varepsilon\cdot 2^{-n} = \varepsilon.$$

2. Now let A ∈ B(X); then for ε > 0 there exists F ⊆ A closed with μ(A \ F) < ε/2, and choose K ⊆ X compact with μ(X \ K) < ε/2. Then K ∩ F ⊆ A is compact with μ(A \ (F ∩ K)) < ε. □
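On a concrete Polish space the compact set from part 1 of the proof can be exhibited directly. The sketch below uses the standard normal measure on ℝ, an illustrative choice, and finds a compact interval carrying all but ε of the mass.

```python
from scipy import stats

# Tightness on the Polish space R: for the standard normal measure mu and
# any eps > 0, exhibit a compact K = [-T, T] with mu(R \ K) < eps.
mu = stats.norm()

def compact_for(eps):
    T = 1.0
    while mu.cdf(-T) + mu.sf(T) >= eps:   # mass outside [-T, T]
        T *= 2.0
    return T

for eps in (1e-1, 1e-3, 1e-6):
    T = compact_for(eps)
    print(eps, T, mu.cdf(-T) + mu.sf(T))  # the outside mass is indeed < eps
```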

Tightness is sometimes an essential ingredient when arguing about measures on a Polish space. The discussion of the Hutchinson metric in the next section provides an example: it shows that at a crucial point tightness kicks in and saves the day.

4.10.1 The Hutchinson Metric

We will explore now another approach to the weak topology for Polish spaces through the Hutchinson metric. Given a fixed metric d on X and a fixed real γ > 0, define
$$V_\gamma := \{f : X \to \mathbb R \mid |f(x) - f(y)| \le d(x, y) \text{ and } |f(x)| \le \gamma \text{ for all } x, y \in X\}.$$
Thus f is a member of V_γ iff f is non-expanding (hence has Lipschitz constant 1) and its supremum norm ‖f‖∞ is bounded by γ. Trivially, all elements of V_γ are uniformly continuous. Note the explicit dependence of the elements of V_γ on the metric d. The Hutchinson distance H_γ(μ, ν) between μ, ν ∈ M(X) is defined as

$$H_\gamma(\mu, \nu) := \sup_{f\in V_\gamma}\Bigl(\int_X f\,d\mu - \int_X f\,d\nu\Bigr).$$
Then H_γ is easily seen to be a metric on M(X). H_γ is called the Hutchinson metric (sometimes also the Hutchinson-Monge-Kantorovich metric).
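For measures supported on finitely many points the supremum defining H_γ becomes a finite linear program: maximize Σᵢ fᵢ·(μᵢ − νᵢ) subject to |fᵢ − fⱼ| ≤ d(xᵢ, xⱼ) and |fᵢ| ≤ γ. The sketch below solves this LP with scipy; the point set and the two measures are invented for the illustration.

```python
import numpy as np
from scipy.optimize import linprog

def hutchinson(points, mu, nu, gamma=1.0):
    """H_gamma(mu, nu) for discrete measures on a finite point set in R."""
    n = len(points)
    d = np.abs(points[:, None] - points[None, :])   # metric on the support
    c = -(mu - nu)                                  # linprog minimizes c.f
    rows, rhs = [], []
    for i in range(n):                              # f_i - f_j <= d_ij
        for j in range(n):
            if i != j:
                row = np.zeros(n)
                row[i], row[j] = 1.0, -1.0
                rows.append(row)
                rhs.append(d[i, j])
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(-gamma, gamma)] * n, method="highs")
    return -res.fun

pts = np.array([0.0, 0.5, 1.0])
mu = np.array([0.5, 0.5, 0.0])      # (1/2) delta_0 + (1/2) delta_{1/2}
nu = np.array([0.0, 0.5, 0.5])      # (1/2) delta_{1/2} + (1/2) delta_1
print(hutchinson(pts, mu, nu))      # 0.5: half the mass moves distance 1
```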

The relationship between this metric and the topology of weak convergence is stated in Proposition 4.10.13, the proof of which follows [Edg98, Theorem 2.5.17]. The program goes as follows. We first show that convergence in the Hutchinson metric implies convergence in the weak topology. This is a straightforward approximation argument on closed sets through suitable continuous functions. The converse is more complicated and relies on tightness. We find for the target measure μ of a converging sequence a good approximating compact set, which can be covered by a finite number of open sets the boundaries of which vanish for μ. From this we construct a suitable approximation in the Hutchinson metric; clearly, uniform boundedness will be used heavily here.

Proposition 4.10.13 Let X be a Polish space. Then H_γ is a metric for the topology of weak convergence on M(X) for any γ > 0.

Proof 1. We may and do assume that γ = 1; otherwise we scale accordingly. Now let H₁(μ_n, μ) → 0 as n → ∞; then lim_{n→∞} μ_n(X) = μ(X). Let F ⊆ X be closed; then we can find for given ε > 0 a function f ∈ V₁ such that f(x) = 1 for x ∈ F and $\int_X f\,d\mu \le \mu(F) + \varepsilon$. This gives
$$\limsup_{n\to\infty}\mu_n(F) \le \lim_{n\to\infty}\int_X f\,d\mu_n = \int_X f\,d\mu \le \mu(F) + \varepsilon.$$
Thus convergence in the Hutchinson metric implies convergence in the A-topology, hence in the topology of weak convergence, by Proposition 4.1.35.

2. Now assume that μ_n → μ in the topology of weak convergence, thus μ_n(A) → μ(A) for all A ∈ B(X) with μ(∂A) = 0 by Corollary 4.1.36; we assume that μ_n and μ are probability measures, otherwise we scale again. Because X is Polish, μ is tight by Proposition 4.10.12.

Fix ε > 0; then there exists a compact set K ⊆ X with
$$\mu(X\setminus K) < \frac{\varepsilon}{5\cdot\gamma}.$$

Given x ∈ K, there exists an open ball B_r(x) with center x and radius r, 0 < r < ε/10, such that μ(∂B_r(x)) = 0, see Corollary 4.1.39. Because K is compact, a finite number of these balls will suffice, thus K ⊆ B_{r_1}(x_1) ∪ … ∪ B_{r_p}(x_p). Transform this cover into a disjoint cover by setting
$$\begin{aligned}
E_1 &:= B_{r_1}(x_1),\\
E_2 &:= B_{r_2}(x_2)\setminus E_1,\\
&\;\;\vdots\\
E_p &:= B_{r_p}(x_p)\setminus(E_1\cup\ldots\cup E_{p-1}),\\
E_0 &:= X\setminus(E_1\cup\ldots\cup E_p).
\end{aligned}$$

We observe these properties:

1. For i = 1,…,p, the diameter of each E_i is not greater than 2·r_i, hence smaller than ε/5.
2. For i = 1,…,p, ∂E_i ⊆ (∂B_{r_1}(x_1)) ∪ … ∪ (∂B_{r_p}(x_p)), since each E_i is a Boolean combination of these balls; hence μ(∂E_i) = 0.
3. Because the boundary of a set is also the boundary of its complement, we conclude μ(∂E₀) = 0 as well. Moreover, μ(E₀) < ε/(5·γ), since E₀ ⊆ X \ K.


Eliminate all E_i which are empty. Select η > 0 such that p·η < ε/5, and determine n₀ ∈ ℕ so that |μ_n(E_i) − μ(E_i)| < η for i = 0,…,p and n ≥ n₀.

We have to show that
$$\sup_{f\in V_\gamma}\Bigl(\int_X f\,d\mu_n - \int_X f\,d\mu\Bigr) \to 0, \text{ as } n\to\infty.$$

So take f ∈ V_γ and fix n ≥ n₀. For i = 1,…,p pick an arbitrary e_i ∈ E_i; because each E_i has a diameter not greater than ε/5, we know that |f(x) − f(e_i)| < ε/5 for each x ∈ E_i. If x ∈ E₀, we have |f(x)| ≤ γ. Now we are getting somewhere: let n ≥ n₀; then we obtain
$$\begin{aligned}
\int_X f\,d\mu_n &= \sum_{i=0}^p\int_{E_i} f\,d\mu_n\\
&\le \gamma\cdot\mu_n(E_0) + \sum_{i=1}^p\Bigl(f(e_i) + \frac{\varepsilon}{5}\Bigr)\cdot\mu_n(E_i)\\
&\le \gamma\cdot(\mu(E_0) + \eta) + \sum_{i=1}^p\Bigl(f(e_i) + \frac{\varepsilon}{5}\Bigr)\cdot(\mu(E_i) + \eta)\\
&\le \gamma\cdot\Bigl(\frac{\varepsilon}{5\cdot\gamma} + \eta\Bigr) + \sum_{i=1}^p\Bigl(f(e_i) - \frac{\varepsilon}{5}\Bigr)\cdot\mu(E_i) + \frac{2\cdot\varepsilon}{5}\sum_{i=1}^p\mu(E_i) + \frac{p\cdot\varepsilon\cdot\eta}{5}\\
&\le \int_X f\,d\mu + \varepsilon.
\end{aligned}$$

Recall that
$$\sum_{i=1}^p\mu(E_i) \le \sum_{i=0}^p\mu(E_i) = \mu(X) = 1,$$
and that
$$\int_{E_i} f\,d\mu \ge \mu(E_i)\cdot(f(e_i) - \varepsilon/5).$$

In a similar fashion we obtain $\int_X f\,d\mu_n \ge \int_X f\,d\mu - \varepsilon$, so that we have established
$$\Bigl|\int_X f\,d\mu - \int_X f\,d\mu_n\Bigr| < \varepsilon$$
for n ≥ n₀. Since f ∈ V_γ was arbitrary, we have shown that H_γ(μ_n, μ) → 0. □

The Hutchinson metric is occasionally easier to use than the Prohorov metric, because integrals are sometimes easier to manipulate in convergence arguments than ε-neighborhoods of sets.

4.10.2 Case Study: Eilenberg-Moore Algebras for the Giry Monad

We will study the Eilenberg-Moore algebras for the Giry monad again, but this time for the non-discrete case. Theorem 2.5.23 contains a complete characterization of these algebras for the discrete probability functor D as the positive convex structures on X. We will derive a complete characterization for the probability functor on Polish spaces from this.


We work in the category of Polish spaces with continuous maps as morphisms and the Borel sets as the σ-algebra. The Giry monad (P, e, m) is introduced in Example 2.4.8; its functorial part is the subprobability functor P, and the unit e and the multiplication m are for a Polish space X defined through
$$e_X(x) := \delta_x, \qquad m_X(M)(A) := \int_{PX}\vartheta(A)\,M(d\vartheta)$$
for x ∈ X, M ∈ P²X and A ∈ B(X). An Eilenberg-Moore algebra h : PX → X is a morphism so that the diagrams from page 138 commute, i.e., so that
$$h \circ Ph = h \circ m_X \qquad\text{and}\qquad h \circ e_X = \mathrm{id}_X.$$
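The unit and the multiplication are perhaps most transparent in their discrete incarnation. The sketch below, an illustration only, represents finite distributions as Python dicts and mirrors e_X(x) = δ_x and the flattening m_X(M)(A) = ∫ ϑ(A) M(dϑ):

```python
from collections import defaultdict

def unit(x):
    """Discrete analogue of e_X: the Dirac distribution on x."""
    return {x: 1.0}

def multiply(M):
    """Discrete analogue of m_X: flatten a distribution over distributions.

    M is a list of (distribution, weight) pairs, standing in for a
    measure on the space of measures."""
    out = defaultdict(float)
    for theta, p in M:
        for x, q in theta.items():
            out[x] += p * q          # int theta({x}) M(dtheta)
    return dict(out)

theta1 = {0: 0.5, 1: 0.5}
theta2 = {1: 1.0}
print(multiply([(theta1, 0.4), (theta2, 0.6)]))   # {0: 0.2, 1: 0.8}
print(multiply([(unit(7), 1.0)]))                 # {7: 1.0}: flattening a Dirac
```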

We note first that unit and multiplication of the monad are compatible with the weak topology.

Lemma 4.10.14 Given a Polish space X, the unit e_X : X → PX and the multiplication m_X : P²X → PX are continuous in the respective weak topologies.

Proof We know this already from Lemma 4.1.42 for the unit, so we have to establish the claim for the multiplication. Let M ∈ P²(X) and let f ∈ C(X) be a bounded continuous function; then
$$\int_X f\,dm_X(M) = \int_{PX}\Bigl(\int_X f\,d\vartheta\Bigr)\,dM(\vartheta) \tag{4.19}$$

holds. Granted that we have shown this, we argue then as follows: by the definition of weak convergence on PX, the map $E_f : \vartheta \mapsto \int_X f\,d\vartheta$ is continuous whenever f ∈ C(X); thus $M \mapsto \int_{PX} E_f\,dM$ is continuous by the definition of the weak topology on P²X. But by (4.19) this is just the map $M \mapsto \int_X f\,dm_X(M)$, hence m_X is continuous.

So it remains to establish equation (4.19). If f = χ_A for A ∈ B(X), this is just the definition of m_X, so the equation holds in this case. Since the integral is linear, equation (4.19) holds for bounded step functions f. If f ≥ 0 is bounded and measurable, Levi's Theorem 4.8.2 in combination with Lebesgue's Dominated Convergence Theorem 4.8.6 shows that the equation holds. The general case now follows from decomposing f = f⁺ − f⁻ with f⁺ ≥ 0 and f⁻ ≥ 0. □

Now fix a Polish space X and a complete metric d; the Hutchinson metric H_γ for some γ > 0 is assumed as a metric for the topology of weak convergence, see Proposition 4.10.13. Put as in Section 2.5.2
$$\Omega := \Bigl\{\langle\alpha_1,\dots,\alpha_k\rangle \Bigm| k \in \mathbb N,\ \alpha_i \ge 0,\ \sum_{i=1}^k\alpha_i \le 1\Bigr\}.$$

Given an Eilenberg-Moore algebra ⟨X, h⟩ for P, define for α = ⟨α₁,…,α_n⟩ ∈ Ω the map
$$\langle\alpha_1,\dots,\alpha_n\rangle_h(x_1,\dots,x_n) := h\Bigl(\sum_{i=1}^n\alpha_i\cdot\delta_{x_i}\Bigr).$$


Then the lengthy computation in Lemma 2.5.20 shows that α ↦ α_h defines a positive convex structure. Let, conversely, a positive convex structure p be given; then Lemma 2.5.21 shows that
$$h_p\Bigl(\sum_{i=1}^n\alpha_i\cdot\delta_{x_i}\Bigr) := \sum^{p}_{1\le i\le n}\alpha_i\cdot x_i$$
for ⟨α₁,…,α_n⟩ ∈ Ω and x₁,…,x_n ∈ X defines an algebra h_p for the discrete probability functor DX (the superscript p indicating the formal convex combination given by p). Since the discrete probability measures are dense in PX by Proposition 4.10.4, we look for a continuous extension from DX to PX. In fact, this is possible provided p is regular in the following sense:

Definition 4.10.15 The positive convex structure p is said to be regular iff the oscillation $\operatorname{osc}_{h_p}(\mu) = 0$ for every μ ∈ PX.

Thus we measure the oscillation $\operatorname{osc}_{h_p}(\mu) = \inf\{\operatorname{diam}(h_p[DX \cap B(\mu, r)]) \mid r > 0\}$ of h_p at every probability measure μ ∈ PX. Here B(μ, r) is the ball with center μ and radius r for the Hutchinson metric H_γ; the oscillation of a function is defined on page 252.

Thus p is regular iff we know for all μ ∈ PX that, given ε > 0, there exists r > 0 such that
$$d\Bigl(\sum^{p}_{1\le i\le n}\alpha_i\cdot x_i,\ \sum^{p}_{1\le j\le m}\beta_j\cdot y_j\Bigr) < \varepsilon,$$
whenever
$$\Bigl|\sum_{i=1}^n\alpha_i\cdot f(x_i) - \int_X f\,d\mu\Bigr| < r \quad\text{and}\quad \Bigl|\sum_{j=1}^m\beta_j\cdot f(y_j) - \int_X f\,d\mu\Bigr| < r$$
for all f ∈ V_γ.

This gives the following characterization of the Eilenberg-Moore algebras, generalizing the “discrete” Theorem 2.5.23 to the non-discrete case.

Theorem 4.10.16 The Eilenberg-Moore algebras for the Giry monad over Polish spaces are exactly the regular positive convex structures.

Proof 1. Let ⟨X, h⟩ be an Eilenberg-Moore algebra; then h : PX → X is continuous, hence the restriction of h to DX has oscillation 0 at each μ ∈ PX. But this means that the corresponding positive convex structure is regular.
2. Conversely, given a regular positive convex structure p, the associated map h_p : DX → X has oscillation zero at each μ ∈ PX. Thus there exists a unique continuous extension h′_p : PX → X by Lemma 3.5.24. It remains to show that h′_p satisfies the laws of an Eilenberg-Moore algebra. This follows easily from Lemma 4.10.14, since h_p satisfies the corresponding equations. □

We now have a complete characterization of the Eilenberg-Moore algebras for the probability monad over Polish spaces. This closes the gap we had to leave in Section 2.5.2 because we did not yet have the necessary tools at our disposal. It displays an interesting cooperation between arguments from categories, topology and measure theory.
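A standard example of a regular positive convex structure is the barycenter on a compact convex subset of the reals; the induced algebra sends a measure to its expectation. The sketch below, an illustrative choice with X = [0,1], shows h_p on discrete measures converging along the approximation of Proposition 4.10.4:

```python
import numpy as np

# Barycenter as an Eilenberg-Moore algebra on X = [0,1]:
# h_p(sum_i a_i * delta_{x_i}) = sum_i a_i * x_i, extending continuously
# to mu |-> int x dmu(x) on all of P(X).

def h_discrete(alphas, xs):
    return float(np.dot(alphas, xs))

# Discrete approximations of the uniform distribution on [0,1]:
for n in (10, 100, 1000):
    xs = (np.arange(n) + 0.5) / n       # representative points
    alphas = np.full(n, 1.0 / n)        # equal masses
    print(n, h_discrete(alphas, xs))    # -> 0.5 = int_0^1 x dx
```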

4.10.3 Case Study: Bisimulation

Bisimilarity is an important notion in the theory of concurrent systems, introduced originally by Milner for transition systems; see Section 2.6.1 for a general discussion. We will show in this section that the methods developed so far may be used in the investigation of bisimilarity for stochastic systems. We will first show that the category of stochastic relations has semi-pullbacks and use this information for a construction of bisimulations for these systems.

If we are in a general category K, then the semi-pullback for two morphisms f : a → c and g : b → c with common range c consists of an object x and of morphisms p_a : x → a and p_b : x → b such that f ∘ p_a = g ∘ p_b, i.e., such that the square formed by p_a, p_b, f and g commutes in K.

We want to show that semi-pullbacks exist for stochastic relations over Polish spaces. This requires some preparations, provided through selection arguments.

The next statement appears to be interesting in its own right; it shows that a measurable selection for weakly continuous stochastic relations exists.

Proposition 4.10.17 Let X_i, Y_i be Polish spaces and K_i : X_i ⇝ Y_i be a weakly continuous stochastic relation, i = 1, 2. Let A ⊆ X₁ × X₂ and B ⊆ Y₁ × Y₂ be closed subsets of the respective Cartesian products with projections equal to the base spaces, and assume that for ⟨x₁, x₂⟩ ∈ A the set
$$\Gamma(x_1, x_2) := \{\mu \in S(B) \mid S(\beta_i)(\mu) = K_i(x_i),\ i = 1, 2\}$$
is not empty, β_i : B → Y_i denoting the projections. Then there exists a stochastic relation M : A ⇝ B such that M(x₁, x₂) ∈ Γ(x₁, x₂) for all ⟨x₁, x₂⟩ ∈ A.

Proof 0. The proof proceeds as follows. First, Y_i is embedded into its Alexandrov compactification $\overline{Y_i}$ with the aim of obtaining a selector from the Kuratowski and Ryll-Nardzewski Selection Theorem. But we have to make sure that the set-valued map remains concentrated on the originally given space. This can be established; we obtain a selection for the embedding and adjust the selection accordingly.

1. Let $\overline{Y_i}$ for i = 1, 2 be the Alexandrov compactification of Y_i and $\overline{B}$ the closure of B in $\overline{Y_1}\times\overline{Y_2}$. Then $\overline{B}$ is compact and contains the embedding of B into $\overline{Y_1}\times\overline{Y_2}$, which we identify with B, as a Borel subset. This is so since Y_i is a Borel subset of its compactification. The projections $\overline{\beta_i} : \overline{B} \to \overline{Y_i}$ are the continuous extensions of the projections β_i : B → Y_i.

2. The map $r_i : S(Y_i) \to S(\overline{Y_i})$ with $r_i(\mu)(G) := \mu(G\cap Y_i)$ for $G \in \mathcal B(\overline{Y_i})$ is continuous; in fact, it is an isometry with respect to the respective Hutchinson metrics, once we have fixed metrics for the underlying spaces. Define for ⟨x₁, x₂⟩ ∈ A the set
$$\Gamma_0(x_1, x_2) := \{\mu \in S(\overline{B}) \mid S(\overline{\beta_i})(\mu) = (r_i \circ K_i)(x_i),\ i = 1, 2\}.$$

Thus Γ₀ maps A to the nonempty closed subsets of $S(\overline{B})$, since $S(\overline{\beta_i})$ and $r_i \circ K_i$ are continuous for i = 1, 2. If μ ∈ Γ₀(x₁, x₂), then
$$\begin{aligned}
\mu(\overline{B}\setminus B) &\le \mu\bigl(\overline{B}\cap\bigl(((\overline{Y_1}\setminus Y_1)\times\overline{Y_2})\cup(\overline{Y_1}\times(\overline{Y_2}\setminus Y_2))\bigr)\bigr)\\
&\le S(\overline{\beta_1})(\mu)(\overline{Y_1}\setminus Y_1) + S(\overline{\beta_2})(\mu)(\overline{Y_2}\setminus Y_2)\\
&= (r_1\circ K_1)(x_1)(\overline{Y_1}\setminus Y_1) + (r_2\circ K_2)(x_2)(\overline{Y_2}\setminus Y_2)\\
&= 0.
\end{aligned}$$
Hence all members of Γ₀(x₁, x₂) are concentrated on B.

3. Let $C \subseteq S(\overline{B})$ be compact, and assume that (t_n)_{n∈ℕ} is a converging sequence in A with $t_n \in \Gamma_0^w(C)$ for all n ∈ ℕ such that t_n → t₀ ∈ A. Then there exists some μ_n ∈ C ∩ Γ₀(t_n) for each n ∈ ℕ. Since C is compact, there exists a convergent subsequence, which we assume to be the sequence itself, so μ_n → μ for some μ ∈ C in the topology of weak convergence. Continuity of $S(\overline{\beta_i})$ and of K_i for i = 1, 2 implies μ ∈ Γ₀(t₀). Consequently, $\Gamma_0^w(C)$ is a closed subset of A.

4. Since $S(\overline{B})$ is compact, we may represent each open set G as a countable union of compact sets (C_n)_{n∈ℕ}, so that
$$\Gamma_0^w(G) = \bigcup_{n\in\mathbb N}\Gamma_0^w(C_n),$$
hence $\Gamma_0^w(G)$ is a Borel set in A. The Kuratowski and Ryll-Nardzewski Selection Theorem 4.7.2 together with Lemma 4.1.10 gives us a stochastic relation $M_0 : A ⇝ \overline{B}$ with M₀(x₁, x₂) ∈ Γ₀(x₁, x₂) for all ⟨x₁, x₂⟩ ∈ A. Define M(x₁, x₂) as the restriction of M₀(x₁, x₂) to the Borel sets of B; then M : A ⇝ B is the desired relation, because $M_0(x_1, x_2)(\overline{B}\setminus B) = 0$. □

For the construction we are about to undertake we will put to work the selection machinery just developed; this requires us to show that the set from which we want to select is non-empty. The following technical argument will be of assistance.

Assume that we have Polish spaces X₁, X₂ and a separable measure space $(Z, \mathcal C)$ with surjective and measurable maps f_i : X_i → Z for i = 1, 2. We also have subprobability measures μ_i ∈ S(X_i). Since $(Z, \mathcal C)$ is separable, we may assume that $\mathcal C$ constitutes the Borel sets for some separable metric space (Z, d), see Proposition 4.3.10. Proposition 4.3.31 then tells us that we may assume that f₁ and f₂ are continuous. Now define
$$S := \{\langle x_1, x_2\rangle \in X_1\times X_2 \mid f_1(x_1) = f_2(x_2)\},\qquad \mathcal A := S \cap (f_1\times f_2)^{-1}\bigl[\mathcal C\otimes\mathcal C\bigr].$$

Since Δ_Z := {⟨z, z⟩ | z ∈ Z} is a closed subset of Z × Z, and since f₁ and f₂ are continuous, S = (f₁ × f₂)⁻¹[Δ_Z] is a closed subset of the Polish space X₁ × X₂, hence a Polish space itself by Lemma 4.3.21. Now assume that we have a finite measure ϑ on $\mathcal A$ such that S(π_i)(ϑ)(E_i) = μ_i(E_i) for all $E_i \in f_i^{-1}[\mathcal C]$, i = 1, 2, with π₁ : S → X₁ and π₂ : S → X₂ as the projections.
Now $\mathcal A \subseteq \mathcal B(S)$ is usually not the σ-algebra of Borel sets for some Polish topology on S, which, however, will be needed. Here Lubin's construction steps in.

Lemma 4.10.18 In the notation above, there exists a measure ϑ⁺ on the Borel sets of S extending ϑ such that S(π_i)(ϑ⁺)(E_i) = μ_i(E_i) holds for all $E_i \in \mathcal B(X_i)$.


Proof Because $\mathcal C$ is countably generated, $\mathcal C\otimes\mathcal C$ is as well, so $\mathcal A$ is a countably generated σ-algebra. By Lubin's Theorem 4.6.12 there exists an extension ϑ⁺ of ϑ. □

So much for the technical preparations; we will now turn to bisimulations. A bisimulation relates two transition systems which are connected through a mediating system. In order to define this, we need morphisms. In the case of stochastic systems, recall from Example 2.1.14 that a morphism m = (f, g) : K₁ → K₂ for stochastic relations $K_i : (X_i, \mathcal A_i) ⇝ (Y_i, \mathcal B_i)$ (i = 1, 2) over general measurable spaces is given through measurable maps f : X₁ → X₂ and g : Y₁ → Y₂ such that the corresponding diagram of measurable maps commutes, i.e., K₂ ∘ f = S(g) ∘ K₁. Equivalently, K₂(f(x₁)) = S(g)(K₁(x₁)), which translates to K₂(f(x₁))(B) = K₁(x₁)(g⁻¹[B]) for all $B \in \mathcal B_2$.
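For finite state spaces the morphism condition K₂(f(x₁))(B) = K₁(x₁)(g⁻¹[B]) can be checked mechanically. In the sketch below all kernels and maps are invented for the illustration; kernels are (sub)stochastic matrices with K[x, y] = K(x)({y}).

```python
import numpy as np

f = np.array([0, 0, 1])           # f : X1 -> X2 collapses states 0 and 1
g = np.array([0, 1])              # g : Y1 -> Y2 (the identity here)

K1 = np.array([[0.3, 0.7],
               [0.3, 0.7],
               [0.9, 0.1]])       # K1 : X1 ~> Y1, constant on f-classes

K2 = np.array([[0.3, 0.7],
               [0.9, 0.1]])       # K2 : X2 ~> Y2

# S(g)(K1(x)) pushes the mass of K1(x) forward along g.
push = np.zeros_like(K1)
for y1, y2 in enumerate(g):
    push[:, y2] += K1[:, y1]

# The diagram commutes: K2(f(x)) equals S(g)(K1(x)) for every state x.
print(np.allclose(K2[f], push))   # True
```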

Definition 4.10.19 The stochastic relations $K_i : (X_i, \mathcal A_i) ⇝ (Y_i, \mathcal B_i)$ (i = 1, 2) are called bisimilar iff there exist a stochastic relation $M : (A, \mathcal X) ⇝ (B, \mathcal Y)$ and surjective morphisms m_i = (f_i, g_i) : M → K_i such that the σ-algebra $g_1^{-1}[\mathcal B_1] \cap g_2^{-1}[\mathcal B_2]$ is nontrivial, i.e., contains not only ∅ and B. The relation M is called mediating.

The first condition on bisimilarity is in accordance with the general definition of bisimilarity of coalgebras; it requests that m₁ and m₂ form a span of morphisms $K_1 \xleftarrow{m_1} M \xrightarrow{m_2} K_2$.

Hence the corresponding diagram of measurable maps is supposed to commute with m_i = (f_i, g_i) for i = 1, 2, i.e., K_i ∘ f_i = S(g_i) ∘ M. Thus, for each a ∈ A, $D \in \mathcal B_1$, $E \in \mathcal B_2$ the equalities
$$K_1(f_1(a))(D) = (S(g_1)\circ M)(a)(D) = M(a)(g_1^{-1}[D]),$$
$$K_2(f_2(a))(E) = (S(g_2)\circ M)(a)(E) = M(a)(g_2^{-1}[E])$$

should be satisfied. The second condition, however, is special; it states that we can find an event $C^* \in \mathcal Y$ which is common to both K₁ and K₂ in the sense that
$$g_1^{-1}[B_1] = C^* = g_2^{-1}[B_2]$$
for some $B_1 \in \mathcal B_1$ and $B_2 \in \mathcal B_2$ such that both C* ≠ ∅ and C* ≠ B hold (note that for C* = ∅ or C* = B we can always take the empty and the full set, respectively). Given such a C* with B₁, B₂ from above we get for each a ∈ A
$$K_1(f_1(a))(B_1) = M(a)(g_1^{-1}[B_1]) = M(a)(C^*) = M(a)(g_2^{-1}[B_2]) = K_2(f_2(a))(B_2);$$


thus the event C* ties K₁ and K₂ together. Loosely speaking, $g_1^{-1}[\mathcal B_1] \cap g_2^{-1}[\mathcal B_2]$ can be described as the σ-algebra of common events, which is required to be nontrivial.

Note that without the second condition two relations K₁ and K₂ which are strictly probabilistic (i.e., for which the entire space is always assigned probability 1) would always be bisimilar: put A := X₁ × X₂, B := Y₁ × Y₂ and set for ⟨x₁, x₂⟩ ∈ A as the mediating relation M(x₁, x₂) := K₁(x₁) ⊗ K₂(x₂); that is, define M pointwise to be the product measure of K₁ and K₂. Then the projections will make the diagram commutative. But although this notion of bisimilarity is suggested sometimes, it is way too weak, because bisimulations relate transition systems, and it does not promise particularly interesting insights when two arbitrary systems can be related. It is also clear that using products for mediation does not work in the subprobabilistic case.

We will show now that we can construct a bisimulation for stochastic relations which are linked through a co-span $K_1 \to K \leftarrow K_2$. The center K of this co-span should be defined over second countable metric spaces, K₁ and K₂ over Polish spaces. This situation is sometimes easy to obtain, e.g., when factoring Kripke models over Polish spaces through a suitable logic; then K is defined over analytic spaces, which are separable metric. This is described in greater detail in Example 4.10.21.

Proposition 4.10.20 Let K_i : X_i ⇝ Y_i be stochastic relations over Polish spaces, and assume that K : X ⇝ Y is a stochastic relation, where X, Y are second countable metric spaces. Assume that we have a co-span of morphisms m_i : K_i → K, i = 1, 2. Then there exist a stochastic relation M and morphisms $m_i^+ : M \to K_i$, i = 1, 2, rendering the square commutative:
$$m_1 \circ m_1^+ = m_2 \circ m_2^+.$$
The stochastic relation M is defined over Polish spaces.

Proof 1. Assume K_i = (X_i, Y_i, K_i) with m_i = (f_i, g_i), i = 1, 2. Because of Proposition 4.3.31 we may assume that the respective σ-algebras on X₁ and X₂ are obtained from Polish topologies which render f₁ and K₁ as well as f₂ and K₂ continuous. These topologies are fixed for the proof. Put
$$A := \{\langle x_1, x_2\rangle \in X_1\times X_2 \mid f_1(x_1) = f_2(x_2)\},\qquad B := \{\langle y_1, y_2\rangle \in Y_1\times Y_2 \mid g_1(y_1) = g_2(y_2)\};$$
then both A and B are closed, hence Polish. Let α_i : A → X_i and β_i : B → Y_i be the projections, i = 1, 2. The diagrams stating that m₁ and m₂ are morphisms are commutative by assumption, thus we know that for x_i ∈ X_i
$$K(f_1(x_1)) = S(g_1)(K_1(x_1)) \quad\text{and}\quad K(f_2(x_2)) = S(g_2)(K_2(x_2))$$


holds. The construction implies that (g₁ ∘ β₁)(y₁, y₂) = (g₂ ∘ β₂)(y₁, y₂) is true for ⟨y₁, y₂⟩ ∈ B, and g₁ ∘ β₁ : B → Y is surjective.

2. Fix ⟨x₁, x₂⟩ ∈ A. Separability of the target spaces now enters: we know that the image of a surjective map under S is onto again by Proposition 4.6.11, so that there exists μ₀ ∈ S(B) with S(g₁ ∘ β₁)(μ₀) = K(f₁(x₁)); consequently, S(g_i ∘ β_i)(μ₀) = S(g_i)(K_i(x_i)) (i = 1, 2). But this means for i = 1, 2
$$\forall E_i \in g_i^{-1}[\mathcal B(Y)] : S(\beta_i)(\mu_0)(E_i) = K_i(x_i)(E_i).$$
Put
$$\Gamma(x_1, x_2) := \{\mu \in S(B) \mid S(\beta_1)(\mu) = K_1(x_1) \text{ and } S(\beta_2)(\mu) = K_2(x_2)\};$$
then Lemma 4.10.18 shows that Γ(x₁, x₂) ≠ ∅.

3. The set
$$\Gamma^w(C) = \{\langle x_1, x_2\rangle \in A \mid \Gamma(x_1, x_2)\cap C \ne \emptyset\}$$
is closed in A for compact C ⊆ S(B). This is shown exactly as in the corresponding part of the proof for Proposition 4.10.17, from which we now infer that there exists a measurable map M : A → S(B) such that M(x₁, x₂) ∈ Γ(x₁, x₂) holds for every ⟨x₁, x₂⟩ ∈ A. Thus M : A ⇝ B is a stochastic relation with
$$K_1\circ\alpha_1 = S(\beta_1)\circ M \quad\text{and}\quad K_2\circ\alpha_2 = S(\beta_2)\circ M.$$
Thus M with $m_1^+ := (\alpha_1, \beta_1)$ and $m_2^+ := (\alpha_2, \beta_2)$ is the desired semi-pullback. □

Now we know that we may construct a span from a co-span of stochastic relations. Let us have a look at a typical situation in which such a co-span may occur.

Example 4.10.21 Consider the modal logic from Example 4.1.11 again, and interpret the logic through stochastic relations K : S ⇝ S and L : T ⇝ T over the Polish spaces S and T. The equivalence relations ∼_K and ∼_L are defined as in Example 4.4.19. Because we have only countably many formulas, these relations are smooth. For readability, denote the equivalence class associated with ∼_K by [·]_K, similarly for [·]_L. Because ∼_K and ∼_L are smooth, the factor spaces S/K resp. T/L are analytic spaces when equipped with the final σ-algebra with respect to η_K resp. η_L by Proposition 4.4.22. The factor relation K_F : S/K ⇝ S/K is then the unique relation which renders the factor diagram commutative, i.e., $K_F \circ \eta_K = S(\eta_K) \circ K$. This translates to $K(s)(\eta_K^{-1}[B]) = K_F([s]_K)(B)$ for all B ∈ B(S/K) and all s ∈ S.

Associate with each formula ϕ its validity sets [[ϕ]]_K resp. [[ϕ]]_L, and call s ∈ S logically equivalent to t ∈ T iff we have for each formula ϕ
$$s \in [[\varphi]]_K \Leftrightarrow t \in [[\varphi]]_L.$$


Hence s and t are logically equivalent iff no formula is able to distinguish between the states s and t; call the stochastic relations K and L logically equivalent iff given s ∈ S there exists t ∈ T such that s and t are logically equivalent, and vice versa.

Now assume that K and L are logically equivalent, and consider
$$\Phi := \bigl\{\langle [s]_K, [t]_L\rangle \bigm| s \in S \text{ and } t \in T \text{ are logically equivalent}\bigr\}.$$

Then Φ is the graph of a bijective map; this is easy to see. Denote the map by Φ as well. Since $\Phi^{-1}\bigl[\eta_L[[[\varphi]]_L]\bigr] = \eta_K[[[\varphi]]_K]$, and since the set $\{\eta_L[[[\varphi]]_L] \mid \varphi \text{ is a formula}\}$ generates B(T/L) by Proposition 4.4.26, Φ : S/K → T/L is Borel measurable; interchanging the roles of K and L yields measurability of Φ⁻¹.

Hence we have for logically equivalent K and L the co-span
$$K \xrightarrow{\Phi\circ\eta_K} L_F \xleftarrow{\eta_L} L.$$

This example can be generalized to the case that the relations operate on two spaces rather than only on one. Let K : X ⇝ Y be a transition kernel over the Polish spaces X and Y. Then the pair (κ, λ) of smooth equivalence relations κ on X and λ on Y is called a congruence for K iff there exists a transition kernel $K_{\kappa,\lambda} : X/\kappa ⇝ Y/\lambda$ rendering the diagram commutative, i.e.,
$$K_{\kappa,\lambda}\circ\eta_\kappa = S(\eta_\lambda)\circ K.$$

Because η_κ is an epimorphism, K_{κ,λ} is uniquely determined if it exists (for a discussion of congruences for stochastic coalgebras, see Section 2.6.2). Commutativity of the diagram translates to
$$K(x)(\eta_\lambda^{-1}[B]) = K_{\kappa,\lambda}([x]_\kappa)(B)$$
for all x ∈ X and all B ∈ B(Y/λ). Call in analogy to Example 4.10.21 the transition kernels K₁ : X₁ ⇝ Y₁ and K₂ : X₂ ⇝ Y₂ logically equivalent iff there exist congruences (κ₁, λ₁) for K₁ and (κ₂, λ₂) for K₂ such that the factor relations $K_{\kappa_1,\lambda_1}$ and $K_{\kappa_2,\lambda_2}$ are isomorphic.

In the spirit of this discussion, we obtain from Proposition 4.10.20

Theorem 4.10.22 Logically equivalent stochastic relations over Polish spaces are bisimilar.

Proof 1. The proof applies Proposition 4.10.20; first we show how to satisfy the assumptions of that statement. Let K_i : X_i ⇝ Y_i be stochastic relations over Polish spaces for i = 1, 2. We assume that K₁ is logically equivalent to K₂; hence there exist congruences (κ_i, λ_i) for K_i such that the associated stochastic relations $K_{\kappa_i,\lambda_i} : X_i/\kappa_i ⇝ Y_i/\lambda_i$ are isomorphic.


Denote this isomorphism by (ϕ, ψ), so ϕ : X₁/κ₁ → X₂/κ₂ and ψ : Y₁/λ₁ → Y₂/λ₂ are in particular measurable bijections, and so are their inverses.

2. Let η₂ := (η_{κ₂}, η_{λ₂}) be the factor morphism η₂ : K₂ → K_{κ₂,λ₂}, and put η₁ := (ϕ ∘ η_{κ₁}, ψ ∘ η_{λ₁}); thus we obtain the co-span of morphisms
$$K_1 \xrightarrow{\eta_1} K_{\kappa_2,\lambda_2} \xleftarrow{\eta_2} K_2.$$

Because both X₂/κ₂ and Y₂/λ₂ are analytic spaces on account of κ₂ and λ₂ being smooth (see Proposition 4.4.22), we apply Proposition 4.10.20 and obtain a mediating relation M : A ⇝ B with Polish A and B such that the projections α_i : A → X_i and β_i : B → Y_i are morphisms for i = 1, 2. Here
$$A := \{\langle x_1, x_2\rangle \mid \varphi([x_1]_{\kappa_1}) = [x_2]_{\kappa_2}\},\qquad B := \{\langle y_1, y_2\rangle \mid \psi([y_1]_{\lambda_1}) = [y_2]_{\lambda_2}\}.$$

It remains to be demonstrated that the σ-algebra of common events, viz., the intersection $\beta_1^{-1}[\mathcal B(Y_1)] \cap \beta_2^{-1}[\mathcal B(Y_2)]$, is not trivial.

3. Let U₂ ∈ B(Y₂) be λ₂-invariant. Then $\eta_{\lambda_2}[U_2] \in \mathcal B(Y_2/\lambda_2)$, because $U_2 = \eta_{\lambda_2}^{-1}\bigl[\eta_{\lambda_2}[U_2]\bigr]$ on account of U₂ being λ₂-invariant. Thus $U_1 := \eta_{\lambda_1}^{-1}\bigl[\psi^{-1}\bigl[\eta_{\lambda_2}[U_2]\bigr]\bigr]$ is a λ₁-invariant Borel set in Y₁ with
$$\langle y_1, y_2\rangle \in (Y_1\times U_2)\cap B \Leftrightarrow y_2 \in U_2 \text{ and } \psi([y_1]_{\lambda_1}) = [y_2]_{\lambda_2} \Leftrightarrow \langle y_1, y_2\rangle \in (U_1\times U_2)\cap B.$$
One shows in exactly the same way
$$\langle y_1, y_2\rangle \in (U_1\times Y_2)\cap B \Leftrightarrow \langle y_1, y_2\rangle \in (U_1\times U_2)\cap B.$$

Consequently, (U₁ × U₂) ∩ B belongs to both $\beta_1^{-1}[\mathcal B(Y_1)]$ and $\beta_2^{-1}[\mathcal B(Y_2)]$, so that this intersection is not trivial. □

Call a class A of spaces closed under bisimulations if the mediating relation for stochastic relations over spaces from A is again defined over spaces from A. Then the result above shows that Polish spaces are closed under bisimulations. This generalizes a result by Desharnais, Edalat and Panangaden [Eda99, DEP02] which demonstrates, through a completely different approach, that analytic spaces are closed under bisimulations; Sanchez Terraf [ST11] has shown that general measurable spaces are not closed under bisimulations. In view of von Neumann's Selection Theorem 4.6.10 it might be interesting to see whether complete measurable spaces are closed.

We present finally a situation in which no semi-pullback exists. A first example in this direction was presented in [ST11, Theorem 12]. It is based on the extension of Lebesgue measure to a σ-algebra which contains the Borel sets of [0,1] augmented by a non-measurable set, and it shows that one can construct Markov transition systems which do not have a semi-pullback. The example below extends this by showing that one does not have to consider transition systems, but that a look at the measures on which they are based suffices.


Example 4.10.23 A morphism f : (X, A, μ) → (Y, B, ν) of measure spaces is an A-B-measurable map f : X → Y such that ν = M(f)(μ). Since each finite measure can be viewed as a transition kernel, this is a special case of morphisms for transition kernels. If B is a sub-σ-algebra of A with μ an extension of ν, then the identity is a morphism (X, A, μ) → (X, B, ν).
Denote Lebesgue measure on ([0,1], B([0,1])) by λ. Assuming the Axiom of Choice, we know that there exists W ⊆ [0,1] with inner measure λ_*(W) = 0 and outer measure λ*(W) = 1 by Lemma 1.7.7. Denote by A_W := σ(B([0,1]) ∪ {W}) the smallest σ-algebra containing the Borel sets of [0,1] and W. Then we know from Exercise 4.6 that we can find for each α ∈ [0,1] a measure μ_α on A_W which extends λ such that μ_α(W) = α.

Hence by the remark just made, the identity yields a morphism f_α : ([0,1], A_W, μ_α) → ([0,1], B([0,1]), λ). Now let α ≠ β; then
$$([0,1], \mathcal A_W, \mu_\alpha) \xrightarrow{f_\alpha} ([0,1], \mathcal B([0,1]), \lambda) \xleftarrow{f_\beta} ([0,1], \mathcal A_W, \mu_\beta)$$
is a co-span of morphisms.

We claim that this co-span does not have a semi-pullback. In fact, assume that (P, P, ρ) with morphisms π_α and π_β is a semi-pullback; then f_α ∘ π_α = f_β ∘ π_β, so that π_α = π_β, and $\pi_\alpha^{-1}[W] = \pi_\beta^{-1}[W] \in \mathcal P$. But then
$$\alpha = \mu_\alpha(W) = \rho(\pi_\alpha^{-1}[W]) = \rho(\pi_\beta^{-1}[W]) = \mu_\beta(W) = \beta.$$
This contradicts the assumption that α ≠ β. □

This example shows that the topological assumptions imposed above are indeed necessary. It assumes the Axiom of Choice, so one might ask what happens if this axiom is replaced by the Axiom of Determinacy. We know that the latter implies that each subset of the unit interval is λ-measurable by Theorem 1.7.14, so λ_*(W) = λ*(W) holds for each W ⊆ [0,1]. Then at least the construction above does not work (on the other hand, we made free use of Tihonov's Theorem, which is known to be equivalent to the Axiom of Choice [Her06, Theorem 4.68], so there is probably no escape from the Axiom of Choice).

4.10.4 Case Study: Quotients for Stochastic Relations

As Monty Python used to say, “And now for something completely different!” We will deal now with quotients for stochastic relations, perceived as morphisms in the Kleisli category over the monad which is given by the subprobability functor. We will first have a look at surjective maps as epimorphisms in the category of sets, explaining the problem there, show that a straightforward approach gleaned from the category of sets does not appear promising, and show then that measurable selections are the appropriate tool for tackling the problem. The example demonstrates also that constructions which are fairly straightforward for sets may become somewhat involved in the category of stochastic relations.

For motivation, we start with surjective maps on a fixed set M, serving as a domain. Let f : M → X and g : M → Y be onto, and define the partial order f ≤ g iff f = ζ ∘ g for some ζ : Y → X. Clearly, ≤ is reflexive and transitive; the equivalence relation ∼ defined through f ∼ g iff f ≤ g and g ≤ f is of interest here. Thus f = ζ ∘ g and g = ξ ∘ f for suitable ζ : Y → X and ξ : X → Y. Because surjective maps are epimorphisms in the category of sets with maps as morphisms, we obtain ζ ∘ ξ = id_X and ξ ∘ ζ = id_Y. Hence ζ and ξ are bijections. The surjections f and g, both with domain M, are equivalent iff there exists a bijection β with f = β ∘ g. This is called a quotient object for M. We know that the surjection f : M → X can be factored as $f = \bar f \circ \eta_{\ker(f)}$ with $\bar f : [x]_{\ker(f)} \mapsto f(x)$ as the bijection, see Proposition 2.1.26. Thus for maps, the quotient objects for M may be identified through the quotient maps η_{ker(f)}; in a similar way, the quotient objects in the category of groups can be identified through normal subgroups; see [ML97, V.7] for a discussion. Quotients seem to be interesting algebraically.

We turn to stochastic relations. The subprobability functor on the category of measurable spaces is the functorial part of the Giry monad, and the stochastic relations are just the Kleisli morphisms for this monad; see the discussion in Example 2.4.8 on page 126. Let $K : (X, \mathcal A) ⇝ (Y, \mathcal B)$ be a stochastic relation; then Exercise 4.14 shows that
$$\overline K(\mu) : B \mapsto \int_X K(x)(B)\,d\mu(x)$$
defines a $\wp(X, \mathcal A)$-$\wp(Y, \mathcal B)$-measurable map $S(X, \mathcal A) \to S(Y, \mathcal B)$; $\overline K$ is the Kleisli map associated with the Kleisli morphism K (it should not be confused with the completion of K as discussed in Section 4.6.2). It is clear that $K \mapsto \overline K$ is injective, because $\overline K(\delta_x) = K(x)$.

It will be helpful to evaluate the integral with respect to $\overline K(\mu)$: let g : Y → ℝ be bounded and measurable; then
$$\int_Y g\,d\overline K(\mu) = \int_X\int_Y g(y)\,dK(x)(y)\,d\mu(x). \tag{4.20}$$

In order to show this, assume first that g = χ_B for $B \in \mathcal B$; then both sides evaluate to $\overline K(\mu)(B)$, so the representation is valid for indicator functions. Linearity of the integral yields the representation for step functions. Since we may find for general g a sequence (g_n)_{n∈ℕ} of step functions with lim_{n→∞} g_n(y) = g(y) for all y ∈ Y, and since g is bounded, hence integrable with respect to all finite measures, we obtain from Lebesgue's Dominated Convergence Theorem 4.8.6 that
$$\begin{aligned}
\int_Y g\,d\overline K(\mu) &= \lim_{n\to\infty}\int_Y g_n\,d\overline K(\mu)\\
&= \lim_{n\to\infty}\int_X\int_Y g_n(y)\,dK(x)(y)\,d\mu(x)\\
&= \int_X\lim_{n\to\infty}\int_Y g_n(y)\,dK(x)(y)\,d\mu(x)\\
&= \int_X\int_Y g(y)\,dK(x)(y)\,d\mu(x).
\end{aligned}$$
This gives the desired representation.

The Kleisli map is related to the convolution operation defined in Example 4.9.6:

Lemma 4.10.24 Let $K : (X, \mathcal A) ⇝ (Y, \mathcal B)$ and $L : (Y, \mathcal B) ⇝ (Z, \mathcal C)$; then $\overline{L\ast K} = \overline L \circ \overline K$.


Proof Evaluate both the left and the right hand side for $\mu \in S(X, \mathcal A)$ and $C \in \mathcal C$:
$$\begin{aligned}
\overline{L\ast K}(\mu)(C) &= \int_X\int_Y L(y)(C)\,dK(x)(y)\,d\mu(x)\\
&= \int_Y L(y)(C)\,d\overline K(\mu)(y) \qquad\text{by (4.20)}\\
&= \overline L\bigl(\overline K(\mu)\bigr)(C).
\end{aligned}$$
This implies the desired equality. □
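For finite spaces the Kleisli machinery collapses to matrix algebra, which makes Lemma 4.10.24 easy to test: a kernel becomes a (sub)stochastic matrix, the Kleisli map acts on row vectors as μ ↦ μK, and the convolution L ∗ K is the matrix product. The numbers below are illustrative only.

```python
import numpy as np

K = np.array([[0.5, 0.5],
              [0.2, 0.3]])        # K : X ~> Y, substochastic rows
L = np.array([[1.0, 0.0],
              [0.4, 0.6]])        # L : Y ~> Z

LK = K @ L                        # (L * K)[x, z] = sum_y K[x, y] L[y, z]
mu = np.array([0.3, 0.7])         # a subprobability on X as a row vector

# overline{L * K}(mu) versus overline{L}(overline{K}(mu)):
print(mu @ LK)                    # left hand side of the lemma
print((mu @ K) @ L)               # right hand side: identical
```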

Associate with each measurable f : Y → Z a stochastic relation δ_f : Y ⇝ Z through $\delta_f(y)(C) := \delta_y(f^{-1}[C])$; then $\delta_f(y) = S(f)(\delta_y)$, and a direct computation shows δ_f ∗ K = S(f) ∘ K. In fact,
$$\begin{aligned}
(\delta_f\ast K)(x)(C) &= \int_Y\delta_f(y)(C)\,K(x)(dy)\\
&= \int_Y\chi_{f^{-1}[C]}(y)\,K(x)(dy)\\
&= K(x)(f^{-1}[C])\\
&= (S(f)\circ K)(x)(C).
\end{aligned}$$
On the other hand, if f : W → X is measurable, then
$$(K\ast\delta_f)(w)(B) = \int_X K(x)(B)\,\delta_f(w)(dx) = (K\circ f)(w)(B).$$

In particular, it follows that $e_X := \delta_{\mathrm{id}_X}$ (so that e_X(x) = δ_x) is the neutral element: K = e_X ∗ K = K ∗ e_X. Recall that K is an epimorphism in the Kleisli category iff L₁ ∗ K = L₂ ∗ K implies L₁ = L₂ for any stochastic relations $L_1, L_2 : (Y, \mathcal B) ⇝ (Z, \mathcal C)$. Lemma 4.10.24 tells us that if the Kleisli map $\overline K$ is onto, then K is an epimorphism. Now let $K : (X, \mathcal A) ⇝ (Y, \mathcal B)$ and $L : (X, \mathcal A) ⇝ (Z, \mathcal C)$ be stochastic relations, and assume that both K and L are epis. Define as above
$$K \le L \Leftrightarrow K = J\ast L \text{ for some } J : (Z, \mathcal C) ⇝ (Y, \mathcal B),$$
$$K \approx L \Leftrightarrow K \le L \text{ and } L \le K.$$
Hence in case K ≤ L we can find a stochastic relation J such that
$$K(x)(B) = \int_Z J(z)(B)\,dL(x)(z)$$
for x ∈ X and $B \in \mathcal B$.

We will deal for the rest of this section with Polish spaces; fix X as a Polish space. For identifying the quotients with respect to Kleisli morphisms, one could be tempted to mimic the approach observed for the category of sets as outlined above. This is studied in the next example.

Example 4.10.25 Let K : X ⇝ Y be a stochastic relation with Polish Y which is an epi. X/ker(K) is an analytic space, since K : X → S(Y) is a measurable map into the Polish space S(Y) by Proposition 4.10.10, so that ker(K) is smooth. Define the map $E_K : X \to S(X/\ker(K))$ through $E_K(x) := \delta_{[x]_{\ker(K)}}$; hence we obtain for each x ∈ X and each Borel set G ∈ B(X/ker(K))
$$E_K(x)(G) = \delta_{[x]_{\ker(K)}}(G) = \delta_x(\eta_{\ker(K)}^{-1}[G]) = S(\eta_{\ker(K)})(\delta_x)(G).$$

Thus E_K is an epi as well: take μ ∈ S(X) and G ∈ B(X/ker(K)); then
$$\overline{E_K}(\mu)(G) = \int_X E_K(x)(G)\,d\mu(x) = \int_X\delta_x(\eta_{\ker(K)}^{-1}[G])\,d\mu(x) = \mu(\eta_{\ker(K)}^{-1}[G]) = S(\eta_{\ker(K)})(\mu)(G),$$

so that $\overline{E_K} = S(\eta_{\ker(K)})$; since the image of a surjective map under S is surjective again by Proposition 4.6.11, we conclude that E_K is an epi. Now define for x ∈ X the map
$$K^\sharp([x]_{\ker(K)}) := K(x);$$

then the construction of the final σ-algebra on X/ker(K) shows that $K^\sharp$ is well defined and constitutes a stochastic relation $K^\sharp : X/\ker(K) ⇝ Y$. Moreover, we obtain for x ∈ X, H ∈ B(Y) by the change of variables formula in Corollary 4.8.9
$$\begin{aligned}
(K^\sharp\ast E_K)(x)(H) &= \int_{X/\ker(K)} K^\sharp(t)(H)\,dE_K(x)(t)\\
&= \int_{X/\ker(K)} K^\sharp(t)(H)\,dS(\eta_{\ker(K)})(\delta_x)(t)\\
&= \int_X K^\sharp([w]_{\ker(K)})(H)\,d\delta_x(w)\\
&= \int_X K(w)(H)\,d\delta_x(w)\\
&= K(x)(H).
\end{aligned}$$

Consequently, K can be factored as $K = K^\sharp\ast E_K$ with the epi E_K. But there is no reason why in general $K^\sharp$ should be invertible; for this to hold, the map $\overline{K^\sharp} : S(X/\ker(K)) \to S(Y)$ is required to be injective. Hence K ≈ E_K holds only in special cases. □
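The factorization $K = K^\sharp\ast E_K$ is easy to replay for finite spaces: states with identical rows of K form a ker(K)-class, E_K maps a state to the Dirac measure on its class, and $K^\sharp$ carries a class to the common row. Everything below is an invented illustration.

```python
import numpy as np

K = np.array([[0.2, 0.8],
              [0.2, 0.8],
              [0.5, 0.5]])          # rows 0 and 1 agree: one ker(K)-class

classes = {0: 0, 1: 0, 2: 1}        # eta_ker(K) : X -> X/ker(K)
E_K = np.zeros((3, 2))
for x, c in classes.items():
    E_K[x, c] = 1.0                 # E_K(x) = delta_{[x]}

K_sharp = np.array([K[0], K[2]])    # K#([x]) := K(x) is well defined

print(np.allclose(E_K @ K_sharp, K))   # True: K = K# * E_K
```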

This last example indicates that a characterization of quotients for the Kleisli category, at least for the Giry monad, cannot be derived directly by translating a characterization for the underlying category from the category of sets.

For the rest of the section we discuss the Kleisli category for the Giry monad over Polish spaces, hence we deal with stochastic relations. Let X, Y and Z be Polish, and fix K : X ⇝ Y and L : X ⇝ Z so that K ≈ L. Hence there exists J : Y ⇝ Z with inverse H : Z ⇝ Y and L = J ∗ K and K = H ∗ L. Because both K and L are epis, we obtain these simultaneous equations:
$$H\ast J = e_Y \quad\text{and}\quad J\ast H = e_Z.$$

They entail
$$\int_Z H(z)(B)\,dJ(y)(z) = \delta_y(B) \quad\text{and}\quad \int_Y J(y)(C)\,dH(z)(y) = \delta_z(C)$$
for all y ∈ Y, z ∈ Z and B ∈ B(Y), C ∈ B(Z). Because singletons are Borel sets, these equalities imply
$$\int_Z H(z)(\{y\})\,dJ(y)(z) = 1 \quad\text{and}\quad \int_Y J(y)(\{z\})\,dH(z)(y) = 1.$$
Consequently, we obtain
$$\forall y \in Y : J(y)(\{z \in Z \mid H(z)(\{y\}) = 1\}) = 1,$$
$$\forall z \in Z : H(z)(\{y \in Y \mid J(y)(\{z\}) = 1\}) = 1.$$

Proposition 4.10.26 There exist Borel maps f : Y → Z and g : Z → Y such that H(f(y))({y}) = 1 and J(g(z))({z}) = 1 for all y ∈ Y, z ∈ Z.

Proof 1. Define P := {⟨y, z⟩ ∈ Y × Z | H(z)({y}) = 1} and Q := {⟨z, y⟩ ∈ Z × Y | J(y)({z}) = 1}; then P and Q are Borel sets. We establish this for P; the argumentation for Q is very similar.

2. With a view towards Proposition 4.3.31 we may and do assume that H : Z → S(Y) is continuous. Let (⟨y_n, z_n⟩)_{n∈ℕ} be a sequence in P with ⟨y_n, z_n⟩ → ⟨y, z⟩; hence the sequence (H(z_n))_{n∈ℕ} converges weakly to H(z). Given m ∈ ℕ there exists n₀ ∈ ℕ such that y_n ∈ V_{1/m}(y) for all n ≥ n₀, where V_{1/m}(y) is the closed ball of radius 1/m around y. Since H is weakly continuous, we obtain
$$\limsup_{n\to\infty} H(z_n)\bigl(V_{1/m}(y)\bigr) \le H(z)\bigl(V_{1/m}(y)\bigr)$$
from Proposition 4.1.35, hence
$$H(z)\bigl(V_{1/m}(y)\bigr) = 1.$$
Because
$$\bigcap_{m\in\mathbb N} V_{1/m}(y) = \{y\},$$
we conclude H(z)({y}) = 1, thus ⟨y, z⟩ ∈ P. Consequently, P is a closed subset of Y × Z, hence a Borel set.

3. Since P is closed, the cut P_y at y is closed as well, and we have
$$J(y)(P_y) = J(y)(\{z \in Z \mid H(z)(\{y\}) = 1\}) = 1,$$
thus we obtain supp(J(y)) ⊆ P_y, because the support supp(J(y)) is the smallest closed set C with J(y)(C) = 1. Since y ↦ supp(J(y)) is measurable, as we have seen in Example 4.7.5, we obtain from Theorem 4.7.2 a measurable map f : Y → Z with f(y) ∈ supp(J(y)) ⊆ P_y for all y ∈ Y, thus H(f(y))({y}) = 1 for all y ∈ Y.
4. In the same way we obtain measurable g : Z → Y with the desired properties. □

Discussing the maps f, g obtained above from H and J, we see that
$$H\circ f = e_Y \quad\text{and}\quad J\circ g = e_Z,$$


and we calculate through the change of variables formula in Corollary 4.8.9 for each z₀ ∈ Z and each G ∈ B(Y)
$$\begin{aligned}
\bigl(H\ast(S(f)\circ H)\bigr)(z_0)(G) &= \int_Z H(z)(G)\,(S(f)\circ H)(z_0)(dz)\\
&= \int_Y H(f(y))(G)\,H(z_0)(dy)\\
&= \int_Y\delta_y(G)\,H(z_0)(dy)\\
&= H(z_0)(G).
\end{aligned}$$

Thus H ∗ (S(f) ∘ H) = H, and because H is a mono, we infer that S(f) ∘ H = e_Z. Since
$$S(f)\circ H = (e_Z\circ f)\ast H = J\ast H,$$
we infer on account of H being an epi that J = e_Z ∘ f. Similarly we see that H = e_Y ∘ g.

Lemma 4.10.27 Given stochastic relations J : Y ⇝ Z and H : Z ⇝ Y with H ∗ J = e_Y and J ∗ H = e_Z, there exist Borel isomorphisms f : Y → Z and g : Z → Y with J = e_Z ∘ f and H = e_Y ∘ g.

Proof We infer for y ∈ Y from
$$\begin{aligned}
\delta_y(G) &= e_Y(y)(G)\\
&= (H\ast J)(y)(G)\\
&= \int_Z H(z)(G)\,dJ(y)(z)\\
&= \delta_{f(y)}(g^{-1}[G])\\
&= \delta_y\bigl(f^{-1}[g^{-1}[G]]\bigr)
\end{aligned}$$
for all Borel sets G ∈ B(Y) that g ∘ f = id_Y; similarly, f ∘ g = id_Z is inferred. Hence the Borel maps f and g are bijections, thus Borel isomorphisms. □

This yields a characterization of the quotient equivalence relation in the Kleisli category for the Giry monad.

Proposition 4.10.28 Assume the stochastic relations K : X ⇝ Y and L : X ⇝ Z are both epimorphisms with respect to Kleisli composition; then these conditions are equivalent:
1. K ≈ L.
2. L = S(f) ∘ K for a Borel isomorphism f : Y → Z.

Proof 1 ⇒ 2: Because K ≈ L, there exists an invertible J : Y ⇝ Z with inverse H : Z ⇝ Y and L = J ∗ K. We infer from Lemma 4.10.27 the existence of a Borel isomorphism f : Y → Z such that J = e_Z ∘ f. Consequently, we have for x ∈ X and the Borel set G ∈ B(Z)
$$\begin{aligned}
L(x)(G) &= \int_Y J(y)(G)\,dK(x)(y)\\
&= \int_Y\delta_{f(y)}(G)\,dK(x)(y)\\
&= K(x)(f^{-1}[G])\\
&= (S(f)\circ K)(x)(G).
\end{aligned}$$
2 ⇒ 1: If L = S(f) ∘ K = (e_Z ∘ f) ∗ K for the Borel isomorphism f : Y → Z, then K = (e_Y ∘ g) ∗ L with g : Z → Y as the inverse of f. □

Consequently, given the epimorphisms K : X ⇝ Y and L : X ⇝ Z, the relation K ≈ L entails their base spaces Y and Z being Borel isomorphic, and vice versa. Hence the Borel isomorphism classes are the quotient objects for this relation.

This classification should be complemented by a characterization of epimorphic Kleisli morphisms for this monad. This seems to be an open question.

4.11 Lp-Spaces

We will now construct for a measure space (X, A, μ) a family {L_p(μ) | 1 ≤ p ≤ ∞} of Banach spaces. Some properties of these spaces are discussed, and in particular we will identify their dual spaces. The case p = 2 gives the particularly interesting space L₂(μ), which is a Hilbert space under the inner product $\langle f, g\rangle \mapsto \int_X f\cdot g\,d\mu$. Hilbert spaces have some properties which will turn out to be helpful, and which will be exploited for the underlying measure spaces. For example, von Neumann obtained from a representation of their continuous linear maps both the Lebesgue decomposition and the Radon-Nikodym derivative in one step! We join Rudin's exposition [Rud74, Section 6] in giving the truly ravishing proof here. But we are jumping ahead. After investigating the basic properties of Hilbert spaces, including the closest approximation property and the identification of continuous linear functionals, we move to a discussion of the more general L_p-spaces and investigate the positive linear functionals on them.

Some important developments like the definition of signed measures are briefly touched, some are not. The topics which had to be omitted here include the weak topology induced by L_q on L_p for conjugate pairs p, q; this would have required some investigations into convexity, which would have led into a wonderful, wondrous but unfortunately far-away country.

The last section deals with disintegration as an application of both the Radon-Nikodym derivative and the Hahn Extension Theorem. It deals with the problem of decomposing a finite measure on a product into its projection onto the first component and an associated transition kernel. This corresponds to reversing a Markov transition system with a given initial distribution, akin to the converse of a relation in a set-oriented setting.


4.11.1 A Spoonful of Hilbert Space Theory

Let H be a real vector space. A map (·,·) : H × H → ℝ is said to be an inner product iff these conditions hold for all x, y, z ∈ H and all α, β ∈ ℝ:
1. (x, y) = (y, x), so the inner product is commutative.
2. (α·x + β·z, y) = α·(x, y) + β·(z, y), so the inner product is linear in the first, hence also in the second component.
3. (x, x) ≥ 0, and (x, x) = 0 iff x = 0.

We confine ourselves to real vector spaces. Hence the laws for the inner product are somewhat simplified in comparison to vector spaces over the complex numbers. There one would, e.g., postulate that (y, x) is the complex conjugate of (x, y).

The inner product is the natural generalization of the scalar product in Euclidean spaces,
$$(\langle x_1,\dots,x_n\rangle, \langle y_1,\dots,y_n\rangle) := \sum_{i=1}^n x_i\cdot y_i,$$
which satisfies these laws, as one verifies readily.

We fix an inner product (·,·) on H. Define the norm of x ∈ H through
$$\|x\| := \sqrt{(x, x)};$$
this is possible because (x, x) ≥ 0.

The norm has a very appealing geometric property, which is known as the parallelogram law: the sum of the squares of the diagonals is the sum of the squares of the sides in a parallelogram. Thus
$$\|x + y\|^2 + \|x - y\|^2 = 2\cdot\|x\|^2 + 2\cdot\|y\|^2$$
holds for all x, y ∈ H, see Exercise 4.30.
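A quick numerical spot check of the parallelogram law for the Euclidean inner product; the vectors are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
x, y = rng.normal(size=5), rng.normal(size=5)

# ||x + y||^2 + ||x - y||^2 versus 2||x||^2 + 2||y||^2:
lhs = np.linalg.norm(x + y) ** 2 + np.linalg.norm(x - y) ** 2
rhs = 2 * np.linalg.norm(x) ** 2 + 2 * np.linalg.norm(y) ** 2
print(np.isclose(lhs, rhs))   # True
```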

Before investigating ‖·‖ in detail, we need the Schwarz inequality as a tool. It relates the norm to the inner product of two elements. Here it is.

Lemma 4.11.1 (Schwarz inequality) |(x, y)| ≤ ‖x‖·‖y‖.

Proof Let a := ‖x‖², b := ‖y‖², and c := |(x, y)|. Then c = t·(x, y) with t ∈ {−1, +1}. We have for each real r
$$0 \le (x - r\cdot t\cdot y,\ x - r\cdot t\cdot y) = (x, x) - 2\cdot r\cdot t\cdot(x, y) + r^2\cdot(y, y),$$
thus a − 2·r·c + r²·b ≥ 0. If b = 0, we must also have c = 0, otherwise the inequality would be false for large positive r; hence the inequality is true in this case. So we may assume that b ≠ 0. Put r := c/b, so that a ≥ c²/b, hence a·b ≥ c², from which the desired inequality follows. □

Schwarz's inequality will help in establishing that a vector space with an inner product is a normed space, as introduced in Definition 3.6.38.


Proposition 4.11.2 Let H be a real vector space with an inner product; then (H, ‖ · ‖) is a normed space.

Proof It is clear from the definition of the inner product that ‖α · x‖ = |α| · ‖x‖, and that ‖x‖ = 0 iff x = 0; the crucial point is the triangle inequality. We have

‖x + y‖² = (x + y, x + y) = ‖x‖² + ‖y‖² + 2 · (x, y)
         ≤ ‖x‖² + 2 · ‖x‖ · ‖y‖ + ‖y‖²    (by Lemma 4.11.1)
         = (‖x‖ + ‖y‖)².
⊣

Thus each inner product space yields a normed space, consequently it spawns a metric space through 〈x, y〉 ↦ ‖x − y‖. The finite dimensional vector spaces Rⁿ are Hilbert spaces under the inner product mentioned above. It produces for Rⁿ the familiar Euclidean distance

‖x − y‖ = √(∑_{i=1}^{n} (x_i − y_i)²).

We will meet square integrable functions as another class of Hilbert spaces, but before discussing them, we need some preparations.

Corollary 4.11.3 The maps x ↦ ‖x‖ and x ↦ (x, y) with y ∈ H fixed are continuous.

Proof We obtain from ‖x‖ ≤ ‖y‖ + ‖x − y‖ and ‖y‖ ≤ ‖x‖ + ‖x − y‖ that |‖x‖ − ‖y‖| ≤ ‖x − y‖, hence the norm is continuous. From Schwarz's inequality we see that |(x, y) − (x′, y)| = |(x − x′, y)| ≤ ‖x − x′‖ · ‖y‖, which shows that (·, y) is continuous. ⊣

From the properties of the inner product it is apparent that x ↦ (x, y) is a continuous linear functional in the following sense.

Definition 4.11.4 Let H be an inner product space with norm ‖ · ‖. A linear map L : H → R which is continuous in the norm topology is called a continuous linear functional on H.

We saw linear functionals already in Section 1.5.4, where the focus was on domination through a sublinear map, leading to an extension. This, however, is not the focus in the present discussion.

If L : H → R is a continuous linear functional, then its kernelKern(L)

Kern(L) := x ∈ H | L(x) = 0

is a closed linear subspace of H, i.e., is a real vector space in its own right. Note that

〈x, y〉 ∈ ker (L) iff x− y ∈ Kern(L),

so that both versions of kernels are related in an obvious way.

Say that x ∈ H is orthogonal to y ∈ H iff (x, y) = 0, and denote this by x⊥y. This is thegeneralization of the familiar concept of orthogonality in Euclidean spaces, which is formulated


also in terms of the inner product. Given a linear subspace M ⊆ H, define the orthogonal complement M⊥ of M as

M⊥ := {y ∈ H | x⊥y for all x ∈ M}.

The orthogonal complement is a linear subspace as well, and it is closed by Corollary 4.11.3, since M⊥ = ⋂_{x∈M} {y ∈ H | (x, y) = 0}. Moreover M ∩ M⊥ = {0}, since a vector z ∈ M ∩ M⊥ is orthogonal to itself, hence (z, z) = 0, which implies z = 0.

Hilbert spaces are introduced now as those inner product spaces for which this metric is complete. Our goal is to show that the continuous linear functionals on a Hilbert space H are given exactly through the inner product.

Definition 4.11.5 A Hilbert space is a real vector space with an inner product which is a complete metric space under the induced metric.

Note that we fix the metric for which the space is to be complete, noting that completeness is not a property of the underlying topological space but rather of a specific metric. It is also worthwhile noting that a Hilbert space is a topological group, as discussed in Example 3.1.25, and hence that it is a complete uniform space.

Recall that a subset C ⊆ H is called convex iff it contains with two points also the straight Convexline between them, thus iff α · x+ (1− α) · y ∈ C, whenever x, y ∈ C and 0 ≤ α ≤ 1.

A key tool for our development is the observation that a closed convex subset of a Hilbert space has a unique element of smallest norm. This property is familiar from Euclidean spaces. Visualize a compact convex set in R³: this set has a unique point which is closest to the origin. The statement below is more general, because it refers to closed and convex sets.

Proposition 4.11.6 Let C ⊆ H be a closed and convex subset of the Hilbert space H. Thenthere exists a unique y ∈ C such that ‖y‖ = infz∈C ‖z‖.

Proof 0. We construct a sequence (x_n)_{n∈N} in C the norms of which converge to the infimum of the vectors' lengths in C. Using the parallelogram law, we find that the vectors themselves converge to a point of minimal length, which by closedness must belong to C.

1. Put r := inf_{z∈C} ‖z‖, and let x, y ∈ C, hence by convexity (x + y)/2 ∈ C as well. The parallelogram law gives

‖x − y‖² = 2 · ‖x‖² + 2 · ‖y‖² − 4 · ‖(x + y)/2‖² ≤ 2 · ‖x‖² + 2 · ‖y‖² − 4 · r².

Hence if we have two vectors x ∈ C and y ∈ C of minimal norm, we obtain x = y. Thus, if such a vector exists, it must be unique.

2. Let (xn)n∈N be a sequence in C such that limn→∞ ‖xn‖ = r. At this point, we have onlyinformation about the sequence

(‖xx‖

)n∈N of real numbers, but we can actually show that

the sequence proper is a Cauchy sequence. It works like this. We obtain, again from theparallelogram law, the estimate

‖xn − xm‖ ≤ 2 · (‖xn‖2 + ‖xm‖2 − 2 · r2),


so that for each ε > 0 we find n0 such that ‖xn − xm‖ < ε if n,m ≥ n0. Hence (xn)n∈N isactually a Cauchy sequence, and sinceH is complete, we find some x such that limn→∞ xn = x.Clearly, ‖x‖ = r, and since C is closed, we infer that x ∈ C. a

Note how the geometric properties of an inner product space, formulated through the parallelogram law, and the metric property of being complete cooperate.
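For intuition, here is a small numerical sketch of Proposition 4.11.6, assuming NumPy: the probability simplex in R³ is closed and convex, its unique minimal-norm element is the uniform vector, and random samples from the set never undercut it.

    import numpy as np

    rng = np.random.default_rng(1)

    # C = {z in R^3 | z >= 0, z_1 + z_2 + z_3 = 1}, closed and convex;
    # the unique element of smallest norm is the uniform vector.
    y = np.full(3, 1.0 / 3.0)

    samples = rng.dirichlet(np.ones(3), size=10_000)  # random points of C
    norms = np.linalg.norm(samples, axis=1)
    assert norms.min() >= np.linalg.norm(y) - 1e-12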

This unique approximation property has two remarkable consequences. The first one establishes for each element x ∈ H a unique representation as x = x_1 + x_2 with x_1 ∈ M and x_2 ∈ M⊥ for a closed linear subspace M of H, and the second one shows that the only continuous linear functionals on the Hilbert space H are given by the maps λx.(x, y) for y ∈ H. We need the first one for establishing the second one, so both find their place in this somewhat minimal discussion of Hilbert spaces.

Proposition 4.11.7 Let H be a Hilbert space, M ⊆ H a closed linear subspace. Each x ∈ H has a unique representation x = x_1 + x_2 with x_1 ∈ M and x_2 ∈ M⊥.

Proof 1. If such a representation exists, it must be unique. In fact, assume that x_1 + x_2 = x = y_1 + y_2 with x_1, y_1 ∈ M and x_2, y_2 ∈ M⊥; then x_1 − y_1 = y_2 − x_2 ∈ M ∩ M⊥, which implies x_1 = y_1 and x_2 = y_2 by the remark above.

2. Fix x ∈ H, we may and do assume that x 6∈ M , and define C := x − y | y ∈ M, thenC is convex, and, because M is closed, it is closed as well. Thus we find an element in Cwhich is of smallest norm, say, x− x1 with x1 ∈ M . Put x2 := x− x1, and we have to showthat x2 ∈ M⊥, hence that (x2, y) = 0 for any y ∈ M . Let y ∈ M,y 6= 0 and choose α ∈ Rarbitrarily (for the moment, we’ll fix it later). Then x2 − α · y = x − (x1 + α · y) ∈ C, thus‖x2 − α · y‖2 ≥ ‖x2‖2. Expanding, we obtain

(x2 − α · y, x2 − α · y) = (x2, x2)− 2 · α · (x2, y) + α2 · (y, y) ≥ (x2, x2).

Now put α := (x2, y)/(y, y), then the above inequality yields

−2 · (x2, y)2

(y, y)+

(x2, y)2

(y, y)≥ 0,

which implies −(x2, y)2 ≥ 0, hence (x2, y) = 0. Thus x2 ∈M⊥. a

Thus H is decomposed into M and M⊥ for any closed linear subspace M of H in the sense that each element of H can be written as a sum of elements of M and of M⊥, and, even better, this decomposition is unique. These elements are perceived as the projections onto the subspaces. In the case that we can represent M as the kernel {x ∈ H | L(x) = 0} of a continuous linear map L : H → R with L ≠ 0 we can say actually more.

Lemma 4.11.8 Given a Hilbert space H, let L : H → R be a continuous linear functional with L ≠ 0. Then Kern(L)⊥ is isomorphic to R.

Proof Define ϕ(y) := L(y) for y ∈ Kern(L)⊥. Then ϕ(α · y + β · y′) = α · ϕ(y) + β · ϕ(y′)follows from the linearity of L. If ϕ(y) = ϕ(y′), then y − y′ ∈ Kern(L) ∩Kern(L)⊥, so thaty = y′, hence ϕ is one-to-one. Given t ∈ R, we find x ∈ H with L(x) = t; decompose x asx1 +x2 with x1 ∈ Kern(L) and x2 ∈ Kern(L)⊥, then ϕ(x2) = L(x−x1) = t. Thus ϕ is onto.Hence we have found a linear and bijective map Kern(L)⊥ → R. a


Returning to the decomposition of an element x ∈ H, we fix an arbitrary y ∈ Kern(L)⊥ \ {0}. Then we may write x = x_1 + α · y with x_1 ∈ Kern(L) and α ∈ R. This follows immediately from Lemma 4.11.8, and it has the consequence we are aiming at.

Theorem 4.11.9 Let H be a Hilbert space, L : H → R a continuous linear functional. Then there exists y ∈ H with L(x) = (x, y) for all x ∈ H.

Proof If L = 0, this is trivial. Hence we assume that L ≠ 0. Thus we can find z ∈ Kern(L)⊥ with L(z) = 1; put y := γ · z with γ := 1/(z, z), so that L(y) = (y, y). Each x ∈ H can be written as x = x_1 + α · y with x_1 ∈ Kern(L). Hence

L(x) = L(x_1 + α · y) = α · L(y) = α · (y, y) = (x_1 + α · y, y) = (x, y),

using (x_1, y) = 0 for the second to last equality.

Thus L = λx.(x, y) is established. a

Thus a Hilbert space does not only determine the space of all continuous linear functionals on it, it actually is this space. Wonderful world of Hilbert spaces! This property sets Hilbert spaces apart, making them particularly interesting for applications, e.g., in quantum computing.
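A finite-dimensional illustration of Theorem 4.11.9 — again just a sketch assuming NumPy: on Rⁿ the representing vector y of a linear functional L is recovered by evaluating L on the unit vectors.

    import numpy as np

    n = 4
    y_true = np.array([2.0, -1.0, 0.5, 3.0])

    def L(x):
        return float(np.dot(x, y_true))   # some continuous linear functional

    # recover the representing vector: y_i = L(e_i)
    y = np.array([L(e) for e in np.eye(n)])

    x = np.array([1.0, 2.0, 3.0, 4.0])
    assert abs(L(x) - float(np.dot(x, y))) < 1e-12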

The rather abstract view of Hilbert spaces discussed in this section will now be put to use in the more specific case of integrable functions.

4.11.2 The Lp-Spaces are Banach Spaces

We will investigate now the structure of integrable functions for a fixed σ-finite measure space (X, A, µ). We will obtain a family of Banach spaces, all of which have some interesting properties. In the course of the investigation, we will usually not distinguish between functions which differ only on a set of measure zero (because the measure will not be aware of the differences). For this, we introduced above the equivalence relation =µ ("equal µ-almost everywhere") with f =µ g iff µ({x ∈ X | f(x) ≠ g(x)}) = 0, see Section 4.2.1 on page 344. In those cases where we need to look at the value of a function at certain points, we will make sure to point out the difference.

Let us see how this works in practice. Define

L1(µ) := {f ∈ F(X, A) | ∫_X |f| dµ < ∞},

thus f ∈ L1(µ) iff f : X → R is measurable and has a finite µ-integral.

This space is a vector space and closed with respect to | · |, hence we have immediately

Proposition 4.11.10 L1(µ) is a vector lattice. ⊣

Now put L1(µ)L1(µ) := [f ] | f ∈ L1(µ),

then we have to explain how to perform the algebraic operations on the equivalence classes(note that we write [f ] rather than [f ]µ, which we will do when more than one measure has


to be involved). Since the set of all nullsets is a σ-ideal, these operations are easily shown to be well-defined:

[f] + [g] := [f + g],
[f] · [g] := [f · g],
α · [f] := [α · f].

Thus we obtain

Proposition 4.11.11 L1(µ), the space of equivalence classes, is a vector lattice as well. ⊣

Let f ∈ L1(µ), then we define‖ · ‖1‖f‖1 :=

∫X|f | dµ

as the L1-norm for f . Let us have a look at the properties which a decent norm should have.First, we have ‖f‖1 ≥ 0, and ‖α ·f‖1 = α · ‖f‖1, this is immediate. Because |f+g| ≤ |f |+ |g|,the triangle inequality holds. Finally, let ‖f‖1 = 0, thus

∫X |f | dµ = 0, consequently, f =µ 0,

which means f = [0].

This will be a basis for the definition of a whole family of linear spaces of integrable functions. Call the positive real numbers p and q conjugate iff they satisfy

1/p + 1/q = 1

(for example, 2 is conjugate to itself). This may be extended to p = 1, so that we also consider 1 and ∞ as conjugate numbers, but using this pair will be made explicit.

The first step for extending the definition of L1 will be Hölder's inequality, which is based on this simple geometric fact:

Lemma 4.11.12 Let a, b be positive real numbers and p conjugate to q; then

a · b ≤ a^p/p + b^q/q,

equality holding iff b = a^{p−1}.

Proof The exponential function is convex, i.e., we have

e^{(1−α)·x + α·y} ≤ (1 − α) · e^x + α · e^y

for all x, y ∈ R and 0 ≤ α ≤ 1. Because both a > 0 and b > 0, we find r, s such that a = e^{r/p} and b = e^{s/q}. Since p and q are conjugate, we obtain from 1/p = 1 − 1/q

a · b = e^{r/p + s/q} ≤ e^r/p + e^s/q = a^p/p + b^q/q.
⊣
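A numerical sanity check of Lemma 4.11.12 (Young's inequality), assuming NumPy; both the inequality and the equality case b = a^{p−1} can be probed directly:

    import numpy as np

    rng = np.random.default_rng(3)
    p = 3.0
    q = p / (p - 1.0)        # conjugate exponent: 1/p + 1/q = 1

    for _ in range(1000):
        a, b = rng.uniform(0.01, 10.0, size=2)
        assert a * b <= a**p / p + b**q / q + 1e-9

    # equality holds iff b = a**(p - 1)
    a = 2.0
    b = a ** (p - 1.0)
    assert abs(a * b - (a**p / p + b**q / q)) < 1e-9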


This betrays one of the secrets of conjugate p and q, viz., that they give rise to a convexcombination.

We are ready to formulate and prove Hölder's inequality, arguably one of the most frequently used inequalities in integration (as we will see); the proof follows the one given for [Rud74, Theorem 3.5].

Proposition 4.11.13 (Hölder's inequality) Let p > 0 and q > 0 be conjugate, f and g be non-negative measurable functions on X. Then

∫_X f · g dµ ≤ (∫_X f^p dµ)^{1/p} · (∫_X g^q dµ)^{1/q}.

Proof Put for simplicity

A := (∫_X f^p dµ)^{1/p} and B := (∫_X g^q dµ)^{1/q}.

If A = 0, we may conclude from f =µ 0 that f · g =µ 0, so there is nothing to prove. If A > 0 and B = ∞, the inequality is trivial, so we assume that 0 < A < ∞ and 0 < B < ∞. Put

F := f/A, G := g/B,

thus we obtain

∫_X F^p dµ = ∫_X G^q dµ = 1.

We obtain F (x)·G(x) ≤ F (x)p/p+G(x)q/q for every x ∈ X from Lemma 4.11.12, hence∫XF ·G dµ ≤ 1

p·∫XF p dµ+

1

q·∫XGq dµ ≤ 1

p+

1

q= 1.

Multiplying both sides with A ·B > 0 now yields the desired result. a

This gives Minkowski’s inequality as a consequence. Put for f : X → R measurable, and forMinkowski’s

inequalityp ≥ 1

‖f‖p :=(∫X|f |p dµ

)1/p.

Proposition 4.11.14 Let 1 ≤ p <∞ and let f and g be non-negative measurable functionson X. Then

‖f + g‖p ≤ ‖f‖p + ‖g‖p

Proof The inequality follows for p = 1 from the triangle inequality for | · |, so we may assume that p > 1. We may also assume that f, g ≥ 0. Then we obtain from Hölder's inequality with q conjugate to p

‖f + g‖_p^p = ∫_X (f + g)^{p−1} · f dµ + ∫_X (f + g)^{p−1} · g dµ
            ≤ ‖f + g‖_p^{p/q} · (‖f‖_p + ‖g‖_p).


Now assume that ‖f + g‖p =∞, we may divide by the factor ‖f + g‖p/qp , and we obtain thedesired inequality from p− p/q = p · (1− 1/q) = 1. If, however, the left hand side is infinite,then the inequality

(f + g)p ≤ 2p ·maxfp, gp ≤ 2p · (fp + gp)

shows that the right hand side is infinite as well. a
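Both inequalities are easy to probe numerically for the counting measure on finitely many points, i.e., for vectors; a sketch assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(4)
    p = 2.5
    q = p / (p - 1.0)
    f = np.abs(rng.normal(size=100))
    g = np.abs(rng.normal(size=100))

    def norm(h, r):
        return float((np.abs(h) ** r).sum() ** (1.0 / r))

    # Hoelder: sum(f * g) <= ||f||_p * ||g||_q   (counting measure on 100 points)
    assert (f * g).sum() <= norm(f, p) * norm(g, q) + 1e-9

    # Minkowski: ||f + g||_p <= ||f||_p + ||g||_p
    assert norm(f + g, p) <= norm(f, p) + norm(g, p) + 1e-9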

Given 1 ≤ p <∞, define

Lp(µ) := f ∈ F(X,A) | ‖f‖p <∞

with Lp(µ) as the corresponding set of =µ-equivalence classes. An immediate consequenceLp(µ), Lp(µ)from Minkowski’s inequality is

Proposition 4.11.15 Lp(µ) is a linear space over R, and ‖ · ‖p is a pseudo-norm on it.Lp(µ) is a normed space.

Proof It is immediate from Proposition 4.11.14 that f + g ∈ Lp(µ) whenever f, g ∈ Lp(µ), and Lp(µ) is closed under scalar multiplication as well. That ‖ · ‖_p is a pseudo-norm is also immediate. Because scalar multiplication and addition are compatible with forming equivalence classes, the set Lp(µ) of classes is a real vector space as well. As usual, we will identify f with its class, unless otherwise stated. Now if f ∈ Lp(µ) with ‖f‖_p = 0, then |f| =µ 0, hence f =µ 0, thus f = 0 as an equivalence class. So ‖ · ‖_p is a norm on Lp(µ). ⊣

In Section 4.2.1 the vector spaces L∞(µ) and L∞(µ) are introduced, so we now have a family (Lp(µ))_{1≤p≤∞} of vector spaces together with their associated spaces (Lp(µ))_{1≤p≤∞} of µ-equivalence classes, which are normed spaces. They share the property of being Banach spaces.

Proposition 4.11.16 Lp(µ) is a Banach space for 1 ≤ p ≤ ∞.

Proof 1. Let us first assume that the measure is finite. We know already from Proposition 4.2.9 that L∞(µ) is a Banach space, so we may assume that p < ∞.

Given (fn)n∈N as a Cauchy sequence in Lp(µ), then we obtain

εp · µ(x ∈ X | |fn − fm| ≥ ε) ≤∫X|fn − fm|p dµ.

for ε > 0. Thus (fn)n∈N is a Cauchy sequence for convergence in measure, so we can find

f ∈ F(X,A) such that fni.m.−→ f by Proposition 4.2.21. Proposition 4.2.16 tells us that we can

find a subsequence (fnk)k∈N such that fnka.e.−→ f . But we do not yet know that f ∈ Lp(µ).

We infer limk→∞ |fnk − f |p = 0 outside a set of measure zero. Thus we obtain from Fatou’sLemma (Proposition 4.8.5) for every n ∈ N∫

X|f − fn|p dµ ≤ lim inf

k→∞

∫X|fnk − fn|

p dµ.

Thus f − fn ∈ Lp(µ) for all n ∈ N, and from f = (f − fn) + fn we infer f ∈ Lp(µ), sinceLp(µ) is closed under addition. We see also that ‖f − fn‖p → 0, as n→∞.

2. If the measure space is σ-finite, we may write ∫_X f dµ as lim_{n→∞} ∫_{A_n} f dµ, where µ(A_n) < ∞ for an increasing sequence (A_n)_{n∈N} of measurable sets with ⋃_{n∈N} A_n = X. Since the


restriction to each A_n yields a finite measure space, where the result holds, it is not difficult to see that completeness holds for the whole space as well. Specifically, given ε > 0, there exists n₀ ∈ N so that for all n, m ≥ n₀

‖f_n − f_m‖_p ≤ ‖f_n − f_m‖_p^{(n)} + ε

holds, with ‖g‖_p^{(n)} := (∫_X |g|^p dµ_n)^{1/p}, and µ_n : B ↦ µ(B ∩ A_n) as the measure µ localized to A_n. Then ‖f_n − f‖_p^{(n)} → 0, from which we obtain ‖f_n − f‖_p → 0. Hence completeness is also valid for the σ-finite case. ⊣

Example 4.11.17 Let | · | be the counting measure on (N,P (N)), then this is a σ-finitemeasure space. Define

`p := Lp(| · |), 1 ≤ p <∞,`∞ := L∞(| · |).

Then `p is the set of all real sequences (xn)n∈N with∑

n∈N |xn|p < ∞, and (xn)n∈N ∈ `∞ iffsupn∈N |xn| < ∞. Note that we do not need to pass to equivalence classes, since |A| = 0 iffA = ∅. These spaces are well known and well studied; here they make their only appearance.,

The case p = 2 deserves particular attention, since the norm is in this case obtained from the inner product

(f, g) := ∫_X f · g dµ.

In fact, linearity of the integral shows that

(α · f + β · g, h) = α · (f, h) + β · (g, h)

holds, commutativity of multiplication yields (f, g) = (g, f), and finally it is clear that (f, f) ≥ 0 always holds. If we have f ∈ L2(µ) with f =µ 0, then we know that also (f, f) = 0, thus (f, f) = 0 iff f = 0 in L2(µ).

Thus we obtain from Proposition 4.11.16

Corollary 4.11.18 L2(µ) is a Hilbert space with the inner product (f, g) := ∫_X f · g dµ. ⊣

This will have some interesting consequences, which we will explore in Section 4.11.3.

Before doing so, we show that the step functions belonging to Lp are dense.

Corollary 4.11.19 Given 1 ≤ p < ∞, the set

D := {f ∈ T(X, A) | µ({x ∈ X | f(x) ≠ 0}) < ∞}

is dense in Lp(µ) with respect to ‖ · ‖_p.

Proof The proof makes use of the fact that the step functions are dense with respect to pointwise convergence: we just have to single out those functions which are in Lp(µ). Assume that f ∈ Lp(µ) with f ≥ 0, then there exists by Proposition 4.2.4 an increasing sequence (g_n)_{n∈N} of step functions with f(x) = lim_{n→∞} g_n(x). Because 0 ≤ g_n ≤ f, we conclude g_n ∈ D, and we know from Lebesgue's Dominated Convergence Theorem 4.8.6 that ‖f − g_n‖_p → 0.


Thus every non-negative element of Lp(µ) can be approximated through elements of D in the ‖ · ‖_p-norm. In the general case, decompose f = f⁺ − f⁻ and apply the argument to both summands separately. ⊣

Because the rationals form a countable and dense subset of the reals, we take all step functions with rational coefficients, and obtain

Corollary 4.11.20 Lp(µ) is a separable Banach space for 1 ≤ p <∞. a

Note that we excluded the case p = ∞; in fact, L∞(µ) is usually not a separable Banach space, as this example shows.

Example 4.11.21 Let λ be Lebesgue measure on the Borel sets of the unit interval [0, 1].Put ft := χ[0,t] for 0 ≤ t ≤ 1, then ft ∈ L∞(λ) for all t, and we have ||fs − ft||λ∞ = 1 for0 < s < t < 1. Let

Kt := f ∈ L∞(λ) | ||f − ft||λ∞ < 1/2,

thus Ks ∩Kt = ∅ for s 6= t (if g ∈ Ks ∩Kt, then ||fs − ft||λ∞ ≤ ||g − ft||λ∞ + ||fs − g||λ∞ < 1).On the other hand, each Kt is open, so if we have a countable subset D ⊆ L∞(λ), thenKt ∩D = ∅ for uncountably many t. Thus D cannot be dense. But this means that L∞(λ)is not separable. ,

This is the first installment on the properties of Lp-spaces. We will be back with a general discussion in Section 4.11.4 after having explored the Lebesgue-Radon-Nikodym Theorem as a valuable tool in general, and for our discussion in particular.

4.11.3 The Lebesgue-Radon-Nikodym Theorem

The Hilbert space structure of the L2-spaces will now be used for decomposing a measure into an absolutely continuous and a singular part with respect to another measure, and for constructing a density. This construction requires a more general study of the relationship between two measures.

We even go a bit beyond that and define absolute continuity and singularity as a relationship of two arbitrary additive set functions. This will be specialized fairly quickly to a relationship between finite measures, but this added generality will turn out to be beneficial nevertheless, as we will see.

Definition 4.11.22 Let (X, A) be a measurable space with two additive set functions ρ, ζ : A → R.

1. ρ is said to be absolutely continuous with respect to ζ (ρ << ζ) iff ρ(E) = 0 for every E ∈ A for which ζ(E) = 0.

2. ρ is said to be concentrated on A ∈ A iff ρ(E) = ρ(E ∩ A) for all E ∈ A.

3. ρ and ζ are called mutually singular (ρ ⊥ ζ) iff there exists a pair of disjoint sets A and B such that ρ is concentrated on A and ζ is concentrated on B.


If two additive set functions are mutually singular, they live on disjoint measurable sets in the same measurable space. These are elementary properties.

Lemma 4.11.23 Let ρ_1, ρ_2, ζ : A → R be additive set functions; then we have for a_1, a_2 ∈ R:

1. If ρ1 ⊥ ζ, and ρ2 ⊥ ζ, then a1 · ρ1 + a2 · ρ2 ⊥ ζ.

2. If ρ1 << ζ, and ρ2 << ζ, then a1 · ρ1 + a2 · ρ2 << ζ.

3. If ρ1 << ζ and ρ2 ⊥ ζ, then ρ1 ⊥ ρ2.

4. If ρ << ζ and ρ ⊥ ζ, then ρ = 0.

Proof 1. For proving 1, note that we can find a measurable set B and sets A_1, A_2 ∈ A with B ∩ (A_1 ∪ A_2) = ∅ such that ζ(E) = ζ(E ∩ B) and ρ_i(E) = ρ_i(E ∩ A_i) for i = 1, 2. By additivity, we obtain (a_1 · ρ_1 + a_2 · ρ_2)(E) = (a_1 · ρ_1 + a_2 · ρ_2)(E ∩ (A_1 ∪ A_2)). Property 2 is obvious.

2. ρ2 is concentrated on A2, ζ is concentrated on B with A ∩ B = ∅, hence ζ(E ∩ A2) = 0,thus ρ1(E ∩ A2) = 0 for all E ∈ A. Additivity implies ρ1(E) = ρ1

(E ∩ (X \ A2)

), so ρ1 is

concentrated on X \ A2. This proves 3. For proving 4, note that ρ << ζ and ρ ⊥ ζ implyρ ⊥ ρ by property 3, which implies ρ = 0. a

We specialize these relations now to finite measures on A. Absolute continuity can be expressed in a different way, which makes the concept more transparent. Specifically, absolute continuity could have been defined akin to the well-known ε-δ definition of continuity for real functions. Then the name becomes a bit more descriptive.

Lemma 4.11.24 Given measures µ and ν on a measurable space (X, A), these conditions are equivalent:

1. µ << ν.

2. For every ε > 0 there exists δ > 0 such that ν(A) < δ implies µ(A) < ε for all measurable sets A ∈ A.

Proof 1 ⇒ 2: Assume that we can find ε > 0 so that there exist sets An ∈ A withν(An) < 2−n but µ(An) ≥ ε. Then we have µ(

⋃k≥nAk) ≥ ε for all n ∈ N, consequently,

by monotone convergence, also µ(⋂

n∈N⋃k≥nAk

)≥ ε. On the other hand, ν(

⋃k≥nAk) ≤∑

k≥n 2−k = 2−n+1 for all n ∈ N, so by monotone convergence again, ν(⋂

n∈N⋃k≥nAk

)= 0.

Thus µ << ν does not hold.

2 ⇒ 1: Let ν(A) = 0, then µ(A) ≤ ε for every ε > 0, hence µ << ν is true. a

Given two measures µ and ν, one, say µ, can be decomposed uniquely as a sum µa + µssuch that µa << ν and µs ⊥ ν, additionally µs ⊥ µa holds. This is stated and proved in thefollowing theorem, which actually shows much more, viz., that there exists a density h of µa Densitywith respect to ν. This means that µa(A) =

∫A h dν holds for all A ∈ A.

Densities are familiar from probability distributions; for example, the normal distribution N(0, 1) has the density e^{−x²/2}/√(2 · π). This means that a random variable which is distributed according to N(0, 1) takes values in the Borel set A ∈ B(R) with probability ∫_A e^{−x²/2}/√(2 · π) dx.


This is but a special case. What a density is used for in our context will be described in greater detail below. Before entering into formalities, it is noted that the decomposition is usually called the Lebesgue decomposition of µ with respect to ν, and that the density h is usually called the Radon-Nikodym derivative of µ_a with respect to ν and denoted by dµ/dν.

The proof both for the existence of the Lebesgue decomposition and of the Radon-Nikodym derivative is done in one step. The beautiful proof given below was proposed by von Neumann, see [Rud74, 6.9]. Here we go:

Theorem 4.11.25 Let µ and ν be finite measures on (X,A).

1. There exists a unique pair µa and µs of finite measures on (X,A) such that µ = µa+µswith µa << ν, µa ⊥ ν. In addition, µa ⊥ µs holds.

2. There exists a unique h ∈ L1(ν) such that

µa(A) =

∫Ah dν

for all A ∈ A.

The line of attack will be as follows: we show that f ↦ ∫_X f dµ is a continuous linear functional on the Hilbert space L2(µ + ν). We can express this functional by the representation for these functionals on Hilbert spaces through some function g ∈ L2(µ + ν), hence

∫_X f dµ = ∫_X f · g d(µ + ν)

(note the way the measures µ and µ + ν interact by exploiting the integral with respect to µ as a linear functional on L2(µ + ν)). A closer investigation of g will then yield the sets we need for the decomposition, and permit constructing the density h.

Proof 1. Define the finite measure ϕ := µ + ν on A; note that

∫_X f dϕ = ∫_X f dµ + ∫_X f dν

holds for all measurable f for which the sum on the right hand side is defined; this follows from Levi's Theorem 4.8.2 (for f ≥ 0) and from additivity (for general f). We show first that L : f ↦ ∫_X f dµ is a continuous linear operator on L2(ϕ). In fact,

|∫_X f dµ| ≤ ∫_X |f| dϕ = ∫_X |f| · 1 dϕ ≤ (∫_X |f|² dϕ)^{1/2} · √(ϕ(X))

by Schwarz's inequality (Lemma 4.11.1). Thus

sup_{‖f‖_2≤1} |L(f)| ≤ √(ϕ(X)) < ∞.

Hence L is continuous (Exercise 4.26), thus by Theorem 4.11.9 there exists g ∈ L2(ϕ) such that

L(f) = ∫_X f · g dϕ     (4.21)


for all f ∈ L2(µ).

2. Let f = χA for A ∈ A, then we obtain∫Ag dϕ = µ(A) ≤ ϕ(A)

from (4.21). This yields 0 ≤ g ≤ 1 ϕ-a.e.; we can change g on a set of ϕ-measure 0 to the effectthat 0 ≤ g(x) ≤ 1 holds for all x ∈ X. This will not affect the representation in (4.21).

We know that

∫_X (1 − g) · f dµ = ∫_X f · g dν     (4.22)

holds for all f ∈ L2(ϕ). Put

A := {x ∈ X | 0 ≤ g(x) < 1},
B := {x ∈ X | g(x) = 1},

then A, B ∈ A, and we define for E ∈ A

µ_a(E) := µ(E ∩ A),
µ_s(E) := µ(E ∩ B).

If f = χB, then we obtain ν(B) =∫B g dν =

∫B 0 dµ = 0 from (4.22), thus ν(B) = 0, so that

µs ⊥ ν.

3. Replace for a fixed E ∈ A in (4.22) the function f by (1 + g + . . . + gn) · χE , then wehave ∫

E(1− gn+1) dµ =

∫Eg · (1 + g + . . .+ gn) dν.

Look at the integrand on the right hand side: it equals zero on B, and increases monotonicallyto 1 on A, hence limn→∞

∫E (1− gn+1) dµ = µ(E ∩ A) = µa(E). This provides a bound for

the left hand side for all n ∈ N. The integrand on the left hand side converges monotonicallyto some function 0 ≤ h ∈ L1(ν) with limn→∞

∫E g · (1 + g + . . .+ gn) dν =

∫E h dν by Levi’s

Theorem 4.8.2. Hence we have ∫Eh dν = µa(E)

for all E ∈ A, in particular µa << ν.

4. Assume that we can find another pair µ′_a and µ′_s with µ′_a << ν and µ′_s ⊥ ν and µ = µ′_a + µ′_s. Then we have µ_a − µ′_a = µ′_s − µ_s with µ_a − µ′_a << ν and µ′_s − µ_s ⊥ ν by Lemma 4.11.23, hence µ_s − µ′_s = 0, again by Lemma 4.11.23, which implies µ_a − µ′_a = 0. So the decomposition is unique. From this, uniqueness of the density h is inferred. ⊣

We obtain as a consequence the well-known Radon-Nikodym Theorem:

Theorem 4.11.26 Let µ and ν be finite measures on (X,A) with µ << ν. Then there existsRadon-

NikodymTheorem

a unique h ∈ L1(µ) with µ(A) =∫A h dν for all A ∈ A. Moreover, f ∈ L1(µ) iff f ·h ∈ L1(ν),

in this case ∫Xf dµ =

∫Xf · h dν.


h is called the Radon-Nikodym derivative of µ with respect to ν and sometimes denoted by dµ/dν.

Proof Write m = µa+µs, where µa and µs are the Lebesgue decomposition of µ with respectto ν by Theorem 4.11.25. Since µs ⊥ ν, we find µs = 0, so that µa = µ. Then apply thesecond part of Theorem 4.11.25 to µ. This accounts for the first part. The second part followsfrom this by an approximation through step functions according to Corollary 4.11.19. a

Note that the Radon-Nikodym Theorem gives a one-to-one correspondence between the finite measures µ with µ << ν and the non-negative elements of the Banach space L1(ν).
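For measures with finitely many atoms the derivative can be written down directly; the following sketch (a hypothetical helper, with measures represented as Python dicts mapping atoms to masses, and assuming µ << ν) illustrates the defining identity µ(A) = ∫_A h dν:

    def radon_nikodym(mu, nu):
        """Density h = dmu/dnu for discrete measures given as dicts atom -> mass.

        Assumes mu << nu, i.e. nu[x] == 0 implies mu[x] == 0; then
        mu(A) = sum of h[x] * nu[x] over x in A."""
        h = {}
        for x, m in nu.items():
            if m > 0:
                h[x] = mu.get(x, 0.0) / m
            else:
                assert mu.get(x, 0.0) == 0.0, "mu not absolutely continuous w.r.t. nu"
        return h

    nu = {"a": 0.5, "b": 0.25, "c": 0.25}
    mu = {"a": 0.1, "b": 0.6, "c": 0.3}
    h = radon_nikodym(mu, nu)
    A = {"a", "b"}
    assert abs(sum(mu[x] for x in A) - sum(h[x] * nu[x] for x in A)) < 1e-12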

Theorem 4.11.25 can be extended to complex measures; we will comment on this after the Jordan decomposition has been established in Proposition 4.11.32.

Both constructions have, as one might expect, a plethora of applications. We will not discuss the Lebesgue decomposition further, but rather focus on the Radon-Nikodym Theorem and discuss two applications, viz., identifying the dual space of the Lp-spaces for p < ∞, and disintegrating a measure on a product space.

But this is a place to have a look at integration by substitution, a technique well-known from Calculus. The multi-dimensional case has been mentioned at the end of Section 4.8.1 on page 399; we deal here with the one-dimensional case. The approach displays a pretty interplay of integrating with respect to an image measure, and the Radon-Nikodym Theorem, which should not be missed.

We prepare the stage with an auxiliary statement, which is of interest in its own right.

Lemma 4.11.27 Let (X,A, µ) and (Y,B, ρ) be finite measure spaces, ψ : X → Y be measur-able and onto such that ρ∗(ψ

[A]) = 0, whenever µ(A) = 0. Put ν := M(ψ)(µ). Then there

exists a measurable function w : X → R+ such that

1. f ∈ L1(ρ) iff (f g) · w ∈ L1(µ).

2.∫Y f(y) dρ(y) =

∫X (f ψ)(x) · g(x) dµ(x) for all f ∈ L1(ρ).

Proof We show first that ρ << ν, from which we obtain a derivative. This is then used through the change of variable formula from Corollary 4.8.9 for obtaining the desired result.

In fact, assume that ν(B) = 0 for some B ∈ B, equivalently, µ(ψ−1[B]) = 0. Thus by

assumption 0 = ρ∗(ψ[ψ−1

[B]]

) = ρ(B), since B = ψ[ψ−1

[B]]

due to ψ being onto. Thuswe find g1 : Y → R+ such that f ∈ L1(ρ) iff f · g1 ∈ L1(ν) and

∫Y f dρ =

∫Y f · g1 dν. Since

ν = M(ψ)(µ), we obtain from Corollary 4.8.9 that∫Yf dρ =

∫X

(f ψ) · (g1 ψ) dµ.

Thus putting g := g1 ψ, the assertion follows. a

The role of ν as the image measure is interesting here. It just serves as a kind of facilitator,but it remains in the background. Only the measures ρ and µ are acting, the image measureis used only for obtaining the Radon-Nikodym derivative, and for converting its integral toan integral with respect to its preimage through change of variables.


We specialize things now to intervals on the real line and make restrictive assumptions on ψ. Then — voilà! — the well-known formula for integration by substitution will result.

But first a more general consequence of Lemma 4.11.27 is to be presented. We will be working with Lebesgue measure on intervals of the reals. Here we assume that ψ : [α, β] → [a, b] is continuous with the additional property that λ(A) = 0 implies λ∗(ψ[A]) = 0 for all A ∈ B([α, β]). This class of functions is generally known as absolutely continuous and discussed in great detail in [HS65, Section 18, Theorem (18.25)]. We obtain from Lemma 4.11.27

Corollary 4.11.28 Let [α, β] ⊆ R be a closed interval, ψ : [α, β]→ [a, b] be a surjective andabsolutely continuous function. Then there exists a Borel measurable function w : [α, β]→ Rsuch that

1. f ∈ L1([a, b], λ) iff (f ψ) · w ∈ L1([α, β], λ)

2.∫ ba f(x) dx =

∫ βα (f(ψ(t)) · w(t) dt.

Proof The assertion follows from Lemma 4.11.27 by specializing µ and ρ to λ. ⊣

If we restrict ψ further, we obtain even more specific informations about the function w. Thefollowing proof shows how we exploit the properties of ψ, viz., being monotone and having acontinuous first derivative, through the definition of the integral as a limit of approximationson a system on subintervals which get smaller and smaller. The subdivisions in the domainare then related to the one in the range of ψ, the relationship is done through Lagrange’sTheorem which brings in the derivative. But see for yourself:

Proposition 4.11.29 Assume that ψ : [α, β] → [a, b] is continuous and monotone with a continuous first derivative such that ψ(α) = a and ψ(β) = b. Then f is Lebesgue integrable over [a, b] iff (f ∘ ψ) · ψ′ is Lebesgue integrable over [α, β], and

∫_a^b f(x) dx = ∫_α^β f(ψ(z)) · ψ′(z) dz

holds.

We follow the proof given in [Fic64, Nr. 316]. The basic idea is to approximate the integral through step functions, which are obtained by subdividing the interval [α, β] into subintervals, and to refine the subdivisions, using uniform continuity of both ψ and ψ′ on their compact domain. So this is a fairly classical proof.

Proof 0. We may assume that f ≥ 0, otherwise we decompose f = f⁺ − f⁻ with f⁺, f⁻ ≥ 0. Also we assume that f is bounded by some constant L, otherwise we establish the property for f ∧ n with n ∈ N and, letting n → ∞, appeal to Levi's Theorem 4.8.2. Moreover we assume that ψ is increasing.

1. The interval [α, β] is subdivided through α = z0 < z1 < . . . < zn = β; put xi := ψ(zi),then a = x0 ≤ x1 ≤ . . . ≤ xn = b, and ∆zi := zi+1 − zi, and ∆xi := xi+1 − xi. Let` := maxi=1,...,n−1∆zi, then if `→ 0, the maximal difference maxi=1,...,n−1∆xi tends to 0 aswell, because ψ is uniformly continuous. This is so since the interval [α, β] is compact.

For approximating the integral ∫_α^β f(ψ(z)) · ψ′(z) dz we select ζ_i from each interval [z_i, z_{i+1}] and write

S := ∑_i f(ψ(ζ_i)) · ψ′(ζ_i) · ∆z_i.

Put ξi := ψ(ζi), hence xi ≤ ξi ≤ xi+1. By Lagrange’s Formula1 there exists τi ∈ [zi, zi+1] such

that ∆xi = ψ′(τi) ·∆zi, so that we can write as an approximation to the integral∫ ba f(x) dx

the sum

s :=∑i

f(ξi) ·∆zi

=∑i

f(ξi) · ψ(τi) ·∆zi

=∑i

f(ψ(ζi)) · ψ′(τi) ·∆zi.

If ` → 0, we know that s →∫ ba f(x) dx and S →

∫ βα f(ψ(z)) · ψ′(z) dz, so that we have to

get a handle at the difference |S − s|. We claim that this difference tends to zero, as ` → 0.Given ε > 0, we find δ > 0 such that |ψ′(ζi)− ψ′(τi)| < ε, provided ` < δ. This is so becauseψ′ is continuous, hence uniformly continuous. But then we obtain by telescoping

|S − s| ≤∑i

|f(ψ(ζi))| · |ψ′(ζi)− ψ′(τi)| ·∆zi < L · (β − α) · ε.

Thus the difference vanishes, and we obtain indeed the equality claimed above. a
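The substitution rule is easy to check numerically; a sketch assuming NumPy, with ψ(z) = z² on [1, 2] (increasing, with continuous derivative) and trapezoidal quadrature on a fine grid:

    import numpy as np

    def trapezoid(y, x):
        # trapezoidal rule: sum of (y_i + y_{i+1})/2 * (x_{i+1} - x_i)
        return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

    f = np.sin
    psi = lambda z: z ** 2
    dpsi = lambda z: 2 * z

    z = np.linspace(1.0, 2.0, 200_001)   # [alpha, beta]
    x = np.linspace(1.0, 4.0, 200_001)   # [a, b] = [psi(1), psi(2)]

    lhs = trapezoid(f(x), x)                 # integral of f over [a, b]
    rhs = trapezoid(f(psi(z)) * dpsi(z), z)  # integral of (f o psi) * psi'
    assert abs(lhs - rhs) < 1e-6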

4.11.4 Continuous Linear Functionals on Lp

After all these preparations, and an excursion into classical Calculus, we will now investigate continuous linear functionals on the Lp-spaces and show that the map f ↦ ∫_X f dµ plays an important role in identifying them. For full generality with respect to the functionals concerned we introduce signed measures here and show that they may be obtained in a fairly specific way from the (unsigned) measures considered so far.

But before entering into this discussion, some general remarks. If V is a real vector space with a norm ‖ · ‖, then a map Λ : V → R is a linear functional on V iff it is compatible with the vector space structure, i.e., iff Λ(α · x + β · y) = α · Λ(x) + β · Λ(y) holds for all x, y ∈ V and all α, β ∈ R. If Λ ≠ 0, the range of Λ is unbounded, so sup_{x∈V} |Λ(x)| = ∞. Consequently it is difficult to assign to Λ something like the sup-norm for characterizing continuity. It turns out, however, that we may investigate continuity through the behavior of Λ on the unit ball of V, so we define

‖Λ‖ := sup_{‖x‖≤1} |Λ(x)|.

Call Λ bounded iff ‖Λ‖ < ∞. Then Λ is continuous iff Λ is bounded, see Exercise 4.26.

Now let µ be a finite measure with p and q conjugate to each other, see page 472. Define forg ∈ Lq(µ) the linear functional

Λg(f) :=

∫Xf · g dµ

1Recall that Lagrange’s Formula says the following: Assume that g is continuous on the interval [c, d] with acontinuous derivative g′ on the open interval ]c, d[. Then there exists t ∈]c, d[ such that g(d)−g(c) = g′(t)·(d−c).


on Lp(µ), then we know from Holder’s inequality in Proposition 4.11.13 that

‖Λg‖ ≤ sup‖f‖p≤1

∫X|f · g| dµ ≤ ‖g‖q

That was easy. But what about the converse? Given a bounded linear functional Λ on Lp(µ), does there exist g ∈ Lq(µ) with Λ = Λ_g? The strategy we will pursue — constructing a measure A ↦ Λ(χ_A) from Λ — will not work in general, since it requires Λ(f) ≥ 0, provided f ≥ 0. So we have to assume that Λ maps positive functions to a non-negative value. Call Λ positive iff this is the case.

Summarizing, we consider maps Λ : Lp(µ) → R with these properties:

Linearity: Λ(α · f + β · g) = α · Λ(f) + β · Λ(g) holds for all f, g ∈ Lp(µ) and all α, β ∈ R.

Boundedness: ‖Λ‖ := sup_{‖f‖_p=1} |Λ(f)| < ∞ (hence |Λ(f)| ≤ ‖Λ‖ · ‖f‖_p for all f).

Positiveness: f ≥ 0 ⇒ Λ(f) ≥ 0 (note that f ≥ 0 means f′ ≥ 0 µ-almost everywhere for each representative f′ of f, by our convention).

We will first work on this restricted problem, and then we will expand the answer. This will require a slight generalization: we will talk about signed measures rather than about measures.

Let’s jump right in:

Theorem 4.11.30 Assume that µ is a finite measure on (X, A), 1 ≤ p < ∞, and that Λ is a bounded positive linear functional on Lp(µ). Then there exists a unique g ∈ Lq(µ) such that

Λ(f) = ∫_X f · g dµ

holds for each f ∈ Lp(µ). In addition, ‖Λ‖ = ‖g‖_q.

This is our line of attack: we will first see that we obtain from Λ a finite measure ν on A such that ν << µ. The Radon-Nikodym Theorem will then give us a density g := dν/dµ which will turn out to be the function we are looking for. This is shown by separating the cases p = 1 and p > 1.

Proof 1. Define for A ∈ Aν(A) := Λ(χA).

Then A ⊆ B implies χA ≤ χB, hence Λ(χA) ≤ Λ(χB). Because Λ is monotone, ν is monotoneas well. Since Λ is linear, we have ν(∅) = 0, and ν is additive. Let (An)n∈N be an increasingsequence of measurable sets with A :=

⋃n∈NAn, then χA\An → 0, and thus

ν(A)− ν(An) = ‖χA\An‖pp = Λ(χA\An)p → 0,

since Λ is continuous. Thus Λ is a finite measure on A (note ν(X) = Λ(1) < ∞). Ifµ(A) = 0, we see that χA =µ 0, thus Λ(χA) = 0 (we are dealing with the =µ-class of χA),so that ν(A) = 0. Thus ν << µ, and the Radon-Nikodym Theorem 4.11.26 provides us withg ∈ L1(µ) with

Λ(χA) = ν(A) =

∫Ag dµ


for all A ∈ A. Since the integral as well as Λ are linear, we obtain from this

Λ(f) =

∫Xf · g dµ

for all step functions f .

2. We have to show that g ∈ Lq(µ). Consider these cases.

Case p = 1: We have for each A ∈ A

|∫_A g dµ| = |Λ(χ_A)| ≤ ‖Λ‖ · ‖χ_A‖_1 = ‖Λ‖ · µ(A).

But this implies |g| ≤ ‖Λ‖ µ-almost everywhere, thus ‖g‖_∞ ≤ ‖Λ‖.

Case 1 < p <∞: Lett := χx∈X|g(x)≥0 − χx∈X|g(x)<0,

then |g| = t · g, and t is measurable, since g is. Define An := x ∈ X | |g(x)| ≤ n, andput f := χAn · |g|q−1 · t. Then

|f |p · χAn = |g|(q−1)·p · χAn= |g|q · χAn ,

χAn · (f · g) = χAn · |g|q−1 · t · g= χAn · |g|q · t,

thus ∫An

|g|q dµ =

∫An

f · g dµ = Λ(f) ≤ ‖Λ‖ ·(∫An

|g|q dµ)1/p

Since 1− 1/p = 1/q, dividing by the factor ‖Λ‖ and raising the result by q yields∫En

|g|q dµ ≤ ‖Λ‖q.

By Lebesgue’s Dominated Convergence Theorem 4.8.6 we obtain that ‖g‖q ≤ ‖Λ‖ holds,hence g ∈ Lq(µ), and ‖g‖q = ‖Λ‖.

The proof is completed now by the observation that Λ(f) = ∫_X f · g dµ holds for all step functions f. Since both sides of this equation represent continuous functionals on Lp(µ), and since the step functions are dense in Lp(µ) by Corollary 4.11.19, the equality holds on all of Lp(µ). ⊣

This representation says something only for positive linear functionals; what about the rest? It turns out that we need to extend our notion of measures to signed measures, and that a very similar statement holds for them (of course we would have to explain what the integral with respect to a signed measure is, but this will work out very smoothly). So what we will do next is to define signed measures, and to relate them to the measures with which we have worked until now. We follow essentially Halmos' exposition [Hal50, § 29].

Definition 4.11.31 A map µ : A → R is said to be a signed measure iff µ is σ-additive, i.e., iff µ(⋃_{n∈N} A_n) = ∑_{n∈N} µ(A_n), whenever (A_n)_{n∈N} is a sequence of mutually disjoint sets in A.


Clearly, µ(∅) = 0, since a signed measure µ is finite, so the distinguishing feature is theabsence of monotonicity. It turns out, however, that we can partition the whole space X intoa positive and a negative part, that restricting µ to these parts will yield a measure each, andthat µ can be written in this way as the difference of two measures.

Fix a signed measure µ. Call N ∈ A a negative set iff µ(A ∩N) ≤ 0 for all A ∈ A; a positiveset is defined accordingly. It is immediate that the difference of two negative sets is a negativeset again, and that the union of a disjoint sequence of negative sets is a negative set as well.Thus the union of a sequence of negative sets is negative again.

Proposition 4.11.32 Let µ be a signed measure on A. Then there exists a pair X+ andX− of disjoint measurable sets such that X+ is a positive set, X− is a negative set. Thenµ+(B) := µ(B ∩ X+) and µ−(B) := −µ(B ∩ X−) are finite measures on A such that µ =µ+ − µ−. The pair µ+ and µ− is called the Jordan Decomposition of the signed measure µ.

JordanDecompo-

sitionµ+, µ−

Proof 1. Defineα := infµ(A) | A ∈ A is negative > −∞.

Assume that (An)n∈N is a sequence of measurable sets with µ(An) → α, then we know thatA :=

⋃n∈NAn is negative again with α = µ(A). In fact, put B1 := A1, Bn+1 := An+1 \ Bn,

then each Bn is negative, we have

µ(A) = µ(⋃n∈N

Bn) =∑n∈N

µ(Bn) = limn→∞

µ(An)

by telescoping.

2. We claim that

X⁺ := X \ A

is a positive set. In fact, assume that this is not true — now this is truly the tricky part — then there exists a measurable E_0 ⊆ X⁺ with µ(E_0) < 0. E_0 cannot be a negative set, because otherwise A ∪ E_0 would be a negative set with µ(A ∪ E_0) = µ(A) + µ(E_0) < α, which contradicts the construction of A. Let k_1 be the smallest positive integer such that E_0 contains a measurable set E_1 with µ(E_1) ≥ 1/k_1. Now look at E_0 \ E_1. We have

µ(E_0 \ E_1) = µ(E_0) − µ(E_1) ≤ µ(E_0) − 1/k_1 < 0.

We may repeat the same consideration now for E_0 \ E_1; let k_2 be the smallest positive integer such that E_0 \ E_1 contains a measurable set E_2 with µ(E_2) ≥ 1/k_2. This produces a sequence of disjoint measurable sets (E_n)_{n∈N} with

E_{n+1} ⊆ E_0 \ (E_1 ∪ . . . ∪ E_n),

and since ∑_{n∈N} µ(E_n) is finite (because ⋃_{n∈N} E_n ∈ A, and µ takes only finite values), we infer that lim_{n→∞} 1/k_n = 0.

3. Let F ⊆ F0 := E0 \⋃n∈NEn, and assume that µ(F ) ≥ 0. Let ` be the largest positive

integer with µ(F ) ≥ 1/`. Since kn → 0, as n → ∞, we find m ∈ N with 1/` ≥ 1/km. SinceF ⊆ E0\(E1∪. . .∪Em), this yields a contradiction. But F0 is disjoint from A, and since

µ(F0) = µ(E0)−∑n∈N

µ(En) ≤ µ(E0) < 0,

Page 498: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

486 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

we have arrived at a contradiction. Thus µ(E0) ≥ 0.

4. Now define µ+ and µ− as the traces of µ on X+ and X− := A, resp., then the assertionfollows. a

It should be noted that the decomposition of X into X⁺ and X⁻ is not unique, but the decomposition of µ into µ⁺ and µ⁻ is. Assume that X⁺_1 with X⁻_1 and X⁺_2 with X⁻_2 are two such decompositions. Let A ∈ A; then we have A ∩ (X⁺_1 \ X⁺_2) ⊆ A ∩ X⁺_1, hence µ(A ∩ (X⁺_1 \ X⁺_2)) ≥ 0; on the other hand, A ∩ (X⁺_1 \ X⁺_2) ⊆ A ∩ X⁻_2, thus µ(A ∩ (X⁺_1 \ X⁺_2)) ≤ 0, so that we have µ(A ∩ (X⁺_1 \ X⁺_2)) = 0, which implies µ(A ∩ X⁺_1) = µ(A ∩ X⁺_2). Thus uniqueness of µ⁺ and µ⁻ follows.

Given a signed measure µ with a Jordan decomposition µ+ and µ−, we define a (positive)measure |µ| := µ+ + µ−; |µ| is called the total variation of µ. It is clear that |µ| is a finite

Total vari-ation |µ|

measure on A. A set A ∈ A is called a µ-nullset iff µ(B) = 0 for every B ∈ A withB ⊆ A; hence A is a µ-nullset iff A is a |µ|-nullset iff |µ|(A) = 0. In this way, we can definethat a property holds µ-everywhere also for signed measures, viz., by saying that it holds|µ|-everywhere (in the traditional sense). Also the relation µ << ν of absolute continuitybetween the signed measure µ and the positive measure ν can be redefined as saying thateach ν-nullset is a µ-nullset. Thus µ << ν is equivalent to |µ| << ν and to both µ+ << ν andµ− << ν. For the derivatives, it is easy to see that

dν=dµ+

dν− dµ−

dνand

d|µ|dν

=dµ+

dν+dµ−

hold.

We define integrability of a measurable function through |µ| by putting

Lp(|µ|) := Lp(µ⁺) ∩ Lp(µ⁻),

and define Lp(µ) again as the set of equivalence classes.

These observations provide a convenient entry point into discussing complex measures. Call µ : A → C a (complex) measure iff µ is σ-additive, i.e., iff µ(⋃_{n∈N} A_n) = ∑_{n∈N} µ(A_n) for each sequence (A_n)_{n∈N} of mutually disjoint sets in A. Then it can be easily shown that µ can be written as µ = µ_r + i · µ_c with (real) signed measures µ_r and µ_c, which in turn have a Jordan decomposition and consequently a total variation each. In this way the Lp-spaces can be defined also for complex measures and complex measurable functions; the reader is referred to [Rud74] or [HS65] for further information.

Returning to the main current of the discussion, we are able to state the general representation of continuous linear functionals on an Lp(µ)-space. We need only sketch the proof, mutatis mutandis, since the main work has already been done in the proof of Theorem 4.11.30 for the real-valued and non-negative case.

Theorem 4.11.33 Assume that µ is a finite measure on (X, A), 1 ≤ p < ∞, and that Λ is a bounded linear functional on Lp(µ). Then there exists a unique g ∈ Lq(µ) such that

Λ(f) = ∫_X f · g dµ

holds for each f ∈ Lp(µ). In addition, ‖Λ‖ = ‖g‖_q.

Proof ν(A) := Λ(χA) defines a signed measure on A with ν << µ. Let h be the Radon-Nikodym derivative of ν with respect to µ, then h ∈ Lq(µ) and

Λ(f) =

∫Xf · h dµ

are shown as above. a

It should be noted that Theorem 4.11.33 holds also for σ-finite measures, and that it is true for 1 < p < ∞ in the case of general (positive) measures, see, e.g., [Els99, § VII.3] for a discussion.

The case of continuous linear functionals on the space L∞(µ) is considerably more involved. Example 4.11.21 indicates already that these spaces play a special role. Looking back at the discussion above, we found that for p < ∞ the map A ↦ Λ(χ_A) yields a measure, and this measure was instrumental through the Radon-Nikodym Theorem for providing the factor which could be chosen to represent the linear functional. This argument, however, is not available for the case p = ∞, since the norm there is not derived from an integral. It can be shown, however, that continuous linear functionals have an integral representation with respect to finitely additive set functions; in fact, [HS65, Theorem 20.35] or [DS57, Theorem IV.8.16] show that the continuous linear functionals on L∞(µ) are in a one-to-one correspondence with all finitely additive set functions ξ such that ξ << µ. Note that this requires an extension of integration to not necessarily σ-additive set functions.

4.11.5 Disintegration

We provide another application of the Radon-Nikodym Theorem.

One encounters occasionally the need to decompose a measure on a product of two spaces. Consider this scenario. Given a measurable space (X, A) as an input and (Y, B) as an output space, let

(µ ⊗ K)(B) = ∫_X K(x)(B_x) dµ(x)

be the probability for 〈x_1, x_2〉 ∈ B ∈ A ⊗ B with µ as the initial distribution and K : (X, A) ⇝ (Y, B) as the transition law, where B_x := {y | 〈x, y〉 ∈ B} is the cut of B at x (see Example 4.9.4; think of an epidemic which is set off according to µ and propagates according to K). Assume that you want to reverse the process: given F ∈ B, you put

ν(F) := S(π_Y)(µ ⊗ K)(F) = (µ ⊗ K)(X × F),

so this is the probability that your process hits an element of F. Can you find a stochastic relation L : (Y, B) ⇝ (X, A) such that

(µ ⊗ K)(B) = ∫_Y L(y)(B^y) dν(y)

holds, with B^y := {x | 〈x, y〉 ∈ B}? The relation L is the converse of K given µ. It is probably not particularly important that the measure on the product has the shape µ ⊗ K, so we state the problem in such a way that we are given a measure on a product of two measurable spaces, and the question is


whether we can decompose it into the product of a projection onto one space, and a stochastic relation between the spaces.

This problem is of course most easily dealt with when one can deduce that the measure is the product of measures on the coordinate spaces; probabilistically, this would correspond to the distribution of two independent random variables. But sometimes one is not so lucky, and there is some hidden dependence, or one simply cannot assess the degree of independence. Then one has to live with a somewhat weaker result: in this case one can decompose the measure into a measure on one component and a transition probability. This will be made specific in the discussion to follow.

Because it will not cost substantially more effort, we will treat the question a bit more generally. Let (X, A), (Y, B), and (Z, C) be measurable spaces, assume that µ ∈ S(X, A), and let f : X → Y and g : X → Z be measurable maps. Then µ_f := S(f)(µ) and µ_g := S(g)(µ) define subprobabilities on (Y, B) resp. (Z, C); µ_f and µ_g can be interpreted as the probability distributions of f resp. g under µ.

We will show that we can represent the joint distribution as

µ({x ∈ X | f(x) ∈ B, g(x) ∈ C}) = ∫_B K(y)(C) dµ_f(y),

where K : (Y, B) ⇝ (Z, C) is a suitable stochastic relation. This will require Z to be a Polish space with C = B(Z).

Let us see how this corresponds to the initially stated problem. Suppose X := Y × Z with A = B ⊗ C, and let f := π_Y, g := π_Z; then

µ_f(B) = µ(B × Z),
µ_g(C) = µ(Y × C),
µ(B × C) = µ({x ∈ X | f(x) ∈ B, g(x) ∈ C}).

Granted that we have established the decomposition, we can then write

µ(B × C) = ∫_B K(y)(C) dµ_f(y);

thus we have decomposed the probability on the product into a probability on the first component, and, conditioned on the value the first component may take, a probability on the second factor.

Definition 4.11.34 Using the notation from above, K is called a regular conditional distribution of g given f iff

µ({x ∈ X | f(x) ∈ B, g(x) ∈ C}) = ∫_B K(y)(C) dµ_f(y)

holds for each B ∈ B and C ∈ C, where K : (Y, B) ⇝ (Z, C) is a stochastic relation between (Y, B) and (Z, C). If only y ↦ K(y)(C) is B-measurable for all C ∈ C, then it will be called a conditional distribution of g given f.


The existence of a regular conditional distribution will be established, provided Z is Polish with C = B(Z). This will be accomplished in several steps: first the existence of a conditional distribution will be shown using the Radon-Nikodym Theorem. This construction will then be examined further. It will turn out that there exists a set of measure zero outside of which the conditional distribution behaves like a regular one, but at first sight only on an algebra of sets, not on the entire σ-algebra. But don't worry, the second step will apply a classical extension argument and yield a regular conditional distribution on the Borel sets, just as we want it. The proofs are actually a kind of round trip through the first principles of measure theory, where the Radon-Nikodym Theorem together with the classical Hahn Extension Theorem are the main vehicles. They display also some nice and helpful proof techniques.

We fix (X, A), (Y, B), and (Z, C) as measurable spaces, assume that µ ∈ S(X, A), and take f : X → Y and g : X → Z to be measurable maps. The measures µ_f := S(f)(µ) and µ_g := S(g)(µ) are defined as above as the distributions of f resp. g under µ.

The existence of a conditional distribution of g given f is established first, and it is shown that it is essentially unique.

Lemma 4.11.35 Using the notation from above,

1. there exists a conditional distribution K_0 of g given f;

2. if there is another conditional distribution K′_0 of g given f, then there exists for any C ∈ C a set N_C ∈ B with µ_f(N_C) = 0 such that K_0(y)(C) = K′_0(y)(C) for all y ∉ N_C.

Proof 1. Fix C ∈ C, then

$C(B) := µ(f−1[B]∩ g−1

[C])

defines a subprobability measure $C on B which is absolutely continuous with respect to µg,because µg(B) = 0 implies $C(B) = 0. The Radon-Nikodym Theorem 4.11.26 now gives adensity hC ∈ F(Y,B) with

$C(B) =

∫BhC dµf

for all B ∈ B. Setting K0(y)(C) := hC(y) yields the desired conditional distribution.

2. Suppose K ′0 is another conditional distribution of g given f , then we have

∀B ∈ B :

∫BK0(y)(C) dµf (y) =

∫BK0(y)(C) dµf (y),

for all C ∈ C, which implies that the set on which K0(·)(C) disagrees with K ′0(·)(C) is µf -null.a

Essential uniqueness may be strengthened if the σ-algebra C is countably generated, and if the conditional distribution is regular.

Lemma 4.11.36 Assume that K and K′ are regular conditional distributions of g given f, and that C has a countable generator. Then there exists a set N ∈ B with µ_f(N) = 0 such that K(y)(C) = K′(y)(C) for all C ∈ C and all y ∉ N.

Page 502: TECHNICAL REPORTS IN COMPUTER SCIENCE Technische ...€¦ · graduate course on Markov transition systems for computer scientists. It turned out that I had to devote most of the time

490 CHAPTER 4. MEASURES FOR PROBABILISTIC SYSTEMS

Proof If C_0 is a countable generator of C, then

C_f := {⋂E | E ⊆ C_0 is finite}

is a countable generator of C as well, and C_f is closed under finite intersections; note that Z ∈ C_f. Construct for D ∈ C_f the set N_D ∈ B outside of which K(·)(D) and K′(·)(D) coincide, and define

N := ⋃_{D∈C_f} N_D ∈ B.

Evidently, µ_f(N) = 0. We claim that K(y)(C) = K′(y)(C) holds for all C ∈ C, whenever y ∉ N. In fact, fix y ∉ N, and let

C_1 := {C ∈ C | K(y)(C) = K′(y)(C)},

then C_1 contains C_f by construction, and is closed under complements and countable disjoint unions. Thus C = σ(C_f) ⊆ C_1 by the π-λ-Theorem 1.6.30, and we are done. ⊣

We will show now that a regular conditional distribution of g given f exists. This will be done through several steps, starting from the construction of a conditional distribution K₀:

① A set N_a ∈ B is constructed with µ_f(N_a) = 0 such that K₀(y) is additive on a countable generator C_z of C.

② We construct a set N_z ∈ B with µ_f(N_z) = 0 such that K₀(y)(Z) ≤ 1 for y ∉ N_z.

③ For each element G of C_z we will find a set N_G ∈ B with µ_f(N_G) = 0 such that K₀(y)(G) can be approximated from inside through compact sets, whenever y ∉ N_G.

④ Then we will combine all these sets of µ_f-measure zero to produce a set N ∈ B with µ_f(N) = 0 outside of which K₀(y) is σ-additive on the generator C_z, hence can be extended to a measure on all of C.

Well, this looks like a full program, so let us get on with it.

Theorem 4.11.37 Given measurable spaces (X,A) and (Y,B), a Polish space Z, a subprobability µ ∈ S(X,A), and measurable maps f : X → Y, g : X → Z, there exists a regular conditional distribution K of g given f. K is uniquely determined up to a set of µ_f-measure zero.

Proof 0. Since Z is a Polish space, its topology has a countable base. We infer from Lemma 4.3.2 that B(Z) has a countable generator C. Then the Boolean algebra C₁ generated by C is also a countable generator of B(Z).

1. Since C₁ is countable, enumerate it as (C_n)_{n∈N}. For each C_n ∈ C₁ we find by Proposition 4.10.12 a sequence (E_{n,k})_{k∈N} of compact sets in Z with

E_{n,1} ⊆ E_{n,2} ⊆ E_{n,3} ⊆ … ⊆ C_n

such that

µ_g(C_n) = sup_{k∈N} µ_g(E_{n,k}).

Then the Boolean algebra C_z generated by C ∪ {E_{n,k} | n, k ∈ N} is also a countable generator of B(Z).


2. From the construction of the conditional distribution of g given f we infer that for disjoint C₁, C₂ ∈ C_z and every B ∈ B

∫_B K₀(y)(C₁ ∪ C₂) dµ_f(y) = µ({x ∈ X | f(x) ∈ B, g(x) ∈ C₁ ∪ C₂})
                            = µ({x ∈ X | f(x) ∈ B, g(x) ∈ C₁}) + µ({x ∈ X | f(x) ∈ B, g(x) ∈ C₂})
                            = ∫_B K₀(y)(C₁) dµ_f(y) + ∫_B K₀(y)(C₂) dµ_f(y).

Thus there exists N_{C₁,C₂} ∈ B with µ_f(N_{C₁,C₂}) = 0 such that

K₀(y)(C₁ ∪ C₂) = K₀(y)(C₁) + K₀(y)(C₂)

for y ∉ N_{C₁,C₂}. Because C_z is countable, we may deduce (by taking the union of the N_{C₁,C₂} over all pairs C₁, C₂) that there exists a set N_a ∈ B such that K₀ is additive outside N_a, and µ_f(N_a) = 0. This accounts for part ① of the plan above. ① ✓

3. By the previous arguments it is easy to construct a set N_z ∈ B with µ_f(N_z) = 0 such that K₀(y)(Z) ≤ 1 for y ∉ N_z (part ②). ② ✓

4. Because

∫_Y K₀(y)(C_n) dµ_f(y) = µ(f⁻¹[Y] ∩ g⁻¹[C_n])
                        = µ_g(C_n)
                        = sup_{k∈N} µ_g(E_{n,k})
                        = sup_{k∈N} ∫_Y K₀(y)(E_{n,k}) dµ_f(y)
                        = ∫_Y sup_{k∈N} K₀(y)(E_{n,k}) dµ_f(y)    (Levi's Theorem 4.8.2),

we find for each n ∈ N a set N_n ∈ B with

∀y ∉ N_n: K₀(y)(C_n) = sup_{k∈N} K₀(y)(E_{n,k})

and µ_f(N_n) = 0. This accounts for part ③. ③ ✓

5. Now we may begin to work on part ④. Put

N := N_a ∪ N_z ∪ ⋃_{n∈N} N_n;

then N ∈ B with µ_f(N) = 0. We claim that K₀(y) is a premeasure on C_z for each y ∉ N. It is clear that K₀(y) is additive on C_z, hence monotone, so only σ-additivity has to be demonstrated: let (D_ℓ)_{ℓ∈N} be a sequence in C_z that is monotonically decreasing with

η := inf_{ℓ∈N} K₀(y)(D_ℓ) > 0,


then we have to show that

⋂_{ℓ∈N} D_ℓ ≠ ∅.

We now approximate the sets D_ℓ by compact sets; we may assume that D_ℓ = C_{n_ℓ} for some n_ℓ (otherwise the sets are compact themselves). By construction we find for each ℓ ∈ N a compact set E_{n_ℓ,k_ℓ} ⊆ D_ℓ with

K₀(y)(C_{n_ℓ} \ E_{n_ℓ,k_ℓ}) < η · 2^{−(ℓ+1)};

then

E_r := ⋂_{ℓ=1}^{r} E_{n_ℓ,k_ℓ} ⊆ C_{n_r} = D_r

defines a decreasing sequence of compact sets with

K₀(y)(E_r) ≥ K₀(y)(C_{n_r}) − Σ_{ℓ=1}^{r} K₀(y)(C_{n_ℓ} \ E_{n_ℓ,k_ℓ}) > η − Σ_{ℓ=1}^{r} η · 2^{−(ℓ+1)} ≥ η/2,

thus E_r ≠ ∅ (note that Σ_{ℓ≥1} 2^{−(ℓ+1)} = 1/2). Since the E_r are compact and decreasing, we know that the sequence has a nonempty intersection (otherwise one of the E_r would already be empty). We may infer

⋂_{ℓ∈N} D_ℓ ⊇ ⋂_{r∈N} E_r ≠ ∅.

6. The Extension Theorem 1.6.29 now tells us that there exists a unique extension of K₀(y) from C_z to a measure K(y) on σ(C_z) = B(Z), whenever y ∉ N. If, however, y ∈ N, then we define K(y) := ν, where ν ∈ S(Z) is arbitrary. Because

∫_B K(y)(C) dµ_f(y) = ∫_B K₀(y)(C) dµ_f(y) = µ({x ∈ X | f(x) ∈ B, g(x) ∈ C})

holds for C ∈ C_z, the π-λ-Theorem 1.6.30 asserts that this equality is valid for all C ∈ B(Z) as well.

Measurability of y ↦ K(y)(C) needs to be shown, and then we are done. We do this through the principle of good sets: put

E := {C ∈ B(Z) | y ↦ K(y)(C) is B-measurable}.

Then E is a σ-algebra, and E contains the generator C_z by construction, thus E = B(Z). ④ ✓

⊣

The scenario in which X = Y × Z for a measurable space (Y,B) and a Polish space Z, with A = B ⊗ B(Z) and f and g the projections, deserves particular attention. In this case we decompose a measure on A into its projection onto Y and a conditional distribution of the projection onto Z given the projection onto Y. This is sometimes called the disintegration of a measure µ ∈ S(Y × Z).

We state the corresponding proposition explicitly, since it is usually needed in this specialized form.


Proposition 4.11.38 Given a measurable space (Y,B) and a Polish space Z, there exists for every subprobability µ ∈ S(Y × Z, B ⊗ B(Z)) a regular conditional distribution of π_Z given π_Y, that is, a stochastic relation K : (Y,B) ⇝ (Z,B(Z)) such that

µ(E) = ∫_Y K(y)(E_y) dS(π_Y)(µ)(y)

for all E ∈ B ⊗ B(Z). ⊣

The construction requires a Polish space as one of the factors. The proof shows that it is indeed tightness which saves the day: otherwise it would be difficult to make sure that the conditional distribution constructed above is σ-additive. We know from Proposition 4.10.12 that finite measures on a Polish space are tight. Examples show that an assumption of this kind is necessary: [Kel72] constructs a product measure on spaces which fail to be Polish, for which no disintegration exists.
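For finite Y and Z the disintegration of Proposition 4.11.38 reduces to the familiar factorization of a joint distribution into a marginal and a family of conditionals, and none of the topological machinery is needed. A minimal sketch under that assumption (plain Python, hypothetical toy data):

from fractions import Fraction

# A joint subprobability mu on Y x Z, given pointwise (toy data).
mu = {("y1", "z1"): Fraction(1, 3), ("y1", "z2"): Fraction(1, 6),
      ("y2", "z1"): Fraction(1, 4), ("y2", "z2"): Fraction(1, 8)}

Z = {z for (_, z) in mu}

# Marginal S(pi_Y)(mu) on Y.
marginal = {}
for (y, _), p in mu.items():
    marginal[y] = marginal.get(y, Fraction(0)) + p

# Disintegration: K(y)({z}) = mu({(y, z)}) / marginal({y}); on the null
# set where the marginal vanishes, K(y) may be chosen arbitrarily.
K = {y: {z: mu.get((y, z), Fraction(0)) / m for z in Z}
     for y, m in marginal.items() if m > 0}

# Reassemble mu(E) = sum_y K(y)(E_y) * marginal({y}) on the rectangles
# E = {(y, z)}, whose cuts E_y are singletons.
for (y, z), p in mu.items():
    assert K[y][z] * marginal[y] == p

The total mass here is 7/8, so µ is a proper subprobability; each K(y) is nevertheless a probability on Z, carrying all the conditional information — exactly the role the kernel K plays in the Polish case.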

4.12 Bibliographic Notes

Most topics of this chapter are fairly standard, hence there are plenty of sources to mention. One of my favourite texts is the rich compendium written by [Bog07], not to forget Fremlin's massive [Fre08] or Halmos' classic [Hal50]. The discussion on Souslin's operation 𝒜(A) on a σ-algebra A is heavily influenced by Srivastava's representation [Sri98] of this topic, but see also [Par67, Arv76, Kel72]. The measure extension is taken from [Lub74], following a suggestion by S. M. Srivastava; the extension of a stochastic relation is from [Dob12b]. The approach to integration centering around B. Levi's Theorem is taken mostly from the elegant representation by Doob [Doo94, Chapter VI], see also [Els99, Kapitel IV]. The introduction of the Daniell integral follows essentially [Bog07, Sec. 7.8], see also [Kel72]. The logic CSL is defined and investigated in terms of model checking in [BHHK03], the stochastic interpretation is taken from [Dob07], see also [DP03]. The Hutchinson metric is discussed in detail in Edgar's monograph [Edg98], from which the present proof of Proposition 4.10.13 is taken. There are many fine books on Banach spaces, Hilbert spaces and the application to Lp spaces; my sources are [Doo94, Hal50, Rud74, DS57, Loo53, Sch70]. The exposition of projective limits and of disintegration basically follows [Par67, Chapter V] with an occasional glimpse at [Bog07].

4.13 Exercises

Exercise 4.1 Assume that A = σ(A₀). Show that the weak-*-σ-algebra ℘℘℘(A) on M(X,A) is the initial σ-algebra with respect to {ev_A | A ∈ A₀}.

Show also that both S(X,A) and P(X,A) are measurable subsets of M(X,A).

Exercise 4.2 Let (X,τ) be a topological space and (Y,d) a metric space. Show that each continuous function X → Y is also Baire measurable.

Exercise 4.3 Let (X,d) be a separable metric space, µ ∈ M(X,B(X)). Show that x ∈ supp(µ) iff µ(U) > 0 for each open neighborhood U of x.


Exercise 4.4 Let (X,A,µ) be a finite measure space. Show that norm convergence in L∞(X,A,µ) implies convergence almost everywhere (f_n →_{a.e.} f, provided ‖f_n − f‖_∞^µ → 0). Give an example showing that the converse is false.

Exercise 4.5 If A is a σ-algebra on X and B ⊆ X with B ∉ A, then

{(A₁ ∩ B) ∪ (A₂ ∩ (X \ B)) | A₁, A₂ ∈ A}

is the smallest σ-algebra σ(A ∪ {B}) on X containing A and B. If τ is a topology on X and H ⊆ X with H ∉ τ, then

{G₁ ∪ (G₂ ∩ H) | G₁, G₂ ∈ τ}

is the smallest topology τ_H on X containing τ and H. Show that B(τ_H) = σ(B(τ) ∪ {H}).

Exercise 4.6 Let (X,A,µ) be a finite measure space, B ∉ A, and β := α · µ_*(B) + (1 − α) · µ^*(B) with 0 ≤ α ≤ 1 (µ_* and µ^* denote inner and outer measure). Then there exists a measure ν on σ(A ∪ {B}) which extends µ such that ν(B) = β. (Hint: Exercise 4.5.)

Exercise 4.7 Given a measurable space (X,A) and f ∈ F(X,A) with f ≥ 0, show that there exists a decreasing sequence (f_n)_{n∈N} of step functions f_n ∈ F(X,A) with

f(x) = inf_{n∈N} f_n(x)

for all x ∈ X.

Exercise 4.8 Let f_i : X_i → Y_i be A_i-B_i-measurable maps for i ∈ I. Show that

f : ∏_{i∈I} X_i → ∏_{i∈I} Y_i, (x_i)_{i∈I} ↦ (f_i(x_i))_{i∈I}

is ⊗_{i∈I} A_i-⊗_{i∈I} B_i-measurable. Conclude that the kernel of f,

ker(f) := {⟨x, x′⟩ | f(x) = f(x′)},

is a measurable subset of X × X, whenever f : (X,A) → (Y,B) is measurable and B is separable.

Exercise 4.9 Let f : X → Y be A-B-measurable, and assume that B is separable. Show that the graph graph(f) := {⟨x, f(x)⟩ | x ∈ X} of f is a member of A ⊗ B.

Exercise 4.10 Let χ_A be the indicator function of the set A. Show that

1. A ⊆ B iff χ_A ≤ χ_B,

2. χ_{⋃_{n∈N} A_n} = sup_{n∈N} χ_{A_n} and χ_{⋂_{n∈N} A_n} = inf_{n∈N} χ_{A_n},

3. χ_{A∆B} = |χ_A − χ_B| = χ_A + χ_B (mod 2); conclude that the power set (P(X), ∆) is a commutative group with A∆A = ∅,

4. (⋃_{n∈N} A_n) ∆ (⋃_{n∈N} B_n) ⊆ ⋃_{n∈N} (A_n ∆ B_n).

Exercise 4.11 Let (X,A,µ) be a finite measure space, and put d(A,B) := µ(A∆B) for A,B ∈ A. Show that (A,d) is a complete pseudometric space.


Exercise 4.12 (This exercise draws heavily on Exercises 4.5 and 4.6.) Let X := [0,1] with λ the Lebesgue measure on the Borel sets of X. There exists a set B ⊆ X with λ_*(B) = 0 and λ^*(B) = 1 by Lemma 1.7.7, so that B ∉ B(X).

1. Show that (X, τ_B) is a Hausdorff space with a countable base, where τ_B is the smallest topology containing the interval topology on [0,1] and B (see Exercise 4.5).

2. Extend λ to a measure µ with α = 1/2 in Exercise 4.6.

3. Show that inf{µ(G) | G ⊇ X \ B and G is τ_B-open} = 1, but µ(X \ B) = 1/2. Thus µ is not regular (note that (X, τ_B) is not a metric space, so this does not contradict the regularity results for metric spaces).

Exercise 4.13 Prove Proposition 4.3.18.

Exercise 4.14 Let K : (X,A) ⇝ (Y,B) be a transition kernel.

1. Assume that f ∈ F₊(Y,B) is integrable with respect to K(x) for all x ∈ X. Show that

K(f)(x) := ∫_Y f dK(x)

defines a measurable function K(f) : X → R₊.

2. Assume that x ↦ K(x)(Y) is bounded. Define for B ∈ B

K(µ)(B) := ∫_X K(x)(B) dµ(x).

Show that K : S(X,A) → S(Y,B) is ℘℘℘(X,A)-℘℘℘(Y,B)-measurable (see Example 2.4.8).

Exercise 4.15 Let µ ∈ S(X,A) be a subprobability measure on (X,A), and let K : (X,A) ⇝ (Y,B) be a stochastic relation. Assume that f : X × Y → R is bounded and measurable. Show that

∫_{X×Y} f d(µ ⊗ K) = ∫_X (∫_Y f_x dK(x)) dµ(x)

(µ ⊗ K is defined in Example 4.9.4 on page 407).

Exercise 4.16 Let (X,A) and (Y,B) be measurable spaces and D ∈ A ⊗ B. Show that the map

M(Y,B) × X → R, ⟨ν, x⟩ ↦ ν(D_x)

is ℘℘℘(Y,B)⊗A-B(R)-measurable (the weak-*-σ-algebra ℘℘℘(Y,B) has been defined in Section 4.1.2).

Exercise 4.17 Let (S,A) be a measurable space, and assume that A is countably generated. Show that a stochastic effectivity function P : S → E(S) is A-B(τ)-measurable, where τ is the Priestley topology on ℘℘℘(S,A). This topology is defined in Example 1.5.58 on page 45.

Exercise 4.18 Show that the category of analytic spaces with measurable maps is not closed under taking pushouts. Hint: Show that the pushout of X/α₁ and X/α₂ is X/(α₁ ∪ α₂) for equivalence relations α₁ and α₂ on a Polish space X. Then use Proposition 4.4.22 and Example 4.4.29.


Exercise 4.19 Let X and Y be Polish spaces with a transition kernel K : X ⇝ Y. The equivalence relations α on X and β on Y are assumed to be smooth with determining sequences (A_n)_{n∈N} resp. (B_n)_{n∈N} of Borel sets. Put I_α := σ({A_n | n ∈ N}) and J_β := σ({B_n | n ∈ N}). Show that the following statements are equivalent:

1. K : (X, I_α) ⇝ (Y, J_β) is a transition kernel.

2. (α, β) is a congruence for K.

3. α ⊆ ker(S(η_β) ∘ K).

4. There exists a transition kernel K′ : (X, I_α) ⇝ (Y, J_β) such that (i_α, j_β) : K → K′ is a morphism, where the measurable maps i_α : (X,B(X)) → (X, I_α) and j_β : (Y,B(Y)) → (Y, J_β) are given by the respective identities.

Exercise 4.20 Let S_X be the set of all smooth equivalence relations on the Polish space X, ordered by inclusion. Then S_X is closed under countable infima, and ∆_X ⊆ ρ ⊆ ∇_X, where ∇_X := X × X is the universal relation.

1. ρ ↦ {A ∈ B(X) | A is ρ-invariant} is an order-reversing bijection between S_X and the countably generated sub-σ-algebras of B(X) such that ∆_X ↦ B(X) and ∇_X ↦ {∅, X}.

2. Define for x, x′ ∈ X with x ≠ x′ the equivalence relation ϑ_{x,x′} := ∆_X ∪ {⟨x, x′⟩, ⟨x′, x⟩}. Then ϑ_{x,x′} is an atom of S_X. Describe the σ-algebra of ϑ_{x,x′}-invariant Borel sets.

3. Define for the Borel set B with ∅ ≠ B ≠ X the equivalence relation τ_B through x τ_B x′ iff {x, x′} ⊆ B or {x, x′} ∩ B = ∅, for all x, x′ ∈ X. Then τ_B is an anti-atom in S_X (i.e., an atom in the reverse order). Describe the σ-algebra of τ_B-invariant Borel sets.

4. Show that for each ρ ∈ S_X there exists a countable family (β_n)_{n∈N} of anti-atoms with ρ = ⋀_{n∈N} β_n.

5. Show that τ_B ∧ ϑ_{x,x′} = ∆_X and τ_B ∨ ϑ_{x,x′} = ∇_X, whenever B is a Borel set with ∅ ≠ B ≠ X and x ∈ B, x′ ∉ B.

Exercise 4.21 Let Y be a Polish space, F : X → F(Y) a map, and L an algebra of sets on X. We assume that F^w(G) ∈ L_σ for each open G ⊆ Y (F^w is defined on page 391). Show that there exists a map s : X → Y with s(x) ∈ F(x) for all x ∈ X such that s⁻¹[B] ∈ L_σ for each B ∈ B(Y). Hint: Modify the proof of Theorem 4.7.2 suitably.

Exercise 4.22 Given a finite measure space (X,A,µ), let f = Σ_{i=1}^{n} α_i · χ_{A_i} be a step function with A₁, …, A_n ∈ A and coefficients α₁, …, α_n. Show that

Σ_{i=1}^{n} α_i · µ(A_i) = Σ_{γ>0} γ · µ({x ∈ X | f(x) = γ}).

Exercise 4.23 Let (Y,B) be a measurable space and assume that X is a compact metric space; C(X) is the set of all non-empty compact subsets of X, endowed with the Hausdorff metric δ_H, see Example 3.5.10. Show that F : Y → C(X) is B-B(C(X))-measurable iff F is measurable as a relation (in the sense of Definition 4.7.1 on page 391).

Exercise 4.24 Show that (C(X), δH) is second countable iff (X, d) is.


Exercise 4.25 Given the plane E := {⟨x₁, x₂, x₃⟩ ∈ R³ | 2·x₁ + 4·x₂ − 7·x₃ = 12}, determine the point in E which is closest to ⟨4, 2, 0⟩ in the Euclidean distance.

Exercise 4.26 Let (V, ‖·‖) be a real normed space and L : V → R linear. Show that L is continuous iff L is bounded, i.e., iff sup_{‖v‖≤1} |L(v)| < ∞.

Exercise 4.27 Let (V, ‖·‖) be a real normed space, and define

V* := {L : V → R | L is linear and continuous},

the dual space of V. Then V* is a vector space. Show that

‖L‖ := sup_{‖v‖≤1} |L(v)|

defines a norm on V* with which (V*, ‖·‖) is a Banach space.

Exercise 4.28 Let H be a Hilbert space. Show that H* is isometrically isomorphic to H.

Exercise 4.29 Let (V, ‖·‖) be a real normed space, and define

π(x)(L) := L(x)

for x ∈ V and L ∈ V*.

1. Show that π(x) ∈ V**, and that x ↦ π(x) defines a continuous map V → V**.

2. Given x ∈ V with x ≠ 0, show that there exists L ∈ V* with ‖L‖ = 1 and L(x) = ‖x‖ (use the Hahn-Banach Theorem 1.5.14).

3. Show that π is an isometry (thus a normed space can be embedded isometrically into its bidual).

Exercise 4.30 Given a real vector space V:

1. Let (·,·) be an inner product on V. Show that the parallelogram law

‖x + y‖² + ‖x − y‖² = 2·‖x‖² + 2·‖y‖²

always holds (see page 467).

2. Assume, conversely, that ‖·‖ is a norm for which the parallelogram law holds. Show that

(x, y) := (‖x + y‖² − ‖x − y‖²) / 4

defines an inner product on V.

Exercise 4.31 Let H be a Hilbert space and L : H → R a continuous linear map with L ≠ 0. Relating Kern(L) and ker(L), show that H/Kern(L) and R are isomorphic as vector spaces.


List of Examples

This is a list of most examples together with a short description, ordered by chapters. It should help to find an example quickly.

The Axiom of Choice and Some Of Its Equivalents

Example 1.2.2 There is no injection P (N)→ N (p. 8).

Example 1.3.1 Minimum vs. minimal element (p. 11).

Example 1.3.2 The greatest common divisor as sup, the least common multiple as inf (p. 11).

Example 1.3.3 Maximal element need not be comparable (p. 11).

Example 1.3.7 Reduction systems use well orders for termination (p. 12).

Example 1.3.10 Well-ordering a union of well-ordered sets with a well-ordered index set (p. 13).

Example 1.4.2 Construct the somewhat natural numbers through a well order (p. 15).

Example 1.4.3 α ∪ α is an ordinal for ordinal α (p. 15).

Example 1.5.10 √2 and √3 are linearly independent over Q (p. 27).

Example 1.5.16 Filter based on a point, cofinite filter (p. 30).

Example 1.5.18 Filter base of neighborhood filters in R, infinite tails of a sequence (p. 30).

Example 1.5.22 The cofinite filter is not an ultrafilter, if the carrier set is infinite (p. 31).

Example 1.5.25 The power set P (S) is a lattice under inclusion (p. 32).

Example 1.5.27 Infima and suprema in the lattice of open intervals (p. 33).

Example 1.5.28 Compute the supremum in the lattice of rectangles (p. 33).

Example 1.5.29 Distributivity is not inherited to sublattices (p. 33).

Example 1.5.30 Embed a partial order into a distributive lattice through down sets (p. 33).

Example 1.5.34 Identify some prime ideals in (P(S), ⊆) (p. 35).

Example 1.5.48 Discrete and indiscrete topologies (p. 41).

Example 1.5.49 Open subsets of R (p. 42).

Example 1.5.56 Stone duality — the prime ideal topology as a compact topology on a Boolean algebra (p. 43).


Example 1.5.58 The Priestley topology is a compact topology on P(S): an application of Alexander's Subbase Theorem (p. 45).

Example 1.6.4 The Borel sets of [0, 1] are generated by the open intervals; Cantor's ternary set is a Borel set (p. 48).

Example 1.6.5 The set of infinite binary sequences with fixed sums is measurable in the infinite product (p. 48).

Example 1.6.6 The set of infinite binary sequences with converging averages is product measurable (p. 49).

Example 1.6.13 Dirac measure (p. 53).

Example 1.6.14 A binary valued measure generates an ultrafilter; the converse is difficult (p. 53).

Example 1.6.15 The length of an interval has interesting properties as a measure: it is σ-additive and monotone (p. 53).

Example 1.6.33 A non σ-finite measure may have infinitely many extensions (p. 63).

Example 1.6.37 The famous Vitali equivalence relation yields a non-measurable set (p. 65).

Categories

Example 2.1.2 Introduces the category Set of sets with maps as morphisms (p. 81).

Example 2.1.3 Introduces the category of sets with binary relations as morphisms (p. 82).

Example 2.1.4 Makes a partially ordered set into a category (p. 82).

Example 2.1.5 Defines the free category generated by a graph (p. 82).

Example 2.1.6 The discrete category (p. 82).

Example 2.1.9 The category of transition systems with simple morphisms (p. 83).

Example 2.1.10 The category of transition systems with an added backward property yields bounded morphisms (p. 83).

Example 2.1.11 Top, all topological spaces with continuous maps (p. 84).

Example 2.1.12 Meas, all measurable spaces with measurable maps (p. 84).

Example 2.1.13 The category of probability spaces with measure preserving measurable maps (p. 85).

Example 2.1.14 The probabilities P(S,A) define a measurable space through the weak-*-σ-algebra (p. 85).

Example 2.1.15 State automata with their morphisms (p. 86).

Example 2.1.16 Slice category K/x (p. 87).

Example 2.2.3 The product in the category of sets with maps as morphisms is what you think it is (p. 94).

Example 2.2.4 Constructs the product σ-algebra A ⊗ B for the product X × Y of two measurable spaces (X,A) and (Y,B) (p. 94).

Example 2.2.5 The category of probability spaces does not have products (p. 95).


Example 2.2.6 Construct the product topology τ × ϑ on the product X × Y of the topological spaces (X, τ) and (Y, ϑ) (p. 96).

Example 2.2.7 The greatest lower bound inf as a product (p. 96).

Example 2.2.11 The smallest upper bound sup as a coproduct (p. 97).

Example 2.2.17 The coproduct in the category of relations (p. 99).

Example 2.2.19 Construct the pullback in the category of sets (p. 101).

Example 2.2.21 The equivalence relation as a pullback of its factor map (p. 101).

Example 2.2.25 Construct the pushout in the category of sets (p. 104).

Example 2.2.26 The pushout of the factors of equivalence relations is the factor of theirsupremum (p. 105).

Example 2.3.2 Assigning each set its powerset yields a functor (p. 106).

Example 2.3.3 The hom sets assign to each object two functors (p. 106).

Example 2.3.4 Forgetting part of the structure leads to a forgetful functor (p. 106).

Example 2.3.5 Reversing arrows is not a reason to go into panic mode; this is why we have contravariant functors (p. 107).

Example 2.3.10 Model a labeled transition system with labels from A as the functor P(A × −) (p. 109).

Example 2.3.11 Discrete probabilities define a functor on the category of sets (p. 109).

Example 2.3.12 Introduces the weak-*-σ-algebra on a measurable space and defines the subprobability functor S as a functor on the category of measurable spaces (p. 110).

Example 2.3.13 The upper closed sets define a functor on the category of sets (p. 111).

Example 2.3.14 The ultrafilters on the power set of a set define a functor on the category of sets (p. 111).

Example 2.3.18 Composition as a natural transformation for the hom functor (p. 113).

Example 2.3.21 Natural transformations for a contravariant functor (p. 114).

Example 2.3.26 The weak-*-σ-algebra is generated by natural transformations (p. 118).

Example 2.3.29 The binary product as a cone (p. 119).

Example 2.3.30 Pullback as a cone (p. 119).

Example 2.3.33 The binary coproduct as a cocone (p. 120).

Example 2.4.6 The powerset functor is the functorial part of a monad (p. 125).

Example 2.4.7 Discrete probabilities generate a monad (p. 125).

Example 2.4.8 The Giry monad is based on the subprobability functor (p. 126).

Example 2.4.9 The ultrafilter monad (p. 128).

Example 2.4.10 Upper closed sets form a monad (p. 128).

Example 2.5.2 − × E is left adjoint to (−)^E through currying (p. 131).

Example 2.5.3 The diagonal functor has the product functor as an adjoint (p. 132).

Example 2.5.4 Direct and inverse images yield an adjunction (p. 133).


Example 2.5.10 Unit and counit for currying (p. 136).

Example 2.5.14 The Eilenberg-Moore algebras for the powerset monad are the complete sup semi-lattices (p. 139).

Example 2.6.2 Coalgebras for the powerset functor are identified with binary relations (p. 145).

Example 2.6.3 Transforming a labeled transition system into a coalgebra (p. 145).

Example 2.6.4 Automata as coalgebras (through currying) (p. 145).

Example 2.6.6 Binary trees as coalgebras (p. 146).

Example 2.6.7 Subprobabilistic transition kernels (a.k.a. stochastic relations) as coalgebras for the subprobability functor (p. 146).

Example 2.6.8 Upper closed subsets as coalgebras (p. 146).

Example 2.6.10 Coalgebra morphism preserves the tree structure (p. 147).

Example 2.6.12 Coalgebra morphisms for transition structures are just the bounded morphisms (p. 147).

Example 2.7.3 A modal logic for the Roman god Janus (p. 166).

Example 2.7.4 Modalities for propositional dynamic logic (PDL) (p. 166).

Example 2.7.5 Modalities for game logic — Angel and Demon play (p. 166).

Example 2.7.6 Arrow logic (p. 167).

Example 2.7.12 Topological spaces define neighborhood frames (p. 170).

Example 2.7.13 The frame defined by ultrafilters (p. 170).

Example 2.7.14 Translate a Kripke frame into a neighborhood frame (p. 170).

Example 2.7.19 Interpret Janus’ logic (p. 172).

Example 2.7.20 Prepare an interpretation of PDL through a Kripke model (p. 172).

Example 2.7.21 Interpret PDL in a neighborhood model (p. 173).

Example 2.7.22 Interpret game logic in a neighborhood model (p. 174).

Example 2.7.54 A functor for interpreting neighborhood models through coalgebraic logic (p. 188).

Example 2.7.56 Predicate liftings for Kripke models (p. 189).

Example 2.7.57 Predicate liftings for neighborhood models (p. 189).

Example 2.7.64 Interpretation of a basic modal language through a transition system, constructing a factor system and relating factors of logically equivalent and bisimilar systems (p. 193).

Example 2.7.65 A coalgebraic interpretation of the computational tree logic CTL* (p. 195).

Topological Spaces

Example 3.1.2 The topology on R given by open intervals (p. 209).


Example 3.1.3 The Euclidean topology on R3 (p. 209).

Example 3.1.4 A base for the weak topology which is induced by a family of functions (p. 210).

Example 3.1.5 A topology induced on the set A ⇀ B of all partial maps between sets A and B (p. 210).

Example 3.1.6 Scott open sets on an inductive ordered set (p. 211).

Example 3.1.8 The ε-δ definition of continuity is equivalent to the definition with open sets in R (p. 212).

Example 3.1.10 Continuity of maps (A ⇀ B) → (C ⇀ D) (p. 212).

Example 3.1.11 Scott continuous maps between inductive ordered sets (p. 212).

Example 3.1.12 Interpreting a modal logic in a topological space may violate the law of the excluded middle (p. 213).

Example 3.1.19 The quotient topology on [0, 2·π] identifying the endpoints yields the unit circle (p. 216).

Example 3.1.20 A closure operator on a finite partially ordered set (p. 217).

Example 3.1.24 Computing the open sets for the topology from Example 3.1.20 for the set of all divisors of 6 (p. 219).

Example 3.1.25 The topology on a topological group is determined by the neighborhood of the neutral element (p. 220).

Example 3.3.6 T1 but not T0 (p. 227).

Example 3.3.8 T0 but not T1 (p. 227).

Example 3.3.10 The topology of cofinite sets is T1 but not T2 (p. 228).

Example 3.3.11 T3 but not T2 or T1 (p. 228).

Example 3.3.12 T2 but not T3 (p. 228).

Example 3.3.13 T3½ (p. 229).

Example 3.4.7 An embedding [0,1]^N → [0,1]^M for sets M and N (p. 236).

Example 3.4.15 The prime ideal topology on a Boolean algebra has a prime ideal which does not preserve a countable set of given suprema and contains a given element (p. 239).

Example 3.5.2 Pseudometric spaces: R resp. R³ with the Euclidean topology, discrete spaces, the uniform distance of sets of bounded resp. bounded continuous maps, point evaluations for functions, the measure of the symmetric difference for Borel sets, a metric induced by a ranking resp. by a sequence of equivalence relations (p. 241).

Example 3.5.10 The Hausdorff metric on the space of compact and non-empty sets of a metric space (p. 246).

Example 3.5.17 The rationals are not complete: another example due to Bourbaki (p. 249).

Example 3.5.18 Completeness is not a topological property of a metric space; equivalent metrics do not both need to be complete (p. 249).

Example 3.5.19 The continuous functions on [0, 1] are complete with the sup-norm (p. 250).


Example 3.5.20 A ranking function induces a complete metric on the power set (p. 250).

Example 3.5.28 The Banach Fixed Point Theorem explains how micro-Google works (p. 254).

Example 3.5.35 x ↦ x² is not uniformly continuous (p. 259).

Example 3.5.40 The Cantor ternary set is nowhere dense (p. 262).

Example 3.6.7 A complete Heyting algebra is pseudo-complemented (p. 271).

Example 3.6.8 A complete pseudo-complemented lattice satisfies the general distributive law (p. 271).

Example 3.6.11 Topological spaces and complete Heyting algebras with homomorphisms are topological systems (p. 272).

Example 3.6.13 Continuity of c-morphisms (p. 273).

Example 3.6.28 A sober topological space is a dcpo under the specialization order (p. 278).

Example 3.6.30 Sets of weakly finite character are Scott open (p. 279).

Example 3.6.45 The polynomials over the closed unit interval are dense in the topology of uniform convergence (p. 286).

Example 3.6.49 Uniform spaces: discrete and indiscrete uniformities, the additive and the multiplicative uniformity on R, the uniformity defined through finite partitions on a set, the p-adic uniformity on Z, the uniformity of uniform convergence, left and right uniformity for a topological group (p. 291).

Example 3.6.52 The additive and the multiplicative uniformity on R give the same topology, the discrete topology is also induced by the uniformity of finite partitions, the topologies induced by the p-adic and q-adic uniformities are distinct for different primes p, q (p. 294).

Example 3.6.60 The topology for the uniformity of finite partitions is not metrizable for an infinite carrier set (p. 298).

Example 3.6.64 Each ultrafilter on the space of finite partitions is a Cauchy filter (p. 299).

Example 3.6.69 The uniformity on P(X) induced by an ideal (part 5 of Example 3.6.49) is complete — net version (p. 300).

Example 3.6.70 The uniformity on P(X) induced by an ideal (part 5 of Example 3.6.49) is complete — filter version (p. 301).

Example 3.6.74 The left and the right uniformities on a topological group are not necessarily identical (p. 302).

Measures for Probabilistic Systems

Example 4.1.1 Functionally closed sets lead to the σ-algebra of Baire sets (p. 314).

Example 4.1.2 The hit σ-algebra defines measurability in terms of hit sets for a distinguished family of sets (p. 314).

Example 4.1.5 A σ-algebra generates the same equivalence relation as each of its generators (p. 316).


Example 4.1.11 Transition kernels may be used for interpreting modal logics (p. 320).

Example 4.1.14 A finite transition system can be converted into a stochastic effectivity function (p. 323).

Example 4.1.15 Stochastic effectivity functions interpret a simple modal logic (p. 324).

Example 4.1.18 The Dirac kernel defines a stochastic effectivity function (p. 325).

Example 4.2.6 Evaluation of measures is measurable both for measures and for reals (p. 343).

Example 4.2.7 {⟨x, r⟩ | M(x)([[ϕ]]_M) ≥ r} is product measurable for a modal formula ϕ (p. 344).

Example 4.2.18 Convergence in measure does in general not imply convergence almost everywhere (p. 349).

Example 4.3.1 If the base space is separable, its Borel sets are countably generated (p. 352).

Example 4.3.4 The countable-cocountable σ-algebra is not countably generated (p. 352).

Example 4.3.7 Even if all horizontal and all vertical cuts are measurable, a set is not necessarily product measurable (p. 353).

Example 4.3.20 Construct a metric so that the open unit interval is complete (p. 358).

Example 4.3.23 The set of all infinite sequences over N is a Polish space in the product topology (p. 358).

Example 4.4.19 A modal logic induces a smooth equivalence relation (p. 370).

Example 4.4.27 The equivalence generated by the countable-cocountable σ-algebra is the identity, which in turn has the Borel sets as its σ-algebra of invariant sets (p. 375).

Example 4.4.28 We find two σ-algebras which are countably generated, but their intersection is not (p. 375).

Example 4.4.29 The supremum of two countably generated equivalence relations is not always countably generated (p. 375).

Example 4.7.4 Image finite hit-measurable maps into a Polish space have a countable family of selectors (p. 393).

Example 4.7.5 The support function associated with a stochastic relation is measurable and has a dense set of selectors; this is a form of stochastic nondeterminism (p. 393).

Example 4.9.4 µ ⊗ K defines a finite measure on X × Y for the finite measure µ on X and the transition kernel K : X ⇝ Y (p. 407).

Example 4.9.5 L(z) ⊗ K defines a transition kernel Z ⇝ X × Y for the transition kernels L : Z ⇝ X and K : X ⇝ Y (p. 408).

Example 4.9.6 The convolution of kernels is the Kleisli product in the Giry category (p. 408).

Example 4.9.7 Integrating a nonnegative function means computing the area under the graph through Choquet's representation (p. 408).

Example 4.9.36 A morphism for game frames induces a natural transformation (p. 434).


Example 4.9.49 Computing the validity sets [[⟨p?; g⟩_q ϕ]]_G and [[⟨p¿; g⟩_q ϕ]]_G for a primitive formula p and an arbitrary game g (p. 441).

Example 4.10.3 For separable metric X, the base space can be embedded isometrically into the space of all finite measures as a closed subset (p. 444).

Example 4.10.21 Comparing logically equivalent and bisimilar stochastic relations requires factoring (p. 457).

Example 4.10.23 The existence of a non-measurable set implies the non-existence of a semi-pullback of measures (p. 460).

Example 4.10.25 The factorization from Set is not suitable for the Giry monad (p. 462).

Example 4.11.17 The sequence spaces ℓ_p and ℓ_∞ are the well-known L_p-spaces for the counting measure on N (p. 475).

Example 4.11.21 The space L∞(λ) is not separable (p. 476).


Bibliography

[Acz00] A. D. Aczel. The Mystery of the Aleph - Mathematics, the Kabbalah and the Search for Infinity. Pocket Books, New York, 2000.

[AJ94] S. Abramsky and A. Jung. Domain theory. In S. Abramsky, D. M. Gabbay, and T. S. E. Maibaum, editors, Handbook of Logic in Computer Science, volume 3 — Semantic Structures, pages 1 – 168. Oxford University Press, Oxford, 1994.

[Arv76] W. Arveson. An Invitation to C*-Algebras. Number 39 in Graduate Texts in Mathematics. Springer-Verlag, New York, Berlin, Heidelberg, 1976.

[Aum54] G. Aumann. Reelle Funktionen. Springer-Verlag, Berlin, Göttingen, Heidelberg, 1954.

[Awo10] S. Awodey. Category Theory. Number 52 in Oxford Logic Guides. Oxford University Press, Oxford, 2nd edition, 2010.

[Bar77] J. Barwise, editor. Handbook of Mathematical Logic. North-Holland Publishing Company, Amsterdam, 1977.

[Bar01] L. M. Barbosa. Components as Coalgebras. PhD thesis, Universidade do Minho, 2001.

[BdRV01] P. Blackburn, M. de Rijke, and Y. Venema. Modal Logic. Number 53 in Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, Cambridge, UK, 2001.

[BHHK03] C. Baier, B. Haverkort, H. Hermanns, and J.-P. Katoen. Model-checking algorithms for continuous-time Markov chains. IEEE Trans. Softw. Eng., 29(6):524 – 541, June 2003.

[Bil95] P. Billingsley. Probability and Measure. John Wiley & Sons, New York, third edition, 1995.

[Bir67] G. Birkhoff. Lattice Theory, volume 25 of Colloquium Publications. American Mathematical Society, Providence, RI, 1967.

[Blo08] W. B. Bloch. The Unimaginable Mathematics of Borges' Library of Babel. Oxford University Press, Oxford, 2008.

[BN98] F. Baader and T. Nipkow. Term rewriting and all that. Cambridge University Press, Cambridge, UK, 1998.


[Bog07] V. I. Bogachev. Measure Theory. Springer-Verlag, 2007.

[Bor94a] F. Borceux. Handbook of Categorical Algebra 1: Basic Category Theory, volume 50 of Encyclopedia of Mathematics and Its Applications. Cambridge University Press, Cambridge, UK, 1994.

[Bor94b] F. Borceux. Handbook of Categorical Algebra 2: Categories and Structures, volume 51 of Encyclopedia of Mathematics and Its Applications. Cambridge University Press, Cambridge, UK, 1994.

[Bou89] N. Bourbaki. General Topology. Elements of Mathematics. Springer-Verlag, Berlin, Heidelberg, New York, 1989.

[Bro08] T. J. Bromwich. An Introduction to the Theory of Infinite Series. MacMillan and Co., London, 1908.

[BW99] M. Barr and C. Wells. Category Theory for Computing Science. Les Publications CRM, Montreal, 1999.

[CFO89] D. Cantone, A. Ferro, and E. G. Omodeo. Computable Set Theory, volume 6 of International Series of Monographs on Computer Science. Oxford University Press, Oxford, 1989.

[CGP99] E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. The MIT Press, Cambridge, MA, 1999.

[CH67] R. Courant and D. Hilbert. Methoden der Mathematischen Physik I, volume 30 of Heidelberger Taschenbücher. Springer-Verlag, third edition, 1967.

[Che89] B. F. Chellas. Modal Logic. Cambridge University Press, Cambridge, UK, 1989.

[Chr64] G. Chrystal. Textbook of Algebra I/II. Chelsea Publishing Company (Reprint), New York, 1964.

[CK90] C. C. Chang and H. J. Keisler. Model Theory, volume 73 of Studies in Logic and the Foundations of Mathematics. Elsevier, Amsterdam, 1990.

[CLR92] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. An Introduction to Algorithms. The MIT Press, Cambridge, MA, 1992.

[COP01] D. Cantone, E. G. Omodeo, and A. Policriti. Set Theory for Computing. Springer-Verlag, Berlin, Heidelberg, New York, 2001.

[CV77] C. Castaing and M. Valadier. Convex Analysis and Measurable Multifunctions. Number 580 in Lect. Notes Math. Springer-Verlag, Berlin, Heidelberg, New York, 1977.

[Dav00] M. Davis. Engines of Logic - Mathematics and the Origin of the Computer. W. W. Norton, New York, London, 2000.

[DEP02] J. Desharnais, A. Edalat, and P. Panangaden. Bisimulation of labelled Markov processes. Information and Computation, 179(2):163 – 193, 2002.

[DF89] E.-E. Doberkat and D. Fox. Software Prototyping mit SETL. Leitfäden und Monographien der Informatik. Teubner-Verlag, Stuttgart, 1989.


[Dob89] E.-E. Doberkat. Topological completeness in an ideal model for recursive polymorphic types. SIAM J. Computing, 18(5):977 – 991, 1989.

[Dob03] E.-E. Doberkat. Pipelines: Modelling a software architecture through relations. Acta Informatica, 40:37 – 79, 2003.

[Dob07] E.-E. Doberkat. The Hennessy-Milner equivalence for continuous-time stochastic logic with mu-operator. J. Appl. Logic, 35:519 – 544, 2007.

[Dob09] E.-E. Doberkat. Stochastic Coalgebraic Logic. EATCS Monographs in Theoretical Computer Science. Springer-Verlag, Berlin, 2009.

[Dob10] E.-E. Doberkat. A note on the coalgebraic interpretation of game logic. Rendiconti Ist. di Mat. Univ. di Trieste, 42:191 – 204, 2010.

[Dob12a] E.-E. Doberkat. Haskell für Objektorientierte. Oldenbourg-Verlag, München, 2012.

[Dob12b] E.-E. Doberkat. A stochastic interpretation of propositional dynamic logic: Expressivity. J. Symb. Logic, 77(2):687 – 716, 2012.

[Dob14] E.-E. Doberkat. Algebraic properties of stochastic effectivity functions. J. Logic and Algebraic Progr., 83:339 – 358, 2014.

[Doo94] J. L. Doob. Measure Theory. Number 143 in Graduate Texts in Mathematics. Springer-Verlag, 1994.

[DP02] B. A. Davey and H. A. Priestley. Introduction to Lattices and Order. Cambridge University Press, Cambridge, UK, 2002.

[DP03] J. Desharnais and P. Panangaden. Continuous stochastic logic characterizes bisimulation of continuous-time Markov processes. J. Log. Alg. Programming, 56(1-2):99 – 115, 2003.

[DPP09] A. Doxiadis, C. Papadimitriou, and A. Papadatos. Logicomix: An Epic Search for Truth. Bloomsbury, London, 2009.

[DS57] N. Dunford and J. T. Schwartz. Linear Operators, volume I. Interscience Publishers, 1957.

[DS65] L. E. Dubins and L. J. Savage. How to Gamble if you Must: Inequalities for Stochastic Processes. McGraw-Hill, New York, 1965.

[DS11] E.-E. Doberkat and Ch. Schubert. Coalgebraic logic over general measurable spaces - a survey. Math. Struct. Comp. Science, 21:175 – 234, 2011. Special issue on coalgebraic logic.

[DT41] E.-E. Doberkat and P. Sánchez Terraf. Stochastic nondeterminism and effectivity functions, December 2014 (arXiv: 1405.7141).

[Eda99] A. Edalat. Semi-pullbacks and bisimulations in categories of Markov processes. Math. Struct. Comp. Science, 9(5):523 – 543, 1999.

[Edg98] G. A. Edgar. Integral, Probability, and Fractal Measures. Springer-Verlag, New York, 1998.


[Els99] J. Elstrodt. Maß- und Integrationstheorie. Springer-Verlag, Berlin, Heidelberg, New York, 2nd edition, 1999.

[Eng89] R. Engelking. General Topology, volume 6 of Sigma Series in Pure Mathematics. Heldermann-Verlag, Berlin, revised and completed edition, 1989.

[Fic64] G. M. Fichtenholz. Differential- und Integralrechnung I-III. VEB Deutscher Verlag der Wissenschaften, Berlin, 1964.

[Fre08] D. H. Fremlin. Measure Theory, volume 1 – 5. Torres Fremlin, Colchester, 2000 – 2008.

[GHK+03] G. Gierz, K. H. Hofmann, K. Keimel, J. D. Lawson, M. W. Mislove, and D. S. Scott. Continuous Lattices and Domains. Number 93 in Encyclopaedia of Mathematics and its Applications. Cambridge University Press, Cambridge, UK, 2003.

[Gir81] M. Giry. A categorical approach to probability theory. In Categorical Aspects of Topology and Analysis, number 915 in Lect. Notes Math., pages 68 – 85, Berlin, 1981. Springer-Verlag.

[GKP89] R. L. Graham, D. E. Knuth, and O. Patashnik. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley, Reading, MA, 1989.

[Gol96] D. Goldrei. Classic Set Theory. Chapman and Hall/CRC, Boca Raton, 1996.

[Gol06] R. Goldblatt. Topoi — The Categorical Analysis of Logic. Dover Publications, New York, 2006.

[Gol10] R. Goldblatt. Deduction systems for coalgebras over measurable spaces. Journal of Logic and Computation, 20(5):1069 – 1100, 2010.

[Gol12] R. Goldblatt. Topological proofs of some Rasiowa-Sikorski lemmas. Studia Logica, 100:1 – 18, 2012.

[Hal50] P. R. Halmos. Measure Theory. Van Nostrand Reinhold, New York, 1950.

[Her06] H. Herrlich. Axiom of Choice. Number 1876 in Lect. Notes Math. Springer-Verlag, Berlin, Heidelberg, New York, 2006.

[HS65] E. Hewitt and K. R. Stromberg. Real and Abstract Analysis. Springer-Verlag, Berlin, Heidelberg, New York, 1965.

[Isb64] J. R. Isbell. Uniform Spaces. Number 12 in Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 1964.

[Jam87] I. M. James. Topological and Uniform Spaces. Undergraduate Texts in Mathematics. Springer-Verlag, New York, Berlin, Heidelberg, 1987.

[Jec73] T. Jech. The Axiom of Choice, volume 75 of Studies in Logic and the Foundations of Mathematics. North-Holland Publishing Company, New York, 1973.

[Jec06] T. Jech. Set Theory. Springer-Verlag (The Third Millennium Edition), Berlin, Heidelberg, New York, 2006.

[Joh82] P. T. Johnstone. Stone Spaces. Cambridge University Press, Cambridge, UK, 1982.


[JR02] J. E. Jayne and C. A. Rogers. Selectors. Princeton University Press, Princeton, N. J., 2002.

[Kec94] A. S. Kechris. Classical Descriptive Set Theory. Graduate Texts in Mathematics. Springer-Verlag, Berlin, Heidelberg, New York, 1994.

[Kee93] J. P. Keener. The Perron-Frobenius theorem and the ranking of football teams. SIAM Review, 35(1):80 – 93, 1993.

[Kel55] J. L. Kelley. General Topology. Number 27 in Graduate Texts in Mathematics. Springer-Verlag, New York, Berlin, Heidelberg, 1955.

[Kel72] H. G. Kellerer. Topologische Maßtheorie. Lecture notes, Mathematisches Institut, Ruhr-Universität Bochum, Sommersemester 1972.

[KM76] K. Kuratowski and A. Mostowski. Set Theory, volume 86 of Studies in Logic and the Foundations of Mathematics. North-Holland and PWN, Polish Scientific Publishers, Amsterdam and Warszawa, 1976.

[Knu73] D. E. Knuth. The Art of Computer Programming. Vol. I, Fundamental Algorithms. Addison-Wesley, Reading, Mass., 2nd edition, 1973.

[Kop89] S. Koppelberg. Handbook of Boolean Algebras, volume 1. North Holland, Amsterdam, 1989.

[Koz85] D. Kozen. A probabilistic PDL. J. Comp. Syst. Sci., 30(2):162–178, 1985.

[Kur66] K. Kuratowski. Topology, volume I. PWN – Polish Scientific Publishers and Academic Press, Warsaw and New York, 1966.

[Lip11] M. Lipovaca. Learn You a Haskell for Great Good! no starch press, San Francisco, 2011.

[LM05] A. M. Langville and C. D. Meyer. A survey of eigenvector methods for Web information retrieval. SIAM Review, 47(1):135 – 161, 2005.

[Loo53] L. H. Loomis. An Introduction to Abstract Harmonic Analysis. D. van Nostrand Company, Princeton, N. J., 1953.

[Lub74] A. Lubin. Extensions of measures and the von Neumann selection theorem. Proc. Amer. Math. Soc., 43(1):118 – 122, 1974.

[Mic51] E. Michael. Topologies on spaces of subsets. Trans. Am. Math. Soc., 71(2):152 – 182, 1951.

[ML97] S. Mac Lane. Categories for the Working Mathematician. Graduate Texts in Mathematics. Springer-Verlag, Berlin, 1997.

[Mog89] E. Moggi. An abstract view of programming languages. Lecture Notes, Stanford University, June 1989.

[Mog91] E. Moggi. Notions of computation and monads. Information and Computation, 93:55 – 92, 1991.

[Mos06] Y. Moschovakis. Notes on Set Theory. Undergraduate Texts in Mathematics. Springer-Verlag, 2nd edition, 2006.


[MPS86] D. MacQueen, G. Plotkin, and R. Sethi. An ideal model for recursive polymorphic types. Information and Control, 71:95 – 130, 1986.

[MS64] J. Mycielski and S. Świerczkowski. On the Lebesgue measurability and the axiom of determinateness. Fund. Math., 54:67 – 71, 1964.

[OGS09] B. O'Sullivan, J. Goerzen, and D. Stewart. Real World Haskell. O'Reilly, Sebastopol, CA, 2009.

[Oxt80] J. C. Oxtoby. Measure and Category. Number 2 in Graduate Texts in Mathematics. Springer-Verlag, New York, Berlin, Heidelberg, 2nd edition, 1980.

[Pan09] P. Panangaden. Labelled Markov Processes. World Scientific Pub Co, 2009.

[Par67] K. R. Parthasarathy. Probability Measures on Metric Spaces. Academic Press, New York, 1967.

[Par85] R. Parikh. The logic of games and its applications. In M. Karpinski and J. van Leeuwen, editors, Topics in the Theory of Computation, volume 24, pages 111–140. Elsevier, 1985.

[Pat04] D. Pattinson. Expressive logics for coalgebras via terminal sequence induction. Notre Dame J. Formal Logic, 45(1):19 – 33, 2004.

[Pau00] M. Pauly. Game logic for game theorists. Technical Report INS-R0017, CWI, Amsterdam, 2000.

[PP03] M. Pauly and R. Parikh. Game logic — an overview. Studia Logica, 75:165 – 182, 2003.

[Pum99] D. Pumplün. Elemente der Kategorientheorie. Spektrum Akademischer Verlag, Heidelberg, 1999.

[Pum03] D. Pumplün. Positively convex modules and ordered normed linear spaces. J. Convex Analysis, 10(1):109 – 127, 2003.

[Que01] B. v. Querenburg. Mengentheoretische Topologie. Springer-Lehrbuch. Springer-Verlag, Berlin, 3rd edition, 2001.

[Rou10] Ch. Rousseau. How Google works. www.kleinproject.org, August 2010.

[RR81] K. P. S. Bhaskara Rao and B. V. Rao. Borel spaces. Dissertationes Mathematicae, 190:1–63, 1981.

[RS50] H. Rasiowa and R. Sikorski. A proof of the completeness theorem of Gödel. Fund. Math., 37:193 – 200, 1950.

[RS63] H. Rasiowa and R. Sikorski. The Mathematics of Metamathematics, volume 41 of Monografie Matematyczne. PWN, Warsaw, 1963.

[Rud74] W. Rudin. Real and Complex Analysis. Tata McGraw-Hill, 2nd edition, 1974.

[Rut00] J. J. M. M. Rutten. Universal coalgebra: a theory of systems. Theor. Comp. Sci., 249(1):3 – 80, 2000. Special issue on modern algebra and its applications.


[Sch70] H. H. Schaefer. Topological Vector Spaces. Number 3 in Graduate Texts in Mathematics. Springer-Verlag, New York, Heidelberg, Berlin, 1970.

[SDDS86] J. T. Schwartz, R. B. K. Dewar, E. Dubinsky, and E. Schonberg. Programming with Sets — An Introduction to SETL. Springer-Verlag, New York, Berlin, Heidelberg, Tokyo, 1986.

[Sho67] J. R. Shoenfield. Mathematical Logic. Addison-Wesley, Reading, MA, 1967.

[Smy92] M. B. Smyth. Topology. In S. Abramsky, D. M. Gabbay, and T. S. E. Maibaum, editors, Handbook of Logic in Computer Science, volume 1 — Background: Mathematical Structures, pages 641 – 761. Oxford University Press, Oxford, 1992.

[Sok05] A. Sokolova. Coalgebraic Analysis of Probabilistic Systems. PhD thesis, Department of Computer Science, University of Eindhoven, 2005.

[Sri98] S. M. Srivastava. A Course on Borel Sets. Graduate Texts in Mathematics. Springer-Verlag, Berlin, 1998.

[Sri08] S. M. Srivastava. A Course on Mathematical Logic. Universitext. Springer-Verlag, 2008.

[ST11] P. Sánchez Terraf. Unprovability of the logical characterization of bisimulation. Information and Computation, 209(7):1048 – 1056, 2011.

[Sta97] R. Stanley. Enumerative Combinatorics, volume 1 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, UK, 1997.

[Str81] K. R. Stromberg. An Introduction to Classical Real Analysis. Wadsworth International Group, Belmont, 1981.

[Tay99] P. Taylor. Practical Foundations of Mathematics, volume 59 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1999.

[Thi65] C. Thiel. Sinn und Bedeutung in der Logik Gottlob Freges. Verlag Anton Hain, Meisenheim am Glan, 1965.

[vdHP07] W. van der Hoek and M. Pauly. Modal logic for games and information. In P. Blackburn, J. van Benthem, and F. Wolter, editors, Handbook of Modal Logic, volume 3 of Studies in Logic and Practical Reasoning, pages 1077 – 1148. Elsevier, Amsterdam, 2007.

[Ven07] Y. Venema. Algebras and co-algebras. In P. Blackburn, J. van Benthem, and F. Wolter, editors, Handbook of Modal Logic, volume 3 of Studies in Logic and Practical Reasoning, pages 331–426. Elsevier, Amsterdam, 2007.

[Vic89] S. Vickers. Topology via Logic. Number 5 in Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, Cambridge, UK, 1989.


Index

Symbols

B(x, r), 42, 243

Hγ , 448

S(x, r), 243

U [x], 290

Ba(τ), 314

B(τ), 314

Meas, 84

Prob, 85

P , 82

Rel, 88

Top, 84

C(X), 282

M(X,A), 319

[[ϕ]]M, 168

ΣΣΣρ(A), 330

N, 68

a.e.−→, 346

α-sequence, 19

Kop, 87

Set, 81

AbGroup, 201

ε-net, 256

η∼I , 38

M, w |= ϕ, 168

i.m.−→, 348

diam(A), 252

µ-measurable, 57

µ-a.e., 344

ω1, 19

π-λ-Theorem, 62

℘℘℘(X,A), 319

supp(µ), 338

Thγ(c), 192

U(x), 219

⊢Λ ϕ, 181

dµ/dν, 478

w |=γ ϕ, 191

F → x, 221

L(Φ), 165

L(L), 191

L(τ, Φ), 165

L∞(µ), 344

22, 272

A

A-cover, 380

absolute continuity, 476

signed measure, 486

(AC), 6

accumulation point, 225

(AD), 68

additive, 53

adjoint

left, 131

right, 131

adjunction, 131

counit, 135

unit, 135

algebra

Boolean, 35

Eilenberg-Moore, 138

finite-cofinite, 47

free, 139

Heyting, 78, 271

morphism, 271

Lindenbaum, 267

morphism, 138

structure morphism, 138

analytic set, 365

Angel, 65, 166

appending an element, 66

atom, 355

µ, 336


atomic harmony, 177

automaton, 86

with output, 107

Axiom of Choice

(AC), 6

(AD), 68

(MI), 36

(MP), 22

(WO), 13

(ZL), 21

Axiom of Determinacy, 68

B

Baire sets

Ba(τ), 314

Banach decomposition, 76

Banach-Mazur game, 262

base, 27

βββA, 319

bisimilar, 455

states, 177

bisimulation, 148, 150, 152, 177

bisimulation equivalence, 156

Boolean σ-algebra, 46

Borel measurability, 334

Borel sets, 48

B(τ), 314

bound

lower, 11

upper, 11

boundary

∂M , 42

bounded, 482

C

Cantor ternary set, 48

cardinality, 7

category

comma, 111

composition, 81

coproduct, 97

injection, 97

discrete, 82

dual, 87

epimorphism, 89

split, 198

free, 82

monomorphism, 88

split, 198

morphism, 81

natural transformation, 112

natural transformation

component, 112

object, 80

product, 93

projection, 93

pullback, 100

pushout, 104

slice, 87

small, 82

sum, 97

weak pullback, 100

Cauchy filter, 299

Cauchy sequence, 248

chain, 21

change of variables

calculus, 399

image measure, 398

Charlie Brown's device, 251

choice

angelic, 167

demonic, 167

choice function, 6

clopen, 43

closure

Ma, 42

closure operator, 217

coalgebra, 145

bisimilar, 148

carrier, 145

dynamics, 145

mediating, 148

morphism, 147

cocone, 119

colimit, 120

colimit, 120

compact

countably compact, 240

Lindelöf space, 240

locally compact, 235

paracompact, 240

locally finite, 240

refinement, 240

sequentially compact, 240, 256


compactification, 237
Alexandrov one point, 236
Stone-Čech, 238
complement, 35
complete
measure space, 381
strongly C, 183
weakly C, 183
completion
M , 383
µ, 382
universal, 383
conditional distribution, 488
regular, 488
cone, 118
limit, 119
congruence, 37, 161, 458
effectivity function, 331
conjugate numbers, 472
consistent
Λ, 181
maximal Λ, 183
continuous, 211
uniformly, 259, 301
contraction, 254
convergence
almost everywhere, 346
filter, 221
in measure, 348
net, 221
pointwise, 345
uniform, 345
converse, 487
convex, 469
coproduct, 158
countable, 8, 11
countably additive, 53
countably infinite, 8
countably subadditive, 53
covers, 77
CSL, 419
next operator, 419
path quantifier, 419
steady-state, 419
until operator, 419

CTL*, 195

currying, 131
cut

Qx, Qy, 318

cylinder sets, 413

D
dcpo, 278
Demon, 65, 166
dense set, 231
density, 477
diagram
chasing, 92
commutative, 86
diameter, 252, 339
Dirac measure, 126, 127
disintegration, 492
distance
Lévy-Prohorov, 337
down set, 33
E
effectivity function, 169
Eilenberg-Moore algebra, 138
morphism, 138
element
largest, 11
maximal, 11
smallest, 11
embedding, 236
entourage, 290
epi, 89
epic
jointly, 200
equalizer, 200
equivalence
induced by C, 316
modal, 177
equivalence relation
generated by sets, 316
invariant set, 330
smooth, 370
determining sequence, 370
tame, 330

exception, 121

F
factor algebra, 38
filter, 30, 34



accumulation point, 225
Cauchy, 299
maximal, 31
neighborhood, 218
filter base, 30
finite character, 22
first entrance trick, 60
flyswatter, 256
formula
globally true, 169
refutable, 169
satisfiable, 169
frame, 271
t, 171
accessibility relation, 167
game, 427
Kripke, 167
morphism, 175
neighborhood, 169
set of worlds, 167
frames
class of, 183
function
µ-essentially bounded, 344
indicator, 342
step, 342
functional
linear, 28
functor, 105
constant, 106
contravariant, 107
covariant, 107
endofunctor, 106
forgetful, 106
identity, 106
power set, 106
G
Galois connection, 133
game, 65
Banach-Mazur, 66
determined, 67, 174, 427
disjunctive, 175
frame, 427
interpretable, 432
logic, 167
strategy, 66

Giry

functor, 128

monad, 128

Godement product, 115

Google, 255

graph

F (G), 82

k-colorable, 76

free category, 82

path, 82

undirected, 76, 198

group

topological, 220

H

Holder’s inequality, 473

Hilbert cube, 360

Hilbert space, 469

hit

σ-algebra, 314

measurable, 393

homeomorphism, 216

Hutchinson metric, 448

I

ideal, 34

independence

linear, 27

indicator function, 342

induction

Noetherian, 14

transfinite, 14

well-founded, 14

inequality

Hölder, 472, 473

Minkowski, 473

Schwarz, 467

infimum, 11

initial segment, 14

inner measure, 64

inner product, 467

interior

Mo, 42

invariant

set, 330

invariant subset, 162

irreducible, 277



J

join irreducible, 77

Jordan decomposition, 485

K

kernel

linear functional, 468

Markov, 320

transition, 319

Kleisli

category, 122

triple, 121

Kripke model, 167

stochastic, 438

Kronecker’s δ, 142

Kuratowski’s trap, 359

L

Lagrange’s Formula, 482

lattice, 32

bounded, 32

Brouwerian, 78

complete, 78

distributive, 33

join, 32

meet, 32

pseudo-complement, 77

Lebesgue decomposition, 478

left adjoint, 131

leitmotif, 6

limit, 119

Lindenbaum Lemma, 186

linear functional, 482

positive, 483

localization, 440

logic

closed

modus ponens, 180

uniform substitution, 180

coalgebraic

behavioral equivalence, 192

bridge operators, 197

logical equivalence, 192

predicate lifting, 188

continuous time stochastic, 419

CSL, 419

modal, 180, 320

normal, 182

M
Mackey, 354
map
affine, 292
continuous, 84
factor, 38
kernel, 90
measurable, 84, 315
final, 162
strong, 162
semicontinuous, 315
Markov property, 421
maximal lower bound, 11
measurable
rectangle, 318
relation, 391
selector, 391
set-valued map, 391
measure, 53
complex, 486
Dirac, 53
Lebesgue, 55
projective limit, 416, 418
projective system, 416
σ-finite, 53
τ-regular, 338
tight, 447
metric
discrete, 242
Hausdorff, 246
ultrametric, 243
(MI), 36
minimal upper bound, 11
Minkowski's inequality, 473
modal language
basic, 165
extended, 165
game logic, 167
nabla, 165
PDL, 166
modal logic
normal, 182
model, 268, 269
t, 171
canonical, 186



game, 438
image finite, 179
Kripke, 167
morphism, 176
neighborhood, 170
monad, 122
Haskell, 129
Manes, 139
multiplication, 122
unit, 122
monic
jointly, 200
mono, 88
monoid, 91
morphism
automaton, 86
bounded, 83, 148
codomain, 81
domain, 81
epi/mono factorization, 91
epimorphism, 89
frame, 175
game frame, 433
isomorphism, 92
measure spaces, 460
model, 176, 439
monomorphism, 88
neighborhood, 179
stochastic relations, 455
(MP), 22
mutually singular, 476
N
Nachbarschaft, 290
naive set theory, 4
natural transformation, 112
component, 112
Godement product, 115
horizontal composition, 115
vertical composition, 115
neighborhood, 41
frame, 169
model, 170
neighborhood morphism, 179
net, 221
Cauchy, 300
convergence, 221

next operator

CSL, 419

norm, 282

nowhere dense, 261

null set, 382

O

open

Scott, 211, 279

order

lexicographic, 13

linear, 11

strict, 11

ordered

inductively, 21

ordinal, 15

limit, 16

odd, even, 16

successor, 15

orthogonal

complement, 469

vector, 468

oscillation, 252

P

partition, 291

path quantifier

CSL, 419

PDL, 166

PDL fragment, 435

playing, 65

portfolio, 322

positive convexity, 141

affine map, 142

morphism, 142

regular, 452

predicate lifting, 188

prime, 34

completely, 275

element, 275

principal down sets, 33

probability

discrete, 109

probability functor

continuous space, 110

continuous, 126



discrete, 110, 125
propositional formula, 23
propositional letters, 165
pseudo-norm, 344
pseudometric, 241
pseudometrics
equivalent, 244
pullback, 100
preserves weak pullbacks, 154
weak, 100
Q
quotient object, 461
R
Radon-Nikodym derivative, 478, 480
reduction system, 12
relation
antisymmetric, 11
characteristic, 326
linear, 11
reflexive, 11
strict, 11
transitive, 11
right adjoint, 131
S
satisfiable, 24
Schwarz inequality, 467
semi lattice, 139
semicontinuous
lower, 307
upper, 307
semiring, 55
sentence, 267
separable
σ-algebra, 353
separation
T0, T1, 227
T2, 226
T3, T3 1/2, T4, 228
sequence
Cauchy, 248
set
Gδ, 253
µ-measurable, 65
µ-null, 63
analytic, 365

clopen, 361

co-analytic, 365

cylinder, 318

directed, 221, 278, 338

saturated, 281

small, 299

Mσ(X,A), 319

σ-algebra, 46

complete, 64

countable-cocountable, 46, 352

countably generated, 351

final, 316

initial, 316

product, 95, 318

separable, 353

sum, 318

trace, 317

weak, 85

σ-ideal, 52

σ(A), 47

signed measure, 484
absolute continuity, 486

nullset, 486

positive set, 485

total variation, 486

similar, 18

somewhat natural numbers, 15

Sorgenfrey line, 226

Souslin scheme, 377

regular, 377

space

T0, T1, 227

T3, T3 1/2, T4, 228

analytic, 368

Banach, 283

completely regular, 230

dual, 497

first category, 261

Hausdorff, T2, 226

locally compact, 235

measurable, 84, 314

metric, 241

complete, 357

normal, 230

normed, 283



Polish, 358
probability, 85
pseudometric, 241
complete, 249
regular, 230
standard Borel, 376
topological, 41, 84
Baire sets, 314
Borel sets, 314
uniform, 289
complete, 299
separated, 296
topology, 294
state
behavioral equivalence, 192
logical equivalence, 192
theory, 192
steady-state
CSL, 419
stochastic relations, 320
category, 86
strategy, 66
winning, 66
substitution, 180
support, 109, 338
supremum, 11
symmetric difference, 37
T
t-measurable, 322
terminal object, 201
terminating, 12
theorem
π-λ, 62
Alexander's Subbase, 45
Alexandrov, 361, 418
Baire
complete pseudometric, 261
locally compact, 239
Blackwell-Mackey, 374
Compactness, 24
Dini, 283
Egorov, 347
Fatou's Lemma, 397
Hahn-Banach, 29
Heine-Borel, 41
Hennessy-Milner, 179
Hofmann-Mislove, 281
Kuratowski and Ryll-Nardzewski, 391
Kuratowski Isomorphism, 368
Lebesgue Dominated Convergence, 397
Lubin, 386
Lusin, 367
Manes, 123
Prime Ideal, 37
Radon-Nikodym, 479
derivative, 480
Riesz Representation, 403
Schröder-Bernstein, 7
Souslin, 367
Stone Duality, 43
Stone Representation, 39
Stone-Weierstraß, 285
Tihonov, 224
unique structure, 369
Urysohn's Metrization, 248
von Neumann Selection, 385
theory, 192
topological sort, 25
topological system, 272
c-morphism, 273
homeomorphism, 273
localic, 275
localization, 275
opens, 272
extension, 272
points, 272
spatialization, 273
topology, 41
Alexandrov, 334
Baire sets, 314
base, 43, 208
Borel sets, 314
boundary, 42
closure, 42
compactification, 237
discrete, indiscrete, 41
final, 214
first countable, 247
initial, 214
interior, 42
interval, 42
Priestley, 46
product, 214



quotient, 215

Scott, 211, 279

second countable, 247

separable, 247

sober, 278

subbase, 43

subspace, 42

sum, 215

topological group, 220

trace, 214

uniform, 294

uniform convergence, 283

Vietoris, 259

weak, 210, 443

totally bounded, 256

transition system, 83

behavioral equivalence, 193

bisimilar, 193

bisimulation, 148

labeled, 109

logical equivalence, 193

Tuckey’s Maximality Principle, 23

U

ultrafilter, 31, 34

principal, 30

ultrametric, 243

Umgebung, 290

uniformity, 289

p-adic, 291

additive, 291

discrete, 291

finite partitions, 291

indiscrete, 291

initial, 302

multiplicative, 291

product, 303

subspace, 303

universal set, 359

until operator

CSL, 419

upper closed, 111

Urysohn’s Lemma, 231

V

valuation, 23

vector lattice, 399

Vitali’s equivalence relation, 65

W
weak σ-algebra
℘℘℘(A), 85
βββA(A, r), 85
weak topology, 443
well-ordered, 12
well-ordering, 12
(WO), 13
world
behavioral equivalence, 192
logical equivalence, 192
theory, 192
Y
Yoneda isomorphism, 116
Z
Zermelo-Fraenkel System, 4
ZFC, 4
(ZL), 21
Zorn's Lemma, 23


Forschungsberichte der Fakultät für Informatik
der Technischen Universität Dortmund
ISSN 0933-6192
Requests to: Dekanat Informatik | TU Dortmund
D-44221 Dortmund