Author:
Anthony W. Knapp

Cornerstones Series Editors Charles L. Epstein, University of Pennsylvania, Philadelphia Steven G. Krantz, University of Washington, St. Louis

Advisory Board Anthony W. Knapp, State University of New York at Stony Brook, Emeritus

Anthony W. Knapp

Basic Algebra Along with a companion volume Advanced Algebra

Birkhäuser Boston • Basel • Berlin

Anthony W. Knapp 81 Upper Sheep Pasture Road East Setauket, NY 11733-1729 U.S.A. e-mail to: [email protected] http://www.math.sunysb.edu/~aknapp/books/b-alg.html

Cover design by Mary Burgess. Mathematics Subject Classification (2000): 15-01, 20-02, 13-01, 12-01, 16-01, 08-01, 18A05, 68P30 Library of Congress Control Number: 2006932456 ISBN-10 0-8176-3248-4 ISBN-13 978-0-8176-3248-9

eISBN-10 0-8176-4529-2 eISBN-13 978-0-8176-4529-8

Advanced Algebra  ISBN 0-8176-4522-5
Basic Algebra and Advanced Algebra (Set)  ISBN 0-8176-4533-0

Printed on acid-free paper. © 2006 Anthony W. Knapp All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Birkhäuser Boston, c/o Springer Science+Business Media LLC, 233 Spring Street, New York, NY 10013, USA) and the author, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

9 8 7 6 5 4 3 2 1 www.birkhauser.com

(EB)

To Susan and To My Algebra Teachers: Ralph Fox, John Fraleigh, Robert Gunning, John Kemeny, Bertram Kostant, Robert Langlands, Goro Shimura, Hale Trotter, Richard Williamson

CONTENTS

    Contents of Advanced Algebra  x
    List of Figures  xi
    Preface  xiii
    Dependence Among Chapters  xvii
    Standard Notation  xviii
    Guide for the Reader  xix

I. PRELIMINARIES ABOUT THE INTEGERS, POLYNOMIALS, AND MATRICES  1
    1. Division and Euclidean Algorithms  1
    2. Unique Factorization of Integers  4
    3. Unique Factorization of Polynomials  9
    4. Permutations and Their Signs  15
    5. Row Reduction  19
    6. Matrix Operations  24
    7. Problems  30

II. VECTOR SPACES OVER Q, R, AND C  33
    1. Spanning, Linear Independence, and Bases  33
    2. Vector Spaces Defined by Matrices  38
    3. Linear Maps  42
    4. Dual Spaces  50
    5. Quotients of Vector Spaces  54
    6. Direct Sums and Direct Products of Vector Spaces  58
    7. Determinants  65
    8. Eigenvectors and Characteristic Polynomials  73
    9. Bases in the Infinite-Dimensional Case  77
    10. Problems  82

III. INNER-PRODUCT SPACES  88
    1. Inner Products and Orthonormal Sets  88
    2. Adjoints  98
    3. Spectral Theorem  104
    4. Problems  111

IV. GROUPS AND GROUP ACTIONS  116
    1. Groups and Subgroups  117
    2. Quotient Spaces and Homomorphisms  128
    3. Direct Products and Direct Sums  134
    4. Rings and Fields  140
    5. Polynomials and Vector Spaces  147
    6. Group Actions and Examples  158
    7. Semidirect Products  166
    8. Simple Groups and Composition Series  170
    9. Structure of Finitely Generated Abelian Groups  174
    10. Sylow Theorems  183
    11. Categories and Functors  188
    12. Problems  198

V. THEORY OF A SINGLE LINEAR TRANSFORMATION  209
    1. Introduction  209
    2. Determinants over Commutative Rings with Identity  212
    3. Characteristic and Minimal Polynomials  216
    4. Projection Operators  224
    5. Primary Decomposition  226
    6. Jordan Canonical Form  229
    7. Computations with Jordan Form  235
    8. Problems  239

VI. MULTILINEAR ALGEBRA  245
    1. Bilinear Forms and Matrices  246
    2. Symmetric Bilinear Forms  250
    3. Alternating Bilinear Forms  253
    4. Hermitian Forms  255
    5. Groups Leaving a Bilinear Form Invariant  257
    6. Tensor Product of Two Vector Spaces  260
    7. Tensor Algebra  274
    8. Symmetric Algebra  280
    9. Exterior Algebra  288
    10. Problems  292

VII. ADVANCED GROUP THEORY  303
    1. Free Groups  303
    2. Subgroups of Free Groups  314
    3. Free Products  319
    4. Group Representations  326
    5. Burnside's Theorem  342
    6. Extensions of Groups  344
    7. Problems  357

VIII. COMMUTATIVE RINGS AND THEIR MODULES  367
    1. Examples of Rings and Modules  367
    2. Integral Domains and Fields of Fractions  378
    3. Prime and Maximal Ideals  381
    4. Unique Factorization  384
    5. Gauss's Lemma  390
    6. Finitely Generated Modules  396
    7. Orientation for Algebraic Number Theory and Algebraic Geometry  408
    8. Noetherian Rings and the Hilbert Basis Theorem  414
    9. Integral Closure  417
    10. Localization and Local Rings  425
    11. Dedekind Domains  434
    12. Problems  439

IX. FIELDS AND GALOIS THEORY  448
    1. Algebraic Elements  449
    2. Construction of Field Extensions  453
    3. Finite Fields  457
    4. Algebraic Closure  460
    5. Geometric Constructions by Straightedge and Compass  464
    6. Separable Extensions  469
    7. Normal Extensions  476
    8. Fundamental Theorem of Galois Theory  479
    9. Application to Constructibility of Regular Polygons  483
    10. Application to Proving the Fundamental Theorem of Algebra  486
    11. Application to Unsolvability of Polynomial Equations with Nonsolvable Galois Group  488
    12. Construction of Regular Polygons  493
    13. Solution of Certain Polynomial Equations with Solvable Galois Group  501
    14. Proof That π Is Transcendental  510
    15. Norm and Trace  514
    16. Splitting of Prime Ideals in Extensions  521
    17. Two Tools for Computing Galois Groups  527
    18. Problems  534

X. MODULES OVER NONCOMMUTATIVE RINGS  544
    1. Simple and Semisimple Modules  544
    2. Composition Series  551
    3. Chain Conditions  556
    4. Hom and End for Modules  558
    5. Tensor Product for Modules  565
    6. Exact Sequences  574
    7. Problems  579

APPENDIX  583
    A1. Sets and Functions  583
    A2. Equivalence Relations  589
    A3. Real Numbers  591
    A4. Complex Numbers  594
    A5. Partial Orderings and Zorn's Lemma  595
    A6. Cardinality  599

Hints for Solutions of Problems  603
Selected References  697
Index of Notation  699
Index  703

CONTENTS OF ADVANCED ALGEBRA

I. Transition to Modern Number Theory
II. Wedderburn–Artin Ring Theory
III. Brauer Group
IV. Homological Algebra
V. Three Theorems in Algebraic Number Theory
VI. Reinterpretation with Adeles and Ideles
VII. Infinite Field Extensions
VIII. Background for Algebraic Geometry
IX. The Number Theory of Algebraic Curves
X. Methods of Algebraic Geometry

LIST OF FIGURES

2.1. The vector space of lines $v + U$ in $\mathbb{R}^2$ parallel to a given line $U$ through the origin  55
2.2. Factorization of linear maps via a quotient of vector spaces  56
2.3. Three 1-dimensional vector subspaces of $\mathbb{R}^2$ such that each pair has intersection 0  62
2.4. Universal mapping property of a direct product of vector spaces  64
2.5. Universal mapping property of a direct sum of vector spaces  65
3.1. Geometric interpretation of the parallelogram law  91
3.2. Resolution of a vector into a parallel component and an orthogonal component  93
4.1. Factorization of homomorphisms of groups via the quotient of a group by a normal subgroup  132
4.2. Universal mapping property of an external direct product of groups  136
4.3. Universal mapping property of a direct product of groups  136
4.4. Universal mapping property of an external direct sum of abelian groups  138
4.5. Universal mapping property of a direct sum of abelian groups  139
4.6. Factorization of homomorphisms of rings via the quotient of a ring by an ideal  146
4.7. Substitution homomorphism for polynomials in one indeterminate  150
4.8. Substitution homomorphism for polynomials in n indeterminates  156
4.9. A square diagram  193
4.10. Diagrams obtained by applying a covariant functor and a contravariant functor  193
4.11. Universal mapping property of a product in a category  195
4.12. Universal mapping property of a coproduct in a category  197
5.1. Example of a nilpotent matrix in Jordan form  231
5.2. Powers of the nilpotent matrix in Figure 5.1  232
6.1. Universal mapping property of a tensor product  261
6.2. Diagrams for uniqueness of a tensor product  261
6.3. Commutative diagram of a natural transformation $\{T_X\}$  265
6.4. Commutative diagram of a triple tensor product  274
6.5. Universal mapping property of a tensor algebra  279
7.1. Universal mapping property of a free group  305
7.2. Universal mapping property of a free product  320
7.3. An intertwining operator for two representations  330
7.4. Equivalent group extensions  349
8.1. Universal mapping property of the integral group ring of G  371
8.2. Universal mapping property of a free left R module  374
8.3. Factorization of R homomorphisms via a quotient of R modules  376
8.4. Universal mapping property of the group algebra RG  378
8.5. Universal mapping property of the field of fractions of R  380
8.6. Real points of the curve $y^2 = (x - 1)x(x + 1)$  409
8.7. Universal mapping property of the localization of R at S  428
9.1. Closure of positive constructible x coordinates under multiplication and division  465
9.2. Closure of positive constructible x coordinates under square roots  466
9.3. Construction of a regular pentagon  496
9.4. Construction of a regular 17-gon  500
10.1. Universal mapping property of a tensor product of a right R module and a left R module  566

PREFACE

Basic Algebra and its companion volume Advanced Algebra systematically develop concepts and tools in algebra that are vital to every mathematician, whether pure or applied, aspiring or established. These two books together aim to give the reader a global view of algebra, its use, and its role in mathematics as a whole. The idea is to explain what the young mathematician needs to know about algebra in order to communicate well with colleagues in all branches of mathematics.

The books are written as textbooks, and their primary audience is students who are learning the material for the first time and who are planning a career in which they will use advanced mathematics professionally. Much of the material in the books, particularly in Basic Algebra but also in some of the chapters of Advanced Algebra, corresponds to normal course work. The books include further topics that may be skipped in required courses but that the professional mathematician will ultimately want to learn by self-study. The test of each topic for inclusion is whether it is something that a plenary lecturer at a broad international or national meeting is likely to take as known by the audience.

The key topics and features of Basic Algebra are as follows:

• Linear algebra and group theory build on each other throughout the book. A small amount of linear algebra is introduced first, as the topic likely to be better known by the reader ahead of time, and then a little group theory is introduced, with linear algebra providing important examples.
• Chapters on linear algebra develop notions related to vector spaces, the theory of linear transformations, bilinear forms, classical linear groups, and multilinear algebra.
• Chapters on modern algebra treat groups, rings, fields, modules, and Galois groups, including many uses of Galois groups and methods of computation.
• Three prominent themes recur throughout and blend together at times: the analogy between integers and polynomials in one variable over a field, the interplay between linear algebra and group theory, and the relationship between number theory and geometry.
• The development proceeds from the particular to the general, often introducing examples well before a theory that incorporates them.
• More than 400 problems at the ends of chapters illuminate aspects of the text, develop related topics, and point to additional applications. A separate 90-page section “Hints for Solutions of Problems” at the end of the book gives detailed hints for most of the problems, complete solutions for many.
• Applications such as the fast Fourier transform, the theory of linear error-correcting codes, the use of Jordan canonical form in solving linear systems of ordinary differential equations, and constructions of interest in mathematical physics arise naturally in sequences of problems at the ends of chapters and illustrate the power of the theory for use in science and engineering.

Basic Algebra endeavors to show some of the interconnections between different areas of mathematics, beyond those listed above. Here are examples: Systems of orthogonal functions make an appearance with inner-product spaces. Covering spaces naturally play a role in the examination of subgroups of free groups. Cohomology of groups arises from considering group extensions. Use of the power-series expansion of the exponential function combines with algebraic numbers to prove that π is transcendental. Harmonic analysis on a cyclic group explains the mysterious method of Lagrange resolvents in the theory of Galois groups.

Algebra plays a singular role in mathematics by having been developed so extensively at such an early date. Indeed, the major discoveries of algebra even from the days of Hilbert are well beyond the knowledge of most nonalgebraists today. Correspondingly most of the subject matter of the present book is at least 100 years old. What has changed over the intervening years concerning algebra books at this level is not so much the mathematics as the point of view toward the subject matter and the relative emphasis on and generality of various topics. For example, in the 1920s Emmy Noether introduced vector spaces and linear mappings to reinterpret coordinate spaces and matrices, and she defined the ingredients of what was then called “modern algebra”: the axiomatically defined rings, fields, and modules, and their homomorphisms.
The introduction of categories and functors in the 1940s shifted the emphasis even more toward the homomorphisms and away from the objects themselves. The creation of homological algebra in the 1950s gave a unity to algebraic topics cutting across many ﬁelds of mathematics. Category theory underwent a period of great expansion in the 1950s and 1960s, followed by a contraction and a return more to a supporting role. The emphasis in topics shifted. Linear algebra had earlier been viewed as a separate subject, with many applications, while group theory and the other topics had been viewed as having few applications. Coding theory, cryptography, and advances in physics and chemistry have changed all that, and now linear algebra and group theory together permeate mathematics and its applications. The other subjects build on them, and they too have extensive applications in science and engineering, as well as in the rest of mathematics. Basic Algebra presents its subject matter in a forward-looking way that takes this evolution into account. It is suitable as a text in a two-semester advanced


undergraduate or ﬁrst-year graduate sequence in algebra. Depending on the graduate school, it may be appropriate to include also some material from Advanced Algebra. Brieﬂy the topics in Basic Algebra are linear algebra and group theory, rings, ﬁelds, and modules. A full list of the topics in Advanced Algebra appears on page x; of these, the Wedderburn theory of semisimple algebras, homological algebra, and foundational material for algebraic geometry are the ones that most commonly appear in syllabi of ﬁrst-year graduate courses. A chart on page xvii tells the dependence among chapters and can help with preparing a syllabus. Chapters I–VII treat linear algebra and group theory at various levels, except that three sections of Chapter IV and one of Chapter V introduce rings and ﬁelds, polynomials, categories and functors, and determinants over commutative rings with identity. Chapter VIII concerns rings, with emphasis on unique factorization; Chapter IX concerns ﬁeld extensions and Galois theory, with emphasis on applications of Galois theory; and Chapter X concerns modules and constructions with modules. For a graduate-level sequence the syllabus is likely to include all of Chapters I–V and parts of Chapters VIII and IX, at a minimum. Depending on the knowledge of the students ahead of time, it may be possible to skim much of the ﬁrst three chapters and some of the beginning of the fourth; then time may allow for some of Chapters VI and VII, or additional material from Chapters VIII and IX, or some of the topics in Advanced Algebra. For many of the topics in Advanced Algebra, parts of Chapter X of Basic Algebra are prerequisite. For an advanced undergraduate sequence the ﬁrst semester can include Chapters I through III except Section II.9, plus the ﬁrst six sections of Chapter IV and as much as reasonable from Chapter V; the notion of category does not appear in this material. 
The second semester will involve categories very gently; the course will perhaps treat the remainder of Chapter IV, the ﬁrst ﬁve or six sections of Chapter VIII, and at least Sections 1–3 and 5 of Chapter IX. More detailed information about how the book can be used with courses can be deduced by using the chart on page xvii in conjunction with the section “Guide for the Reader” on pages xix–xxii. In my own graduate teaching, I have built one course around Chapters I–III, Sections 1–6 of Chapter IV, all of Chapter V, and about half of Chapter VI. A second course dealt with the remainder of Chapter IV, a little of Chapter VII, Sections 1–6 of Chapter VIII, and Sections 1–11 of Chapter IX. The problems at the ends of chapters are intended to play a more important role than is normal for problems in a mathematics book. Almost all problems are solved in the section of hints at the end of the book. This being so, some blocks of problems form additional topics that could have been included in the text but were not; these blocks may either be regarded as optional topics, or they may be treated as challenges for the reader. The optional topics of this kind


usually either carry out further development of the theory or introduce signiﬁcant applications. For example one block of problems at the end of Chapter VII carries the theory of representations of ﬁnite groups a little further by developing the Poisson summation formula and the fast Fourier transform. For a second example blocks of problems at the ends of Chapters IV, VII, and IX introduce linear error-correcting codes as an application of the theory in those chapters. Not all problems are of this kind, of course. Some of the problems are really pure or applied theorems, some are examples showing the degree to which hypotheses can be stretched, and a few are just exercises. The reader gets no indication which problems are of which type, nor of which ones are relatively easy. Each problem can be solved with tools developed up to that point in the book, plus any additional prerequisites that are noted. Beyond a standard one-variable calculus course, the most important prerequisite for using Basic Algebra is that the reader already know what a proof is, how to read a proof, and how to write a proof. This knowledge typically is obtained from honors calculus courses, or from a course in linear algebra, or from a ﬁrst junior–senior course in real variables. In addition, it is assumed that the reader is comfortable with a small amount of linear algebra, including matrix computations, row reduction of matrices, solutions of systems of linear equations, and the associated geometry. Some prior exposure to groups is helpful but not really necessary. The theorems, propositions, lemmas, and corollaries within each chapter are indexed by a single number stream. Figures have their own number stream, and one can ﬁnd the page reference for each ﬁgure from the table on pages xi–xii. Labels on displayed lines occur only within proofs and examples, and they are local to the particular proof or example in progress. 
Some readers like to skim or skip proofs on first reading; to facilitate this procedure, each occurrence of the word “PROOF” is matched by an occurrence at the right margin of a symbol marking the end of that proof.

I am grateful to Ann Kostant and Steven Krantz for encouraging this project and for making many suggestions about pursuing it. I am especially indebted to an anonymous referee, who made detailed comments about many aspects of a preliminary version of the book, and to David Kramer, who did the copyediting. The typesetting was by AMS-TeX, and the figures were drawn with Mathematica.

I invite corrections and other comments from readers. I plan to maintain a list of known corrections on my own Web page.

A. W. KNAPP
August 2006

DEPENDENCE AMONG CHAPTERS

Below is a chart of the main lines of dependence of chapters on prior chapters. The dashed lines indicate helpful motivation but no logical dependence. Apart from that, particular examples may make use of information from earlier chapters that is not indicated by the chart.

[Chart omitted. Its nodes are: I, II; III; IV.1–IV.6; IV.7–IV.11; V; VI; VII; VIII.1–VIII.6; VIII.7–VIII.11; IX.1–IX.13; IX.14–IX.17; X.]

STANDARD NOTATION

See the Index of Notation, pp. 699–701, for symbols defined starting on page 1.

Item                                          Meaning

$\#S$ or $|S|$                                number of elements in $S$
$\varnothing$                                 empty set
$\{x \in E \mid P\}$                          the set of $x$ in $E$ such that $P$ holds
$E^c$                                         complement of the set $E$
$E \cup F$, $E \cap F$, $E - F$               union, intersection, difference of sets
$\bigcup_\alpha E_\alpha$, $\bigcap_\alpha E_\alpha$   union, intersection of the sets $E_\alpha$
$E \subseteq F$, $E \supseteq F$              $E$ is contained in $F$, $E$ contains $F$
$E \subsetneq F$, $E \supsetneq F$            $E$ properly contained in $F$, properly contains $F$
$E \times F$, $\times_{s \in S} X_s$          products of sets
$(a_1, \dots, a_n)$, $\{a_1, \dots, a_n\}$    ordered $n$-tuple, unordered $n$-tuple
$f : E \to F$, $x \mapsto f(x)$               function, effect of function
$f \circ g$ or $fg$, $f|_E$                   composition of $g$ followed by $f$, restriction to $E$
$f(\,\cdot\,, y)$                             the function $x \mapsto f(x, y)$
$f(E)$, $f^{-1}(E)$                           direct and inverse image of a set
$\delta_{ij}$                                 Kronecker delta: 1 if $i = j$, 0 if $i \neq j$
$\binom{n}{k}$                                binomial coefficient
$n$ positive, $n$ negative                    $n > 0$, $n < 0$
$\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$   integers, rationals, reals, complex numbers
max (and similarly min)                       maximum of a finite subset of a totally ordered set
$\sum$ or $\prod$                             sum or product, possibly with a limit operation
countable                                     finite or in one-one correspondence with $\mathbb{Z}$
$[x]$                                         greatest integer $\leq x$ if $x$ is real
$\operatorname{Re} z$, $\operatorname{Im} z$  real and imaginary parts of complex $z$
$\bar{z}$                                     complex conjugate of $z$
$|z|$                                         absolute value of $z$
$1$                                           multiplicative identity
$1$ or $I$                                    identity matrix or operator
$1_X$                                         identity function on $X$
$\mathbb{Q}^n$, $\mathbb{R}^n$, $\mathbb{C}^n$   spaces of column vectors
$\operatorname{diag}(a_1, \dots, a_n)$        diagonal matrix
$\cong$                                       is isomorphic to, is equivalent to

GUIDE FOR THE READER

This section is intended to help the reader ﬁnd out what parts of each chapter are most important and how the chapters are interrelated. Further information of this kind is contained in the abstracts that begin each of the chapters. The book pays attention to at least three recurring themes in algebra, allowing a person to see how these themes arise in increasingly sophisticated ways. These are the analogy between integers and polynomials in one indeterminate over a ﬁeld, the interplay between linear algebra and group theory, and the relationship between number theory and geometry. Keeping track of how these themes evolve will help the reader understand the mathematics better and anticipate where it is headed. In Chapter I the analogy between integers and polynomials in one indeterminate over the rationals, reals, or complex numbers appears already in the ﬁrst three sections. The main results of these sections are theorems about unique factorization in each of the two settings. The relevant parts of the underlying structures for the two settings are the same, and unique factorization can therefore be proved in both settings by the same argument. Many readers will already know this unique factorization, but it is worth examining the parallel structure and proof at least quickly before turning to the chapters that follow. Before proceeding very far into the book, it is worth looking also at the appendix to see whether all its topics are familiar. Readers will ﬁnd Section A1 useful at least for its summary of set-theoretic notation and for its emphasis on the distinction between range and image for a function. This distinction is usually unimportant in analysis but becomes increasingly important as one studies more advanced topics in algebra. Readers who have not speciﬁcally learned about equivalence relations and partial orderings can learn about them from Sections A2 and A5. 
Sections A3 and A4 concern the real and complex numbers; the emphasis is on notation and the Intermediate Value Theorem, which plays a role in proving the Fundamental Theorem of Algebra. Zorn’s Lemma and cardinality in Sections A5 and A6 are usually unnecessary in an undergraduate course. They arise most importantly in Sections II.9 and IX.4, which are normally omitted in an undergraduate course, and in Proposition 8.8, which is invoked only in the last few sections of Chapter VIII.

The remainder of this section is an overview of individual chapters and pairs of chapters.

Chapter I is in three parts. The first part, as mentioned above, establishes unique factorization for the integers and for polynomials in one indeterminate over the rationals, reals, or complex numbers. The second part defines permutations and shows that they have signs such that the sign of any composition is the product of the signs; this result is essential for defining general determinants in Section II.7. The third part will likely be a review for all readers. It establishes notation for row reduction of matrices and for operations on matrices, and it uses row reduction to show that a one-sided inverse for a square matrix is a two-sided inverse.

Chapters II–III treat the fundamentals of linear algebra. Whereas the matrix computations in Chapter I were concrete, Chapters II–III are relatively abstract. Much of this material is likely to be a review for graduate students. The geometric interpretation of vector spaces, subspaces, and linear mappings is not included in the chapter, being taken as known previously. The fundamental idea that a newly constructed object might be characterized by a “universal mapping property” appears for the first time in Chapter II, and it appears more and more frequently throughout the book. One aspect of this idea is that it is sometimes not so important what certain constructed objects are, but what they do. A related idea being emphasized is that the mappings associated with a newly constructed object are likely to be as important as the object, if not more so; at the least, one needs to stop and find what those mappings are. Section II.9 uses Zorn’s Lemma and can be deferred until Chapter IX if one wants. Chapter III discusses special features of real and complex vector spaces endowed with inner products. The main result is the Spectral Theorem in Section 3. Many of the problems at the end of the chapter make contact with real analysis. The subject of linear algebra continues in Chapter V.
Chapter IV is the primary chapter on group theory and may be viewed as in three parts. Sections 1–6 form the ﬁrst part, which is essential for all later chapters in the book. Sections 1–3 introduce groups and some associated constructions, along with a number of examples. Many of the examples will be seen to be related to speciﬁc or general vector spaces, and thus the theme of the interplay between group theory and linear algebra is appearing concretely for the ﬁrst time. In practice, many examples of groups arise in the context of group actions, and abstract group actions are deﬁned in Section 6. Of particular interest are group representations, which are group actions on a vector space by linear mappings. Sections 4–5 are a digression to deﬁne rings, ﬁelds, and ring homomorphisms, and to extend the theories concerning polynomials and vector spaces as presented in Chapters I–II. The immediate purpose of the digression is to make prime ﬁelds, their associated multiplicative groups, and the notion of characteristic available for the remainder of the chapter. The deﬁnition of vector space is extended to allow scalars from any ﬁeld. The deﬁnition of polynomial is extended to allow coefﬁcients from any commutative ring with identity, rather than just the


rationals or reals or complex numbers, and to allow more than one indeterminate. Universal mapping properties for polynomial rings are proved. Sections 7–10 form the second part of the chapter and are a continuation of group theory. The main result is the Fundamental Theorem of Finitely Generated Abelian Groups, which is in Section 9. Section 11 forms the third part of the chapter. This section is a gentle introduction to categories and functors, which are useful for working with parallel structures in different settings within algebra. As S. Mac Lane says in his book, “Category theory asks of every type of Mathematical object: ‘What are the morphisms?’; it suggests that these morphisms should be described at the same time as the objects. . . . This emphasis on (homo)morphisms is largely due to Emmy Noether, who emphasized the use of homomorphisms of groups and rings.” The simplest parallel structure reflected in categories is that of an isomorphism. The section also discusses general notions of product and coproduct functors. Examples of products are direct products in linear algebra and in group theory. Examples of coproducts are direct sums in linear algebra and in abelian group theory, as well as disjoint unions in set theory. The theory in this section helps in unifying the mathematics that is to come in Chapters VI–VIII and X. The subject of group theory is continued in Chapter VII, which assumes knowledge of the material on category theory.

Chapters V and VI continue the development of linear algebra. Chapter VI uses categories, but Chapter V does not. Most of Chapter V concerns the analysis of a linear transformation carrying a finite-dimensional vector space over a field into itself. The questions are to find invariants of such transformations and to classify the transformations up to similarity.
Section 2 at the start extends the theory of determinants so that the matrices are allowed to have entries in a commutative ring with identity; this extension is necessary in order to be able to work easily with characteristic polynomials. The extension of this theory is carried out by an important principle known as the “permanence of identities.” Chapter VI largely concerns bilinear forms and tensor products, again in the context that the coefﬁcients are from a ﬁeld. This material is necessary in many applications to geometry and physics, but it is not needed in Chapters VII–IX. Many objects in the chapter are constructed in such a way that they are uniquely determined by a universal mapping property. Problems 18–22 at the end of the chapter discuss universal mapping properties in the general context of category theory, and they show that a uniqueness theorem is automatic in all cases. Chapter VII continues the development of group theory, making use of category theory. It is in two parts. Sections 1–3 concern free groups and the topic of generators and relations; they are essential for abstract descriptions of groups and for work in topology involving fundamental groups. Section 3 constructs a notion of free product and shows that it is the coproduct functor for the category of groups. Sections 4–6 continue the theme of the interplay of group theory and


linear algebra. Section 4 analyzes group representations of a ﬁnite group when the underlying ﬁeld is the complex numbers, and Section 5 applies this theory to obtain a conclusion about the structure of ﬁnite groups. Section 6 studies extensions of groups and uses them to motivate the subject of cohomology of groups. Chapter VIII introduces modules, giving many examples in Section 1, and then goes on to discuss questions of unique factorization in integral domains. Section 6 obtains a generalization for principal ideal domains of the Fundamental Theorem of Finitely Generated Abelian Groups, once again illustrating the ﬁrst theme—similarities between the integers and certain polynomial rings. Section 7 introduces the third theme, the relationship between number theory and geometry, as a more sophisticated version of the ﬁrst theme. The section compares a certain polynomial ring in two variables with a certain ring of algebraic integers that extends the ordinary integers. Unique factorization of elements fails for both, but the geometric setting has a more geometrically meaningful factorization in terms of ideals that is evidently unique. This kind of unique factorization turns out to work for the ring of algebraic integers as well. Sections 8–11 expand the examples in Section 7 into a theory of unique factorization of ideals in any integrally closed Noetherian domain whose nonzero prime ideals are all maximal. Chapter IX analyzes algebraic extensions of ﬁelds. The ﬁrst 13 sections make use only of Sections 1–6 in Chapter VIII. Sections 1–5 of Chapter IX give the foundational theory, which is sufﬁcient to exhibit all the ﬁnite ﬁelds and to prove that certain classically proposed constructions in Euclidean geometry are impossible. Sections 6–8 introduce Galois theory, but Theorem 9.28 and its three corollaries may be skipped if Sections 14–17 are to be omitted. 
Sections 9–11 give a ﬁrst round of applications of Galois theory: Gauss’s theorem about which regular n-gons are in principle constructible with straightedge and compass, the Fundamental Theorem of Algebra, and the Abel–Galois theorem that solvability of a polynomial equation with rational coefﬁcients in terms of radicals implies solvability of the Galois group. Sections 12–13 give a second round of applications: Gauss’s method in principle for actually constructing the constructible regular n-gons and a converse to the Abel–Galois theorem. Sections 14–17 make use of Sections 7–11 of Chapter VIII, proving that π is transcendental and obtaining two methods for computing Galois groups. Chapter X is a relatively short chapter developing further tools for dealing with modules over a ring with identity. The main construction is that of the tensor product over a ring of a unital right module and a unital left module, the result being an abelian group. The chapter makes use of material from Chapters VI and VIII, but not from Chapter IX.

Basic Algebra

CHAPTER I Preliminaries about the Integers, Polynomials, and Matrices

Abstract. This chapter is mostly a review, discussing unique factorization of positive integers, unique factorization of polynomials whose coefﬁcients are rational or real or complex, signs of permutations, and matrix algebra. Sections 1–2 concern unique factorization of positive integers. Section 1 proves the division and Euclidean algorithms, used to compute greatest common divisors. Section 2 establishes unique factorization as a consequence and gives several number-theoretic consequences, including the Chinese Remainder Theorem and the evaluation of the Euler ϕ function. Section 3 develops unique factorization of rational and real and complex polynomials in one indeterminate completely analogously, and it derives the complete factorization of complex polynomials from the Fundamental Theorem of Algebra. The proof of the fundamental theorem is postponed to Chapter IX. Section 4 discusses permutations of a ﬁnite set, establishing the decomposition of each permutation as a disjoint product of cycles. The sign of a permutation is introduced, and it is proved that the sign of a product is the product of the signs. Sections 5–6 concern matrix algebra. Section 5 reviews row reduction and its role in the solution of simultaneous linear equations. Section 6 deﬁnes the arithmetic operations of addition, scalar multiplication, and multiplication of matrices. The process of matrix inversion is related to the method of row reduction, and it is shown that a square matrix with a one-sided inverse automatically has a two-sided inverse that is computable via row reduction.

1. Division and Euclidean Algorithms

The first three sections give a careful proof of unique factorization for integers and for polynomials with rational or real or complex coefficients, and they give an indication of some first consequences of this factorization. For the moment let us restrict attention to the set Z of integers. We take addition, subtraction, and multiplication within Z as established, as well as the properties of the usual ordering in Z.

A factor of an integer n is a nonzero integer k such that n = kl for some integer l. In this case we say also that k divides n, that k is a divisor of n, and that n is a multiple of k. We write k | n for this relationship. If n is nonzero, any product formula $n = k l_1 \cdots l_r$ is a factorization of n. A unit in Z is a divisor


of 1, hence is either +1 or −1. The factorization n = kl of n ≠ 0 is called nontrivial if neither k nor l is a unit. An integer p > 1 is said to be prime if it has no nontrivial factorization p = kl.

The statement of unique factorization for positive integers, which will be given precisely in Section 2, says roughly that each positive integer is the product of primes and that this decomposition is unique apart from the order of the factors.¹ Existence will follow by an easy induction. The difficulty is in the uniqueness. We shall prove uniqueness by a sequence of steps based on the “Euclidean algorithm,” which we discuss in a moment. In turn, the Euclidean algorithm relies on the following.

¹ It is to be understood that the prime factorization of 1 is as the empty product.

Proposition 1.1 (division algorithm). If a and b are integers with b ≠ 0, then there exist unique integers q and r such that a = bq + r and 0 ≤ r < |b|.

PROOF. Possibly replacing q by −q, we may assume that b > 0. The integers n with bn ≤ a are bounded above by |a|, and there exists such an n, namely n = −|a|. Therefore there is a largest such integer, say n = q. Set r = a − bq. Then 0 ≤ r and a = bq + r. If r ≥ b, then r − b ≥ 0 says that a = b(q + 1) + (r − b) ≥ b(q + 1). The inequality q + 1 > q contradicts the maximality of q, and we conclude that r < b. This proves existence. For uniqueness when b > 0, suppose $a = bq_1 + r_1 = bq_2 + r_2$. Subtracting, we obtain $b(q_1 - q_2) = r_2 - r_1$ with $|r_2 - r_1| < b$, and this is a contradiction unless $r_2 - r_1 = 0$.

Let a and b be integers not both 0. The greatest common divisor of a and b is the largest integer d > 0 such that d | a and d | b. Let us see existence. The integer 1 divides a and b. If b, for example, is nonzero, then any such d has |d| ≤ |b|, and hence the greatest common divisor indeed exists. We write d = GCD(a, b).

Let us suppose that b ≠ 0. The Euclidean algorithm consists of iterated application of the division algorithm (Proposition 1.1) to a and b until the remainder term r disappears:
\[
\begin{aligned}
a &= b q_1 + r_1, & 0 &\le r_1 < b,\\
b &= r_1 q_2 + r_2, & 0 &\le r_2 < r_1,\\
r_1 &= r_2 q_3 + r_3, & 0 &\le r_3 < r_2,\\
&\;\;\vdots\\
r_{n-2} &= r_{n-1} q_n + r_n, & 0 &\le r_n < r_{n-1} \quad (\text{with } r_n \ne 0, \text{ say}),\\
r_{n-1} &= r_n q_{n+1}.
\end{aligned}
\]


The process must stop with some remainder term $r_{n+1}$ equal to 0 in this way since $b > r_1 > r_2 > \cdots \ge 0$. The last nonzero remainder term, namely $r_n$ above, will be of interest to us.

EXAMPLE. For a = 13 and b = 5, the steps read
\[
\begin{aligned}
13 &= 5 \cdot 2 + 3,\\
5 &= 3 \cdot 1 + 2,\\
3 &= 2 \cdot 1 + \boxed{1},\\
2 &= 1 \cdot 2.
\end{aligned}
\]
The last nonzero remainder term is written with a box around it.

Proposition 1.2. Let a and b be integers with b ≠ 0, and let d = GCD(a, b). Then
(a) the number $r_n$ in the Euclidean algorithm is exactly d,
(b) any divisor d′ of both a and b necessarily divides d,
(c) there exist integers x and y such that ax + by = d.

EXAMPLE, CONTINUED. We rewrite the steps of the Euclidean algorithm, as applied in the above example with a = 13 and b = 5, so as to yield successive substitutions:
\[
\begin{aligned}
13 &= 5 \cdot 2 + 3, & 3 &= 13 - 5 \cdot 2,\\
5 &= 3 \cdot 1 + 2, & 2 &= 5 - 3 \cdot 1 = 5 - (13 - 5 \cdot 2) \cdot 1 = 5 \cdot 3 - 13 \cdot 1,\\
3 &= 2 \cdot 1 + \boxed{1}, & 1 &= 3 - 2 \cdot 1 = (13 - 5 \cdot 2) - (5 \cdot 3 - 13 \cdot 1) \cdot 1 = 13 \cdot 2 - 5 \cdot 5.
\end{aligned}
\]

Thus we see that 1 = 13x + 5y with x = 2 and y = −5. This shows for the example that the number $r_n$ works in place of d in Proposition 1.2c, and the rest of the proof of the proposition for this example is quite easy. Let us now adjust this computation to obtain a complete proof of the proposition in general.

PROOF OF PROPOSITION 1.2. Put $r_0 = b$ and $r_{-1} = a$, so that
\[
r_{k-2} = r_{k-1} q_k + r_k \qquad \text{for } 1 \le k \le n. \tag{$*$}
\]

The argument proceeds in three steps.

Step 1. We show that $r_n$ is a divisor of both a and b. In fact, from $r_{n-1} = r_n q_{n+1}$, we have $r_n \mid r_{n-1}$. Let k ≤ n, and assume inductively that $r_n$ divides


$r_{k-1}, \dots, r_{n-1}, r_n$. Then (∗) shows that $r_n$ divides $r_{k-2}$. Induction allows us to conclude that $r_n$ divides $r_{-1}, r_0, \dots, r_{n-1}$. In particular, $r_n$ divides a and b.

Step 2. We prove that $ax + by = r_n$ for suitable integers x and y. In fact, we show by induction on k for k ≤ n that there exist integers x and y with $ax + by = r_k$. For k = −1 and k = 0, this conclusion is trivial. If k ≥ 1 is given and if the result is known for k − 2 and k − 1, then we have
\[
\begin{aligned}
a x_2 + b y_2 &= r_{k-2},\\
a x_1 + b y_1 &= r_{k-1}
\end{aligned}
\tag{$**$}
\]
for suitable integers $x_2, y_2, x_1, y_1$. We multiply the second of the equalities of (∗∗) by $q_k$, subtract, and substitute into (∗). The result is
\[
r_k = r_{k-2} - r_{k-1} q_k = a(x_2 - q_k x_1) + b(y_2 - q_k y_1),
\]
and the induction is complete. Thus $ax + by = r_n$ for suitable x and y.

Step 3. Finally we deduce (a), (b), and (c). Step 1 shows that $r_n$ divides a and b. If d′ > 0 divides both a and b, the result of Step 2 shows that $d' \mid r_n$. Thus $d' \le r_n$, and $r_n$ is the greatest common divisor. This is the conclusion of (a); (b) follows from (a) since $d' \mid r_n$, and (c) follows from (a) and Step 2.

Corollary 1.3. Within Z, if c is a nonzero integer that divides a product mn and if GCD(c, m) = 1, then c divides n.

PROOF. Proposition 1.2c produces integers x and y with cx + my = 1. Multiplying by n, we obtain cnx + mny = n. Since c divides mn and divides itself, c divides both terms on the left side. Therefore it divides the right side, which is n.

Corollary 1.4. Within Z, if a and b are nonzero integers with GCD(a, b) = 1 and if both of them divide the integer m, then ab divides m.

PROOF. Proposition 1.2c produces integers x and y with ax + by = 1. Multiplying by m, we obtain amx + bmy = m, which we rewrite in integers as ab(m/b)x + ab(m/a)y = m. Since ab divides each term on the left side, it divides the right side, which is m.
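The division algorithm of Proposition 1.1 and the back-substitution in Step 2 of the proof of Proposition 1.2 translate directly into a short program. The following Python sketch is only an illustration; the function names `division_algorithm` and `extended_euclid` are our own and do not come from the text. It carries the coefficients x and y along with each remainder, so that the relation ax + by = r_k holds at every stage.

```python
def division_algorithm(a: int, b: int) -> tuple[int, int]:
    """Return (q, r) with a == b*q + r and 0 <= r < |b|, as in Proposition 1.1."""
    if b == 0:
        raise ValueError("b must be nonzero")
    # For a positive divisor, Python's divmod already yields 0 <= r < |b|.
    q, r = divmod(a, abs(b))
    # If b is negative, replace q by -q, as in the first line of the proof.
    return (-q if b < 0 else q), r


def extended_euclid(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, x, y) with d == GCD(a, b) and a*x + b*y == d, for b != 0."""
    if b == 0:
        raise ValueError("b must be nonzero")
    # Invariants: a*x_prev + b*y_prev == r_prev and a*x_cur + b*y_cur == r_cur.
    r_prev, r_cur = a, b          # r_{-1} = a and r_0 = b
    x_prev, x_cur = 1, 0
    y_prev, y_cur = 0, 1
    while r_cur != 0:
        q, r_next = division_algorithm(r_prev, r_cur)
        r_prev, r_cur = r_cur, r_next
        x_prev, x_cur = x_cur, x_prev - q * x_cur
        y_prev, y_cur = y_cur, y_prev - q * y_cur
    # The last nonzero remainder can be negative only when it is b itself.
    if r_prev < 0:
        r_prev, x_prev, y_prev = -r_prev, -x_prev, -y_prev
    return r_prev, x_prev, y_prev


d, x, y = extended_euclid(13, 5)
print(d, x, y)  # 1 2 -5
```

For a = 13 and b = 5 the sketch reproduces the values x = 2 and y = −5 found in the example above.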

2. Unique Factorization of Integers

We come now to the theorem asserting unique factorization for the integers. The precise statement is as follows.


Theorem 1.5 (Fundamental Theorem of Arithmetic). Each positive integer n can be written as a product of primes, $n = p_1 p_2 \cdots p_r$, with the integer 1 being written as an empty product. This factorization is unique in the following sense: if $n = q_1 q_2 \cdots q_s$ is another such factorization, then r = s and, after some reordering of the factors, $q_j = p_j$ for 1 ≤ j ≤ r.

The main step is the following lemma, which relies on Corollary 1.3.

Lemma 1.6. Within Z, if p is a prime and p divides a product ab, then p divides a or p divides b.

PROOF. Suppose that p does not divide a. Since p is prime, GCD(a, p) = 1. Taking m = a, n = b, and c = p in Corollary 1.3, we see that p divides b.

PROOF OF EXISTENCE IN THEOREM 1.5. We induct on n, the case n = 1 being handled by an empty product expansion. If the result holds for k = 1 through k = n − 1, there are two cases: n is prime and n is not prime. If n is prime, then n = n is the desired factorization. Otherwise we can write n = ab nontrivially with a > 1 and b > 1. Then a ≤ n − 1 and b ≤ n − 1, so that a and b have factorizations into primes by the inductive hypothesis. Putting them together yields a factorization into primes for n = ab.

PROOF OF UNIQUENESS IN THEOREM 1.5. Suppose that $n = p_1 p_2 \cdots p_r = q_1 q_2 \cdots q_s$ with all factors prime and with r ≤ s. We prove the uniqueness by induction on r, the case r = 0 being trivial and the case r = 1 following from the definition of “prime.” Inductively from Lemma 1.6 we have $p_r \mid q_k$ for some k. Since $q_k$ is prime, $p_r = q_k$. Thus we can cancel and obtain $p_1 p_2 \cdots p_{r-1} = q_1 q_2 \cdots \widehat{q_k} \cdots q_s$, the hat indicating an omitted factor. By induction the factors on the two sides here are the same except for order. Thus the same conclusion is valid when comparing the two sides of the equality $p_1 p_2 \cdots p_r = q_1 q_2 \cdots q_s$. The induction is complete, and the desired uniqueness follows.
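The existence half of Theorem 1.5 can be made concrete by trial division. The Python sketch below is an illustration rather than the method of the text; the name `prime_factorization` is our own. It returns the primes of a positive integer in nondecreasing order, with the empty list playing the role of the empty product for n = 1.

```python
def prime_factorization(n: int) -> list[int]:
    """Return the prime factors of the positive integer n, with repetition."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors = []
    p = 2
    # Any composite n has a prime factor p with p*p <= n.
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        # What remains has no factor p with p*p <= n, hence is prime.
        factors.append(n)
    return factors


print(prime_factorization(360))  # [2, 2, 2, 3, 3, 5]
print(prime_factorization(1))    # []
```

Uniqueness is what guarantees that any other correct procedure must return the same list up to order.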
In the product expansion of Theorem 1.5, it is customary to group factors that are equal, thus writing the positive integer n as $n = p_1^{k_1} \cdots p_r^{k_r}$ with the primes $p_j$ distinct and with the integers $k_j$ all ≥ 0. This kind of decomposition is unique up to order if all factors $p_j^{k_j}$ with $k_j = 0$ are dropped, and we call it a prime factorization of n.

Corollary 1.7. If $n = p_1^{k_1} \cdots p_r^{k_r}$ is a prime factorization of a positive integer n, then the positive divisors d of n are exactly all products $d = p_1^{l_1} \cdots p_r^{l_r}$ with $0 \le l_j \le k_j$ for all j.

REMARK. A general divisor of n within Z is the product of a unit ±1 and a positive divisor.


PROOF. Certainly any such product divides n. Conversely if d divides n, write n = dx for some positive integer x. Apply Theorem 1.5 to d and to x, form the resulting prime factorizations, and multiply them together. Then we see from the uniqueness for the prime factorization of n that the only primes that can occur in the expansions of d and x are $p_1, \dots, p_r$ and that the sum of the exponents of $p_j$ in the expansions of d and x is $k_j$. The result follows.

If we want to compare prime factorizations for two positive integers, we can insert 0th powers of primes as necessary and thereby assume that the same primes appear in both expansions. Using this device, we obtain a formula for greatest common divisors.

Corollary 1.8. If two positive integers a and b have expansions as products of powers of r distinct primes given by $a = p_1^{k_1} \cdots p_r^{k_r}$ and $b = p_1^{l_1} \cdots p_r^{l_r}$, then
\[
\mathrm{GCD}(a, b) = p_1^{\min(k_1, l_1)} \cdots p_r^{\min(k_r, l_r)}.
\]

PROOF. Let d′ be the right side of the displayed equation. It is plain that d′ is positive and that d′ divides a and b. On the other hand, two applications of Corollary 1.7 show that the greatest common divisor of a and b is a number d of the form $p_1^{m_1} \cdots p_r^{m_r}$ with the property that $m_j \le k_j$ and $m_j \le l_j$ for all j. Therefore $m_j \le \min(k_j, l_j)$ for all j, and d ≤ d′. Since any positive divisor of both a and b is ≤ d, we have d′ ≤ d. Thus d′ = d.

In special cases Corollary 1.8 provides a useful way to compute GCD(a, b), but the Euclidean algorithm is usually a more efficient procedure. Nevertheless, Corollary 1.8 remains a handy tool for theoretical purposes. Here is an example: Two nonzero integers a and b are said to be relatively prime if GCD(a, b) = 1. It is immediate from Corollary 1.8 that two nonzero integers a and b are relatively prime if and only if there is no prime p that divides both a and b.

Corollary 1.9 (Chinese Remainder Theorem). Let a and b be positive relatively prime integers. To each pair (r, s) of integers with 0 ≤ r < a and 0 ≤ s < b corresponds a unique integer n such that 0 ≤ n < ab, a divides n − r, and b divides n − s. Moreover, every integer n with 0 ≤ n < ab arises from some such pair (r, s).

REMARK. In notation for congruences that we introduce formally in Chapter IV, the result says that if GCD(a, b) = 1, then the congruences n ≡ r mod a and n ≡ s mod b have one and only one simultaneous solution n with 0 ≤ n < ab.
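The correspondence (r, s) ↦ n of Corollary 1.9 can be computed explicitly. The Python sketch below is one possible realization and is not taken from the text: the name `crt_pair_to_residue` is ours, and the sketch assumes Python 3.8 or later, where the built-in `pow(a, -1, b)` returns an inverse of a modulo b when GCD(a, b) = 1. It looks for n of the form r + at with 0 ≤ t < b, which automatically makes n − r divisible by a.

```python
def crt_pair_to_residue(r: int, s: int, a: int, b: int) -> int:
    """Return the n with 0 <= n < a*b, a | (n - r), and b | (n - s)."""
    if not (0 <= r < a and 0 <= s < b):
        raise ValueError("need 0 <= r < a and 0 <= s < b")
    # pow raises ValueError if a has no inverse modulo b, i.e. GCD(a, b) != 1.
    a_inv = pow(a, -1, b)
    # Choose t with a*t congruent to s - r modulo b, so that b | (r + a*t - s).
    t = ((s - r) * a_inv) % b
    return r + a * t


print(crt_pair_to_residue(2, 3, 3, 5))  # 8

# Every n with 0 <= n < 15 arises from exactly one pair (r, s).
values = sorted(crt_pair_to_residue(r, s, 3, 5)
                for r in range(3) for s in range(5))
print(values == list(range(15)))  # True
```

Since 0 ≤ r ≤ a − 1 and 0 ≤ t ≤ b − 1, the value r + at lies between 0 and ab − 1, as the corollary requires.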


PROOF. Let us see that n exists as asserted. Since a and b are relatively prime, Proposition 1.2c produces integers x′ and y′ such that ax′ − by′ = 1. Multiplying by s − r, we obtain ax − by = s − r for suitable integers x and y. Put n′ = ax + r = by + s, and write by the division algorithm (Proposition 1.1) n′ = abq + n for some integer q and for some integer n with 0 ≤ n < ab. Then n − r = n′ − abq − r = ax − abq is divisible by a, and similarly n − s is divisible by b.

Suppose that n and n′ both have the asserted properties. Then a divides n − n′ = (n − r) − (n′ − r), and b divides n − n′ = (n − s) − (n′ − s). Since a and b are relatively prime, Corollary 1.4 shows that ab divides n − n′. But |n − n′| < ab, and the only integer N with |N| < ab that is divisible by ab is N = 0. Thus n − n′ = 0 and n = n′. This proves uniqueness.

Finally the argument just given defines a one-one function from a set of ab pairs (r, s) to a set of ab elements n. Its image must therefore be all such integers n. This proves the corollary.

If n is a positive integer, we define ϕ(n) to be the number of integers k with 0 ≤ k < n such that k and n are relatively prime. The function ϕ is called the Euler ϕ function.

Corollary 1.10. Let N > 1 be an integer, and let $N = p_1^{k_1} \cdots p_r^{k_r}$ be a prime factorization of N. Then
\[
\varphi(N) = \prod_{j=1}^{r} p_j^{k_j - 1} (p_j - 1).
\]
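The formula of Corollary 1.10 can be compared numerically with the definition of ϕ. The Python sketch below is only a check under our own naming (`phi_by_count`, `phi_by_formula`); it counts directly from the definition and, separately, evaluates the product over the prime factorization obtained by trial division.

```python
from math import gcd


def phi_by_count(n: int) -> int:
    """Count the integers k with 0 <= k < n and GCD(k, n) == 1."""
    return sum(1 for k in range(n) if gcd(k, n) == 1)


def phi_by_formula(n: int) -> int:
    """Evaluate the product of p**(k-1) * (p-1) over the prime powers of n."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            k = 0
            while n % p == 0:
                n //= p
                k += 1
            result *= p ** (k - 1) * (p - 1)
        p += 1
    if n > 1:
        # A remaining factor is a prime occurring to the first power.
        result *= n - 1
    return result


print(phi_by_formula(36))  # 12
print(all(phi_by_count(n) == phi_by_formula(n) for n in range(1, 200)))  # True
```

For N = 1 the loop contributes nothing and the function returns the empty product 1, which agrees with the count from the definition.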

REMARK. The conclusion is valid also for N = 1 if we interpret the right side of the formula to be the empty product.

PROOF. For positive integers a and b, let us check that
\[
\varphi(ab) = \varphi(a)\varphi(b) \qquad \text{if } \mathrm{GCD}(a, b) = 1. \tag{$*$}
\]

In view of Corollary 1.9, it is enough to prove that the mapping (r, s) ↦ n given in that corollary has the property that GCD(r, a) = GCD(s, b) = 1 if and only if GCD(n, ab) = 1.

To see this property, suppose that n satisfies 0 ≤ n < ab and GCD(n, ab) > 1. Choose a prime p dividing both n and ab. By Lemma 1.6, p divides a or p divides b. By symmetry we may assume that p divides a. If (r, s) is the pair corresponding to n under Corollary 1.9, then the corollary says that a divides n − r. Since p divides a, p divides n − r. Since p divides n, p divides r. Thus GCD(r, a) > 1.

Conversely suppose that (r, s) is a pair with 0 ≤ r < a and 0 ≤ s < b such that GCD(r, a) = GCD(s, b) = 1 is false. Without loss of generality, we may assume that GCD(r, a) > 1. Choose a prime p dividing both r and a. If n is the integer with 0 ≤ n < ab that corresponds to (r, s) under Corollary 1.9, then the corollary says that a divides n − r. Since p divides a, p divides n − r. Since p divides r, p divides n. Thus GCD(n, ab) > 1. This completes the proof of (∗).

For a power $p^k$ of a prime p with k > 0, the integers n with $0 \le n < p^k$ such that $\mathrm{GCD}(n, p^k) > 1$ are the multiples of p, namely $0, p, 2p, \dots, p^k - p$. There are $p^{k-1}$ of them. Thus the number of integers n with $0 \le n < p^k$ such that $\mathrm{GCD}(n, p^k) = 1$ is $p^k - p^{k-1} = p^{k-1}(p - 1)$. In other words,
\[
\varphi(p^k) = p^{k-1}(p - 1) \qquad \text{if $p$ is prime and } k \ge 1. \tag{$**$}
\]

To prove the corollary, we induct on r, the case r = 1 being handled by (∗∗). If the formula of the corollary is valid for r − 1, then (∗) allows us to combine that result with the formula for $\varphi(p_r^{k_r})$ given in (∗∗) to obtain the formula for ϕ(N).

We conclude this section by extending the notion of greatest common divisor to apply to more than two integers. If $a_1, \dots, a_t$ are integers not all 0, their greatest common divisor is the largest integer d > 0 that divides all of $a_1, \dots, a_t$. This exists, and we write $d = \mathrm{GCD}(a_1, \dots, a_t)$ for it. It is immediate that d equals the greatest common divisor of the nonzero members of the set $\{a_1, \dots, a_t\}$. Thus, in deriving properties of greatest common divisors, we may assume that all the integers are nonzero.

Corollary 1.11. Let $a_1, \dots, a_t$ be positive integers, and let d be their greatest common divisor. Then
(a) if for each j with 1 ≤ j ≤ t, $a_j = p_1^{k_{1,j}} \cdots p_r^{k_{r,j}}$ is an expansion of $a_j$ as a product of powers of r distinct primes $p_1, \dots, p_r$, it follows that
\[
d = p_1^{\min_{1 \le j \le t}\{k_{1,j}\}} \cdots p_r^{\min_{1 \le j \le t}\{k_{r,j}\}},
\]
(b) any divisor d′ of all of $a_1, \dots, a_t$ necessarily divides d,
(c) $d = \mathrm{GCD}\big(\mathrm{GCD}(a_1, \dots, a_{t-1}), a_t\big)$ if t > 1,
(d) there exist integers $x_1, \dots, x_t$ such that $a_1 x_1 + \cdots + a_t x_t = d$.

PROOF. Part (a) is proved in the same way as Corollary 1.8 except that Corollary 1.7 is to be applied t times rather than just twice. Further application of Corollary 1.7 shows that any positive divisor d′ of $a_1, \dots, a_t$ is of the form $d' = p_1^{m_1} \cdots p_r^{m_r}$ with $m_1 \le k_{1,j}$ for all j, …, and with $m_r \le k_{r,j}$ for all j. Therefore $m_1 \le \min_{1 \le j \le t}\{k_{1,j}\}$, …, and $m_r \le \min_{1 \le j \le t}\{k_{r,j}\}$, and it follows that d′ divides d. This proves (b). Conclusion (c) follows by using the formula in (a), and (d) follows by combining (c), Proposition 1.2c, and induction.
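Parts (c) and (d) of Corollary 1.11 suggest an iterative computation: fold the integers in one at a time and update the coefficients at each stage. The Python sketch below illustrates this for positive integers; the helper names `bezout` and `gcd_with_coefficients` are our own, and the sketch is not drawn from the text.

```python
def bezout(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, x, y) with d == GCD(a, b) and a*x + b*y == d, for a, b > 0."""
    x0, x1, y0, y1 = 1, 0, 0, 1
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0


def gcd_with_coefficients(nums: list[int]) -> tuple[int, list[int]]:
    """Return (d, xs) with d == GCD(nums) and sum(a*x for a, x in zip(nums, xs)) == d."""
    if not nums or any(a <= 0 for a in nums):
        raise ValueError("expects a nonempty list of positive integers")
    d, coeffs = nums[0], [1]
    for a in nums[1:]:
        # Part (c): GCD(a_1, ..., a_t) = GCD(GCD(a_1, ..., a_{t-1}), a_t).
        d, x, y = bezout(d, a)
        # Part (d): rescale the earlier coefficients by x and append y.
        coeffs = [c * x for c in coeffs] + [y]
    return d, coeffs


d, xs = gcd_with_coefficients([6, 10, 15])
print(d, sum(a * x for a, x in zip([6, 10, 15], xs)))  # 1 1
```

The coefficients returned are far from unique; any one choice suffices for conclusion (d).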


3. Unique Factorization of Polynomials

This section establishes unique factorization for ordinary rational, real, and complex polynomials. We write Q for the set of rational numbers, R for the set of real numbers, and C for the set of complex numbers, each with its arithmetic operations. The rational numbers are constructed from the integers by a process reviewed in Section A3 of the appendix, the real numbers are defined from the rational numbers by a process reviewed in that same section, and the complex numbers are defined from the real numbers by a process reviewed in Section A4 of the appendix. Sections A3 and A4 of the appendix mention special properties of R and C beyond those of the arithmetic operations, but we shall not make serious use of these special properties here until nearly the end of the section, after unique factorization of polynomials has been established.

Let F denote any of Q, R, or C. The members of F are called scalars. We work with ordinary polynomials with coefficients in F. Informally these are expressions $P(X) = a_n X^n + \cdots + a_1 X + a_0$ with $a_n, \dots, a_1, a_0$ in F. Although it is tempting to think of P(X) as a function with independent variable X, it is better to identify P with the sequence $(a_0, a_1, \dots, a_n, 0, 0, \dots)$ of coefficients, using expressions $P(X) = a_n X^n + \cdots + a_1 X + a_0$ only for conciseness and for motivation of the definitions of various operations.

The precise definition therefore is that a polynomial in one indeterminate with coefficients in F is an infinite sequence of members of F such that all terms of the sequence are 0 from some point on. The indexing of the sequence is to begin with 0. We may refer to a polynomial P as P(X) if we want to emphasize that the indeterminate is called X. Addition, subtraction, and scalar multiplication are defined in coordinate-by-coordinate fashion:
\[
\begin{aligned}
(a_0, a_1, \dots, a_n, 0, 0, \dots) + (b_0, b_1, \dots, b_n, 0, 0, \dots) &= (a_0 + b_0, a_1 + b_1, \dots, a_n + b_n, 0, 0, \dots),\\
(a_0, a_1, \dots, a_n, 0, 0, \dots) - (b_0, b_1, \dots, b_n, 0, 0, \dots) &= (a_0 - b_0, a_1 - b_1, \dots, a_n - b_n, 0, 0, \dots),\\
c(a_0, a_1, \dots, a_n, 0, 0, \dots) &= (ca_0, ca_1, \dots, ca_n, 0, 0, \dots).
\end{aligned}
\]
Polynomial multiplication is defined so as to match multiplication of expressions $a_n X^n + \cdots + a_1 X + a_0$ if the product is expanded out, powers of X are added, and then terms containing like powers of X are collected:
\[
(a_0, a_1, \dots, 0, 0, \dots)(b_0, b_1, \dots, 0, 0, \dots) = (c_0, c_1, \dots, 0, 0, \dots),
\]
where $c_N = \sum_{k=0}^{N} a_k b_{N-k}$. We take it as known that the usual associative, commutative, and distributive laws are then valid. The set of all polynomials in the indeterminate X is denoted by F[X].
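The description of a polynomial as a sequence of coefficients lends itself to a direct implementation. In the Python sketch below, which is an illustration and not part of the text, a polynomial is stored as a finite list [a_0, a_1, ..., a_n] with the trailing zeros dropped, the zero polynomial being the empty list; the names `trim`, `poly_add`, and `poly_mul` are our own. Coefficients may be integers or `fractions.Fraction` values for exact rational arithmetic.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients, so that the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_add(p, q):
    """Coordinate-by-coordinate sum of two coefficient lists."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim(a + b for a, b in zip(p, q))


def poly_mul(p, q):
    """Product with coefficients c_N equal to the sum of a_k * b_{N-k}."""
    p, q = trim(p), trim(q)
    if not p or not q:
        return []
    c = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b
    return trim(c)


# (1 + X)(1 - X) = 1 - X^2
print(poly_mul([1, 1], [1, -1]))  # [1, 0, -1]
# (1/2 + X) + (1/2 - X) = 1
print(poly_add([Fraction(1, 2), 1], [Fraction(1, 2), -1]))  # [Fraction(1, 1)]
```

Storing only finitely many coefficients is a convenience of the sketch; the definition in the text uses infinite sequences that are eventually 0.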


The polynomial with all entries 0 is denoted by 0 and is called the zero polynomial. For all polynomials $P = (a_0, \dots, a_n, 0, \dots)$ other than 0, the degree of P, denoted by deg P, is defined to be the largest index n such that $a_n \ne 0$. The constant polynomials are by definition the zero polynomial and the polynomials of degree 0. If P and Q are nonzero polynomials, then
\[
\begin{aligned}
&P + Q = 0 \quad \text{or} \quad \deg(P + Q) \le \max(\deg P, \deg Q),\\
&\deg(cP) = \deg P,\\
&\deg(PQ) = \deg P + \deg Q.
\end{aligned}
\]
In the formula for deg(P + Q), equality holds if deg P ≠ deg Q. Implicit in the formula for deg(PQ) is the fact that PQ cannot be 0 unless P = 0 or Q = 0. A cancellation law for multiplication is an immediate consequence:
\[
PR = QR \text{ with } R \ne 0 \qquad \text{implies} \qquad P = Q.
\]
In fact, PR = QR implies (P − Q)R = 0; since R ≠ 0, P − Q must be 0.

If $P = (a_0, \dots, a_n, 0, \dots)$ is a polynomial and r is in F, we can evaluate P at r, obtaining as a result the number $P(r) = a_n r^n + \cdots + a_1 r + a_0$. Taking into account all values of r, we obtain a mapping P ↦ P( · ) of F[X] into the set of functions from F into F. Because of the way that the arithmetic operations on polynomials have been defined, we have
\[
\begin{aligned}
(P + Q)(r) &= P(r) + Q(r),\\
(P - Q)(r) &= P(r) - Q(r),\\
(cP)(r) &= cP(r),\\
(PQ)(r) &= P(r)Q(r).
\end{aligned}
\]
In other words, the mapping P ↦ P( · ) respects the arithmetic operations. We say that r is a root of P if P(r) = 0.

Now we turn to the question of unique factorization. The definitions and the proof are completely analogous to those for the integers. A factor of a polynomial A is a nonzero polynomial B such that A = BQ for some polynomial Q. In this case we say also that B divides A, that B is a divisor of A, and that A is a multiple of B. We write B | A for this relationship. If A is nonzero, any product formula $A = B Q_1 \cdots Q_r$ is a factorization of A. A unit in F[X] is a divisor of 1, hence is any polynomial of degree 0; such a polynomial is a constant polynomial A(X) = c with c equal to a nonzero scalar. The factorization A = BQ of A ≠ 0 is called nontrivial if neither B nor Q is a unit. A prime P in F[X] is a nonzero polynomial that is not a unit and has no nontrivial factorization P = BQ. Observe that the product of a prime and a unit is always a prime.


Proposition 1.12 (division algorithm). If A and B are polynomials in F[X] and if B is not the 0 polynomial, then there exist unique polynomials Q and R in F[X] such that
(a) A = BQ + R and
(b) either R is the 0 polynomial or deg R < deg B.

REMARK. This result codifies the usual method of dividing polynomials in high-school algebra. That method writes A/B = Q + R/B, and then one obtains the above result by multiplying by B. The polynomial Q is the quotient in the division, and R is the remainder.

PROOF OF UNIQUENESS. If $A = BQ + R = BQ_1 + R_1$, then $B(Q - Q_1) = R_1 - R$. Without loss of generality, $R_1 - R$ is not the 0 polynomial since otherwise $Q - Q_1 = 0$ also. Then
\[
\deg B + \deg(Q - Q_1) = \deg(R_1 - R) \le \max(\deg R, \deg R_1) < \deg B,
\]
and we have a contradiction.

PROOF OF EXISTENCE. If A = 0 or deg A < deg B, we take Q = 0 and R = A, and we are done. Otherwise we induct on deg A. Assume the result for degree ≤ n − 1, and let deg A = n. Write $A = a_n X^n + A_1$ with $A_1 = 0$ or $\deg A_1 < \deg A$. Let $B = b_k X^k + B_1$ with $B_1 = 0$ or $\deg B_1 < \deg B$. Put $Q_1 = a_n b_k^{-1} X^{n-k}$. Then
\[
A - BQ_1 = a_n X^n + A_1 - a_n X^n - a_n b_k^{-1} X^{n-k} B_1 = A_1 - a_n b_k^{-1} X^{n-k} B_1
\]
with the right side equal to 0 or of degree < deg A. Then the right side, by induction, is of the form $BQ_2 + R$, and $A = B(Q_1 + Q_2) + R$ is the required decomposition.

Corollary 1.13 (Factor Theorem). If r is in F and if P is a polynomial in F[X], then X − r divides P if and only if P(r) = 0.

PROOF. If P = (X − r)Q, then P(r) = (r − r)Q(r) = 0. Conversely let P(r) = 0. Taking B(X) = X − r in the division algorithm (Proposition 1.12), we obtain P = (X − r)Q + R with R = 0 or deg R < deg(X − r) = 1. Thus R is a constant polynomial, possibly 0. In any case we have 0 = P(r) = (r − r)Q(r) + R(r), and thus R(r) = 0. Since R is constant, we must have R = 0, and then P = (X − r)Q.

Corollary 1.14. If P is a nonzero polynomial with coefficients in F and if deg P = n, then P has at most n distinct roots.


REMARKS. Since there are infinitely many scalars in any of Q and R and C, the corollary implies that the function from F to F associated to P, namely r ↦ P(r), cannot be identically 0 if P ≠ 0. Starting in Chapter IV, we shall allow other F’s besides Q and R and C, and then this implication can fail. For example, when F is the two-element “field” F = {0, 1} with 1 + 1 = 0 and with otherwise the expected addition and multiplication, then $P(X) = X^2 + X$ is not the zero polynomial but P(r) = 0 for r = 0 and r = 1. It is thus important to distinguish polynomials in one indeterminate from their associated functions of one variable.

PROOF. Let $r_1, \dots, r_{n+1}$ be distinct roots of P(X). By the Factor Theorem (Corollary 1.13), $X - r_1$ is a factor of P(X). We prove inductively on k that the product $(X - r_1)(X - r_2) \cdots (X - r_k)$ is a factor of P(X). Assume that this assertion holds for k, so that $P(X) = (X - r_1) \cdots (X - r_k)Q(X)$ and
\[
0 = P(r_{k+1}) = (r_{k+1} - r_1) \cdots (r_{k+1} - r_k) Q(r_{k+1}).
\]
Since the $r_j$’s are distinct, we must have $Q(r_{k+1}) = 0$. By the Factor Theorem, we can write $Q(X) = (X - r_{k+1})R(X)$ for some polynomial R(X). Substitution gives $P(X) = (X - r_1) \cdots (X - r_k)(X - r_{k+1})R(X)$, and $(X - r_1) \cdots (X - r_{k+1})$ is exhibited as a factor of P(X). This completes the induction. Consequently
\[
P(X) = (X - r_1) \cdots (X - r_{n+1}) S(X)
\]
for some polynomial S(X). Comparing the degrees of the two sides, we find that deg S = −1, and we have a contradiction.

We can use the division algorithm in the same way as with the integers in Sections 1–2 to obtain unique factorization. Within the set of integers, we defined greatest common divisors so as to be positive, but their negatives would have worked equally well. That flexibility persists with polynomials; the essential feature of any greatest common divisor of polynomials is shared by any product of that polynomial by a unit. A greatest common divisor of polynomials A and B with B ≠ 0 is any polynomial D of maximum degree such that D divides A and D divides B. We shall see that D is indeed unique up to multiplication by a nonzero scalar.²

² For some purposes it is helpful to isolate one particular greatest common divisor by taking the coefficient of the highest power of X to be 1.


The Euclidean algorithm is the iterative process that makes use of the division algorithm in the form
\[
\begin{aligned}
A &= B Q_1 + R_1, & R_1 &= 0 \text{ or } \deg R_1 < \deg B,\\
B &= R_1 Q_2 + R_2, & R_2 &= 0 \text{ or } \deg R_2 < \deg R_1,\\
R_1 &= R_2 Q_3 + R_3, & R_3 &= 0 \text{ or } \deg R_3 < \deg R_2,\\
&\;\;\vdots\\
R_{n-2} &= R_{n-1} Q_n + R_n, & R_n &= 0 \text{ or } \deg R_n < \deg R_{n-1},\\
R_{n-1} &= R_n Q_{n+1}.
\end{aligned}
\]
In the above computation the integer n is defined by the conditions that $R_n \ne 0$ and that $R_{n+1} = 0$. Such an n must exist since $\deg B > \deg R_1 > \cdots \ge 0$. We can now obtain an analog for F[X] of the result for Z given as Proposition 1.2.

Proposition 1.15. Let A and B be polynomials in F[X] with B ≠ 0, and let $R_1, \dots, R_n$ be the remainders generated by the Euclidean algorithm when applied to A and B. Then
(a) $R_n$ is a greatest common divisor of A and B,
(b) any $D_1$ that divides both A and B necessarily divides $R_n$,
(c) the greatest common divisor of A and B is unique up to multiplication by a nonzero scalar,
(d) any greatest common divisor D has the property that there exist polynomials P and Q with AP + BQ = D.

PROOF. Conclusions (a) and (b) are proved in the same way that parts (a) and (b) of Proposition 1.2 are proved, and conclusion (d) is proved with $D = R_n$ in the same way that Proposition 1.2c is proved. If D is a greatest common divisor of A and B, it follows from (a) and (b) that D divides $R_n$ and that $\deg D = \deg R_n$. This proves (c).

Using Proposition 1.15, we can prove analogs for F[X] of the two corollaries of Proposition 1.2. But let us instead skip directly to what is needed to obtain an analog for F[X] of unique factorization as in Theorem 1.5.

Lemma 1.16. If A and B are nonzero polynomials with coefficients in F and if P is a prime polynomial such that P divides AB, then P divides A or P divides B.

PROOF. If P does not divide A, then 1 is a greatest common divisor of A and P, and Proposition 1.15d produces polynomials S and T such that AS + PT = 1. Multiplication by B gives ABS + PTB = B. Then P divides ABS because it divides AB, and P divides PTB because it divides P. Hence P divides B.
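The existence proof of Proposition 1.12 and the Euclidean algorithm for F[X] are both constructive, and they can be sketched in a few lines for F = Q. The Python code below is an illustration under our own conventions and is not taken from the text: a polynomial is a list [a_0, ..., a_n] of coefficients with the zero polynomial represented by the empty list, the names `trim`, `poly_divmod`, and `poly_gcd` are hypothetical, and exact arithmetic is obtained from `fractions.Fraction`. The greatest common divisor is normalized to have leading coefficient 1, as in the footnote above.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients, so that the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_divmod(a, b):
    """Return (Q, R) with A == B*Q + R and either R == 0 or deg R < deg B."""
    r = trim(Fraction(c) for c in a)
    b = trim(Fraction(c) for c in b)
    if not b:
        raise ZeroDivisionError("B must not be the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(b) + 1, 0)
    while len(r) >= len(b):            # r is nonzero with deg r >= deg b
        shift = len(r) - len(b)        # the exponent n - k in the proof
        coeff = r[-1] / b[-1]          # the scalar a_n * b_k^{-1}
        q[shift] = coeff
        for i, c in enumerate(b):      # subtract coeff * X^shift * B
            r[i + shift] -= coeff * c
        r = trim(r)                    # the leading term has cancelled
    return trim(q), r


def poly_gcd(a, b):
    """Return the monic greatest common divisor of A and B, assuming B != 0."""
    a = trim(Fraction(c) for c in a)
    b = trim(Fraction(c) for c in b)
    if not b:
        raise ValueError("B must not be the zero polynomial")
    while b:
        _, r = poly_divmod(a, b)
        a, b = b, r                    # the last nonzero remainder ends up in a
    return [c / a[-1] for c in a]


# X^3 + 1 = (X + 1)(X^2 - X + 1)
print(poly_divmod([1, 0, 0, 1], [1, 1]) == ([1, -1, 1], []))  # True
# GCD(X^2 - 1, X^2 - 2X + 1) = X - 1
print(poly_gcd([-1, 0, 1], [1, -2, 1]) == [-1, 1])  # True
```

Each pass through the loop in `poly_divmod` lowers the degree of the running remainder, which is the same reason that the induction on deg A in the existence proof terminates.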


Theorem 1.17 (unique factorization). Every member of F[X] of degree ≥ 1 is a product of primes. This factorization is unique up to order and up to multiplication of each prime factor by a unit, i.e., by a nonzero scalar.

PROOF. The existence follows in the same way as the existence in Theorem 1.5; induction on the integers is to be replaced by induction on the degree. The uniqueness follows from Lemma 1.16 in the same way that the uniqueness in Theorem 1.5 follows from Lemma 1.6.

We turn to a consideration of properties of polynomials that take into account special features of R and C. If F is R, then $X^2 + 1$ is prime. The reason is that a nontrivial factorization of $X^2 + 1$ would have to involve two first-degree real polynomials and then $r^2 + 1$ would have to be 0 for some real r, namely for r equal to the root of either of the first-degree polynomials. On the other hand, $X^2 + 1$ is not prime when F = C since $X^2 + 1 = (X + i)(X - i)$. The Fundamental Theorem of Algebra, stated below, implies that every prime polynomial over C is of degree 1. It is possible to prove the Fundamental Theorem of Algebra within complex analysis as a consequence of Liouville’s Theorem or within real analysis as a consequence of the Heine–Borel Theorem and other facts about compactness. This text gives a proof of the Fundamental Theorem of Algebra in Chapter IX using modern algebra, specifically Sylow theory as in Chapter IV and Galois theory as in Chapter IX. One further fact is needed; this fact uses elementary calculus and is proved below as Proposition 1.20.

Theorem 1.18 (Fundamental Theorem of Algebra). Any polynomial in C[X] with degree ≥ 1 has at least one root.

Corollary 1.19. Let P be a nonzero polynomial of degree n in C[X], and let $r_1, \dots, r_k$ be the distinct roots. Then there exist unique integers $m_j > 0$ for 1 ≤ j ≤ k such that P(X) is a scalar multiple of $\prod_{j=1}^{k} (X - r_j)^{m_j}$. The numbers $m_j$ have $\sum_{j=1}^{k} m_j = n$.

PROOF. We may assume that deg P > 0. We apply unique factorization (Theorem 1.17) to P(X). It follows from the Fundamental Theorem of Algebra (Theorem 1.18) and the Factor Theorem (Corollary 1.13) that each prime polynomial with coefficients in C has degree 1. Thus the unique factorization of P(X) has to be of the form $c \prod_{l=1}^{n} (X - z_l)$ for some c ≠ 0 and for some complex numbers $z_l$ that are unique up to order. The $z_l$’s are roots, and every root is a $z_l$ by the Factor Theorem. Grouping like factors proves the desired factorization and its uniqueness. The numbers $m_j$ have $\sum_{j=1}^{k} m_j = n$ by a count of degrees.

The integers $m_j$ in the corollary are called the multiplicities of the roots of the polynomial P(X).

We conclude this section by proving the result from calculus that will enter the proof of the Fundamental Theorem of Algebra in Chapter IX.

Proposition 1.20. Any polynomial in $\mathbb{R}[X]$ with odd degree has at least one root.

PROOF. Without loss of generality, we may take the leading coefficient to be 1. Thus let the polynomial be
$$P(X) = X^{2n+1} + a_{2n} X^{2n} + \cdots + a_1 X + a_0 = X^{2n+1} + R(X).$$
For $|r| \ge 1$, the polynomial $R$ satisfies $|R(r)| \le C|r|^{2n}$, where $C = |a_{2n}| + \cdots + |a_1| + |a_0|$. Thus $|r| > \max(C, 1)$ implies $|P(r) - r^{2n+1}| \le C|r|^{2n} < |r|^{2n+1}$, and it follows that $P(r)$ has the same sign as $r^{2n+1}$ for $|r| > \max(C, 1)$. For $r_0 = \max(C, 1) + 1$, we therefore have $P(-r_0) < 0$ and $P(r_0) > 0$. By the Intermediate Value Theorem, given in Section A3 of the appendix, $P(r) = 0$ for some $r$ with $-r_0 \le r \le r_0$.
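The bound in the proof of Proposition 1.20 is explicit enough to drive a numerical search. The following is a minimal Python sketch, assuming floating-point arithmetic and using bisection as a stand-in for the Intermediate Value Theorem; bisection is not something the text prescribes, and the names `odd_degree_root` and `tol` are illustrative only.

```python
def odd_degree_root(coeffs, tol=1e-12):
    """Approximate a real root of X^(2n+1) + a_2n X^(2n) + ... + a_1 X + a_0.

    coeffs = [a_0, a_1, ..., a_2n]; the leading coefficient 1 is implicit.
    (Hypothetical helper, not from the text.)
    """
    if len(coeffs) % 2 != 1:
        raise ValueError("need 2n+1 lower coefficients for an odd-degree polynomial")

    def p(r):
        # Horner evaluation, starting from the implicit leading coefficient 1.
        value = 1.0
        for a in reversed(coeffs):
            value = value * r + a
        return value

    c = sum(abs(a) for a in coeffs)   # C = |a_2n| + ... + |a_1| + |a_0|
    r0 = max(c, 1.0) + 1.0            # the proof shows P(-r0) < 0 < P(r0)
    lo, hi = -r0, r0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if p(mid) < 0:
            lo = mid                  # a sign change remains in [mid, hi]
        else:
            hi = mid                  # a sign change remains in [lo, mid]
    return (lo + hi) / 2


# Example: P(X) = X^3 - 2X - 5, so coeffs = [a_0, a_1, a_2] = [-5, -2, 0].
root = odd_degree_root([-5, -2, 0])
```

The sketch keeps the invariant $P(\mathrm{lo}) < 0 \le P(\mathrm{hi})$ throughout, which is exactly the sign information that the proof extracts from the estimate $|P(r) - r^{2n+1}| < |r|^{2n+1}$.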

4. Permutations and Their Signs

Let $S$ be a finite nonempty set of $n$ elements. A permutation of $S$ is a one-one function from $S$ onto $S$. The elements might be listed as $a_1, a_2, \dots, a_n$, but it will simplify the notation to view them simply as $1, 2, \dots, n$. We use ordinary function notation for describing the effect of permutations. Thus the value of a permutation $\sigma$ at $j$ is $\sigma(j)$, and the composition of $\tau$ followed by $\sigma$ is $\sigma \circ \tau$ or simply $\sigma\tau$, with $(\sigma\tau)(j) = \sigma(\tau(j))$. Composition is automatically associative, i.e., $(\rho\sigma)\tau = \rho(\sigma\tau)$, because the effect of both sides on $j$, when we expand things out, is $\rho(\sigma(\tau(j)))$. The composition of two permutations is also called their product.

The identity permutation will be denoted by 1. Any permutation $\sigma$, being a one-one onto function, has a well-defined inverse permutation $\sigma^{-1}$ with the property that $\sigma\sigma^{-1} = \sigma^{-1}\sigma = 1$. One way of describing concisely the effect of a permutation is to list its domain values and to put the corresponding range values beneath them. Thus
$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 3 & 5 & 1 & 2 \end{pmatrix}$$
is the permutation of $\{1, 2, 3, 4, 5\}$ with $\sigma(1) = 4$, $\sigma(2) = 3$, $\sigma(3) = 5$, $\sigma(4) = 1$, and $\sigma(5) = 2$. The inverse permutation is obtained by interchanging the two rows to obtain $\begin{pmatrix} 4 & 3 & 5 & 1 & 2 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix}$ and then adjusting the entries in the rows so that the first row is in the usual order:
$$\sigma^{-1} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 5 & 2 & 1 & 3 \end{pmatrix}.$$

If $2 \le k \le n$, a $k$-cycle is a permutation $\sigma$ that fixes each element in some subset of $n - k$ elements and moves the remaining elements $c_1, \dots, c_k$ according to $\sigma(c_1) = c_2$, $\sigma(c_2) = c_3$, $\dots$, $\sigma(c_{k-1}) = c_k$, $\sigma(c_k) = c_1$. Such a cycle may be denoted by $(c_1\ c_2\ \cdots\ c_{k-1}\ c_k)$ to stress its structure. For example take $n = 5$; then $\sigma = (2\ 3\ 5)$ is the 3-cycle given in our earlier notation by $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 3 & 5 & 4 & 2 \end{pmatrix}$. The cycle $(2\ 3\ 5)$ is the same as the cycle $(3\ 5\ 2)$ and the cycle $(5\ 2\ 3)$. It is sometimes helpful to speak of the identity permutation 1 as the unique 1-cycle.

A system of cycles is said to be disjoint if the sets that each of them moves are disjoint in pairs. Thus $(2\ 3\ 5)$ and $(1\ 4)$ are disjoint, but $(2\ 3\ 5)$ and $(1\ 3)$ are not. Any two disjoint cycles $\sigma$ and $\tau$ commute in the sense that $\sigma\tau = \tau\sigma$.

Proposition 1.21. Any permutation $\sigma$ of $\{1, 2, \dots, n\}$ is a product of disjoint cycles. The individual cycles in the decomposition are unique in the sense of being determined by $\sigma$.

EXAMPLE. $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 3 & 5 & 1 & 2 \end{pmatrix} = (2\ 3\ 5)(1\ 4)$.

PROOF. Let us prove existence. Working with $\{1, 2, \dots, n\}$, we show that any $\sigma$ is the disjoint product of cycles in such a way that no cycle moves an element $j$ unless $\sigma$ moves $j$. We do so for all $\sigma$ simultaneously by induction downward on the number of elements fixed by $\sigma$. The starting case of the induction is that $\sigma$ fixes all $n$ elements. Then $\sigma$ is the identity, and we are regarding the identity as a 1-cycle.

For the inductive step suppose $\sigma$ fixes the elements in a subset $T$ of $r$ elements of $\{1, 2, \dots, n\}$ with $r < n$. Let $j$ be an element not in $T$, so that $\sigma(j) \ne j$. Choose $k$ as small as possible so that some element is repeated among $j, \sigma(j), \sigma^2(j), \dots, \sigma^k(j)$. This condition means that $\sigma^l(j) = \sigma^k(j)$ for some $l$ with $0 \le l < k$. Then $\sigma^{k-l}(j) = j$, and we obtain a contradiction to the minimality of $k$ unless $k - l = k$, i.e., $l = 0$. In other words, we have $\sigma^k(j) = j$. We may thus form the $k$-cycle $\gamma = (j\ \sigma(j)\ \sigma^2(j)\ \cdots\ \sigma^{k-1}(j))$. The permutation $\gamma^{-1}\sigma$ then fixes the $r + k$ elements of $T \cup U$, where $U$ is the set of elements $j, \sigma(j), \sigma^2(j), \dots, \sigma^{k-1}(j)$. By the inductive hypothesis, $\gamma^{-1}\sigma$ is the product $\tau_1 \cdots \tau_p$ of disjoint cycles that move only elements not in $T \cup U$. Since $\gamma$ moves only the elements in $U$, $\gamma$ is disjoint from each of $\tau_1, \dots, \tau_p$. Therefore $\sigma = \gamma\tau_1 \cdots \tau_p$ provides the required decomposition of $\sigma$.

For uniqueness we observe from the proof of existence that each element $j$ generates a $k$-cycle $C_j$ for some $k \ge 1$ depending on $j$. If we have two decompositions as in the proposition, then the cycle within each decomposition that contains $j$ must be $C_j$. Hence the cycles in the two decompositions must match.

A 2-cycle is often called a transposition. The proposition allows us to see quickly that any permutation is a product of transpositions.
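The existence argument for Proposition 1.21 is in effect an algorithm: pick an element that is moved, follow it under repeated application of $\sigma$ until it returns, and repeat on what remains. Here is a minimal Python sketch of that idea, assuming the permutation is supplied as a dictionary from each element to its image; the function name `disjoint_cycles` is hypothetical, and fixed points are simply omitted rather than recorded as 1-cycles.

```python
def disjoint_cycles(sigma):
    """Return the disjoint cycles of length >= 2 of a permutation.

    sigma maps each element of {1, ..., n} to its image.
    (Illustrative helper; the dictionary encoding is an assumption.)
    """
    seen = set()
    cycles = []
    for j in sorted(sigma):
        if j in seen or sigma[j] == j:
            continue                  # already placed in a cycle, or fixed by sigma
        cycle = [j]
        seen.add(j)
        nxt = sigma[j]
        while nxt != j:               # follow j, sigma(j), sigma^2(j), ... until it closes up
            cycle.append(nxt)
            seen.add(nxt)
            nxt = sigma[nxt]
        cycles.append(tuple(cycle))
    return cycles


# The permutation of the EXAMPLE: 1->4, 2->3, 3->5, 4->1, 5->2.
example = {1: 4, 2: 3, 3: 5, 4: 1, 5: 2}
cycles = disjoint_cycles(example)     # the cycles (1 4) and (2 3 5)
```

Because disjoint cycles commute, the order in which the cycles are listed does not matter; the sketch lists them by smallest starting element.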

Corollary 1.22. Any $k$-cycle $\sigma$ permuting $\{1, 2, \dots, n\}$ is a product of $k - 1$ transpositions if $k > 1$. Therefore any permutation $\sigma$ of $\{1, 2, \dots, n\}$ is a product of transpositions.

PROOF. For the first statement, we observe that $(c_1\ c_2\ \cdots\ c_{k-1}\ c_k) = (c_1\ c_k)(c_1\ c_{k-1}) \cdots (c_1\ c_3)(c_1\ c_2)$. The second statement follows by combining this fact with Proposition 1.21.

Our final tasks for this section are to attach a sign to each permutation and to examine the properties of these signs. We begin with the special case that our underlying set $S$ is $\{1, \dots, n\}$. If $\sigma$ is a permutation of $\{1, \dots, n\}$, consider the numerical products
$$\prod_{1 \le j < k \le n} |\sigma(k) - \sigma(j)| \quad\text{and}\quad \prod_{1 \le j < k \le n} (\sigma(k) - \sigma(j)).$$

rnn−1

PROOF. We show that the determinant is
$$= \prod_{j>1} (r_j - r_1)\, \det\begin{pmatrix} 1 & \cdots & 1 \\ r_2 & \cdots & r_n \\ \vdots & \ddots & \vdots \\ r_2^{n-2} & \cdots & r_n^{n-2} \end{pmatrix},$$
and then the result follows by induction. In the given matrix, replace the $n$th row by the sum of it and $-r_1$ times the $(n-1)$st row, then the $(n-1)$st row by the sum of it and $-r_1$ times the $(n-2)$nd row, and so on. The resulting determinant is
$$\begin{aligned}
&\det\begin{pmatrix} 1 & 1 & \cdots & 1 \\ 0 & r_2 - r_1 & \cdots & r_n - r_1 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & r_2^{n-2} - r_1 r_2^{n-3} & \cdots & r_n^{n-2} - r_1 r_n^{n-3} \\ 0 & r_2^{n-1} - r_1 r_2^{n-2} & \cdots & r_n^{n-1} - r_1 r_n^{n-2} \end{pmatrix} \\
&\quad= \det\begin{pmatrix} r_2 - r_1 & \cdots & r_n - r_1 \\ \vdots & \ddots & \vdots \\ r_2^{n-2} - r_1 r_2^{n-3} & \cdots & r_n^{n-2} - r_1 r_n^{n-3} \\ r_2^{n-1} - r_1 r_2^{n-2} & \cdots & r_n^{n-1} - r_1 r_n^{n-2} \end{pmatrix} \qquad\text{by Proposition 2.36a applied with } j = 1 \\
&\quad= (r_2 - r_1) \cdots (r_n - r_1)\, \det\begin{pmatrix} 1 & \cdots & 1 \\ r_2 & \cdots & r_n \\ \vdots & \ddots & \vdots \\ r_2^{n-2} & \cdots & r_n^{n-2} \end{pmatrix},
\end{aligned}$$
the last step following by multilinearity of the determinant in the columns (as a consequence of Proposition 2.35 and multilinearity in the rows).
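The induction in this proof peels off the factors $(r_j - r_1)$ one index at a time, so it ends with the product $\prod_{j>i} (r_j - r_i)$. As a sanity check, here is a small Python sketch, assuming integer entries and computing the determinant directly from the sum over permutations; the helper names are illustrative and not from the text, and the permutation expansion is used only because it is short, not because it is efficient.

```python
from itertools import permutations


def sign(perm):
    # Sign of a permutation of 0..n-1, computed by counting inversions.
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def det(matrix):
    # Determinant as the signed sum over all permutations (fine for small n).
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= matrix[i][perm[i]]
        total += term
    return total


def vandermonde(rs):
    # Row i holds the i-th powers: entry (i, j) is rs[j] ** i.
    return [[r ** i for r in rs] for i in range(len(rs))]


def product_formula(rs):
    # Product of (r_j - r_i) over all pairs with j > i.
    result = 1
    for i in range(len(rs)):
        for j in range(i + 1, len(rs)):
            result *= rs[j] - rs[i]
    return result


rs = [2, 3, 5, 7]
lhs = det(vandermonde(rs))
rhs = product_formula(rs)
```

For `rs = [2, 3, 5, 7]` the six differences are 1, 3, 5, 2, 4, 2, so both sides come out to 240; a repeated entry in `rs` makes both sides 0.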

II. Vector Spaces over Q, R, and C


The classical adjoint of the square matrix $A$, denoted by $A^{\mathrm{adj}}$, is the matrix with entries $A^{\mathrm{adj}}_{ij} = (-1)^{i+j} \det \widehat{A}_{ji}$ with $\widehat{A}_{kl}$ defined as in the statement of Proposition 2.36: $\widehat{A}_{kl}$ is the matrix $A$ with the $k$th row and $l$th column deleted. In the 2-by-2 case, we have
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{\mathrm{adj}} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$
Thus we have $A A^{\mathrm{adj}} = A^{\mathrm{adj}} A = (\det A) I$ in the 2-by-2 case. Cramer's rule for solving simultaneous linear equations results from the $n$-by-$n$ generalization of this formula.

Proposition 2.38 (Cramer's rule). If $A$ is an $n$-by-$n$ matrix, then $A A^{\mathrm{adj}} = A^{\mathrm{adj}} A = (\det A) I$, and thus $\det A \ne 0$ implies $A^{-1} = (\det A)^{-1} A^{\mathrm{adj}}$. Consequently if $\det A \ne 0$, then the unique solution of the simultaneous system $Ax = b$ of $n$ equations in $n$ unknowns, in which $x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and $b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$, has
$$x_j = \frac{\det B_j}{\det A}$$
with $B_j$ equal to the $n$-by-$n$ matrix obtained from $A$ by replacing the $j$th column of $A$ by $b$.

REMARKS. If we think of the calculation of the determinant of an $n$-by-$n$ matrix as requiring about $n^3$ steps, then application of Cramer's rule, at least if done in an unthinking fashion, suggests that solving an invertible system requires about $n^3(n + 1)$ steps, i.e., $n + 1$ determinants are involved in the explicit solution. Use of row reduction directly to solve the system is more efficient than proceeding this way. Thus Cramer's rule is more important for its theoretical applications than it is for making computations. One simple theoretical application is the observation that each entry of the inverse of a matrix is the quotient of a polynomial function of the entries divided by the determinant.

PROOF. The $(i, j)$th entry of $A^{\mathrm{adj}} A$ is
$$(A^{\mathrm{adj}} A)_{ij} = \sum_{k=1}^{n} A^{\mathrm{adj}}_{ik} A_{kj} = \sum_{k=1}^{n} (-1)^{i+k} (\det \widehat{A}_{ki}) A_{kj}.$$

If $i = j$, then expansion in cofactors about the $j$th column (Proposition 2.36a) identifies the right side as $\det A$. If $i \ne j$, consider the matrix $B$ obtained from $A$ by replacing the $i$th column of $A$ by the $j$th column. Then the $i$th and $j$th columns of $B$ are equal, and hence $\det B = 0$. Expanding $\det B$ in cofactors about the $i$th column (Proposition 2.36a), we obtain
$$0 = \det B = \sum_{k=1}^{n} (-1)^{i+k} (\det \widehat{B}_{ki}) B_{ki} = \sum_{k=1}^{n} (-1)^{i+k} (\det \widehat{A}_{ki}) A_{kj}.$$
Thus $A^{\mathrm{adj}} A = (\det A) I$. A similar argument proves that $A A^{\mathrm{adj}} = (\det A) I$.

For the application to $Ax = b$, we multiply both sides on the left by $A^{\mathrm{adj}}$ and obtain $(\det A) x = A^{\mathrm{adj}} b$. Hence
$$(\det A) x_j = \sum_{i=1}^{n} (A^{\mathrm{adj}})_{ji} b_i = \sum_{i=1}^{n} (-1)^{i+j} b_i \det \widehat{A}_{ij},$$
and the right side equals $\det B_j$ by expansion in cofactors of $\det B_j$ about the $j$th column (Proposition 2.36a).

8. Eigenvectors and Characteristic Polynomials

A vector $v \ne 0$ in $\mathbb{F}^n$ is an eigenvector of the $n$-by-$n$ matrix $A$ if $Av = \lambda v$ for some scalar $\lambda$. We call $\lambda$ the eigenvalue associated with $v$. When $\lambda$ is an eigenvalue, the vector space of all $v$ with $Av = \lambda v$, i.e., the set consisting of the eigenvectors and the 0 vector, is called the eigenspace for $\lambda$.

If we think of $A$ as giving a linear map $L$ from $\mathbb{F}^n$ to itself, an eigenvector takes on geometric significance as a vector mapped to a multiple of itself by $L$. Another geometric way of viewing matters is that the eigenvector yields a 1-dimensional subspace $U = \mathbb{F}v$ that is invariant, or stable, under $L$ in the sense of satisfying $L(U) \subseteq U$.

Proposition 2.39. An $n$-by-$n$ matrix $A$ has an eigenvector with eigenvalue $\lambda$ if and only if $\det(\lambda I - A) = 0$. In this case the eigenspace for $\lambda$ is the kernel of $\lambda I - A$.

PROOF. We have $Av = \lambda v$ if and only if $(\lambda I - A)v = 0$, if and only if $v$ is in $\ker(\lambda I - A)$. This kernel is nonzero if and only if $\det(\lambda I - A) = 0$.

With $A$ fixed, the expression $\det(\lambda I - A)$ is a polynomial in $\lambda$ of degree $n$ and is called the characteristic polynomial⁸ of $A$. To see that it is at least a polynomial function of $\lambda$, let us expand $\det(\lambda I - A)$ as
$$\det\begin{pmatrix} \lambda - A_{11} & -A_{12} & \cdots & -A_{1n} \\ -A_{21} & \lambda - A_{22} & \cdots & -A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -A_{n1} & -A_{n2} & \cdots & \lambda - A_{nn} \end{pmatrix} = \sum_{\sigma} (\operatorname{sgn} \sigma)\, \mathrm{term}_{1,\sigma(1)} \cdots \mathrm{term}_{n,\sigma(n)}.$$

8 Some authors call det(A − λI ) the characteristic polynomial. This is the same polynomial as det(λI − A) if n is even and is the negative of it if n is odd. The choice made here has the slight advantage of always having leading coefﬁcient 1, which is a handy property in some situations.


The term for the permutation $\sigma = 1$ has $\sigma(k) = k$ for every $k$ and gives $\prod_{j=1}^{n} (\lambda - A_{jj})$. All other $\sigma$'s have $\sigma(k) = k$ for at most $n - 2$ values of $k$, and $\lambda$ therefore occurs at most $n - 2$ times. Thus the above expression is
$$\begin{aligned}
&= \prod_{j=1}^{n} (\lambda - A_{jj}) + \Bigl\{\text{other terms with powers of } \lambda \text{ at most } n - 2\Bigr\} \\
&= \lambda^n - \Bigl(\sum_{j=1}^{n} A_{jj}\Bigr) \lambda^{n-1} + \Bigl\{\text{terms with powers of } \lambda \text{ from } n - 2 \text{ to } 1\Bigr\} + (-1)^n \det A.
\end{aligned}$$
The constant term is $(-1)^n \det A$ as indicated because it is the value of the polynomial at $\lambda = 0$, which is $\det(-A)$. In any event, we now see that characteristic polynomials are polynomial functions and can even be treated as polynomials in an indeterminate $\lambda$ in the sense of Section I.3.⁹

The negative of the coefficient of $\lambda^{n-1}$ is the trace of $A$, denoted by $\operatorname{Tr} A$. Thus $\operatorname{Tr} A = \sum_{j=1}^{n} A_{jj}$. Trace is a linear functional on the vector space $M_{nn}(\mathbb{F})$ of $n$-by-$n$ matrices.

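EXAMPLE 1. For $A = \begin{pmatrix} 4 & 1 \\ -2 & 1 \end{pmatrix}$, the characteristic polynomial is
$$\det(\lambda I - A) = \det\begin{pmatrix} \lambda - 4 & -1 \\ 2 & \lambda - 1 \end{pmatrix} = (\lambda - 4)(\lambda - 1) + 2 = \lambda^2 - 5\lambda + 6 = (\lambda - 2)(\lambda - 3).$$
The roots, and hence the eigenvalues, are $\lambda = 2$ and $\lambda = 3$. The eigenvectors for $\lambda = 2$ are computed by solving $(2I - A)v = 0$. The method of row reduction gives
$$\left(\begin{array}{cc|c} 2 - 4 & -1 & 0 \\ 2 & 2 - 1 & 0 \end{array}\right) = \left(\begin{array}{cc|c} -2 & -1 & 0 \\ 2 & 1 & 0 \end{array}\right) \to \left(\begin{array}{cc|c} 1 & \tfrac12 & 0 \\ 0 & 0 & 0 \end{array}\right).$$
Thus we have $x_1 + \tfrac12 x_2 = 0$ and $x_1 = -\tfrac12 x_2$. So the eigenvectors for $\lambda = 2$ are the nonzero vectors of the form $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_2 \begin{pmatrix} -\tfrac12 \\ 1 \end{pmatrix}$. Similarly we find the eigenvectors for $\lambda = 3$ by starting from $(3I - A)v = 0$ and solving. The result is that the eigenvectors for $\lambda = 3$ are the nonzero vectors of the form $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_2 \begin{pmatrix} -1 \\ 1 \end{pmatrix}$. For this example, there is a basis of eigenvectors.

9 In Chapter V we will allow determinants of matrices whose entries are from any “commutative ring with identity,” $\mathbb{C}[\lambda]$ being an example. Then we can think of $\det(\lambda I - A)$ directly as involving an indeterminate $\lambda$ and not initially as a function of a scalar $\lambda$.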
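The computations of Example 1 are easy to confirm by machine. The sketch below, in Python with exact rational arithmetic, assumes only the 2-by-2 formula $\det(\lambda I - A) = \lambda^2 - (\operatorname{Tr} A)\lambda + \det A$, which is the $n = 2$ case of the expansion above; the helper names `char_poly_2x2` and `mat_vec` are illustrative.

```python
from fractions import Fraction


def char_poly_2x2(a):
    """Coefficients (1, -Tr A, det A) of det(lambda*I - A) for a 2-by-2 matrix."""
    trace = a[0][0] + a[1][1]
    determinant = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return (1, -trace, determinant)


def mat_vec(a, v):
    # Matrix times column vector, with the vector stored as a flat list.
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


A = [[4, 1], [-2, 1]]

coeffs = char_poly_2x2(A)            # lambda^2 - 5*lambda + 6

v2 = [Fraction(-1, 2), 1]            # claimed eigenvector for lambda = 2
v3 = [-1, 1]                         # claimed eigenvector for lambda = 3

is_eigen_2 = mat_vec(A, v2) == [2 * x for x in v2]
is_eigen_3 = mat_vec(A, v3) == [3 * x for x in v3]
```

Both flags come out `True`, and since `v2` and `v3` are not multiples of each other they form the basis of eigenvectors mentioned at the end of the example.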


Corollary 2.40. An $n$-by-$n$ matrix $A$ has at most $n$ eigenvalues.

PROOF. Since $\det(\lambda I - A)$ is a polynomial of degree $n$, this follows from Proposition 2.39 and Corollary 1.14.

It will later be of interest that certain matrices $A$ have a basis of eigenvectors. Such a basis exists for $A$ as in Example 1 but not in general. One thing that can prevent a matrix from having a basis of eigenvectors is the failure of the characteristic polynomial to factor into first-degree factors. Thus, for example, $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ has characteristic polynomial $\lambda^2 + 1$, which does not factor into first-degree factors when $\mathbb{F} = \mathbb{R}$. Even when we do have a factorization into first-degree factors, we can still fail to have a basis of eigenvectors, as the following example shows.

EXAMPLE 2. For $A = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$, the characteristic polynomial is given by $\det(\lambda I - A) = \det\begin{pmatrix} \lambda - 1 & 1 \\ 0 & \lambda - 1 \end{pmatrix} = (\lambda - 1)^2$. When we solve for eigenvectors, we get $\left(\begin{array}{cc|c} 0 & 1 & 0 \\ 0 & 0 & 0 \end{array}\right)$, and $x_2 = 0$. Thus $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, and we do not have a basis of eigenvectors.

What happens is that the presence of a factor $(\lambda - c)^k$ in the characteristic polynomial ensures the existence of an $r$-parameter family of eigenvectors for eigenvalue $c$, with $1 \le r \le k$, but not necessarily with $r = k$. Example 2 shows that $r$ can be strictly less than $k$. For purposes of deciding whether there is a basis of eigenvectors, the positive result is that the different roots of the characteristic polynomial do not interfere with each other; this is a consequence of the following proposition.

Proposition 2.41. If $A$ is an $n$-by-$n$ matrix, then eigenvectors for distinct eigenvalues are linearly independent.

REMARK. It follows that if the characteristic polynomial of $A$ has $n$ distinct roots, then $A$ has a basis of eigenvectors.

PROOF. Let $Av_1 = \lambda_1 v_1, \dots, Av_k = \lambda_k v_k$ with $\lambda_1, \dots, \lambda_k$ distinct, and suppose that $c_1 v_1 + \cdots + c_k v_k = 0$.


Applying $A$ repeatedly gives
$$\begin{aligned}
c_1 \lambda_1 v_1 + \cdots + c_k \lambda_k v_k &= 0, \\
c_1 \lambda_1^2 v_1 + \cdots + c_k \lambda_k^2 v_k &= 0, \\
&\ \,\vdots \\
c_1 \lambda_1^{k-1} v_1 + \cdots + c_k \lambda_k^{k-1} v_k &= 0.
\end{aligned}$$
If the $j$th entry of $v_i$ is denoted by $v_i^{(j)}$, this system of vector equations says that
$$\begin{pmatrix} 1 & \cdots & 1 \\ \lambda_1 & \cdots & \lambda_k \\ \vdots & \ddots & \vdots \\ \lambda_1^{k-1} & \cdots & \lambda_k^{k-1} \end{pmatrix} \begin{pmatrix} c_1 v_1^{(j)} \\ \vdots \\ c_k v_k^{(j)} \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix} \qquad\text{for } 1 \le j \le n.$$
The square matrix on the left side is a Vandermonde matrix, which is invertible by Corollary 2.37 since $\lambda_1, \dots, \lambda_k$ are distinct. Therefore $c_i v_i^{(j)} = 0$ for all $i$ and $j$. Each $v_i$ is nonzero in some entry $v_i^{(j)}$ with $j$ perhaps depending on $i$, and hence $c_i = 0$. Since all the coefficients $c_i$ have to be 0, $v_1, \dots, v_k$ are linearly independent.

The theory of eigenvectors and eigenvalues for square matrices allows us to develop a corresponding theory for linear maps $L : V \to V$, where $V$ is an $n$-dimensional vector space over $\mathbb{F}$. If $L$ is such a function, a vector $v \ne 0$ in $V$ is an eigenvector of $L$ if $L(v) = \lambda v$ for some scalar $\lambda$. We call $\lambda$ the eigenvalue. When $\lambda$ is an eigenvalue, the vector space of all $v$ with $L(v) = \lambda v$ is called the eigenspace for $\lambda$ under $L$. We can compute the eigenvalues and eigenvectors

of $L$ by working in any ordered basis $\Gamma$ of $V$. The equation $L(v) = \lambda v$ becomes
$$\begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix} \begin{pmatrix} v \\ \Gamma \end{pmatrix} = \lambda \begin{pmatrix} v \\ \Gamma \end{pmatrix}$$
and is satisfied if and only if the column vector $\begin{pmatrix} v \\ \Gamma \end{pmatrix}$ is an eigenvector of the matrix $A = \begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix}$ with eigenvalue $\lambda$. Applying Proposition 2.39 and remembering that determinants are well defined on linear maps $L : V \to V$, we see that $L$ has an eigenvector with eigenvalue $\lambda$ if and only if $\det(\lambda I - L) = 0$ and that in this case the eigenspace is the kernel of $\lambda I - L$.

What happens if we make these computations in a different ordered basis $\Delta$? We know from Proposition 2.17 that the matrices $A = \begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix}$ and $B = \begin{pmatrix} L \\ \Delta\Delta \end{pmatrix}$ are similar, related by $B = C^{-1}AC$, where $C = \begin{pmatrix} I \\ \Gamma\Delta \end{pmatrix}$. Computing with $A$ leads to $u = \begin{pmatrix} v \\ \Gamma \end{pmatrix}$ as eigenvector for the eigenvalue $\lambda$. The corresponding result for $B$ is that
$$B(C^{-1}u) = C^{-1}ACC^{-1}u = C^{-1}Au = \lambda C^{-1}u.$$
Thus $C^{-1}u = \begin{pmatrix} I \\ \Delta\Gamma \end{pmatrix} \begin{pmatrix} v \\ \Gamma \end{pmatrix} = \begin{pmatrix} v \\ \Delta \end{pmatrix}$ is an eigenvector of $B$ with eigenvalue $\lambda$, just as it should be.

These considerations about eigenvalues suggest some facts about similar matrices that we can observe more directly without first passing from matrices to linear maps: One is that similar matrices have the same characteristic polynomial. To see this, suppose that $B = C^{-1}AC$; then
$$\begin{aligned}
\det(\lambda I - B) &= \det(\lambda I - C^{-1}AC) = \det(C^{-1}(\lambda I - A)C) \\
&= (\det C^{-1}) \det(\lambda I - A)(\det C) = (\det C^{-1})(\det C) \det(\lambda I - A) = \det(\lambda I - A).
\end{aligned}$$
A second fact is that similar matrices have the same trace. In fact, the trace is the negative of the coefficient of $\lambda^{n-1}$ in the characteristic polynomial, and the characteristic polynomials are the same.

Because of these considerations we are free in the future to speak of the characteristic polynomial, the eigenvalues, and the trace of a linear map from a finite-dimensional vector space to itself, as well as the determinant, and these notions do not depend on any choice of ordered basis. We can speak unambiguously also of the eigenvectors of such a linear map. For this notion the realization of the eigenvectors in an ordered basis as column vectors depends on the ordered basis, the dependence being given by the formulas two paragraphs before the present one.

One final remark is in order. When the scalars are taken to be the complex numbers $\mathbb{C}$, the Fundamental Theorem of Algebra (Theorem 1.18) is applicable: every polynomial of degree $\ge 1$ has at least one root. When applied to the characteristic polynomial of a square matrix or a linear map from a finite-dimensional vector space to itself, this theorem tells us that the matrix or linear map always has at least one eigenvalue, hence an eigenvector. We shall make serious use of this fact in Chapter III.
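The two facts about similar matrices observed in this section, and the rule that $C^{-1}u$ is an eigenvector of $B = C^{-1}AC$ whenever $u$ is an eigenvector of $A$, can be checked on a small case. The Python sketch below reuses the matrix $A$ of Example 1; the invertible matrix `C` is an arbitrary choice made for illustration, and all helper names are hypothetical.

```python
from fractions import Fraction


def mat_mul(a, b):
    # Product of two matrices stored as lists of rows.
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mat_vec(a, v):
    # Matrix times column vector, with the vector stored as a flat list.
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


def inverse_2x2(c):
    # Inverse via the classical adjoint: C^{-1} = (det C)^{-1} C^adj.
    determinant = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    if determinant == 0:
        raise ValueError("matrix is not invertible")
    scale = Fraction(1) / determinant
    return [[scale * c[1][1], -scale * c[0][1]],
            [-scale * c[1][0], scale * c[0][0]]]


def char_poly_2x2(a):
    # Coefficients (1, -Tr A, det A) of det(lambda*I - A).
    return (1, -(a[0][0] + a[1][1]), a[0][0] * a[1][1] - a[0][1] * a[1][0])


A = [[4, 1], [-2, 1]]                 # matrix of Example 1
C = [[1, 1], [1, 2]]                  # arbitrary invertible matrix (an assumption)
C_inv = inverse_2x2(C)
B = mat_mul(C_inv, mat_mul(A, C))     # B = C^{-1} A C

u = [-1, 1]                           # eigenvector of A for lambda = 3
w = mat_vec(C_inv, u)                 # C^{-1} u, expected eigenvector of B

same_char_poly = char_poly_2x2(A) == char_poly_2x2(B)
w_is_eigen = mat_vec(B, w) == [3 * x for x in w]
```

Here `B` works out to the matrix with rows `[11, 12]` and `[-6, -6]`, which has trace 5 and determinant 6 just as `A` does, and `w = [-3, 2]` satisfies `B w = 3 w`.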

9. Bases in the Inﬁnite-Dimensional Case So far in this chapter, the use of bases has been limited largely to vector spaces having a ﬁnite spanning set. In this case we know from Corollary 2.3 that the ﬁnite spanning set has a subset that is a basis, any linearly independent set can be extended to a basis, and any two bases have the same ﬁnite number of elements.


We called such spaces ﬁnite-dimensional and deﬁned the dimension of the vector space to be the number of elements in a basis. The ﬁrst objective in this section is to prove analogs of these results in the inﬁnite-dimensional case. We shall make use of Zorn’s Lemma as in Section A5 of the appendix, as well as the notion of cardinality discussed in Section A6 of the appendix. Once these analogs are in place, we shall examine the various results that we proved about ﬁnite-dimensional spaces to see the extent to which they remain valid for inﬁnite-dimensional spaces. Theorem 2.42. If V is any vector space over F, then (a) any spanning set in V has a subset that is a basis, (b) any linearly independent set in V can be extended to a basis, (c) V has a basis, (d) any two bases have the same cardinality. REMARKS. The common cardinality mentioned in (d) is called the dimension of the vector space V . In many applications it is enough to use +∞ in place of each inﬁnite cardinal in dimension formulas. This was the attitude conveyed in the remark with Corollary 2.24. PROOF. For (b), let E be the given linearly independent set, and let S be the collection of all linearly independent subsets of V that contain E. Partially order S by inclusion upward. The set S is nonempty because E is in S. Let T be a chain in S, and let A be the union of the members of T . We show that A is in S, and then A is certainly an upper bound of T . Because of its deﬁnition, A contains E, and we are to prove that A is linearly independent. For A to fail to be linearly independent would mean that there are vectors v1 , . . . , vn in A with c1 v1 + · · · + cn vn = 0 for some system of scalars not all 0. Let v j be in the member A j of the chain T . Since A1 ⊆ A2 or A2 ⊆ A1 , v1 and v2 are both in A1 or both in A2 . To keep the notation neutral, say they are both in A2 . Since A2 ⊆ A3 or A3 ⊆ A2 , all of v1 , v2 , v3 are in A2 or they are all in A3 . Say they are all in A3 . 
Continuing in this way, we arrive at one of the sets A1 , . . . , An , say An , such that all of v1 , . . . , vn are all in An . The members of An are linearly independent by assumption, and we obtain the contradiction c1 = · · · = cn = 0. We conclude that A is linearly independent. Thus the chain T has an upper bound in S. By Zorn’s Lemma, S has a maximal element, say M. By Proposition 2.1a, M is a basis of V containing E. For (a), let E be the given spanning set, and let S be the collection of all linearly independent subsets of V that are contained in E. Partially order S by inclusion upward. The set S is nonempty because ∅ is in S. Let T be a chain in S, and let A be the union of the members of T . We show that A is in S, and then A is certainly an upper bound of T . Because of its deﬁnition, A is contained in

E, and the same argument as in the previous paragraph shows that $A$ is linearly independent. Thus the chain $T$ has an upper bound in $S$. By Zorn's Lemma, $S$ has a maximal element, say $M$. Proposition 2.1a is not applicable, but its proof is easily adjusted to apply here to show that $M$ spans $V$ and hence is a basis: Given $v$ in $V$, we are to prove that $v$ lies in the linear span of $M$. First suppose that $v$ is in $E$. If $v$ is in $M$, there is nothing to prove. Otherwise, since $M \cup \{v\}$ is contained in $E$, the assumed maximality implies that $M \cup \{v\}$ is not linearly independent, and hence $cv + c_1 v_1 + \cdots + c_n v_n = 0$ for some scalars $c, c_1, \dots, c_n$ not all 0 and for some vectors $v_1, \dots, v_n$ in $M$. The scalar $c$ cannot be 0 since $M$ is linearly independent. Thus $v = -c^{-1}c_1 v_1 - \cdots - c^{-1}c_n v_n$, and $v$ is exhibited as in the linear span of $M$. Consequently every member of $E$ lies in the linear span of $M$. Now suppose that $v$ is not in $E$. Since every member of $V$ lies in the linear span of $E$, every member of $V$ lies in the linear span of $M$.

Conclusion (c) follows from (a) by taking the spanning set to be $V$; alternatively it follows from (b) by taking the linearly independent set to be $\varnothing$.

For (d), let $A = \{v_\alpha\}$ and $B = \{w_\beta\}$ be two bases of $V$. Each member $a$ of $A$ can be written as $a = c_1 w_{\beta_1} + \cdots + c_n w_{\beta_n}$ uniquely with the scalars $c_1, \dots, c_n$ nonzero and with each $w_{\beta_j}$ in $B$. Let $B_a$ be the finite subset $\{w_{\beta_1}, \dots, w_{\beta_n}\}$. Then we have associated to each member of $A$ a finite subset $B_a$ of $B$. Let us see that $\bigcup_{a \in A} B_a = B$. If $b$ is in $B$, then the linear span of $B - \{b\}$ is not all of $V$. Thus some $v$ in $V$ is not in this span. Expand $v$ in terms of $A$ as $v = d_1 v_{\alpha_1} + \cdots + d_m v_{\alpha_m}$ with all $d_j \ne 0$. Since $v$ is not in the linear span of $B - \{b\}$, some $a_0 = v_{\alpha_{j_0}}$ with $1 \le j_0 \le m$ is not in this linear span. Then $b$ is in $B_{a_0}$, and we conclude that $B = \bigcup_{a \in A} B_a$. By the corollary near the end of Section A6 of the appendix, $\operatorname{card} B \le \operatorname{card} A$. Reversing the roles of $A$ and $B$, we obtain $\operatorname{card} A \le \operatorname{card} B$. By the Schroeder–Bernstein Theorem, $A$ and $B$ have the same cardinality. This proves (d).

Now let us go through the results of the chapter and see how many of them extend to the infinite-dimensional case and why. It is possible but not very useful in the infinite-dimensional case to associate an infinite “matrix” to a linear map when bases or ordered bases are specified for the domain and range. Because this association is not very useful, we shall not attempt to extend any of the results concerning matrices. The facts concerning extensions of results just dealing with dimensions and linear maps are as follows:

COROLLARY 2.5. If $V$ is any vector space and $U$ is a vector subspace, then $\dim U \le \dim V$.

In fact, take a basis of $U$ and extend it to a basis of $V$; a basis of $U$ is then exhibited as a subset of a basis of $V$, and the conclusion about cardinal-number dimensions follows.


PROPOSITION 2.13. Let $U$ and $V$ be vector spaces over $\mathbb{F}$, and let $\Gamma$ be a basis of $U$. Then to each function $\ell : \Gamma \to V$ corresponds one and only one linear map $L : U \to V$ such that $L\big|_\Gamma = \ell$.

In fact, the proof given in Section 3 is valid with no assumption about finite dimensionality.

COROLLARY 2.15. If $L : U \to V$ is a linear map between vector spaces over $\mathbb{F}$, then $\dim(\operatorname{domain}(L)) = \dim(\operatorname{kernel}(L)) + \dim(\operatorname{image}(L))$.

In fact, this formula remains valid, but the earlier proof via matrices has to be replaced. Instead, take a basis $\{v_\alpha \mid \alpha \in A\}$ of the kernel and extend it to a basis $\{v_\alpha \mid \alpha \in S\}$ of the domain. It is routine to check that $\{L(v_\alpha) \mid \alpha \in S - A\}$ is a basis of the image of $L$.

THEOREM 2.16 (part). The composition of two linear maps is linear.

In fact, the proof in Section 3 remains valid with no assumption about finite dimensionality.

PROPOSITION 2.18. Two vector spaces over $\mathbb{F}$ are isomorphic if and only if they have the same cardinal-number dimension.

In fact, this result follows from Proposition 2.13 just as it did in the finite-dimensional case; the only changes that are needed in the argument in Section 3 are small adjustments of the notation. Of course, one must not overinterpret this result on the basis of the remark with Theorem 2.42: two vector spaces with dimension $+\infty$ need not be isomorphic. Despite the apparent definitive sound of Proposition 2.18, one must not attach too much significance to it; vector spaces that arise in practice tend to have some additional structure, and an isomorphism based merely on equality of dimensions need not preserve the additional structure.

PROPOSITION 2.19. If $V$ is a vector space and $V'$ is its dual, then $\dim V \le \dim V'$. (In the infinite-dimensional case we do not have equality.)

In fact, take a basis $\{v_\alpha\}$ of $V$. If for each $\alpha$ we define $v'_\alpha(v_\beta) = \delta_{\alpha\beta}$ and use Proposition 2.13 to form the linear extension $v'_\alpha$, then the set $\{v'_\alpha\}$ is a linearly independent subset of $V'$ that is in one-one correspondence with the basis of $V$. Extending $\{v'_\alpha\}$ to a basis of $V'$, we obtain the result.

PROPOSITION 2.20. Let $V$ be a vector space, and let $U$ be a vector subspace of $V$. Then
(b) every linear functional on $U$ extends to a linear functional on $V$,
(c) whenever $v_0$ is a member of $V$ that is not in $U$, there exists a linear functional on $V$ that is 0 on $U$ and is 1 on $v_0$.


Conclusion (a) of the original Proposition 2.20, which concerns annihilators, does not extend to the infinite-dimensional case. To prove (b) without the finite dimensionality, let $u'$ be a given linear functional on $U$, let $\{u_\alpha\}$ be a basis of $U$, and let $\{v_\beta\}$ be a subset of $V$ such that $\{u_\alpha\} \cup \{v_\beta\}$ is a basis of $V$. Define $v'(u_\alpha) = u'(u_\alpha)$ for each $\alpha$ and $v'(v_\beta) = 0$ for each $\beta$. Using Proposition 2.13, let $v'$ be the linear extension to a linear functional on $V$. Then $v'$ has the required properties. To prove (c) without the finite dimensionality, we take a basis $\{u_\alpha\}$ of $U$ and extend $\{u_\alpha\} \cup \{v_0\}$ to a basis of $V$. Define $v'$ to equal 0 on each $u_\alpha$, to equal 1 on $v_0$, and to equal 0 on the remaining members of the basis of $V$. Then the linear extension of $v'$ to $V$ is the required linear functional.

PROPOSITION 2.22. If $V$ is any vector space over $\mathbb{F}$, then the canonical map $\iota : V \to V''$ is one-one. The canonical map is not onto $V''$ if $V$ is infinite-dimensional.

The proof that it is one-one given in Section 4 is applicable in the infinite-dimensional case since we know from Theorem 2.42 that any linearly independent subset of $V$ can be extended to a basis. For the second conclusion when $V$ has a countably infinite basis, see Problem 31 at the end of the chapter.

PROPOSITION 2.23 THROUGH COROLLARY 2.29. For these results about quotients, the only place that finite dimensionality played a role was in the dimension formulas, Corollaries 2.24 and 2.29. We restate these two results separately.

COROLLARY 2.24. If $V$ is a vector space over $\mathbb{F}$ and $U$ is a vector subspace, then
(a) $\dim V = \dim U + \dim(V/U)$,
(b) the subspace $U$ is the kernel of some linear map defined on $V$.

The proof in Section 5 requires no changes: Let $q$ be the quotient map. The linear map $q$ meets the conditions of (b). For (a), take a basis of $U$ and extend to a basis of $V$. Then the images under $q$ of the additional vectors form a basis of $V/U$.

COROLLARY 2.29. Let $M$ and $N$ be vector subspaces of a vector space $V$ over $\mathbb{F}$.
Then dim(M + N ) + dim(M ∩ N ) = dim M + dim N . In fact, Corollary 2.24a gives us dim(M + N ) = dim((M + N )/M) + dim M. Substituting dim((M + N )/M) = dim(N /(M ∩ N )) from Theorem 2.28 and adding dim(M ∩ N ) to both sides, we obtain dim(M + N ) + dim(M ∩ N ) = dim(M ∩ N ) + dim(N /(M ∩ N )) + dim M. The ﬁrst two terms on the right side add to dim N by Corollary 2.24a, and the result follows.


PROPOSITIONS 2.30 THROUGH 2.33. These results about direct products and direct sums did not assume any finite dimensionality.

The determinants of Sections 7–8 have no infinite-dimensional generalization, and Proposition 2.41 is the only result in those two sections with a valid infinite-dimensional analog. The valid analog in the infinite-dimensional case is that eigenvectors for distinct eigenvalues under a linear map are linearly independent. The proof given for Proposition 2.41 in Section 8 adapts to handle this analog, provided we interpret components $v_i^{(j)}$ of a vector $v_i$ as the coefficients needed to expand $v_i$ in a basis of the underlying vector space.

10. Problems

1. Determine bases of the following subsets of $\mathbb{R}^3$:
(a) the plane $3x - 2y + 5z = 0$,
(b) the line $x = 2t$, $y = -t$, $z = 4t$, where $-\infty < t < \infty$.

2. This problem shows that the associativity law in the definition of “vector space” implies certain more complicated formulas of which the stated law is a special case. Let $v_1, \dots, v_n$ be vectors in a vector space $V$. The only vector-space properties that are to be used in this problem are associativity of addition and the existence of the 0 element.
(a) Define $v_{(k)}$ inductively upward by $v_{(0)} = 0$ and $v_{(k)} = v_{(k-1)} + v_k$, and define $v^{(l)}$ inductively downward by $v^{(n+1)} = 0$ and $v^{(l)} = v_l + v^{(l+1)}$. Prove that $v_{(k)} + v^{(k+1)}$ is always the same element for $0 \le k \le n$.
(b) Prove that the same element of $V$ results from any way of inserting parentheses in the sum $v_1 + \cdots + v_n$ so that each step requires the addition of only two members of $V$.

3. This problem shows that the commutative and associative laws in the definition of “vector space” together imply certain more complicated formulas of which the stated commutative law is a special case. Let $v_1, \dots, v_n$ be vectors in a vector space $V$. The only vector-space properties that are to be used in this problem are commutativity of addition and the properties in the previous problem. Because of the previous problem, $v_1 + \cdots + v_n$ is a well-defined element of $V$, and it is not necessary to insert any parentheses in it. Prove that $v_1 + v_2 + \cdots + v_n = v_{\sigma(1)} + v_{\sigma(2)} + \cdots + v_{\sigma(n)}$ for each permutation $\sigma$ of $\{1, \dots, n\}$.

4. For the matrix $A = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 4 & 6 \\ 0 & 0 & -8 \end{pmatrix}$, find
(a) a basis for the row space,
(b) a basis for the column space, and
(c) the rank of the matrix.

5. Let $A$ be an $n$-by-$n$ matrix of rank one. Prove that there exists an $n$-dimensional column vector $c$ and an $n$-dimensional row vector $r$ such that $A = cr$.

6. Let $A$ be a $k$-by-$n$ matrix, and let $R$ be a reduced row-echelon form of $A$.
(a) Prove for each $r$ that the rows of $R$ whose first $r$ entries are 0 form a basis for the vector subspace of all members of the row space of $A$ whose first $r$ entries are 0.
(b) Prove that the reduced row-echelon form of $A$ is unique in the sense that any two sequences of steps of row reduction lead to the same reduced form.

7. Let $E$ be a finite set of $N$ points, let $V$ be the $N$-dimensional vector space of all real-valued functions on $E$, and let $n$ be an integer with $0 < n \le N$. Suppose that $U$ is an $n$-dimensional subspace of $V$. Prove that there exists a subset $D$ of $n$ points in $E$ such that the vector space of restrictions to $D$ of the members of $U$ has dimension $n$.

8. A linear map $L : \mathbb{R}^2 \to \mathbb{R}^2$ is given in the standard ordered basis by the matrix $\begin{pmatrix} -6 & -12 \\ 6 & 11 \end{pmatrix}$. Find the matrix of $L$ in the ordered basis $\begin{pmatrix} 3 \\ -2 \end{pmatrix}, \begin{pmatrix} -4 \\ 3 \end{pmatrix}$.

9. Let $V$ be the real vector space of all polynomials in $x$ of degree $\le 2$, and let $L : V \to V$ be the linear map $I - D^2$, where $I$ is the identity and $D$ is the differentiation operator $d/dx$. Prove that $L$ is invertible.

10. Let $A$ be in $M_{km}(\mathbb{C})$ and $B$ be in $M_{mn}(\mathbb{C})$. Prove that $\operatorname{rank}(AB) \le \max(\operatorname{rank} A, \operatorname{rank} B)$.

11. Let $A$ be in $M_{kn}(\mathbb{C})$ with $k > n$. Prove that there exists no $B$ in $M_{nk}(\mathbb{C})$ with $AB = I$.

12. Let $A$ be in $M_{kn}(\mathbb{C})$ and $B$ be in $M_{nk}(\mathbb{C})$. Give an example with $k = n$ to show that $\operatorname{rank}(AB)$ need not equal $\operatorname{rank}(BA)$.

13. With the differential equation $y''(t) = y(t)$ in Example 2 of Section 3, two examples of linear functionals on the vector space of solutions are given by $\ell_1(y) = y(0)$ and $\ell_2(y) = y'(0)$. Find a basis of the space of solutions such that $\{\ell_1, \ell_2\}$ is the dual basis.

14. Suppose that a vector space $V$ has a countably infinite basis. Prove that the dual $V'$ has an uncountable linearly independent set.

15. (a) Give an example of a vector space and three vector subspaces $L$, $M$, and $N$ such that $L \cap (M + N) \ne (L \cap M) + (L \cap N)$.
(b) Show that inclusion always holds in one direction in (a).
(c) Show that equality always holds in (a) if $L \supseteq M$.

16. Construct three vector subspaces $M$, $N_1$, and $N_2$ of a vector space $V$ such that $M \oplus N_1 = M \oplus N_2 = V$ but $N_1 \ne N_2$. What is the geometric picture corresponding to this situation?

17. Suppose that $x$, $y$, $u$, and $v$ are vectors in $\mathbb{R}^4$; let $M$ and $N$ be the vector subspaces of $\mathbb{R}^4$ spanned by $\{x, y\}$ and $\{u, v\}$, respectively. In which of the following cases is it true that $\mathbb{R}^4 = M \oplus N$?
(a) $x = (1, 1, 0, 0)$, $y = (1, 0, 1, 0)$, $u = (0, 1, 0, 1)$, $v = (0, 0, 1, 1)$;
(b) $x = (-1, 1, 1, 0)$, $y = (0, 1, -1, 1)$, $u = (1, 0, 0, 0)$, $v = (0, 0, 0, 1)$;
(c) $x = (1, 0, 0, 1)$, $y = (0, 1, 1, 0)$, $u = (1, 0, 1, 0)$, $v = (0, 1, 0, 1)$.

18. Section 6 gave definitions and properties of projections and injections associated with the direct sum of two vector spaces. Write down corresponding definitions and properties for projections and injections in the case of the direct sum of $n$ vector spaces, $n$ being an integer $> 2$.

19. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear map with $\ker T \cap \operatorname{image} T = 0$.
(a) Prove that $\mathbb{R}^n = \ker T \oplus \operatorname{image} T$.
(b) Prove that the condition $\ker T \cap \operatorname{image} T = 0$ is satisfied if $T^2 = T$.

20. If $V_1$ and $V_2$ are two vector spaces over $\mathbb{F}$, prove that $(V_1 \oplus V_2)'$ is canonically isomorphic to $V_1' \oplus V_2'$.

21. Suppose that $M$ is a vector subspace of a vector space $V$ and that $q : V \to V/M$ is the quotient map. Corresponding to each linear functional $y$ on $V/M$ is a linear functional $z$ on $V$ given by $z = yq$. Why is the correspondence $y \mapsto z$ an isomorphism between $(V/M)'$ and $\operatorname{Ann} M$?

22. Let $M$ be a vector subspace of the vector space $V$, and let $q : V \to V/M$ be the quotient map. Suppose that $N$ is a vector subspace of $V$. Prove that $V = M \oplus N$ if and only if the restriction of $q$ to $N$ is an isomorphism of $N$ onto $V/M$.

23. For a square matrix $A$ of integers, prove that the inverse has integer entries if and only if $\det A = \pm 1$.

24. Let $A$ be in $M_{kn}(\mathbb{C})$, and let $r = \operatorname{rank} A$. Prove that $r$ is the largest integer such that there exist $r$ row indices $i_1, \dots, i_r$ and $r$ column indices $j_1, \dots, j_r$ for which the $r$-by-$r$ matrix formed from these rows and columns of $A$ has nonzero determinant. (Educational note: This problem characterizes the subset of matrices of rank $\le r - 1$ as the set in which all determinants of $r$-by-$r$ submatrices are zero.)

25. Suppose that a linear combination of functions $t \mapsto e^{ct}$ with $c$ real vanishes for every integer $t \ge 0$. Prove that it vanishes for every real $t$.

26. Find all eigenvalues and eigenvectors of $A = \begin{pmatrix} 0 & 1 \\ -6 & 5 \end{pmatrix}$.

27. Let $A$ and $C$ be $n$-by-$n$ matrices with $C$ invertible. By making a direct calculation with the entries, prove that $\operatorname{Tr}(C^{-1}AC) = \operatorname{Tr} A$.

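28. Find the characteristic polynomial of the $n$-by-$n$ matrix
$$\begin{pmatrix}
0 & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & & & & \ddots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 \\
a_0 & a_1 & a_2 & a_3 & \cdots & a_{n-2} & a_{n-1}
\end{pmatrix}.$$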

29. Let $A$ and $B$ be in $M_{nn}(\mathbb{C})$.
(a) Prove under the assumption that $A$ is invertible that $\det(\lambda I - AB) = \det(\lambda I - BA)$.
(b) By working with $A + \epsilon I$ and letting $\epsilon$ tend to 0, show that the assumption in (a) that $A$ is invertible can be dropped.

30. In proving Theorem 2.42a, it is tempting to argue by considering all spanning subsets of the given set, ordering them by inclusion downward, and seeking a minimal element by Zorn’s Lemma. Give an example of a chain in this ordering that has no lower bound, thereby showing that this line of argument cannot work.

Problems 31–34 concern annihilators. Let $V$ be a vector space, let $M$ and $N$ be vector subspaces, and let $\iota : V \to V''$ be the canonical map.

31. If $V$ has a countably infinite basis, how can we conclude that $\iota$ does not carry $V$ onto $V''$?

32. Prove that $\operatorname{Ann}(M + N) = \operatorname{Ann} M \cap \operatorname{Ann} N$.

33. Prove that $\operatorname{Ann}(M \cap N) = \operatorname{Ann} M + \operatorname{Ann} N$.

34. (a) Prove that $\iota(M) \subseteq \operatorname{Ann}(\operatorname{Ann} M)$.
(b) Prove that equality holds in (a) if $V$ is finite-dimensional.
(c) Give an infinite-dimensional example in which equality fails in (a).

Problems 35–39 concern operations by blocks within matrices.

35. Let $A$ be a $k$-by-$m$ matrix of the form $A = (A_1 \;\; A_2)$, where $A_1$ has size $k$-by-$m_1$, $A_2$ has size $k$-by-$m_2$, and $m_1 + m_2 = m$. Let $B$ be an $m'$-by-$n$ matrix of the form $B = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}$, where $B_1$ has size $m_1'$-by-$n$, $B_2$ has size $m_2'$-by-$n$, and $m_1' + m_2' = m'$.
(a) If $m_1 = m_1'$ and $m_2 = m_2'$, prove that $AB = A_1B_1 + A_2B_2$.
(b) If $k = n$, prove that $BA = \begin{pmatrix} B_1A_1 & B_1A_2 \\ B_2A_1 & B_2A_2 \end{pmatrix}$.
(c) Deduce a general rule for block multiplication of matrices that are in 2-by-2 block form.

36. Let $A$ be in $M_{kk}(\mathbb{C})$, $B$ be in $M_{kn}(\mathbb{C})$, and $D$ be in $M_{nn}(\mathbb{C})$. Prove that $\det\begin{pmatrix} A & B \\ 0 & D \end{pmatrix} = \det A \, \det D$.

37. Let $A$, $B$, $C$, and $D$ be in $M_{nn}(\mathbb{C})$. Suppose that $A$ is invertible and that $AC = CA$. Prove that $\det\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det(AD - CB)$.

38. Let $A$ be in $M_{kn}(\mathbb{C})$ and $B$ be in $M_{nk}(\mathbb{C})$ with $k \le n$. Let $I_k$ be the $k$-by-$k$ identity, and let $I_n$ be the $n$-by-$n$ identity. Using Problem 29, prove that $\det(\lambda I_n - BA) = \lambda^{n-k} \det(\lambda I_k - AB)$.

39. Prove the following block-form generalization of the expansion-in-cofactors formula. For each subset $S$ of $\{1, \dots, n\}$, let $S^c$ be the complementary subset within $\{1, \dots, n\}$, and let $\operatorname{sgn}(S, S^c)$ be the sign of the permutation that carries $(1, \dots, n)$ to the members of $S$ in order, followed by the members of $S^c$ in order. Fix $k$ with $1 \le k \le n - 1$, and let the subset $S$ have $|S| = k$. For an $n$-by-$n$ matrix $A$, define $A(S)$ to be the square matrix of size $k$ obtained by using the rows of $A$ indexed by $1, \dots, k$ and the columns indexed by the members of $S$. Let $\widehat{A}(S)$ be the square matrix of size $n - k$ obtained by using the rows of $A$ indexed by $k + 1, \dots, n$ and the columns indexed by the members of $S^c$. Prove that
$$\det A = \sum_{S \subseteq \{1, \dots, n\},\ |S| = k} \operatorname{sgn}(S, S^c) \, \det A(S) \, \det \widehat{A}(S).$$

Problems 40–44 compute the determinants of certain matrices known as Cartan matrices. These have geometric significance in the theory of Lie groups.

40. Let $A_n$ be the $n$-by-$n$ matrix
$$A_n = \begin{pmatrix}
2 & -1 & 0 & 0 & \cdots & 0 & 0 \\
-1 & 2 & -1 & 0 & \cdots & 0 & 0 \\
0 & -1 & 2 & -1 & \cdots & 0 & 0 \\
0 & 0 & -1 & 2 & \cdots & 0 & 0 \\
\vdots & & & & \ddots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 2 & -1 \\
0 & 0 & 0 & 0 & \cdots & -1 & 2
\end{pmatrix}.$$
Using expansion in cofactors about the last row, prove that $\det A_n = 2 \det A_{n-1} - \det A_{n-2}$ for $n \ge 3$.

41. Computing $\det A_1$ and $\det A_2$ directly and using the recursion in Problem 40, prove that $\det A_n = n + 1$ for $n \ge 1$.

42. Let $C_n$ for $n \ge 2$ be the matrix $A_n$ except that the $(1, 2)$th entry is changed from $-1$ to $-2$.
(a) Expanding in cofactors about the last row, prove that the argument of Problem 40 is still applicable when $n \ge 4$ and a recursion formula for $\det C_n$ results with the same coefficients.
(b) Computing $\det C_2$ and $\det C_3$ directly and using the recursion equation in (a), prove that $\det C_n = 2$ for $n \ge 2$.

43. Let $D_n$ for $n \ge 3$ be the matrix $A_n$ except that the upper left 3-by-3 piece is changed from
$$\begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix} \quad\text{to}\quad \begin{pmatrix} 2 & 0 & -1 \\ 0 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix}.$$
(a) Expanding in cofactors about the last row, prove that the argument of Problem 40 is still applicable when $n \ge 5$ and a recursion formula for $\det D_n$ results with the same coefficients.
(b) Show that $D_3$ can be transformed into $A_3$ by suitable interchanges of rows and interchanges of columns, and conclude that $\det D_3 = \det A_3 = 4$.
(c) Computing $\det D_4$ directly and using (b) and the recursion equation in (a), prove that $\det D_n = 4$ for $n \ge 3$.

44. Let $E_n$ for $n \ge 4$ be the matrix $A_n$ except that the upper left 4-by-4 piece is changed from
$$\begin{pmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{pmatrix} \quad\text{to}\quad \begin{pmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & 0 & -1 \\ 0 & 0 & 2 & -1 \\ 0 & -1 & -1 & 2 \end{pmatrix}.$$
(a) Expanding in cofactors about the last row, prove that the argument of Problem 40 is still applicable when $n \ge 6$ and a recursion formula for $\det E_n$ results with the same coefficients.
(b) Show that $E_4$ can be transformed into $A_4$ by suitable interchanges of rows and interchanges of columns, and conclude that $\det E_4 = \det A_4 = 5$.
(c) Show that $E_5$ can be transformed into $D_5$ by suitable interchanges of rows and interchanges of columns, and conclude that $\det E_5 = \det D_5 = 4$.
(d) Using (b) and (c) and the recursion equation in (a), prove that $\det E_n = 9 - n$ for $n \ge 4$.
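The determinant values asserted in Problems 41 through 44 can be checked numerically for small $n$ before a proof is attempted. The following Python sketch is an illustration only and proves nothing; the helper names (`det`, `cartan_a`, `with_upper_left`, and so on) are hypothetical and are not part of the text. It computes exact determinants by Gaussian elimination over the rationals.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant via Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row interchange flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def cartan_a(n):
    """A_n: 2 on the diagonal, -1 next to the diagonal, 0 elsewhere."""
    return [[2 if i == j else (-1 if abs(i - j) == 1 else 0) for j in range(n)]
            for i in range(n)]


def with_upper_left(n, block):
    """A_n with its upper-left corner overwritten by the given square block."""
    m = cartan_a(n)
    for i, row in enumerate(block):
        m[i][:len(row)] = row
    return m


def cartan_c(n):  # Problem 42: the (1, 2)th entry becomes -2
    return with_upper_left(n, [[2, -2], [-1, 2]])


def cartan_d(n):  # Problem 43
    return with_upper_left(n, [[2, 0, -1], [0, 2, -1], [-1, -1, 2]])


def cartan_e(n):  # Problem 44
    return with_upper_left(
        n, [[2, -1, 0, 0], [-1, 2, 0, -1], [0, 0, 2, -1], [0, -1, -1, 2]])


if __name__ == "__main__":
    for n in range(4, 9):
        print(n, det(cartan_a(n)), det(cartan_c(n)),
              det(cartan_d(n)), det(cartan_e(n)))
```

Exact rational arithmetic is used so that the comparison with the integers $n + 1$, $2$, $4$, and $9 - n$ is not affected by rounding.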

CHAPTER III Inner-Product Spaces

Abstract. This chapter investigates the effects of adding the additional structure of an inner product to a finite-dimensional real or complex vector space.

Section 1 concerns the effect on the vector space itself, defining inner products and their corresponding norms and giving a number of examples and formulas for the computation of norms. Vector-space bases that are orthonormal play a special role.

Section 2 concerns the effect on linear maps. The inner product makes itself felt partly through the notion of the adjoint of a linear map. The section pays special attention to linear maps that are self-adjoint, i.e., are equal to their own adjoints, and to those that are unitary, i.e., preserve norms of vectors.

Section 3 proves the Spectral Theorem for self-adjoint linear maps on finite-dimensional inner-product spaces. The theorem says in part that any self-adjoint linear map has an orthonormal basis of eigenvectors. The Spectral Theorem has several important consequences, one of which is the existence of a unique positive semidefinite square root for any positive semidefinite linear map. The section concludes with the polar decomposition, showing that any linear map factors as the product of a unitary linear map and a positive semidefinite one.

1. Inner Products and Orthonormal Sets

In this chapter we examine the effect of adding further geometric structure to the structure of a real or complex vector space as defined in Chapter II. To be a little more specific in the cases of $\mathbb{R}^2$ and $\mathbb{R}^3$, the development of Chapter II amounted to working with points, lines, planes, coordinates, and parallelism, but nothing further. In the present chapter, by comparison, we shall take advantage of additional structure that captures the notions of distances and angles.

We take $\mathbb{F}$ to be $\mathbb{R}$ or $\mathbb{C}$, continuing to call its members the scalars. We do not allow $\mathbb{F}$ to be $\mathbb{Q}$ in this chapter; the main results will make essential use of additional facts about $\mathbb{R}$ and $\mathbb{C}$ beyond those of addition, subtraction, multiplication, and division. The relevant additional facts are summarized in Sections A3 and A4 of the appendix.¹

¹ The theory of Chapter II will be observed in Chapter IV to extend to any “field” $\mathbb{F}$ in place of $\mathbb{Q}$ or $\mathbb{R}$ or $\mathbb{C}$, but the theory of the present chapter is limited to $\mathbb{R}$ and $\mathbb{C}$, as well as some other special fields that we shall not try to isolate.

Many of the results that we obtain will be limited to the finite-dimensional case. The theory of inner-product spaces that we develop has an infinite-dimensional generalization, but useful results for the generalization make use of a hypothesis of “completeness” for an inner-product space that we are not in a position to verify in examples.²

Let $V$ be a vector space over $\mathbb{F}$. An inner product on $V$ is a function from $V \times V$ into $\mathbb{F}$, which we here denote by $(\,\cdot\,,\,\cdot\,)$, with the following properties:
(i) the function $u \mapsto (u, v)$ of $V$ into $\mathbb{F}$ is linear,
(ii) the function $v \mapsto (u, v)$ of $V$ into $\mathbb{F}$ is conjugate linear in the sense that it satisfies $(u, v_1 + v_2) = (u, v_1) + (u, v_2)$ for $v_1$ and $v_2$ in $V$ and $(u, cv) = \bar{c}(u, v)$ for $v$ in $V$ and $c$ in $\mathbb{F}$,
(iii) $(u, v) = \overline{(v, u)}$ for $u$ and $v$ in $V$,
(iv) $(v, v) \ge 0$ for all $v$ in $V$,
(v) $(v, v) = 0$ only if $v = 0$ in $V$.

The overbars in (ii) and (iii) indicate complex conjugation. Property (ii) reduces when $\mathbb{F} = \mathbb{R}$ to the fact that $v \mapsto (u, v)$ is linear. Properties (i) and (ii) together are summarized by saying that $(\,\cdot\,,\,\cdot\,)$ is bilinear if $\mathbb{F} = \mathbb{R}$ or sesquilinear if $\mathbb{F} = \mathbb{C}$. Property (iii) is summarized when $\mathbb{F} = \mathbb{R}$ by saying that $(\,\cdot\,,\,\cdot\,)$ is symmetric, or when $\mathbb{F} = \mathbb{C}$ by saying that $(\,\cdot\,,\,\cdot\,)$ is Hermitian symmetric.

An inner-product space, for purposes of this book, is a vector space over $\mathbb{R}$ or $\mathbb{C}$ with an inner product in the above sense.³,⁴

² A careful study in the infinite-dimensional case is normally made only after the development of a considerable number of topics in real analysis.
³ When the scalars are complex, many books emphasize the presence of complex scalars by referring to the inner product as a “Hermitian inner product.” This book does not need to distinguish the complex case very often and therefore will not use the modifier “Hermitian” with the term “inner product.”
⁴ Some authors, particularly in connection with mathematical physics, reverse the roles of the two variables, defining inner products to be conjugate linear in the first variable and linear in the second variable.

EXAMPLES.
(1) $V = \mathbb{R}^n$ with $(\,\cdot\,,\,\cdot\,)$ as the dot product, i.e., with $(x, y) = y^t x = x_1 y_1 + \cdots + x_n y_n$ if $x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and $y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$. The traditional notation for the dot product is $x \cdot y$.
(2) $V = \mathbb{C}^n$ with $(\,\cdot\,,\,\cdot\,)$ defined by $(x, y) = \bar{y}^t x = x_1 \bar{y}_1 + \cdots + x_n \bar{y}_n$ if $x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and $y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$. Here $\bar{y}$ denotes the entry-by-entry complex conjugate of $y$. The sesquilinear expression $(\,\cdot\,,\,\cdot\,)$ is different from the complex bilinear dot product $x \cdot y = x_1 y_1 + \cdots + x_n y_n$.

(3) $V$ equal to the vector space of all complex-valued polynomials with $(f, g) = \int_0^1 f(x)\overline{g(x)}\,dx$.
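The difference between the sesquilinear inner product on $\mathbb{C}^n$ and the complex bilinear dot product can be seen in a small computation. The following Python sketch is illustrative only; the function names `inner` and `dot_bilinear` and the sample vectors are assumptions made for the example. It follows the convention of this chapter, in which the inner product is linear in the first variable and conjugate linear in the second.

```python
def inner(x, y):
    """Standard inner product on C^n: linear in x, conjugate linear in y."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def dot_bilinear(x, y):
    """Complex bilinear dot product x . y, with no conjugation."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


x = [1 + 2j, 3j]
y = [2 - 1j, 1 + 1j]
c = 2 + 5j

# Property (iii): (x, y) is the complex conjugate of (y, x).
hermitian_gap = inner(x, y) - inner(y, x).conjugate()

# Property (ii): (x, c y) equals conj(c) (x, y).
cy = [c * b for b in y]
conjugate_linear_gap = inner(x, cy) - c.conjugate() * inner(x, y)

# Property (iv): (x, x) is a nonnegative real number.
self_product = inner(x, x)

if __name__ == "__main__":
    print(inner(x, y), dot_bilinear(x, y))
    print(hermitian_gap, conjugate_linear_gap, self_product)
```

Some numerical libraries conjugate the first argument instead of the second, which is the convention described in the footnote about mathematical physics, so it is worth checking which variable is conjugated before reusing a library routine.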

Let $V$ be an inner-product space. If $v$ is in $V$, define $\|v\| = \sqrt{(v, v)}$, calling $\|\cdot\|$ the norm associated with the inner product. The norm of $v$ is understood to be the nonnegative square root of the nonnegative real number $(v, v)$ and is well defined as a consequence of (iv). In the case of $\mathbb{R}^n$, $\|x\|$ is the Euclidean distance $\sqrt{x_1^2 + \cdots + x_n^2}$ from the origin to the column vector $x = (x_1, \dots, x_n)$. In this interpretation the dot product of two nonzero vectors in $\mathbb{R}^n$ is shown in analytic geometry to be given by $x \cdot y = \|x\|\|y\| \cos\theta$, where $\theta$ is the angle between the vectors $x$ and $y$.

Direct expansion of norms squared of sums of vectors using bilinearity or sesquilinearity leads to certain formulas of particular interest. The formula that we shall use most frequently is
$$\|u + v\|^2 = \|u\|^2 + 2\operatorname{Re}(u, v) + \|v\|^2,$$
which generalizes from $\mathbb{R}^2$ a version of the law of cosines in trigonometry relating the lengths of the three sides of a triangle when one of the angles is known. With the additional hypothesis that $(u, v) = 0$, this formula generalizes from $\mathbb{R}^2$ the Pythagorean Theorem
$$\|u + v\|^2 = \|u\|^2 + \|v\|^2.$$
Another such formula is the parallelogram law
$$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2 \quad \text{for all } u \text{ and } v \text{ in } V,$$
which is proved by computing $\|u + v\|^2$ and $\|u - v\|^2$ by the law of cosines and adding the results. The name “parallelogram law” is explained by the geometric interpretation in the case of the dot product for $\mathbb{R}^2$ and is illustrated in Figure 3.1. That figure uses the familiar interpretation of vectors in $\mathbb{R}^2$ as arrows, two arrows being identified if they are translates of one another; thus the arrow from $v$ to $u$ represents the vector $u - v$.

The parallelogram law is closely related to a formula for recovering the inner product from the norm, namely
$$(u, v) = \frac{1}{4} \sum_k i^k \|u + i^k v\|^2,$$
where the sum extends for $k \in \{0, 2\}$ if the scalars are real and extends for $k \in \{0, 1, 2, 3\}$ if the scalars are complex. This formula goes under the name polarization. To prove it, we expand
$$\|u + i^k v\|^2 = \|u\|^2 + 2\operatorname{Re}(u, i^k v) + \|v\|^2 = \|u\|^2 + 2\operatorname{Re}\big((-i)^k (u, v)\big) + \|v\|^2.$$
Multiplying by $i^k$ and summing on $k$ shows that $\sum_k i^k \|u + i^k v\|^2 = 2\sum_k i^k \operatorname{Re}\big((-i)^k (u, v)\big)$. If $k$ is even, then $i^k \operatorname{Re}((-i)^k z) = \operatorname{Re} z$ for any complex $z$, while if $k$ is odd, then $i^k \operatorname{Re}((-i)^k z) = i \operatorname{Im} z$. So $2\sum_k i^k \operatorname{Re}((-i)^k z) = 4z$, and $\sum_k i^k \|u + i^k v\|^2 = 4(u, v)$, as asserted.
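The parallelogram law and the polarization formula lend themselves to a quick numerical check. The Python sketch below is a minimal illustration for the standard inner product on $\mathbb{C}^n$ or $\mathbb{R}^n$; the names `inner`, `norm_sq`, `polarization`, and `parallelogram_defect` are hypothetical, as are the sample vectors.

```python
POWERS_OF_I = (1, 1j, -1, -1j)  # i**k for k = 0, 1, 2, 3


def inner(x, y):
    """Standard inner product: linear in x, conjugate linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def norm_sq(x):
    """Squared norm (x, x), which is real by property (iv)."""
    return inner(x, x).real


def polarization(u, v, complex_scalars=True):
    """Recover (u, v) from norms: (1/4) * sum over k of i^k ||u + i^k v||^2."""
    ks = (0, 1, 2, 3) if complex_scalars else (0, 2)
    total = 0
    for k in ks:
        ik = POWERS_OF_I[k]
        total += ik * norm_sq([a + ik * b for a, b in zip(u, v)])
    return total / 4


def parallelogram_defect(u, v):
    """||u+v||^2 + ||u-v||^2 - 2||u||^2 - 2||v||^2, which should vanish."""
    plus = [a + b for a, b in zip(u, v)]
    minus = [a - b for a, b in zip(u, v)]
    return norm_sq(plus) + norm_sq(minus) - 2 * norm_sq(u) - 2 * norm_sq(v)


u = [1 + 2j, 3j, -1]
v = [2 - 1j, 1 + 1j, 4j]

if __name__ == "__main__":
    print(polarization(u, v), inner(u, v))
    print(parallelogram_defect(u, v))
```

For real vectors, calling `polarization(u, v, complex_scalars=False)` uses only $k \in \{0, 2\}$, which reduces to $\frac{1}{4}\big(\|u + v\|^2 - \|u - v\|^2\big)$.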

FIGURE 3.1. Geometric interpretation of the parallelogram law: the sum of the squared lengths of the four sides of a parallelogram equals the sum of the squared lengths of the diagonals.

Proposition 3.1 (Schwarz inequality). In any inner-product space $V$, $|(u, v)| \le \|u\|\|v\|$ for all $u$ and $v$ in $V$.

REMARK. The proof is written so as to use properties (i) through (iv) in the definition of inner product but not (v), a situation often encountered with integrals.

PROOF. Possibly replacing $u$ by $e^{i\theta}u$ for some real $\theta$, we may assume that $(u, v)$ is real. In the case that $\|v\| \ne 0$, the law of cosines gives
$$\big\|u - \|v\|^{-2}(u, v)v\big\|^2 = \|u\|^2 - 2\|v\|^{-2}|(u, v)|^2 + \|v\|^{-4}|(u, v)|^2\|v\|^2.$$
The left side is $\ge 0$, and the right side simplifies to $\|u\|^2 - \|v\|^{-2}|(u, v)|^2$. Thus the inequality follows in this case. In the case that $\|v\| = 0$, it is enough to prove that $(u, v) = 0$ for all $u$. If $c$ is a scalar, then we have
$$\|u + cv\|^2 = \|u\|^2 + 2\operatorname{Re}\big(\bar{c}(u, v)\big) + |c|^2\|v\|^2 = \|u\|^2 + 2\operatorname{Re}\big(\bar{c}(u, v)\big).$$
The left side is $\ge 0$ as $c$ varies, but the right side is $< 0$ for a suitable choice of $c$ unless $(u, v) = 0$. This completes the proof.

Proposition 3.2. In any inner-product space $V$, the norm satisfies
(a) $\|v\| \ge 0$ for all $v$ in $V$, with equality if and only if $v = 0$,
(b) $\|cv\| = |c|\|v\|$ for all $v$ in $V$ and all scalars $c$,
(c) $\|u + v\| \le \|u\| + \|v\|$ for all $u$ and $v$ in $V$.

PROOF. Conclusion (a) is immediate from properties (iv) and (v) of an inner product, and (b) follows since $\|cv\|^2 = (cv, cv) = c\bar{c}(v, v) = |c|^2\|v\|^2$. Finally we use the law of cosines and the Schwarz inequality (Proposition 3.1) to write
$$\|u + v\|^2 = \|u\|^2 + 2\operatorname{Re}(u, v) + \|v\|^2 \le \|u\|^2 + 2\|u\|\|v\| + \|v\|^2 = (\|u\| + \|v\|)^2.$$
Taking the square root of both sides yields (c).

Two vectors $u$ and $v$ in $V$ are said to be orthogonal if $(u, v) = 0$, and one sometimes writes $u \perp v$ in this case. The notation is a reminder of the interpretation in the case of dot product: dot product 0 means that the cosine of the angle between the two vectors is 0 and the vectors are therefore perpendicular.

An orthogonal set in $V$ is a set of vectors such that each pair is orthogonal. The nonzero members of an orthogonal set are linearly independent. In fact, if $\{v_1, \dots, v_k\}$ is an orthogonal set of nonzero vectors and some linear combination has $c_1 v_1 + \cdots + c_k v_k = 0$, then the inner product of this relation with $v_j$ gives $0 = (c_1 v_1 + \cdots + c_k v_k, v_j) = c_j\|v_j\|^2$, and we see that $c_j = 0$ for each $j$.

A unit vector in $V$ is a vector $u$ with $\|u\| = 1$. If $v$ is any nonzero vector, then $v/\|v\|$ is a unit vector. An orthonormal set in $V$ is an orthogonal set of unit vectors. Under the assumption that $V$ is finite-dimensional, an orthonormal basis of $V$ is an orthonormal set that is a vector-space basis.⁵

⁵ In the infinite-dimensional theory the term “orthonormal basis” is used for an orthonormal set that spans $V$ when limits of finite sums are allowed, in addition to finite sums themselves; when $V$ is infinite-dimensional, an orthonormal basis is never large enough to be a vector-space basis.

EXAMPLES.
(1) In $\mathbb{R}^n$ or $\mathbb{C}^n$, the standard basis $\{e_1, \dots, e_n\}$ is an orthonormal set.
(2) Let $V$ be the complex inner-product space of all complex finite linear combinations, for $n$ from $-N$ to $+N$, of the functions $x \mapsto e^{inx}$ on the closed interval $[-\pi, \pi]$, the inner product being $(f, g) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx$. With respect to this inner product, the functions $e^{inx}$ form an orthonormal set.

A simple but important exercise in an inner-product space is to resolve a vector into the sum of a multiple of a given unit vector and a vector orthogonal to the given unit vector. This exercise is solved as follows: If $v$ is given and $u$ is a unit vector, then $v$ decomposes as
$$v = (v, u)u + \big(v - (v, u)u\big).$$
Here $(v, u)u$ is a multiple of $u$, and the two components are orthogonal since
$$\big(u, v - (v, u)u\big) = (u, v) - \overline{(v, u)}(u, u) = (u, v) - (u, v) = 0.$$
This decomposition is unique since if $v = v_1 + v_2$ with $v_1 = cu$ and $(v_2, u) = 0$, then the inner product of $v = v_1 + v_2$ with $u$ yields $(v, u) = (cu, u) + (v_2, u) = c$. Hence $c$ must be $(v, u)$, $v_1$ must be $(v, u)u$, and $v_2$ must be $v - (v, u)u$. Figure 3.2 illustrates the decomposition, and Proposition 3.3 generalizes it by replacing the multiples of a single unit vector by the span of a finite orthonormal set.

FIGURE 3.2. Resolution of $v$ into a component $(v, u)u$ parallel to a unit vector $u$ and a component orthogonal to $u$.

Proposition 3.3. Let $V$ be an inner-product space. If $\{u_1, \dots, u_k\}$ is an orthonormal set in $V$ and if $v$ is given in $V$, then there exists a unique decomposition
$$v = c_1 u_1 + \cdots + c_k u_k + v^\perp$$
with $v^\perp$ orthogonal to $u_j$ for $1 \le j \le k$. In this decomposition $c_j = (v, u_j)$.

REMARK. The proof illustrates a technique that arises often in mathematics. We seek to prove an existence–uniqueness theorem, and we begin by making calculations toward uniqueness that narrow down the possibilities. We are led to some formulas or conditions, and we use these to define the object in question and thereby prove existence. Although it may not be so clear except in retrospect, this was the technique that lay behind proving the equivalence of various conditions for the invertibility of a square matrix in Section I.6. The technique occurred again in defining and working with determinants in Section II.7.

PROOF OF UNIQUENESS. Taking the inner product of both sides with $u_j$, we obtain $(v, u_j) = (c_1 u_1 + \cdots + c_k u_k + v^\perp, u_j) = c_j$ for each $j$. Then $c_j = (v, u_j)$ is forced, and $v^\perp$ must be given by $v - (v, u_1)u_1 - \cdots - (v, u_k)u_k$.

PROOF OF EXISTENCE. Putting $c_j = (v, u_j)$, we need check only that the difference $v - (v, u_1)u_1 - \cdots - (v, u_k)u_k$ is orthogonal to each $u_j$ with $1 \le j \le k$. Direct calculation gives
$$\Big(v - \sum_i (v, u_i)u_i,\ u_j\Big) = (v, u_j) - \sum_i \big((v, u_i)u_i, u_j\big) = (v, u_j) - (v, u_j) = 0,$$
and the proof is complete.
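The decomposition of Proposition 3.3 is easy to carry out numerically. The following Python sketch is one possible illustration for the standard inner product on $\mathbb{C}^n$; the function name `resolve` and the sample orthonormal set are assumptions made for the example. It computes the coefficients $c_j = (v, u_j)$ and the remainder $v^\perp$, which can then be checked to be orthogonal to each $u_j$.

```python
import math


def inner(x, y):
    """Standard inner product: linear in x, conjugate linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def resolve(v, orthonormal_set):
    """Return (coeffs, v_perp) with v = sum_j coeffs[j] * u_j + v_perp.

    The caller is responsible for passing an orthonormal set; the function
    does not verify that hypothesis.
    """
    coeffs = [inner(v, u) for u in orthonormal_set]
    v_perp = list(v)
    for c, u in zip(coeffs, orthonormal_set):
        v_perp = [a - c * b for a, b in zip(v_perp, u)]
    return coeffs, v_perp


s = 1 / math.sqrt(2)
u1 = [s, s * 1j, 0]   # unit vector in C^3
u2 = [0, 0, 1]        # unit vector orthogonal to u1
v = [1, 1, 2 + 1j]

coeffs, v_perp = resolve(v, [u1, u2])

# Reassemble v from its pieces as a consistency check.
rebuilt = [
    coeffs[0] * a + coeffs[1] * b + p for a, b, p in zip(u1, u2, v_perp)
]

if __name__ == "__main__":
    print(coeffs)
    print(v_perp)
    print([inner(v_perp, u) for u in (u1, u2)])
```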

Corollary 3.4 (Bessel’s inequality). Let $V$ be an inner-product space. If $\{u_1, \dots, u_k\}$ is an orthonormal set in $V$ and if $v$ is given in $V$, then $\sum_{j=1}^k |(v, u_j)|^2 \le \|v\|^2$ with equality if and only if $v$ is in $\operatorname{span}\{u_1, \dots, u_k\}$.

PROOF. Using Proposition 3.3, write $v = \sum_{j=1}^k (v, u_j)u_j + v^\perp$ with $v^\perp$ orthogonal to $u_1, \dots, u_k$. Then
\begin{align*}
\|v\|^2 &= \Big(\sum_{i=1}^k (v, u_i)u_i + v^\perp,\ \sum_{j=1}^k (v, u_j)u_j + v^\perp\Big) \\
&= \sum_{i,j} (v, u_i)\overline{(v, u_j)}(u_i, u_j) + \Big(\sum_i (v, u_i)u_i,\ v^\perp\Big) + \Big(v^\perp,\ \sum_j (v, u_j)u_j\Big) + \|v^\perp\|^2 \\
&= \sum_{i,j} (v, u_i)\overline{(v, u_j)}\,\delta_{ij} + 0 + 0 + \|v^\perp\|^2 \\
&= \sum_{j=1}^k |(v, u_j)|^2 + \|v^\perp\|^2.
\end{align*}

From Proposition 3.3 we know that $v$ is in $\operatorname{span}\{u_1, \dots, u_k\}$ if and only if $v^\perp = 0$, and the corollary follows.

We shall now impose the condition of finite dimensionality in order to obtain suitable kinds of orthonormal sets. The argument will enable us to give a basis-free interpretation of Proposition 3.3 and Corollary 3.4, and we shall obtain equivalent conditions for the vector $v^\perp$ in Proposition 3.3 and Corollary 3.4 to be 0 for every $v$.

If an ordered set of $k$ linearly independent vectors in the inner-product space $V$ is given, the above proposition suggests a way of adjusting the set so that it becomes orthonormal. Let us write the formulas here and carry out the verification via Proposition 3.3 in the proof of Proposition 3.5 below. The method of adjusting the set so as to make it orthonormal is called the Gram–Schmidt orthogonalization process. The given linearly independent set is denoted by $\{v_1, \dots, v_k\}$, and we define
\begin{align*}
u_1 &= \frac{v_1}{\|v_1\|}, \\
u_2' &= v_2 - (v_2, u_1)u_1, & u_2 &= \frac{u_2'}{\|u_2'\|}, \\
u_3' &= v_3 - (v_3, u_1)u_1 - (v_3, u_2)u_2, & u_3 &= \frac{u_3'}{\|u_3'\|}, \\
&\ \,\vdots \\
u_k' &= v_k - (v_k, u_1)u_1 - \cdots - (v_k, u_{k-1})u_{k-1}, & u_k &= \frac{u_k'}{\|u_k'\|}.
\end{align*}
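The Gram–Schmidt formulas above translate directly into a short program. The Python sketch below is a minimal illustration for the standard inner product on $\mathbb{C}^n$; the function name `gram_schmidt` and the tolerance `tol` are assumptions of the example, and the tolerance is only a floating-point stand-in for the exact condition $u_j' \ne 0$. Each pass subtracts from $v_j$ its components $(v_j, u_i)u_i$ along the vectors already constructed and then normalizes.

```python
import math


def inner(x, y):
    """Standard inner product: linear in x, conjugate linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def norm(x):
    return math.sqrt(inner(x, x).real)


def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize a linearly independent list, following the text's formulas."""
    basis = []
    for v in vectors:
        # u_j' = v_j - (v_j, u_1) u_1 - ... - (v_j, u_{j-1}) u_{j-1}
        w = list(v)
        for u in basis:
            c = inner(v, u)
            w = [a - c * b for a, b in zip(w, u)]
        length = norm(w)
        if length <= tol:
            raise ValueError("vectors are not linearly independent")
        # u_j = u_j' / ||u_j'||
        basis.append([a / length for a in w])
    return basis


vectors = [[1, 1, 0], [1, 0, 1j], [0, 1, 1]]
basis = gram_schmidt(vectors)

# Gram matrix of the output; it should be close to the identity matrix.
gram = [[inner(a, b) for b in basis] for a in basis]

if __name__ == "__main__":
    for row in gram:
        print([round(abs(entry), 12) for entry in row])
```

In floating-point arithmetic, subtracting the components from the running vector `w` instead of from the original `v` (the "modified" variant) is often preferred for numerical stability; the version above is kept close to the formulas as printed.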

Proposition 3.5. If $\{v_1, \dots, v_k\}$ is a linearly independent set in an inner-product space $V$, then the Gram–Schmidt orthogonalization process replaces $\{v_1, \dots, v_k\}$ by an orthonormal set $\{u_1, \dots, u_k\}$ such that $\operatorname{span}\{v_1, \dots, v_j\} = \operatorname{span}\{u_1, \dots, u_j\}$ for all $j$.

PROOF. We argue by induction on $j$. The base case is $j = 1$, and the result is evident in this case. Assume inductively that $u_1, \dots, u_{j-1}$ are well defined and orthonormal and that $\operatorname{span}\{v_1, \dots, v_{j-1}\} = \operatorname{span}\{u_1, \dots, u_{j-1}\}$. Proposition 3.3 shows that $u_j'$ is orthogonal to $u_1, \dots, u_{j-1}$. If $u_j' = 0$, then $v_j$ has to be in $\operatorname{span}\{u_1, \dots, u_{j-1}\} = \operatorname{span}\{v_1, \dots, v_{j-1}\}$, and we have a contradiction to the assumed linear independence of $\{v_1, \dots, v_k\}$. Thus $u_j' \ne 0$, and $\{u_1, \dots, u_j\}$ is a well-defined orthonormal set. This set must be linearly independent, and hence its linear span is a $j$-dimensional vector subspace of the linear span of $\{v_1, \dots, v_j\}$. By Corollary 2.4, the two linear spans coincide. This completes the induction and the proof.

Corollary 3.6. If $V$ is a finite-dimensional inner-product space, then any orthonormal set in a vector subspace $S$ of $V$ can be extended to an orthonormal basis of $S$.

PROOF. Extend the given orthonormal set to a basis of $S$ by Corollary 2.3b. Then apply the Gram–Schmidt orthogonalization process. The given vectors do not get changed by the process, as we see from the formulas for the vectors $u_j'$ and $u_j$, and hence the result is an extension of the given orthonormal set to an orthonormal basis.

Corollary 3.7. If $S$ is a vector subspace of a finite-dimensional inner-product space $V$, then $S$ has an orthonormal basis.

PROOF. This is the special case of Corollary 3.6 in which the given orthonormal set is empty.

The set of all vectors orthogonal to a subset $M$ of the inner-product space $V$ is denoted by $M^\perp$. In symbols, $M^\perp = \{u \in V \mid (u, v) = 0 \text{ for all } v \in M\}$. We see by inspection that $M^\perp$ is a vector subspace. Moreover, $M \cap M^\perp = 0$ since any $u$ in $M \cap M^\perp$ must have $(u, u) = 0$. The interest in the vector subspace $M^\perp$ comes from the following proposition.

Theorem 3.8 (Projection Theorem). If $S$ is a vector subspace of the finite-dimensional inner-product space $V$, then every $v$ in $V$ decomposes uniquely as $v = v_1 + v_2$ with $v_1$ in $S$ and $v_2$ in $S^\perp$. In other words, $V = S \oplus S^\perp$.

REMARKS. Because of this proposition, $S^\perp$ is often called the orthogonal complement of the vector subspace $S$.

PROOF. Uniqueness follows from the fact that $S \cap S^\perp = 0$. For existence, use of Corollaries 3.7 and 3.6 produces an orthonormal basis $\{u_1, \dots, u_r\}$ of $S$ and extends it to an orthonormal basis $\{u_1, \dots, u_n\}$ of $V$. The vectors $u_j$ for $j > r$ are orthogonal to each $u_i$ with $i \le r$ and hence are in $S^\perp$. If $v$ is given in $V$, we can write $v = \sum_{j=1}^n c_j u_j$ with scalars $c_j$ as $v = v_1 + v_2$ with $v_1 = \sum_{i=1}^r c_i u_i$ and $v_2 = \sum_{j=r+1}^n c_j u_j$, and this decomposition for all $v$ shows that $V = S + S^\perp$.

Corollary 3.9. If $S$ is a vector subspace of the finite-dimensional inner-product space $V$, then
(a) $\dim V = \dim S + \dim S^\perp$,
(b) $S^{\perp\perp} = S$.

PROOF. Conclusion (a) is immediate from the direct-sum decomposition $V = S \oplus S^\perp$ of Theorem 3.8. For (b), the definition of orthogonal complement gives $S \subseteq S^{\perp\perp}$. On the other hand, application of (a) twice shows that $S$ and $S^{\perp\perp}$ have the same finite dimension. By Corollary 2.4, $S^{\perp\perp} = S$.

Section II.6 introduced “projection” mappings in the setting of any direct sum of two vector spaces, and we shall use those mappings in connection with the decomposition $V = S \oplus S^\perp$ of Theorem 3.8. We make one adjustment in working with the projections, changing their ranges from the image, namely $S$ or $S^\perp$, to the larger space $V$. In effect, a linear map $p_1$ or $p_2$ as in Section II.6 will be replaced by $i_1 p_1$ or $i_2 p_2$. Specifically let $E : V \to V$ be the linear map that is the identity on $S$ and is 0 on $S^\perp$. Then $E$ is called the orthogonal projection of $V$ on $S$. The linear map $I - E$ is the identity on $S^\perp$ and is 0 on $S$. Since $S = S^{\perp\perp}$, $I - E$ is the orthogonal projection of $V$ on $S^\perp$. It is the linear map that picks out the $S^\perp$ component relative to the direct-sum decomposition $V = S^\perp \oplus S^{\perp\perp}$.

Proposition 3.3 and Corollary 3.4 can be restated in terms of orthogonal projections.

Corollary 3.10. Let $V$ be a finite-dimensional inner-product space, let $S$ be a vector subspace of $V$, let $\{u_1, \dots, u_k\}$ be an orthonormal basis of $S$, and let $E$ be the orthogonal projection of $V$ on $S$. If $v$ is in $V$, then
$$E(v) = \sum_{j=1}^k (v, u_j)u_j \quad\text{and}\quad \|E(v)\|^2 = \sum_{j=1}^k |(v, u_j)|^2.$$

The vector $v^\perp$ in the expansion $v = \sum_{j=1}^k (v, u_j)u_j + v^\perp$ of Proposition 3.3 is equal to $(I - E)v$, and the equality of norms
$$\|v\|^2 = \sum_{j=1}^k |(v, u_j)|^2 + \|v^\perp\|^2$$
has the interpretations that $\|v\|^2 = \|E(v)\|^2 + \|(I - E)v\|^2$ and that equality holds in Bessel’s inequality if and only if $E(v) = v$.

PROOF. Write $v = \sum_{j=1}^k (v, u_j)u_j + v^\perp$ as in Proposition 3.3. Then each $u_j$ is in $S$, and the vector $v^\perp$, being orthogonal to each member of a basis of $S$, is in $S^\perp$. This proves the formula for $E(v)$, and the formula for $\|E(v)\|^2$ follows by applying Corollary 3.4 to $v - v^\perp$. Reassembling $v$, we now have $v = E(v) + v^\perp$, and hence $v^\perp = v - E(v) = (I - E)v$. Finally the decomposition $v = E(v) + (I - E)(v)$ is into orthogonal terms, and the Pythagorean Theorem shows that $\|v\|^2 = \|E(v)\|^2 + \|(I - E)v\|^2$.

Theorem 3.11 (Parseval’s equality). If $V$ is a finite-dimensional inner-product space, then the following conditions on an orthonormal set $\{u_1, \dots, u_m\}$ are equivalent:
(a) $\{u_1, \dots, u_m\}$ is a vector-space basis of $V$, hence an orthonormal basis,
(b) the only vector orthogonal to all of $u_1, \dots, u_m$ is 0,
(c) $v = \sum_{j=1}^m (v, u_j)u_j$ for all $v$ in $V$,
(d) $\|v\|^2 = \sum_{j=1}^m |(v, u_j)|^2$ for all $v$ in $V$,
(e) $(v, w) = \sum_{j=1}^m (v, u_j)\overline{(w, u_j)}$ for all $v$ and $w$ in $V$.

PROOF. Let $S = \operatorname{span}\{u_1, \dots, u_m\}$, and let $E$ be the orthogonal projection of $V$ on $S$. If (a) holds, then $S = V$ and $S^\perp = 0$. Thus (b) holds. If (b) holds, then $S^\perp = 0$ and $E$ is the identity. Thus (c) holds by Corollary 3.10. If (c) holds, then Corollary 3.4 shows that (d) holds.

If (d) holds, we use polarization to prove (e). Let $k$ be in $\{0, 2\}$ if $\mathbb{F} = \mathbb{R}$, or in $\{0, 1, 2, 3\}$ if $\mathbb{F} = \mathbb{C}$. Conclusion (d) gives us
$$\|v + i^k w\|^2 = \sum_{j=1}^m |(v + i^k w, u_j)|^2 = \|v\|^2 + 2\operatorname{Re}\sum_{j=1}^m (v, u_j)\overline{i^k (w, u_j)} + \|w\|^2.$$
Multiplying by $i^k$ and summing over $k$, we obtain
$$4(v, w) = 2\sum_{j=1}^m \sum_k i^k \operatorname{Re}\big((-i)^k (v, u_j)\overline{(w, u_j)}\big).$$

III. Inner-Product Spaces

98

In the proof of polarization, we saw that $2 \sum_{k} i^k \operatorname{Re}((-i)^k z) = 4z$. Hence $4(v, w) = 4 \sum_{j=1}^{m} (v, u_j)\overline{(w, u_j)}$. This proves (e). If (e) holds, we take $w = v$ in (e) and apply Corollary 3.10 to see that $\|E(v)\|^2 = \|v\|^2$ for all $v$. Then $\|(I - E)v\|^2 = 0$ for all $v$, and $E(v) = v$ for all $v$. Hence $S = V$, and $\{u_1, \dots, u_m\}$ is a basis. This proves (a).

Theorem 3.12 (Riesz Representation Theorem). If $\ell$ is a linear functional on the finite-dimensional inner-product space $V$, then there exists a unique $v$ in $V$ with $\ell(u) = (u, v)$ for all $u$ in $V$.

PROOF. Uniqueness is immediate by subtracting two such expressions, since if $(u, v) = 0$ for all $u$, then the special case $u = v$ gives $(v, v) = 0$ and $v = 0$. Let us prove existence. If $\ell = 0$, take $v = 0$. Otherwise let $S = \ker \ell$. Corollary 2.15 shows that $\dim S = \dim V - 1$, and Corollary 3.9a then shows that $\dim S^\perp = 1$. Let $w$ be a nonzero vector in $S^\perp$. This vector $w$ must have $\ell(w) \neq 0$ since $S \cap S^\perp = 0$, and we let $v$ be the member of $S^\perp$ given by
\[ v = \frac{\overline{\ell(w)}}{\|w\|^2}\, w. \]
For any $u$ in $V$, we have $\ell\big(u - \frac{\ell(u)}{\ell(w)} w\big) = 0$, and hence $u - \frac{\ell(u)}{\ell(w)} w$ is in $S$. Since $v$ is in $S^\perp$, $u - \frac{\ell(u)}{\ell(w)} w$ is orthogonal to $v$. Thus
\[ (u, v) = \Big( \frac{\ell(u)}{\ell(w)}\, w,\ v \Big) = \frac{\ell(u)}{\ell(w)} \Big( w,\ \frac{\overline{\ell(w)}}{\|w\|^2}\, w \Big) = \frac{\ell(u)}{\ell(w)} \cdot \frac{\ell(w)}{\|w\|^2}\, \|w\|^2 = \ell(u). \]
This proves existence.
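As a small numerical illustration of Theorem 3.12, the following sketch works in $\mathbb{C}^3$ with the standard inner product, linear in the first variable and conjugate linear in the second. It does not follow the kernel argument of the proof; instead it uses the expansion in the standard orthonormal basis, which gives $v_j = \overline{\ell(e_j)}$. The functional `ell` and the helper names are hypothetical, chosen only for the example.

```python
def inner(u, v):
    """Standard inner product on C^n: linear in u, conjugate linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def riesz_vector(ell, n):
    """Return v with ell(u) == inner(u, v), using v_j = conj(ell(e_j))."""
    basis = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
    return [complex(ell(e)).conjugate() for e in basis]

# Hypothetical linear functional, for illustration only.
def ell(u):
    return (2 + 1j) * u[0] - 3j * u[1] + 5 * u[2]

v = riesz_vector(ell, 3)
u = [1 - 1j, 2j, 0.5 + 0j]
difference = abs(ell(u) - inner(u, v))  # should vanish up to rounding
```

Uniqueness in the theorem says that no other choice of `v` can reproduce `ell` on every `u`.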

2. Adjoints

Throughout this section, $V$ will denote a finite-dimensional inner-product space with inner product $(\,\cdot\,,\,\cdot\,)$ and with scalars from $\mathbb{F}$, with $\mathbb{F}$ equal to $\mathbb{R}$ or $\mathbb{C}$. We shall study aspects of linear maps $L : V \to V$ related to the inner product on $V$. The starting point is to associate to any such $L$ another linear map $L^* : V \to V$ known as the “adjoint” of $L$, and then to investigate some of its properties. A tool in this investigation will be the scalar-valued function on $V \times V$ given by $(u, v) \mapsto (L(u), v)$, which captures the information in any matrix of $L$ without requiring the choice of an ordered basis. This function determines $L$ uniquely because an equality $(L(u), v) = (L'(u), v)$ for all $u$ and $v$ implies $(L(u) - L'(u), v) = 0$ for all $u$ and $v$, in particular for $v = L(u) - L'(u)$; thus $\|L(u) - L'(u)\|^2 = 0$ and $L(u) = L'(u)$ for all $u$.


Proposition 3.13. Let $L : V \to V$ be a linear map on the finite-dimensional inner-product space $V$. For each $u$ in $V$, there exists a unique vector $L^*(u)$ in $V$ such that
\[ (L(v), u) = (v, L^*(u)) \qquad \text{for all } v \text{ in } V. \]
As $u$ varies, this formula defines $L^*$ as a linear map from $V$ to $V$.

REMARK. The linear map $L^* : V \to V$ is called the adjoint of $L$.

PROOF. The function $v \mapsto (L(v), u)$ is a linear functional on $V$, and Theorem 3.12 shows that it is given by the inner product with a unique vector of $V$. Thus we define $L^*(u)$ to be the unique vector of $V$ with $(L(v), u) = (v, L^*(u))$ for all $v$ in $V$. If $c$ is a scalar, then the uniqueness and the computation
\[ (v, L^*(cu)) = (L(v), cu) = \bar{c}\,(L(v), u) = \bar{c}\,(v, L^*(u)) = (v, cL^*(u)) \]
yield $L^*(cu) = cL^*(u)$. Similarly the uniqueness and the computation
\[ (v, L^*(u_1 + u_2)) = (L(v), u_1 + u_2) = (L(v), u_1) + (L(v), u_2) = (v, L^*(u_1)) + (v, L^*(u_2)) = (v, L^*(u_1) + L^*(u_2)) \]
yield $L^*(u_1 + u_2) = L^*(u_1) + L^*(u_2)$. Therefore $L^*$ is linear.
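The defining identity of the adjoint can be checked numerically. The sketch below assumes $V = \mathbb{C}^2$ with the standard inner product and $L(v) = Av$, and it takes for granted the fact, established in Proposition 3.15 later in this section, that the adjoint is then given by the conjugate transpose. The matrix and vectors are arbitrary illustrative choices, and the helper names are hypothetical.

```python
def inner(u, v):
    """Standard inner product on C^n: linear in u, conjugate linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def conjugate_transpose(A):
    """Entry (i, j) of the result is the conjugate of entry (j, i) of A."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

A = [[1 + 2j, 3j], [4 + 0j, 5 - 1j]]   # arbitrary illustrative matrix
A_star = conjugate_transpose(A)

u = [2 - 1j, 1 + 1j]
v = [0.5j, 3 + 0j]
lhs = inner(apply(A, v), u)        # (L(v), u)
rhs = inner(v, apply(A_star, u))   # (v, L*(u))
```

The two sides agree up to rounding for any choice of `u` and `v`, which is the content of the formula $(L(v), u) = (v, L^*(u))$.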

The passage $L \mapsto L^*$ to the adjoint is a function from $\operatorname{Hom}_{\mathbb{F}}(V, V)$ to itself that is conjugate linear, and it reverses the order of multiplication: $(L_1 L_2)^* = L_2^* L_1^*$. Since the formula $(L(v), u) = (v, L^*(u))$ in the proposition is equivalent to the formula $(u, L(v)) = (L^*(u), v)$, we see that $L^{**} = L$.

All of the results in Section II.3 concerning the association of matrices to linear maps are applicable here, but our interest now will be in what happens when the bases we use are orthonormal. Recall from Section II.3 that if $\Gamma = (u_1, \dots, u_n)$ and $\Delta = (v_1, \dots, v_n)$ are any ordered bases of $V$, then the matrix $A = \begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix}$ associated to the linear map $L : V \to V$ has $A_{ij} = \begin{pmatrix} L(u_j) \\ \Delta \end{pmatrix}_i$.

Lemma 3.14. If $L : V \to V$ is a linear map on the finite-dimensional inner-product space $V$ and if $\Gamma = (u_1, \dots, u_n)$ and $\Delta = (v_1, \dots, v_n)$ are ordered orthonormal bases of $V$, then the matrix $A = \begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix}$ has $A_{ij} = (L(u_j), v_i)$.

PROOF. Applying Theorem 3.11c, we have
\[ A_{ij} = \begin{pmatrix} L(u_j) \\ \Delta \end{pmatrix}_i = \begin{pmatrix} \sum_{i'} (L(u_j), v_{i'})\, v_{i'} \\ \Delta \end{pmatrix}_i = \sum_{i'} (L(u_j), v_{i'}) \begin{pmatrix} v_{i'} \\ \Delta \end{pmatrix}_i = \sum_{i'} (L(u_j), v_{i'})\, \delta_{i i'} = (L(u_j), v_i). \]

Proposition 3.15. If $L : V \to V$ is a linear map on the finite-dimensional inner-product space $V$ and if $\Gamma = (u_1, \dots, u_n)$ and $\Delta = (v_1, \dots, v_n)$ are ordered orthonormal bases of $V$, then the matrices $A = \begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix}$ and $A^* = \begin{pmatrix} L^* \\ \Gamma\Delta \end{pmatrix}$ of $L$ and its adjoint are related by $A^*_{ij} = \overline{A_{ji}}$.

PROOF. Lemma 3.14 and the definition of $L^*$ give $A^*_{ij} = (L^*(v_j), u_i) = (v_j, L(u_i)) = \overline{(L(u_i), v_j)} = \overline{A_{ji}}$.

Accordingly, we define $A^* = \bar{A}^{\,t}$ for any square matrix $A$, sometimes calling $A^*$ the adjoint$^6$ of $A$. A linear map $L : V \to V$ is called self-adjoint if $L^* = L$. Correspondingly a square matrix $A$ is self-adjoint if $A^* = A$. It is more common, however, to say that a matrix with $A^* = A$ is symmetric if $\mathbb{F} = \mathbb{R}$ or Hermitian$^7$ if $\mathbb{F} = \mathbb{C}$. A real Hermitian matrix is symmetric, and the term “Hermitian” is thus applicable also when $\mathbb{F} = \mathbb{R}$.

Any Hermitian matrix $A$ arises from a self-adjoint linear map $L$. Namely, we take $V$ to be $\mathbb{F}^n$ with the usual inner product, and we let $\Gamma$ and $\Delta$ each be the standard ordered basis $\Sigma = (e_1, \dots, e_n)$. This basis is orthonormal, and we define $L$ by the matrix product $L(v) = Av$ for any column vector $v$. We know that $\begin{pmatrix} L \\ \Sigma\Sigma \end{pmatrix} = A$. Since $A^* = A$, we conclude from Proposition 3.15 that $L^* = L$. Thus we are free to deduce properties of Hermitian matrices from properties of self-adjoint linear maps.

Self-adjoint linear maps will be of special interest to us. Nontrivial examples of self-adjoint linear maps, constructed without simply writing down Hermitian matrices, may be produced by the following proposition.

Proposition 3.16. If $V$ is a finite-dimensional inner-product space and $S$ is a vector subspace of $V$, then the orthogonal projection $E : V \to V$ of $V$ on $S$ is self-adjoint.

PROOF. Let $v = v_1 + v_2$ and $u = u_1 + u_2$ be the decompositions of two members of $V$ according to $V = S \oplus S^\perp$. Then we have
\[ (v, E^*(u)) = (E(v), u) = (v_1, u_1 + u_2) = (v_1, u_1) = (v, u_1) = (v, E(u)), \]
and the proposition follows by the uniqueness in Proposition 3.13.

$^6$ The name “adjoint” happens to coincide with the name for a different notion that arose in connection with Cramer’s rule in Section II.7. The two notions never seem to arise at the same time, and thus no confusion need occur.
$^7$ The term “Hermitian” is used also for a class of linear maps in the infinite-dimensional case, but care is needed because the terms “Hermitian” and “self-adjoint” mean different things in the infinite-dimensional case.


To understand Proposition 3.16 in terms of matrices, take an ordered orthonormal basis $(u_1, \dots, u_r)$ of $S$, and extend it to an ordered orthonormal basis $\Gamma = (u_1, \dots, u_n)$ of $V$. Then
\[ E(u_j) = \begin{cases} u_j & \text{for } j \le r, \\ 0 & \text{for } j > r, \end{cases} \]
and hence $\begin{pmatrix} E(u_j) \\ \Gamma \end{pmatrix}$ equals the $j$th standard basis vector $e_j$ if $j \le r$ and equals $0$ if $j > r$. Consequently the matrix $\begin{pmatrix} E \\ \Gamma\Gamma \end{pmatrix}$ is diagonal with 1’s in the first $r$ diagonal entries and 0’s elsewhere. This matrix is equal to its conjugate transpose, as it must be according to Propositions 3.15 and 3.16.

Proposition 3.17. If $V$ is a finite-dimensional inner-product space and $L : V \to V$ is a self-adjoint linear map, then $(L(v), v)$ is in $\mathbb{R}$ for every $v$ in $V$, and consequently every eigenvalue of $L$ is in $\mathbb{R}$. Conversely if $\mathbb{F} = \mathbb{C}$ and if $L : V \to V$ is a linear map such that $(L(v), v)$ is in $\mathbb{R}$ for every $v$ in $V$, then $L$ is self-adjoint.

REMARK. The hypothesis $\mathbb{F} = \mathbb{C}$ is essential in the converse. In fact, the $90^\circ$ rotation $L$ of $\mathbb{R}^2$ whose matrix in the standard basis is $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ is not self-adjoint but does have $L(v) \cdot v = 0$ for every $v$ in $\mathbb{R}^2$.

PROOF. If $L = L^*$, then $(L(v), v) = (v, L^*(v)) = (v, L(v)) = \overline{(L(v), v)}$, and hence $(L(v), v)$ is real-valued. If $v$ is an eigenvector with eigenvalue $\lambda$, then substitution of $L(v) = \lambda v$ into $(L(v), v) = \overline{(L(v), v)}$ gives $\lambda \|v\|^2 = \bar{\lambda} \|v\|^2$. Since $v \neq 0$, $\lambda$ must be real.

For the converse we begin with the special case that $(L(w), w) = 0$ for all $w$. For $0 \le k \le 3$, we then have
\[ (-i)^k (L(u), v) + i^k (L(v), u) = (L(u + i^k v), u + i^k v) - (L(u), u) - (L(v), v) = 0. \]
Taking $k = 0$ gives $(L(u), v) + (L(v), u) = 0$, while taking $k = 1$ gives $(L(u), v) - (L(v), u) = 0$. Hence $(L(u), v) = 0$ for all $u$ and $v$. Since the function $(u, v) \mapsto (L(u), v)$ determines $L$, we obtain $L = 0$.

In the general case, $(L(v), v)$ real-valued implies that $(L(v), v) = \overline{(L(v), v)} = (v, L(v)) = (L^*(v), v)$ for all $v$. Therefore $((L - L^*)(v), v) = 0$ for all $v$, and the special case shows that $L - L^* = 0$. This completes the proof.

We conclude this section by examining one further class of linear maps having a special relationship with their adjoints.


Proposition 3.18. If $V$ is a finite-dimensional inner-product space, then the following conditions on a linear map $L : V \to V$ are equivalent:
(a) $L^* L = I$,
(b) $L$ carries some orthonormal basis of $V$ to an orthonormal basis,
(c) $L$ carries each orthonormal basis of $V$ to an orthonormal basis,
(d) $(L(u), L(v)) = (u, v)$ for all $u$ and $v$ in $V$,
(e) $\|L(v)\| = \|v\|$ for all $v$ in $V$.

REMARK. A linear map satisfying these equivalent conditions is said to be orthogonal if $\mathbb{F} = \mathbb{R}$ and unitary if $\mathbb{F} = \mathbb{C}$.

PROOF. We prove that (a), (d), and (e) are equivalent and that (b), (c), and (d) are equivalent. If (a) holds and $u$ and $v$ are given in $V$, then $(L(u), L(v)) = (L^* L(u), v) = (I(u), v) = (u, v)$, and (d) holds. If (d) holds, then setting $u = v$ shows that (e) holds. If (e) holds, we use polarization twice to write
\[ (L(u), L(v)) = \sum_{k} \tfrac{1}{4}\, i^k \|L(u) + i^k L(v)\|^2 = \sum_{k} \tfrac{1}{4}\, i^k \|L(u + i^k v)\|^2 = \sum_{k} \tfrac{1}{4}\, i^k \|u + i^k v\|^2 = (u, v). \]
Then $((L^* L - I)(u), v) = 0$ for all $u$ and $v$, and we conclude that (a) holds.

Since (b) is a special case of (c) and (c) is a special case of (d), proving that (b) implies (d) will prove that (b), (c), and (d) are equivalent. Thus let $\{u_1, \dots, u_n\}$ be an orthonormal basis of $V$ such that $\{L(u_1), \dots, L(u_n)\}$ is an orthonormal basis, and let $u$ and $v$ be given. Then
\[ (L(u), L(v)) = \Big( L\Big(\sum_i (u, u_i) u_i\Big),\ L\Big(\sum_j (v, u_j) u_j\Big) \Big) = \sum_{i,j} (u, u_i)\overline{(v, u_j)}\,(L(u_i), L(u_j)) = \sum_{i,j} (u, u_i)\overline{(v, u_j)}\,\delta_{ij} = \sum_i (u, u_i)\overline{(v, u_i)} = (u, v), \]
the last equality following from Parseval’s equality (Theorem 3.11).
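Conditions (a), (d), and (e) of Proposition 3.18 can be tested numerically on a concrete map. The following sketch assumes $V = \mathbb{R}^2$ with the dot product and takes $L$ to be a rotation; the angle, the test vectors, and the helper names are arbitrary choices made only for illustration.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

theta = 0.7  # arbitrary angle, illustration only
A = [[math.cos(theta), math.sin(theta)],
     [-math.sin(theta), math.cos(theta)]]

AtA = matmul(transpose(A), A)                       # (a): should be the identity
u, v = [1.0, 2.0], [-3.0, 0.5]
inner_gap = abs(dot(apply(A, u), apply(A, v)) - dot(u, v))   # (d)
norm_gap = abs(dot(apply(A, v), apply(A, v)) - dot(v, v))    # (e), squared norms
```

All three quantities come out as the identity or zero up to rounding, consistent with the equivalence of the conditions.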

As with self-adjointness, we use the geometrically meaningful definition for linear maps to obtain a definition for matrices: a square matrix $A$ with $A^* A = I$ is said to be orthogonal if $\mathbb{F} = \mathbb{R}$ and unitary if $\mathbb{F} = \mathbb{C}$. The condition is that $A$ is invertible and its inverse equals its adjoint. In terms of individual entries, the condition is that $\sum_k A^*_{ik} A_{kj} = \delta_{ij}$, hence that $\sum_k \overline{A_{ki}} A_{kj} = \delta_{ij}$. This is the condition that the columns of $A$ form an orthonormal basis relative to the usual inner product on $\mathbb{R}^n$ or $\mathbb{C}^n$. A real unitary matrix is orthogonal.

If $A$ is an orthogonal or unitary matrix, we can construct a corresponding orthogonal or unitary linear map on $\mathbb{R}^n$ or $\mathbb{C}^n$ relative to the standard ordered basis $\Sigma$. Namely, we define $L(v) = Av$, and Proposition 3.15 shows that $L$ is orthogonal or unitary: $L^* L(v) = A^* A v = Iv = v$. Proposition 3.19 below gives a converse.

Let us notice that an orthogonal or unitary matrix $A$ necessarily has $|\det A| = 1$. In fact, the formula $A^* = (\bar{A})^t$ implies that $\det A^* = \overline{\det A}$. Then
\[ 1 = \det I = \det A^* A = \det A^* \det A = \overline{\det A}\, \det A = |\det A|^2. \]
An orthogonal matrix thus has determinant $\pm 1$, while we conclude for a unitary matrix only that the determinant is a complex number of absolute value 1.

EXAMPLES.
(1) The 2-by-2 orthogonal matrices of determinant $+1$ are all matrices of the form $\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$. The 2-by-2 orthogonal matrices of determinant $-1$ are the product of $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ and the 2-by-2 orthogonal matrices of determinant $+1$.
(2) The 2-by-2 unitary matrices of determinant $+1$ are all matrices of the form $\begin{pmatrix} \alpha & \beta \\ -\bar{\beta} & \bar{\alpha} \end{pmatrix}$ with $|\alpha|^2 + |\beta|^2 = 1$; these may be regarded as parametrizing the points of the unit sphere $S^3$ of $\mathbb{R}^4$. The 2-by-2 unitary matrices of arbitrary determinant are the products of all matrices $\begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{pmatrix}$ and the 2-by-2 unitary matrices of determinant $+1$.

Proposition 3.19. If $V$ is a finite-dimensional inner-product space, if $\Gamma = (u_1, \dots, u_n)$ and $\Delta = (v_1, \dots, v_n)$ are ordered orthonormal bases of $V$, and if $L : V \to V$ is a linear map that is orthogonal if $\mathbb{F} = \mathbb{R}$ and unitary if $\mathbb{F} = \mathbb{C}$, then the matrix $A = \begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix}$ is orthogonal or unitary.

PROOF. Proposition 3.15 and Theorem 2.16 give
\[ A^* A = \begin{pmatrix} L^* \\ \Gamma\Delta \end{pmatrix} \begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix} = \begin{pmatrix} L^* L \\ \Gamma\Gamma \end{pmatrix} = \begin{pmatrix} I \\ \Gamma\Gamma \end{pmatrix}, \]
and the right side is the identity matrix, as required.

One consequence of Proposition 3.19 is that any matrix $\begin{pmatrix} I \\ \Delta\Gamma \end{pmatrix}$ relative to two ordered orthonormal bases is orthogonal or unitary, since the identity function $I : V \to V$ is certainly orthogonal or unitary. Thus a change from writing the matrix of a linear map $L$ in one ordered orthonormal basis $\Gamma$ to writing the matrix of $L$ in another ordered orthonormal basis $\Delta$ is implemented by the formula
\[ \begin{pmatrix} L \\ \Delta\Delta \end{pmatrix} = C^{-1} \begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix} C, \]
where $C$ is the orthogonal or unitary matrix $\begin{pmatrix} I \\ \Gamma\Delta \end{pmatrix}$.


Another consequence of Proposition 3.19 is that the matrix $\begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix}$ of an orthogonal or unitary linear map $L$ in an ordered orthonormal basis $\Gamma$ is an orthogonal or unitary matrix. We have defined $\det L$ to be the determinant of $\begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix}$ relative to any $\Gamma$, and we conclude that $|\det L| = 1$.

3. Spectral Theorem

In this section we deal with the geometric structure of certain kinds of linear maps from finite-dimensional inner-product spaces into themselves. We shall see that linear maps that are self-adjoint or unitary, among other possible conditions, have bases of eigenvectors in the sense of Section II.8. Moreover, such a basis may be taken to be orthonormal. When an ordered basis of eigenvectors is used for expressing the linear map as a matrix, the result is that the matrix is diagonal. Thus these linear maps have an especially uncomplicated structure. In terms of matrices, the result is that a Hermitian or unitary matrix $A$ is similar to a diagonal matrix $D$, and the matrix $C$ with $D = C^{-1} A C$ may be taken to be unitary. We begin with a lemma.

Lemma 3.20. If $L : V \to V$ is a self-adjoint linear map on an inner-product space $V$, then $v \mapsto (L(v), v)$ is real-valued, every eigenvalue of $L$ is real, eigenvectors under $L$ for distinct eigenvalues are orthogonal, and every vector subspace $S$ of $V$ with $L(S) \subseteq S$ has $L(S^\perp) \subseteq S^\perp$.

PROOF. The first two conclusions are contained in Proposition 3.17. If $v_1$ and $v_2$ are eigenvectors of $L$ with distinct real eigenvalues $\lambda_1$ and $\lambda_2$, then
\[ (\lambda_1 - \lambda_2)(v_1, v_2) = (\lambda_1 v_1, v_2) - (v_1, \lambda_2 v_2) = (L(v_1), v_2) - (v_1, L(v_2)) = 0. \]
Since $\lambda_1 \neq \lambda_2$, we must have $(v_1, v_2) = 0$. If $S$ is a vector subspace with $L(S) \subseteq S$, then also $L(S^\perp) \subseteq S^\perp$ because $s \in S$ and $s^\perp \in S^\perp$ together imply $0 = (L(s), s^\perp) = (s, L(s^\perp))$.

Theorem 3.21 (Spectral Theorem). Let $L : V \to V$ be a self-adjoint linear map on an inner-product space $V$. Then $V$ has an orthonormal basis of eigenvectors of $L$. In addition, for each scalar $\lambda$, let
\[ V_\lambda = \{ v \in V \mid L(v) = \lambda v \}, \]
so that $V_\lambda$ when nonzero is the eigenspace of $L$ for the eigenvalue $\lambda$. Then the eigenvalues of $L$ are all real, the vector subspaces $V_\lambda$ are mutually orthogonal, and any orthonormal basis of $V$ of eigenvectors of $L$ is the union of orthonormal bases of the $V_\lambda$’s. Correspondingly if $A$ is any Hermitian $n$-by-$n$ matrix, then there exists a unitary matrix $C$ such that $C^{-1} A C$ is diagonal with real entries. If the matrix $A$ has real entries, then $C$ may be taken to be an orthogonal matrix.

PROOF. Lemma 3.20 shows that the eigenvalues of $L$ are all real and that the vector subspaces $V_\lambda$ are mutually orthogonal.

To proceed further, we first assume that $\mathbb{F} = \mathbb{C}$. Applying the Fundamental Theorem of Algebra (Theorem 1.18) to the characteristic polynomial of $L$, we see that $L$ has at least one eigenvalue, say $\lambda_1$. Then $L(V_{\lambda_1}) \subseteq V_{\lambda_1}$, and Lemma 3.20 shows that $L((V_{\lambda_1})^\perp) \subseteq (V_{\lambda_1})^\perp$. The vector subspace $(V_{\lambda_1})^\perp$ is an inner-product space, and the claim is that $L\big|_{(V_{\lambda_1})^\perp}$ is self-adjoint. In fact, if $v_1$ and $v_2$ are in $(V_{\lambda_1})^\perp$, then
\[ \big( (L\big|_{(V_{\lambda_1})^\perp})^*(v_1), v_2 \big) = \big( v_1, L\big|_{(V_{\lambda_1})^\perp}(v_2) \big) = (v_1, L(v_2)) = (L(v_1), v_2) = \big( L\big|_{(V_{\lambda_1})^\perp}(v_1), v_2 \big), \]
and the claim is proved. Since $\lambda_1$ is an eigenvalue of $L$, $\dim (V_{\lambda_1})^\perp < \dim V$. Therefore we can now set up an induction that ultimately exhibits $V$ as an orthogonal direct sum $V = V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_k}$. If $v$ is an eigenvector of $L$ with eigenvalue $\lambda'$, then either $\lambda' = \lambda_j$ for some $j$ in this decomposition, in which case $v$ is in $V_{\lambda_j}$, or $\lambda'$ is not equal to any $\lambda_j$, in which case $v$, by the lemma, is orthogonal to all vectors in $V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_k}$, hence to all vectors in $V$; being orthogonal to all vectors in $V$, $v$ must be $0$. Choosing an orthonormal basis for each $V_{\lambda_j}$ and taking their union provides an orthonormal basis of eigenvectors and completes the proof for $L$ when $\mathbb{F} = \mathbb{C}$.

Next assume that $A$ is a Hermitian $n$-by-$n$ matrix. We define a linear map $L : \mathbb{C}^n \to \mathbb{C}^n$ by $L(v) = Av$, and we know from Proposition 3.15 that $L$ is self-adjoint. The case just proved shows that $L$ has an ordered orthonormal basis $\Gamma$ of eigenvectors, all the eigenvalues being real. If $\Sigma$ denotes the standard ordered basis of $\mathbb{C}^n$, then $D = \begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix}$ is diagonal with real entries and is equal to
\[ \begin{pmatrix} I \\ \Gamma\Sigma \end{pmatrix} \begin{pmatrix} L \\ \Sigma\Sigma \end{pmatrix} \begin{pmatrix} I \\ \Sigma\Gamma \end{pmatrix} = C^{-1} A C, \]
where $C = \begin{pmatrix} I \\ \Sigma\Gamma \end{pmatrix}$. The matrix $C$ is unitary by Proposition 3.19, and the formula $D = C^{-1} A C$ shows that $A$ is as asserted.

Now let us return to $L$ and suppose that $\mathbb{F} = \mathbb{R}$. The idea is to use the same argument as above in the case that $\mathbb{F} = \mathbb{C}$, but we need a substitute for


the use of the Fundamental Theorem of Algebra. Fixing any orthonormal basis of $V$, let $A$ be the matrix of $L$. Then $A$ is Hermitian with real entries. The previous paragraph shows that any Hermitian matrix, whether or not real, has a characteristic polynomial that splits as a product $\prod_{j=1}^{m} (\lambda - r_j)^{m_j}$ with all $r_j$ real. Consequently $L$ has this property as well. Thus any self-adjoint $L$ when $\mathbb{F} = \mathbb{R}$ has an eigenvalue. Returning to the argument for $L$ above when $\mathbb{F} = \mathbb{C}$, we readily see that it now applies when $\mathbb{F} = \mathbb{R}$.

Finally if $A$ is a Hermitian matrix with real entries, then we can define a self-adjoint linear map $L : \mathbb{R}^n \to \mathbb{R}^n$ by $L(v) = Av$, obtain an orthonormal basis of eigenvectors for $L$, and argue as above to obtain $D = C^{-1} A C$, where $D$ is diagonal and $C$ is unitary. The matrix $C$ has columns that are eigenvectors in $\mathbb{R}^n$ of the associated $L$, and these have real entries. Thus $C$ is orthogonal.

An important application of the Spectral Theorem is to the formation of a square root for any “positive semidefinite” linear map. We say that a linear map $L : V \to V$ on a finite-dimensional inner-product space is positive semidefinite if $L^* = L$ and $(L(v), v) \ge 0$ for all $v$ in $V$. If $\mathbb{F} = \mathbb{C}$, then the condition $L^* = L$ is redundant, according to Proposition 3.17, but that fact will not be important for us. Similarly an $n$-by-$n$ matrix $A$ is positive semidefinite if $A^* = A$ and $\bar{x}^t A x \ge 0$ for all column vectors $x$.

An example of a positive semidefinite $n$-by-$n$ matrix is any matrix $A = B^* B$, where $B$ is an arbitrary $k$-by-$n$ matrix. In fact, if $x$ is in $\mathbb{F}^n$, then $\bar{x}^t B^* B x = (\overline{Bx})^t (Bx)$, and the right side is $\ge 0$, being a sum of absolute values squared.

Corollary 3.22. Let $L : V \to V$ be a self-adjoint linear map on a finite-dimensional inner-product space, and let $A$ be an $n$-by-$n$ Hermitian matrix. Then
(a) $L$ or $A$ is positive semidefinite if and only if all of its eigenvalues are $\ge 0$.
(b) whenever $L$ or $A$ is positive semidefinite, $L$ or $A$ is invertible if and only if $(L(v), v) > 0$ for all $v \neq 0$ or $\bar{x}^t A x > 0$ for all $x \neq 0$.
(c) whenever $L$ or $A$ is positive semidefinite, $L$ or $A$ has a unique positive semidefinite square root.

REMARKS. A positive semidefinite linear map or matrix satisfying the condition in (b) is said to be positive definite, and the content of (b) is that a positive semidefinite linear map or matrix is positive definite if and only if it is invertible.

PROOF. We apply the Spectral Theorem (Theorem 3.21). For each conclusion the result for a matrix $A$ is a special case of the result for the linear map $L$, and it is enough to treat only $L$.

In (a), let $(u_1, \dots, u_n)$ be an ordered orthonormal basis of eigenvectors with respective eigenvalues $\lambda_1, \dots, \lambda_n$, not necessarily distinct. Then $(L(u_j), u_j) = \lambda_j$ shows the necessity of having $\lambda_j \ge 0$, while the computation
\[ (L(v), v) = \Big( L\Big(\sum_i (v, u_i) u_i\Big),\ \sum_j (v, u_j) u_j \Big) = \Big( \sum_i \lambda_i (v, u_i) u_i,\ \sum_j (v, u_j) u_j \Big) = \sum_i \lambda_i |(v, u_i)|^2 \]
shows the sufficiency.

In (b), if $L$ fails to be invertible, then $0$ is an eigenvalue for some eigenvector $v \neq 0$, and $v$ has $(L(v), v) = 0$. Conversely if $L$ is invertible, then all the eigenvalues $\lambda_i$ are $> 0$ by (a), and the computation in (a) yields
\[ (L(v), v) = \sum_i \lambda_i |(v, u_i)|^2 \ge \Big(\min_j \lambda_j\Big) \sum_i |(v, u_i)|^2 = \Big(\min_j \lambda_j\Big) \|v\|^2, \]
the last step following from Parseval’s equality (Theorem 3.11).

For existence in (c), the Spectral Theorem says that there exists an ordered orthonormal basis $\Gamma = (u_1, \dots, u_n)$ of eigenvectors of $L$, say with respective eigenvalues $\lambda_1, \dots, \lambda_n$. The eigenvalues are all $\ge 0$ by (a). The linear extension of the function $P$ with $P(u_j) = \lambda_j^{1/2} u_j$ is given by
\[ P(v) = \sum_{j=1}^{n} \lambda_j^{1/2} (v, u_j) u_j, \]
and it has
\[ P^2(v) = \sum_j \lambda_j (v, u_j) u_j = \sum_j (v, u_j) L(u_j) = L\Big( \sum_j (v, u_j) u_j \Big) = L(v). \]
Thus $P^2 = L$. Relative to $\Gamma$, we have
\[ \begin{pmatrix} P \\ \Gamma\Gamma \end{pmatrix}_{ij} = \begin{pmatrix} P(u_j) \\ \Gamma \end{pmatrix}_i = \begin{pmatrix} (P(u_j), u_1) u_1 + \cdots + (P(u_j), u_n) u_n \\ \Gamma \end{pmatrix}_i = (P(u_j), u_i) = \lambda_j^{1/2} \delta_{ij}, \]
and this is a Hermitian matrix; Proposition 3.15 therefore shows that $P^* = P$. Finally
\[ (P(v), v) = \Big( \sum_i \lambda_i^{1/2} (v, u_i) u_i,\ \sum_j (v, u_j) u_j \Big) = \sum_i \lambda_i^{1/2} |(v, u_i)|^2 \ge 0, \]
and thus $P$ is positive semidefinite. This proves existence.

For uniqueness in (c), let $P$ satisfy $P^* = P$ and $P^2 = L$, and suppose $P$ is positive semidefinite. Choose an orthonormal basis of eigenvectors $u_1, \dots, u_n$ of $P$, say with eigenvalues $c_1, \dots, c_n$, all $\ge 0$. Then $L(u_j) = P^2(u_j) = c_j^2 u_j$, and we see that $u_1, \dots, u_n$ form an orthonormal basis of eigenvectors of $L$ with eigenvalues $c_j^2$. On the space where $L$ acts as the scalar $\lambda_i$, $P$ must therefore act as the scalar $\lambda_i^{1/2}$. We conclude that $P$ is unique.
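The existence argument in (c) is constructive, and a minimal sketch of it for a real symmetric 2-by-2 matrix $\begin{pmatrix} a & b \\ b & d \end{pmatrix}$ might look as follows. The sketch assumes the matrix is positive semidefinite, finds an orthonormal basis of eigenvectors from the quadratic formula, and forms $P(v) = \sum_j \lambda_j^{1/2} (v, u_j) u_j$. The function name `sqrt_psd_2x2` and the test matrix are hypothetical.

```python
import math

def sqrt_psd_2x2(a, b, d):
    """Positive semidefinite square root of [[a, b], [b, d]] (assumed PSD)."""
    if b == 0:
        # Already diagonal: the standard basis consists of eigenvectors.
        pairs = [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    else:
        mean = (a + d) / 2
        radius = math.hypot((a - d) / 2, b)
        pairs = []
        for lam in (mean + radius, mean - radius):
            x, y = b, lam - a              # eigenvector for eigenvalue lam
            norm = math.hypot(x, y)        # nonzero because b != 0
            pairs.append((lam, (x / norm, y / norm)))
    P = [[0.0, 0.0], [0.0, 0.0]]
    for lam, u in pairs:
        root = math.sqrt(max(lam, 0.0))    # clamp tiny negative rounding errors
        for i in range(2):
            for j in range(2):
                P[i][j] += root * u[i] * u[j]
    return P

P = sqrt_psd_2x2(5.0, 4.0, 5.0)   # eigenvalues 9 and 1
P_squared = [[sum(P[i][k] * P[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
```

For this input the eigenvalues are 9 and 1, and `P` is approximately $\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, whose square is the original matrix.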


The technique of proof of (c) allows one, more generally, to define $f(L)$ for any function $f : \mathbb{R} \to \mathbb{C}$ whenever $L$ is self-adjoint. Actually, the function $f$ needs to be defined only on the set of eigenvalues of $L$ for the definition to make sense.

At the end of this section, we shall use the existence of the square root in (c) to obtain the so-called “polar decomposition” of square matrices. But before doing that, let us mine three additional easy consequences of the Spectral Theorem. The first deals with several self-adjoint linear maps rather than one, and the other two apply that conclusion to deal with single linear maps that are not necessarily self-adjoint.

Corollary 3.23. Let $V$ be a finite-dimensional inner-product space, and let $L_1, \dots, L_m$ be self-adjoint linear maps from $V$ to $V$ that commute in the sense that $L_i L_j = L_j L_i$ for all $i$ and $j$. Then $V$ has an orthonormal basis of simultaneous eigenvectors of $L_1, \dots, L_m$. In addition, for each $m$-tuple of scalars $\lambda_1, \dots, \lambda_m$, let
\[ V_{\lambda_1, \dots, \lambda_m} = \{ v \in V \mid L_j(v) = \lambda_j v \text{ for } 1 \le j \le m \} \]
consist of $0$ and the simultaneous eigenvectors of $L_1, \dots, L_m$ corresponding to $\lambda_1, \dots, \lambda_m$. Then all the eigenvalues $\lambda_j$ are real, the vector subspaces $V_{\lambda_1, \dots, \lambda_m}$ are mutually orthogonal, and any orthonormal basis of $V$ of simultaneous eigenvectors of $L_1, \dots, L_m$ is the union of orthonormal bases of the $V_{\lambda_1, \dots, \lambda_m}$’s. Correspondingly if $A_1, \dots, A_m$ are commuting Hermitian $n$-by-$n$ matrices, then there exists a unitary matrix $C$ such that $C^{-1} A_j C$ is diagonal with real entries for all $j$. If all the matrices $A_j$ have real entries, then $C$ may be taken to be an orthogonal matrix.

PROOF. This follows by iterating the Spectral Theorem (Theorem 3.21). In fact, let $\{V_{\lambda_1}\}$ be the system of vector subspaces produced by the theorem for $L_1$. For each $i$, the commutativity of the linear maps forces
\[ L_1(L_i(v)) = L_i(L_1(v)) = L_i(\lambda_1 v) = \lambda_1 L_i(v) \qquad \text{for } v \in V_{\lambda_1}, \]
and thus $L_i(V_{\lambda_1}) \subseteq V_{\lambda_1}$. The restrictions of $L_1, \dots, L_m$ to $V_{\lambda_1}$ are self-adjoint and commute. Let $\{V_{\lambda_1, \lambda_2}\}$ be the system of vector subspaces produced by the Spectral Theorem for $L_2\big|_{V_{\lambda_1}}$. Each of these, by the commutativity, is carried into itself by $L_3, \dots, L_m$, and the restrictions of $L_3, \dots, L_m$ to $V_{\lambda_1, \lambda_2}$ form a commuting family of self-adjoint linear maps. Continuing in this way, we arrive at the decomposition asserted by the corollary for $L_1, \dots, L_m$. The assertion of the corollary about commuting Hermitian matrices is a special case, in the same way that the assertions in Theorem 3.21 about matrices were special cases of the assertions about linear maps.


A linear map $L : V \to V$, not necessarily self-adjoint, is said to be normal if $L$ commutes with its adjoint: $L L^* = L^* L$.

Corollary 3.24. Suppose that $\mathbb{F} = \mathbb{C}$, and let $L : V \to V$ be a normal linear map on the finite-dimensional inner-product space $V$. Then $V$ has an orthonormal basis of eigenvectors of $L$. In addition, for each complex scalar $\lambda$, let
\[ V_\lambda = \{ v \in V \mid L(v) = \lambda v \}, \]
so that $V_\lambda$ when nonzero is the eigenspace of $L$ for the eigenvalue $\lambda$. Then the vector subspaces $V_\lambda$ are mutually orthogonal, and any orthonormal basis of $V$ of eigenvectors of $L$ is the union of orthonormal bases of the $V_\lambda$’s. Correspondingly if $A$ is any $n$-by-$n$ complex matrix such that $A A^* = A^* A$, then there exists a unitary matrix $C$ such that $C^{-1} A C$ is diagonal.

REMARK. The corollary fails if $\mathbb{F} = \mathbb{R}$: for the linear map $L : \mathbb{R}^2 \to \mathbb{R}^2$ with $L(v) = Av$ and $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, $L^* = L^{-1}$ commutes with $L$, but $L$ has no eigenvectors in $\mathbb{R}^2$ since the characteristic polynomial $\lambda^2 + 1$ has no first-degree factors with real coefficients.

PROOF. The point is that $L = \tfrac{1}{2}(L + L^*) + i\,\tfrac{1}{2i}(L - L^*)$ and that $\tfrac{1}{2}(L + L^*)$ and $\tfrac{1}{2i}(L - L^*)$ are self-adjoint. If $L$ commutes with $L^*$, then $T_1 = \tfrac{1}{2}(L + L^*)$ and $T_2 = \tfrac{1}{2i}(L - L^*)$ commute with each other. We apply Corollary 3.23 to the commuting self-adjoint linear maps $T_1$ and $T_2$. The vector subspace $V_{\alpha, \beta}$ produced by Corollary 3.23 coincides with the vector subspace $V_{\alpha + i\beta}$ defined in the present corollary, and the result for $L$ follows. The result for matrices is a special case.

Corollary 3.25. Suppose that $\mathbb{F} = \mathbb{C}$, and let $L : V \to V$ be a unitary linear map on the finite-dimensional inner-product space $V$. Then $V$ has an orthonormal basis of eigenvectors of $L$. In addition, for each complex scalar $\lambda$, let
\[ V_\lambda = \{ v \in V \mid L(v) = \lambda v \}, \]
so that $V_\lambda$ when nonzero is the eigenspace of $L$ for the eigenvalue $\lambda$. Then the eigenvalues of $L$ all have absolute value 1, the vector subspaces $V_\lambda$ are mutually orthogonal, and any orthonormal basis of $V$ of eigenvectors of $L$ is the union of orthonormal bases of the $V_\lambda$’s. Correspondingly if $A$ is any $n$-by-$n$ unitary matrix, then there exists a unitary matrix $C$ such that $C^{-1} A C$ is diagonal; the diagonal entries of $C^{-1} A C$ all have absolute value 1.

PROOF. This is a special case of Corollary 3.24 since a unitary linear map $L$ has $L L^* = I = L^* L$. The eigenvalues all have absolute value 1 as a consequence of Proposition 3.18e.
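The 90-degree rotation from the remark after Corollary 3.24 has no real eigenvectors, but regarded as a unitary map of $\mathbb{C}^2$ it behaves as Corollary 3.25 predicts. The sketch below checks this directly, using the eigenvectors $(1, i)/\sqrt{2}$ and $(1, -i)/\sqrt{2}$ with eigenvalues $e^{i\theta}$ and $e^{-i\theta}$; these eigenpairs are written down by hand rather than computed by a general algorithm, and the helper names are hypothetical.

```python
import cmath
import math

def inner(u, v):
    """Standard inner product on C^n: linear in u, conjugate linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

theta = math.pi / 2   # the 90-degree rotation from the remark
A = [[math.cos(theta), math.sin(theta)],
     [-math.sin(theta), math.cos(theta)]]

s = 1 / math.sqrt(2)
eigenpairs = [(cmath.exp(1j * theta), [s, s * 1j]),
              (cmath.exp(-1j * theta), [s, -s * 1j])]

# Residual of A w = lam w for each claimed eigenpair.
residuals = [max(abs(x - lam * y) for x, y in zip(apply(A, w), w))
             for lam, w in eigenpairs]
moduli = [abs(lam) for lam, _ in eigenpairs]
overlap = abs(inner(eigenpairs[0][1], eigenpairs[1][1]))
```

The residuals and the overlap vanish up to rounding and both moduli equal 1, so the two vectors form an orthonormal basis of eigenvectors with eigenvalues on the unit circle.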


Now we come to the polar decomposition of linear maps and of matrices. When $\mathbb{F} = \mathbb{C}$, this is a generalization of the polar decomposition $z = e^{i\theta} r$ of complex numbers. When $\mathbb{F} = \mathbb{R}$, it generalizes the decomposition $x = (\operatorname{sgn} x)|x|$ of real numbers.

Theorem 3.26 (polar decomposition). If $L : V \to V$ is a linear map on a finite-dimensional inner-product space, then $L$ decomposes as $L = UP$, where $P$ is positive semidefinite and $U$ is orthogonal if $\mathbb{F} = \mathbb{R}$ and unitary if $\mathbb{F} = \mathbb{C}$. The linear map $P$ is unique, and $U$ is unique if $L$ is invertible. Correspondingly any $n$-by-$n$ matrix $A$ decomposes as $A = UP$, where $P$ is a positive semidefinite matrix and $U$ is an orthogonal matrix if $\mathbb{F} = \mathbb{R}$ and a unitary matrix if $\mathbb{F} = \mathbb{C}$. The matrix $P$ is unique, and $U$ is unique if $A$ is invertible.

REMARKS. As we have already seen in other situations, the motivation for the proof comes from the uniqueness.

PROOF OF UNIQUENESS. Let $L = UP = U'P'$. Then $L^* L = P^2 = P'^2$. The linear map $L^* L$ is positive semidefinite since its adjoint is $(L^* L)^* = L^* L^{**} = L^* L$ and since $(L^* L(v), v) = (L(v), L(v)) \ge 0$. Therefore Corollary 3.22c shows that $L^* L$ has a unique positive semidefinite square root. Hence $P = P'$. If $L$ is invertible, then $P$ is invertible and $L = UP$ implies that $U = L P^{-1}$. The same argument applies in the case of matrices.

PROOF OF EXISTENCE. If $L$ is given, then we have just seen that $L^* L$ is positive semidefinite. Let $P$ be its unique positive semidefinite square root. The proof is clearer when $L$ is invertible, and we consider that case first. Then we can set $U = L P^{-1}$. Since $U^* = (P^{-1})^* L^* = P^{-1} L^*$, we find that $U^* U = P^{-1} L^* L P^{-1} = P^{-1} P^2 P^{-1} = I$, and we conclude that $U$ is unitary.

When $L$ is not necessarily invertible, we argue a little differently with the positive semidefinite square root $P$ of $L^* L$. The kernel $K$ of $P$ is the 0 eigenspace of $P$, and the Spectral Theorem (Theorem 3.21) shows that the image of $P$ is the sum of all the other eigenspaces and is just $K^\perp$. Since $K \cap K^\perp = 0$, $P$ is one-one from $K^\perp$ onto itself. Thus $P(v) \mapsto L(v)$ is a one-one linear map from $K^\perp$ into $V$. Call this function $U$, so that $U(P(v)) = L(v)$. For any $v_1$ and $v_2$ in $V$, we have
\[ (L(v_1), L(v_2)) = (L^* L(v_1), v_2) = (P^2(v_1), v_2) = (P(v_1), P(v_2)), \qquad (*) \]
and hence $U : K^\perp \to V$ preserves inner products. Let $\{u_1, \dots, u_k\}$ be an orthonormal basis of $K^\perp$, and let $\{u_{k+1}, \dots, u_n\}$ be an orthonormal basis of $K$. Since $U$ preserves inner products and is linear, $\{U(u_1), \dots, U(u_k)\}$ is an orthonormal basis of $U(K^\perp)$. Extend $\{U(u_1), \dots, U(u_k)\}$ to an orthonormal basis of $V$ by adjoining vectors $v_{k+1}, \dots, v_n$, define $U(u_j) = v_j$ for $k + 1 \le j \le n$, and write $U$ also for the linear extension to all of $V$. Since $U$ carries one orthonormal basis $\{u_1, \dots, u_n\}$ of $V$ to another, $U$ is unitary. We have $UP = L$ on $K^\perp$, and equation $(*)$ with $v_1 = v_2$ shows that $\ker L = \ker P = K$. Therefore $UP = L$ everywhere.
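The invertible case of the existence proof can be carried out by hand for a real 2-by-2 matrix. The sketch below sets $P$ equal to the positive semidefinite square root of $A^t A$ and then $U = A P^{-1}$, as in the proof. For the square root it uses a closed-form shortcut special to the 2-by-2 case rather than the spectral construction of Corollary 3.22c: if $M = A^t A$, $s = \sqrt{\det M}$, and $t = \sqrt{\operatorname{Tr} M + 2s}$, then $P = (M + sI)/t$, which follows from the Cayley–Hamilton identity $P^2 - (\operatorname{Tr} P)P + (\det P)I = 0$. The function name `polar_2x2` and the test matrix are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two 2-by-2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def polar_2x2(A):
    """Return (U, P) with A = U P for an invertible real 2-by-2 matrix A."""
    M = matmul(transpose(A), A)                            # M = A^t A
    s = math.sqrt(M[0][0] * M[1][1] - M[0][1] * M[1][0])   # sqrt(det M) = det P
    t = math.sqrt(M[0][0] + M[1][1] + 2 * s)               # trace of P
    P = [[(M[0][0] + s) / t, M[0][1] / t],
         [M[1][0] / t, (M[1][1] + s) / t]]
    det_P = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    P_inv = [[P[1][1] / det_P, -P[0][1] / det_P],
             [-P[1][0] / det_P, P[0][0] / det_P]]
    U = matmul(A, P_inv)                                   # U = A P^{-1}
    return U, P

A = [[1.0, 2.0], [0.0, 1.0]]   # arbitrary invertible example
U, P = polar_2x2(A)
UtU = matmul(transpose(U), U)  # should be the identity
UP = matmul(U, P)              # should reproduce A
```

For a noninvertible matrix this shortcut for $U$ breaks down, and one would have to extend $U$ from the image of $P$ as in the second half of the proof.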

4. Problems

1. Let $V = M_{nn}(\mathbb{C})$, and define an inner product on $V$ by $\langle A, B \rangle = \operatorname{Tr}(B^* A)$. The norm $\|\cdot\|_{HS}$ obtained from this inner product is called the Hilbert–Schmidt norm of the matrix in question.
(a) Prove that $\|A\|_{HS}^2 = \sum_{i,j} |A_{ij}|^2$ for $A$ in $V$.
(b) Let $E_{ij}$ be the matrix that is 1 in the $(i, j)$th entry and is 0 elsewhere. Prove that the set of all $E_{ij}$ is an orthonormal basis of $V$.
(c) Interpret (a) in the light of (b).
(d) Prove that the Hilbert–Schmidt norm is given on any matrix $A$ in $V$ by
\[ \|A\|_{HS}^2 = \sum_j \|A u_j\|^2 = \sum_{i,j} |v_i^* A u_j|^2, \]
where $\{u_1, \dots, u_n\}$ and $\{v_1, \dots, v_n\}$ are any orthonormal bases of $\mathbb{C}^n$ and $v^*$ refers to the conjugate transpose of any member $v$ of $\mathbb{C}^n$.
(e) Let $W$ be the vector subspace of all diagonal matrices in $V$. Describe explicitly the orthogonal complement $W^\perp$, and find its dimension.

2. Let $V_n$ be the inner-product space over $\mathbb{R}$ of all polynomials on $[0, 1]$ of degree $\le n$ with real coefficients. (The 0 polynomial is to be included.) The Riesz Representation Theorem says that there is a unique polynomial $p_n$ such that $f\big(\tfrac{1}{2}\big) = \int_0^1 f(x) p_n(x)\, dx$ for all $f$ in $V_n$. Set up a system of linear equations whose solution tells what $p_n$ is.

3. Let $V$ be a finite-dimensional inner-product space, and suppose that $L$ and $M$ are self-adjoint linear maps from $V$ to $V$. Show that $LM$ is self-adjoint if and only if $LM = ML$.

4. Let $V$ be a finite-dimensional inner-product space. If $L : V \to V$ is a linear map with adjoint $L^*$, prove that $\ker L = (\operatorname{image} L^*)^\perp$.

5. Find all 2-by-2 Hermitian matrices $A$ with characteristic polynomial $\lambda^2 + 4\lambda + 6$.

6. Let $V_1$ and $V_2$ be finite-dimensional inner-product spaces over the same $\mathbb{F}$, the inner products being $(\,\cdot\,,\,\cdot\,)_1$ and $(\,\cdot\,,\,\cdot\,)_2$.
(a) Using the case when $V_1 = V_2$ as a model, define the adjoint of a linear map $L : V_1 \to V_2$, proving its existence. The adjoint is to be a linear map $L^* : V_2 \to V_1$.


(b) If is an orthonormal basis of V1 and is an orthonormal basis of V2 , prove that the matrices of L and L ∗ in these bases are conjugate transposes of one another. 7.

Suppose that a ﬁnite-dimensional inner-product space V is a direct sum V = S ⊕ T of vector subspaces. Let E : V → V be the linear map that is the identity on S and is 0 on T . (a) Prove that V = S ⊥ ⊕ T ⊥ . (b) Prove that E ∗ : V → V is the linear map that is the identity on T ⊥ and is 0 on S ⊥ .

8.

(Iwasawa decomposition) Let g be an invertible n-by-n complex matrix. Apply the Gram–Schmidt orthogonalization process to the basis {ge1 , . . . , gen }, where {e1 , . . . , en } is the standard basis, and let the resulting orthonormal basis be {v1 , . . . , vn }. Deﬁne an invertible n-by-n matrix k such that k −1 v j = e j for 1 ≤ j ≤ n. Prove that k −1 g is upper triangular with positive diagonal entries, and conclude that g = k(k −1 g) exhibits g as the product of a unitary matrix and an upper triangular matrix whose diagonal entries are positive.

9.

Let A be an n-by-n positive deﬁnite matrix. (a) Prove that det A > 0. (b) Prove for any subset of integers 1 ≤ i 1 < i 2 < · · · < i k ≤ n that the submatrix of A built from rows and columns indexed by (i 1 , . . . , i k ) is positive deﬁnite.

10. Prove that if $A$ is a positive definite $n$-by-$n$ matrix, then there exists an $n$-by-$n$ upper-triangular matrix $B$ with positive diagonal entries such that $A = B^*B$.

11. The most general 2-by-2 Hermitian matrix is of the form $A = \begin{pmatrix} a & b \\ \bar{b} & d \end{pmatrix}$ with $a$ and $d$ real and with $b$ complex. Find a diagonal matrix $D$ and a unitary matrix $U$ such that $D = U^{-1}AU$.

12. In the previous problem, (a) what conditions on $A$ make $A$ positive definite? (b) when $A$ is positive definite, how can its positive definite square root be computed explicitly?

13. Prove that if an $n$-by-$n$ real symmetric matrix $A$ has $v^t A v = 0$ for all $v$ in $\mathbb{R}^n$, then $A = 0$.

14. Let $L : \mathbb{C}^n \to \mathbb{C}^n$ be a self-adjoint linear map. Show for each $x \in \mathbb{C}^n$ that there is some $y \in \mathbb{C}^n$ such that $(I - L)^2(y) = (I - L)(x)$.

15. In the polar decomposition $L = UP$, prove that if $P$ and $U$ commute, then $L$ is normal.

16. Let $V$ be an $n$-dimensional inner-product space over $\mathbb{R}$. What is the largest possible dimension of a commuting family of self-adjoint linear maps $L : V \to V$?

4. Problems


17. Let $v_1, \dots, v_n$ be an ordered list of vectors in an inner-product space. The associated Gram matrix is the Hermitian matrix of inner products given by $G(v_1, \dots, v_n) = [(v_i, v_j)]$, and $\det G(v_1, \dots, v_n)$ is called its Gram determinant.
(a) If $c_1, \dots, c_n$ are in $\mathbb{C}$, let $c = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}$. Prove that $c^t\, G(v_1, \dots, v_n)\, \bar{c} = \| c_1 v_1 + \cdots + c_n v_n \|^2$, and conclude that $G(v_1, \dots, v_n)$ is positive semidefinite.
(b) Prove that $\det G(v_1, \dots, v_n) \ge 0$ with equality if and only if $v_1, \dots, v_n$ are linearly dependent. (This generalizes the Schwarz inequality.)
(c) Under what circumstances does equality hold in the Schwarz inequality?

Problems 18–23 introduce the Legendre polynomials and establish some of their elementary properties, including their orthogonality under the inner product $\langle P, Q \rangle = \int_{-1}^{1} P(x)Q(x)\,dx$. They form the simplest family of classical orthogonal polynomials. They are uniquely determined by the conditions that the $n$th one $P_n$, for $n \ge 0$, is of degree $n$, they are orthogonal under $\langle\,\cdot\,,\,\cdot\,\rangle$, and they are normalized so that $P_n(1) = 1$. But these conditions are a little hard to work with initially, and instead we adopt the recursive definition $P_0(x) = 1$, $P_1(x) = x$, and

$$(n + 1)P_{n+1}(x) = (2n + 1)\,x\,P_n(x) - n\,P_{n-1}(x) \qquad \text{for } n \ge 1.$$
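The recursive definition is easy to experiment with before attempting the proofs. The following is a minimal sketch, not part of the original problem set: it builds the first few Legendre polynomials from the recursion using exact rational arithmetic and lets one check numerically the normalization at 1 and the orthogonality that Problems 18–21 ask to be proved. The names `legendre_polys` and `inner` and the coefficient-list representation are choices made only for this illustration.

```python
from fractions import Fraction

def legendre_polys(n_max):
    """Return [P_0, ..., P_{n_max}] as coefficient lists, lowest degree first."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, n_max):
        x_times_pn = [Fraction(0)] + polys[n]            # coefficients of x * P_n
        padded_prev = polys[n - 1] + [Fraction(0)] * 2   # P_{n-1}, padded to the same length
        # (n + 1) P_{n+1} = (2n + 1) x P_n - n P_{n-1}
        polys.append([((2 * n + 1) * a - n * b) / (n + 1)
                      for a, b in zip(x_times_pn, padded_prev)])
    return polys[: n_max + 1]

def inner(p, q):
    """Exact value of the integral of p(x) q(x) over [-1, 1]."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:                 # odd powers of x integrate to 0
                total += a * b * Fraction(2, i + j + 1)
    return total

P = legendre_polys(6)
print(P[2])                 # coefficients of (3x^2 - 1)/2
print(inner(P[2], P[4]))    # 0, as orthogonality predicts
```

Evaluating a polynomial at 1 amounts to summing its coefficients, so `sum(P[n])` gives $P_n(1)$. A check of this kind is evidence only for the small cases it covers; it does not replace the inductive arguments requested in the problems.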

18. (a) Prove that $P_n(x)$ has degree $n$, that $P_n(-x) = (-1)^n P_n(x)$, and that $P_n(1) = 1$. In particular, $P_n$ is an even function if $n$ is even and is an odd function if $n$ is odd.
(b) Let $c(n)$ be the constant term of $P_n$ if $n$ is even and the coefficient of $x$ if $n$ is odd, so that $c(0) = c(1) = 1$. Prove that $c(n) = -\frac{n-1}{n}\,c(n-2)$ for $n \ge 2$.

19. This part establishes a useful concrete formula for $P_n(x)$. Let $D = d/dx$ and $X = x^2 - 1$, writing $X' = 2x$, $X'' = 2$, and $X''' = 0$ for the derivatives. Two parts of this problem make use of the Leibniz rule $D^n(fg) = \sum_{k=0}^{n} \binom{n}{k} (D^{n-k} f)(D^k g)$ for higher-order derivatives of a product.
(a) Verify that $D^2(X^{n+1}) = (2n + 1)D(X^n X') - n(2n + 1)X'' X^n - 4n^2 X^{n-1}$.
(b) By applying $D^{n-1}$ to the result of (a) and rearranging terms, show that $D^{n+1}(X^{n+1}) = (2n + 1)X' D^n(X^n) - 4n^2 D^{n-1}(X^{n-1})$.
(c) Put $R_n(x) = (2^n n!)^{-1} D^n(X^n)$ for $n \ge 0$. Show that $R_0(x) = 1$, $R_1(x) = x$, and $(n + 1)R_{n+1}(x) = (2n + 1)\,x\,R_n(x) - n\,R_{n-1}(x)$ for $n \ge 1$.
(d) (Rodrigues's formula) Conclude that $2^n n!\,P_n(x) = \left(\frac{d}{dx}\right)^{n} [(x^2 - 1)^n]$.

20. Using Rodrigues's formula and iterated integration by parts, prove that

$$\int_{-1}^{1} P_m(x)P_n(x)\,dx = 0 \qquad \text{for } m < n.$$

Conclude that $\{P_0, P_1, \dots, P_n\}$ is an orthogonal basis of the inner-product space of polynomials on $[-1, 1]$ of degree $\le n$ with inner product $\langle\,\cdot\,,\,\cdot\,\rangle$.


21. Arguing as in the previous problem and taking for granted that $\int_{-1}^{1} (1 - x^2)^n\,dx = \frac{2(2^n n!)^2}{(2n+1)!}$, prove that $\langle P_n, P_n \rangle = \left(n + \tfrac{1}{2}\right)^{-1}$.

22. This problem shows that $P_n(x)$ satisfies a certain second-order differential equation. Let $D = d/dx$. The first two parts of this problem use the Leibniz rule quoted in Problem 19. Let $X = x^2 - 1$ and $K_n = 2^n n!$, so that Rodrigues's formula says that $K_n P_n = D^n(X^n)$.
(a) Expand $D^{n+1}[(D(X^n))X]$ by the Leibniz rule.
(b) Observe that $(D(X^n))X = nX^n X'$, and expand $D^{n+1}[(nX^n)X']$ by the Leibniz rule.
(c) Equating the results of the previous two parts, conclude that $y = P_n(x)$ satisfies the differential equation $(1 - x^2)y'' - 2xy' + n(n + 1)y = 0$.

23. Let $P_n(x) = \sum_{k=0}^{n} c_k x^k$. Using the differential equation, show that the coefficients $c_k$ satisfy $k(k - 1)c_k = [(k - 2)(k - 1) - n(n + 1)]\,c_{k-2}$ for $k \ge 2$ and that $c_k = 0$ unless $n - k$ is even.

Problems 24–28 concern the complex conjugate of an inner-product space over $\mathbb{C}$. For any finite-dimensional inner-product space $V$, the Riesz Representation Theorem identifies the dual $V'$ with $V$, saying that each member of $V'$ is given by taking the inner product with some member of $V$. When the scalars are real, this identification is linear; thus the Riesz theorem uses the inner product to construct a canonical isomorphism of $V'$ onto $V$. When the scalars are complex, the identification is conjugate linear, and we do not get an isomorphism of $V'$ with $V$. The complex conjugate of $V$ provides a substitute result.

24. Let $V$ be a finite-dimensional vector space over $\mathbb{C}$. Define a new complex vector space $\overline{V}$ as follows: The elements of $\overline{V}$ are the elements of $V$, and the definition of addition is unchanged. However, there is a change in the definition of scalar multiplication, in that if $v$ is in $V$, then the product $cv$ in $\overline{V}$ is to equal the product $\bar{c}v$ in $V$. Verify that $\overline{V}$ is indeed a complex vector space.

25. If $V$ is a complex vector space and $L : V \to V$ is a linear map, define $\overline{L} : \overline{V} \to \overline{V}$ to be the same function as $L$. Prove that $\overline{L}$ is linear.

26. Suppose that the complex vector space $V$ is actually a finite-dimensional inner-product space, with inner product $(\,\cdot\,,\,\cdot\,)_V$. Define $(u, v)_{\overline{V}} = (v, u)_V$. Verify that $\overline{V}$ is an inner-product space.

27. With $V$ as in the previous problem, show that the Riesz Representation Theorem uses the inner product to set up a canonical isomorphism of $V'$ with $\overline{V}$.

28. With $V$ and $\overline{V}$ as in the two previous problems, let $L : V \to V$ be linear, so that $(\overline{L})^* : \overline{V} \to \overline{V}$ is linear. Under the identification of the previous problem of $V'$ with $\overline{V}$, show that $(\overline{L})^*$ corresponds to the contragredient $L^t$ as defined in Section II.4.


Problems 29–32 use inner-product spaces to obtain a decomposition of polynomials in several variables. A real-valued polynomial function $p$ in $x_1, \dots, x_n$ is said to be homogeneous of degree $N$ if every monomial in $p$ has total degree $N$. Let $V_N$ be the space of real-valued polynomials in $x_1, \dots, x_n$ homogeneous of degree $N$. For any homogeneous polynomial $p$, we define a differential operator $\partial(p)$ with constant coefficients by requiring that $\partial(\,\cdot\,)$ be linear in $(\,\cdot\,)$ and that

$$\partial(x_1^{k_1} \cdots x_n^{k_n}) = \frac{\partial^{\,k_1 + \cdots + k_n}}{\partial x_1^{k_1} \cdots \partial x_n^{k_n}}.$$

For example, if $|x|^2$ stands for $x_1^2 + \cdots + x_n^2$, then

$$\partial(|x|^2) = \frac{\partial^2}{\partial x_1^2} + \cdots + \frac{\partial^2}{\partial x_n^2}.$$

If $p$ and $q$ are in the same $V_N$, then $\partial(q)p$ is a constant polynomial, and we define $\langle p, q \rangle$ to be that constant. Then $\langle\,\cdot\,,\,\cdot\,\rangle$ is bilinear.

29. (a) Prove that $\langle\,\cdot\,,\,\cdot\,\rangle$ satisfies $\langle p, q \rangle = \langle q, p \rangle$.
(b) Prove that $\langle x_1^{k_1} \cdots x_n^{k_n},\, x_1^{l_1} \cdots x_n^{l_n} \rangle$ is positive if $(k_1, \dots, k_n) = (l_1, \dots, l_n)$ and is 0 otherwise.
(c) Deduce that $\langle\,\cdot\,,\,\cdot\,\rangle$ is an inner product on $V_N$.

30. Call $p \in V_N$ harmonic if $\partial(|x|^2)p = 0$, and let $H_N$ be the vector subspace of harmonic polynomials. Prove that the orthogonal complement of $|x|^2 V_{N-2}$ in $V_N$ relative to $\langle\,\cdot\,,\,\cdot\,\rangle$ is $H_N$.

31. Deduce from Problem 30 that each $p \in V_N$ decomposes uniquely as $p = h_N + |x|^2 h_{N-2} + |x|^4 h_{N-4} + \cdots$ with $h_N, h_{N-2}, h_{N-4}, \dots$ homogeneous harmonic of the indicated degrees.

32. For $n = 2$, describe a computational procedure for decomposing the element $x_1^4 + x_2^4$ of $V_4$ as in Problem 31.

Problems 33–34 concern products of $n$-by-$n$ positive semidefinite matrices. They make use of Problem 26 in Chapter II, which says that $\det(\lambda I - CD) = \det(\lambda I - DC)$.

33. Let $A$ and $B$ be positive semidefinite. Using the positive definite square root of $B$, prove that every eigenvalue of $AB$ is $\ge 0$.

34. Let $A$, $B$, and $C$ be positive semidefinite, and suppose that $ABC$ is Hermitian. Under the assumption that $C$ is invertible, introduce the positive definite square root $P$ of $C$. By considering $P^{-1}ABCP^{-1}$, prove that $ABC$ is positive semidefinite.

CHAPTER IV Groups and Group Actions

Abstract. This chapter develops the basics of group theory, with particular attention to the role of group actions of various kinds. The emphasis is on groups in Sections 1–3 and on group actions starting in Section 6. In between is a two-section digression that introduces rings, ﬁelds, vector spaces over general ﬁelds, and polynomial rings over commutative rings with identity. Section 1 introduces groups and a number of examples, and it establishes some easy results. Most of the examples arise either from number-theoretic settings or from geometric situations in which some auxiliary space plays a role. The direct product of two groups is discussed brieﬂy so that it can be used in a table of some groups of low order. Section 2 deﬁnes coset spaces, normal subgroups, homomorphisms, quotient groups, and quotient mappings. Lagrange’s Theorem is a simple but key result. Another simple but key result is the construction of a homomorphism with domain a quotient group G/H when a given homomorphism is trivial on H . The section concludes with two standard isomorphism theorems. Section 3 introduces general direct products of groups and direct sums of abelian groups, together with their concrete “external” versions and their universal mapping properties. Sections 4–5 are a digression to deﬁne rings, ﬁelds, and ring homomorphisms, and to extend the theories concerning polynomials and vector spaces as presented in Chapters I–II. The immediate purpose of the digression is to make prime ﬁelds and the notion of characteristic available for the remainder of the chapter. The deﬁnitions of polynomials are extended to allow coefﬁcients from any commutative ring with identity and to allow more than one indeterminate, and universal mapping properties for polynomial rings are proved. Sections 6–7 introduce group actions. 
Section 6 gives some geometric examples beyond those in Section 1, it establishes a counting formula concerning orbits and isotropy subgroups, and it develops some structure theory of groups by examining specific group actions on the group and its coset spaces. Section 7 uses a group action by automorphisms to define the semidirect product of two groups. This construction, in combination with results from Sections 5–6, allows one to form several new finite groups of interest. Section 8 defines simple groups, proves that alternating groups on five or more letters are simple, and then establishes the Jordan–Hölder Theorem concerning the consecutive quotients that arise from composition series. Section 9 deals with finitely generated abelian groups. It is proved that "rank" is well defined for any finitely generated free abelian group, that a subgroup of a free abelian group of finite rank is always free abelian, and that any finitely generated abelian group is the direct sum of cyclic groups. Section 10 returns to structure theory for finite groups. It begins with the Sylow Theorems, which produce subgroups of prime-power order, and it gives two sample applications. One of these classifies the groups of order $pq$, where $p$ and $q$ are distinct primes, and the other provides the information necessary to classify the groups of order 12. Section 11 introduces the language of "categories" and "functors." The notion of category is a precise version of what is sometimes called a "context" at points in the book before this section, and some of the "constructions" in the book are examples of "functors." The section treats in this language the notions of "product" and "coproduct," which are abstractions of "direct product" and "direct sum."

1. Groups and Subgroups

Linear algebra and group theory are two foundational subjects for all of algebra, indeed for much of mathematics. Chapters II and III have introduced the basics of linear algebra, and the present chapter introduces the basics of group theory. In this section we give the definition and notation for groups and provide examples that fit with the historical development of the notion of group. Many readers will already be familiar with some group theory, and therefore we can be brief at the start.

A group is a nonempty set $G$ with an operation $G \times G \to G$ satisfying the three properties (i), (ii), and (iii) below. In the absence of any other information the operation is usually called multiplication and is written $(a, b) \mapsto ab$ with no symbol to indicate the multiplication. The defining properties of a group are
(i) $(ab)c = a(bc)$ for all $a, b, c$ in $G$ (associative law),
(ii) there exists an element $1$ in $G$ such that $a1 = 1a = a$ for all $a$ in $G$ (existence of identity),
(iii) for each $a$ in $G$, there exists an element $a^{-1}$ in $G$ with $aa^{-1} = a^{-1}a = 1$ (existence of inverses).
It is immediate from these properties that
• $1$ is unique (since $1' = 1'1 = 1$),
• $a^{-1}$ is unique (since $(a^{-1})' = (a^{-1})'1 = (a^{-1})'(a(a^{-1})) = ((a^{-1})'a)(a^{-1}) = 1(a^{-1}) = (a^{-1})$),
• the existence of a left inverse for each element implies the existence of a right inverse for each element (since $ba = 1$ and $cb = 1$ together imply $c = c(ba) = (cb)a = a$ and hence also $ab = cb = 1$),
• $1$ is its own inverse (since $11 = 1$),
• $ax = ay$ implies $x = y$, and $xa = ya$ implies $x = y$ (cancellation laws) (since $x = 1x = (a^{-1}a)x = a^{-1}(ax) = a^{-1}(ay) = (a^{-1}a)y = 1y = y$ and since a similar argument proves the second implication).
Problem 2 at the end of Chapter II shows that the associative law extends to products of any ﬁnite number of elements of G as follows: parentheses can be inserted in any fashion in such a product, and the value of the product is unchanged; hence any expression a1 a2 · · · an in G is well deﬁned without the use of parentheses. The group whose only element is the identity 1 will be denoted by {1}. It is called the trivial group.
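For a finite set with an explicitly given operation, the defining properties (i)–(iii) can be checked by brute force. The sketch below is an illustration only and is not part of the original text; the function name `is_group` and the encoding of the operation as a Python function of two arguments are assumptions made for the example.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of closure and of properties (i)-(iii) on a finite set."""
    elements = list(elements)
    if not elements:
        return False                      # a group is nonempty
    # The operation must map G x G into G.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # (i) associative law
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # (ii) existence of identity
    identities = [e for e in elements
                  if all(op(e, a) == a == op(a, e) for a in elements)]
    if not identities:
        return False
    e = identities[0]
    # (iii) existence of inverses
    return all(any(op(a, b) == e == op(b, a) for b in elements) for a in elements)

print(is_group(range(4), lambda a, b: (a + b) % 4))   # True: integers mod 4 under addition
print(is_group(range(4), lambda a, b: (a * b) % 4))   # False: 0 has no multiplicative inverse
```

The check takes on the order of $|G|^3$ operations because of the associative law, so it is useful only for small examples.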


We come to other examples in a moment. First we make three more definitions and offer some comments.

A subgroup $H$ of a group $G$ is a subset containing the identity that is closed under multiplication and inverses. Then $H$ itself is a group because the associativity in $G$ implies associativity in $H$. The intersection of any nonempty collection of subgroups of $G$ is again a subgroup.

An isomorphism of a group $G_1$ with a group $G_2$ is a function $\varphi : G_1 \to G_2$ that is one-one onto and satisfies $\varphi(ab) = \varphi(a)\varphi(b)$ for all $a$ and $b$ in $G_1$. It is immediate that
• $\varphi(1) = 1$ (by taking $a = b = 1$),
• $\varphi(a^{-1}) = \varphi(a)^{-1}$ (by taking $b = a^{-1}$),
• $\varphi^{-1} : G_2 \to G_1$ satisfies $\varphi^{-1}(cd) = \varphi^{-1}(c)\varphi^{-1}(d)$ (by taking $c = \varphi(a)$ and $d = \varphi(b)$ on the right side and then observing that $\varphi\big(\varphi^{-1}(c)\varphi^{-1}(d)\big) = \varphi(ab) = \varphi(a)\varphi(b) = cd = \varphi(\varphi^{-1}(cd))$).
The first and second of these properties show that an isomorphism respects all the structure of a group, not just products. The third property shows that the inverse of an isomorphism is an isomorphism, hence that the relation "is isomorphic to" is symmetric. Since the identity isomorphism exhibits this relation as reflexive and since the use of compositions shows that it is transitive, we see that "is isomorphic to" is an equivalence relation. Common notation for an isomorphism between $G_1$ and $G_2$ is $G_1 \cong G_2$; because of the symmetry, one can say that $G_1$ and $G_2$ are isomorphic.

An abelian group is a group $G$ with the additional property
(iv) $ab = ba$ for all $a$ and $b$ in $G$ (commutative law).
In an abelian group the operation is sometimes, but by no means always, called addition instead of "multiplication." Addition is typically written $(a, b) \mapsto a + b$, and then the identity is usually denoted by $0$ and the inverse of $a$ is denoted by $-a$, the negative of $a$. Depending on circumstances, the trivial abelian group may be denoted by $\{0\}$ or $0$.
Problem 3 at the end of Chapter II shows for an abelian group G with its operation written additively that n-fold sums of elements of G can be written in any order: a1 + a2 + · · · + an = aσ (1) + aσ (2) + · · · + aσ (n) for each permutation σ of {1, . . . , n}. Historically the original examples of groups arose from two distinct sources, and it took a while for the above deﬁnition of group to be distilled out as the essence of the matter. One of the two sources involved number systems and vectors. Here are examples. EXAMPLES. (1) Additive groups of familiar number systems. The systems in question are the integers Z, the rational numbers Q, the real numbers R, and the complex


numbers $\mathbb{C}$. In each case the set with its usual operation of addition forms an abelian group. The group properties of $\mathbb{Z}$ under addition are taken as known in advance in this book, as mentioned in Section A3 of the appendix, and the group properties of $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ under addition are sketched in Sections A3 and A4 of the appendix as part of the development of these number systems.

(2) Multiplicative groups connected with familiar number systems. In the cases of $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$, the nonzero elements form a group under multiplication. These groups are denoted by $\mathbb{Q}^\times$, $\mathbb{R}^\times$, and $\mathbb{C}^\times$. Again the properties of a group for each of them are properties that are sketched during the development of each of these number systems in Sections A3 and A4 of the appendix. With $\mathbb{Z}$, the nonzero integers do not form a group under multiplication, because only the two units, i.e., the divisors $+1$ and $-1$ of 1, have inverses. The units do form a group, however, under multiplication, and the group of units is denoted by $\mathbb{Z}^\times$.

(3) Vector spaces under addition. Spaces such as $\mathbb{Q}^n$ and $\mathbb{R}^n$ and $\mathbb{C}^n$ provide us with further examples of abelian groups. In fact, the defining properties of addition in a vector space are exactly the defining properties of an abelian group. Thus every vector space provides us with an example of an abelian group if we simply ignore the scalar multiplication.

(4) Integers modulo $m$, under addition. Another example related to number systems is the additive group of integers modulo a positive integer $m$. Let us say that an integer $n_1$ is congruent modulo $m$ to an integer $n_2$ if $m$ divides $n_1 - n_2$. One writes $n_1 \equiv n_2$ or $n_1 \equiv n_2 \bmod m$ or $n_1 = n_2 \bmod m$ for this relation.¹ It is an equivalence relation, and we can write $[n]$ for the equivalence class of $n$ when it is helpful to do so. The division algorithm (Proposition 1.1) tells us that each equivalence class has one and only one member between $0$ and $m - 1$. Thus there are exactly $m$ equivalence classes, and we know a representative of each. The set of classes will be denoted by² $\mathbb{Z}/m\mathbb{Z}$. The point is that $\mathbb{Z}/m\mathbb{Z}$ inherits an abelian-group structure from the abelian-group structure of $\mathbb{Z}$. Namely, we attempt to define $[a] + [b] = [a + b]$. To see that this formula actually defines an operation on $\mathbb{Z}/m\mathbb{Z}$, we need to check that the result is meaningful if the representatives of the classes $[a]$ and $[b]$ are changed. Thus let $[a] = [a']$ and $[b] = [b']$. Then $m$ divides $a - a'$ and $b - b'$, and $m$ must divide the sum $(a - a') + (b - b') = (a + b) - (a' + b')$; consequently $[a + b] = [a' + b']$, and addition is well defined. The same kind of argument shows that the associativity and commutativity of addition in $\mathbb{Z}$ imply associativity and commutativity in $\mathbb{Z}/m\mathbb{Z}$. The identity element is $[0]$, and group inverses (negatives) are given by $-[a] = [-a]$. Therefore $\mathbb{Z}/m\mathbb{Z}$ is an abelian group under addition, and it has $m$ elements. If $x$ and $y$ are members of $\mathbb{Z}/m\mathbb{Z}$, their sum is often denoted by $x + y \bmod m$.

¹ This notation was anticipated in a remark explaining the classical form of the Chinese Remainder Theorem (Corollary 1.9).
² The notation $\mathbb{Z}/(m)$ is an allowable alternative. Some authors, particularly in topology, write $\mathbb{Z}_m$ for this set, but the notation $\mathbb{Z}_m$ can cause confusion since $\mathbb{Z}_p$ is the standard notation for the "$p$-adic integers" when $p$ is prime. These are defined in Advanced Algebra.

The other source of early examples of groups historically has the members of the group operating as transformations of some auxiliary space. Before abstracting matters, let us consider some concrete examples, ignoring some of the details of verifying the defining properties of a group.

EXAMPLES, CONTINUED.

(5) Permutations. A permutation of a nonempty finite set $E$ of $n$ elements is a one-one function from $E$ onto itself. Permutations were introduced in Section I.4. The product of two permutations is just the composition, defined by $(\sigma\tau)(x) = \sigma(\tau(x))$ for $x$ in $E$, with the symbol $\circ$ for composition dropped. The resulting operation makes the set of permutations of $E$ into a group: we already observed in Section I.4 that composition is associative, and it is plain that the identity permutation may be taken as the group identity and that the inverse function to a permutation is the group inverse. The group is called the symmetric group on the $n$ letters of $E$. It has $n!$ members for $n \ge 1$. The notation $S_n$ is often used for this group, especially when $E = \{1, \dots, n\}$. Signs $\pm 1$ were defined for permutations in Section I.4, and we say that a permutation is even or odd according as its sign is $+1$ or $-1$. The sign of a product is the product of the signs, according to Proposition 1.24, and it follows that the even permutations form a subgroup of $S_n$. This subgroup is called the alternating group on $n$ letters and is denoted by $A_n$. It has $\tfrac{1}{2}(n!)$ members if $n \ge 2$.

(6) Symmetries of a regular polygon. Imagine a regular polygon in $\mathbb{R}^2$ centered at the origin.
The plane-geometry rotations and reﬂections about the origin that carry the polygon to itself form a group. If the number of sides of the polygon is n, then the group always contains the rotations through all multiples of the angle 2π/n. The rotations themselves form an n-element subgroup of the group of all symmetries. To consider what reﬂections give symmetries, we distinguish the cases n odd and n even. When n is odd, the reﬂection in the line that passes through any vertex and bisects the opposite side carries the polygon to itself, and no other reﬂections have this property. Thus the group of symmetries contains n reﬂections. When n is even, the reﬂection in the line passing through any vertex and the opposite vertex carries the polygon to itself, and so does the reﬂection in the line that bisects a side and also the opposite side. There are n/2 reﬂections of each kind, and hence the group of symmetries again contains n reﬂections. The group of symmetries thus has 2n elements in all cases. It is called the dihedral


group Dn . The group Dn is isomorphic to a certain subgroup of the permutation group Sn . Namely, we number the vertices of the polygon, and we associate to each member of Dn the permutation that moves the vertices the way the member of Dn does. (7) General linear group. With F equal to Q or R or C, consider any ndimensional vector space V over F. One possibility is V = Fn , but we do not insist on this choice. Among all one-one functions carrying V onto itself, let G consist of the linear ones. The composition of two linear maps is linear, and the inverse of an invertible function is linear if the given function is linear. The result is a group known as the general linear group GL(V ). When V = Fn , we know from Chapter II that we can identify linear maps from Fn to itself with matrices in Mnn (F) and that composition corresponds to matrix multiplication. It follows that the set of all invertible matrices in Mnn (F) is a group, which is denoted by GL(n, F), and that this group is isomorphic to GL(Fn ). The set SL(V ) or SL(n, F) of all members of GL(V ) or GL(n, F) of determinant 1 is a group since the determinant of a product is the product of the determinants; it is called the special linear group. The dihedral group Dn is isomorphic to a subgroup of GL(2, R) since each rotation and reﬂection of R2 that ﬁxes the origin is given by the operation of a 2-by-2 matrix. (8) Orthogonal and unitary groups. If V is a ﬁnite-dimensional inner-product space over R or C, Chapter III referred to the linear maps carrying the space to itself and preserving lengths of vectors as orthogonal in the real case and unitary in the complex case. Such linear maps are invertible. The condition of preserving lengths of vectors is maintained under composition and inverses, and it follows that the orthogonal or unitary linear maps form a subgroup O(V ) or U(V ) of the general linear group GL(V ). One writes O(n) for O(Rn ) and U(n) for U(Cn ). 
The subgroup of members of $O(V)$ or $O(n)$ of determinant 1 is called the rotation group $SO(V)$ or $SO(n)$. The subgroup of members of $U(V)$ or $U(n)$ of determinant 1 is called the special unitary group $SU(V)$ or $SU(n)$.

Before coming to Example 9, let us establish a closure property under the arithmetic operations for certain subsets of $\mathbb{C}$. We are going to use the theories of polynomials as in Chapter I and of vector spaces as in Chapter II with the rationals $\mathbb{Q}$ as the scalars. Fix a complex number $\theta$, and form the result of evaluating at $\theta$ every polynomial in one indeterminate with coefficients in $\mathbb{Q}$. The resulting set of complex numbers comes by substituting $\theta$ for $X$ in the members of $\mathbb{Q}[X]$, and we denote this subset of $\mathbb{C}$ by $\mathbb{Q}[\theta]$. Suppose that $\theta$ has the property that the set $\{1, \theta, \theta^2, \dots, \theta^n\}$ is linearly dependent over $\mathbb{Q}$ for some integer $n \ge 1$, i.e., has the property that $F_0(\theta) = 0$ for some nonzero member $F_0$ of $\mathbb{Q}[X]$ of degree $\le n$. For example, if $\theta = \sqrt{2}$, then the set $\{1, \sqrt{2}, (\sqrt{2})^2\}$ is linearly dependent since $2 - (\sqrt{2})^2 = 0$; if $\theta = e^{2\pi i/5}$,


then $\{1, \theta, \theta^2, \theta^3, \theta^4, \theta^5\}$ is linearly dependent since $1 - \theta^5 = 0$, or alternatively since $1 + \theta + \theta^2 + \theta^3 + \theta^4 = 0$. Returning to the general $\theta$, we lose no generality if we assume that the polynomial $F_0$ has degree exactly $n$. If we divide the equation $F_0(\theta) = 0$ by the leading coefficient, we obtain an equality $\theta^n = G_0(\theta)$, where $G_0$ is the zero polynomial or is a nonzero polynomial of degree at most $n - 1$. Then $\theta^{n+m} = \theta^m G_0(\theta)$, and we see inductively that every power $\theta^r$ with $r \ge n$ is a linear combination of the members of the set $\{1, \theta, \theta^2, \dots, \theta^{n-1}\}$. This set is therefore a spanning set for the vector space $\mathbb{Q}[\theta]$, and we find that $\mathbb{Q}[\theta]$ is finite-dimensional, with dimension at most $n$. Since every positive integer power of $\theta$ lies in $\mathbb{Q}[\theta]$ and since these powers are closed under multiplication, the vector space $\mathbb{Q}[\theta]$ is closed under multiplication. More striking is that $\mathbb{Q}[\theta]$ is closed under division, as is asserted in the following proposition.

Proposition 4.1. Let $\theta$ be in $\mathbb{C}$, and suppose for some integer $n \ge 1$ that the set $\{1, \theta, \theta^2, \dots, \theta^n\}$ is linearly dependent over $\mathbb{Q}$. Then the finite-dimensional rational vector space $\mathbb{Q}[\theta]$ is closed under taking reciprocals (of nonzero elements), as well as multiplication, and hence is closed under division.

REMARKS. Under the hypotheses of Proposition 4.1, $\mathbb{Q}[\theta]$ is called an algebraic number field,³ or simply a number field, and $\theta$ is called an algebraic number. The relevant properties of $\mathbb{C}$ that are used in proving the proposition are that $\mathbb{C}$ is closed under the usual arithmetic operations, that these satisfy the usual properties, and that $\mathbb{Q}$ is a subset of $\mathbb{C}$. The deeper closure properties of $\mathbb{C}$ that are developed in Sections A3 and A4 of the appendix play no role.

PROOF. We have seen that $\mathbb{Q}[\theta]$ is closed under multiplication. If $x$ is a nonzero member of $\mathbb{Q}[\theta]$, then all positive powers of $x$ must be in $\mathbb{Q}[\theta]$, and the fact that $\dim \mathbb{Q}[\theta] \le n$ forces $\{1, x, x^2, \dots, x^n\}$ to be linearly dependent. Therefore there are integers $j$ and $k$ with $0 \le j < k \le n$ such that $c_j x^j + c_{j+1} x^{j+1} + \cdots + c_k x^k = 0$ for some rational numbers $c_j, \dots, c_k$ with $c_k \ne 0$. Since $x$ is assumed nonzero, we can discard unnecessary terms and arrange that $c_j \ne 0$. Then

$$1 = x\left(-c_j^{-1} c_{j+1} - c_j^{-1} c_{j+2}\, x - \cdots - c_j^{-1} c_k\, x^{k-j-1}\right),$$

and the reciprocal of $x$ has been exhibited as in $\mathbb{Q}[\theta]$.
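The argument in the proof can be traced in the smallest interesting case, $\theta = \sqrt{2}$ with $n = 2$. The following sketch is an illustration, not part of the original text. It assumes the representation of $a + b\sqrt{2}$ as the pair `(a, b)` of rationals, and the names `mul` and `reciprocal` are hypothetical. For $x = a + b\sqrt{2}$ one checks directly that $x^2 - 2ax + (a^2 - 2b^2) = 0$, so the dependence relation of the proof has $c_0 = a^2 - 2b^2$, $c_1 = -2a$, and $c_2 = 1$, and the formula of the proof gives $x^{-1} = (2a - x)/c_0$. Here $c_0 \ne 0$ for $x \ne 0$ because $\sqrt{2}$ is irrational.

```python
from fractions import Fraction

def mul(x, y):
    """Product in Q[sqrt 2]; the pair (a, b) stands for a + b*sqrt(2)."""
    (a, b), (c, d) = x, y
    return (a * c + 2 * b * d, a * d + b * c)

def reciprocal(x):
    """Reciprocal of a nonzero x = a + b*sqrt(2), read off from x^2 - 2a x + c0 = 0."""
    a, b = x
    c0 = a * a - 2 * b * b          # constant term of the dependence relation
    if c0 == 0:                     # happens only for x = 0, since sqrt(2) is irrational
        raise ZeroDivisionError("x must be nonzero")
    # 1 = x * (2a - x) / c0, and 2a - x = a - b*sqrt(2)
    return (Fraction(a) / c0, Fraction(-b) / c0)

x = (Fraction(3), Fraction(-1))     # 3 - sqrt(2)
print(reciprocal(x))                # the pair representing 3/7 + (1/7) sqrt(2)
print(mul(x, reciprocal(x)))        # the pair representing 1
```

For a general $\theta$ the same idea requires finding a dependence relation among $1, x, \dots, x^n$ by linear algebra over $\mathbb{Q}$; the quadratic case is special only in that the relation can be written down by hand.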

³ The definition of "algebraic number field" that is given later in the book is ostensibly more general, but the Theorem of the Primitive Element in Chapter IX will show that it amounts to the same thing as this.

EXAMPLES, CONTINUED.

(9) Galois's notion of automorphisms of number fields. Let $\theta$ be a complex number as in Proposition 4.1. The subject of Galois theory, whose details will

be discussed in Chapter IX and whose full utility will be glimpsed only later, works in an important special case with the “automorphisms” of Q[θ] that ﬁx Q. The automorphisms are the one-one functions from Q[θ] onto itself that respect addition and multiplication and carry every element of Q to itself. The identity is such a function, the composition of two such functions is again one, and the inverse of such a function is again one. Therefore the automorphisms of Q[θ] form a group under composition. We call this group Gal(Q[θ]/Q). Let us see that it is ﬁnite. In fact, if σ is in Gal(Q[θ]/Q), then σ is determined by its effect on θ, since we must have σ (F(θ)) = F(σ (θ )) for every F in Q[X ]. We know that there is some nonzero polynomial F0 (X ) such that F0 (θ ) = 0. Applying σ to this equality, we see that F0 (σ (θ )) = 0. Therefore σ (θ) has to be a root of F0 . Viewing F0 as in C[X ], we can apply Corollary 1.14 and see that F0 has only ﬁnitely many complex roots. Therefore there are only ﬁnitely many possibilities for σ , and the group Gal(Q[θ]/Q) has to be ﬁnite. Galois theory shows that this group gives considerable insight into the structure of Q[θ]. For example it allows one to derive the Fundamental Theorem of Algebra (Theorem 1.18) just from algebra and the Intermediate Value Theorem (Section A3 of the appendix); it allows one to show the impossibility of certain constructions in plane geometry by straightedge and compass; and it allows one to show that a quintic polynomial with rational coefﬁcients need not have a root that is expressible in terms of rational numbers, arithmetic operations, and the extraction of square roots, cube roots, and so on. We return to these matters in Chapter IX. Examples 5–9, which all involve auxiliary spaces, ﬁt the pattern that the members of the group are invertible transformations of the auxiliary space and the group operation is composition. 
This notion will be abstracted in Section 6 and will lead to the notion of a “group action.” For now, let us see why we obtained groups in each case. If X is any nonempty set, then the set of invertible functions f : X → X forms a group under composition, composition being deﬁned by ( f g)(x) = f (g(x)) with the usual symbol ◦ dropped. The associative law is just a matter of unwinding this deﬁnition: (( f g)h)(x) = ( f g)(h(x)) = f (g(h(x))) = f ((gh)(x)) = ( f (gh))(x). The identity function is the identity of the group, and inverse functions provide the inverse elements in the group. For our examples, the set X was E in Example 5, R2 in Example 6, V or Fn in Example 7, V or Qn or Rn or Cn in Example 8, and Q[θ] in Example 9. All that was needed in each case was to know that our set G of invertible functions from X to itself formed a subgroup of the set of all invertible functions from X to itself. In other words, we had only to check that G contained the identity and was closed under composition and inversion. Associativity was automatic for G because it was valid for the group of all invertible functions from X to itself.
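The checklist in the preceding paragraph (identity, closure under composition, closure under inversion) can be verified mechanically for the dihedral group of Example 6, realized as permutations of the vertices. The sketch below is illustrative and not part of the original text. It assumes that the vertices are numbered $0, \dots, n-1$, that a permutation is stored as a tuple `s` with `s[x]` the image of `x`, and that $n \ge 3$; the function names are hypothetical.

```python
def compose(s, t):
    """(s t)(x) = s(t(x)), with the composition symbol dropped as in the text."""
    return tuple(s[t[x]] for x in range(len(t)))

def inverse(s):
    """Inverse function of the permutation s."""
    inv = [0] * len(s)
    for x, y in enumerate(s):
        inv[y] = x
    return tuple(inv)

def generated_by(generators):
    """All products of the generators; for finitely many permutations this is a subgroup."""
    identity = tuple(range(len(generators[0])))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

def dihedral(n):
    """Symmetries of a regular n-gon (n >= 3) as permutations of the vertices 0..n-1."""
    rotation = tuple((i + 1) % n for i in range(n))     # rotation through 2*pi/n
    reflection = tuple((-i) % n for i in range(n))      # reflection fixing vertex 0
    return generated_by([rotation, reflection])

G = dihedral(5)
print(len(G))                                           # 10, i.e., 2n elements
print(all(inverse(g) in G for g in G))                  # True: closed under inversion
print(all(compose(g, h) in G for g in G for h in G))    # True: closed under composition
```

Associativity is never tested here, for the reason given in the text: it is inherited from the group of all invertible functions on the vertex set.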


Actually, any group can be realized in the fashion of Examples 5–9. This is the content of the next proposition.

Proposition 4.2 (Cayley's Theorem). Any group $G$ is isomorphic to a subgroup of invertible functions on a set $X$. The set $X$ can be taken to be $G$ itself. In particular any finite group with $n$ elements is isomorphic to a subgroup of the symmetric group $S_n$.

PROOF. Define $X = G$, put $f_a(x) = ax$ for $a$ in $G$, and let $G' = \{ f_a \mid a \in G \}$. To see that $G'$ is a group, we need $G'$ to contain the identity and to be closed under composition and inverses. Since $f_1$ is the identity, the identity is indeed in $G'$. Since $f_{ab}(x) = (ab)x = a(bx) = f_a(bx) = f_a(f_b(x)) = (f_a f_b)(x)$, $G'$ is closed under composition. The formula $f_a f_{a^{-1}} = f_1 = f_{a^{-1}} f_a$ then shows that $f_{a^{-1}} = (f_a)^{-1}$ and that $G'$ is closed under inverses. Thus $G'$ is a group. Define $\varphi : G \to G'$ by $\varphi(a) = f_a$. Certainly $\varphi$ is onto $G'$, and it is one-one because $\varphi(a) = \varphi(b)$ implies $f_a = f_b$, $f_a(1) = f_b(1)$, and $a = b$. Also, $\varphi(ab) = f_{ab} = f_a f_b = \varphi(a)\varphi(b)$, and hence $\varphi$ is an isomorphism. In the case that $G$ is finite with $n$ elements, $G'$ is exhibited as isomorphic to a subgroup of the group of permutations of the members of $G$. Hence it is isomorphic to a subgroup of $S_n$.

It took the better part of a century for mathematicians to sort out that two distinct notions are involved here: that of a group, as defined above, and that of a group action, as will be defined in Section 6. In sorting out these matters, mathematicians realized that it is wise to study the abstract group first and then to study the group in the context of its possible group actions. This does not at all mean ignoring group actions until after the study of groups is complete; indeed, we shall see in Sections 6, 7, and 10 that group actions provide useful tools for the study of abstract groups.

We turn to a discussion of two general group-theoretic notions: cyclic group and the direct product of two or more groups.
The second of these notions will be discussed only brieﬂy now; more detail will come in Section 3. If a is an element of a group, we deﬁne a n for integers n > 0 inductively by a 1 = a and a n = a n−1 a. Then we can put a 0 = 1 and a −n = (a −1 )n for n > 0. A little checking, which we omit, shows that the ordinary rules of exponents apply: a m+n = a m a n and a mn = (a m )n for all integers m and n. If the underlying group is abelian and additive notation is being used, these formulas read (m + n)a = ma + na and (mn)a = n(ma). A cyclic group is a group with an element a such that every element is a power of a. The element a is called a generator of the group, and the group is said to be generated by a.
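Returning briefly to Proposition 4.2, the construction in its proof can be checked by machine on a small example. The following Python sketch is only an illustration and is not part of the original text: it assumes that $S_3$ is encoded as tuples of images of $\{0, 1, 2\}$, and the names `compose` and `cayley_embedding` are hypothetical helpers introduced here. It builds the functions $f_a(x) = ax$ and tests that $a \mapsto f_a$ is one-one and respects products.

```python
from itertools import permutations

def compose(p, q):
    """Composition p after q of permutations stored as tuples of images."""
    return tuple(p[q[i]] for i in range(len(q)))

def cayley_embedding(elements, op):
    """Send each a to f_a(x) = a*x, recorded as a permutation of positions in `elements`."""
    index = {g: i for i, g in enumerate(elements)}
    return {a: tuple(index[op(a, x)] for x in elements) for a in elements}

# Assumed test group: S_3, with the group law given by composition.
S3 = list(permutations(range(3)))
f = cayley_embedding(S3, compose)

# phi(a) = f_a is one-one, and f_{ab} = f_a f_b.
is_injective = len(set(f.values())) == len(S3)
is_homomorphism = all(
    f[compose(a, b)] == compose(f[a], f[b]) for a in S3 for b in S3
)
```

Since $S_3$ is nonabelian, the same test applied to right multiplication $x \mapsto xa$ would not pass, which reflects the role of left multiplication in the proof.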

1. Groups and Subgroups


Proposition 4.3. Each cyclic group $G$ is isomorphic either to the additive group $\mathbb{Z}$ of integers or to the additive group $\mathbb{Z}/m\mathbb{Z}$ of integers modulo $m$ for some positive integer $m$.

PROOF. Let $a$ be a generator of $G$. If all $a^n$ are distinct, then the rule $a^{m+n} = a^m a^n$ implies that the function $n \mapsto a^n$ is an isomorphism of $\mathbb{Z}$ with $G$. On the other hand, if $a^k = a^l$ with $k > l$, then $a^{k-l} = 1$ and there exists a positive integer $n$ such that $a^n = 1$. Let $m$ be the least positive integer with $a^m = 1$. For any integers $q$ and $r$, we have $a^{qm+r} = (a^m)^q a^r = a^r$. Thus the function $\varphi : \mathbb{Z}/m\mathbb{Z} \to G$ given by $\varphi([n]) = a^n$ is well defined, is onto $G$, and carries sums in $\mathbb{Z}/m\mathbb{Z}$ to products in $G$. If $0 \le l < k < m$, then $a^k \ne a^l$ since otherwise $a^{k-l}$ would be 1 with $0 < k - l < m$, in contradiction to the minimality of $m$. Hence $\varphi$ is one-one, and we conclude that $\varphi : \mathbb{Z}/m\mathbb{Z} \to G$ is an isomorphism.

Let us denote abstract cyclic groups by $C_\infty$ and $C_m$, the subscript indicating the number of elements. Finite cyclic groups arise in guises other than as $\mathbb{Z}/m\mathbb{Z}$. For example the set of all elements $e^{2\pi i k/m}$ in $\mathbb{C}$, with multiplication as operation, forms a group isomorphic to $C_m$. So does the set of all rotation matrices
\[
\begin{pmatrix} \cos 2\pi k/m & -\sin 2\pi k/m \\ \sin 2\pi k/m & \cos 2\pi k/m \end{pmatrix}
\]
with matrix multiplication as operation.

Proposition 4.4. Any subgroup of a cyclic group is cyclic.

PROOF. Let $G$ be a cyclic group with generator $a$, and let $H$ be a subgroup. We may assume that $H \ne \{1\}$. Then there exists a positive integer $n$ such that $a^n$ is in $H$, and we let $k$ be the smallest such positive integer. If $n$ is any integer such that $a^n$ is in $H$, then Proposition 1.2 produces integers $x$ and $y$ such that $xk + yn = d$, where $d = \mathrm{GCD}(k, n)$. The equation $a^d = (a^k)^x (a^n)^y$ exhibits $a^d$ as in $H$, and the minimality of $k$ forces $d \ge k$. Since $\mathrm{GCD}(k, n) \le k$, we conclude that $d = k$. Hence $k$ divides $n$. Consequently $H$ consists of the powers of $a^k$ and is cyclic.
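The argument for Proposition 4.4 can be tried out on $\mathbb{Z}/12\mathbb{Z}$ in additive notation. The Python sketch below is an illustration under stated assumptions, not the text's own method: subgroups are produced by closing $\{0\}$ under addition of two chosen elements, and the helper names `generated_subgroup` and `smallest_positive` are hypothetical. For each such subgroup $H$ it takes the smallest positive $k$ in $H$, as in the proof, and checks that $H$ consists of the multiples of $k$.

```python
from math import gcd

def generated_subgroup(gens, m):
    """Subgroup of Z/mZ generated by gens, found by closing {0} under adding generators."""
    H = {0}
    frontier = [0]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = (x + g) % m
            if y not in H:
                H.add(y)
                frontier.append(y)
    return H

def smallest_positive(H, m):
    """Least representative k in 1..m-1 lying in H, or m itself if H is trivial."""
    return min((x for x in H if x > 0), default=m)

m = 12
results = []
for a in range(m):
    for b in range(m):
        H = generated_subgroup([a, b], m)
        k = smallest_positive(H, m)
        is_cyclic = H == {(k * t) % m for t in range(m)}
        # The generator found this way agrees with GCD(a, b, m).
        results.append((is_cyclic, k == gcd(gcd(a, b), m)))
```

In a finite group, closure under sums already gives closure under inverses, which is why the search above needs no separate step for negatives.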
A notion of the direct product of two groups is definable in the same way as was done with vector spaces in Section II.6, except that a little care is needed in saying how this construction interacts with mappings. As with the corresponding construction for vector spaces, one can define an explicit “external” direct product, and one can recognize a given group as an “internal” direct product, i.e., as isomorphic to an external direct product. We postpone a fuller discussion of direct product, as well as all comments about direct sums and mappings associated with direct sums and direct products, to Section 3.

The external direct product $G_1 \times G_2$ of two groups $G_1$ and $G_2$ is a group whose underlying set is the set-theoretic product of $G_1$ and $G_2$ and whose group law is $(g_1, g_2)(g_1', g_2') = (g_1 g_1', g_2 g_2')$. The identity is $(1, 1)$, and the formula for inverses is $(g_1, g_2)^{-1} = (g_1^{-1}, g_2^{-1})$. The two subgroups $G_1 \times \{1\}$ and $\{1\} \times G_2$ of $G_1 \times G_2$ commute with each other.


A group $G$ is the internal direct product of two subgroups $G_1$ and $G_2$ if the function from the external direct product $G_1 \times G_2$ to $G$ given by $(g_1, g_2) \mapsto g_1 g_2$ is an isomorphism of groups.

The literal analog of Proposition 2.30, which gave three equivalent definitions of internal direct product$^4$ of vector spaces, fails here. It is not sufficient that $G_1$ and $G_2$ be two subgroups such that $G_1 \cap G_2 = \{1\}$ and every element in $G$ decomposes as a product $g_1 g_2$ with $g_1 \in G_1$ and $g_2 \in G_2$. For example, with $G = S_3$, the two subgroups
\[
G_1 = \{1, (1\ 2)\} \qquad\text{and}\qquad G_2 = \{1, (1\ 2\ 3), (1\ 3\ 2)\}
\]
have these properties, but $G$ is not isomorphic to $G_1 \times G_2$ because the elements of $G_1$ do not commute with the elements of $G_2$.

Proposition 4.5. If $G$ is a group and $G_1$ and $G_2$ are subgroups, then the following conditions are equivalent:
(a) $G$ is the internal direct product of $G_1$ and $G_2$,
(b) every element in $G$ decomposes uniquely as a product $g_1 g_2$ with $g_1 \in G_1$ and $g_2 \in G_2$, and every member of $G_1$ commutes with every member of $G_2$,
(c) $G_1 \cap G_2 = \{1\}$, every element in $G$ decomposes as a product $g_1 g_2$ with $g_1 \in G_1$ and $g_2 \in G_2$, and every member of $G_1$ commutes with every member of $G_2$.

PROOF. We have seen that (a) implies (b). If (b) holds and $g$ is in $G_1 \cap G_2$, then the formula $1 = g g^{-1}$ and the uniqueness of the decomposition of 1 as a product together imply that $g = 1$. Hence (c) holds.

If (c) holds, define $\varphi : G_1 \times G_2 \to G$ by $\varphi(g_1, g_2) = g_1 g_2$. This map is certainly onto $G$. To see that it is one-one, suppose that $\varphi(g_1, g_2) = \varphi(g_1', g_2')$. Then $g_1 g_2 = g_1' g_2'$ and hence $g_1'^{-1} g_1 = g_2' g_2^{-1}$. Since $G_1 \cap G_2 = \{1\}$, $g_1'^{-1} g_1 = g_2' g_2^{-1} = 1$. Thus $(g_1, g_2) = (g_1', g_2')$, and $\varphi$ is one-one. Finally the fact that elements of $G_1$ commute with elements of $G_2$ implies that
\[
\varphi((g_1, g_2)(g_1', g_2')) = \varphi(g_1 g_1', g_2 g_2') = g_1 g_1' g_2 g_2' = g_1 g_2 g_1' g_2' = \varphi(g_1, g_2)\varphi(g_1', g_2').
\]
Therefore $\varphi$ is an isomorphism, and (a) holds.

Here are two examples of internal direct products of groups. In each let $\mathbb{R}^+$ be the multiplicative group of positive real numbers. The first example is $\mathbb{R}^\times \cong C_2 \times \mathbb{R}^+$ with $C_2$ providing the sign. The second example is $\mathbb{C}^\times \cong S^1 \times \mathbb{R}^+$, where $S^1$ is the multiplicative group of complex numbers of absolute value 1; the isomorphism here is given by the polar-coordinate mapping $(e^{i\theta}, r) \mapsto e^{i\theta} r$.

$^4$ The direct sum and direct product of two vector spaces were defined to be the same thing in Chapter II.
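Condition (c) of Proposition 4.5 is easy to test on small groups. The following Python sketch is offered only as an illustration: the function `condition_c` and the encoding of $S_3$ as tuples of images of $\{0, 1, 2\}$ are assumptions made here. It evaluates the three parts of (c) for the $S_3$ example discussed before the proposition and, for contrast, for the subgroups $\{0, 3\}$ and $\{0, 2, 4\}$ of $\mathbb{Z}/6\mathbb{Z}$.

```python
from itertools import permutations

def compose(p, q):
    """Composition p after q of permutations stored as tuples of images."""
    return tuple(p[q[i]] for i in range(len(q)))

def condition_c(G, G1, G2, op, identity):
    """Return the three parts of condition (c) of Proposition 4.5 as booleans."""
    trivial_intersection = set(G1) & set(G2) == {identity}
    products_exhaust = {op(a, b) for a in G1 for b in G2} == set(G)
    commute = all(op(a, b) == op(b, a) for a in G1 for b in G2)
    return trivial_intersection, products_exhaust, commute

# S_3 on {0, 1, 2}; the transposition (1 2) of the text becomes the swap of 0 and 1.
S3 = list(permutations(range(3)))
e = (0, 1, 2)
G1 = [e, (1, 0, 2)]
G2 = [e, (1, 2, 0), (2, 0, 1)]
s3_result = condition_c(S3, G1, G2, compose, e)

# Z/6Z with the subgroups {0, 3} and {0, 2, 4}.
z6_result = condition_c(range(6), [0, 3], [0, 2, 4], lambda a, b: (a + b) % 6, 0)
```

For $S_3$ only the commutativity part fails, in line with the remark that the literal analog of Proposition 2.30 breaks down for groups.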


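We conclude this section by giving an example of a group that falls outside the pattern of the examples above and by summarizing what groups we have identified with at most 15 elements.

EXAMPLES, CONTINUED.

(10) Groups associated with the quaternions. The set $\mathbb{H}$ of quaternions is an object like $\mathbb{R}$ or $\mathbb{C}$ in that it has both an addition/subtraction and a multiplication/division, but $\mathbb{H}$ is unlike $\mathbb{R}$ and $\mathbb{C}$ in that multiplication is not commutative. We give two constructions. In one we start from $\mathbb{R}^4$ with the standard basis vectors written as $\mathbf{1}, \mathbf{i}, \mathbf{j}, \mathbf{k}$. The multiplication table for these basis vectors is
\[
\begin{array}{llll}
\mathbf{1}\mathbf{1} = \mathbf{1}, & \mathbf{1}\mathbf{i} = \mathbf{i}, & \mathbf{1}\mathbf{j} = \mathbf{j}, & \mathbf{1}\mathbf{k} = \mathbf{k}, \\
\mathbf{i}\mathbf{1} = \mathbf{i}, & \mathbf{i}\mathbf{i} = -\mathbf{1}, & \mathbf{i}\mathbf{j} = \mathbf{k}, & \mathbf{i}\mathbf{k} = -\mathbf{j}, \\
\mathbf{j}\mathbf{1} = \mathbf{j}, & \mathbf{j}\mathbf{i} = -\mathbf{k}, & \mathbf{j}\mathbf{j} = -\mathbf{1}, & \mathbf{j}\mathbf{k} = \mathbf{i}, \\
\mathbf{k}\mathbf{1} = \mathbf{k}, & \mathbf{k}\mathbf{i} = \mathbf{j}, & \mathbf{k}\mathbf{j} = -\mathbf{i}, & \mathbf{k}\mathbf{k} = -\mathbf{1},
\end{array}
\]
and the multiplication is extended to general elements by the usual distributive laws. The multiplicative identity is $\mathbf{1}$, and multiplicative inverses of nonzero elements are given by
\[
(a\mathbf{1} + b\mathbf{i} + c\mathbf{j} + d\mathbf{k})^{-1} = s^{-1}a\mathbf{1} - s^{-1}b\mathbf{i} - s^{-1}c\mathbf{j} - s^{-1}d\mathbf{k}
\qquad\text{with } s = a^2 + b^2 + c^2 + d^2.
\]
Since $\mathbf{i}\mathbf{j} = \mathbf{k}$ while $\mathbf{j}\mathbf{i} = -\mathbf{k}$, multiplication is not commutative. What takes work to see is that multiplication is associative.

To see this, we give another construction, using $M_{22}(\mathbb{C})$. Within $M_{22}(\mathbb{C})$, take
\[
\mathbf{1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
\mathbf{i} = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \quad
\mathbf{j} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \quad
\mathbf{k} = \begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix},
\]
and define $\mathbb{H}$ to be the linear span, with real coefficients, of these matrices. The operations are the usual matrix addition and multiplication. Then multiplication is associative, and we readily verify the multiplication table for $\mathbf{1}, \mathbf{i}, \mathbf{j}, \mathbf{k}$. A little computation verifies also the formula for multiplicative inverses.

The set $\mathbb{H}^\times$ of nonzero elements forms a group under multiplication, and it is isomorphic to $\mathbb{R}^+ \times \mathrm{SU}(2)$, where
\[
\mathrm{SU}(2) = \left\{ \begin{pmatrix} \alpha & \beta \\ -\bar\beta & \bar\alpha \end{pmatrix} \;\middle|\; |\alpha|^2 + |\beta|^2 = 1 \right\}
\]
is the 2-by-2 special unitary group defined in Example 8. Of interest for our current purposes is the 8-element subgroup $\{\pm\mathbf{1}, \pm\mathbf{i}, \pm\mathbf{j}, \pm\mathbf{k}\}$, which is called the quaternion group and will be denoted by $H_8$.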
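The matrix construction lends itself to a direct check. The sketch below is an illustration only, written in Python with its built-in complex numbers; the helper names `matmul`, `neg`, and `combo` are hypothetical. It verifies a few entries of the multiplication table, the closure of the eight-element set under multiplication, and the identity $q\bar q = s\mathbf{1}$ with $s = a^2 + b^2 + c^2 + d^2$ that underlies the formula for inverses, for one sample choice of coefficients.

```python
def matmul(A, B):
    """Product of 2-by-2 complex matrices stored as ((a, b), (c, d))."""
    return tuple(
        tuple(sum(A[r][t] * B[t][c] for t in range(2)) for c in range(2))
        for r in range(2)
    )

def neg(A):
    """Entrywise negative of a matrix."""
    return tuple(tuple(-x for x in row) for row in A)

ONE = ((1, 0), (0, 1))
I = ((1j, 0), (0, -1j))
J = ((0, -1), (1, 0))
K = ((0, -1j), (-1j, 0))

def combo(w, x, y, z):
    """The real linear combination w*1 + x*i + y*j + z*k as a matrix."""
    return tuple(
        tuple(w * ONE[r][c] + x * I[r][c] + y * J[r][c] + z * K[r][c] for c in range(2))
        for r in range(2)
    )

# The eight elements +-1, +-i, +-j, +-k and their closure under multiplication.
H8 = [ONE, I, J, K] + [neg(M) for M in (ONE, I, J, K)]
closed = all(matmul(A, B) in H8 for A in H8 for B in H8)

# Sample coefficients (an arbitrary choice): q times its conjugate equals s times 1.
a, b, c, d = 1, 2, 3, 4
s = a * a + b * b + c * c + d * d
norm_product = matmul(combo(a, b, c, d), combo(a, -b, -c, -d))
```

Associativity comes for free here because matrix multiplication is associative, which is the point of the second construction.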


The order of a finite group is the number of elements in the group. Let us list some of the groups we have discussed that have order at most 15:

Order   Groups
1       $C_1$
2       $C_2$
3       $C_3$
4       $C_4$, $C_2 \times C_2$
5       $C_5$
6       $C_6$, $D_3$
7       $C_7$
8       $C_8$, $C_4 \times C_2$, $C_2 \times C_2 \times C_2$, $D_4$, $H_8$
9       $C_9$, $C_3 \times C_3$
10      $C_{10}$, $D_5$
11      $C_{11}$
12      $C_{12}$, $C_6 \times C_2$, $D_6$, $A_4$
13      $C_{13}$
14      $C_{14}$, $D_7$
15      $C_{15}$

No two groups in the above table are isomorphic, as one readily checks by counting elements of each “order” in the sense of the next section. We shall see in Section 10 and in the problems at the end of the chapter that the above table is complete through order 15 except for one group of order 12. Some groups that we have discussed have been omitted from the above table because of isomorphisms with the groups above. For example, $S_2 \cong C_2$, $A_3 \cong C_3$, $C_3 \times C_2 \cong C_6$, $S_3 \cong D_3$, $C_5 \times C_2 \cong C_{10}$, $C_4 \times C_3 \cong C_{12}$, $D_3 \times C_2 \cong D_6$, $C_7 \times C_2 \cong C_{14}$, and $C_5 \times C_3 \cong C_{15}$.

2. Quotient Spaces and Homomorphisms

Let $G$ be a group, and let $H$ be a subgroup. For purposes of this paragraph, say that $g_1$ in $G$ is equivalent to $g_2$ in $G$ if $g_1 = g_2 h$ for some $h$ in $H$. The relation “equivalent” is an equivalence relation: it is reflexive because 1 is in $H$, it is symmetric since $H$ is closed under inverses, and it is transitive since $H$ is closed under products. The equivalence classes are called left cosets of $H$ in $G$. The left coset containing an element $g$ of $G$ is the set $gH = \{gh \mid h \in H\}$.

EXAMPLES.
(1) When $G = \mathbb{Z}$ and $H = m\mathbb{Z}$, the left cosets are the sets $r + m\mathbb{Z}$, i.e., the sets $\{x \in \mathbb{Z} \mid x \equiv r \bmod m\}$ for the various values of $r$.
(2) When $G = S_3$ and $H = \{(1), (1\ 3)\}$, there are three left cosets: $H$, $(1\ 2)H = \{(1\ 2), (1\ 3\ 2)\}$, and $(2\ 3)H = \{(2\ 3), (1\ 2\ 3)\}$.

Similarly one can define the right cosets $Hg$ of $H$ in $G$. When $G$ is nonabelian, these need not coincide with the left cosets; in Example 2 above with $G = S_3$ and $H = \{(1), (1\ 3)\}$, the right coset $H(1\ 2) = \{(1\ 2), (1\ 2\ 3)\}$ is not a left coset.
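Example 2 can be reproduced mechanically. The Python sketch below is an illustration under an assumed encoding: permutations of $\{1, 2, 3\}$ are stored as tuples of images of $\{0, 1, 2\}$, composed from right to left as functions, so that $(1\ 3)$ becomes `(2, 1, 0)` and $(1\ 2)$ becomes `(1, 0, 2)`. The helper `compose` is hypothetical. The code collects the left and right cosets of $H$ as sets.

```python
from itertools import permutations

def compose(p, q):
    """Composition p after q of permutations stored as tuples of images."""
    return tuple(p[q[i]] for i in range(len(q)))

S3 = list(permutations(range(3)))
H = [(0, 1, 2), (2, 1, 0)]      # the identity and the transposition (1 3), 0-indexed

# Each coset is stored as a frozenset so that equal cosets are identified.
left_cosets = {frozenset(compose(g, h) for h in H) for g in S3}
right_cosets = {frozenset(compose(h, g) for h in H) for g in S3}

# The right coset H(1 2): it contains (1 2) and the 3-cycle (1 2 3).
right_coset_of_12 = frozenset(compose(h, (1, 0, 2)) for h in H)
```

The set `right_coset_of_12` appears among the right cosets but not among the left cosets, as asserted in the text.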


Lemma 4.6. If $H$ is a subgroup of the group $G$, then any two left cosets of $H$ in $G$ have the same cardinality, namely $\operatorname{card} H$.

REMARKS. We shall be especially interested in the case that $\operatorname{card} H$ is finite, and then we write $|H| = \operatorname{card} H$ for the number of elements in $H$.

PROOF. If $g_1 H$ and $g_2 H$ are given, then the map $g \mapsto g_2 g_1^{-1} g$ is one-one on $G$ and carries $g_1 H$ onto $g_2 H$. Hence $g_1 H$ and $g_2 H$ have the same cardinality. Taking $g_1 = 1$, we see that this common cardinality is $\operatorname{card} H$.

We write $G/H$ for the set $\{gH\}$ of all left cosets of $H$ in $G$, calling it the quotient space or left-coset space of $G$ by $H$. The set $\{Hg\}$ of right cosets is denoted by $H\backslash G$.

Theorem 4.7 (Lagrange’s Theorem). If $G$ is a finite group and $H$ is a subgroup, then $|G| = |G/H|\,|H|$. Consequently the order of any subgroup of $G$ divides the order of $G$.

PROOF. Lemma 4.6 shows that each left coset has $|H|$ elements. The left cosets are disjoint and exhaust $G$, and there are $|G/H|$ left cosets. Thus $G$ has $|G/H|\,|H|$ elements.

If $a$ is an element of a group $G$, then we have seen that the powers $a^n$ of $a$ form a cyclic subgroup of $G$ that is isomorphic either to $\mathbb{Z}$ or to some group $\mathbb{Z}/m\mathbb{Z}$ for a positive integer $m$. We say that $a$ has finite order $m$ when the cyclic group is isomorphic to $\mathbb{Z}/m\mathbb{Z}$. Otherwise $a$ has infinite order. In the finite-order case the order of $a$ is thus the least positive integer $n$ such that $a^n = 1$.

Corollary 4.8. If $G$ is a finite group, then each element $a$ of $G$ has finite order, and the order of $a$ divides the order of $G$.

PROOF. The order of $a$ equals $|H|$ if $H = \{a^n \mid n \in \mathbb{Z}\}$, and Corollary 4.8 is thus a special case of Theorem 4.7.

Corollary 4.9. If $p$ is a prime, then the only group of order $p$, up to isomorphism, is the cyclic group $C_p$, and it has no subgroups other than $\{1\}$ and $C_p$ itself.

PROOF. Suppose that $G$ is a finite group of order $p$ and that $H \ne \{1\}$ is a subgroup of $G$. Let $a \ne 1$ be in $H$, and let $P = \{a^n \mid n \in \mathbb{Z}\}$. Since $a \ne 1$, Corollary 4.8 shows that the order of $a$ is an integer $> 1$ that divides $p$. Since $p$ is prime, the order of $a$ must equal $p$. Then $|P| = p$. Since $P \subseteq H \subseteq G$ and $|G| = p$, we must have $P = H = G$.
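Corollary 4.8 can be observed directly in $S_4$, which has 24 elements. The following Python sketch is an illustration only, with permutations stored as tuples of images and with hypothetical helpers `compose` and `order`; it computes the order of every element as the least positive $n$ with $a^n = 1$.

```python
from itertools import permutations

def compose(p, q):
    """Composition p after q of permutations stored as tuples of images."""
    return tuple(p[q[i]] for i in range(len(q)))

def order(p):
    """Least positive n such that the n-th power of p is the identity."""
    identity = tuple(range(len(p)))
    power, n = p, 1
    while power != identity:
        power = compose(power, p)
        n += 1
    return n

S4 = list(permutations(range(4)))
orders = sorted({order(p) for p in S4})
all_divide = all(len(S4) % order(p) == 0 for p in S4)
```

The orders that occur are 1, 2, 3, and 4, each a divisor of 24. The converse fails: no element of $S_4$ has order 6, 8, 12, or 24.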


Let $G_1$ and $G_2$ be groups. We say that $\varphi : G_1 \to G_2$ is a homomorphism if $\varphi(ab) = \varphi(a)\varphi(b)$ for all $a$ and $b$ in $G_1$. In other words, $\varphi$ is to respect products, but it is not assumed that $\varphi$ is one-one or onto. Any homomorphism $\varphi$ automatically respects the identity and inverses, in the sense that
• $\varphi(1) = 1$ (since $\varphi(1) = \varphi(11) = \varphi(1)\varphi(1)$),
• $\varphi(a^{-1}) = \varphi(a)^{-1}$ (since $1 = \varphi(1) = \varphi(aa^{-1}) = \varphi(a)\varphi(a^{-1})$ and similarly $1 = \varphi(a^{-1})\varphi(a)$).

EXAMPLES. The following functions are homomorphisms: any isomorphism, the function $\varphi : \mathbb{Z} \to \mathbb{Z}/m\mathbb{Z}$ given by $\varphi(k) = k \bmod m$, the function $\varphi : S_n \to \{\pm 1\}$ given by $\varphi(\sigma) = \operatorname{sgn} \sigma$, the function $\varphi : \mathbb{Z} \to G$ given for fixed $a$ in $G$ by $\varphi(n) = a^n$, and the function $\varphi : \mathrm{GL}(n, \mathbb{F}) \to \mathbb{F}^\times$ given by $\varphi(A) = \det A$.

The image of a homomorphism $\varphi : G_1 \to G_2$ is just the image of $\varphi$ considered as a function. It is denoted by $\operatorname{image} \varphi = \varphi(G_1)$ and is necessarily a subgroup of $G_2$ since if $\varphi(g_1) = g_2$ and $\varphi(g_1') = g_2'$, then $\varphi(g_1 g_1') = g_2 g_2'$ and $\varphi(g_1^{-1}) = g_2^{-1}$.

The kernel of a homomorphism $\varphi : G_1 \to G_2$ is the set $\ker \varphi = \varphi^{-1}(\{1\}) = \{x \in G_1 \mid \varphi(x) = 1\}$. This is a subgroup since if $\varphi(x) = 1$ and $\varphi(y) = 1$, then $\varphi(xy) = \varphi(x)\varphi(y) = 1$ and $\varphi(x^{-1}) = \varphi(x)^{-1} = 1$.

The homomorphism $\varphi : G_1 \to G_2$ is one-one if and only if $\ker \varphi$ is the trivial group $\{1\}$. The necessity follows since 1 is already in $\ker \varphi$, and the sufficiency follows since $\varphi(x) = \varphi(y)$ implies that $\varphi(xy^{-1}) = 1$ and therefore that $xy^{-1}$ is in $\ker \varphi = \{1\}$.

The kernel $H$ of a homomorphism $\varphi : G_1 \to G_2$ has the additional property of being a normal subgroup of $G_1$ in the sense that $ghg^{-1}$ is in $H$ whenever $g$ is in $G_1$ and $h$ is in $H$, i.e., $gHg^{-1} = H$. In fact, if $h$ is in $\ker \varphi$ and $g$ is in $G_1$, then $\varphi(ghg^{-1}) = \varphi(g)\varphi(h)\varphi(g)^{-1} = \varphi(g)\varphi(g)^{-1} = 1$ shows that $ghg^{-1}$ is in $\ker \varphi$.

EXAMPLES.
(1) Any subgroup $H$ of an abelian group $G$ is normal since $ghg^{-1} = gg^{-1}h = h$. The alternating subgroup $A_n$ of the symmetric group $S_n$ is normal since $A_n$ is the kernel of the homomorphism $\sigma \mapsto \operatorname{sgn} \sigma$.
(2) The subgroup $H = \{1, (1\ 3)\}$ of $S_3$ is not normal since $(1\ 2)H(1\ 2)^{-1} = \{1, (2\ 3)\}$.
(3) If a subgroup $H$ of a group $G$ has just two left cosets, then $H$ is normal even if $G$ is an infinite group. In fact, suppose $G = H \cup g_0 H$ whenever $g_0$ is not in $H$. Taking inverses of all elements of $G$, we see that $G = H \cup H g_1$ whenever $g_1$ is not in $H$. If $g$ in $G$ is given, then either $g$ is in $H$ and $gHg^{-1} = H$, or $g$ is not in $H$ and $gH = Hg$, so that $gHg^{-1} = H$ in this case as well.
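Examples 1 and 2 can be confirmed by brute force in $S_3$. The Python sketch below is an illustration, not a definitive implementation: permutations are tuples of images of $\{0, 1, 2\}$, the sign is computed by counting inversions (one of several equivalent definitions, assumed here), and the names `inverse`, `sign`, and `is_normal` are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Composition p after q of permutations stored as tuples of images."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def sign(p):
    """Sign of p, computed as (-1) to the number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def is_normal(H, G):
    """True if g h g^{-1} lies in H for every g in G and h in H."""
    return all(compose(compose(g, h), inverse(g)) in H for g in G for h in H)

S3 = list(permutations(range(3)))
A3 = [p for p in S3 if sign(p) == 1]     # kernel of the sign homomorphism
H = [(0, 1, 2), (2, 1, 0)]               # identity and the transposition (1 3), 0-indexed

sign_is_homomorphism = all(
    sign(compose(p, q)) == sign(p) * sign(q) for p in S3 for q in S3
)
```

Here `is_normal(A3, S3)` holds while `is_normal(H, S3)` does not, matching the two examples.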


Let $H$ be a subgroup of $G$. Let us look for the circumstances under which $G/H$ inherits a multiplication from $G$. The natural definition is
\[
(g_1 H)(g_2 H) \overset{?}{=} g_1 g_2 H,
\]
but we have to check that this definition makes sense. The question is whether we get the same left coset as product if we change the representatives of $g_1 H$ and $g_2 H$ from $g_1$ and $g_2$ to $g_1 h_1$ and $g_2 h_2$. Since our prospective definition makes $(g_1 h_1 H)(g_2 h_2 H) = g_1 h_1 g_2 h_2 H$, the question is whether $g_1 h_1 g_2 h_2 H$ equals $g_1 g_2 H$. That is, we ask whether $g_1 h_1 g_2 h_2 = g_1 g_2 h$ for some $h$ in $H$. If this equality holds, then $h_1 g_2 h_2 = g_2 h$, and hence $g_2^{-1} h_1 g_2$ equals $h h_2^{-1}$, which is an element of $H$. Conversely if every expression $g_2^{-1} h_1 g_2$ is in $H$, then we can go backwards and see that $g_1 h_1 g_2 h_2 = g_1 g_2 h$ for some $h$ in $H$, hence see that $G/H$ indeed inherits a multiplication from $G$. Thus a necessary and sufficient condition for $G/H$ to inherit a multiplication from $G$ is that the subgroup $H$ is normal. According to the next proposition, the multiplication inherited by $G/H$ when this condition is satisfied makes $G/H$ into a group.

Proposition 4.10. If $H$ is a normal subgroup of a group $G$, then $G/H$ becomes a group under the inherited multiplication $(g_1 H)(g_2 H) = (g_1 g_2)H$, and the function $q : G \to G/H$ given by $q(g) = gH$ is a homomorphism of $G$ onto $G/H$ with kernel $H$. Consequently every normal subgroup of $G$ is the kernel of some homomorphism.

REMARKS. When $H$ is normal, the group $G/H$ is called a quotient group of $G$, and the homomorphism $q : G \to G/H$ is called the quotient homomorphism.$^5$ In the special case that $G = \mathbb{Z}$ and $H = m\mathbb{Z}$, the construction reduces to the construction of the additive group of integers modulo $m$ and accounts for using the notation $\mathbb{Z}/m\mathbb{Z}$ for that group.

PROOF. The coset $1H$ is the identity, and $(gH)^{-1} = g^{-1}H$. Also, the computation $(g_1 H g_2 H)g_3 H = g_1 g_2 g_3 H = g_1 H(g_2 H g_3 H)$ proves associativity. Certainly $q$ is onto $G/H$. It is a homomorphism since $q(g_1 g_2) = g_1 g_2 H = g_1 H g_2 H = q(g_1)q(g_2)$. Its kernel is $\{g \in G \mid gH = H\} = H$.
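The criterion just derived, that coset multiplication is well defined exactly when $H$ is normal, can be tested exhaustively in $S_3$. The Python sketch below is an illustration under the same assumed encoding of permutations as tuples of images; the names `coset` and `product_is_well_defined` are hypothetical. It changes representatives from $g_1, g_2$ to $g_1 h_1, g_2 h_2$ in all possible ways and compares the resulting cosets.

```python
from itertools import permutations

def compose(p, q):
    """Composition p after q of permutations stored as tuples of images."""
    return tuple(p[q[i]] for i in range(len(q)))

def coset(g, H):
    """The left coset gH as a frozenset."""
    return frozenset(compose(g, h) for h in H)

def product_is_well_defined(G, H):
    """True if the coset of (g1 h1)(g2 h2) never depends on h1 and h2 in H."""
    return all(
        coset(compose(compose(g1, h1), compose(g2, h2)), H) == coset(compose(g1, g2), H)
        for g1 in G for g2 in G for h1 in H for h2 in H
    )

S3 = list(permutations(range(3)))
A3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]   # normal: the even permutations
H = [(0, 1, 2), (2, 1, 0)]               # not normal: identity and (1 3), 0-indexed
```

With these inputs the check succeeds for `A3` and fails for `H`, in agreement with the necessary and sufficient condition stated above.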
In analogy with what was shown for vector spaces in Proposition 2.25, quotients in the context of groups allow for the factorization of certain homomorphisms of groups. The appropriate result is stated as Proposition 4.11 and is pictured in Figure 4.1. We can continue from there along the lines of Section II.5.

$^5$ Some authors call $G/H$ a “factor group.” A “factor set,” however, is something different.


Proposition 4.11. Let $\varphi : G_1 \to G_2$ be a homomorphism between groups, let $H_0 = \ker \varphi$, let $H$ be a normal subgroup of $G_1$ contained in $H_0$, and define $q : G_1 \to G_1/H$ to be the quotient homomorphism. Then there exists a homomorphism $\bar\varphi : G_1/H \to G_2$ such that $\varphi = \bar\varphi \circ q$, i.e., $\bar\varphi(g_1 H) = \varphi(g_1)$. It has the same image as $\varphi$, and $\ker \bar\varphi = \{h_0 H \mid h_0 \in H_0\}$.

[Diagram: $\varphi : G_1 \to G_2$ across the top, $q : G_1 \to G_1/H$ down the left side, and $\bar\varphi : G_1/H \to G_2$ along the diagonal.]
FIGURE 4.1. Factorization of homomorphisms of groups via the quotient of a group by a normal subgroup.

REMARK. One says that $\varphi$ factors through $G_1/H$ or descends to $G_1/H$. See Figure 4.1.

PROOF. We will have $\bar\varphi \circ q = \varphi$ if and only if $\bar\varphi$ satisfies $\bar\varphi(g_1 H) = \varphi(g_1)$. What needs proof is that $\bar\varphi$ is well defined. Thus suppose that $g_1$ and $g_1'$ are in the same left coset, so that $g_1' = g_1 h$ with $h$ in $H$. Then $\varphi(g_1') = \varphi(g_1)\varphi(h) = \varphi(g_1)$ since $H \subseteq \ker \varphi$, and $\bar\varphi$ is therefore well defined.

The computation $\bar\varphi(g_1 H g_2 H) = \bar\varphi(g_1 g_2 H) = \varphi(g_1 g_2) = \varphi(g_1)\varphi(g_2) = \bar\varphi(g_1 H)\bar\varphi(g_2 H)$ shows that $\bar\varphi$ is a homomorphism. Since $\operatorname{image} \bar\varphi = \operatorname{image} \varphi$, $\bar\varphi$ is onto $\operatorname{image} \varphi$. Finally $\ker \bar\varphi$ consists of all $g_1 H$ such that $\bar\varphi(g_1 H) = 1$. Since $\bar\varphi(g_1 H) = \varphi(g_1)$, the condition that $g_1$ is to satisfy is that $g_1$ be in $\ker \varphi = H_0$. Hence $\ker \bar\varphi = \{h_0 H \mid h_0 \in H_0\}$, as asserted.

Corollary 4.12. Let $\varphi : G_1 \to G_2$ be a homomorphism between groups, and suppose that $\varphi$ is onto $G_2$ and has kernel $H$. Then $\varphi$ exhibits the group $G_1/H$ as canonically isomorphic to $G_2$.

PROOF. Take $H = H_0$ in Proposition 4.11, and form $\bar\varphi : G_1/H \to G_2$ with $\varphi = \bar\varphi \circ q$. The proposition shows that $\bar\varphi$ is onto $G_2$ and has trivial kernel, i.e., the identity element of $G_1/H$. Having trivial kernel, $\bar\varphi$ is one-one.

Theorem 4.13 (First Isomorphism Theorem). Let $\varphi : G_1 \to G_2$ be a homomorphism between groups, and suppose that $\varphi$ is onto $G_2$ and has kernel $K$. Then the map $H_1 \mapsto \varphi(H_1)$ gives a one-one correspondence between
(a) the subgroups $H_1$ of $G_1$ containing $K$ and
(b) the subgroups of $G_2$.
Under this correspondence normal subgroups correspond to normal subgroups. If $H_1$ is normal in $G_1$, then $gH_1 \mapsto \varphi(g)\varphi(H_1)$ is an isomorphism of $G_1/H_1$ onto $G_2/\varphi(H_1)$.


REMARK. In the special case of the last statement that $\varphi : G_1 \to G_2$ is a quotient map $q : G \to G/K$ and $H$ is a normal subgroup of $G$ containing $K$, the last statement of the theorem asserts the isomorphism
\[
G/H \cong (G/K)\big/(H/K).
\]

PROOF. The passage from (a) to (b) is by direct image under $\varphi$, and the passage from (b) to (a) will be by inverse image under $\varphi$. Certainly the direct image of a subgroup as in (a) is a subgroup as in (b). To prove the one-one correspondence, we are to show that the inverse image of a subgroup as in (b) is a subgroup as in (a) and that these two constructions invert one another.

For any subgroup $H_2$ of $G_2$, $\varphi^{-1}(H_2)$ is a subgroup of $G_1$. In fact, if $g_1$ and $g_1'$ are in $\varphi^{-1}(H_2)$, we can write $\varphi(g_1) = h_2$ and $\varphi(g_1') = h_2'$ with $h_2$ and $h_2'$ in $H_2$. Then the equations $\varphi(g_1 g_1') = h_2 h_2'$ and $\varphi(g_1^{-1}) = \varphi(g_1)^{-1} = h_2^{-1}$ show that $g_1 g_1'$ and $g_1^{-1}$ are in $\varphi^{-1}(H_2)$.

Moreover, the subgroup $\varphi^{-1}(H_2)$ contains $\varphi^{-1}(\{1\}) = K$. Therefore the inverse image under $\varphi$ of a subgroup as in (b) is a subgroup as in (a). Since $\varphi$ is a function that is onto $G_2$, we have $\varphi(\varphi^{-1}(H_2)) = H_2$. Thus passing from (b) to (a) and back recovers the subgroup of $G_2$.

If $H_1$ is a subgroup of $G_1$ containing $K$, we still need to see that $H_1 = \varphi^{-1}(\varphi(H_1))$. Certainly $H_1 \subseteq \varphi^{-1}(\varphi(H_1))$. For the reverse inclusion let $g_1$ be in $\varphi^{-1}(\varphi(H_1))$. Then $\varphi(g_1)$ is in $\varphi(H_1)$, i.e., $\varphi(g_1) = \varphi(h_1)$ for some $h_1$ in $H_1$. Since $\varphi$ is a homomorphism, $\varphi(g_1 h_1^{-1}) = 1$. Thus $g_1 h_1^{-1}$ is in $\ker \varphi = K$, which is contained in $H_1$ by assumption. Then $h_1$ and $g_1 h_1^{-1}$ are in $H_1$, and hence their product $(g_1 h_1^{-1})h_1 = g_1$ is in $H_1$. We conclude that $\varphi^{-1}(\varphi(H_1)) \subseteq H_1$, and thus passing from (a) to (b) and then back recovers the subgroup of $G_1$ containing $K$.

Next let us show that normal subgroups correspond to normal subgroups. If $H_2$ is normal in $G_2$, let $H_1$ be the subgroup $\varphi^{-1}(H_2)$ of $G_1$. For $h_1$ in $H_1$ and $g_1$ in $G_1$, we can write $\varphi(h_1) = h_2$ with $h_2$ in $H_2$, and then $\varphi(g_1 h_1 g_1^{-1}) = \varphi(g_1) h_2 \varphi(g_1)^{-1}$ is in $\varphi(g_1) H_2 \varphi(g_1)^{-1} = H_2$. Hence $g_1 h_1 g_1^{-1}$ is in $\varphi^{-1}(H_2) = H_1$. In the reverse direction let $H_1$ be normal in $G_1$, and let $g_2$ be in $G_2$. Since $\varphi$ is onto $G_2$, we can write $g_2 = \varphi(g_1)$ for some $g_1$ in $G_1$. Then $g_2 \varphi(H_1) g_2^{-1} = \varphi(g_1)\varphi(H_1)\varphi(g_1)^{-1} = \varphi(g_1 H_1 g_1^{-1}) = \varphi(H_1)$. Thus $\varphi(H_1)$ is normal.

For the final statement let $H_2 = \varphi(H_1)$. We have just proved that this image is normal, and hence $G_2/H_2$ is a group. The mapping $\Phi : G_1 \to G_2/H_2$ given by $\Phi(g_1) = \varphi(g_1)H_2$ is the composition of two homomorphisms and hence is a homomorphism. Its kernel is
\[
\{g_1 \in G_1 \mid \varphi(g_1) \in H_2\} = \{g_1 \in G_1 \mid \varphi(g_1) \in \varphi(H_1)\} = \varphi^{-1}(\varphi(H_1)),
\]
and this equals $H_1$ by the first conclusion of the theorem. Applying Corollary 4.12 to $\Phi$, we obtain the required isomorphism $\bar\Phi : G_1/H_1 \to G_2/\varphi(H_1)$.
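The correspondence of Theorem 4.13 can be listed explicitly for the homomorphism $\varphi : \mathbb{Z}/12\mathbb{Z} \to \mathbb{Z}/4\mathbb{Z}$ given by reduction modulo 4. This choice of example is an assumption made for illustration, and the helper `subgroups` is hypothetical; it relies on Proposition 4.4 to enumerate the subgroups of a finite cyclic group by divisors. The Python sketch computes the subgroups containing the kernel and their images.

```python
def subgroups(m):
    """All subgroups of Z/mZ; each consists of the multiples of a divisor d of m."""
    return {frozenset(range(0, m, d)) for d in range(1, m + 1) if m % d == 0}

def phi(x):
    """Reduction modulo 4, a homomorphism of Z/12Z onto Z/4Z."""
    return x % 4

K = frozenset(x for x in range(12) if phi(x) == 0)           # the kernel {0, 4, 8}
containing_K = [H for H in subgroups(12) if K <= H]          # subgroups as in (a)
images = {frozenset(phi(x) for x in H) for H in containing_K}

# Inverse images of the subgroups as in (b).
preimages = {
    frozenset(x for x in range(12) if phi(x) in H2) for H2 in subgroups(4)
}
```

Three subgroups of $\mathbb{Z}/12\mathbb{Z}$ contain the kernel, their images are exactly the three subgroups of $\mathbb{Z}/4\mathbb{Z}$, and taking inverse images returns the original three.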


Theorem 4.14 (Second Isomorphism Theorem). Let $H_1$ and $H_2$ be subgroups of a group $G$ with $H_2$ normal in $G$. Then $H_1 \cap H_2$ is a normal subgroup of $H_1$, the set $H_1 H_2$ of products is a subgroup of $G$ with $H_2$ as a normal subgroup, and the map $h_1(H_1 \cap H_2) \mapsto h_1 H_2$ is a well-defined canonical isomorphism of groups
\[
H_1/(H_1 \cap H_2) \cong (H_1 H_2)/H_2.
\]

PROOF. The set $H_1 \cap H_2$ is a subgroup, being the intersection of two subgroups. For $h_1$ in $H_1$, we have $h_1(H_1 \cap H_2)h_1^{-1} \subseteq h_1 H_1 h_1^{-1} \subseteq H_1$ since $H_1$ is a subgroup and $h_1(H_1 \cap H_2)h_1^{-1} \subseteq h_1 H_2 h_1^{-1} \subseteq H_2$ since $H_2$ is normal in $G$. Therefore $h_1(H_1 \cap H_2)h_1^{-1} \subseteq H_1 \cap H_2$, and $H_1 \cap H_2$ is normal in $H_1$.

The set $H_1 H_2$ of products is a subgroup since $h_1 h_2 h_1' h_2' = h_1 h_1'(h_1'^{-1} h_2 h_1')h_2'$ and since $(h_1 h_2)^{-1} = h_2^{-1} h_1^{-1} = h_1^{-1}(h_1 h_2^{-1} h_1^{-1})$, and $H_2$ is normal in $H_1 H_2$ since $H_2$ is normal in $G$.

The function $\varphi(h_1(H_1 \cap H_2)) = h_1 H_2$ is well defined since $H_1 \cap H_2 \subseteq H_2$, and $\varphi$ respects products. The domain of $\varphi$ is $\{h_1(H_1 \cap H_2) \mid h_1 \in H_1\}$, and the kernel is the subset of this such that $h_1$ lies in $H_2$ as well as $H_1$. For this to happen, $h_1$ must be in $H_1 \cap H_2$, and thus the kernel is the identity coset of $H_1/(H_1 \cap H_2)$. Hence $\varphi$ is one-one. To see that $\varphi$ is onto $(H_1 H_2)/H_2$, let $h_1 h_2 H_2$ be given. Then $h_1(H_1 \cap H_2)$ maps to $h_1 H_2$, which equals $h_1 h_2 H_2$. Hence $\varphi$ is onto.
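A small instance of Theorem 4.14 can be computed inside $S_4$. The choices below are assumptions made for illustration: $H_1$ is the copy of $S_3$ fixing the last point, and $H_2$ is the Klein four-group of double transpositions, which is normal in $S_4$. The Python sketch forms the two coset spaces and the image of the map $h_1(H_1 \cap H_2) \mapsto h_1 H_2$; the helper `compose` is hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Composition p after q of permutations stored as tuples of images."""
    return tuple(p[q[i]] for i in range(len(q)))

S4 = list(permutations(range(4)))
H1 = [p for p in S4 if p[3] == 3]                              # a copy of S_3
H2 = [(0, 1, 2, 3), (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]  # Klein four-group

meet = [p for p in H1 if p in H2]                  # H1 intersected with H2
H1H2 = {compose(a, b) for a in H1 for b in H2}     # the set of products

# Cosets of the meet in H1, cosets of H2 in H1H2, and the cosets h1 H2 hit by the map.
domain = {frozenset(compose(h, x) for x in meet) for h in H1}
codomain = {frozenset(compose(g, x) for x in H2) for g in H1H2}
image = {frozenset(compose(h, x) for x in H2) for h in H1}
```

Here the intersection is trivial and $H_1 H_2$ is all of $S_4$, so both quotients have six elements and the map hits every coset of $H_2$.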

3. Direct Products and Direct Sums

We return to the matter of direct products and direct sums of groups, direct products having been discussed briefly in Section 1. In a footnote in Section II.4 we mentioned a general principle in algebra that “whenever a new systematic construction appears for the objects under study, it is well to look for a corresponding construction with the functions relating these new objects.” This principle will be made more precise in Section 11 of the present chapter with the aid of the language of “categories” and “functors.”

Another principle that will be relevant for us is that constructions in one context in algebra often recur, sometimes in slightly different guise, in other contexts. One example of the operation of this principle occurs with quotients. The construction and properties of the quotient of a vector space by a vector subspace, as in Section II.5, is analogous in this sense to the construction and properties of the quotient of a group by a normal subgroup, as in Section 2 in the present chapter. The need for the subgroup to be normal is an example of what is meant by “slightly different guise.” Anyway, this principle too will be made more precise in Section 11 of the present chapter using the language of categories and functors.


Let us proceed with an awareness of both these principles in connection with direct products and direct sums of groups, looking for analogies with what happened for vector spaces and expecting our work to involve constructions with homomorphisms as well as with groups.

The external direct product $G_1 \times G_2$ was defined as a group in Section 1 to be the set-theoretic product with coordinate-by-coordinate multiplication. There are four homomorphisms of interest connected with $G_1 \times G_2$, namely
\[
\begin{array}{ll}
i_1 : G_1 \to G_1 \times G_2 & \text{given by } i_1(g_1) = (g_1, 1), \\
i_2 : G_2 \to G_1 \times G_2 & \text{given by } i_2(g_2) = (1, g_2), \\
p_1 : G_1 \times G_2 \to G_1 & \text{given by } p_1(g_1, g_2) = g_1, \\
p_2 : G_1 \times G_2 \to G_2 & \text{given by } p_2(g_1, g_2) = g_2.
\end{array}
\]

Recall from the discussion before Proposition 4.5 that Proposition 2.30 for the direct product of two vector spaces does not translate directly into an analog for the direct product of groups; instead that proposition is replaced by Proposition 4.5, which involves some condition of commutativity. Warned by this anomaly, let us work with mappings rather than with groups and subgroups, and let us use mappings in formulating a definition of the direct product of groups.

As with the direct product of two vector spaces, the mappings to use are $p_1$ and $p_2$ but not $i_1$ and $i_2$. The way in which $p_1$ and $p_2$ enter is through the effect of the direct product on homomorphisms. If $\varphi_1 : H \to G_1$ and $\varphi_2 : H \to G_2$ are two homomorphisms, then $h \mapsto (\varphi_1(h), \varphi_2(h))$ is the corresponding homomorphism of $H$ into $G_1 \times G_2$. In order to state matters fully, let us give the definition with an arbitrary number of factors.

Let $S$ be an arbitrary nonempty set of groups, and let $G_s$ be the group corresponding to the member $s$ of $S$. The external direct product of the $G_s$’s consists of a group $\prod_{s \in S} G_s$ and a system of group homomorphisms. The group as a set is the Cartesian product of the sets $G_s$, whose elements are arbitrary functions from $S$ to $\bigcup_{s \in S} G_s$ such that the value of the function at $s$ is in $G_s$, and the group law is $\{g_s\}_{s \in S}\{g_s'\}_{s \in S} = \{g_s g_s'\}_{s \in S}$. The group homomorphisms are the coordinate mappings $p_{s_0} : \prod_{s \in S} G_s \to G_{s_0}$ with $p_{s_0}(\{g_s\}_{s \in S}) = g_{s_0}$. The individual groups $G_s$ are called the factors, and a direct product of $n$ groups may be written as $G_1 \times \cdots \times G_n$ instead of with the symbol $\prod$. The group $\prod_{s \in S} G_s$ has the universal mapping property described in Proposition 4.15 and pictured in Figure 4.2.

Proposition 4.15 (universal mapping property of external direct product). Let $\{G_s \mid s \in S\}$ be a nonempty set of groups, and let $\prod_{s \in S} G_s$ be the external direct product, the associated group homomorphisms being the coordinate mappings $p_{s_0} : \prod_{s \in S} G_s \to G_{s_0}$. If $H$ is any group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : H \to G_s$, then there exists a unique group homomorphism $\varphi : H \to \prod_{s \in S} G_s$ such that $p_{s_0} \circ \varphi = \varphi_{s_0}$ for all $s_0 \in S$.
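Before the general argument, the case of two factors can be made concrete. The Python sketch below is only an illustration with assumed data: $H = \mathbb{Z}/6\mathbb{Z}$, $G_1 = \mathbb{Z}/2\mathbb{Z}$, $G_2 = \mathbb{Z}/3\mathbb{Z}$, and $\varphi_1, \varphi_2$ the reductions modulo 2 and 3; the helper `pair` is hypothetical. It forms $h \mapsto (\varphi_1(h), \varphi_2(h))$ and checks the relations $p_1 \circ \varphi = \varphi_1$ and $p_2 \circ \varphi = \varphi_2$.

```python
def pair(phi1, phi2):
    """The map h -> (phi1(h), phi2(h)) into the external direct product."""
    return lambda h: (phi1(h), phi2(h))

def p1(g):
    """First coordinate mapping."""
    return g[0]

def p2(g):
    """Second coordinate mapping."""
    return g[1]

def phi1(h):
    """Reduction modulo 2, a homomorphism Z/6Z -> Z/2Z."""
    return h % 2

def phi2(h):
    """Reduction modulo 3, a homomorphism Z/6Z -> Z/3Z."""
    return h % 3

def mult(g, g2):
    """Coordinate-by-coordinate group law on Z/2Z x Z/3Z (written additively)."""
    return ((g[0] + g2[0]) % 2, (g[1] + g2[1]) % 3)

phi = pair(phi1, phi2)
respects_products = all(
    phi((a + b) % 6) == mult(phi(a), phi(b)) for a in range(6) for b in range(6)
)
factors_correctly = all(
    p1(phi(h)) == phi1(h) and p2(phi(h)) == phi2(h) for h in range(6)
)
```

In this particular instance $\varphi$ happens to be one-one and onto, which recovers the isomorphism $C_6 \cong C_3 \times C_2$ noted in Section 1; the universal mapping property itself makes no such claim in general.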

[Diagram: $\varphi_{s_0} : H \to G_{s_0}$ across the top, $p_{s_0} : \prod_{s \in S} G_s \to G_{s_0}$ up the left side, and $\varphi : H \to \prod_{s \in S} G_s$ along the diagonal.]
FIGURE 4.2. Universal mapping property of an external direct product of groups.

PROOF. Existence of $\varphi$ is proved by taking $\varphi(h) = \{\varphi_s(h)\}_{s \in S}$. Then $p_{s_0}(\varphi(h)) = p_{s_0}(\{\varphi_s(h)\}_{s \in S}) = \varphi_{s_0}(h)$ as required.

For uniqueness let $\varphi' : H \to \prod_{s \in S} G_s$ be a homomorphism with $p_{s_0} \circ \varphi' = \varphi_{s_0}$ for all $s_0 \in S$. For each $h$ in $H$, we can write $\varphi'(h) = \{\varphi'(h)_s\}_{s \in S}$. For $s_0$ in $S$, we then have $\varphi_{s_0}(h) = (p_{s_0} \circ \varphi')(h) = p_{s_0}(\varphi'(h)) = \varphi'(h)_{s_0}$, and we conclude that $\varphi' = \varphi$.

Now we give an abstract definition of direct product that allows for the possibility that the direct product is “internal” in the sense that the various factors are identified as subgroups of a given group. The definition is by means of the above universal mapping property and will be seen to characterize the direct product up to canonical isomorphism. Let $S$ be an arbitrary nonempty set of groups, and let $G_s$ be the group corresponding to the member $s$ of $S$. A direct product of the $G_s$’s consists of a group $G$ and a system of group homomorphisms $p_s : G \to G_s$ for $s \in S$ with the following universal mapping property: whenever $H$ is a group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : H \to G_s$, then there exists a unique group homomorphism $\varphi : H \to G$ such that $p_s \circ \varphi = \varphi_s$ for all $s \in S$. Proposition 4.15 proves existence of a direct product, and the next proposition addresses uniqueness. A direct product is internal if each $G_s$ is a subgroup of $G$ and each restriction $p_s|_{G_s}$ is the identity map.

[Diagram: $\varphi_s : H \to G_s$ across the top, $p_s : G \to G_s$ up the left side, and $\varphi : H \to G$ along the diagonal.]
FIGURE 4.3. Universal mapping property of a direct product of groups.

Proposition 4.16. Let $S$ be a nonempty set of groups, and let $G_s$ be the group corresponding to the member $s$ of $S$. If $(G, \{p_s\})$ and $(G', \{p_s'\})$ are two direct products, then the homomorphisms $p_s : G \to G_s$ and $p_s' : G' \to G_s$ are onto $G_s$, there exists a unique homomorphism $\Phi : G' \to G$ such that $p_s' = p_s \circ \Phi$ for all $s \in S$, and $\Phi$ is an isomorphism.

PROOF. In Figure 4.3 let $H = G'$ and $\varphi_s = p_s'$. If $\Phi : G' \to G$ is the homomorphism produced by the fact that $G$ is a direct product, then we have


$p_s \circ \Phi = p_s'$ for all $s$. Reversing the roles of $G$ and $G'$, we obtain a homomorphism $\Psi : G \to G'$ with $p_s' \circ \Psi = p_s$ for all $s$. Therefore $p_s \circ (\Phi \circ \Psi) = p_s' \circ \Psi = p_s$.

In Figure 4.3 we next let $H = G$ and $\varphi_s = p_s$ for all $s$. Then the identity $1_G$ on $G$ has the same property $p_s \circ 1_G = p_s$ relative to all $p_s$ that $\Phi \circ \Psi$ has, and the uniqueness says that $\Phi \circ \Psi = 1_G$. Reversing the roles of $G$ and $G'$, we obtain $\Psi \circ \Phi = 1_{G'}$. Therefore $\Phi$ is an isomorphism.

For uniqueness suppose that $\Phi' : G' \to G$ is another homomorphism with $p_s' = p_s \circ \Phi'$ for all $s \in S$. Then the argument of the previous paragraph shows that $\Psi \circ \Phi' = 1_{G'}$. Applying $\Phi$ on the left gives $\Phi' = (\Phi \circ \Psi) \circ \Phi' = \Phi \circ (\Psi \circ \Phi') = \Phi \circ 1_{G'} = \Phi$. Thus $\Phi' = \Phi$.

Finally we have to show that the $s$-th mapping of a direct product is onto $G_s$. It is enough to show that $p_s'$ is onto $G_s$. Taking $G$ as the external direct product $\prod_{s \in S} G_s$ with $p_s$ equal to the coordinate mapping, form the isomorphism $\Phi : G' \to G$ that has just been proved to exist. This satisfies $p_s' = p_s \circ \Phi$ for all $s \in S$. Since $p_s$ is onto $G_s$, $p_s'$ must be onto $G_s$.

Let us turn to direct sums. Part of what we seek is a definition that allows for an abstract characterization of direct sums in the spirit of Proposition 4.16. In particular, the interaction with homomorphisms is to be central to the discussion. In the case of two factors, we use $i_1$ and $i_2$ rather than $p_1$ and $p_2$. If $\varphi_1 : G_1 \to H$ and $\varphi_2 : G_2 \to H$ are two homomorphisms, then the corresponding homomorphism $\varphi$ of $G_1 \oplus G_2$ to $H$ is to satisfy $\varphi_1 = \varphi \circ i_1$ and $\varphi_2 = \varphi \circ i_2$. With $G_1 \oplus G_2$ defined, as expected, to be the same group as $G_1 \times G_2$, we are led to the formula $\varphi(g_1, g_2) = \varphi(g_1, 1)\varphi(1, g_2) = \varphi_1(g_1)\varphi_2(g_2)$. The images of commuting elements under a homomorphism have to commute, and hence $H$ had better be abelian. Then in order to have an analog of Proposition 4.16, we will want to specialize $H$ at some point to $G_1 \oplus G_2$, and therefore $G_1$ and $G_2$ had better be abelian. With these observations in place, we are ready for the general definition.

Let $S$ be an arbitrary nonempty set of abelian groups, and let $G_s$ be the group corresponding to the member $s$ of $S$. We shall use additive notation for the group operation in each $G_s$. The external direct sum of the $G_s$’s consists of an abelian group $\bigoplus_{s \in S} G_s$ and a system of group homomorphisms $i_s$ for $s \in S$. The group is the subgroup of $\prod_{s \in S} G_s$ of all elements that are equal to 0 in all but finitely many coordinates. The group homomorphisms are the mappings $i_{s_0} : G_{s_0} \to \bigoplus_{s \in S} G_s$ carrying a member $g_{s_0}$ of $G_{s_0}$ to the element that is $g_{s_0}$ in coordinate $s_0$ and is 0 at all other coordinates. The individual groups are called the summands, and a direct sum of $n$ abelian groups may be written as $G_1 \oplus \cdots \oplus G_n$. The group $\bigoplus_{s \in S} G_s$ has the universal mapping property described in Proposition 4.17 and pictured in Figure 4.4.
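The finiteness condition in the external direct sum suggests a sparse representation. The Python sketch below is an illustration under assumptions not found in the text: the summands are copies of $\mathbb{Z}$ indexed by nonnegative integers, an element is a dictionary listing only its nonzero coordinates, $H = \mathbb{Z}$, and $\varphi_s$ is multiplication by $s + 1$. The names `add`, `embed`, and `induced` are hypothetical. The induced map is the finite sum of the values $\varphi_s(g_s)$.

```python
def add(x, y):
    """Sum in the direct sum; elements are dicts holding only nonzero coordinates."""
    out = dict(x)
    for s, g in y.items():
        total = out.get(s, 0) + g
        if total:
            out[s] = total
        else:
            out.pop(s, None)
    return out

def embed(s0, g):
    """The mapping i_{s0}: the element equal to g in coordinate s0 and 0 elsewhere."""
    return {s0: g} if g else {}

def induced(phi_family):
    """The map sending an element to the finite sum of phi_s(g_s) over its coordinates."""
    return lambda x: sum(phi_family(s)(g) for s, g in x.items())

def phi_family(s):
    """Assumed homomorphisms phi_s : Z -> Z, namely multiplication by s + 1."""
    return lambda g: (s + 1) * g

phi = induced(phi_family)
x = {0: 2, 3: -1}
y = {3: 1, 5: 4}
```

With these sample elements, `phi(add(x, y))` equals `phi(x) + phi(y)`, and `phi(embed(7, 5))` equals `phi_family(7)(5)`, which are the two properties required of the induced homomorphism.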


IV. Groups and Group Actions

Proposition 4.17 (universal mapping property of external direct sum). Let {G_s | s ∈ S} be a nonempty set of abelian groups, and let ⊕_{s∈S} G_s be the external direct sum, the associated group homomorphisms being the embedding mappings i_{s_0} : G_{s_0} → ⊕_{s∈S} G_s. If H is any abelian group and {ϕ_s | s ∈ S} is a system of group homomorphisms ϕ_s : G_s → H, then there exists a unique group homomorphism ϕ : ⊕_{s∈S} G_s → H such that ϕ ◦ i_{s_0} = ϕ_{s_0} for all s_0 ∈ S.

[Figure 4.4: commutative triangle. The embedding i_{s_0} maps G_{s_0} down to ⊕_{s∈S} G_s, the homomorphism ϕ_{s_0} maps G_{s_0} across to H, and ϕ maps ⊕_{s∈S} G_s to H.]
FIGURE 4.4. Universal mapping property of an external direct sum of abelian groups.

PROOF. Existence of ϕ is proved by taking ϕ({g_s}_{s∈S}) = Σ_s ϕ_s(g_s). The sum on the right side is meaningful since the element {g_s}_{s∈S} of the direct sum has only finitely many nonzero coordinates. Since H is abelian, the computation

    ϕ({g_s}_{s∈S}) + ϕ({g′_s}_{s∈S}) = Σ_s ϕ_s(g_s) + Σ_s ϕ_s(g′_s)
        = Σ_s (ϕ_s(g_s) + ϕ_s(g′_s))
        = Σ_s ϕ_s(g_s + g′_s)
        = ϕ({g_s + g′_s}_{s∈S})
        = ϕ({g_s}_{s∈S} + {g′_s}_{s∈S})

shows that ϕ is a homomorphism. If g_{s_0} is given and {g_s}_{s∈S} denotes the element that is g_{s_0} in the s_0-th coordinate and is 0 elsewhere, then ϕ(i_{s_0}(g_{s_0})) = ϕ({g_s}_{s∈S}) = Σ_s ϕ_s(g_s), and the right side equals ϕ_{s_0}(g_{s_0}) since g_s = 0 for all other s's. Thus ϕ ◦ i_{s_0} = ϕ_{s_0}.

For uniqueness let ϕ : ⊕_{s∈S} G_s → H be a homomorphism with ϕ ◦ i_{s_0} = ϕ_{s_0} for all s_0 ∈ S. Then the value of ϕ is determined at all elements of ⊕_{s∈S} G_s that are 0 in all but one coordinate. Since the most general member of ⊕_{s∈S} G_s is a finite sum of such elements, ϕ is determined on all of ⊕_{s∈S} G_s.

Now we give an abstract definition of direct sum that allows for the possibility that the direct sum is "internal" in the sense that the various constituents are identified as subgroups of a given group. Again the definition is by means of a universal mapping property and will be seen to characterize the direct sum up to canonical isomorphism.

Let S be an arbitrary nonempty set of abelian groups, and let G_s be the group corresponding to the member s of S. A direct sum of the G_s's consists of an abelian group G and a system of group homomorphisms i_s : G_s → G for s ∈ S with the following universal mapping property: whenever H is an abelian group and {ϕ_s | s ∈ S} is a system of group homomorphisms ϕ_s : G_s → H, then there exists a unique group homomorphism ϕ : G → H such that ϕ ◦ i_s = ϕ_s for all s ∈ S. Proposition 4.17 proves existence of a direct sum, and the next proposition addresses uniqueness. A direct sum is internal if each G_s is a subgroup of G and each mapping i_s is the inclusion mapping.

[Figure 4.5: commutative triangle. The homomorphism i_s maps G_s down to G, the homomorphism ϕ_s maps G_s across to H, and ϕ maps G to H.]
FIGURE 4.5. Universal mapping property of a direct sum of abelian groups.

Proposition 4.18. Let S be a nonempty set of abelian groups, and let G_s be the group corresponding to the member s of S. If (G, {i_s}) and (G′, {i′_s}) are two direct sums, then the homomorphisms i_s : G_s → G and i′_s : G_s → G′ are one-one, there exists a unique homomorphism Φ : G → G′ such that i′_s = Φ ◦ i_s for all s ∈ S, and Φ is an isomorphism.

PROOF. In Figure 4.5 let H = G′ and ϕ_s = i′_s. If Φ : G → G′ is the homomorphism produced by the fact that G is a direct sum, then we have Φ ◦ i_s = i′_s for all s. Reversing the roles of G and G′, we obtain a homomorphism Φ′ : G′ → G with Φ′ ◦ i′_s = i_s for all s. Therefore (Φ′ ◦ Φ) ◦ i_s = Φ′ ◦ i′_s = i_s. In Figure 4.5 we next let H = G and ϕ_s = i_s for all s. Then the identity 1_G on G has the same property 1_G ◦ i_s = i_s relative to all i_s that Φ′ ◦ Φ has, and the uniqueness says that Φ′ ◦ Φ = 1_G. Reversing the roles of G and G′, we obtain Φ ◦ Φ′ = 1_{G′}. Therefore Φ is an isomorphism.

For uniqueness suppose that Ψ : G → G′ is another homomorphism with i′_s = Ψ ◦ i_s for all s ∈ S. Then the argument of the previous paragraph shows that Φ′ ◦ Ψ = 1_G. Applying Φ on the left gives Ψ = (Φ ◦ Φ′) ◦ Ψ = Φ ◦ (Φ′ ◦ Ψ) = Φ ◦ 1_G = Φ. Thus Ψ = Φ.

Finally we have to show that the s-th mapping of a direct sum is one-one on G_s. It is enough to show that i′_s is one-one on G_s. Taking G as the external direct sum ⊕_{s∈S} G_s with i_s equal to the embedding mapping, form the isomorphism Φ′ : G′ → G that has just been proved to exist. This satisfies i_s = Φ′ ◦ i′_s for all s ∈ S. Since i_s is one-one, i′_s must be one-one.

EXAMPLE. The group Q^× is the direct sum of copies of Z, one for each prime, plus one copy of Z/2Z. If p is a prime, the mapping i_p : Z → Q^× is given by i_p(n) = p^n. The remaining coordinate gives the sign. The isomorphism results from unique factorization, only finitely many primes being involved for any particular nonzero rational number.
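The example can be checked numerically. The following is a minimal sketch, assuming trial division is acceptable for small inputs; the function names `prime_exponents`, `to_coordinates`, and `from_coordinates` are hypothetical and are introduced only for illustration. It sends a nonzero rational to its sign coordinate in Z/2Z together with its finitely many nonzero prime exponents, and it recovers the rational from those coordinates.

```python
from fractions import Fraction


def prime_exponents(n: int) -> dict:
    """Trial-division factorization of a positive integer as {prime: exponent}."""
    out = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            out[p] = out.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        out[n] = out.get(n, 0) + 1
    return out


def to_coordinates(q: Fraction):
    """Send a nonzero rational to (sign in Z/2Z, {prime: exponent in Z})."""
    if q == 0:
        raise ValueError("0 is not a member of the multiplicative group")
    sign = 0 if q > 0 else 1
    exps = prime_exponents(abs(q.numerator))
    # Fraction keeps numerator and denominator coprime, so no exponent cancels to 0.
    for p, e in prime_exponents(q.denominator).items():
        exps[p] = exps.get(p, 0) - e
    return sign, exps


def from_coordinates(sign: int, exps: dict) -> Fraction:
    """Inverse map: multiply (-1)^sign by the product of p^e over the stored primes."""
    q = Fraction(-1 if sign else 1)
    for p, e in exps.items():
        q *= Fraction(p) ** e
    return q
```

For instance, −12/35 has sign coordinate 1 and exponents 2 at the prime 2, 1 at 3, −1 at 5, and −1 at 7; multiplying rationals corresponds to adding coordinates.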


4. Rings and Fields

In this section we begin a two-section digression in order to develop some more number theory beyond what is in Chapter I and to make some definitions as new notions arise. In later sections of the present chapter, some of this material will yield further examples of concrete groups and tools for working with them.

We begin with the additive group Z/mZ of integers modulo a positive integer m. We continue to write [a] for the equivalence class of the integer a when it is helpful to do so. Our interest will be in the multiplication structure that Z/mZ inherits from multiplication in Z. Namely, we attempt to define [a][b] = [ab]. To see that this formula is meaningful in Z/mZ, we need to check that the same equivalence class results on the right side if the representatives of [a] and [b] are changed. Thus let [a] = [a′] and [b] = [b′]. Then m divides a − a′ and b − b′ and must divide the sum of products

    (a − a′)b + a′(b − b′) = ab − a′b′.

Consequently [ab] = [a′b′], and multiplication is well defined. If x and y are in Z/mZ, their product is often denoted by xy mod m.

The same kind of argument as just given shows that the associativity of multiplication in Z and the distributive laws imply corresponding facts about Z/mZ. The result is that Z/mZ is a "commutative ring with identity" in the sense of the following definitions.

A ring is a set R with two operations R × R → R, usually called addition and multiplication and often denoted by (a, b) ↦ a + b and (a, b) ↦ ab, such that
(i) R is an abelian group under addition,
(ii) multiplication is associative in the sense that a(bc) = (ab)c for all a, b, c in R,
(iii) the two distributive laws

    a(b + c) = (ab) + (ac)   and   (b + c)a = (ba) + (ca)

hold for all a, b, c in R.

The additive identity is denoted by 0, and the additive inverse of a is denoted by −a. A sum a + (−b) is often abbreviated a − b. By convention when parentheses are absent, multiplications are to be carried out before additions and subtractions. Thus the distributive laws may be rewritten as

    a(b + c) = ab + ac   and   (b + c)a = ba + ca.

A ring R is called a commutative ring if multiplication satisfies the commutative law
(iv) ab = ba for all a and b in R.
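For a small modulus, the claims about Z/mZ can be confirmed by brute force. The sketch below is illustrative only: the function names are hypothetical, representatives are taken in {0, ..., m − 1}, and only the multiplicative axioms (ii), (iii), (iv) are tested, the abelian group structure under addition being taken as known.

```python
from itertools import product


def is_commutative_ring_mod(m: int) -> bool:
    """Brute-force check of axioms (ii), (iii), (iv) for Z/mZ on representatives 0..m-1."""
    elems = range(m)
    add = lambda a, b: (a + b) % m
    mul = lambda a, b: (a * b) % m
    for a, b, c in product(elems, repeat=3):
        if mul(a, mul(b, c)) != mul(mul(a, b), c):           # (ii) associativity
            return False
        if mul(a, add(b, c)) != add(mul(a, b), mul(a, c)):   # (iii) left distributive law
            return False
        if mul(add(b, c), a) != add(mul(b, a), mul(c, a)):   # (iii) right distributive law
            return False
    return all(mul(a, b) == mul(b, a) for a, b in product(elems, repeat=2))  # (iv)


def product_is_well_defined(m: int, shifts=range(-3, 4)) -> bool:
    """Check that [ab] does not change when a and b are replaced by a + jm and b + km."""
    return all(
        ((a + j * m) * (b + k * m) - a * b) % m == 0
        for a in range(m)
        for b in range(m)
        for j in shifts
        for k in shifts
    )
```

Both functions return `True` for m = 6; the second one tests finitely many shifts only, whereas the divisibility argument in the text covers all representatives.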


A ring R is called a ring with identity⁶ if there exists an element 1 such that 1a = a1 = a for all a in R.

It is immediate from the definitions that
• 0a = 0 and a0 = 0 in any ring (since, in the case of the first formula, 0 = 0a − 0a = (0 + 0)a − 0a = 0a + 0a − 0a = 0a),
• the multiplicative identity is unique in a ring with identity (since 1′ = 1′1 = 1),
• (−1)a = −a = a(−1) in any ring with identity (partly since 0 = 0a = (1 + (−1))a = 1a + (−1)a = a + (−1)a).

In a ring with identity, it will be convenient not to insist that the identity be different from the zero element 0. If 1 and 0 do happen to coincide in R, then it readily follows that 0 is the only element of R, and R is said to be the zero ring.

The set Z of integers is a basic example of a commutative ring with identity. Returning to Z/mZ, suppose now that m is a prime p. If [a] is in Z/pZ with a in {1, 2, ..., p − 1}, then GCD(a, p) = 1 and Proposition 1.2 produces integers r and s with ar + ps = 1. Modulo p, this equation reads [a][r] = [1]. In other words, [r] is a multiplicative inverse of [a]. The result is that Z/pZ, when p is a prime, is a "field" in the sense of the following definition.

A field F is a commutative ring with identity such that F ≠ 0 and such that
(v) to each a ≠ 0 in F corresponds an element a^{-1} in F such that aa^{-1} = 1.
In other words, F^× = F − {0} is an abelian group under multiplication. Inverses are necessarily unique as a consequence of one of the properties of groups.

When p is prime, we shall write F_p for the field Z/pZ. Its multiplicative group F_p^× has order p − 1, and Lagrange's Theorem (Corollary 4.8) immediately implies that a^{p−1} ≡ 1 mod p whenever a and p are relatively prime. This result is known as Fermat's Little Theorem.⁷

For general m, certain members of Z/mZ have multiplicative inverses. The product of two such elements is again one, and the inverse of one is again one.
Thus, even though Z/mZ need not be a field, the subset (Z/mZ)^× of members of Z/mZ with multiplicative inverses is a group. The same argument as when m is prime shows that the class of a has an inverse if and only if GCD(a, m) = 1. The number of such classes was defined in Chapter I in terms of the Euler ϕ function as ϕ(m), and a formula for ϕ(m) was obtained in Corollary 1.10. The conclusion is that (Z/mZ)^× is an abelian group of order ϕ(m). Application of Lagrange's Theorem yields Euler's generalization of Fermat's Little Theorem, namely that a^{ϕ(m)} ≡ 1 mod m for every positive integer m and every integer a relatively prime to m.

⁶ Some authors, particularly when discussing only algebra, find it convenient to incorporate the existence of an identity into the definition of a ring. However, in real analysis some important natural rings do not have an identity, and the theory is made more complicated by forcing an identity into the picture. For example the space of integrable functions on R forms a very natural ring, with convolution as multiplication, and there is no identity; forcing an identity into the picture in such a way that the space remains stable under translations makes the space large and unwieldy. The distinction between working with rings and working with rings with identity will be discussed further in Section 11.

⁷ As opposed to Fermat's Last Theorem, which lies deeper.

More generally, in any ring R with identity, a unit is defined to be any element a such that there exists an element a^{-1} with aa^{-1} = a^{-1}a = 1. The element a^{-1} is unique if it exists⁸ and is called the multiplicative inverse of a. The units of R form a group denoted by R^×. For example the group Z^× consists of +1 and −1, and the zero ring R has R^× = {0}. If R is a nonzero ring, then 0 is not in R^×.

⁸ In fact, if b and c exist with ab = ca = 1, then a is a unit with a^{-1} = b = c because b = 1b = (ca)b = c(ab) = c1 = c.

Here are some further examples of fields.

EXAMPLES OF FIELDS.

(1) Q, R, and C. These are all fields.

(2) Q[θ]. This was introduced between Examples 8 and 9 of Section 1. It is assumed that θ is a complex number and that there exists an integer n > 0 such that the complex numbers 1, θ, θ^2, ..., θ^n are linearly dependent over Q. The set Q[θ] is defined to be the linear span over Q of all powers 1, θ, θ^2, ... of θ, which is the same as the linear span of the finite set 1, θ, θ^2, ..., θ^{n−1}. The set Q[θ] was shown in Proposition 4.1 to be a subset of C that is closed under the arithmetic operations, including the passage to reciprocals in the case of the nonzero elements. It is therefore a field.

(3) A field of 4 elements. Let F_4 = {0, 1, θ, θ + 1}, where θ is some symbol not standing for 0 or 1. Define addition in F_4 and multiplication in F_4^× by requiring that a + 0 = 0 + a = a for all a, that

    1 + 1 = 0,              1 + θ = (θ + 1),        1 + (θ + 1) = θ,
    θ + 1 = (θ + 1),        θ + θ = 0,              θ + (θ + 1) = 1,
    (θ + 1) + 1 = θ,        (θ + 1) + θ = 1,        (θ + 1) + (θ + 1) = 0,

and that

    11 = 1,                 1θ = θ,                 1(θ + 1) = (θ + 1),
    θ1 = θ,                 θθ = (θ + 1),           θ(θ + 1) = 1,
    (θ + 1)1 = (θ + 1),     (θ + 1)θ = 1,           (θ + 1)(θ + 1) = θ.

The result is a field. With this direct approach a certain amount of checking is necessary to verify all the properties of a field. We shall return to this matter in Chapter IX when we consider finite fields more generally, and we shall then have a way of constructing F_4 that avoids tedious checking.
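The checking mentioned for F_4 is mechanical, and a short program can carry it out. The sketch below is one possible way to do so and makes two assumptions that should be flagged: the elements are encoded as the strings "0", "1", "θ", "θ+1", and products involving 0 are set equal to 0, which the ring axioms force but which the tables above do not list. The names `ADD_ROWS`, `MUL_ROWS`, `add`, `mul`, and `is_field` are hypothetical.

```python
from itertools import product

E = ["0", "1", "θ", "θ+1"]   # the four elements, encoded as strings (an assumption)
NZ = E[1:]                   # the nonzero elements, in the column order of the tables

# Rows of the addition and multiplication tables, copied from the displayed equations.
ADD_ROWS = {
    "1":   ["0", "θ+1", "θ"],
    "θ":   ["θ+1", "0", "1"],
    "θ+1": ["θ", "1", "0"],
}
MUL_ROWS = {
    "1":   ["1", "θ", "θ+1"],
    "θ":   ["θ", "θ+1", "1"],
    "θ+1": ["θ+1", "1", "θ"],
}


def add(a, b):
    if a == "0":
        return b
    if b == "0":
        return a
    return ADD_ROWS[a][NZ.index(b)]


def mul(a, b):
    if a == "0" or b == "0":
        return "0"           # not in the tables; forced by 0a = a0 = 0 in any ring
    return MUL_ROWS[a][NZ.index(b)]


def is_field() -> bool:
    """Exhaustively test the field axioms on all pairs and triples of elements."""
    triples = list(product(E, repeat=3))
    pairs = list(product(E, repeat=2))
    ok = all(add(a, add(b, c)) == add(add(a, b), c) for a, b, c in triples)
    ok = ok and all(mul(a, mul(b, c)) == mul(mul(a, b), c) for a, b, c in triples)
    ok = ok and all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c)) for a, b, c in triples)
    ok = ok and all(add(a, b) == add(b, a) and mul(a, b) == mul(b, a) for a, b in pairs)
    ok = ok and all(any(add(a, b) == "0" for b in E) for a in E)    # additive inverses
    ok = ok and all(mul("1", a) == a for a in E)                    # identity
    ok = ok and all(any(mul(a, b) == "1" for b in NZ) for a in NZ)  # multiplicative inverses
    return ok
```

Because multiplication is checked to be commutative, testing one distributive law suffices. Calling `is_field()` returns `True` for the tables as transcribed here.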


In analogy with the theory of groups, we define a subring of a ring to be a nonempty subset that is closed under addition, negation, and multiplication. The set 2Z of even integers is a subring of the ring Z of integers. A subfield of a field is a subset containing 0 and 1 that is closed under addition, negation, multiplication, and multiplicative inverses for its nonzero elements. The set Q of rationals is a subfield of the field R of reals.

Intermediate between rings and fields are two kinds of objects, integral domains and division rings, that arise frequently enough to merit their own names. The setting for the first is a commutative ring R. A nonzero element a of R is called a zero divisor if there is some nonzero b in R with ab = 0. For example the element 2 in the ring Z/6Z is a zero divisor because 2 · 3 = 0. An integral domain is a nonzero commutative ring with identity having no zero divisors. Fields have no zero divisors since if a and b are nonzero, then ab = 0 would force b = 1b = (a^{-1}a)b = a^{-1}(ab) = a^{-1}0 = 0 and would give a contradiction; therefore every field is an integral domain. The ring of integers Z is another example of an integral domain, and the polynomial rings Q[X] and R[X] and C[X] introduced in Section I.3 are further examples. A cancellation law for multiplication holds in any integral domain:

    ab = ac with a ≠ 0   implies   b = c.

In fact, ab = ac implies a(b − c) = 0; since a ≠ 0, b − c must be 0.

The other object with its own name is a division ring, which is a nonzero ring with identity such that every nonzero element is a unit. The commutative division rings are the fields, and we have encountered only one noncommutative division ring so far. That is the set H of quaternions, which was introduced in Section 1. Division rings that are not fields will play only a minor role in this book but are of importance in Advanced Algebra.

Let us turn to mappings. A function ϕ : R → R′ between two rings is an isomorphism of rings if ϕ is one-one onto and satisfies ϕ(a + b) = ϕ(a) + ϕ(b) and ϕ(ab) = ϕ(a)ϕ(b) for all a and b in R. In other words, ϕ is to be an isomorphism of the additive groups and to satisfy ϕ(ab) = ϕ(a)ϕ(b). Such a mapping carries the identity, if any, in R to the identity of R′. The relation "is isomorphic to" is an equivalence relation. Common notation for an isomorphism of rings is R ≅ R′; because of the symmetry, one can say that R and R′ are isomorphic.

A function ϕ : R → R′ between two rings is a homomorphism of rings if ϕ satisfies ϕ(a + b) = ϕ(a) + ϕ(b) and ϕ(ab) = ϕ(a)ϕ(b) for all a and b in R. In other words, ϕ is to be a homomorphism of the additive groups and to satisfy ϕ(ab) = ϕ(a)ϕ(b).

EXAMPLES OF HOMOMORPHISMS OF RINGS.

(1) The mapping ϕ : Z → Z/mZ given by ϕ(k) = k mod m.


(2) The evaluation mapping ϕ : R[X] → R given by P(X) ↦ P(r) for some fixed r in R.

(3) Mappings with the direct product Z × Z. The additive group Z × Z becomes a commutative ring with identity under coordinate-by-coordinate multiplication, namely (a, a′)(b, b′) = (ab, a′b′). The identity is (1, 1). Projection (a, a′) ↦ a to the first coordinate is a homomorphism of rings Z × Z → Z that carries identity to identity. Inclusion a ↦ (a, 0) of Z into the first coordinate is a homomorphism of rings Z → Z × Z that does not carry identity to identity.⁹

Proposition 4.19. If R is a ring with identity 1_R, then there exists a unique homomorphism of rings ϕ_1 : Z → R such that ϕ_1(1) = 1_R.

PROOF. The formulas for manipulating exponents of an element in a group, when translated into the additive notation for addition in R, say that n ↦ nr satisfies (m + n)r = mr + nr and (mn)r = m(nr) for all r in R and all integers m and n. The first of these formulas implies, for any r in R, that ϕ_r(n) = nr is a homomorphism between the additive groups of Z and R, and it is certainly uniquely determined by its value for n = 1. The distributive laws imply that ψ_r(r′) = r′r is another homomorphism of additive groups. Hence ψ_r ◦ ϕ_{r′} and ϕ_{r′r} are homomorphisms between the additive groups of Z and R. Since (ψ_r ◦ ϕ_{r′})(1) = ψ_r(r′) = r′r = ϕ_{r′r}(1), we must have (ψ_r ◦ ϕ_{r′})(m) = ϕ_{r′r}(m) for all integers m. Thus (mr′)r = m(r′r) for all m. Putting r = n1_R and r′ = 1_R proves the fourth equality of the computation

    ϕ_1(mn) = (mn)1_R = m(n1_R) = m(1_R(n1_R)) = (m1_R)(n1_R) = ϕ_1(m)ϕ_1(n),

and shows that ϕ_1 is in fact a homomorphism of rings.
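The statements about Z × Z in Example 3 and the homomorphism ϕ_1 of Proposition 4.19 can be spot-checked on a finite sample. The sketch below assumes pairs of Python integers as a model of Z × Z; the helper names are hypothetical, and a finite sample can only support the claims, not prove them.

```python
from itertools import product

# Model of the ring Z x Z with coordinate-by-coordinate operations.
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    return (x[0] * y[0], x[1] * y[1])

ONE = (1, 1)                      # identity of Z x Z

def projection(x):                # Z x Z -> Z, first coordinate
    return x[0]

def inclusion(a):                 # Z -> Z x Z, into the first coordinate
    return (a, 0)

def phi_1(n):                     # Z -> Z x Z, n -> n * ONE as in Proposition 4.19
    return (n, n)

SAMPLE = range(-4, 5)             # finite sample of integers (an assumption)

def respects_operations_from_Z(f) -> bool:
    """Test f(a + b) = f(a) + f(b) and f(ab) = f(a)f(b) for a, b in the sample."""
    return all(
        f(a + b) == add(f(a), f(b)) and f(a * b) == mul(f(a), f(b))
        for a, b in product(SAMPLE, repeat=2)
    )

def respects_operations_to_Z(f) -> bool:
    """Test the same two identities for a map Z x Z -> Z on sampled pairs."""
    pairs = list(product(SAMPLE, repeat=2))
    return all(
        f(add(x, y)) == f(x) + f(y) and f(mul(x, y)) == f(x) * f(y)
        for x, y in product(pairs, repeat=2)
    )
```

On this sample, `inclusion` respects both operations while `inclusion(1)` is (1, 0) rather than `ONE`, and `phi_1` respects both operations with `phi_1(1) == ONE`.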

The image of a homomorphism ϕ : R → R′ of rings is a subring of R′, as is easily checked. The kernel turns out to be more than just a subring of R. If a is in the kernel and b is any element of R, then ϕ(ab) = ϕ(a)ϕ(b) = 0ϕ(b) = 0 and similarly ϕ(ba) = 0. Thus the kernel of a ring homomorphism is closed under products of members of the kernel with arbitrary members of R. Adapting a definition to this circumstance, one says that an ideal I of R (or two-sided ideal in case of ambiguity) is an additive subgroup such that ab and ba are in I whenever a is in I and b is in R. Briefly then, the kernel of a homomorphism of rings is an ideal.

⁹ Sometimes authors who build the existence of an identity into the definition of "ring" insist as a matter of definition that homomorphisms of rings carry identity to identity. Such authors would then exclude this particular mapping from consideration as a homomorphism.

Conversely suppose that I is an ideal in a ring R. Since I is certainly an additive subgroup of an abelian group, we can form the additive quotient group R/I. It is customary to write the individual cosets in additive notation, thus as r + I. In analogy with Proposition 4.10, we have the following result for the present context.

Proposition 4.20. If I is an ideal in a ring R, then a well-defined operation of multiplication is obtained within the additive group R/I by the definition (r_1 + I)(r_2 + I) = r_1r_2 + I, and R/I becomes a ring. If R has an identity 1, then 1 + I is an identity in R/I. With these definitions the function q : R → R/I given by q(r) = r + I is a ring homomorphism of R onto R/I with kernel I. Consequently every ideal of R is the kernel of some homomorphism of rings.

REMARKS. When I is an ideal, the ring R/I is called a quotient ring¹⁰ of R, and the homomorphism q : R → R/I is called the quotient homomorphism. In the special case that R = Z and I = mZ, the construction of R/I reduces to the construction of Z/mZ as a ring at the beginning of this section.

¹⁰ Quotient rings are known also as "factor rings." A "ring of quotients," however, is something different.

PROOF. If we change the representatives of the cosets from r_1 and r_2 to r_1 + i_1 and r_2 + i_2 with i_1 and i_2 in I, then (r_1 + i_1)(r_2 + i_2) = r_1r_2 + (i_1r_2 + r_1i_2 + i_1i_2) is in r_1r_2 + I by the closure properties of I. Hence multiplication is well defined. The associativity of this multiplication follows from the associativity of multiplication in R because

    ((r_1 + I)(r_2 + I))(r_3 + I) = (r_1r_2 + I)(r_3 + I) = (r_1r_2)r_3 + I
        = r_1(r_2r_3) + I = (r_1 + I)(r_2r_3 + I) = (r_1 + I)((r_2 + I)(r_3 + I)).

Similarly the computation

    (r_1 + I)((r_2 + I) + (r_3 + I)) = r_1(r_2 + r_3) + I = (r_1r_2 + r_1r_3) + I
        = (r_1 + I)(r_2 + I) + (r_1 + I)(r_3 + I)

yields one distributive law, and the other distributive law is proved in the same way. If R has an identity 1, then (1 + I)(r + I) = 1r + I = r + I and (r + I)(1 + I) = r1 + I = r + I show that 1 + I is an identity in R/I. Finally we know that the quotient map q : R → R/I is a homomorphism of additive groups, and the computation q(r_1r_2) = r_1r_2 + I = (r_1 + I)(r_2 + I) = q(r_1)q(r_2) shows that q is a homomorphism of rings.

EXAMPLES OF IDEALS.

(1) The ideals in the ring Z coincide with the additive subgroups and are the sets mZ; the reason each mZ is an ideal is that if a and b are integers and m divides a, then m divides ab.


(2) The ideals in a field F are 0 and F itself, no others; in fact, if a ≠ 0 is in an ideal and b is in F, then the equality b = (ba^{-1})a shows that b is in the ideal and that the ideal therefore contains all elements of F.

(3) If R is Q[X] or R[X] or C[X], then every ideal I is of the form I = R f(X) for some polynomial f(X). In fact, we can take f(X) = 0 if I = 0. If I ≠ 0, let f(X) be a nonzero member of I of lowest possible degree. If A(X) is in I, then Proposition 1.12 shows that A(X) = f(X)B(X) + C(X) with C(X) = 0 or deg C < deg f. The equality C(X) = A(X) − f(X)B(X) shows that C(X) is in I, and the minimality of deg f implies that C(X) = 0. Thus A(X) = f(X)B(X).

(4) In a ring R with identity 1, an ideal I is a proper subset of R if and only if 1 is not in I. In fact, I is certainly a proper subset if 1 is not in I. In the converse direction if 1 is in I, then every element r = r1, for r in R, lies in I. Hence I = R, and I is not a proper subset.

In analogy with what was shown for vector spaces in Proposition 2.25 and for groups in Proposition 4.11, quotients in the context of rings allow for the factorization of certain homomorphisms of rings. The appropriate result is stated as Proposition 4.21 and is pictured in Figure 4.6.

Proposition 4.21. Let ϕ : R_1 → R_2 be a homomorphism of rings, let I_0 = ker ϕ, let I be an ideal of R_1 contained in I_0, and let q : R_1 → R_1/I be the quotient homomorphism. Then there exists a homomorphism of rings ϕ̄ : R_1/I → R_2 such that ϕ = ϕ̄ ◦ q, i.e., ϕ̄(r_1 + I) = ϕ(r_1). It has the same image as ϕ, and ker ϕ̄ = {r + I | r ∈ I_0}.

[Figure 4.6: commutative triangle. The quotient homomorphism q maps R_1 down to R_1/I, the homomorphism ϕ maps R_1 across to R_2, and ϕ̄ maps R_1/I to R_2.]
FIGURE 4.6. Factorization of homomorphisms of rings via the quotient of a ring by an ideal.

REMARK. One says that ϕ factors through R_1/I or descends to R_1/I.

PROOF. Proposition 4.11 shows that ϕ descends to a homomorphism ϕ̄ of the additive group of R_1/I into the additive group of R_2 and that all the other conclusions hold except possibly for the fact that ϕ̄ respects multiplication. To see that ϕ̄ respects multiplication, we just compute that

    ϕ̄((r + I)(r′ + I)) = ϕ̄(rr′ + I) = ϕ(rr′) = ϕ(r)ϕ(r′) = ϕ̄(r + I)ϕ̄(r′ + I).
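A concrete instance of Proposition 4.21 may help. In the sketch below, the choices are assumptions made only for illustration: R_1 = Z, R_2 = Z/mZ, ϕ is reduction modulo m, and I = nZ. The hypothetical function `descends` tests whether reduction modulo m is constant on the cosets of nZ, which is the case exactly when nZ is contained in the kernel mZ.

```python
def descends(m: int, n: int) -> bool:
    """Is k -> k mod m constant on each coset of nZ?  True exactly when m divides n."""
    return all((c + n) % m == c % m for c in range(n))


def phi_bar(c: int, m: int) -> int:
    """Descended map Z/nZ -> Z/mZ, computed from any representative c of the coset."""
    return c % m


def kernel_of_phi_bar(m: int, n: int) -> list:
    """Representatives in 0..n-1 of the cosets r + nZ with r in mZ."""
    return [c for c in range(n) if phi_bar(c, m) == 0]
```

With m = 4 and n = 12 the map descends, and its kernel consists of the cosets of 0, 4, and 8, in line with the description ker ϕ̄ = {r + I | r ∈ I_0}. With m = 4 and n = 6 the hypothesis fails and `descends(4, 6)` is `False`.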


An example of special interest occurs when ϕ is a homomorphism of rings ϕ : Z → R and the ideal mZ of Z is contained in the kernel of ϕ. Then the proposition says that ϕ descends to a homomorphism of rings ϕ̄ : Z/mZ → R. We shall make use of this result shortly. But first let us state a different special case as a corollary.

Corollary 4.22. Let ϕ : R_1 → R_2 be a homomorphism of rings, and suppose that ϕ is onto R_2 and has kernel I. Then ϕ exhibits the ring R_1/I as canonically isomorphic to R_2.

PROOF. Take I = I_0 in Proposition 4.21, and form ϕ̄ : R_1/I → R_2 with ϕ = ϕ̄ ◦ q. The proposition shows that ϕ̄ is onto R_2 and has trivial kernel, i.e., the identity element of R_1/I. Having trivial kernel, ϕ̄ is one-one.

Proposition 4.23. Any field F contains a subfield isomorphic to the rationals Q or to some field F_p with p prime.

REMARKS. The subfield in the proposition is called the prime field of F. The characteristic of F is defined to be 0 if the prime field is isomorphic to Q and to be p if the prime field is isomorphic to F_p.

PROOF. Proposition 4.19 produces a homomorphism of rings ϕ_1 : Z → F with ϕ_1(1) = 1. The kernel of ϕ_1 is an ideal, necessarily of the form mZ with m an integer ≥ 0, and the image of ϕ_1 is a commutative subring with identity in F. Let ϕ̄_1 : Z/mZ → F be the descended homomorphism given by Proposition 4.21. The integer m cannot factor nontrivially, say as m = rs, because otherwise ϕ̄_1(r) and ϕ̄_1(s) would be nonzero members of F with ϕ̄_1(r)ϕ̄_1(s) = ϕ̄_1(rs) = ϕ̄_1(0) = 0, in contradiction to the fact that a field has no zero divisors. Moreover m ≠ 1 because ϕ_1(1) = 1 is not 0 in F. Thus m is prime or m is 0. If m is a prime p, then Z/pZ is a field, and the image of ϕ̄_1 is the required subfield of F.

Thus suppose that m = 0. Then ϕ_1 is one-one, and F contains a subring with identity isomorphic to Z. Define a function Φ_1 : Q → F by saying that if k and l are integers with l ≠ 0, then Φ_1(kl^{-1}) = ϕ_1(k)ϕ_1(l)^{-1}. This is well defined because ϕ_1(l) ≠ 0 and because k_1l_1^{-1} = k_2l_2^{-1} implies k_1l_2 = k_2l_1 and hence ϕ_1(k_1)ϕ_1(l_2) = ϕ_1(k_2)ϕ_1(l_1) and ϕ_1(k_1)ϕ_1(l_1)^{-1} = ϕ_1(k_2)ϕ_1(l_2)^{-1}. We readily check that Φ_1 is a homomorphism with kernel 0. Then F contains the subfield Φ_1(Q) isomorphic to Q.
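The integer m in the proof is the smallest positive integer with m·1 = 0, when such an integer exists. A rough sketch of a search for it follows; the function name `characteristic`, its parameters, and the search bound are all assumptions of this illustration. A finite search can confirm a positive characteristic but can only suggest characteristic 0.

```python
from fractions import Fraction


def characteristic(add, zero, one, max_steps: int = 10_000) -> int:
    """Smallest m >= 1 with m*1 = 0, or 0 if no such m is found within max_steps."""
    total = zero
    for m in range(1, max_steps + 1):
        total = add(total, one)      # total is now m * 1
        if total == zero:
            return m
    return 0                         # heuristic only: the search was cut off


# Z/7Z with representatives 0..6.
char_f7 = characteristic(lambda a, b: (a + b) % 7, 0, 1)

# The 4-element field of Example 3, encoded (as an assumption) by coefficient pairs
# (c1, c0) standing for c1*θ + c0, with addition taken coordinate-wise modulo 2.
char_f4 = characteristic(lambda x, y: (x[0] ^ y[0], x[1] ^ y[1]), (0, 0), (0, 1))

# The rationals: no positive multiple of 1 vanishes, so the search gives up.
char_q = characteristic(lambda a, b: a + b, Fraction(0), Fraction(1), max_steps=1000)
```

The three searches are consistent with prime fields F_7, F_2, and Q respectively.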

5. Polynomials and Vector Spaces

In this section we complete the digression begun in Section 4. We shall be using the elementary notions of rings and fields established in Section 4 in order to work with (i) polynomials over any commutative ring with identity and (ii) vector spaces over arbitrary fields.

It is an important observation that a good deal of what has been proved so far in this book concerning polynomials when F is Q or R or C remains valid when F is any field. Specifically all the results in Section I.3 through Theorem 1.17 on the topic of polynomials in one indeterminate remain valid as long as the coefficients are from a field. The theory breaks down somewhat when one tries to extend it by allowing coefficients that are not in a field or by allowing more than one indeterminate. Because of this circumstance and because we have not yet announced a universal mapping property for polynomial rings and because we have not yet addressed the several-variable case, we shall briefly review matters now while extending the reach of the theory that we have.

Let R be a nonzero commutative ring with identity, so that 1 ≠ 0. A polynomial in one indeterminate is to be an expression P(X) = a_n X^n + ··· + a_2 X^2 + a_1 X + a_0 in which X is a symbol, not a variable. Nevertheless, the usual kinds of manipulations with polynomials are to be valid. This description lacks precision because X has not really been defined adequately. To make a precise definition, we remove X from the formalism and simply define the polynomial to be the tuple (a_0, a_1, ..., a_n, 0, 0, ...) of its coefficients. Thus a polynomial in one indeterminate with coefficients in R is an infinite sequence of members of R such that all terms of the sequence are 0 from some point on. The indexing of the sequence is to begin with 0, and X is to refer to the polynomial (0, 1, 0, 0, ...). We may refer to a polynomial P as P(X) if we want to emphasize that the indeterminate is called X. Addition and negation of polynomials are defined in coordinate-by-coordinate fashion by

    (a_0, a_1, ..., a_n, 0, 0, ...) + (b_0, b_1, ..., b_n, 0, 0, ...) = (a_0 + b_0, a_1 + b_1, ..., a_n + b_n, 0, 0, ...),
    −(a_0, a_1, ..., a_n, 0, 0, ...) = (−a_0, −a_1, ..., −a_n, 0, 0, ...),

and the set R[X] of polynomials is then an abelian group isomorphic to the direct sum of infinitely many copies of the additive group of R.

As in Section I.3, X^n is to be the polynomial whose coefficients are 1 in the n-th position, with n ≥ 0, and 0 in all other positions. Polynomial multiplication is then defined so as to match multiplication of expressions a_n X^n + ··· + a_1 X + a_0 if the product is expanded out, powers of X are added, and the terms containing like powers of X are collected. Thus the precise definition is that

    (a_0, a_1, ..., 0, 0, ...)(b_0, b_1, ..., 0, 0, ...) = (c_0, c_1, ..., 0, 0, ...),

where c_N = Σ_{k=0}^{N} a_k b_{N−k}. It is a simple matter to check that this multiplication makes R[X] into a commutative ring.
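The tuple definition translates directly into code. The following is a minimal sketch, assuming coefficients in Z or, when the optional modulus m is given, in Z/mZ; a polynomial is stored as a finite list (a_0, a_1, ..., a_n) with the trailing zeros of the infinite sequence dropped, and the names `trim`, `reduce_coeffs`, `poly_add`, and `poly_mul` are hypothetical.

```python
def trim(p):
    """Drop trailing zero coefficients, so the zero polynomial is the empty list."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def reduce_coeffs(p, m=None):
    """Reduce coefficients modulo m when a modulus is given; otherwise leave them in Z."""
    return list(p) if m is None else [a % m for a in p]


def poly_add(p, q, m=None):
    """Coordinate-by-coordinate addition of coefficient sequences."""
    n = max(len(p), len(q))
    total = [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]
    return trim(reduce_coeffs(total, m))


def poly_mul(p, q, m=None):
    """Product with coefficients c_N = sum over k of a_k * b_{N-k}."""
    if not p or not q:
        return []
    c = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b        # a_i * b_j contributes to c_{i+j}
    return trim(reduce_coeffs(c, m))


X = [0, 1]                           # the polynomial (0, 1, 0, 0, ...)
```

In this encoding `poly_mul(X, X)` is the list for X^2, and `poly_mul([1, 1], [-1, 1])` is the list for X^2 − 1.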


The polynomial with all entries 0 is denoted by 0 and is called the zero polynomial. For all polynomials P = (a_0, ..., a_n, 0, ...) other than 0, the degree of P, denoted by deg P, is defined to be the largest index n such that a_n ≠ 0. In this case, a_n is called the leading coefficient, and a_n X^n is called the leading term; if a_n = 1, the polynomial is called monic. The usual convention with the 0 polynomial is either to leave its degree undefined or to say that the degree is −∞; let us follow the latter approach in this section in order not to have to separate certain formulas into cases.

There is a natural one-one homomorphism of rings ι : R → R[X] given by ι(c) = (c, 0, 0, ...) for c in R. This sends the identity of R to the identity of R[X]. Thus we can identify R with the constant polynomials, i.e., those of degree ≤ 0.

If P and Q are nonzero polynomials, then

    deg(P + Q) ≤ max(deg P, deg Q).

In this formula equality holds if deg P ≠ deg Q. In the case of multiplication, let P and Q have respective leading terms a_m X^m and b_n X^n. All the coefficients of PQ are 0 beyond the (m + n)-th, and the (m + n)-th is a_m b_n. This in principle could be 0 but is nonzero if R is an integral domain. Thus P and Q nonzero implies

    deg(PQ) ≤ deg P + deg Q   for general R,
    deg(PQ) = deg P + deg Q   if R is an integral domain.

It follows in particular that R[X] is an integral domain if R is.

Normally we shall write out specific polynomials using the informal notation with powers of X, using the more precise notation with tuples only when some ambiguity might otherwise result.

In the special case that R is a field, Section I.3 introduced the notion of evaluation of a polynomial P(X) at a point r in the field, thus providing a mapping P(X) ↦ P(r) from R[X] to R for each r in R. We listed a number of properties of this mapping, and they can be summarized in our present language by the statement that the mapping is a homomorphism of rings. Evaluation is a special case of a more sweeping property of polynomials given in the next proposition as a universal mapping property of R[X].

Proposition 4.24. Let R be a nonzero commutative ring with identity, and let ι : R → R[X] be the identification of R with constant polynomials. If T is any commutative ring with identity, if ϕ : R → T is a homomorphism of rings sending 1 into 1, and if t is in T, then there exists a unique homomorphism of rings Φ : R[X] → T carrying identity to identity such that Φ(ι(r)) = ϕ(r) for all r ∈ R and Φ(X) = t.
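To illustrate the mapping whose existence the proposition asserts, here is a hedged sketch for one particular choice of data: R = Z, T = Z/8Z, and ϕ equal to reduction modulo 8. The function `substitute` applies ϕ to each coefficient and replaces X by t, using Horner's rule inside the target ring; the ring operations of T are passed in explicitly. All names are hypothetical, and `poly_mul` is repeated so that the block is self-contained.

```python
def poly_mul(p, q):
    """Product of integer coefficient lists: c_N = sum over k of a_k * b_{N-k}."""
    if not p or not q:
        return []
    c = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b
    return c


def substitute(p, phi, t, add, mul, zero):
    """Compute phi(a_0) + phi(a_1) t + ... + phi(a_n) t^n in T by Horner's rule."""
    acc = zero
    for a in reversed(p):
        acc = add(mul(acc, t), phi(a))
    return acc


M = 8                                   # assumption: the target ring is Z/8Z
add_mod = lambda x, y: (x + y) % M
mul_mod = lambda x, y: (x * y) % M
reduce_mod = lambda a: a % M            # the homomorphism phi : Z -> Z/8Z

P = [-1, 0, 1]                          # X^2 - 1
Q = [2, 3]                              # 3X + 2
values_of_P = [substitute(P, reduce_mod, t, add_mod, mul_mod, 0) for t in range(M)]
roots_of_P = [t for t in range(M) if values_of_P[t] == 0]
```

For each t in Z/8Z the value of the product PQ under `substitute` equals the product of the values of P and Q, which is the multiplicativity of the substitution map in this instance. The list `roots_of_P` comes out as [1, 3, 5, 7].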


REMARKS. The mapping is called the substitution homomorphism extending ϕ and substituting t for X , and the mapping is written P(X ) → P ϕ (t). The notation means that ϕ is to be applied to the coefﬁcients of P and then X is to be replaced by t. A diagram of this homomorphism as a universal mapping property appears in Figure 4.7. In the special case that T = R and ϕ is the identity, reduces to evaluation at t, and the mapping is written P(X ) → P(t), just as in Section I.3. R ⏐ ⏐ ι

ϕ

−−−→ T

R[X ] FIGURE 4.7. Substitution homomorphism for polynomials in one indeterminate. PROOF. Deﬁne (a0 , a1 , . . . , an , 0, . . . ) = ϕ(a0 ) + ϕ(a1 )t + · · · + ϕ(an )t n . It is immediate that is a homomorphism of rings sending the identity ι(1) = (1, 0, 0, . . . ) of R[X ] to the identity ϕ(1) of T . If r is in R, then (ι(r )) = (r, 0, 0, . . . ) = ϕ(r ). Also, (X ) = (0, 1, 0, 0, . . . ) = t. This proves existence. Uniqueness follows since ι(R) and X generate R[X ] and since a homomorphism deﬁned on R[X ] is therefore determined by its values on ι(R) and X . The formulation of the proposition with the general ϕ : R → T , rather than just the identity mapping on R, allows several kinds of applications besides the routine evaluation mapping. An example of one kind occurs when R = C[X ] and ϕ : C → C[X ] is the composition of complex conjugation on C followed by the identiﬁcation of complex numbers with constant polynomials in C[X ]; the proposition then says that complex conjugation of the coefﬁcients of a member of C[X ] is a ring homomorphism. This observation simpliﬁes the solution of Problem 7 in Chapter I. Similarly one can set up matters so that the proposition shows the passage from Z[X ] to (Z/mZ)[X ] by reduction of coefﬁcients modulo m to be a ring homomorphism. Still a third kind of application is to take T in the proposition to be a ring with the same kind of universal mapping property that R[X ] has, and the consequence is an abstract characterization of R[X ]. We carry out the details below as Proposition 4.25. This result will be applied later in this section to the several-indeterminate case to show that introducing several indeterminates at once yields the same ring, up to canonical isomorphism, as introducing them one at a time. Proposition 4.25. Let R and S be nonzero commutative rings with identity, let X be an element of S, and suppose that ι : R → S is a one-one ring

5. Polynomials and Vector Spaces


homomorphism of $R$ into $S$ carrying 1 to 1. Suppose further that $(S, \iota', X')$ has the following property: whenever $T$ is a commutative ring with identity, $\varphi : R \to T$ is a homomorphism of rings sending 1 into 1, and $t$ is in $T$, then there exists a unique homomorphism $\Phi' : S \to T$ carrying identity to identity such that $\Phi'(\iota'(r)) = \varphi(r)$ for all $r \in R$ and $\Phi'(X') = t$. Then there exists a unique homomorphism of rings $\Psi : R[X] \to S$ such that $\Psi \circ \iota = \iota'$ and $\Psi(X) = X'$, and $\Psi$ is an isomorphism.

REMARK. A somewhat weaker conclusion than in the proposition is that any triple $(S, \iota', X')$ having the same universal mapping property as $(R[X], \iota, X)$ is isomorphic to $(R[X], \iota, X)$, the isomorphism being unique.

PROOF. In the universal mapping property for $S$, take $T = R[X]$, $\varphi = \iota$, and $t = X$. The hypothesis gives us a ring homomorphism $\Phi' : S \to R[X]$ with $\Phi'(1) = 1$, $\Phi' \circ \iota' = \iota$, and $\Phi'(X') = X$. Next apply Proposition 4.24 with $T = S$, $\varphi = \iota'$, and $t = X'$. We obtain a ring homomorphism $\Phi : R[X] \to S$ with $\Phi(1) = 1$, $\Phi \circ \iota = \iota'$, and $\Phi(X) = X'$. Then $\Phi' \circ \Phi$ is a ring homomorphism from $R[X]$ to itself carrying 1 to 1, fixing $X$, and having $\Phi' \circ \Phi \circ \iota = \iota$. From the uniqueness in Proposition 4.24 when $T = R[X]$, $\varphi = \iota$, and $t = X$, we see that $\Phi' \circ \Phi$ is the identity on $R[X]$. Reversing the roles of $\Phi$ and $\Phi'$ and applying the uniqueness in the universal mapping property for $S$, we see that $\Phi \circ \Phi'$ is the identity on $S$. Therefore $\Phi$ may be taken as the isomorphism $\Psi$ in the statement of the proposition. This proves existence for $\Psi$, and uniqueness follows since $\iota(R)$ and $X$ together generate $R[X]$ and since $\Psi$ is a homomorphism.

If $P$ is a polynomial over $R$ in one indeterminate and $r$ is in $R$, then $r$ is a root of $P$ if $P(r) = 0$. We know as a consequence of Corollary 1.14 that for any prime $p$, any polynomial in $\mathbb{F}_p[X]$ of degree $n \ge 1$ has at most $n$ roots. This result does not extend to $\mathbb{Z}/m\mathbb{Z}$ for all positive integers $m$: when $m = 8$, the polynomial $X^2 - 1$ has 4 roots, namely 1, 3, 5, 7. This result about $\mathbb{F}_p[X]$ has the following consequence.

Proposition 4.26. If $F$ is a field, then any finite subgroup of the multiplicative group $F^\times$ is cyclic.

PROOF. Let $C$ be a subgroup of $F^\times$ of finite order $n$. Lagrange's Theorem (Corollary 4.8) shows that the order of each element of $C$ divides $n$. With $h$ defined as the maximum order of an element of $C$, it is enough to show that $h = n$. Let $a$ be an element of order $h$. The polynomial $X^h - 1$ has at most $h$ roots by Corollary 1.14, and $a$ is one of them, by definition of "order." If $h < n$, then it follows that some member $b$ of $C$ is not a root of $X^h - 1$. The order $h'$ of $b$ is then a divisor of $n$ but cannot be a divisor of $h$ since otherwise we would have $b^h = (b^{h'})^{h/h'} = 1^{h/h'} = 1$. Consequently there exists a prime $p$ such that


IV. Groups and Group Actions

some power $p^r$ of $p$ divides $h'$ but not $h$. Let $p^s$ with $s < r$ be the exact power of $p$ dividing $h$, and write $h = mp^s$, so that $\mathrm{GCD}(m, p^r) = 1$ and $a' = a^{p^s}$ has order $m$. Put $q = h'/p^r$, so that $b' = b^q$ has order $p^r$. The proof will be completed by showing that $c = a'b'$ has order $mp^r = hp^{r-s} > h$, in contradiction to the maximality of $h$.

Let $t$ be the order of $c$. On the one hand, from
\[ c^{mp^r} = (a')^{mp^r}(b')^{mp^r} = a^{hp^r} b^{mp^r q} = a^{hp^r} b^{mh'} = (a^h)^{p^r}(b^{h'})^m = 1, \]
we see that $t$ divides $mp^r$. On the other hand, $1 = c^t$ says that $(a')^t = (b')^{-t}$. Raising both sides to the $p^r$ power gives $1 = ((b')^{p^r})^{-t} = (a')^{tp^r}$, and hence $m$ divides $tp^r$; by Corollary 1.3, $m$ divides $t$. Raising both sides of $(a')^t = (b')^{-t}$ to the $m$th power gives $1 = ((a')^m)^t = (b')^{-tm}$, and hence $p^r$ divides $tm$; by Corollary 1.3, $p^r$ divides $t$. Applying Corollary 1.4, we conclude that $mp^r$ divides $t$. Therefore $t = mp^r$, and the proof is complete.

Corollary 4.27. The multiplicative group of a finite field is cyclic.

PROOF. This is a special case of Proposition 4.26.
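As an informal numerical illustration of Proposition 4.26 and of the remark about $\mathbb{Z}/8\mathbb{Z}$, the following Python sketch searches by brute force for generators of the group of units modulo $m$. The function names are illustrative choices and are not notation from the text; the check is a sanity test on small cases, not a proof.

```python
from math import gcd

def element_order(a, m):
    """Multiplicative order of a modulo m; assumes gcd(a, m) == 1 and m > 1."""
    k, x = 1, a % m
    while x != 1:
        x = (x * a) % m
        k += 1
    return k

def unit_group(m):
    """Elements of the multiplicative group of units of Z/mZ."""
    return [a for a in range(1, m) if gcd(a, m) == 1]

def generators(m):
    """Units whose order equals the order of the whole unit group."""
    units = unit_group(m)
    return [a for a in units if element_order(a, m) == len(units)]

# F_7 is a field, so its multiplicative group (of order 6) is cyclic.
gens_mod_7 = generators(7)

# Z/8Z is not a field: X^2 - 1 has four roots there,
# and the unit group has no element of order 4.
roots_mod_8 = [a for a in range(8) if (a * a - 1) % 8 == 0]
gens_mod_8 = generators(8)

print(gens_mod_7, roots_mod_8, gens_mod_8)
```

The empty list of generators modulo 8 reflects the fact that every unit of $\mathbb{Z}/8\mathbb{Z}$ satisfies $X^2 = 1$, which is exactly the failure of the root bound that the proof relies on.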

A finite field $F$ can have a nonzero polynomial that is 0 at every element of $F$. Indeed, every element of $\mathbb{F}_p$ is a root of $X^p - X$, as a consequence of Fermat's Little Theorem. It is for this reason that it is unwise to confuse a polynomial in an indeterminate with a "polynomial function."

Let us make the notion of a polynomial function of one variable rigorous. If $P(X)$ is a polynomial with coefficients in the commutative ring $R$ with identity, then Proposition 4.24 gives us an evaluation homomorphism $P \mapsto P(r)$ for each $r$ in $R$. The function $r \mapsto P(r)$ from $R$ into $R$ is the polynomial function associated to the polynomial $P$. This function is a member of the commutative ring of all $R$-valued functions on $R$, and the mapping $P \mapsto (r \mapsto P(r))$ is a homomorphism of rings. What we know from Corollary 1.14 is that this homomorphism is one-one if $R$ is an infinite field. A negative result is that if $R$ is a finite commutative ring with identity, then $\prod_{r \in R}(X - r)$ is a nonzero polynomial that maps to the 0 function, and hence the homomorphism is not one-one. A more general positive result than the one above for infinite fields is the following.

Proposition 4.28.
(a) If $R$ is a nonzero commutative ring with identity and $P(X)$ is a member of $R[X]$ with a root $r$, then $P(X) = (X - r)Q(X)$ for some $Q(X)$ in $R[X]$.
(b) If $R$ is an integral domain, then a nonzero member of $R[X]$ of degree $n$ has at most $n$ roots.
(c) If $R$ is an infinite integral domain, then the ring homomorphism of $R[X]$ to the ring of polynomial functions from $R$ to $R$, given by evaluation, is one-one.
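The distinction between a polynomial and its polynomial function can be seen concretely. The sketch below, in Python, stores a polynomial over $\mathbb{F}_p$ as its list of coefficients, in the spirit of the definition of $R[X]$ by coefficient sequences. The helper names and the choice $p = 5$ are assumptions made only for the illustration.

```python
def poly_eval(coeffs, r, p):
    """Evaluate a_0 + a_1 X + ... + a_n X^n at r modulo p (Horner's rule)."""
    value = 0
    for a in reversed(coeffs):
        value = (value * r + a) % p
    return value

def poly_mul(f, g, p):
    """Product of two coefficient lists, with coefficients reduced modulo p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out

p = 5
# X^p - X as the coefficient list [a_0, a_1, ..., a_p]; -1 is stored as p - 1.
x_p_minus_x = [0, p - 1] + [0] * (p - 2) + [1]

# The polynomial has nonzero coefficients ...
is_nonzero_polynomial = any(c != 0 for c in x_p_minus_x)
# ... yet the associated polynomial function vanishes on all of F_p.
values = [poly_eval(x_p_minus_x, r, p) for r in range(p)]

# The product of (X - r) over all r in F_p is a polynomial of the same kind.
product = [1]
for r in range(p):
    product = poly_mul(product, [(-r) % p, 1], p)

print(is_nonzero_polynomial, values, product == x_p_minus_x)
```

In this example the two nonzero polynomials with zero polynomial function coincide: the product of the factors $X - r$ over $\mathbb{F}_5$ is $X^5 - X$.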


PROOF. For (a), we proceed by induction on the degree of $P$, the base case of the induction being degree $\le 0$. If the conclusion has been proved for degree $< n$ with $n \ge 1$, let the leading term of $P$ be $a_n X^n$. Then $P(X) = a_n(X - r)^n + A(X)$ with $\deg A < n$. Evaluation at $r$ gives, by virtue of Proposition 4.24, $0 = 0 + A(r)$. By the inductive hypothesis, $A(X) = (X - r)B(X)$. Then $P(X) = (X - r)Q(X)$ with $Q(X) = a_n(X - r)^{n-1} + B(X)$, and the induction is complete.

For (b), suppose that $P(X)$ has degree $n$ and has at least $n + 1$ distinct roots $r_1, \dots, r_{n+1}$. Part (a) shows that $P(X) = (X - r_1)P_1(X)$ with $\deg P_1 = n - 1$. Also, $0 = P(r_2) = (r_2 - r_1)P_1(r_2)$. Since $r_2 - r_1 \ne 0$ and since $R$ has no zero divisors, $P_1(r_2) = 0$. Part (a) then shows that $P_1(X) = (X - r_2)P_2(X)$, and substitution gives $P(X) = (X - r_1)(X - r_2)P_2(X)$. Continuing in this way, we obtain $P(X) = (X - r_1) \cdots (X - r_n)P_n(X)$ with $\deg P_n = 0$. Since $P \ne 0$, $P_n \ne 0$. So $P_n$ is a nonzero constant polynomial $P_n(X) = c \ne 0$. Evaluating at $r_{n+1}$, we obtain $0 = (r_{n+1} - r_1) \cdots (r_{n+1} - r_n)c$ with each factor nonzero, in contradiction to the fact that $R$ is an integral domain.

For (c), a polynomial in the kernel of the ring homomorphism has every member of $R$ as a root. If $R$ is infinite, (b) shows that such a polynomial is necessarily the zero polynomial. Thus the kernel is 0, and the ring homomorphism has to be one-one.

Let us turn our attention to polynomials in several indeterminates. Fix the nonzero commutative ring $R$ with identity, and let $n$ be a positive integer. Informally a polynomial over $R$ in $n$ indeterminates is to be a finite sum
\[ \sum_{j_1 \ge 0, \dots, j_n \ge 0} r_{j_1, \dots, j_n} X_1^{j_1} \cdots X_n^{j_n} \]
with each $r_{j_1, \dots, j_n}$ in $R$. To make matters precise, we work just with the system of coefficients, just as in the case of one indeterminate. Let $J$ be the set of integers $\ge 0$, and let $J^n$ be the set of $n$-tuples of elements of $J$. A member of $J^n$ may be written as $j = (j_1, \dots, j_n)$. Addition of members of $J^n$ is defined coordinate by coordinate. Thus $j + j' = (j_1 + j_1', \dots, j_n + j_n')$ if $j = (j_1, \dots, j_n)$ and $j' = (j_1', \dots, j_n')$. A polynomial in $n$ indeterminates with coefficients in $R$ is a function $f : J^n \to R$ such that $f(j) \ne 0$ for only finitely many $j \in J^n$. Temporarily let us write $S$ for the set of all such polynomials for a particular $n$. If $f$ and $g$ are two such polynomials, their sum $h$ and product $k$ are the polynomials defined by
\[ h(j) = f(j) + g(j), \qquad k(i) = \sum_{j + j' = i} f(j)g(j'). \]
Under these definitions, $S$ is a commutative ring.
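The definition of $S$ as functions on $J^n$ with finitely many nonzero values translates directly into a data structure. The following Python sketch, with $R = \mathbb{Z}$ and $n = 2$ chosen only for illustration, stores a polynomial as a dictionary from exponent tuples to nonzero coefficients; the function names are assumptions, not notation from the text.

```python
from collections import defaultdict

def poly_add(f, g):
    """Sum h(j) = f(j) + g(j); zero coefficients are dropped."""
    h = defaultdict(int)
    for j, c in list(f.items()) + list(g.items()):
        h[j] += c
    return {j: c for j, c in h.items() if c != 0}

def poly_mul(f, g):
    """Product k(i) = sum of f(j) * g(j') over all pairs with j + j' = i."""
    k = defaultdict(int)
    for j, a in f.items():
        for jp, b in g.items():
            i = tuple(s + t for s, t in zip(j, jp))  # coordinatewise sum in J^n
            k[i] += a * b
    return {i: c for i, c in k.items() if c != 0}

# n = 2, R = Z: keys are exponent pairs (j_1, j_2).
X1 = {(1, 0): 1}
X2 = {(0, 1): 1}
f = poly_add(X1, X2)               # X1 + X2
g = poly_add(X1, {(0, 1): -1})     # X1 - X2
fg = poly_mul(f, g)                # X1^2 - X2^2

print(fg)
```

Dropping zero coefficients keeps each polynomial in a canonical form, so that equality of dictionaries is equality of polynomials.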


Define a mapping $\iota : R \to S$ by
\[ \iota(r)(j) = \begin{cases} r & \text{if } j = (0, \dots, 0), \\ 0 & \text{otherwise.} \end{cases} \]
Then $\iota$ is a one-one homomorphism of rings, $\iota(0)$ is the zero element of $S$ and is called simply 0, and $\iota(1)$ is a multiplicative identity for $S$. The polynomials in the image of $\iota$ are called the constant polynomials.

For $1 \le k \le n$, let $e_k$ be the member of $J^n$ that is 1 in the $k$th place and is 0 elsewhere. Define $X_k$ to be the polynomial that assigns 1 to $e_k$ and assigns 0 to all other members of $J^n$. We say that $X_k$ is an indeterminate. If $j = (j_1, \dots, j_n)$ is in $J^n$, define $X^j$ to be the product
\[ X^j = X_1^{j_1} \cdots X_n^{j_n}. \]
If $r$ is in $R$, we allow ourselves to abbreviate $\iota(r)X^j$ as $rX^j$, and any such polynomial is called a monomial. The monomial $rX^j$ is the polynomial that assigns $r$ to $j$ and assigns 0 to all other members of $J^n$. Then it follows immediately from the definitions that each polynomial has a unique expansion as a finite sum of nonzero monomials. Thus the most general member of $S$ is of the form $\sum_{j \in J^n} r_j X^j$ with only finitely many nonzero terms. This is called the monomial expansion of the given polynomial.

We may now write $R[X_1, \dots, X_n]$ for $S$. A polynomial $\sum_{j \in J^n} r_j X^j$ may be conveniently abbreviated as $P$ or as $P(X)$ or as $P(X_1, \dots, X_n)$ when its monomial expansion is either understood or irrelevant. The degree of the 0 polynomial is defined for this section to be $-\infty$, and the degree of any monomial $rX^j$ with $r \ne 0$ is defined to be the integer
\[ |j| = j_1 + \cdots + j_n \qquad \text{if } j = (j_1, \dots, j_n). \]
Finally the degree of any nonzero polynomial $P$, denoted by $\deg P$, is defined to be the maximum of the degrees of the terms in its monomial expansion.

If all the nonzero monomials in the monomial expansion of a polynomial $P$ have the same degree $d$, then $P$ is said to be homogeneous of degree $d$. Under these definitions the 0 polynomial has degree $-\infty$ but is homogeneous of every degree. If $P$ and $Q$ are homogeneous polynomials of degrees $d$ and $d'$, then $PQ$ is homogeneous of degree $d + d'$ (and possibly equal to the 0 polynomial). In any event, by grouping terms in the monomial expansion of a polynomial according to their degree, we see that every polynomial is uniquely the sum of nonzero homogeneous polynomials of distinct degrees. Let us call this the homogeneous-polynomial expansion of the given polynomial. Let us expand two such nonzero polynomials $P$ and $Q$ in this fashion, writing $P = P_{d_1} + \cdots + P_{d_k}$


and $Q = Q_{d_1'} + \cdots + Q_{d_l'}$ with $d_1 < \cdots < d_k$ and $d_1' < \cdots < d_l'$. Then we see directly that
\[ \deg(P + Q) \le \max(\deg P, \deg Q), \qquad \deg(PQ) \le \deg P + \deg Q. \]
In the formula for $\deg(P + Q)$, the term that is potentially of largest degree is $P_{d_k} + Q_{d_l'}$, and it is of degree $\max(\deg P, \deg Q)$ if $\deg P \ne \deg Q$. In the formula for $\deg(PQ)$, the term that is potentially of largest degree is $P_{d_k} Q_{d_l'}$. It is homogeneous of degree $d_k + d_l'$, but it could be 0. Some proof is required that it is not 0 if $R$ is an integral domain, as follows.

Proposition 4.29. If $R$ is an integral domain, then $R[X_1, \dots, X_n]$ is an integral domain.

PROOF. Let $P$ and $Q$ be nonzero homogeneous polynomials with $\deg P = d$ and $\deg Q = d'$. We are to prove that $PQ \ne 0$. We introduce an ordering on the set of all members $j$ of $J^n$, saying $j = (j_1, \dots, j_n) > j' = (j_1', \dots, j_n')$ if there is some $k$ such that $j_i = j_i'$ for $i < k$ and $j_k > j_k'$. In the monomial expansion of $P$ as $P(X) = \sum_{|j| = d} a_j X^j$, let $i$ be the largest $n$-tuple $j$ in the ordering such that $a_j \ne 0$. Similarly with $Q(X) = \sum_{|j'| = d'} b_{j'} X^{j'}$, let $i'$ be the largest $n$-tuple $j'$ in the ordering such that $b_{j'} \ne 0$. Then
\[ P(X)Q(X) = a_i b_{i'} X^{i + i'} + \sum_{\substack{j,\, j' \text{ with} \\ (j, j') \ne (i, i')}} a_j b_{j'} X^{j + j'}, \]
and all terms in the sum $\sum_{j, j'}$ on the right side have $j + j' < i + i'$. Thus $a_i b_{i'} X^{i + i'}$ is the only term in the monomial expansion of $P(X)Q(X)$ involving the monomial $X^{i + i'}$. Since $R$ is an integral domain and $a_i$ and $b_{i'}$ are nonzero, $a_i b_{i'}$ is nonzero. Thus $P(X)Q(X)$ is nonzero.

Proposition 4.30. Let $R$ be a nonzero commutative ring with identity, let $R[X_1, \dots, X_n]$ be the ring of polynomials in $n$ indeterminates, and define $\iota : R \to R[X_1, \dots, X_n]$ to be the identification of $R$ with constant polynomials. If $T$ is any commutative ring with identity, if $\varphi : R \to T$ is a homomorphism of rings sending 1 into 1, and if $t_1, \dots, t_n$ are in $T$, then there exists a unique homomorphism $\Phi : R[X_1, \dots, X_n] \to T$ carrying identity to identity such that $\Phi(\iota(r)) = \varphi(r)$ for all $r \in R$ and $\Phi(X_j) = t_j$ for $1 \le j \le n$.

REMARKS. The mapping $\Phi$ is called the substitution homomorphism extending $\varphi$ and substituting $t_j$ for $X_j$ for $1 \le j \le n$, and the mapping is written $P(X_1, \dots, X_n) \mapsto P^{\varphi}(t_1, \dots, t_n)$. The notation means that $\varphi$ is to be applied to each coefficient of $P$ and then $X_1, \dots, X_n$ are to be replaced by $t_1, \dots, t_n$.
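The substitution homomorphism of Proposition 4.30 can be tried out in a small case. In the Python sketch below, $R = \mathbb{Z}$, $T = \mathbb{Z}/7\mathbb{Z}$, $\varphi$ is reduction modulo 7, and polynomials are dictionaries from exponent tuples to coefficients. These choices, the sample polynomials, and the function names are all assumptions made for illustration; the check of multiplicativity covers one pair of polynomials only.

```python
def poly_mul(f, g):
    """Product in Z[X_1, ..., X_n]; polynomials are dicts {exponent tuple: coefficient}."""
    out = {}
    for j, a in f.items():
        for jp, b in g.items():
            i = tuple(s + t for s, t in zip(j, jp))
            out[i] = out.get(i, 0) + a * b
    return {i: c for i, c in out.items() if c != 0}

def substitute_mod(P, t, m):
    """Phi(P) = sum of phi(a_j) * t_1^{j_1} ... t_n^{j_n}, with phi = reduction mod m."""
    total = 0
    for j, a in P.items():
        term = a % m                              # phi applied to the coefficient
        for t_k, j_k in zip(t, j):
            term = (term * pow(t_k, j_k, m)) % m  # X_k replaced by t_k
        total = (total + term) % m
    return total

m, t = 7, (3, 5)
P = {(2, 1): 3, (0, 0): 10}      # 3 X1^2 X2 + 10
Q = {(1, 0): 1, (0, 1): -4}      # X1 - 4 X2

# Phi(PQ) should agree with Phi(P) Phi(Q) in Z/7Z.
lhs = substitute_mod(poly_mul(P, Q), t, m)
rhs = (substitute_mod(P, t, m) * substitute_mod(Q, t, m)) % m

print(lhs, rhs)
```

Applying $\varphi$ to each coefficient before multiplying by the powers of the $t_k$ mirrors the description of $P^{\varphi}(t_1, \dots, t_n)$ in the remarks.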


A diagram of this homomorphism as a universal mapping property appears in Figure 4.8. In the special case that $T = R$ and $\varphi$ is the identity, $\Phi$ reduces to evaluation at the point $(t_1, \dots, t_n)$ of $R \times \cdots \times R$ (cf. Example 3 of homomorphisms in Section 4), and the mapping is written $P(X_1, \dots, X_n) \mapsto P(t_1, \dots, t_n)$.

[FIGURE 4.8. Substitution homomorphism for polynomials in $n$ indeterminates. The diagram shows $\iota : R \to R[X_1, \dots, X_n]$, $\varphi : R \to T$, and the induced map $\Phi : R[X_1, \dots, X_n] \to T$.]

PROOF. If $P(X_1, \dots, X_n) = \sum_{j_1 \ge 0, \dots, j_n \ge 0} a_{j_1, \dots, j_n} X_1^{j_1} \cdots X_n^{j_n}$ is the monomial expansion of a member $P$ of $R[X_1, \dots, X_n]$, then $\Phi(P)$ is defined to be the corresponding finite sum $\sum_{j_1 \ge 0, \dots, j_n \ge 0} \varphi(a_{j_1, \dots, j_n})\, t_1^{j_1} \cdots t_n^{j_n}$. Existence readily follows, and uniqueness follows since $\iota(R)$ and $X_1, \dots, X_n$ generate $R[X_1, \dots, X_n]$ and since $\Phi$ is a homomorphism.

Corollary 4.31. If $R$ is a nonzero commutative ring with identity, then $R[X_1, \dots, X_{n-1}][X_n]$ is isomorphic as a ring to $R[X_1, \dots, X_n]$.

REMARK. The proof will show that the isomorphism is the expected one.

PROOF. In the notation with $n$-tuples and $J^n$, any $(n - 1)$-tuple may be identified with an $n$-tuple by adjoining 0 as its $n$th coordinate, and in this way, every monomial in $R[X_1, \dots, X_{n-1}]$ can be regarded as a monomial in $R[X_1, \dots, X_n]$. The extension of this mapping to sums gives us a one-one homomorphism of rings $\iota' : R[X_1, \dots, X_{n-1}] \to R[X_1, \dots, X_n]$. We are going to use Proposition 4.25 to prove the isomorphism of rings $R[X_1, \dots, X_{n-1}][X_n] \cong R[X_1, \dots, X_n]$. In the notation of that proposition, the role of $R$ is played by $R[X_1, \dots, X_{n-1}]$, we take $S = R[X_1, \dots, X_n]$, and we have constructed $\iota'$. We are to show that $(S, \iota', X_n)$ satisfies a certain universal mapping property. Thus suppose that $T$ is a commutative ring with identity, that $t$ is in $T$, and that $\varphi' : R[X_1, \dots, X_{n-1}] \to T$ is a homomorphism of rings carrying identity to identity. We shall apply Proposition 4.30 in order to obtain the desired homomorphism $\Phi : S \to T$. Let $\iota_{n-1} : R \to R[X_1, \dots, X_{n-1}]$ be the identification of $R$ with constant polynomials in $R[X_1, \dots, X_{n-1}]$, and let $\iota_n = \iota' \circ \iota_{n-1}$ be the identification of $R$ with constant polynomials in $S$. Define $\varphi : R \to T$ by $\varphi = \varphi' \circ \iota_{n-1}$, and take $t_n = t$ and $t_j = \varphi'(X_j)$ for $1 \le j \le n - 1$. Then Proposition 4.30 produces a homomorphism of rings $\Phi : S \to T$ with $\Phi(\iota_n(r)) = \varphi(r)$ for $r \in R$, $\Phi(\iota'(X_j)) = \varphi'(X_j)$ for $1 \le j \le n - 1$, and $\Phi(X_n) = t_n$. The equations
\[ \Phi(\iota'(\iota_{n-1}(r))) = \Phi(\iota_n(r)) = \varphi(r) = \varphi'(\iota_{n-1}(r)) \qquad \text{and} \qquad \Phi(\iota'(X_j)) = \varphi'(X_j) \]


show that $\Phi \circ \iota' = \varphi'$ on $R[X_1, \dots, X_{n-1}]$. Also, $\Phi(X_n) = t_n = t$. Thus the mapping $\Phi$ sought by Proposition 4.25 exists. It is unique since $R[X_1, \dots, X_{n-1}]$ and $X_n$ together generate $S$. The conclusion from Proposition 4.25 is that $S$ is isomorphic to $R[X_1, \dots, X_{n-1}][X_n]$ via the expected isomorphism of rings.

We conclude the discussion of polynomials in several variables by making the notion of a polynomial function of several variables rigorous. If $P(X_1, \dots, X_n)$ is a polynomial in $n$ indeterminates with coefficients in the commutative ring $R$ with identity, then Proposition 4.30 gives us an evaluation homomorphism $P \mapsto P(r_1, \dots, r_n)$ for each $n$-tuple $(r_1, \dots, r_n)$ of members of $R$. The function $(r_1, \dots, r_n) \mapsto P(r_1, \dots, r_n)$ from $R \times \cdots \times R$ into $R$ is the polynomial function associated to the polynomial $P$. This function is a member of the commutative ring of all $R$-valued functions on $R \times \cdots \times R$, and the mapping $P \mapsto ((r_1, \dots, r_n) \mapsto P(r_1, \dots, r_n))$ is a homomorphism of rings.

Corollary 4.32. If $R$ is an infinite integral domain, then the ring homomorphism of $R[X_1, \dots, X_n]$ to polynomial functions from $R \times \cdots \times R$ to $R$, given by evaluation, is one-one.

REMARK. This result extends Proposition 4.28 to several indeterminates.

PROOF. We proceed by induction on $n$, the case $n = 1$ being handled by Proposition 4.28. Assume the result for $n - 1$ indeterminates. If $P \ne 0$ is in $R[X_1, \dots, X_n]$, Corollary 4.31 allows us to write
\[ P(X_1, \dots, X_n) = \sum_{i=0}^{k} P_i(X_1, \dots, X_{n-1}) X_n^i \]
for some $k$, with each $P_i$ in $R[X_1, \dots, X_{n-1}]$ and with $P_k(X_1, \dots, X_{n-1}) \ne 0$. By the inductive hypothesis, $P_k(r_1, \dots, r_{n-1})$ is nonzero for some elements $r_1, \dots, r_{n-1}$ of $R$. So the polynomial $\sum_{i=0}^{k} P_i(r_1, \dots, r_{n-1}) X_n^i$ in $R[X_n]$ is not the 0 polynomial, and Proposition 4.28 shows that it is not 0 when evaluated at some $r_n$. Then $P(r_1, \dots, r_n) \ne 0$.

It is possible also to introduce polynomial rings in infinitely many variables. These will play roles only as counterexamples in this book, and thus we shall not stop to treat them in detail.

We complete this section with some remarks about vector spaces. The definition of a vector space over a general field $F$ remains the same as in Section II.1, where $F$ is assumed to be $\mathbb{Q}$ or $\mathbb{R}$ or $\mathbb{C}$. We shall make great use of the fact that all the results in Chapter II concerning vector spaces remain valid when $\mathbb{Q}$ or $\mathbb{R}$ or $\mathbb{C}$ is replaced by a general field $F$. The proofs need no adjustments, and it is not necessary to write out the details. For the moment we make only the following application of vector spaces over general fields, but the extended theory of vector spaces will play an important role in most of the remaining chapters of this book.

Proposition 4.33. If $F$ is a finite field, then the number of elements in $F$ is a power of a prime.

REMARK. We return to this matter in Chapter IX, showing at that time that for each prime power $p^n > 1$, there is one and only one field with $p^n$ elements, up to isomorphism.

PROOF. The characteristic of $F$ cannot be 0 since $F$ is finite, and hence it is some prime $p$. Denote the prime field of $F$ by $\mathbb{F}_p$. By restricting the multiplication so that it is defined only on $\mathbb{F}_p \times F$, we make $F$ into a vector space over $\mathbb{F}_p$, necessarily finite-dimensional. Proposition 2.18 shows that $F$ is isomorphic as a vector space to the space $(\mathbb{F}_p)^n$ of $n$-dimensional column vectors for some $n$, and hence $F$ must have $p^n$ elements.

6. Group Actions and Examples

Let $X$ be a nonempty set, let $\mathcal{F}(X)$ be the group of invertible functions from $X$ onto itself, the group operation being composition, and let $G$ be a group. A group action of $G$ on $X$ is a homomorphism of $G$ into $\mathcal{F}(X)$. Examples 5–9 of groups in Section 1 were in fact subgroups of various groups $\mathcal{F}(X)$ and are therefore examples of group actions. Thus every group of permutations of $\{1, \dots, n\}$, every dihedral group acting on $\mathbb{R}^2$, and every general linear group or subgroup acting on a finite-dimensional vector space over $\mathbb{Q}$ or $\mathbb{R}$ or $\mathbb{C}$ or an arbitrary field $F$ provides an example. So do the orthogonal and unitary groups acting on $\mathbb{R}^n$ and $\mathbb{C}^n$, as well as the automorphism group of any number field. We saw an indication in Section 1 that many early examples of groups arose in this way.

One source of examples that is of some importance and was not listed in Section 1 occurs in the geometry of $\mathbb{R}^2$. The translations in $\mathbb{R}^2$, together with the rotations about arbitrary points of $\mathbb{R}^2$ and the reflections about arbitrary lines in $\mathbb{R}^2$, form a group $G$ of rigid motions of the plane.[11] This group $G$ is a subgroup of $\mathcal{F}(\mathbb{R}^2)$, and thus $G$ acts on $\mathbb{R}^2$. More generally, whenever a nonempty set $X$ has a notion of distance, the set of isometries of $X$, i.e., the distance-preserving members of $\mathcal{F}(X)$, forms a subgroup of $\mathcal{F}(X)$, and thus the group of isometries of $X$ acts on $X$.

[11] One can show that $G$ is the full group of rigid motions of $\mathbb{R}^2$, but this fact will not concern us.


At any rate a group action $\tau$ of $G$ on $X$, being a homomorphism of $G$ into $\mathcal{F}(X)$, is of the form $g \mapsto \tau_g$, where $\tau_g$ is in $\mathcal{F}(X)$ and $\tau_{g_1 g_2} = \tau_{g_1}\tau_{g_2}$. There is an equivalent way of formulating matters that does not so obviously involve the notion of a homomorphism. Namely, we write $\tau_g(x) = gx$. In this notation the group action becomes a function $G \times X \to X$ with $(g, x) \mapsto gx$ such that
(i) $(g_1 g_2)x = g_1(g_2 x)$ for all $g_1$ and $g_2$ in $G$ and for all $x$ in $X$ (from the fact that $\tau_{g_1 g_2} = \tau_{g_1}\tau_{g_2}$),
(ii) $1x = x$ for all $x$ in $X$ (from the fact that $\tau_1 = 1$).
Conversely if $G \times X \to X$ satisfies (i) and (ii), then the formulas $x = 1x = (gg^{-1})x = g(g^{-1}x)$ and $x = 1x = (g^{-1}g)x = g^{-1}(gx)$ show that the function $x \mapsto gx$ from $X$ to itself is invertible with inverse $x \mapsto g^{-1}x$. Consequently the definition $\tau_g(x) = gx$ makes $g \mapsto \tau_g$ a function from $G$ into $\mathcal{F}(X)$, and (i) shows that $\tau$ is a homomorphism. Thus (i) and (ii) indeed give us an equivalent formulation of the notion of a group action. Both formulations are useful.

Quite often the homomorphism $G \to \mathcal{F}(X)$ of a group action is one-one, and then $G$ can be regarded as a subgroup of $\mathcal{F}(X)$. Here is an important geometric example in which the homomorphism is not one-one.

EXAMPLE. Linear fractional transformations. Let $X = \mathbb{C} \cup \{\infty\}$, a set that becomes the Riemann sphere in complex analysis. The group $G = GL(2, \mathbb{C})$ acts on $X$ by the linear fractional transformations
\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix}(z) = \frac{az + b}{cz + d}, \]
the understanding being that the image of $\infty$ is $ac^{-1}$ and the image of $-dc^{-1}$ is $\infty$, just as if we were to pass to a limit in each case. Property (ii) of a group action is clear. To verify (i), we simply calculate that
\[ \begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix}\left( \begin{pmatrix} a & b \\ c & d \end{pmatrix}(z) \right) = \frac{a'\,\frac{az + b}{cz + d} + b'}{c'\,\frac{az + b}{cz + d} + d'} = \frac{(a'a + b'c)z + (a'b + b'd)}{(c'a + d'c)z + (c'b + d'd)} = \left( \begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} \right)(z), \]
and indeed we have a group action. Let $SL(2, \mathbb{R})$ be the subgroup of real matrices in $GL(2, \mathbb{C})$ of determinant 1, and let $Y$ be the subset of $X$ where $\operatorname{Im} z > 0$, not including $\infty$. The members of $SL(2, \mathbb{R})$ carry the subset $Y$ into itself, as we see from the computation
\[ \operatorname{Im} \frac{az + b}{cz + d} = \operatorname{Im} \frac{(az + b)(c\bar{z} + d)}{|cz + d|^2} = \frac{\operatorname{Im}(adz + bc\bar{z})}{|cz + d|^2} = \frac{(ad - bc)\operatorname{Im} z}{|cz + d|^2} = \frac{\operatorname{Im} z}{|cz + d|^2}. \]
Since the effect of a matrix $g^{-1}$ is to invert the effect of $g$, and since both $g$ and $g^{-1}$ carry $Y$ to itself, we conclude that $SL(2, \mathbb{R})$ acts on $Y = \{z \in \mathbb{C} \mid \operatorname{Im} z > 0\}$ by linear fractional transformations. In similar fashion one can verify that the subgroup
\[ SU(1, 1) = \left\{ \begin{pmatrix} \alpha & \beta \\ \bar{\beta} & \bar{\alpha} \end{pmatrix} \;\middle|\; \alpha \in \mathbb{C},\ \beta \in \mathbb{C},\ |\alpha|^2 - |\beta|^2 = 1 \right\} \]
of $GL(2, \mathbb{C})$ acts on $\{z \in \mathbb{C} \mid |z| < 1\}$ by linear fractional transformations.

One group action can yield many others. For example, from an action of $G$ on $X$, we can construct an action on the space of all complex-valued functions on $X$. The definition is $(gf)(x) = f(g^{-1}x)$, the use of the inverse being necessary in order to verify property (i) of a group action:
\[ ((g_1 g_2)f)(x) = f((g_1 g_2)^{-1}x) = f((g_2^{-1}g_1^{-1})x) = f(g_2^{-1}(g_1^{-1}x)) = (g_2 f)(g_1^{-1}x) = (g_1(g_2 f))(x). \]
There is nothing special about the complex numbers as range for the functions here. We can allow any set as range, and we can even allow $G$ to act on the range, as well as on the domain.[12] If $G$ acts on $X$ and $Y$, then the set of functions from $X$ to $Y$ inherits a group action under the definition $(gf)(x) = g(f(g^{-1}x))$, as is easily checked. In other words, we are to use $g^{-1}$ where the domain enters the formula and we are to use $g$ where the range enters the formula.

If $V$ is a vector space over a field $F$, a representation of $G$ on $V$ is a group action of $G$ on $V$ by linear functions. Specifically for each $g \in G$, $\tau_g$ is to be a member of the group of invertible linear maps from $V$ into itself. Usually one writes $\tau(g)$ instead of $\tau_g$ in representation theory, and thus the condition is that $\tau(g)$ is to be linear for each $g \in G$ and we are to have $\tau(1) = 1$ and $\tau(g_1 g_2) = \tau(g_1)\tau(g_2)$ for all $g_1$ and $g_2$. There are interesting examples both when $V$ is finite-dimensional and when $V$ is infinite-dimensional.[13]

[12] When $\mathbb{C}$ was used as range in the previous display, the group action of $G$ on $\mathbb{C}$ was understood to be trivial in the sense that $gz = z$ for every $g$ in $G$ and $z$ in $\mathbb{C}$.

[13] In some settings a continuity assumption may be added to the definition of a representation, or the field $F$ may be restricted in some way. We impose no such assumption here at this time.
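The role of the inverse in the definition $(gf)(x) = f(g^{-1}x)$ can be checked by machine in a small case. The following Python sketch takes $G$ to be the symmetric group on $\{0, 1, 2\}$ acting in the natural way and compares the correct rule with the tempting rule that omits the inverse. The encoding of permutations as tuples and all function names are assumptions made for the illustration.

```python
from itertools import permutations

G = list(permutations(range(3)))   # S_3 as tuples g with g[x] = image of x

def compose(g, h):
    """The product gh: apply h first, then g."""
    return tuple(g[h[x]] for x in range(len(h)))

def inverse(g):
    """Inverse permutation of g."""
    inv = [0] * len(g)
    for x, y in enumerate(g):
        inv[y] = x
    return tuple(inv)

def act(g, f):
    """(g f)(x) = f(g^{-1} x), with the function f stored as a tuple of values."""
    g_inv = inverse(g)
    return tuple(f[g_inv[x]] for x in range(len(f)))

def act_without_inverse(g, f):
    """The tempting rule (g f)(x) = f(g x), which is not a group action in general."""
    return tuple(f[g[x]] for x in range(len(f)))

f = ("a", "b", "c")                # a function on X = {0, 1, 2}

# Property (i): (g1 g2) f = g1 (g2 f) for all g1, g2.
property_i_holds = all(
    act(compose(g1, g2), f) == act(g1, act(g2, f)) for g1 in G for g2 in G
)
naive_rule_holds = all(
    act_without_inverse(compose(g1, g2), f)
    == act_without_inverse(g1, act_without_inverse(g2, f))
    for g1 in G for g2 in G
)

print(property_i_holds, naive_rule_holds)
```

The rule without the inverse reverses the order of multiplication, so it fails property (i) as soon as the group is nonabelian and the function separates points, as here.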


EXAMPLES OF REPRESENTATIONS.

(1) If $m \ge 1$, then the additive group $\mathbb{Z}/m\mathbb{Z}$ acts linearly on $\mathbb{R}^2$ by
\[ \tau(k) = \begin{pmatrix} \cos\frac{2\pi k}{m} & -\sin\frac{2\pi k}{m} \\ \sin\frac{2\pi k}{m} & \cos\frac{2\pi k}{m} \end{pmatrix}, \qquad k \in \{0, 1, 2, \dots, m - 1\}. \]
Each $\tau(k)$ is a rotation matrix about the origin through an angle that is a multiple of $2\pi/m$. These transformations of $\mathbb{R}^2$ form a subgroup of the group of symmetries of a regular $m$-gon centered at the origin in $\mathbb{R}^2$.

(2) The dihedral group $D_3$ acts linearly on $\mathbb{R}^2$ with
\[ \tau(1) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \tau(2\ 3) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad \tau(1\ 2\ 3) = \begin{pmatrix} -\frac{1}{2} & -\frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{pmatrix}, \]
\[ \tau(1\ 2) = \begin{pmatrix} -\frac{1}{2} & \frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & \frac{1}{2} \end{pmatrix}, \quad \tau(1\ 3) = \begin{pmatrix} -\frac{1}{2} & -\frac{\sqrt{3}}{2} \\ -\frac{\sqrt{3}}{2} & \frac{1}{2} \end{pmatrix}, \quad \tau(1\ 3\ 2) = \begin{pmatrix} -\frac{1}{2} & \frac{\sqrt{3}}{2} \\ -\frac{\sqrt{3}}{2} & -\frac{1}{2} \end{pmatrix}. \]
Each of these matrices carries into itself the equilateral triangle with center at the origin and one vertex at $(1, 0)$. To obtain these matrices, we number the vertices #1, #2, #3 counterclockwise with the vertex at $(1, 0)$ as #1.

(3) The symmetric group $S_n$ acts linearly on $\mathbb{R}^n$ by permuting the indices of standard basis vectors. For example, with $n = 3$, we have $(1\ 3)e_1 = e_3$, $(1\ 3)e_2 = e_2$, etc. The matrices may be computed by the techniques of Section II.3. With $n = 3$, we obtain, for example,
\[ (1\ 3) \mapsto \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \qquad \text{and} \qquad (1\ 2\ 3) \mapsto \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \]

(4) If $G$ acts on a set $X$, then the corresponding action $(gf)(x) = f(g^{-1}x)$ on complex-valued functions is a representation on the vector space of all complex-valued functions on $X$. This vector space is infinite-dimensional if $X$ is an infinite set. The linearity of the action on functions follows from the definitions of addition and scalar multiplication of functions. In fact, let functions $f_1$ and $f_2$ be given, and let $c$ be a scalar. Then
\[ (g(f_1 + f_2))(x) = (f_1 + f_2)(g^{-1}x) = f_1(g^{-1}x) + f_2(g^{-1}x) = (gf_1)(x) + (gf_2)(x) = (gf_1 + gf_2)(x) \]
and
\[ (g(cf_1))(x) = (cf_1)(g^{-1}x) = c(f_1(g^{-1}x)) = c((gf_1)(x)) = (c(gf_1))(x). \]
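The permutation representation of Example 3 lends itself to a direct computational check. In the Python sketch below, permutations of $\{1, 2, 3\}$ are encoded 0-based as tuples, so that index 0 plays the role of 1; this encoding and the function names are assumptions made for the illustration.

```python
from itertools import permutations

def perm_matrix(sigma):
    """Matrix of the linear map e_j -> e_{sigma(j)}: column j has its 1 in row sigma[j]."""
    n = len(sigma)
    return [[1 if sigma[j] == i else 0 for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def compose(s, t):
    """The product st: apply t first, then s."""
    return tuple(s[t[x]] for x in range(len(t)))

S3 = list(permutations(range(3)))   # 0-based: index 0 plays the role of 1, and so on

# tau(s t) = tau(s) tau(t) for all s and t in S_3.
is_homomorphism = all(
    perm_matrix(compose(s, t)) == mat_mul(perm_matrix(s), perm_matrix(t))
    for s in S3 for t in S3
)

transposition_1_3 = perm_matrix((2, 1, 0))   # the transposition (1 3)
cycle_1_2_3 = perm_matrix((1, 2, 0))         # the 3-cycle (1 2 3)

print(is_homomorphism, transposition_1_3, cycle_1_2_3)
```

The two sample matrices agree with the ones displayed in Example 3, and the homomorphism check confirms the identity $\tau(g_1 g_2) = \tau(g_1)\tau(g_2)$ for all 36 pairs.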


One more important class of group actions consists of those that are closely related to the structure of the group itself. Two simple ones are the action of $G$ on itself by left translations $(g_1, g_2) \mapsto g_1 g_2$ and the action of $G$ on itself by right translations $(g_1, g_2) \mapsto g_2 g_1^{-1}$. More useful is the action of $G$ on a quotient space $G/H$, where $H$ is a subgroup. This action is given by $(g_1, g_2 H) \mapsto g_1 g_2 H$. There are still others, and some of them are particularly handy in analyzing finite groups. We give some applications in the present section and the next, and we postpone others to Section 10.

Before describing some of these actions in detail, let us make some general definitions and establish two easy results. Let $G \times X \to X$ be a group action. If $p$ is in $X$, then $G_p = \{g \in G \mid gp = p\}$ is a subgroup of $G$ called the isotropy subgroup at $p$. This is not always a normal subgroup; however, the subgroup $\bigcap_{p \in X} G_p$ that fixes all points of $X$ is the kernel of the homomorphism $G \to \mathcal{F}(X)$ defining the group action, and such a kernel has to be normal.

Let $p$ and $q$ be in $X$. We say that $p$ is equivalent to $q$ for the purposes of this paragraph if $p = gq$ for some $g \in G$. The result is an equivalence relation: it is reflexive since $p = 1p$, it is symmetric since $p = gq$ implies $g^{-1}p = q$, and it is transitive since $p = gq$ and $q = g'r$ together imply $p = (gg')r$. The equivalence classes are called orbits of the group action. The orbit of a point $p$ in $X$ is $Gp = \{gp \mid g \in G\}$. If $Y = Gp$ is an orbit, or more generally if $Y$ is any subset of $X$ carried to itself by every element of $G$, then $G \times Y \to Y$ is a group action. In fact, each function $y \mapsto gy$ is invertible on $Y$ with $y \mapsto g^{-1}y$ as the inverse function, and properties (i) and (ii) of a group action follow from the same properties for $X$.

A group action $G \times X \to X$ is said to be transitive if there is just one orbit, hence if $X = Gp$ for each $p$ in $X$. It is simply transitive if it is transitive and if for each $p$ and $q$ in $X$, there is just one element $g$ of $G$ with $gp = q$.

Proposition 4.34. Let $G \times X \to X$ be a group action, let $p$ be in $X$, and let $H$ be the isotropy subgroup at $p$. Then the map $G \to Gp$ given by $g \mapsto gp$ descends to a well-defined map $G/H \to Gp$ that is one-one from $G/H$ onto the orbit $Gp$ and respects the group actions.

REMARK. In other words, a group action of $G$ on a single orbit is always isomorphic as a group action to the action of $G$ on some quotient space $G/H$.

PROOF. Let $\varphi : G \to Gp$ be defined by $\varphi(g) = gp$. For $h$ in $H = G_p$, $\varphi(gh) = (gh)p = g(hp) = gp = \varphi(g)$ shows that $\varphi$ descends to a well-defined function $\bar{\varphi} : G/H \to Gp$, and $\bar{\varphi}$ is certainly onto $Gp$. If $\bar{\varphi}(g_1 H) = \bar{\varphi}(g_2 H)$, then $g_1 p = \bar{\varphi}(g_1 H) = \bar{\varphi}(g_2 H) = g_2 p$, and hence $g_2^{-1}g_1 p = p$, $g_2^{-1}g_1$ is in $H$, $g_1$ is in $g_2 H$, and $g_1 H = g_2 H$. Thus $\bar{\varphi}$ is one-one. Respecting the group action means that $\bar{\varphi}(gg'H) = g\bar{\varphi}(g'H)$, and this identity holds since $g\bar{\varphi}(g'H) = g\varphi(g') = g(g'p) = (gg')p = \varphi(gg') = \bar{\varphi}(gg'H)$.
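The correspondence of Proposition 4.34 between cosets of the isotropy subgroup and points of the orbit can be observed in a concrete case. The Python sketch below lets $S_4$ act on the 2-element subsets of $\{0, 1, 2, 3\}$; the choice of group, action, and base point, as well as the function names, are assumptions made for the illustration.

```python
from itertools import permutations

G = list(permutations(range(4)))   # S_4 as tuples g with g[x] = image of x

def compose(g, h):
    """The product gh: apply h first, then g."""
    return tuple(g[h[x]] for x in range(len(h)))

def act(g, subset):
    """S_4 acts on subsets of {0, 1, 2, 3} by moving each element."""
    return frozenset(g[x] for x in subset)

p = frozenset({0, 1})
orbit = {act(g, p) for g in G}
H = [g for g in G if act(g, p) == p]   # isotropy subgroup at p

# Each left coset gH, stored as a frozenset of group elements,
# is sent to the set of points g'p with g' in that coset.
coset_to_points = {}
for g in G:
    coset = frozenset(compose(g, h) for h in H)
    coset_to_points.setdefault(coset, set()).add(act(g, p))

# Well defined: every element of a coset gives the same point.
well_defined = all(len(points) == 1 for points in coset_to_points.values())
images = [next(iter(points)) for points in coset_to_points.values()]
# One-one and onto the orbit.
one_one_and_onto = len(set(images)) == len(images) and set(images) == orbit

print(len(orbit), len(H), len(coset_to_points), well_defined, one_one_and_onto)
```

Here the orbit has 6 points and the isotropy subgroup has 4 elements, so the 6 cosets are matched exactly with the 6 subsets.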


A simple consequence is the following important counting formula in the case of a group action by a finite group.

Corollary 4.35. Let $G$ be a finite group, let $G \times X \to X$ be a group action, let $p$ be in $X$, let $G_p$ be the isotropy subgroup at $p$, and let $Gp$ be the orbit of $p$. Then $|G| = |Gp|\,|G_p|$.

PROOF. Proposition 4.34 shows that the action of $G$ on some $G/G_p$ is the most general group action on a single orbit, $G_p$ being the isotropy subgroup. Thus the corollary follows from Lagrange's Theorem (Theorem 4.7) with $H = G_p$ and $G/H = Gp$.

We turn to applications of group actions to the structure of groups. If $H$ is a subgroup of a group $G$, the index of $H$ in $G$ is the number of elements in $G/H$, finite or infinite. The first application notes a situation in which a subgroup of a finite group is automatically normal.

Proposition 4.36. Let $G$ be a finite group, and let $p$ be the smallest prime dividing the order of $G$. If $H$ is a subgroup of $G$ of index $p$, then $H$ is normal.

REMARKS. The most important case is $p = 2$: any subgroup of index 2 is automatically normal, and this conclusion is valid even if $G$ is infinite, as was already pointed out in Example 3 of Section 2. If $G$ is finite and if 2 divides the order of $G$, there need not, however, be any subgroup of index 2; for example, the alternating group $A_4$ has order 12, and Problem 11 at the end of the chapter shows that $A_4$ has no subgroup of order 6.

PROOF. Let $X = G/H$, and restrict the group action $G \times X \to X$ to an action $H \times X \to X$. The subset $\{1H\}$ is a single orbit under $H$, and the remaining $p - 1$ members of $G/H$ form a union of orbits. Corollary 4.35 shows that the number of elements in an orbit has to be a divisor of $|H|$, and the smallest divisor of $|H|$ other than 1 is $\ge p$ since the smallest divisor of $|G|$ other than 1 equals $p$ and since $|H|$ divides $|G|$. Hence any orbit of $H$ containing more than one element has at least $p$ elements. Since only $p - 1$ elements are left under consideration, each orbit under $H$ contains only one element. Therefore $hgH = gH$ for all $h$ in $H$ and $g$ in $G$. Then $g^{-1}hg$ is in $H$, and we conclude that $H$ is normal.

If $G$ is a group, the center $Z_G$ of $G$ is the set of all elements $x$ such that $gx = xg$ for all $g$ in $G$. The center of $G$ is a subgroup (since $gx = xg$ and $gy = yg$ together imply $g(xy) = xgy = (xy)g$ and $xg^{-1} = g^{-1}(gx)g^{-1} = g^{-1}(xg)g^{-1} = g^{-1}x$), and every subgroup of the center is normal since $x \in Z_G$ and $g \in G$ together imply $gxg^{-1} = x$. Here are examples: the center of a group $G$ is $G$ itself if and only if $G$ is abelian, the center of the quaternion group $H_8$ is $\{\pm 1\}$, and the center of any symmetric group $S_n$ with $n \ge 3$ is $\{1\}$.
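For small permutation groups the center can be found by exhaustive search, which gives a quick check of two of the examples just listed. In the Python sketch below, the groups $S_3$ and $S_4$ and the cyclic group of rotations of a square are encoded as tuples; the encoding and the function names are assumptions made for the illustration.

```python
from itertools import permutations

def compose(g, h):
    """The product gh: apply h first, then g."""
    return tuple(g[h[x]] for x in range(len(h)))

def center(G):
    """Elements x of G with gx = xg for every g in G."""
    return [x for x in G if all(compose(g, x) == compose(x, g) for g in G)]

S3 = list(permutations(range(3)))
S4 = list(permutations(range(4)))
# The four rotations of a square, as permutations of its vertices: an abelian group.
C4 = [(0, 1, 2, 3), (1, 2, 3, 0), (2, 3, 0, 1), (3, 0, 1, 2)]

center_S3 = center(S3)
center_S4 = center(S4)
center_C4 = center(C4)

print(center_S3, center_S4, center_C4 == C4)
```

The search returns only the identity for $S_3$ and $S_4$, and it returns the whole group for the abelian group of rotations.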


IV. Groups and Group Actions

If $x$ is in $G$, the centralizer of $x$ in $G$, denoted by $Z_G(x)$, is the set of all $g$ such that $gx = xg$. This is a subgroup of $G$, and it equals $G$ itself if and only if $x$ is in the center of $G$. For example the centralizer of $i$ in $H_8$ is the 4-element subgroup $\{\pm 1, \pm i\}$.

Having made these definitions, we introduce a new group action of $G$ on $G$, namely $(g, x) \mapsto gxg^{-1}$. The orbits are called the conjugacy classes of $G$. If $x$ and $y$ are two elements of $G$, we say that $x$ is conjugate to $y$ if $x$ and $y$ are in the same conjugacy class. In other words, $x$ is conjugate to $y$ if there is some $g$ in $G$ with $gxg^{-1} = y$. The result is an equivalence relation. Let us write $C(x)$ for the conjugacy class of $x$.

We can easily compute the isotropy subgroup $G_x$ at $x$ under this action; it consists of all $g \in G$ such that $gxg^{-1} = x$ and hence is exactly the centralizer $Z_G(x)$ of $x$ in $G$. In particular, $C(x) = \{x\}$ if and only if $x$ is in the center $Z_G$. Applying Corollary 4.35, we immediately obtain the following result.

Proposition 4.37. If $G$ is a finite group, then $|G| = |C(x)|\,|Z_G(x)|$ for all $x$ in $G$. Thus $|C(x)|$ is always a divisor of $|G|$, and it equals 1 if and only if $x$ is in the center $Z_G$.

Let us apply these considerations to groups whose order is a power of a prime.

Corollary 4.38. If $G$ is a finite group whose order is a positive power of a prime, then the center $Z_G$ is not $\{1\}$.

PROOF. Let $|G| = p^n$ with $p$ prime and with $n > 0$. The conjugacy classes of $G$ exhaust $G$, and thus the sum of all $|C(x)|$'s, taken over one representative $x$ of each class, equals $|G|$. Since $|C(x)| = 1$ if and only if $x$ is in $Z_G$, the sum of $|Z_G|$ and all the $|C(x)|$'s that are not 1 is equal to $|G|$. All the terms $|C(x)|$ that are not 1 are positive powers of $p$, by Proposition 4.37, and so is $|G|$. Therefore $p$ divides $|Z_G|$.

Corollary 4.39. If $G$ is a finite group of order $p^2$ with $p$ prime, then $G$ is abelian.

PROOF. From Corollary 4.38 we see that either $|Z_G| = p^2$, in which case $G$ is abelian, or $|Z_G| = p$. We show that the latter is impossible. In fact, if $x$ is not in $Z_G$, then $Z_G(x)$ is a subgroup of $G$ that contains $Z_G$ and the element $x$. It must then have order $p^2$ and be all of $G$. Hence every element of $G$ commutes with $x$, and $x$ is in $Z_G$, contradiction.

Corollary 4.40. If $G$ is a finite group whose order is a positive power $p^n$ of a prime $p$, then there exist normal subgroups $G_k$ of $G$ for $0 \leq k \leq n$ such that $|G_k| = p^k$ for all $k \leq n$ and such that $G_k \subseteq G_{k+1}$ for all $k < n$.

6. Group Actions and Examples


PROOF. We proceed by induction on $n$. The base case of the induction is $n = 1$ and is handled by Corollary 4.9. Assume inductively that the result holds for $n$, and let $G$ have order $p^{n+1}$. Corollary 4.38 shows that $Z_G \neq \{1\}$. Any element $\neq 1$ in $Z_G$ must have order a power of $p$, and some power of it must therefore have order $p$. Thus let $a$ be an element of $Z_G$ of order $p$, and let $H$ be the subgroup consisting of the powers of $a$. Then $H$ is normal and has order $p$. Let $\overline{G} = G/H$ be the quotient group, and let $\varphi : G \to \overline{G}$ be the quotient homomorphism. The group $\overline{G}$ has order $p^n$, and the inductive hypothesis shows that $\overline{G}$ has normal subgroups $\overline{G}_k$ for $0 \leq k \leq n$ such that $|\overline{G}_k| = p^k$ for $k \leq n$ and $\overline{G}_k \subseteq \overline{G}_{k+1}$ for $k \leq n - 1$. For $1 \leq k \leq n + 1$, define $G_k = \varphi^{-1}(\overline{G}_{k-1})$, and let $G_0 = \{1\}$. The First Isomorphism Theorem (Theorem 4.13) shows that each $G_k$ for $k \geq 1$ is a normal subgroup of $G$ containing $H$ and that $\varphi(G_k) = \overline{G}_{k-1}$. Then $\varphi|_{G_k}$ is a homomorphism of $G_k$ onto $\overline{G}_{k-1}$ with kernel $H$, and hence $|G_k| = |\overline{G}_{k-1}|\,|H| = p^{k-1}p = p^k$. Therefore the $G_k$'s will serve as the required subgroups of $G$.

It is not always so easy to determine the conjugacy classes in a particular group. For example, in $GL(n, \mathbb{C})$ the question of conjugacy is the question whether two matrices are similar in the sense of Section II.3; this will be one of the main problems addressed in Chapter V. By contrast, the problem of conjugacy in symmetric groups has a simple answer. Recall that every permutation is uniquely the product of disjoint cycles. The cycle structure of a permutation consists of the number of cycles of each length in this decomposition.

Lemma 4.41. Let $\sigma$ and $\tau$ be members of the symmetric group $S_n$. If $\sigma$ is expressed as the product of disjoint cycles, then $\tau\sigma\tau^{-1}$ has the same cycle structure as $\sigma$, and the expression for $\tau\sigma\tau^{-1}$ as the product of disjoint cycles is obtained from that for $\sigma$ by substituting $\tau(k)$ for $k$ throughout.

REMARK. For example, if $\sigma = (a\ b)(c\ d\ e)$, then $\tau\sigma\tau^{-1}$ decomposes as $(\tau(a)\ \tau(b))(\tau(c)\ \tau(d)\ \tau(e))$.

PROOF. Because the conjugate of a product equals the product of the conjugates, it is enough to handle a cycle $\gamma = (a_1\ a_2\ \cdots\ a_n)$ appearing in $\sigma$. The corresponding cycle $\gamma' = \tau\gamma\tau^{-1}$ is asserted to be $\gamma' = (\tau(a_1)\ \tau(a_2)\ \cdots\ \tau(a_n))$. Application of $\tau^{-1}$ to $\tau(a_j)$ yields $a_j$, application of $\gamma$ to this yields $a_{j+1}$ if $j < n$ and $a_1$ if $j = n$, and application of $\tau$ to the result yields $\tau(a_{j+1})$ or $\tau(a_1)$. For each of the symbols $b$ not in the list $\{a_1, \dots, a_n\}$, $\tau\gamma\tau^{-1}(\tau(b)) = \tau(b)$ since $\gamma(b) = b$. Thus $\tau\gamma\tau^{-1} = \gamma'$, as asserted.

Proposition 4.42. Let $H$ be a subgroup of a symmetric group $S_n$. If $C(x)$ denotes a conjugacy class in $H$, then all members of $C(x)$ have the same cycle


structure. Conversely if $H = S_n$, then the conjugacy class of a permutation $\sigma$ consists of all members of $S_n$ having the same cycle structure as $\sigma$.

PROOF. The first conclusion is immediate from Lemma 4.41. For the second conclusion, let $\sigma$ and $\sigma'$ have the same cycle structure, and let $\tau$ be the permutation that moves, for each $k$, the $k$th symbol appearing in the disjoint-cycle expansion of $\sigma$ into the $k$th symbol in the corresponding expansion of $\sigma'$. Define $\tau$ on the remaining symbols in any fashion that makes $\tau$ a permutation. Application of the lemma shows that $\tau\sigma\tau^{-1} = \sigma'$. Thus any two permutations with the same cycle structure are conjugate.

7. Semidirect Products

One more application of group actions to the structure theory of groups will be to the construction of "semidirect products" of groups. If $H$ is a group, then an isomorphism of $H$ with itself is called an automorphism. The set of automorphisms of $H$ is a group under composition, and we denote it by $\operatorname{Aut} H$. We are going to be interested in "group actions by automorphisms," i.e., group actions of a group $G$ on a space $X$ when $X$ is itself a group and the action by each member of $G$ is an automorphism of the group structure of $X$; the group action is therefore a homomorphism of the form $\tau : G \to \operatorname{Aut} X$.

EXAMPLE 1. In $\mathbb{R}^2$, we can identify the additive group of the underlying vector space with the group of translations $t_v(w) = v + w$; the identification associates a translation $t$ with the member $t(0)$ of $\mathbb{R}^2$. Let $H$ be the group of translations. The rotations about the origin in $\mathbb{R}^2$, namely the linear maps with matrices $\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$, form a group $G = SO(2)$ that acts on $\mathbb{R}^2$, hence acts on the set $H$ of translations. The linearity of the rotations says that the action of $G = SO(2)$ on the translations is by automorphisms of $H$, i.e., that each rotation, in its effect on $H$, is in $\operatorname{Aut} H$.
Out of these data—the two groups G and H and a homomorphism of G into Aut H —we will construct below what amounts to the group of all rotations (about any point) and translations of R2 . The construction is that of a “semidirect product.” EXAMPLE 2. Take any group G, and let G act on X = G by conjugation. Each conjugation x → gxg −1 is an automorphism of G, and thus the action of G on itself by conjugation is an action by automorphisms. Let G and H be groups. Suppose that a group action τ : G → F(H ) is given with G acting on H by automorphisms. That is, suppose that each map h → τg (h) is an automorphism of H . We deﬁne a group G ×τ H whose underlying set will be the Cartesian product G × H . The motivation for the deﬁnition of multiplication


comes from Example 2, in which $\tau_g(h) = ghg^{-1}$. We want to write a product $g_1h_1g_2h_2$ in the form $g'h'$, and we can do so using the formula
$$g_1h_1g_2h_2 = g_1g_2(g_2^{-1}h_1g_2)h_2 = g_1g_2\,(\tau_{g_2^{-1}}(h_1))\,h_2.$$
Similarly the formula for inverses is motivated by the formula $(gh)^{-1} = h^{-1}g^{-1} = g^{-1}(gh^{-1}g^{-1}) = g^{-1}\tau_g(h^{-1})$.

Proposition 4.43. Let $G$ and $H$ be groups, and let $\tau$ be a group action of $G$ on $H$ by automorphisms. Then the set-theoretic product $G \times H$ becomes a group $G \times_\tau H$ under the definitions
$$(g_1, h_1)(g_2, h_2) = (g_1g_2,\ (\tau_{g_2^{-1}}(h_1))h_2) \qquad\text{and}\qquad (g, h)^{-1} = (g^{-1},\ \tau_g(h^{-1})).$$

The mappings $i_1 : G \to G \times_\tau H$ and $i_2 : H \to G \times_\tau H$ given by $i_1(g) = (g, 1)$ and $i_2(h) = (1, h)$ are one-one homomorphisms, and $p_2 : G \times_\tau H \to G$ given by $p_2(g, h) = g$ is a homomorphism onto $G$. The images $G' = i_1(G)$ and $H' = i_2(H)$ are subgroups of $G \times_\tau H$ with $H'$ normal such that $G' \cap H' = \{1\}$, such that every element of $G \times_\tau H$ is the product of an element of $G'$ and an element of $H'$, and such that conjugation of $G'$ on $H'$ is given by $i_1(g)i_2(h)i_1(g)^{-1} = i_2(\tau_g(h))$.

REMARK. The group $G \times_\tau H$ is called the external semidirect product$^{14}$ of $G$ and $H$ with respect to $\tau$.

PROOF. For associativity we compute directly that
$$\big((g_1, h_1)(g_2, h_2)\big)(g_3, h_3) = \big(g_1g_2g_3,\ \tau_{g_3^{-1}}(\tau_{g_2^{-1}}(h_1)h_2)\,h_3\big)$$
and
$$(g_1, h_1)\big((g_2, h_2)(g_3, h_3)\big) = \big(g_1g_2g_3,\ \tau_{g_3^{-1}g_2^{-1}}(h_1)\,\tau_{g_3^{-1}}(h_2)\,h_3\big).$$
Since
$$\tau_{g_3^{-1}}(\tau_{g_2^{-1}}(h_1)h_2) = (\tau_{g_3^{-1}}\tau_{g_2^{-1}}(h_1))\,\tau_{g_3^{-1}}(h_2) = \tau_{g_3^{-1}g_2^{-1}}(h_1)\,\tau_{g_3^{-1}}(h_2),$$
we have a match. It is immediate that $(1, 1)$ is a two-sided identity. Since
$$(g, h)(g^{-1}, \tau_g(h^{-1})) = (1, \tau_g(h)\tau_g(h^{-1})) = (1, \tau_g(hh^{-1})) = (1, \tau_g(1)) = (1, 1)$$
and
$$(g^{-1}, \tau_g(h^{-1}))(g, h) = (1, \tau_{g^{-1}}(\tau_g(h^{-1}))h) = (1, \tau_1(h^{-1})h) = (1, 1),$$
$(g^{-1}, \tau_g(h^{-1}))$ is indeed a two-sided inverse of $(g, h)$. It is immediate from the definition of multiplication that $i_1$, $i_2$, and $p_2$ are homomorphisms, that $i_1$ and $i_2$ are one-one, that $p_2$ is onto, that $G' \cap H' = \{1\}$, and that $G \times_\tau H = G'H'$. Since $i_1$ and $i_2$ are homomorphisms, $G'$ and $H'$ are subgroups. Since $H'$ is the kernel of $p_2$, $H'$ is normal. Finally the definition of multiplication gives
$$i_1(g)i_2(h)i_1(g)^{-1} = (g, h)(g^{-1}, 1) = (1, (\tau_g(h))1) = i_2(\tau_g(h)),$$
and the proof is complete.

$^{14}$ The notation $\ltimes$ is used by some authors in place of $\times_\tau$.
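The group law of Proposition 4.43 can be tested mechanically on a small case. The sketch below is only an illustration under stated assumptions: it takes $G = \mathbb{Z}/2$ and $H = \mathbb{Z}/5$, both written additively, with the nonzero element of $G$ acting on $H$ by negation. That choice of example, and the names `tau`, `mul`, and `inv`, are assumptions made here and are not taken from the text.

```python
from itertools import product

N = 5  # H = Z/5 written additively; G = Z/2 = {0, 1}

def tau(g, h):
    """Assumed action of g in Z/2 on h in Z/N: the nonzero element negates."""
    return h % N if g % 2 == 0 else (-h) % N

def mul(x, y):
    """(g1, h1)(g2, h2) = (g1 g2, tau_{g2^{-1}}(h1) h2), in additive notation."""
    (g1, h1), (g2, h2) = x, y
    g2_inv = (-g2) % 2
    return ((g1 + g2) % 2, (tau(g2_inv, h1) + h2) % N)

def inv(x):
    """(g, h)^{-1} = (g^{-1}, tau_g(h^{-1})), in additive notation."""
    g, h = x
    return ((-g) % 2, tau(g, (-h) % N))

elements = list(product(range(2), range(N)))
identity = (0, 0)

associative = all(mul(mul(x, y), z) == mul(x, mul(y, z))
                  for x in elements for y in elements for z in elements)
inverses_ok = all(mul(x, inv(x)) == identity == mul(inv(x), x)
                  for x in elements)
# Conjugation formula i1(g) i2(h) i1(g)^{-1} = i2(tau_g(h)).
conjugation_ok = all(mul(mul((g, 0), (0, h)), inv((g, 0))) == (0, tau(g, h))
                     for g in range(2) for h in range(N))
nonabelian = any(mul(x, y) != mul(y, x) for x in elements for y in elements)
```

With these choices the resulting group has 10 elements and is nonabelian, since $(1, 0)(0, 1) = (1, 1)$ while $(0, 1)(1, 0) = (1, 4)$.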


Proposition 4.44. Let S be a group, let G and H be subgroups with H normal, and suppose that G ∩ H = {1} and that every element of S is the product of an element of G and an element of H . For each g ∈ G, deﬁne an automorphism τg of H by τg (h) = ghg −1 . Then τ is a group action of G on H by automorphisms, and the mapping G ×τ H → S given by (g, h) → gh is an isomorphism of groups. REMARKS. In this case we call S an internal semidirect product of G and H with respect to τ . We shall not attempt to write down a universal mapping property that characterizes internal semidirect products. PROOF. Since τg1 g2 (h) = g1 g2 hg2−1 g1−1 = g1 τg2 (h)g1−1 = τg1 τg2 (h) and since each τg is an automorphism of H , τ is an action by automorphisms. Proposition 4.43 therefore shows that G ×τ H is a well-deﬁned group. The function ϕ from G ×τ H to S given by ϕ(g, h) = gh is a homomorphism by the same computation that motivated the deﬁnition of multiplication in a semidirect product, and ϕ is onto S since every element of S lies in the set G H of products. If gh = 1, then g = h −1 exhibits g as in G ∩ H = {1}. Hence g = 1 and h = 1. Therefore ϕ is one-one and must be an isomorphism. EXAMPLE 1. Dihedral groups Dn . We show that Dn is the internal semidirect product of a 2-element group and the rotation subgroup. Let H be the group of rotations about the origin through multiples of the angle 2π/n. This group is cyclic of order n, and it is normal in Dn because it is of index 2. If s is any of the reﬂections in Dn , then G = {1, s} is a subgroup of Dn of order 2 with G ∩ H = {1}. Counting the elements, we see that every element of Dn is of the form r k or sr k , in other words that the set of products G H is all of Dn . Thus Proposition 4.44 shows that Dn is an (internal) semidirect product of G and H with respect to some τ : G → Aut H . 
To understand the homomorphism τ , let us write the members of H as the powers of r , where r is rotation counterclockwise about the origin through the angle 2π/n. For the reﬂection s (or indeed for any reﬂection in Dn ), a look at the geometry shows that sr k s −1 = r −k for all k. In other words, the automorphism τ (1) leaves each element of H ﬁxed while τ (s) sends each k mod n to −k mod n. The map that sends each element of a cyclic group to its group inverse is indeed an automorphism of the cyclic group, and thus τ is indeed a homomorphism of G into Aut H . EXAMPLE 2. Construction of a nonabelian group of order 21. Let H = C7 , written multiplicatively with generator a, and let G = C3 , written multiplicatively with generator b. To arrange for G to act on H by automorphisms, we make use of a nontrivial automorphism of H of order 3. Such a mapping is a k → a 2k . In fact, there is no doubt that this mapping is an automorphism, and we have to see


that it has order 3. The effect of applying it twice is $a^k \mapsto a^{4k}$, and the effect of applying it three times is $a^k \mapsto a^{8k}$. But $a^{8k} = a^k$ since $a^7 = 1$, and thus the mapping $a^k \mapsto a^{2k}$ indeed has order 3. We send $b^n$ into the $n$th power of this automorphism, and the result is a homomorphism $\tau : G \to \operatorname{Aut} H$. The semidirect product $G \times_\tau H$ is certainly a group of order $3 \times 7 = 21$. To see that it is nonabelian, we observe from the group law in Proposition 4.43 that $ab = b\,\tau_{b^{-1}}(a) = ba^4$. Thus $ab \neq ba$, and $G \times_\tau H$ is nonabelian.

It is instructive to generalize the construction in Example 2 a little bit. To do so, we need a lemma.

Lemma 4.45. If $p$ is a prime, then the automorphisms of the additive group of the field $\mathbb{F}_p$ are the multiplications by the members of the multiplicative group $\mathbb{F}_p^\times$, and consequently $\operatorname{Aut} C_p$ is isomorphic to a cyclic group $C_{p-1}$.

PROOF. Let us write $\operatorname{Aut} \mathbb{F}_p$ for the automorphism group of the additive group of $\mathbb{F}_p$. Each function $\varphi_a : \mathbb{F}_p \to \mathbb{F}_p$ given by $\varphi_a(n) = na$, taken modulo $p$, is in $\operatorname{Aut} \mathbb{F}_p$ as a consequence of the distributive law. We define a function $\Phi : \operatorname{Aut} \mathbb{F}_p \to \mathbb{F}_p^\times$ by $\Phi(\varphi) = \varphi(1)$ for $\varphi \in \operatorname{Aut} \mathbb{F}_p$. Again by the distributive law $\varphi(n) = n\varphi(1)$ for every integer $n$. Thus if $\varphi_1$ and $\varphi_2$ are in $\operatorname{Aut} \mathbb{F}_p$, then $\Phi(\varphi_1 \circ \varphi_2) = (\varphi_1 \circ \varphi_2)(1) = \varphi_1(\varphi_2(1)) = \varphi_2(1)\varphi_1(1)$, and consequently $\Phi$ is a homomorphism. If a member $\varphi$ of $\operatorname{Aut} \mathbb{F}_p$ has $\Phi(\varphi) = 1$ in $\mathbb{F}_p^\times$, then $\varphi(1) = 1$ and therefore $\varphi(n) = n\varphi(1) = n$ for all $n$. Therefore $\varphi$ is the identity in $\operatorname{Aut} \mathbb{F}_p$. We conclude that $\Phi$ is one-one. If $a$ is given in $\mathbb{F}_p^\times$, then $\Phi(\varphi_a) = \varphi_a(1) = a$, and hence $\Phi$ is onto $\mathbb{F}_p^\times$. Therefore $\Phi$ is an isomorphism of $\operatorname{Aut} \mathbb{F}_p$ and $\mathbb{F}_p^\times$. By Corollary 4.27, $\Phi$ exhibits $\operatorname{Aut} \mathbb{F}_p$ as isomorphic to the cyclic group $C_{p-1}$.

Proposition 4.46. If $p$ and $q$ are primes with $p < q$ such that $p$ divides $q - 1$, then there exists a nonabelian group of order $pq$.

REMARKS. For $p = 2$, the divisibility condition is automatic, and the proof will yield the dihedral group $D_q$. For $p = 3$ and $q = 7$, the condition is that 3 divides $7 - 1$, and the constructed group will be the group in Example 2 above.

PROOF. Let $G = C_p$ with generator $a$, and let $H = C_q$. Lemma 4.45 shows that $\operatorname{Aut} C_q \cong C_{q-1}$. Let $b$ be a generator of $\operatorname{Aut} C_q$. Since $p$ divides $q - 1$, $b^{(q-1)/p}$ has order $p$. Then the map $a^k \mapsto b^{k(q-1)/p}$ is a well-defined homomorphism $\tau$ of $G$ into $\operatorname{Aut} H$, and it determines a semidirect product $S = G \times_\tau H$, by Proposition 4.43. The order of $S$ is $pq$, and the multiplication is nonabelian since for $h \in H$, we have $(a, 1)(1, h) = (a, h)$ and $(1, h)(a, 1) = (a, \tau_{a^{-1}}(h)) = (a, b^{-(q-1)/p}(h))$, but $b^{-(q-1)/p}$ is not the identity automorphism of $H$ because it has order $p$, so that $b^{-(q-1)/p}(h) \neq h$ for some $h$.
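The construction in the proof of Proposition 4.46 can be sketched in code by writing $H = C_q$ additively as $\mathbb{Z}/q$ and realizing automorphisms as multiplications by units, as in Lemma 4.45. This is a minimal sketch under assumptions: the function names `element_of_order_p` and `make_group` are hypothetical, the element of order $p$ is found by a naive search rather than from a generator of $\operatorname{Aut} C_q$, and pairs $(g, h)$ use exponents of the generators in place of the group elements themselves.

```python
def element_of_order_p(p, q):
    """Return some u in (Z/q)^x with u != 1 and u^p = 1; for prime p it has order p."""
    for u in range(2, q):
        if pow(u, p, q) == 1:
            return u
    raise ValueError("no element of order p; p must divide q - 1")

def make_group(p, q):
    """Build the semidirect product C_p x_tau C_q with tau_{a^k} = multiplication by u^k."""
    u = element_of_order_p(p, q)

    def mul(x, y):
        (g1, h1), (g2, h2) = x, y
        # tau_{g2^{-1}} is multiplication by u^(-g2) = u^(p - g2), since u has order p.
        twist = pow(u, (p - g2) % p, q)
        return ((g1 + g2) % p, (twist * h1 + h2) % q)

    elements = [(g, h) for g in range(p) for h in range(q)]
    return elements, mul, u

# The case p = 3, q = 7 of Example 2: a corresponds to (0, 1), b to (1, 0).
elements, mul, u = make_group(3, 7)
a, b = (0, 1), (1, 0)
a4 = (0, 4)                      # the element a^4 of H
relation_holds = mul(a, b) == mul(b, a4)
nonabelian = mul(a, b) != mul(b, a)
associative = all(mul(mul(x, y), z) == mul(x, mul(y, z))
                  for x in elements for y in elements for z in elements)
```

For $p = 3$ and $q = 7$ the search returns $u = 2$, matching the automorphism $a^k \mapsto a^{2k}$ of Example 2, and the computed products reproduce the relation $ab = ba^4$.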


8. Simple Groups and Composition Series

A group $G \neq \{1\}$ is said to be simple if its only normal subgroups are $\{1\}$ and $G$. Among abelian groups the simple ones are the cyclic groups of prime order. Indeed, a cyclic group $C_p$ of prime order has no nontrivial subgroups at all, by Corollary 4.9. Conversely if $G$ is abelian and simple, let $a \neq 1$ be in $G$. Then $\{a^n\}$ is a cyclic subgroup and is normal since $G$ is abelian. Thus $\{a^n\}$ is all of $G$, and $G$ is cyclic. The group $\mathbb{Z}$ is not simple, having the nontrivial subgroup $2\mathbb{Z}$, and the group $\mathbb{Z}/(rs)\mathbb{Z}$ with $r > 1$ and $s > 1$ is not simple, having the multiples of $r$ as a nontrivial subgroup. Thus $G$ has to be cyclic of prime order.

The interest is in nonabelian simple groups. We shall establish that the alternating groups $A_n$ are simple for $n \geq 5$, and some other simple groups will be considered in Problems 55–62 at the end of the chapter.

Theorem 4.47. The alternating group $A_n$ is simple if $n \geq 5$.

PROOF. Let $K \neq \{1\}$ be a normal subgroup of $A_n$. Choose $\sigma$ in $K$ with $\sigma \neq 1$ such that $\sigma(i) = i$ for the maximum possible number of integers $i$ with $1 \leq i \leq n$. The main step is to show that $\sigma$ is a 3-cycle. Arguing by contradiction, suppose that $\sigma$ is not a 3-cycle. Then there are two cases.

The first case is that the decomposition of $\sigma$ as the product of disjoint cycles contains a $k$-cycle for some $k \geq 3$. Without loss of generality, we may take the cycle in question to be $\gamma = (1\ 2\ 3\ \cdots)$, and then $\sigma = \gamma\rho = (1\ 2\ 3\ \cdots)\rho$ with $\rho$ equal to a product of disjoint cycles not containing the symbols appearing in $\gamma$. Being even and not being a 3-cycle, $\sigma$ moves at least two other symbols besides the three listed ones, say 4 and 5. Put $\tau = (3\ 4\ 5)$. Lemma 4.41 shows that $\sigma' = \tau\sigma\tau^{-1} = \gamma'\rho' = (1\ 2\ 4\ \cdots)\rho'$ with $\rho'$ not containing any of the symbols appearing in $\gamma'$. Thus $\sigma'\sigma^{-1}$ moves 3 into 4 and cannot be the identity. But $\sigma'\sigma^{-1}$ is in $K$ and fixes all symbols other than 1, 2, 3, 4, 5 that are fixed by $\sigma$. In addition, $\sigma'\sigma^{-1}$ fixes 2, and none of 1, 2, 3, 4, 5 is fixed by $\sigma$. Thus $\sigma'\sigma^{-1}$ is a member of $K$ other than the identity that fixes more symbols than $\sigma$, and we have arrived at a contradiction.

The second case is that $\sigma$ is a product $\sigma = (1\ 2)(3\ 4)\cdots$ of disjoint transpositions. There must be at least two factors since $\sigma$ is even. Put $\tau = (1\ 2)(4\ 5)$, the symbol 5 existing since the group $A_n$ in question has $n \geq 5$. Then $\sigma' = \tau\sigma\tau^{-1} = (1\ 2)(3\ 5)\cdots$. Since $\sigma'\sigma^{-1}$ carries 4 into 5, $\sigma'\sigma^{-1}$ is a member of $K$ other than the identity. It fixes all symbols other than 1, 2, 3, 4, 5 that are fixed by $\sigma$, and in addition it fixes 1 and 2. Thus $\sigma'\sigma^{-1}$ fixes more symbols than $\sigma$ does, and again we have arrived at a contradiction.

We conclude that $K$ contains a 3-cycle, say $(1\ 2\ 3)$. If $i, j, k, l, m$ are five arbitrary symbols, then we can construct a permutation $\tau$ with $\tau(1) = i$, $\tau(2) = j$, $\tau(3) = k$, $\tau(4) = l$, and $\tau(5) = m$. If $\tau$ is odd, we replace $\tau$ by $\tau(l\ m)$, and the


result is even. Thus we may assume that τ is in An and has τ (1) = i, τ (2) = j, and τ (3) = k. Lemma 4.41 shows that τ σ τ −1 = (i j k). Since K is normal, we conclude that K contains all 3-cycles. To complete the proof, we show for n ≥ 3 that every element of An is a product of 3-cycles. If σ is in An , we use Corollary 1.22 to decompose σ as a product of transpositions. Since σ is even, we can group these in pairs. If the members of a pair of transpositions are not disjoint, then their product is a 3-cycle. If they are disjoint, then the identity (1 2)(3 4) = (1 2 3)(2 3 4) shows that their product is a product of 3-cycles. This completes the proof. Let G be a group. A descending sequence G n ⊇ G n−1 ⊇ · · · ⊇ G 1 ⊇ G 0 of subgroups of G with G n = G, G 0 = {1}, and each G k−1 normal in G k is called a normal series for G. The normal series is called a composition series if each inclusion G k ⊇ G k−1 is proper and if each consecutive quotient G k /G k−1 is simple. EXAMPLES. (1) Let G be a cyclic group of order N . A normal series for G consists of certain subgroups of G, all necessarily cyclic by Proposition 4.4. Their respective orders Nn , Nn−1 , . . . , N1 , N0 have Nn = N , N0 = 1, and Nk−1 | Nk for all k. The series is a composition series if and only if each quotient Nk /Nk−1 is prime. In this case the primes that occur are exactly the prime divisors of N , and a prime p occurs r times if pr is the exact power of p that divides N . Thus the consecutive quotients from a composition series of this G, up to isomorphisms, are independent of the particular composition series—though they may arise in a different order. (2) For G = Z, a normal series is of the form Z ⊇ m 1 Z ⊇ m 1 m 2 Z ⊇ m 1 m 2 m 3 Z ⊇ · · · ⊇ 0. The group G = Z has no composition series. (3) For the symmetric group G = S4 , let C2 × C2 refer to the 4-element subgroup {1, (1 2)(3 4), (1 3)(2 4), (1 4)(2 3)}. 
The series S4 ⊇ A4 ⊇ C2 × C2 ⊇ {1, (1 2)} ⊇ {1} is a composition series, the consecutive quotients being C2 , C3 , C2 , C2 . Each term in the composition series except for {1, (1 2)} is actually normal in the whole group G, but there is no way to choose the 2-element subgroup to make it normal in G. The other two possible choices of 2-element subgroup, which lead to different composition series but with isomorphic consecutive quotients, are obtained by replacing {1, (1 2)} by {1, (1 3)} and again by {1, (1 4)}.


(4) For the symmetric group $G = S_5$, the series $S_5 \supseteq A_5 \supseteq \{1\}$ is a composition series, the consecutive quotients being $C_2$ and $A_5$.

(5) Let $G$ be a finite group of order $p^n$ with $p$ prime. Corollary 4.40 produces a composition series, and this time all the subgroups are normal in $G$. The successive normal subgroups have orders $p^k$ for $k = n, n - 1, \dots, 0$, and each consecutive quotient is isomorphic to $C_p$.

Historically the Jordan–Hölder Theorem addressed composition series for groups, showing that the consecutive quotients, up to isomorphisms, are independent of the particular composition series. They can then consistently be called the composition factors of the group. Finding the composition factors of a particular group may be regarded as a step toward understanding the structure of the group.

A generalization of the Jordan–Hölder Theorem due to Zassenhaus and Schreier applies to normal series in situations in which composition series might not exist, such as Example 2 above. We prove the Zassenhaus–Schreier Theorem, and the Jordan–Hölder Theorem is then a special case.

Two normal series
$$G_m \supseteq G_{m-1} \supseteq \cdots \supseteq G_1 \supseteq G_0 \qquad\text{and}\qquad H_n \supseteq H_{n-1} \supseteq \cdots \supseteq H_1 \supseteq H_0$$

for the same group $G$ are said to be equivalent normal series if $m = n$ and the order of the consecutive quotients $G_m/G_{m-1}, G_{m-1}/G_{m-2}, \dots, G_1/G_0$ may be rearranged so that they are respectively isomorphic to $H_m/H_{m-1}, H_{m-1}/H_{m-2}, \dots, H_1/H_0$. One normal series is said to be a refinement of another if the subgroups appearing in the second normal series all appear as subgroups in the first normal series.

Lemma 4.48 (Zassenhaus). Let $G_1$, $G_2$, $G_1'$, and $G_2'$ be subgroups of a group $G$ with $G_1' \subseteq G_1$ and $G_2' \subseteq G_2$, $G_1'$ normal in $G_1$, and $G_2'$ normal in $G_2$. Then $(G_1 \cap G_2')G_1'$ is normal in $(G_1 \cap G_2)G_1'$, $(G_1' \cap G_2)G_2'$ is normal in $(G_1 \cap G_2)G_2'$, and
$$((G_1 \cap G_2)G_1')/((G_1 \cap G_2')G_1') \cong ((G_1 \cap G_2)G_2')/((G_1' \cap G_2)G_2').$$

PROOF. Let us check that $(G_1 \cap G_2')G_1'$ is normal in $(G_1 \cap G_2)G_1'$. Handling conjugation by members of $G_1 \cap G_2$ is straightforward: If $g$ is in $G_1 \cap G_2$, then $g(G_1 \cap G_2')g^{-1} = G_1 \cap G_2'$ since $g$ is in $G_1$ and $gG_2'g^{-1} = G_2'$. Also, $gG_1'g^{-1} = G_1'$ since $g$ is in $G_1$. Hence $g(G_1 \cap G_2')G_1'g^{-1} = (G_1 \cap G_2')G_1'$.

Handling conjugation by members of $G_1'$ requires a little trick: Let $g$ be in $G_1'$ and let $hg'$ be in $(G_1 \cap G_2')G_1'$, with $h$ in $G_1 \cap G_2'$ and $g'$ in $G_1'$. Then $g(hg')g^{-1} = h(h^{-1}gh)g'g^{-1}$. The left factor $h$ is in $G_1 \cap G_2'$. The remaining factors are in $G_1'$; for $g'$ and $g^{-1}$, this is a matter of definition, and for $h^{-1}gh$, it follows because $h$ is in $G_1$ and $g$ is in $G_1'$. Thus $g(G_1 \cap G_2')G_1'g^{-1} = (G_1 \cap G_2')G_1'$, and $(G_1 \cap G_2')G_1'$ is normal in $(G_1 \cap G_2)G_1'$. The other assertion about normal subgroups holds by symmetry in the indices 1 and 2.

By the Second Isomorphism Theorem (Theorem 4.14),
$$(G_1 \cap G_2)/\big(((G_1 \cap G_2')G_1') \cap (G_1 \cap G_2)\big) \cong ((G_1 \cap G_2)(G_1 \cap G_2')G_1')/((G_1 \cap G_2')G_1') = ((G_1 \cap G_2)G_1')/((G_1 \cap G_2')G_1'). \tag{$*$}$$
Since we have
$$((G_1 \cap G_2')G_1') \cap (G_1 \cap G_2) = ((G_1 \cap G_2')G_1') \cap G_2 = (G_1 \cap G_2')(G_1' \cap G_2),$$
we can rewrite the conclusion of $(*)$ as
$$(G_1 \cap G_2)/((G_1 \cap G_2')(G_1' \cap G_2)) \cong ((G_1 \cap G_2)G_1')/((G_1 \cap G_2')G_1'). \tag{$**$}$$
The left side of $(**)$ is symmetric under interchange of the indices 1 and 2. Hence so is the right side, and the lemma follows.

Theorem 4.49 (Schreier). Any two normal series of a group $G$ have equivalent refinements.

PROOF. For the purposes of this proof, let us number the terms of the two normal series so that the subgroups decrease as the index increases, which is the opposite of the numbering in the definition:
$$G = G_0 \supseteq G_1 \supseteq \cdots \supseteq G_m = \{1\}, \qquad G = H_0 \supseteq H_1 \supseteq \cdots \supseteq H_n = \{1\}. \tag{$*$}$$
Define
$$G_{ij} = (G_i \cap H_j)G_{i+1} \quad\text{for } 0 \leq i < m \text{ and } 0 \leq j \leq n, \qquad H_{ji} = (G_i \cap H_j)H_{j+1} \quad\text{for } 0 \leq j < n \text{ and } 0 \leq i \leq m. \tag{$**$}$$
Then we obtain respective refinements of the two normal series $(*)$ given by
$$\begin{aligned} G &= G_{00} \supseteq G_{01} \supseteq \cdots \supseteq G_{0n} \supseteq G_{10} \supseteq G_{11} \supseteq \cdots \supseteq G_{1n} \supseteq \cdots \supseteq G_{m-1,n} = \{1\}, \\ G &= H_{00} \supseteq H_{01} \supseteq \cdots \supseteq H_{0m} \supseteq H_{10} \supseteq H_{11} \supseteq \cdots \supseteq H_{1m} \supseteq \cdots \supseteq H_{n-1,m} = \{1\}. \end{aligned} \tag{$\dagger$}$$
The containments $G_{in} \supseteq G_{i+1,0}$ and $H_{jm} \supseteq H_{j+1,0}$ are equalities in $(\dagger)$, and the only consecutive quotients that can be nontrivial are therefore of the form $G_{ij}/G_{i,j+1}$ and $H_{ji}/H_{j,i+1}$ with $i < m$ and $j < n$. For these we have
$$\begin{aligned} G_{ij}/G_{i,j+1} &= ((G_i \cap H_j)G_{i+1})/((G_i \cap H_{j+1})G_{i+1}) && \text{by } (**) \\ &\cong ((G_i \cap H_j)H_{j+1})/((G_{i+1} \cap H_j)H_{j+1}) && \text{by Lemma 4.48} \\ &= H_{ji}/H_{j,i+1} && \text{by } (**), \end{aligned}$$
and thus the refinements $(\dagger)$ are equivalent.
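The notion of equivalent series can be illustrated in the setting of Example (1) above, where a composition series of a cyclic group of order $N$ amounts to a chain of divisors of $N$ with prime consecutive ratios. The following sketch enumerates all such chains for $N = 12$ and compares the multisets of quotient orders. The function names `prime_factors`, `composition_series`, and `quotients` are hypothetical, and the code works only with subgroup orders, not with the subgroups themselves.

```python
def prime_factors(n):
    """Return the distinct prime divisors of n, found by trial division."""
    factors, d = [], 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def composition_series(n):
    """Yield chains (n, ..., 1) of subgroup orders with prime consecutive ratios."""
    if n == 1:
        yield (1,)
        return
    for p in prime_factors(n):
        for tail in composition_series(n // p):
            yield (n,) + tail

def quotients(chain):
    """Orders of the consecutive quotients of a chain of subgroup orders."""
    return [chain[k] // chain[k + 1] for k in range(len(chain) - 1)]

series = list(composition_series(12))
# Each series yields the same primes with the same multiplicities,
# possibly in a different order.
factor_multisets = {tuple(sorted(quotients(chain))) for chain in series}
```

For $N = 12$ there are three chains, namely $(12, 6, 3, 1)$, $(12, 6, 2, 1)$, and $(12, 4, 2, 1)$, and each has quotient orders 2, 2, 3 in some order.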

Corollary 4.50 (Jordan–Hölder Theorem). Any two composition series of a group $G$ are equivalent as normal series.

PROOF. Let two composition series be given. Theorem 4.49 says that we can insert terms in each so that the refined series have the same length and are equivalent. Since the given series are composition series, the only way to insert a new term is by repeating some term, and the repetition results in a consecutive quotient of $\{1\}$. Because of Theorem 4.49 we know that the quotients $\{1\}$ from the two refined series must match. Thus the number of terms added to each series is the same. Also, the quotients that are not $\{1\}$ must match in pairs. Thus the given composition series are equivalent.

9. Structure of Finitely Generated Abelian Groups

A set of generators for a group $G$ is a set such that each element of $G$ is a finite product of generators and their inverses. (A generator and its inverse are allowed to occur multiple times in a product.) In this section we shall study abelian groups having a finite set of generators. Such groups are said to be finitely generated abelian groups, and our goal is to classify them up to isomorphism. We use additive notation for all our abelian groups in this section.

We begin by introducing an analog $\mathbb{Z}^n$ for the integers $\mathbb{Z}$ of the vector space $\mathbb{R}^n$ for the reals $\mathbb{R}$, and along with it a generalization. A free abelian group is any abelian group isomorphic to a direct sum, finite or infinite, of copies of the additive group $\mathbb{Z}$ of integers. The external direct sum of $n$ copies of $\mathbb{Z}$ will be denoted by $\mathbb{Z}^n$.

Let us use Proposition 4.17 to see that we can recognize groups isomorphic to free abelian groups by means of the following condition: an abelian group $G$ is isomorphic to a free abelian group if and only if it has a $\mathbb{Z}$ basis, i.e., a subset that generates $G$ and is such that no nontrivial linear combination, with integer coefficients, of the members of the subset is equal to the 0 element of the group.
It will be helpful to use terminology adapted from the theory of vector spaces for this latter condition—that the subset is to be linearly independent over Z.


Let us give the proof that the condition is necessary and sufficient for $G$ to be free abelian. In one direction if $G$ is an external direct sum of copies of $\mathbb{Z}$, then the members of $G$ that are 1 in one coordinate and are 0 elsewhere form a $\mathbb{Z}$ basis. Conversely if $\{g_s\}_{s \in S}$ is a $\mathbb{Z}$ basis, let $G_{s_0}$ be the subgroup of multiples of $g_{s_0}$, and let $\varphi_{s_0}$ be the inclusion homomorphism of $G_{s_0}$ into $G$. Proposition 4.17 produces a unique group homomorphism $\varphi : \bigoplus_{s \in S} G_s \to G$ such that $\varphi \circ i_{s_0} = \varphi_{s_0}$ for all $s_0 \in S$. The spanning condition for the $\mathbb{Z}$ basis says that $\varphi$ is onto $G$, and the linear independence condition for the $\mathbb{Z}$ basis says that $\varphi$ has 0 kernel.

The similarity between vector-space bases and $\mathbb{Z}$ bases suggests further comparison of vector spaces and abelian groups. With vector spaces over a field, every vector space has a basis over the field. However, it is exceptional for an abelian group to have a $\mathbb{Z}$ basis. Two examples that hint at the difficulty are the additive group $\mathbb{Z}/m\mathbb{Z}$ with $m > 1$ and the additive group $\mathbb{Q}$. The group $\mathbb{Z}/m\mathbb{Z}$ has no nonempty linearly independent set, while the group $\mathbb{Q}$ has a linearly independent set of one element, no spanning set of one element, and no linearly independent set of more than one element. Here are two positive examples.

EXAMPLES.

(1) The additive group of all points in $\mathbb{R}^n$ whose coordinates are integers. The standard basis of $\mathbb{R}^n$ is a $\mathbb{Z}$ basis.

(2) The additive group of all points $(x, y)$ in $\mathbb{R}^2$ with $x$ and $y$ both in $\mathbb{Z}$ or both in $\mathbb{Z} + \tfrac{1}{2}$. The set $\{(1, 0), (\tfrac{1}{2}, \tfrac{1}{2})\}$ is a $\mathbb{Z}$ basis.

Next we take a small step that eliminates technical complications from the discussion, proving that any subgroup of a finitely generated abelian group is finitely generated.

Lemma 4.51. Let $\varphi : G \to H$ be a homomorphism of abelian groups. If $\ker \varphi$ and $\operatorname{image} \varphi$ are finitely generated, then $G$ is finitely generated.

PROOF. Let $\{x_1, \dots, x_m\}$ and $\{y_1, \dots, y_n\}$ be respective finite sets of generators for $\ker \varphi$ and $\operatorname{image} \varphi$. For $1 \leq j \leq n$, choose $x_j'$ in $G$ with $\varphi(x_j') = y_j$. We shall prove that $\{x_1, \dots, x_m, x_1', \dots, x_n'\}$ is a set of generators for $G$. Thus let $x$ be in $G$. Since $\varphi(x)$ is in $\operatorname{image} \varphi$, there exist integers $a_1, \dots, a_n$ with $\varphi(x) = a_1y_1 + \cdots + a_ny_n$. The element $x' = a_1x_1' + \cdots + a_nx_n'$ of $G$ has $\varphi(x') = a_1y_1 + \cdots + a_ny_n = \varphi(x)$. Therefore $\varphi(x - x') = 0$, and there exist integers $b_1, \dots, b_m$ with $x - x' = b_1x_1 + \cdots + b_mx_m$. Hence $x = b_1x_1 + \cdots + b_mx_m + x' = b_1x_1 + \cdots + b_mx_m + a_1x_1' + \cdots + a_nx_n'$.

Proposition 4.52. Any subgroup of a finitely generated abelian group is finitely generated.


PROOF. Let G be ﬁnitely generated with a set {g1 , . . . , gn } of n generators, and deﬁne G k = Zg1 + · · · + Zgk for 1 ≤ k ≤ n. If H is any subgroup of G, deﬁne Hk = H ∩ G k for 1 ≤ k ≤ n. We shall prove by induction on k that every Hk is ﬁnitely generated, and then the case k = n gives the proposition. For k = 1, G 1 = Zg1 is a cyclic group, and any subgroup of it is cyclic by Proposition 4.4 and hence is ﬁnitely generated. Assume inductively that every subgroup of G k is known to be ﬁnitely generated. Let q : G k+1 → G k+1 /G k be the quotient homomorphism, and let ϕ = q Hk+1 , mapping Hk+1 into G k+1 /G k . Then ker ϕ = Hk+1 ∩ G k is a subgroup of G k and is ﬁnitely generated by the inductive hypothesis. Also, image ϕ is a subgroup of G k+1 /G k , which is a cyclic group with generator equal to the coset of gk+1 . Since a subgroup of a cyclic group is cyclic, image ϕ is ﬁnitely generated. Applying Lemma 4.51 to ϕ, we see that Hk+1 is ﬁnitely generated. This completes the induction and the proof. A free abelian group has ﬁnite rank if it has a ﬁnite Z basis, hence if it is isomorphic to Zn for some n. The ﬁrst theorem is that the integer n is determined by the group. Theorem 4.53. The number of Z summands in a free abelian group of ﬁnite rank is independent of the direct-sum decomposition of the group. We deﬁne this number to be the rank of the free abelian group. Actually, “rank” is a well-deﬁned cardinal in the inﬁnite-rank case as well, because the rank coincides in that case with the cardinality of the group. In any event, Theorem 4.53 follows immediately by two applications of the following lemma. Lemma 4.54. If G is a free abelian group with a ﬁnite Z basis x1 , . . . , xn , then any linearly independent subset of G has ≤ n elements. PROOF. Let {y1 , . . . , ym } be a linearly independent set in G. Since {x 1 , . . . , xn } is a Z basis, we can deﬁne an m-by-n matrix C of integers by yi = nj=1 Ci j x j . As a matrix in Mmn (Q), C has rank ≤ n. 
Consequently if $m > n$, then the rows are linearly dependent over $\mathbb{Q}$, and we can find rational numbers $q_1, \dots, q_m$ not all 0 such that $\sum_{i=1}^m q_i C_{ij} = 0$ for all $j$. Multiplying by a suitable integer to clear fractions, we obtain integers $k_1, \dots, k_m$ not all 0 such that $\sum_{i=1}^m k_i C_{ij} = 0$ for all $j$. Then we have
\[
\sum_{i=1}^m k_i y_i = \sum_{i=1}^m k_i \sum_{j=1}^n C_{ij} x_j = \sum_{j=1}^n \Big( \sum_{i=1}^m k_i C_{ij} \Big) x_j = \sum_{j=1}^n 0\, x_j = 0,
\]
in contradiction to the linear independence of $\{y_1, \dots, y_m\}$ over $\mathbb{Z}$. Therefore $m \le n$.
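The step "clear fractions to get integers $k_1, \dots, k_m$" in the proof of Lemma 4.54 can be made concrete. The following is a minimal sketch, not part of the text: the function name `integer_relation` is hypothetical, the elimination is ordinary row reduction over $\mathbb{Q}$ on the augmented matrix $[C \mid I]$, and `math.lcm` with several arguments assumes Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm

def integer_relation(C):
    """Return integers k, not all 0, with sum_i k[i] * C[i][j] == 0 for every j.

    Assumes C is an integer matrix with more rows than columns (m > n).
    """
    m, n = len(C), len(C[0])
    # Row reduce [C | I] over Q; the I-part records the combination of rows used.
    rows = [[Fraction(x) for x in C[i]] + [Fraction(int(i == t)) for t in range(m)]
            for i in range(m)]
    pivot_row = 0
    for col in range(n):
        p = next((r for r in range(pivot_row, m) if rows[r][col] != 0), None)
        if p is None:
            continue
        rows[pivot_row], rows[p] = rows[p], rows[pivot_row]
        for r in range(m):
            if r != pivot_row and rows[r][col] != 0:
                f = rows[r][col] / rows[pivot_row][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    # At most n pivots exist and m > n, so the last row has zero C-part.
    q = rows[m - 1][n:]                      # rational coefficients q_1, ..., q_m
    d = lcm(*(x.denominator for x in q))     # common denominator
    return [int(x * d) for x in q]           # integers k_1, ..., k_m

C = [[3, 5], [7, 13], [5, 9]]                # three vectors in Z^2, so m = 3 > n = 2
k = integer_relation(C)
print(k)
```

Because the row operations are invertible, the recorded coefficients cannot all vanish, which is exactly the dependence used in the proof.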

9. Structure of Finitely Generated Abelian Groups


Now we come to the two main results of this section. The first is a special case of the second by Proposition 4.52 and Lemma 4.54. The two will be proved together, and it may help to regard the proof of the first as a part of the proof of the second.

Theorem 4.55. A subgroup $H$ of a free abelian group $G$ of finite rank $n$ is free abelian of rank $\le n$.

REMARK. This result persists in the case of infinite rank, but we do not need the more general result and will not give a proof.

Theorem 4.56 (Fundamental Theorem of Finitely Generated Abelian Groups). Every finitely generated abelian group is a finite direct sum of cyclic groups. The cyclic groups may be taken to be copies of $\mathbb{Z}$ and various $C_{p^k}$ with $p$ prime, and in this case the cyclic groups are unique up to order and to isomorphism.

REMARKS. The main conclusion of the theorem is the decomposition of each finitely generated abelian group into the direct sum of cyclic groups. An alternative decomposition of the given group that forces uniqueness is as the direct sum of copies of $\mathbb{Z}$ and finite cyclic groups $C_{d_1}, \dots, C_{d_r}$ such that $d_1 \mid d_2$, $d_2 \mid d_3$, $\dots$, $d_{r-1} \mid d_r$. A proof of the additional statement appears in the problems at the end of Chapter VIII. The integers $d_1, \dots, d_r$ are sometimes called the elementary divisors of the group.

Let us establish the setting for the proof of Theorem 4.56. Let $G$ be the given group, and say that it has a set of $n$ generators. Proposition 4.17 produces a homomorphism $\varphi : \mathbb{Z}^n \to G$ that carries the standard generators $x_1, \dots, x_n$ of $\mathbb{Z}^n$ to the generators of $G$, and $\varphi$ is onto $G$. Let $H$ be the kernel of $\varphi$. As a subgroup of $\mathbb{Z}^n$, $H$ is finitely generated, by Proposition 4.52. Let $y_1, \dots, y_m$ be generators. Theorem 4.55 predicts that $H$ is in fact free abelian, hence that $\{y_1, \dots, y_m\}$ could be taken to be linearly independent over $\mathbb{Z}$ with $m \le n$, but we do not assume that knowledge in the proof of Theorem 4.56.
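The two decompositions mentioned in the remarks after Theorem 4.56 (prime-power cyclic summands versus a divisibility chain $d_1 \mid d_2 \mid \cdots \mid d_r$) can be related by a short computation for finite groups. The sketch below is illustrative only: the name `divisor_chain` is hypothetical, and the regrouping relies on the fact that a direct sum of cyclic groups of pairwise relatively prime orders is cyclic, which is the Chinese Remainder Theorem variant proved later in this section.

```python
from collections import defaultdict

def divisor_chain(prime_powers):
    """Convert summands C_{p^k}, given as pairs (p, k), into d_1 | d_2 | ... | d_r."""
    by_prime = defaultdict(list)
    for p, k in prime_powers:
        by_prime[p].append(k)
    r = max((len(ks) for ks in by_prime.values()), default=0)
    d = [1] * r
    for p, ks in by_prime.items():
        ks.sort(reverse=True)
        # The largest power of each prime goes into d_r, the next into d_{r-1}, ...
        for idx, k in enumerate(ks):
            d[r - 1 - idx] *= p ** k
    return d

# C_2 + C_4 + C_3 + C_9 + C_5 regrouped as C_6 + C_180
chain = divisor_chain([(2, 1), (2, 2), (3, 1), (3, 2), (5, 1)])
print(chain)
```

Each $d_i$ divides $d_{i+1}$ because, prime by prime, the exponents are assigned in increasing order along the chain.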
The motivation for the main part of the proof of Theorem 4.56 comes from the elementary theory of vector spaces, particularly from the method of using a basis for a finite-dimensional vector space to find a basis of a vector subspace when we know a finite spanning set for the vector subspace. Thus let $V$ be a finite-dimensional vector space over $\mathbb{R}$, with basis $\{x_j\}_{j=1}^n$, and let $U$ be a vector subspace with spanning set $\{y_i\}_{i=1}^m$. To produce a vector-space basis for $U$, we imagine expanding the $y_i$'s as linear combinations of $x_1, \dots, x_n$. We can think of this expansion symbolically as expressing each $y_i$ as the product of a row vector of real numbers times the formal "column vector" $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$. The entries of this column vector are vectors, but there is no problem in working with it since this is all just a matter of notation anyway. Then the formal column vector $\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}$ of $m$ members of $U$ equals the product of an $m$-by-$n$ matrix of real numbers times the formal column vector $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$. We know from Chapter II that the procedure for finding a basis of $U$ is to row reduce this matrix of real numbers. The nonzero rows of the result determine a basis of the span of the $m$ vectors we have used, and this basis is related tidily to the given basis for $V$. We can compare the two bases to understand the relationship between $U$ and $V$.

To prove Theorem 4.56, we would like to use the same procedure, but we have to work with an integer matrix and avoid division. This means that only two of the three usual row operations are fully available for the row reduction; division of a row by an integer is allowable only when the integer is $\pm 1$. A partial substitute for division comes by using the steps of the Euclidean algorithm via the division algorithm (Proposition 1.1),

but even that is not enough. For example, if the $m$-by-$n$ matrix is $\begin{pmatrix} 2 & 1 & 1 \\ 0 & 0 & 3 \end{pmatrix}$, no further row reduction is possible with integer operations. However, the equations tell us that $H$ is the subgroup of $\mathbb{Z}^3$ generated by $(2, 1, 1)$ and $(0, 0, 3)$, and it is not at all clear how to write $\mathbb{Z}^3/H$ as a direct sum of cyclic groups. The row operations have the effect of changing the set of generators of $H$ while maintaining the fact that they generate $H$. What is needed is to allow also column reduction with integer operations. Steps of this kind have the effect of changing the $\mathbb{Z}$ basis of $\mathbb{Z}^n$. When steps of this kind are allowed, we can produce new generators of $H$ and a new basis of $\mathbb{Z}^n$ so that the two can be compared. With the example above, suitable column operations are
\[
\begin{pmatrix} 2 & 1 & 1 \\ 0 & 0 & 3 \end{pmatrix}
\to
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 3 \end{pmatrix}
\to
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 3 \end{pmatrix}
\to
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \end{pmatrix}.
\]

The equations with the new generators say that $y_1 = x_1$ and $y_2 = 3x_2$. Thus $H$ is the subgroup $\mathbb{Z} \oplus 3\mathbb{Z} \oplus 0\mathbb{Z}$, nicely aligned with $\mathbb{Z}^3 = \mathbb{Z} \oplus \mathbb{Z} \oplus \mathbb{Z}$. The quotient is $(\mathbb{Z}/\mathbb{Z}) \oplus (\mathbb{Z}/3\mathbb{Z}) \oplus (\mathbb{Z}/0\mathbb{Z}) \cong C_3 \oplus \mathbb{Z}$.

The proof of Theorem 4.56 will make use of an algorithm that uses row and column operations involving only allowable divisions and that converts the matrix $C$ of coefficients so that its nonzero entries are the diagonal entries $C_{ii}$ for $1 \le i \le r$ and no other entries. The algorithm in principle can be very slow, and it may be helpful to see what it does in an ordinary example.

EXAMPLE. Suppose that the relationship between generators $y_1, y_2, y_3$ of $H$ and the standard $\mathbb{Z}$ basis $\{x_1, x_2\}$ of $\mathbb{Z}^2$ is
\[
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = C \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},
\qquad \text{where } C = \begin{pmatrix} 3 & 5 \\ 7 & 13 \\ 5 & 9 \end{pmatrix}.
\]

In row reduction in vector-space theory, we would start by dividing the first row of $C$ by 3, but division by 3 is not available in the present context. Our target for the upper-left entry is $\mathrm{GCD}(3, 7, 5) = 1$, and we use the division algorithm one step at a time to get there. To begin with, it says that $7 = 2 \cdot 3 + 1$ and hence $7 - 2 \cdot 3 = 1$. The first step of row reduction is then to replace the second row by the difference of it and 2 times the first row. The result can be achieved by left multiplication by $\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ and is $\begin{pmatrix} 3 & 5 \\ 1 & 3 \\ 5 & 9 \end{pmatrix}$. We write this step as
\[
\begin{pmatrix} 3 & 5 \\ 7 & 13 \\ 5 & 9 \end{pmatrix}
\xrightarrow{\ \text{left by } \left(\begin{smallmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{smallmatrix}\right)\ }
\begin{pmatrix} 3 & 5 \\ 1 & 3 \\ 5 & 9 \end{pmatrix}.
\]

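The entry 1 in the first column is our target for this stage since $\mathrm{GCD}(3, 7, 5) = 1$. The next step interchanges two rows to move the 1 to the upper left entry, and the subsequent step uses the 1 to eliminate the other entries of the first column:
\[
\begin{pmatrix} 3 & 5 \\ 1 & 3 \\ 5 & 9 \end{pmatrix}
\xrightarrow{\ \text{left by } \left(\begin{smallmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{smallmatrix}\right)\ }
\begin{pmatrix} 1 & 3 \\ 3 & 5 \\ 5 & 9 \end{pmatrix}
\xrightarrow{\ \text{left by } \left(\begin{smallmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ -5 & 0 & 1 \end{smallmatrix}\right)\ }
\begin{pmatrix} 1 & 3 \\ 0 & -4 \\ 0 & -6 \end{pmatrix}.
\]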

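The algorithm next seeks to eliminate the off-diagonal entry 3 in the first row. This is done by a column operation:
\[
\begin{pmatrix} 1 & 3 \\ 0 & -4 \\ 0 & -6 \end{pmatrix}
\xrightarrow{\ \text{right by } \left(\begin{smallmatrix} 1 & -3 \\ 0 & 1 \end{smallmatrix}\right)\ }
\begin{pmatrix} 1 & 0 \\ 0 & -4 \\ 0 & -6 \end{pmatrix}.
\]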

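With two further row operations we are done:
\[
\begin{pmatrix} 1 & 0 \\ 0 & -4 \\ 0 & -6 \end{pmatrix}
\xrightarrow{\ \text{left by } \left(\begin{smallmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{smallmatrix}\right)\ }
\begin{pmatrix} 1 & 0 \\ 0 & 2 \\ 0 & -6 \end{pmatrix}
\xrightarrow{\ \text{left by } \left(\begin{smallmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{smallmatrix}\right)\ }
\begin{pmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 0 \end{pmatrix}.
\]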
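The arithmetic of the example is easy to replay by machine. The following sketch multiplies out the same elementary matrices in the same order; the helper `matmul` and the variable names are introduced here only for illustration.

```python
def matmul(X, Y):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

C = [[3, 5], [7, 13], [5, 9]]

# Left multiplications applied before the column operation, in order.
first_row_ops = [
    [[1, 0, 0], [-2, 1, 0], [0, 0, 1]],   # row 2 minus 2 times row 1
    [[0, 1, 0], [1, 0, 0], [0, 0, 1]],    # interchange rows 1 and 2
    [[1, 0, 0], [-3, 1, 0], [-5, 0, 1]],  # clear the rest of column 1
]
col_op = [[1, -3], [0, 1]]                # column 2 minus 3 times column 1
last_row_ops = [
    [[1, 0, 0], [0, 1, -1], [0, 0, 1]],   # row 2 minus row 3
    [[1, 0, 0], [0, 1, 0], [0, 3, 1]],    # row 3 plus 3 times row 2
]

M = C
for E in first_row_ops:
    M = matmul(E, M)
M = matmul(M, col_op)
for E in last_row_ops:
    M = matmul(E, M)
print(M)
```

The final value of `M` has nonzero entries only on the diagonal, in agreement with the last matrix displayed in the example.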

Our steps are summarized by the fact that the matrix $A$ with
\[
A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ -5 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
has
\[
A C \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 0 \end{pmatrix}
\]
and by the fact that the integer matrices to the left and right of $C$ have determinant $\pm 1$. The determinant condition ensures that $A^{-1}$ and $\begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix}^{-1}$ have integer entries, according to Cramer's rule (Proposition 2.38).

Lemma 4.57. If $C$ is an $m$-by-$n$ matrix of integers, then there exist an $m$-by-$m$ matrix $A$ of integers with determinant $\pm 1$ and an $n$-by-$n$ matrix $B$ of integers with determinant $\pm 1$ such that for some $r \ge 0$, the nonzero entries of $D = ACB$ are exactly the diagonal entries $D_{11}, D_{22}, \dots, D_{rr}$.

PROOF. Given $C$, choose $(i, j)$ with $|C_{ij}| \ne 0$ but $|C_{ij}|$ as small as possible. (If $C = 0$, the algorithm terminates.) Possibly by interchanging two rows and/or then two columns (a left multiplication with determinant $-1$ and then a right multiplication with determinant $-1$), we may assume that $(i, j) = (1, 1)$. By the division algorithm write, for each $i > 1$,
\[
C_{i1} = q_i C_{11} + r_i \qquad \text{with } 0 \le r_i < |C_{11}|,
\]
and replace the $i$th row by the difference of the $i$th row and $q_i$ times the first row (a left multiplication). If some $r_i$ is not 0, the result will leave a nonzero entry in the first column that is $< |C_{11}|$ in absolute value. Permute the least such $r_i \ne 0$ to the upper left and repeat the process. Since the least absolute value is going down, this process at some point terminates with all $r_i$ equal to 0. The first column then has a nonzero diagonal entry and is otherwise 0.

Now consider $C_{1j}$ and apply the division algorithm and column operations in similar fashion in order to process the first row. If we get a smaller nonzero remainder, permute the smallest one to the first column. Repeat this process until the first row is 0 except for entry $C_{11}$. Continue alternately with row and column operations in this fashion until both $C_{1j} = 0$ for $j > 1$ and $C_{i1} = 0$ for $i > 1$.

Repeat the algorithm for the $(m-1)$-by-$(n-1)$ matrix consisting of rows 2 through $m$ and columns 2 through $n$, and continue inductively. The algorithm terminates when either the reduced-in-size matrix is empty or is all 0. At this point the original matrix has been converted into the desired "diagonal form."
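The proof of Lemma 4.57 is effectively an algorithm, and it can be sketched in code. The version below is a variant under stated assumptions rather than a transcription of the proof: at each pass it moves the smallest nonzero entry of the remaining block to the corner and reduces the corner's column and row together, instead of finishing the column before touching the row. The function names are hypothetical. Python's floor division gives remainders smaller in absolute value than the pivot, which is all the termination argument needs.

```python
def diagonalize(C):
    """Return (A, D, B) with D = A*C*B, D diagonal, A and B products of
    integer row/column interchanges and row/column subtractions (det +-1)."""
    m, n = len(C), len(C[0])
    D = [row[:] for row in C]
    A = [[int(i == j) for j in range(m)] for i in range(m)]  # records row operations
    B = [[int(i == j) for j in range(n)] for i in range(n)]  # records column operations

    def swap_rows(i, k):
        D[i], D[k] = D[k], D[i]
        A[i], A[k] = A[k], A[i]

    def swap_cols(j, k):
        for M in (D, B):
            for row in M:
                row[j], row[k] = row[k], row[j]

    def sub_row(i, k, q):            # row i minus q times row k
        for M in (D, A):
            M[i] = [a - q * b for a, b in zip(M[i], M[k])]

    def sub_col(j, k, q):            # column j minus q times column k
        for M in (D, B):
            for row in M:
                row[j] -= q * row[k]

    for t in range(min(m, n)):
        while True:
            entries = [(abs(D[i][j]), i, j)
                       for i in range(t, m) for j in range(t, n) if D[i][j] != 0]
            if not entries:          # remaining block is all 0
                return A, D, B
            _, i, j = min(entries)   # smallest nonzero entry goes to position (t, t)
            swap_rows(t, i)
            swap_cols(t, j)
            for i in range(t + 1, m):
                sub_row(i, t, D[i][t] // D[t][t])
            for j in range(t + 1, n):
                sub_col(j, t, D[t][j] // D[t][t])
            if all(D[i][t] == 0 for i in range(t + 1, m)) and \
               all(D[t][j] == 0 for j in range(t + 1, n)):
                break                # row t and column t are clear; shrink the block
    return A, D, B

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

C = [[3, 5], [7, 13], [5, 9]]
A, D, B = diagonalize(C)
print(D)
```

The diagonal entries are determined only up to sign and order by this procedure, so the output for the example matrix need not match the hand computation entry for entry; only the absolute values 1 and 2 are forced.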


Lemma 4.58. Let $G_1, \dots, G_n$ be abelian groups, and for $1 \le j \le n$, let $H_j$ be a subgroup of $G_j$. Then
\[
(G_1 \oplus \cdots \oplus G_n)/(H_1 \oplus \cdots \oplus H_n) \cong (G_1/H_1) \oplus \cdots \oplus (G_n/H_n).
\]

PROOF. Let $\varphi : G_1 \oplus \cdots \oplus G_n \to (G_1/H_1) \oplus \cdots \oplus (G_n/H_n)$ be the homomorphism defined by $\varphi(g_1, \dots, g_n) = (g_1H_1, \dots, g_nH_n)$. The mapping $\varphi$ is onto $(G_1/H_1) \oplus \cdots \oplus (G_n/H_n)$, and the kernel is $H_1 \oplus \cdots \oplus H_n$. Then Corollary 4.12 shows that $\varphi$ descends to the required isomorphism.

PROOF OF THEOREM 4.55 AND MAIN CONCLUSION OF THEOREM 4.56. Given $G$ with $n$ generators, we set up matters as indicated immediately after the statement of Theorem 4.56, writing
\[
\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = C \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},
\]

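where $x_1, \dots, x_n$ are the standard generators of $\mathbb{Z}^n$, $y_1, \dots, y_m$ are the generators of the kernel of the homomorphism from $\mathbb{Z}^n$ onto $G$, and $C$ is a matrix of integers. Applying Lemma 4.57, let $A$ and $B$ be square integer matrices of determinant $\pm 1$ such that $D = ACB$ is diagonal as in the statement of the lemma. Define
\[
\begin{pmatrix} z_1 \\ \vdots \\ z_m \end{pmatrix} = A \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}
\qquad \text{and} \qquad
\begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix} = B^{-1} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.
\]
Substitution gives
\[
\begin{pmatrix} z_1 \\ \vdots \\ z_m \end{pmatrix}
= A \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}
= AC \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (ACB)B^{-1} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= D \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}.
\]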

If $(c_1 \ \cdots \ c_n)$ and $(d_1 \ \cdots \ d_n) = (c_1 \ \cdots \ c_n)B^{-1}$ are row vectors, then the formula
\[
c_1u_1 + \cdots + c_nu_n = (c_1 \ \cdots \ c_n) \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}
= (d_1 \ \cdots \ d_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= d_1x_1 + \cdots + d_nx_n \tag{$*$}
\]
shows that $\{u_1, \dots, u_n\}$ generates the same subset of $\mathbb{Z}^n$ as $\{x_1, \dots, x_n\}$. Since $(c_1 \ \cdots \ c_n)$ is nonzero if and only if $(d_1 \ \cdots \ d_n)$ is nonzero, the formula $(*)$ shows also that the linear independence of $\{x_1, \dots, x_n\}$ implies that of $\{u_1, \dots, u_n\}$. Hence $\{u_1, \dots, u_n\}$ is a $\mathbb{Z}$ basis of $\mathbb{Z}^n$. Similarly $\{y_1, \dots, y_m\}$ and $\{z_1, \dots, z_m\}$ generate the same subgroup $H$ of $\mathbb{Z}^n$. Therefore we can compare $H$ and $\mathbb{Z}^n$ using $\{z_1, \dots, z_m\}$ and $\{u_1, \dots, u_n\}$.

Since $D$ is diagonal, the equations relating $\{z_1, \dots, z_m\}$ and $\{u_1, \dots, u_n\}$ are $z_j = D_{jj}u_j$ for $j \le \min(m, n)$ and $z_j = 0$ for $\min(m, n) < j \le m$. If $q = \min(m, n)$, then we see that
\[
H = \sum_{i=1}^m \mathbb{Z}z_i = \sum_{i=1}^q D_{ii}\mathbb{Z}u_i + \sum_{i=q+1}^m \mathbb{Z}z_i = \sum_{i=1}^q D_{ii}\mathbb{Z}u_i.
\]

Since the set $\{u_1, \dots, u_q\}$ is linearly independent over $\mathbb{Z}$, this sum exhibits $H$ as given by $H = D_{11}\mathbb{Z} \oplus \cdots \oplus D_{qq}\mathbb{Z}$ with the nonzero members of $D_{11}u_1, \dots, D_{qq}u_q$ as a $\mathbb{Z}$ basis. Consequently $H$ has been exhibited as free abelian of rank $\le q \le n$. This proves Theorem 4.55. Applying Lemma 4.58 to the quotient $\mathbb{Z}^n/H$ and letting $D_{11}, \dots, D_{rr}$ be the nonzero diagonal entries of $D$, we see that $H$ has rank $r$, and we obtain an expansion of $G$ in terms of cyclic groups as
\[
G \cong C_{|D_{11}|} \oplus \cdots \oplus C_{|D_{rr}|} \oplus \mathbb{Z}^{n-r}.
\]
This proves the main conclusion of Theorem 4.56.
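Once the diagonal form is known, reading off the structure of $G \cong \mathbb{Z}^n/H$ is bookkeeping. A minimal sketch, with the hypothetical name `quotient_structure`, takes the diagonal entries of $D$ and the number $n$ of generators as input; summands $C_1$ are trivial and are dropped.

```python
def quotient_structure(diagonal, n):
    """Describe Z^n / H when H has generators D_ii * u_i for a Z basis u_1, ..., u_n.

    Returns (orders, free_rank): the orders > 1 of the finite cyclic summands
    and the number of copies of Z.
    """
    nonzero = [abs(d) for d in diagonal if d != 0]
    orders = [d for d in nonzero if d > 1]     # C_1 is the trivial group
    free_rank = n - len(nonzero)               # the Z^{n-r} part
    return orders, free_rank

# H generated by (2,1,1) and (0,0,3) in Z^3 has diagonal form diag(1, 3)
print(quotient_structure([1, 3], 3))
# The worked example in Z^2 has diagonal form diag(1, 2)
print(quotient_structure([1, 2], 2))
```

The first call reproduces the quotient $C_3 \oplus \mathbb{Z}$ found earlier for the 2-by-3 example, and the second says that the group in the worked example is $C_2$.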

PROOF OF THE DECOMPOSITION WITH CYCLIC GROUPS OF PRIME-POWER ORDER. It is enough to prove that if $m = \prod_{j=1}^N p_j^{k_j}$ with the $p_j$ equal to distinct primes, then $\mathbb{Z}/m\mathbb{Z} \cong (\mathbb{Z}/p_1^{k_1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p_N^{k_N}\mathbb{Z})$. This is a variant of the Chinese Remainder Theorem (Corollary 1.9). For the proof let
\[
\varphi : \mathbb{Z} \to (\mathbb{Z}/p_1^{k_1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p_N^{k_N}\mathbb{Z})
\]
be the homomorphism given by $\varphi(s) = (s \bmod p_1^{k_1}, \dots, s \bmod p_N^{k_N})$ for $s \in \mathbb{Z}$. Since $\varphi(m) = (0, \dots, 0)$, $\varphi$ descends to a homomorphism
\[
\overline{\varphi} : \mathbb{Z}/m\mathbb{Z} \to (\mathbb{Z}/p_1^{k_1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p_N^{k_N}\mathbb{Z}).
\]
The map $\overline{\varphi}$ is one-one because if $\varphi(s) = 0$, then $p_j^{k_j}$ divides $s$ for all $j$. Since the $p_j^{k_j}$ are relatively prime in pairs, their product $m$ divides $s$. Since $m$ divides $s$, $s \equiv 0 \bmod m$. The map $\overline{\varphi}$ is onto since it is one-one and since the finite sets $\mathbb{Z}/m\mathbb{Z}$ and $(\mathbb{Z}/p_1^{k_1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p_N^{k_N}\mathbb{Z})$ both have $m$ elements.

PROOF OF UNIQUENESS OF THE DECOMPOSITION. Write $G = \mathbb{Z}^s \oplus T$, where $T = (\mathbb{Z}/p_1^{l_1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p_M^{l_M}\mathbb{Z})$ and the $p_j$'s are not necessarily distinct. The subgroup $T$ is the subgroup of elements of finite order in $G$, and it is well defined independently of the decomposition of $G$ as the direct sum of cyclic groups. The quotient $G/T \cong \mathbb{Z}^s$ is free abelian of finite rank, and its rank $s$ is well defined by Theorem 4.53. Thus the number $s$ of factors of $\mathbb{Z}$ in the decomposition of $G$ is uniquely determined, and we need only consider uniqueness of the decomposition of the finite abelian group $T$.

For $p$ prime the elements of $T$ of order $p^a$ for some $a$ are those in the sum of the groups $\mathbb{Z}/p_j^{l_j}\mathbb{Z}$ for which $p_j = p$, and we are reduced to considering a group $H = \mathbb{Z}/p^{l_1}\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/p^{l_M}\mathbb{Z}$ with $p$ fixed and $l_1 \le \cdots \le l_M$. The set of $p^j$ powers of elements of $H$ is a subgroup $p^jH$ of $H$ and is given by $\mathbb{Z}/p^{l_t - j}\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/p^{l_M - j}\mathbb{Z}$ if $l_t$ is the first index $\ge j$, while the set of $p^{j+1}$ powers of elements of $H$ is given by $\mathbb{Z}/p^{l_t - j - 1}\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/p^{l_M - j - 1}\mathbb{Z}$ if $l_t$ is the first index $\ge j + 1$. Each summand of $p^{j+1}H$ sits inside the corresponding summand of $p^jH$ as a subgroup of index $p$. Therefore Lemma 4.58 gives
\[
p^jH/p^{j+1}H \cong (\mathbb{Z}/p^{l_t - j}\mathbb{Z})/(\mathbb{Z}/p^{l_t - j - 1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p^{l_M - j}\mathbb{Z})/(\mathbb{Z}/p^{l_M - j - 1}\mathbb{Z}),
\]
with $l_t$ the first index $\ge j + 1$. Each term of $p^jH/p^{j+1}H$ has order $p$, and thus $|p^jH/p^{j+1}H| = p^{|\{i \mid l_i > j\}|}$. Hence $H$ determines the integers $l_1, \dots, l_M$, and uniqueness is proved.
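The counting identity $|p^jH/p^{j+1}H| = p^{|\{i \mid l_i > j\}|}$ at the end of the uniqueness proof can be tested by brute force on a small group. The sketch below builds $H = \bigoplus_i \mathbb{Z}/p^{l_i}\mathbb{Z}$ as tuples of residues, computes the subgroups $p^jH$ by direct enumeration, and recovers the exponents from the sizes alone. The function name `recover_exponents` is an assumption made for illustration, and the enumeration is practical only for small orders.

```python
from itertools import product

def recover_exponents(p, exponents):
    """Recover l_1 <= ... <= l_M from the orders of the subgroups p^j H."""
    moduli = [p ** l for l in exponents]
    H = list(product(*(range(mod) for mod in moduli)))

    def multiples(j):
        """The subgroup p^j H, written additively."""
        return {tuple((p ** j * x) % mod for x, mod in zip(h, moduli)) for h in H}

    counts = []                      # counts[j] = number of i with l_i > j
    j = 0
    while len(multiples(j)) > 1:
        ratio = len(multiples(j)) // len(multiples(j + 1))   # equals p ** counts[j]
        c = 0
        while ratio > 1:
            ratio //= p
            c += 1
        counts.append(c)
        j += 1
    counts.append(0)
    # The number of l_i equal to j + 1 is counts[j] - counts[j + 1].
    recovered = []
    for j in range(len(counts) - 1):
        recovered += [j + 1] * (counts[j] - counts[j + 1])
    return recovered

print(recover_exponents(2, [1, 1, 3]))   # H = Z/2 + Z/2 + Z/8
```

Only the group operation enters the computation of `multiples(j)`, which is the point of the proof: the exponents are invariants of $H$ and not of the chosen decomposition.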

10. Sylow Theorems

This section continues the use of group actions to obtain results concerning structure theory for abstract groups. We shall prove the three Sylow Theorems, which are a starting point for investigations of the structure of finite groups that are deeper than those in Sections 6 and 7. We state the three theorems as the parts of Theorem 4.59.

Theorem 4.59 (Sylow Theorems). Let $G$ be a finite group of order $p^m r$, where $p$ is prime and $p$ does not divide $r$. Then
(a) $G$ contains a subgroup of order $p^m$, and any subgroup of $G$ of order $p^l$ with $0 \le l < m$ is contained in a subgroup of order $p^m$,
(b) any two subgroups of order $p^m$ in $G$ are conjugate in $G$, i.e., any two such subgroups $P_1$ and $P_2$ have $P_2 = aP_1a^{-1}$ for some $a \in G$,
(c) the number of subgroups of order $p^m$ is of the form $pk + 1$ and divides $r$.

REMARK. A subgroup of order $p^m$ as in the theorem is called a Sylow $p$-subgroup of $G$.
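The three parts of Theorem 4.59 can be observed directly in a small group. The following sketch, which is only an illustration and proves nothing, takes $G$ to be the symmetric group on four letters, of order $24 = 3^1 \cdot 8$, and $p = 3$. Permutations are stored as tuples, and all function and variable names are chosen here for the example.

```python
from itertools import permutations

def compose(s, t):
    """Composition of permutations stored as tuples: i -> s[t[i]]."""
    return tuple(s[i] for i in t)

def inverse(s):
    inv = [0] * len(s)
    for i, si in enumerate(s):
        inv[si] = i
    return tuple(inv)

G = list(permutations(range(4)))     # the symmetric group on {0, 1, 2, 3}
e = tuple(range(4))

def order(g):
    k, h = 1, g
    while h != e:
        h, k = compose(h, g), k + 1
    return k

# A subgroup of order 3 is cyclic, so each one is {e, g, g^2} with g of order 3.
sylow3 = {frozenset({e, g, compose(g, g)}) for g in G if order(g) == 3}

def conjugate(a, P):
    """The subgroup a P a^{-1}."""
    a_inv = inverse(a)
    return frozenset(compose(compose(a, x), a_inv) for x in P)

P1 = next(iter(sylow3))
conjugates_of_P1 = {conjugate(a, P1) for a in G}

print(len(sylow3))                   # number of Sylow 3-subgroups
print(conjugates_of_P1 == sylow3)    # part (b): all of them are conjugate
```

Here $r = 8$, and the count of Sylow 3-subgroups is of the form $3k + 1$ and divides 8, as part (c) requires.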


Before coming to the proof, let us give two simple applications to structure theory. The applications combine Theorem 4.59, some results of Sections 6 and 7, and Problems 35–38 and 45–48 at the end of the chapter.

Proposition 4.60. If $p$ and $q$ are primes with $p < q$, then there exists a nonabelian group of order $pq$ if and only if $p$ divides $q - 1$, and in this case the nonabelian group is unique up to isomorphism. It may be taken to be a semidirect product of the cyclic groups $C_p$ and $C_q$ with $C_q$ normal.

REMARK. It follows from Theorem 4.56 that the only abelian group of order $pq$, up to isomorphism, is $C_p \times C_q \cong C_{pq}$. If $p = 2$ in the proposition, then $q$ is odd and $p$ divides $q - 1$; the proposition yields the dihedral group $D_q$. For $p > 2$, the divisibility condition may or may not hold: For $pq = 15$, the condition does not hold, and hence every group of order 15 is cyclic. For $pq = 21$, the condition does hold, and there exists a nonabelian group of order 21; this group was constructed explicitly in Example 2 in Section 7.

PROOF. Existence of a nonabelian group of order $pq$, together with the semidirect-product structure, is established by Proposition 4.46 if $p$ divides $q - 1$.

Let us see uniqueness and the necessity of the condition that $p$ divide $q - 1$. If $G$ has order $pq$, Theorem 4.59a shows that $G$ has a Sylow $p$-subgroup $H_p$ and a Sylow $q$-subgroup $H_q$. Corollary 4.9 shows that these two groups are cyclic. The conjugates of $H_q$ are Sylow $q$-subgroups, and Theorem 4.59c shows that the number of such conjugates is of the form $kq + 1$ and divides $p$. Since $p < q$, $k = 0$. Therefore $H_q$ is normal. (Alternatively, one can apply Proposition 4.36 to see that $H_q$ is normal.)

Each element of $G$ is uniquely a product $ab$ with $a$ in $H_p$ and $b$ in $H_q$. For the uniqueness, if $a_1b_1 = a_2b_2$, then $a_2^{-1}a_1 = b_2b_1^{-1}$ is an element of $H_p \cap H_q$. Its order must divide both $p$ and $q$ and hence must be 1. Thus the $pq$ products $ab$ with $a$ in $H_p$ and $b$ in $H_q$ are all different. Since the number of them equals the order of $G$, every member of $G$ is such a product. By Proposition 4.44, $G$ is a semidirect product of $H_p$ and $H_q$. If the action of $H_p$ on $H_q$ is nontrivial, then Problem 37 at the end of the chapter shows that $p$ divides $q - 1$, and Problem 38 shows that the group is unique up to isomorphism. On the other hand, if the action is trivial, then $G$ is certainly abelian.

Proposition 4.61. If $G$ is a group of order 12, then $G$ contains a subgroup $H$ of order 3 and a subgroup $K$ of order 4, and at least one of them is normal. Consequently there are exactly five groups of order 12, up to isomorphism: two abelian and three nonabelian.


REMARK. The second statement follows from the first, as a consequence of Problems 45–48 at the end of the chapter. Those problems show how to construct the groups.

PROOF. Theorem 4.59a shows that $H$ may be taken to be a Sylow 3-subgroup and $K$ may be taken to be a Sylow 2-subgroup. We have to prove that either $H$ or $K$ is normal. Suppose that $H$ is not normal. Theorem 4.59c shows that the number of Sylow 3-subgroups is of the form $3k + 1$ and divides 4. The subgroup $H$, not being normal, fails to equal one of its conjugates, which will be another Sylow 3-subgroup; hence $k > 0$. Therefore $k = 1$, and there are four Sylow 3-subgroups. The intersection of any two such subgroups is a subgroup of both and must be trivial since 3 is prime. Thus the set-theoretic union of the Sylow 3-subgroups accounts for $4 \cdot 2 + 1$ elements. None of these elements apart from the identity lies in $K$, and thus $K$ contributes 3 further elements, for a total of 12. Thus every element of $G$ lies in $K$ or a conjugate of $H$. Consequently $K$ equals every conjugate of $K$, and $K$ is normal.

Let us see where we are with classifying finite groups of certain orders, up to isomorphism. A group of order $p$ is cyclic by Corollary 4.9, and a group of order $p^2$ is abelian by Corollary 4.39. Groups of order $pq$ are settled by Proposition 4.60. Thus for $p$ and $q$ prime, we know the structure of all groups of order $p$, $p^2$, and $pq$. Problems 39–44 at the end of the chapter tell us the structure of the groups of order 8, and Proposition 4.61 and Problems 45–48 tell us the structure of the groups of order 12. In particular, the table at the end of Section 1, which gives examples of nonisomorphic groups of order at most 15, is complete except for the one group of order 12 that is discussed in Problem 48. Problems 30–34 and 49–54 at the end of the chapter go in the direction of classifying finite groups of certain other orders.

Now we return to Theorem 4.59.
The proof of the theorem makes use of the theory of group actions as in Section 6. In fact, the proof of existence of Sylow $p$-subgroups is just an elaboration of the argument used to prove Corollary 4.38, saying that a group of prime-power order has a nontrivial center. The relevant action for the existence part of the proof is the one $(g, x) \mapsto gxg^{-1}$ given by conjugation of the elements of the group, the orbit of $x$ being the conjugacy class $C\ell(x)$. Proposition 4.37 shows that $|G| = |C\ell(x)|\,|Z_G(x)|$, where $Z_G(x)$ is the centralizer of $x$. Since the disjoint union of the conjugacy classes is all of $G$, we have
\[
|G| = |Z_G| + \sum_{\substack{\text{representatives } x_j \text{ of each} \\ \text{conjugacy class with } |C\ell(x_j)| \ne 1}} |G|/|Z_G(x_j)|,
\]
a formula sometimes called the class equation of $G$.

PROOF OF EXISTENCE OF SYLOW $p$-SUBGROUPS IN THEOREM 4.59a. We induct on $|G|$, the base case being $|G| = 1$. Suppose that existence holds for groups of order $< |G|$. Without loss of generality suppose that $m > 0$, so that $p$ divides $|G|$.

First suppose that $p$ does not divide $|Z_G|$. Referring to the class equation of $G$, we see that $p$ must fail to divide some integer $|G|/|Z_G(x_j)|$ for which $|Z_G(x_j)| < |G|$. Since $p^m$ is the exact power of $p$ dividing $|G|$, we conclude that $p^m$ divides this $|Z_G(x_j)|$ and $p^{m+1}$ does not. Since $|Z_G(x_j)| < |G|$, the inductive hypothesis shows that $Z_G(x_j)$ has a subgroup of order $p^m$, and this is a Sylow $p$-subgroup of $G$.

Now suppose that $p$ divides $|Z_G|$. The group $Z_G$ is finitely generated abelian, hence is a direct sum of cyclic groups by Theorem 4.56. Thus $Z_G$ contains an element $c$ of order $p$. The cyclic group $C$ generated by $c$ then has order $p$. Being a subgroup of $Z_G$, $C$ is normal in $G$. The group $G/C$ has order $p^{m-1}r$, and the inductive hypothesis implies that $G/C$ has a subgroup $H$ of order $p^{m-1}$. If $\varphi : G \to G/C$ denotes the quotient map, then $\varphi^{-1}(H)$ is a subgroup of $G$ of order $|H|\,|\ker \varphi| = p^{m-1}p = p^m$.

For the remaining parts of Theorem 4.59, we make use of a different group action. If $\Gamma$ denotes the set of all subgroups of $G$, then $G$ acts on $\Gamma$ by conjugation: $(g, H) \mapsto gHg^{-1}$. The orbit of a subgroup $H$ consists of all subgroups conjugate to $H$ in $G$, and the isotropy subgroup at the point $H$ in $\Gamma$ is
\[
\{g \in G \mid gHg^{-1} = H\}.
\]
This is a subgroup $N(H)$ of $G$ known as the normalizer of $H$ in $G$. It has the properties that $N(H) \supseteq H$ and that $H$ is a normal subgroup of $N(H)$. The counting formula of Corollary 4.35 gives
\[
|\{gHg^{-1} \mid g \in G\}| = |G/N(H)|.
\]
Meanwhile, application of Lagrange's Theorem (Theorem 4.7) to the three quotients $G/H$, $G/N(H)$, and $N(H)/H$ shows that
\[
|G/H| = |G/N(H)|\,|N(H)/H|,
\]
with all three factors being integers.
Now assume as in the statement of Theorem 4.59 that $|G| = p^m r$ with $p$ prime and $p$ not dividing $r$. In this setting we have the following lemma.

Lemma 4.62. If $P$ is a Sylow $p$-subgroup of $G$ and if $H$ is a subgroup of the normalizer $N(P)$ whose order is a power of $p$, then $H \subseteq P$.

PROOF. Since $H \subseteq N(P)$ and $P$ is normal in $N(P)$, the set $HP$ of products is a group, by the same argument as used for $H_pH_q$ in the proof of Proposition 4.60. Then $HP/P \cong H/(H \cap P)$ by the Second Isomorphism Theorem (Theorem 4.14), and hence $|HP/P|$ is some power $p^k$ of $p$. By Lagrange's Theorem (Theorem 4.7), $|HP| = p^{m+k}$ with $k \ge 0$. Since no subgroup of $G$ can have order $p^l$ with $l > m$, we must have $k = 0$. Thus $HP = P$ and $H \subseteq P$.

PROOF OF THE REMAINDER OF THEOREM 4.59. Within the set $\Gamma$ of all subgroups of $G$, let $\Pi$ be the set of all subgroups of $G$ of order $p^m$. We have seen that $\Pi$ is not empty. Since the conjugate of a subgroup has the same order as the subgroup, $\Pi$ is the union of orbits of $\Gamma$ under conjugation by $G$. Thus we can restrict the group action by conjugation from $G \times \Gamma \to \Gamma$ to $G \times \Pi \to \Pi$.

Let $P$ and $P'$ be members of $\Pi$, and let $\Sigma$ and $\Sigma'$ be the $G$ orbits of $P$ and $P'$ under conjugation. Suppose that $\Sigma$ and $\Sigma'$ are distinct orbits of $G$. Let us restrict the group action by conjugation from $G \times \Pi \to \Pi$ to $P \times \Pi \to \Pi$. The $G$ orbits $\Sigma$ and $\Sigma'$ then break into $P$ orbits, and the counting formula Corollary 4.35 says for each orbit that
\[
p^m = |P| = \#\{\text{subgroups in a } P \text{ orbit}\} \times |\text{isotropy subgroup within } P|.
\]
Hence the number of subgroups in a $P$ orbit is of the form $p^l$ for some $l \ge 0$.

Suppose that $l = 0$. Then the $P$ orbit is some singleton set $\{P''\}$, and the corresponding isotropy subgroup within $P$ is all of $P$:
\[
P = \{p \in P \mid pP''p^{-1} = P''\} \subseteq N(P'').
\]
Lemma 4.62 shows that $P \subseteq P''$, and therefore $P = P''$. Thus $l = 0$ only for the $P$ orbit $\{P\}$. In other words, the number of elements in any $P$ orbit other than $\{P\}$ is divisible by $p$. Consequently $|\Sigma| \equiv 1 \bmod p$ while $|\Sigma'| \equiv 0 \bmod p$, the latter because $\Sigma$ and $\Sigma'$ are assumed distinct. But this conclusion is asymmetric in the $G$ orbits $\Sigma$ and $\Sigma'$, and we conclude that $\Sigma$ and $\Sigma'$ must coincide. Hence there is only one $G$ orbit in $\Pi$, and it has $kp + 1$ members for some $k$.
This proves parts (b) and (c) except for the fact that $kp + 1$ divides $r$. For this divisibility let us apply the counting formula Corollary 4.35 to the orbit $\Sigma$ of $G$. The formula gives $|G| = |\Sigma|\,|\text{isotropy subgroup}|$, and hence $|\Sigma|$ divides $|G| = p^m r$. Since $|\Sigma| = kp + 1$, we have $\mathrm{GCD}(|\Sigma|, p) = 1$ and also $\mathrm{GCD}(|\Sigma|, p^m) = 1$. By Corollary 1.3, $kp + 1$ divides $r$.

Finally we prove that any subgroup $H$ of $G$ of order $p^l$ lies in some Sylow $p$-subgroup. Let $\Sigma = \Pi$ again be the $G$ orbit in $\Gamma$ of subgroups of order $p^m$, and restrict the action by conjugation from $G \times \Sigma \to \Sigma$ to $H \times \Sigma \to \Sigma$. Each $H$ orbit in $\Sigma$ must have $p^a$ elements for some $a$, by one more application of the counting formula Corollary 4.35. Since $|\Sigma| \equiv 1 \bmod p$, some $H$ orbit has one element, say the $H$ orbit of $P$. Then the isotropy subgroup of $H$ at the point $P$ is all of $H$, and $H \subseteq N(P)$. By Lemma 4.62, $H \subseteq P$. This completes the proof of Theorem 4.59.
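Two ingredients of the proof, the counting formula for the orbit of a subgroup under conjugation and Lemma 4.62, can be checked numerically in the symmetric group on four letters with $p = 3$. This is a sketch for illustration only; permutations are stored as tuples, and the helper names are assumptions of the example.

```python
from itertools import permutations

def compose(s, t):
    """Composition of permutations stored as tuples: i -> s[t[i]]."""
    return tuple(s[i] for i in t)

def inverse(s):
    inv = [0] * len(s)
    for i, si in enumerate(s):
        inv[si] = i
    return tuple(inv)

def conjugate(g, H):
    """The subgroup g H g^{-1}."""
    g_inv = inverse(g)
    return frozenset(compose(compose(g, h), g_inv) for h in H)

def order(g, e):
    k, h = 1, g
    while h != e:
        h, k = compose(h, g), k + 1
    return k

G = list(permutations(range(4)))         # order 24 = 3 * 8
e = tuple(range(4))
c = (1, 2, 0, 3)                         # a 3-cycle fixing the last letter
P = frozenset({e, c, compose(c, c)})     # a Sylow 3-subgroup

normalizer = [g for g in G if conjugate(g, P) == P]    # N(P)
orbit = {conjugate(g, P) for g in G}                    # all conjugates of P

# Counting formula: the number of conjugates of P is |G| / |N(P)|.
print(len(orbit), len(G) // len(normalizer))

# Lemma 4.62 with p = 3: elements of N(P) whose order is a power of 3 lie in P.
three_power = [g for g in normalizer if order(g, e) in (1, 3)]
print(all(g in P for g in three_power))
```

The normalizer here has six elements, so $P$ has four conjugates, matching the count $kp + 1 = 4$ obtained from part (c) of the theorem.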

11. Categories and Functors

The mathematics thus far in the book has taken place in several different contexts, and we have seen that the same notions sometimes recur in more than one context, possibly with variations. For example we have worked with vector spaces, inner-product spaces, groups, rings, and fields, and we have seen that each of these areas has its own definition of isomorphism. In addition, the notion of direct product or direct sum has arisen in more than one of these contexts, and there are other similarities. In this section we introduce some terminology to make the notion of "context" precise and to provide a setting for discussing similarities between different contexts.

A category $\mathcal{C}$ consists of three things:
• a class of objects, denoted by $\mathrm{Obj}(\mathcal{C})$,
• for any two objects $A$ and $B$ in the category, a set $\mathrm{Morph}(A, B)$ of morphisms,
• for any three objects $A$, $B$, and $C$ in the category, a law of composition for morphisms, i.e., a function carrying $\mathrm{Morph}(A, B) \times \mathrm{Morph}(B, C)$ into $\mathrm{Morph}(A, C)$, with the image of $(f, g)$ under composition written as $gf$,
and these are to satisfy certain properties that we list in a moment. When more than one category is under discussion, we may use notation like $\mathrm{Morph}_{\mathcal{C}}(A, B)$ to distinguish between the categories.

We are to think initially of the objects as the sets we are studying with a particular kind of structure on them; the morphisms are then the functions from one object to another that respect this additional structure, and the law of composition is just composition of functions. Indeed, the defining conditions that are imposed on general categories are arranged to be obvious for this special kind of category, and this setting accounts for the order in which we write the composition of two morphisms. But the definition of a general category is not so restrictive, and it is important not to restrict the definition in this way.
The properties that are to be satisfied to have a category are as follows:
(i) the sets $\mathrm{Morph}(A_1, B_1)$ and $\mathrm{Morph}(A_2, B_2)$ are disjoint unless $A_1 = A_2$ and $B_1 = B_2$ (because two functions are declared to be different unless their domains match and their ranges match, as is underscored in Section A1 of the appendix),
(ii) the law of composition satisfies the associativity property $h(gf) = (hg)f$ for $f \in \mathrm{Morph}(A, B)$, $g \in \mathrm{Morph}(B, C)$, and $h \in \mathrm{Morph}(C, D)$,
(iii) for each object $A$, there is an identity morphism $1_A$ in $\mathrm{Morph}(A, A)$ such that $f1_A = f$ and $1_Ag = g$ for $f \in \mathrm{Morph}(A, B)$ and $g \in \mathrm{Morph}(C, A)$.

A subcategory $\mathcal{S}$ of a category $\mathcal{C}$ by definition is a category with $\mathrm{Obj}(\mathcal{S}) \subseteq \mathrm{Obj}(\mathcal{C})$ and $\mathrm{Morph}_{\mathcal{S}}(A, B) \subseteq \mathrm{Morph}_{\mathcal{C}}(A, B)$ whenever $A$ and $B$ are in $\mathrm{Obj}(\mathcal{S})$, and it is assumed that the laws of composition in $\mathcal{S}$ and $\mathcal{C}$ are consistent when both are defined.

Here are several examples in which the morphisms are functions and the law of composition is ordinary composition of functions. They are usually identified in practice just by naming their objects, since the morphisms are understood to be all functions from one object to another respecting the additional structure on the objects.

EXAMPLES OF CATEGORIES.
(1) The category of all sets. An object $A$ is a set, and a morphism in the set $\mathrm{Morph}(A, B)$ is a function from $A$ into $B$.
(2) The category of all vector spaces over a field $\mathbb{F}$. The morphisms are linear maps.
(3) The category of all groups. The morphisms are group homomorphisms.
(4) The category of all abelian groups. The morphisms again are group homomorphisms. This is a subcategory of the previous example.
(5) The category of all rings. The morphisms are all ring homomorphisms. The kernel and the image of a morphism are necessarily objects of the category.
(6) The category of all rings with identity. The morphisms are all ring homomorphisms carrying identity to identity. This is a subcategory of the previous example. The image of a morphism is necessarily an object of the category, but the kernel of a morphism is usually not in the category.
(7) The category of all fields. The morphisms are as in Example 6, and the result is a subcategory of Example 6. In this case any morphism is necessarily one-one and carries inverses to inverses.
(8) The category of all group actions by a particular group $G$. If $G$ acts on $X$ and on $Y$, then a morphism from the one space to the other is a $G$ equivariant mapping from $X$ to $Y$, i.e., a function $\varphi : X \to Y$ such that $\varphi(gx) = g\varphi(x)$ for all $x$ in $X$.
(9) The category of all representations by a particular group $G$ on a vector space over a particular field $\mathbb{F}$. The morphisms are the linear $G$ equivariant functions. This is a subcategory of the previous example.
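Properties (i) through (iii) can be seen at work in a toy model of Example 1 restricted to finite sets. The sketch below is an assumption-laden illustration, not a general framework: a morphism is stored together with its source and target, so that the same rule regarded with two different targets gives two different morphisms, as property (i) demands. The names `Morphism`, `make`, `compose`, and `identity` are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Morphism:
    source: frozenset
    target: frozenset
    graph: tuple          # sorted pairs (x, f(x)), one for each x in source

    def __call__(self, x):
        return dict(self.graph)[x]

def make(source, target, rule):
    """Build the morphism source -> target given by a Python function."""
    source, target = frozenset(source), frozenset(target)
    graph = tuple(sorted((x, rule(x)) for x in source))
    assert all(y in target for _, y in graph), "values must lie in the target"
    return Morphism(source, target, graph)

def compose(g, f):
    """The composition g f, defined only when the target of f is the source of g."""
    if f.target != g.source:
        raise ValueError("composition undefined")
    return make(f.source, g.target, lambda x: g(f(x)))

def identity(A):
    return make(A, A, lambda x: x)

A, B = {0, 1}, {0, 1, 2}
f = make(A, B, lambda x: x + 1)      # f in Morph(A, B)
g = make(B, A, lambda x: x % 2)      # g in Morph(B, A)
h = make(A, A, lambda x: 1 - x)      # h in Morph(A, A)

print(compose(h, compose(g, f)) == compose(compose(h, g), f))   # property (ii)
print(compose(f, identity(A)) == f, compose(identity(B), f) == f)  # property (iii)
print(make(A, B, lambda x: x) == make(A, A, lambda x: x))       # property (i)
```

The last comparison is between the inclusion of $A$ into $B$ and the identity on $A$; they have the same values but different targets, so they are different morphisms.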


Readers who are familiar with point-set topology will recognize that one can impose topologies on everything in the above examples, insisting that the functions be continuous, and again we obtain examples of categories. For example the category of all topological spaces consists of objects that are topological spaces and morphisms that are continuous functions. The category of all continuous group actions by a particular topological group has objects that are group actions G × X → X that are continuous functions, and the morphisms are the equivariant functions that are continuous. Readers who are familiar with manifolds will recognize that another example is the category of all smooth manifolds, which consists of objects that are smooth manifolds and morphisms that are smooth functions. The morphisms in a category need not be functions in the usual sense. An important example is the “opposite category” C opp to a category C, which is a handy technical device and is discussed in Problems 78–80 at the end of the chapter. In all of the above examples of categories, the class of objects fails to be a set. This behavior is typical. However, it does not cause problems in practice because in any particular argument involving categories, we can restrict to a subcategory for which the objects do form a set.15 If C is a category, a morphism ϕ ∈ Morph(A, B) is said to be an isomorphism if there exists a morphism ψ ∈ Morph(B, A) such that ψϕ = 1 A and ϕψ = 1 B . In this case we say that A is isomorphic to B in the category C. Let us check that the morphism ψ is unique if it exists. In fact, if ψ is a member of Morph(B, A) with ψ ϕ = 1 A and ϕψ = 1 B , then ψ = 1 A ψ = (ψ ϕ)ψ = ψ (ϕψ) = ψ 1 B = ψ . We can therefore call ψ the inverse to ϕ. The relation “is isomorphic to” is an equivalence relation.16 In fact, the relation is symmetric by deﬁnition, and it is reﬂexive because 1 A ∈ Morph(A, A) has 1 A as inverse. 
For transitivity let ϕ_1 ∈ Morph(A, B) and ϕ_2 ∈ Morph(B, C) be isomorphisms, with respective inverses ψ_1 ∈ Morph(B, A) and ψ_2 ∈ Morph(C, B). Then ϕ_2ϕ_1 is in Morph(A, C), and ψ_1ψ_2 is in Morph(C, A). Calculation gives (ψ_1ψ_2)(ϕ_2ϕ_1) = ψ_1(ψ_2(ϕ_2ϕ_1)) = ψ_1((ψ_2ϕ_2)ϕ_1) = ψ_1(1_B ϕ_1) = ψ_1ϕ_1 = 1_A, and similarly (ϕ_2ϕ_1)(ψ_1ψ_2) = 1_C. Therefore ϕ_2ϕ_1 ∈ Morph(A, C) is an isomorphism, and “is isomorphic to” is an equivalence relation. When A is isomorphic to B, it is permissible to say that A and B are isomorphic.

¹⁵ For the interested reader, a book that pays closer attention to the inherent set-theoretic difficulties in the theory is Mac Lane’s Categories for the Working Mathematician.

¹⁶ Technically one considers relations only when they are defined on sets, and the class of objects in a category is typically not a set. However, just as with vector spaces, groups, and so on, we can restrict attention in any particular situation to a subcategory for which the objects do form a set, and then there is no difficulty.

The next step is to abstract a frequent kind of construction that we have used with our categories. If C and D are two categories, a covariant functor F : C → D associates to each object A in Obj(C) an object F(A) in Obj(D) and to each pair of objects A and B and morphism f in Morph_C(A, B) a morphism F(f) in Morph_D(F(A), F(B)) such that
(i) F(gf) = F(g)F(f) for f ∈ Morph_C(A, B) and g ∈ Morph_C(B, C),
(ii) F(1_A) = 1_{F(A)} for A in Obj(C).

EXAMPLES OF COVARIANT FUNCTORS.

(1) Inclusion of a subcategory into a category is a covariant functor.

(2) Let C be the category of all sets. If F carries each set X to the set 2^X of all subsets of X, then F is a covariant functor as soon as its effect on functions between sets, i.e., its effect on morphisms, is defined in an appropriate way. Namely, if f : X → Y is a function, then F(f) is to be a function from F(X) = 2^X to F(Y) = 2^Y. That is, we need a definition of F(f)(A) as a subset of Y whenever A is a subset of X. A natural way of making such a definition is to put F(f)(A) = f(A), and then F is indeed a covariant functor.

(3) Let C be any of Examples 2 through 6 of categories above, and let D be the category of all sets, as in Example 1 of categories. If F carries an object A in C (i.e., a vector space, group, ring, etc.) into its underlying set and carries each morphism into its underlying function between two sets, then F is a covariant functor and furnishes an example of what is called a forgetful functor.

(4) Let C be the category of all vector spaces over a field 𝔽, let U be a vector space over 𝔽, and let F : C → C be defined on a vector space to be the vector space of linear maps F(V) = Hom_𝔽(U, V). The set of morphisms Morph_C(V_1, V_2) is Hom_𝔽(V_1, V_2). If f is in Morph_C(V_1, V_2), then F(f) is to be in Morph_C(Hom_𝔽(U, V_1), Hom_𝔽(U, V_2)), and the definition is that F(f)(L) = f ∘ L for L ∈ Hom_𝔽(U, V_1).
Then F is a covariant functor: to check that F(gf) = F(g)F(f) when g is in Morph_C(V_2, V_3), we write F(gf)(L) = gf ∘ L = g ∘ fL = g ∘ F(f)(L) = F(g)F(f)(L).

(5) Let C be the category of all groups, let D be the category of all sets, let G be a group, and let F : C → D be the functor defined as follows. For a group H, F(H) is the set of all group homomorphisms from G into H. The set of morphisms Morph_C(H_1, H_2) is the set of group homomorphisms from H_1 into H_2. If f is in Morph_C(H_1, H_2), then F(f) is to be a function with domain the set of homomorphisms from G into H_1 and with range the set of homomorphisms from G into H_2. The definition is F(f)(ϕ) = f ∘ ϕ. Then F is a covariant functor.

(6) Let C be the category of all sets, and let D be the category of all abelian groups. To a set S, associate the free abelian group F(S) with S as Z basis. If f : S → S′ is a function, then the universal mapping property of external direct sums of abelian groups (Proposition 4.17) yields a corresponding group homomorphism from F(S) to F(S′), and we define this group homomorphism to be F(f). Then F is a covariant functor.

(7) Let C be the category of all finite sets, fix a commutative ring R with identity, and let D be the category of all commutative rings with identity. To a finite set S, associate the commutative ring F(S) = R[{X_s | s ∈ S}]. If f : S → S′ is a function, then the properties of substitution homomorphisms give us a corresponding homomorphism of rings with identity carrying F(S) to F(S′), and the result is a covariant functor.

There is a second kind of functor of interest to us. If C and D are two categories, a contravariant functor F : C → D associates to each object A in Obj(C) an object F(A) in Obj(D) and to each pair of objects A and B and morphism f in Morph_C(A, B) a morphism F(f) in Morph_D(F(B), F(A)) such that
(i) F(gf) = F(f)F(g) for f ∈ Morph_C(A, B) and g ∈ Morph_C(B, C),
(ii) F(1_A) = 1_{F(A)} for A in Obj(C).

EXAMPLES OF CONTRAVARIANT FUNCTORS.

(1) Let C be the category of all vector spaces over a field 𝔽, let W be a vector space over 𝔽, and let F : C → C be defined on a vector space to be the vector space of linear maps F(V) = Hom_𝔽(V, W). The set of morphisms Morph_C(V_1, V_2) is Hom_𝔽(V_1, V_2). If f is in Morph_C(V_1, V_2), then F(f) is to be in Morph_C(Hom_𝔽(V_2, W), Hom_𝔽(V_1, W)), and the definition is that F(f)(L) = L ∘ f for L ∈ Hom_𝔽(V_2, W). Then F is a contravariant functor: to check that F(gf) = F(f)F(g) when g is in Morph_C(V_2, V_3), we write F(gf)(L) = L ∘ gf = Lg ∘ f = F(f)(Lg) = F(f)F(g)(L) for L ∈ Hom_𝔽(V_3, W).

(2) Let C be the category of all vector spaces over a field 𝔽, define F of a vector space V to be the dual vector space V′, and define F of a linear mapping f between two vector spaces V and W to be the contragredient f^t carrying W′ into V′, defined by f^t(w′)(v) = w′(f(v)).
This is the special case of Example 1 of contravariant functors in which the target space W is the field of scalars itself. Hence F is a contravariant functor.

(3) Let C be the category of all groups, let D be the category of all sets, let G be a group, and let F : C → D be the functor defined as follows. For a group H, F(H) is the set of all group homomorphisms from H into G. The set of morphisms Morph_C(H_1, H_2) is the set of group homomorphisms from H_1 into H_2. If f is in Morph_C(H_1, H_2), then F(f) is to be a function with domain the set of homomorphisms from H_2 into G and with range the set of homomorphisms from H_1 into G. The definition is F(f)(ϕ) = ϕ ∘ f. Then F is a contravariant functor.
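The two composition laws can be checked by brute force in a small setting. The following is only an illustrative sketch, not part of the text's development: it replaces groups and homomorphisms by finite sets and arbitrary functions (an assumption made purely to keep the search space tiny), and the helper names `all_functions`, `compose`, `hom_out_of`, and `hom_into` are hypothetical. Post-composition ϕ ↦ f ∘ ϕ models the covariant construction, and pre-composition ϕ ↦ ϕ ∘ f models the contravariant one.

```python
from itertools import product

def all_functions(dom, cod):
    """All functions dom -> cod, each encoded as a dict."""
    dom = list(dom)
    return [dict(zip(dom, values)) for values in product(list(cod), repeat=len(dom))]

def compose(g, f):
    """Return g∘f (apply f first, then g); functions are dicts."""
    return {x: g[f[x]] for x in f}

def hom_out_of(f):
    """Covariant action on morphisms: phi |-> f∘phi."""
    return lambda phi: compose(f, phi)

def hom_into(f):
    """Contravariant action on morphisms: phi |-> phi∘f."""
    return lambda phi: compose(phi, f)

U = [0, 1]                       # fixed source set for the covariant case
W = ["a", "b"]                   # fixed target set for the contravariant case
X1, X2, X3 = [0, 1], [0, 1, 2], [0, 1]
f = {0: 2, 1: 0}                 # f : X1 -> X2
g = {0: 1, 1: 1, 2: 0}           # g : X2 -> X3
gf = compose(g, f)               # gf : X1 -> X3

# Covariant law F(gf) = F(g)F(f), tested on every phi : U -> X1.
covariant_ok = all(
    hom_out_of(gf)(phi) == hom_out_of(g)(hom_out_of(f)(phi))
    for phi in all_functions(U, X1)
)

# Contravariant law F(gf) = F(f)F(g), tested on every phi : X3 -> W.
contravariant_ok = all(
    hom_into(gf)(phi) == hom_into(f)(hom_into(g)(phi))
    for phi in all_functions(X3, W)
)

print(covariant_ok, contravariant_ok)
```

The order reversal in the contravariant case is visible in the code: `hom_into(g)` must be applied before `hom_into(f)`, whereas in the covariant case `hom_out_of(f)` is applied first.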


It is an important observation about functors that the composition of two functors is a functor. This is immediate from the definition. If the two functors are both covariant or both contravariant, then the composition is covariant. If one of them is covariant and the other is contravariant, then the composition is contravariant.

          α
     A ------> B
     |         |
   β |         | γ
     v         v
     C ------> D
          δ

FIGURE 4.9. A square diagram. The square commutes if γα = δβ.

In the subject of category theory, a great deal of information is conveyed by “commutative diagrams” of objects and morphisms. By a diagram is meant a directed graph, usually but not necessarily planar, in which the vertices represent some relevant objects in a category and the arrows from one vertex to another represent morphisms of interest between pairs of these objects. Often the vertices and arrows are labeled, but in fact labels on the vertices can be deduced from the labels on the arrows since any morphism determines its “domain” and “range” as a consequence of defining property (i) of categories. A diagram is said to be commutative if for each pair of vertices A and B and each directed path from A to B, the compositions of the morphisms along each path are the same. For example a square as in Figure 4.9 is commutative if γα = δβ. The triangular diagrams in Figures 4.1 through 4.8 are all commutative.

            F(α)
     F(A) -------> F(B)
       |             |
  F(β) |             | F(γ)
       v             v
     F(C) -------> F(D)
            F(δ)

and

            G(α)
     G(A) <------- G(B)
       ^             ^
  G(β) |             | G(γ)
       |             |
     G(C) <------- G(D)
            G(δ)

FIGURE 4.10. Diagrams obtained by applying a covariant functor F and a contravariant functor G to the diagram in Figure 4.9.

Functors can be applied to diagrams, yielding new diagrams. For example, suppose that Figure 4.9 is a diagram in the category C, that F : C → D is a covariant functor, and that G : C → D is a contravariant functor. Then we can apply F and G to the diagram in Figure 4.9, obtaining the two diagrams in the category D that are pictured in Figure 4.10. If the diagram in Figure 4.9 is commutative, then so are the diagrams in Figure 4.10, as a consequence of the effect of functors on compositions of morphisms.
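As a concrete check of the last statement, the sketch below builds one commutative square of finite sets and applies the subset functor X ↦ 2^X with F(f)(S) = f(S), which was listed among the covariant examples. The particular sets and maps, and the helper names `powerset` and `direct_image`, are hypothetical choices made for illustration only.

```python
def compose(g, f):
    """Return g∘f (apply f first, then g); functions are dicts."""
    return {x: g[f[x]] for x in f}

def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [
        frozenset(x for i, x in enumerate(xs) if (mask >> i) & 1)
        for mask in range(2 ** len(xs))
    ]

def direct_image(f, domain):
    """F(f) : 2^X -> 2^Y with F(f)(S) = f(S), stored as a dict on subsets."""
    return {S: frozenset(f[x] for x in S) for S in powerset(domain)}

# A square as in Figure 4.9, with hypothetical finite sets.
A, B, C, D = [0, 1, 2], ["p", "q"], ["u", "v"], [0, 1]
alpha = {0: "p", 1: "p", 2: "q"}   # alpha : A -> B
beta = {0: "u", 1: "u", 2: "v"}    # beta  : A -> C
gamma = {"p": 0, "q": 1}           # gamma : B -> D
delta = {"u": 0, "v": 1}           # delta : C -> D

square_commutes = compose(gamma, alpha) == compose(delta, beta)

# Apply the covariant subset functor to every arrow of the square.
F_alpha, F_beta = direct_image(alpha, A), direct_image(beta, A)
F_gamma, F_delta = direct_image(gamma, B), direct_image(delta, C)

image_commutes = compose(F_gamma, F_alpha) == compose(F_delta, F_beta)
print(square_commutes, image_commutes)
```

The second comparison succeeds for the reason given in the text: F(γ)F(α) = F(γα) = F(δβ) = F(δ)F(β).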


The subject of category theory seeks to analyze functors that make sense for all categories, or at least all categories satisfying some additional properties. The most important investigation of this kind is concerned with homology and cohomology, as well as their ramifications, for “abelian categories,” which include several important examples affecting algebra, topology, and several complex variables. The topic in question is called “homological algebra” and is discussed further in Advanced Algebra. There are a number of other functors that are investigated in category theory, and we mention four:
• products, including direct products,
• coproducts, including direct sums,
• direct limits, also called inductive limits,
• inverse limits, also called projective limits.

We discuss general products and coproducts in the present section, omitting a general discussion of direct limits and inverse limits. Inverse limits will arise in Advanced Algebra for one category in connection with Galois groups, but we shall handle that one situation on its own without attempting a generalization. An attempt in the 1960s to recast as much mathematics as possible in terms of category theory is now regarded by many mathematicians as having been overdone, and it seems wiser to cast bodies of mathematics in the framework of category theory only when doing so can be justified by the amount of time saved by eliminating redundant arguments.

When a category C and a nonempty set S are given, we can define a category C^S. The objects of C^S are functions on S with the property that the value of the function at each s in S is in Obj(C), two such functions being regarded as the same if they consist of the same ordered pairs.¹⁷ Let us refer to such a function as an S-tuple of members of Obj(C), denoting it by an expression like {X_s}_{s∈S}. A morphism in Morph_{C^S}({X_s}_{s∈S}, {Y_s}_{s∈S}) is an S-tuple {f_s}_{s∈S} of morphisms of C such that f_s lies in Morph_C(X_s, Y_s) for all s, and the law of composition of such morphisms takes place coordinate by coordinate.

Let {X_s}_{s∈S} be an object in C^S. A product of {X_s}_{s∈S} is a pair (X, {p_s}_{s∈S}) such that X is in Obj(C) and each p_s is in Morph_C(X, X_s) with the following universal mapping property: whenever A in Obj(C) is given and a morphism ϕ_s ∈ Morph_C(A, X_s) is given for each s, then there exists a unique morphism ϕ ∈ Morph_C(A, X) such that p_sϕ = ϕ_s for all s. The relevant diagram is pictured in Figure 4.11.

¹⁷ In other words, the range of such a function is considered as irrelevant. We might think of the range as Obj(C) except for the fact that a function is supposed to have a set as range and Obj(C) need not be a set.


[Diagram: a commutative triangle with ϕ_s : A → X_s along the top, p_s : X → X_s up the left side, and the induced morphism ϕ : A → X along the diagonal.]

FIGURE 4.11. Universal mapping property of a product in a category.

EXAMPLES OF PRODUCTS.

(1) Products exist in the category of vector spaces over a field F. If vector spaces V_s indexed by a nonempty set S are given, then their product exists in the category, and an example is their external direct product ∏_{s∈S} V_s, according to Figure 2.4 and the discussion around it.

(2) Products exist in the category of all groups. If groups G_s indexed by a nonempty set S are given, then their product exists in the category, and an example is their external direct product ∏_{s∈S} G_s, according to Figure 4.2 and Proposition 4.15. If the groups G_s are abelian, then ∏_{s∈S} G_s is abelian, and it follows that products exist in the category of all abelian groups.

(3) Products exist in the category of all sets. If sets X_s indexed by a nonempty set S are given, then their product exists in the category, and an example is their Cartesian product ×_{s∈S} X_s, as one easily checks.

(4) Products exist in the category of all rings and in the category of all rings with identity. If objects R_s in the category indexed by a nonempty set S are given, then their product may be taken as an abelian group to be the external direct product ∏_{s∈S} R_s, with multiplication defined coordinate by coordinate, and the group homomorphisms p_s are easily checked to be morphisms in the category.

A product of objects in a category need not exist in the category. An artificial example may be formed as follows: Let C be a category with one object G, namely a group of order 2, and let Morph(G, G) = {0, 1_G}, the law of composition being the usual composition. Let S be a 2-element set, and let the corresponding objects be X_1 = G and X_2 = G. The claim is that the product X_1 × X_2 does not exist in C. In fact, take A = G. There are four S-tuples of morphisms (ϕ_1, ϕ_2) meeting the conditions of the definition.
Yet the only possibility for the product is X = G, and then there are only two possible ϕ’s in Morph(A, X ). Hence we cannot account for all possible S-tuples of morphisms, and the product cannot exist. The thing that category theory addresses is the uniqueness. A product is always unique up to canonical isomorphism, according to Proposition 4.63. We proved uniqueness for products in the special cases of Examples 1 and 2 above in Propositions 2.32 and 4.16.
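For the category of sets, the claim that the Cartesian product satisfies the universal mapping property can be verified exhaustively on small examples. The sketch below is an illustration under stated assumptions rather than a proof: the sets `X1`, `X2`, and the test object `A` are hypothetical, and `mediating_maps` is a helper introduced here that lists every ϕ : A → X with p_1ϕ = ϕ_1 and p_2ϕ = ϕ_2.

```python
from itertools import product

def all_functions(dom, cod):
    """All functions dom -> cod, each encoded as a dict."""
    dom = list(dom)
    return [dict(zip(dom, values)) for values in product(list(cod), repeat=len(dom))]

X1 = [0, 1]
X2 = ["a", "b", "c"]
X = list(product(X1, X2))            # candidate product object
p1 = {x: x[0] for x in X}            # projection X -> X1
p2 = {x: x[1] for x in X}            # projection X -> X2
A = ["s", "t"]                       # test object

def mediating_maps(phi1, phi2):
    """All phi : A -> X with p1∘phi = phi1 and p2∘phi = phi2."""
    return [
        phi for phi in all_functions(A, X)
        if all(p1[phi[a]] == phi1[a] and p2[phi[a]] == phi2[a] for a in A)
    ]

# Existence and uniqueness: exactly one mediating map for every pair (phi1, phi2).
unique_everywhere = all(
    len(mediating_maps(phi1, phi2)) == 1
    for phi1 in all_functions(A, X1)
    for phi2 in all_functions(A, X2)
)
print(unique_everywhere)
```

The same counting viewpoint explains the artificial counterexample: there the pairs (ϕ_1, ϕ_2) outnumber the available morphisms ϕ, so no assignment of a unique ϕ to each pair is possible.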


Proposition 4.63. Let C be a category, and let S be a nonempty set. If {X_s}_{s∈S} is an object in C^S and if (X, {p_s}) and (X′, {p′_s}) are two products, then there exists a unique morphism Φ : X′ → X such that p′_s = p_s ∘ Φ for all s ∈ S, and Φ is an isomorphism.

REMARK. There is no assertion that p_s is onto X_s. In fact, “onto” has no meaning for a general category.

PROOF. In Figure 4.11 let A = X′ and ϕ_s = p′_s. If Φ ∈ Morph(X′, X) is the morphism produced by the fact that X is a direct product, then we have p_sΦ = p′_s for all s. Reversing the roles of X and X′, we obtain a morphism Φ′ ∈ Morph(X, X′) with p′_sΦ′ = p_s for all s. Therefore p_s(ΦΦ′) = (p_sΦ)Φ′ = p′_sΦ′ = p_s.

In Figure 4.11 we next let A = X and ϕ_s = p_s for all s. Then the identity 1_X in Morph(X, X) has the same property p_s 1_X = p_s relative to all p_s that ΦΦ′ has, and the uniqueness in the statement of the universal mapping property implies that ΦΦ′ = 1_X. Reversing the roles of X and X′, we obtain Φ′Φ = 1_{X′}. Therefore Φ is an isomorphism.

For uniqueness suppose that Ψ ∈ Morph(X′, X) is another morphism with p′_s = p_sΨ for all s ∈ S. Then the argument of the previous paragraph shows that Φ′Ψ = 1_{X′}. Consequently Ψ = 1_XΨ = (ΦΦ′)Ψ = Φ(Φ′Ψ) = Φ1_{X′} = Φ, and Ψ = Φ.

If products always exist in a particular category, they are not unique, only unique up to canonical isomorphism. Such a product is commonly denoted by ∏_{s∈S} X_s, even though it is not uniquely defined. It is customary to treat the product over S as a covariant functor F : C^S → C, the effect of the functor on objects being given by F({X_s}_{s∈S}) = ∏_{s∈S} X_s. For a well-defined functor we have to fix a choice of product for each object under consideration¹⁸ in Obj(C^S). For the effect of F on morphisms, we argue with the universal mapping property. Thus let {X_s}_{s∈S} and {Y_s}_{s∈S} be objects in C^S, let f_s be in Morph_C(X_s, Y_s) for all s, and let the products in question be (∏_{s∈S} X_s, {p_s}_{s∈S}) and (∏_{s∈S} Y_s, {q_s}_{s∈S}).
Then f_{s_0}p_{s_0} is in Morph_C(∏_{s∈S} X_s, Y_{s_0}) for each s_0, and the universal mapping property gives us f in Morph_C(∏_{s∈S} X_s, ∏_{s∈S} Y_s) such that q_s f = f_s p_s for all s. We define this f to be F({f_s}_{s∈S}), and we readily check that F is a functor.

¹⁸ Since Obj(C^S) need not be a set, it is best to be wary of applying the Axiom of Choice when the indexing of sets is given by Obj(C^S). Instead, one makes the choice only for all objects in some set of objects large enough for a particular application.

We turn to coproducts, which include direct sums. Let {X_s}_{s∈S} be an object in C^S. A coproduct of {X_s}_{s∈S} is a pair (X, {i_s}_{s∈S}) such that X is in Obj(C) and each i_s is in Morph_C(X_s, X) with the following universal mapping property: whenever A in Obj(C) is given and a morphism ϕ_s ∈ Morph_C(X_s, A) is given for each s, then there exists a unique morphism ϕ ∈ Morph_C(X, A) such that ϕi_s = ϕ_s for all s. The relevant diagram is pictured in Figure 4.12.

[Diagram: a commutative triangle with ϕ_s : X_s → A along the top, i_s : X_s → X down the left side, and the induced morphism ϕ : X → A along the diagonal.]

FIGURE 4.12. Universal mapping property of a coproduct in a category.

EXAMPLES OF COPRODUCTS.

(1) Coproducts exist in the category of vector spaces over a field F. If vector spaces V_s indexed by a nonempty set S are given, then their coproduct exists in the category, and an example is their external direct sum ⊕_{s∈S} V_s, according to Figure 2.5 and the discussion around it.

(2) Coproducts exist in the category of all abelian groups. If abelian groups G_s indexed by a nonempty set S are given, then their coproduct exists in the category, and an example is their external direct sum ⊕_{s∈S} G_s, according to Figure 4.4 and Proposition 4.17.

(3) Coproducts exist in the category of all sets. If sets X_s indexed by a nonempty set S are given, then their coproduct exists in the category, and an example is their disjoint union ⋃_{s∈S} {(x_s, s) | x_s ∈ X_s}. The verification appears as Problem 74 at the end of the chapter.

(4) Coproducts exist in the category of all groups. Suppose that groups G_s indexed by a nonempty set S are given. It will be shown in Chapter VII that the coproduct is the “free product” ∗_{s∈S} G_s that is defined in that chapter. In the special case that each G_s is the group Z of integers, the free product coincides with the free group on S. Therefore, even if all the groups G_s are abelian, their coproduct need not be a subgroup of the direct product and need not even be abelian. In particular it need not coincide with the direct sum.
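The role of the tags in the disjoint union of Example 3 can be seen in a small brute-force experiment. The following sketch is illustrative only: the overlapping sets `X1` and `X2`, the test object `A`, and the helper `mediating_maps` are hypothetical choices. It checks the universal mapping property for the tagged union and counts why the ordinary union of overlapping sets cannot serve as a coproduct.

```python
from itertools import product

def all_functions(dom, cod):
    """All functions dom -> cod, each encoded as a dict."""
    dom = list(dom)
    return [dict(zip(dom, values)) for values in product(list(cod), repeat=len(dom))]

X1 = [0, 1]
X2 = [0, 1, 2]                                   # overlaps X1 on purpose
X = [(x, 1) for x in X1] + [(x, 2) for x in X2]  # tagged disjoint union
i1 = {x: (x, 1) for x in X1}                     # injection X1 -> X
i2 = {x: (x, 2) for x in X2}                     # injection X2 -> X
A = ["a", "b"]                                   # test object

def mediating_maps(phi1, phi2):
    """All phi : X -> A with phi∘i1 = phi1 and phi∘i2 = phi2."""
    return [
        phi for phi in all_functions(X, A)
        if all(phi[i1[x]] == phi1[x] for x in X1)
        and all(phi[i2[x]] == phi2[x] for x in X2)
    ]

pairs = [(phi1, phi2) for phi1 in all_functions(X1, A) for phi2 in all_functions(X2, A)]
unique_everywhere = all(len(mediating_maps(phi1, phi2)) == 1 for phi1, phi2 in pairs)

# Without tags the union has only 3 points, so it admits too few maps into A.
plain_union = sorted(set(X1) | set(X2))
maps_from_plain_union = len(all_functions(plain_union, A))

print(unique_everywhere, len(pairs), maps_from_plain_union)
```

Since the number of maps out of the untagged union is smaller than the number of pairs (ϕ_1, ϕ_2), some pair has no mediating map there, which is why the definition attaches the index s to each element.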


A coproduct of objects in a category need not exist in the category. Problem 76 at the end of the chapter offers an example that the reader is invited to check.

Proposition 4.64. Let C be a category, and let S be a nonempty set. If {X_s}_{s∈S} is an object in C^S and if (X, {i_s}) and (X′, {i′_s}) are two coproducts, then there exists a unique morphism Φ : X → X′ such that i′_s = Φ ∘ i_s for all s ∈ S, and Φ is an isomorphism.

REMARKS. There is no assertion that i_s is one-one. In fact, “one-one” has no meaning for a general category. This proposition may be derived quickly from Proposition 4.63 by a certain duality argument that is discussed in Problems 78–80 at the end of the chapter. Here we give a direct argument without taking advantage of duality.

PROOF. In Figure 4.12 let A = X′ and ϕ_s = i′_s. If Φ ∈ Morph(X, X′) is the morphism produced by the fact that X is a coproduct, then we have Φi_s = i′_s for all s. Reversing the roles of X and X′, we obtain a morphism Φ′ ∈ Morph(X′, X) with Φ′i′_s = i_s for all s. Therefore (Φ′Φ)i_s = Φ′i′_s = i_s.

In Figure 4.12 we next let A = X and ϕ_s = i_s for all s. Then the identity 1_X in Morph(X, X) has the same property 1_X i_s = i_s relative to all i_s that Φ′Φ has, and the uniqueness says that Φ′Φ = 1_X. Reversing the roles of X and X′, we obtain ΦΦ′ = 1_{X′}. Therefore Φ is an isomorphism.

For uniqueness suppose that Ψ ∈ Morph(X, X′) is another morphism with i′_s = Ψi_s for all s ∈ S. Then the argument of the previous paragraph shows that Φ′Ψ = 1_X. Consequently Ψ = 1_{X′}Ψ = (ΦΦ′)Ψ = Φ(Φ′Ψ) = Φ1_X = Φ, and Ψ = Φ.

If coproducts always exist in a particular category, they are not unique, only unique up to canonical isomorphism. Such a coproduct is commonly denoted by ∐_{s∈S} X_s, even though it is not uniquely defined. As with product, it is customary to treat the coproduct over S as a covariant functor F : C^S → C, the effect of the functor on objects being given by F({X_s}_{s∈S}) = ∐_{s∈S} X_s. For a well-defined functor we have to fix a choice of coproduct for each object under consideration in Obj(C^S). For the effect of F on morphisms, we argue with the universal mapping property. Thus let {X_s}_{s∈S} and {Y_s}_{s∈S} be objects in C^S, let f_s be in Morph_C(X_s, Y_s) for all s, and let the coproducts in question be (∐_{s∈S} X_s, {i_s}_{s∈S}) and (∐_{s∈S} Y_s, {j_s}_{s∈S}). Then j_{s_0}f_{s_0} is in Morph_C(X_{s_0}, ∐_{s∈S} Y_s) for each s_0, and the universal mapping property gives us f in Morph_C(∐_{s∈S} X_s, ∐_{s∈S} Y_s) such that f i_s = j_s f_s for all s. We define this f to be F({f_s}_{s∈S}), and we readily check that F is a functor.

Universal mapping properties occur in other contexts than for products and coproducts.
We have already seen them in connection with homomorphisms on free abelian groups and with substitution homomorphisms on polynomial rings, and more such properties will occur in the development of tensor products in Chapter VI. A general framework for discussing universal mapping properties appears in the problems at the end of Chapter VI.

12. Problems

1. Let G be a group in which all elements other than the identity have order 2. Prove that G is abelian.

2. The dihedral group D_4 of order 8 can be viewed as a subgroup of the symmetric group S_4 of order 24. Find 8 explicit permutations in S_4 forming a subgroup isomorphic to D_4.


3. Suppose G is a finite group, H is a subgroup, and a ∈ G is an element with a^l in H for some integer l with GCD(l, |G|) = 1. Prove that a is in H.

4. Let G be a group, and define a new group G′ to have the same underlying set as G but to have multiplication given by a ∘ b = ba. Prove that G′ is a group and that it is isomorphic to G.

5. Prove that if G is an abelian group and n is an integer, then a ↦ a^n is a homomorphism of G. Give an example of a nonabelian group for which a ↦ a^2 is not a homomorphism.

6. Suppose that G is a group and that H and K are normal subgroups of G with H ∩ K = {1}. Verify that the set HK of products is a subgroup and that this subgroup is isomorphic as a group to the external direct product H × K.

7. Take as known that 8191 is prime, so that F_8191 is a field. Without carrying through the computations and without advocating trial and error, describe what steps you would carry out to solve for x mod 8191 such that 1234x ≡ 1 mod 8191.

8. (Wilson’s Theorem) Let p be an odd prime. Starting from the fact that 1, ..., p − 1 are roots of the polynomial X^{p−1} − 1 ≡ 0 mod p in F_p, prove that (p − 1)! ≡ −1 mod p.

9. Classify, up to isomorphism, all groups of order p^2 if p is a prime.

10. This problem concerns conjugacy classes in a group G. (a) Prove that all elements of a conjugacy class have the same order. (b) Prove that if ab is in a conjugacy class, so is ba.

11. (a) Find explicitly all the conjugacy classes in the alternating group A_4. (b) For each conjugacy class in A_4, find the centralizer of one element in the class. (c) Prove that A_4 has no subgroup isomorphic to C_6 or S_3.

12. Prove that the alternating group A_5 has no subgroup of order 30.

13. Let G be a nonabelian group of order p^n, where p is prime. Prove that any subgroup of order p^{n−1} is normal.

14. Let G be a finite group, and let H be a normal subgroup. If |H| = p and p is the smallest prime dividing |G|, prove that H is contained in the center of G.

15. Let G be a group. An automorphism of G of the form x ↦ gxg^{−1} is called an inner automorphism. Prove that the set of inner automorphisms is a normal subgroup of the group Aut G of all automorphisms and is isomorphic to G/Z_G.

16. (a) Prove that Aut C_m is isomorphic to (Z/mZ)^×. (b) Find a value of m for which Aut C_m is not cyclic.


17. Fix n ≥ 2. In the symmetric group S_n, for each integer k with 1 ≤ k ≤ n/2, let C_k be the set of elements in S_n that are products of k disjoint transpositions. (a) Prove that if τ is an automorphism of S_n, then τ(C_1) = C_k for some k. (b) Prove that $|C_k| = \binom{n}{2k} \frac{(2k)!}{2^k\, k!}$. (c) Prove that |C_k| ≠ |C_1| unless k = 1 or n = 6. (Educational note: From this, it follows that τ(C_1) = C_1 except possibly when n = 6. One can deduce as a consequence that every automorphism of S_n is inner except possibly when n = 6.)

18. Give an example: G is a group with a normal subgroup N, N has a subgroup M that is normal in N, yet M is not normal in G.

19. Show that the cyclic group C_{rs} is isomorphic to C_r × C_s if and only if GCD(r, s) = 1.

20. How many abelian groups, up to isomorphism, are there of order 27?

21. Let G be the free abelian group with Z basis {x_1, x_2, x_3}. Let H be the subgroup of G generated by {u_1, u_2, u_3}, where u_1 = 3x_1 + 2x_2 + 5x_3, u_2 = x_2 + 3x_3, u_3 = x_2 + 5x_3. Express G/H as a direct sum of cyclic groups.

22. Let {e_1, e_2, e_3, e_4} be the standard basis of R^4. Let G be the additive subgroup of R^4 generated by the four elements
    e_1,   e_1 + e_2,   ½(e_1 + e_2 + e_3 + e_4),   ½(e_1 + e_2 + e_3 − e_4),
and let H be the subgroup of G generated by the four elements
    e_1 − e_2,   e_2 − e_3,   e_3 − e_4,   e_3 + e_4.
Identify the abelian group G/H as a direct sum of cyclic groups.

23. Let G be the free abelian group with Z basis {x_1, ..., x_n}, and let H be the subgroup generated by {u_1, ..., u_m}, where $\begin{pmatrix} u_1 \\ \vdots \\ u_m \end{pmatrix} = C \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ for an m-by-n matrix C of integers. Prove that the number of summands Z in the decomposition of G/H into cyclic groups is equal to n minus the rank of the matrix C when C is considered as in M_mn(Q).

24. Prove that every abelian group is the homomorphic image of a free abelian group.

25. Let G be a group, and let H and K be subgroups. (a) For x and y in G, prove that xH ∩ yK is empty or is a coset of H ∩ K. (b) Deduce from (a) that if H and K have finite index in G, then so does H ∩ K.


26. Let G be a free abelian group of finite rank n, and let H be a free abelian subgroup of rank n. Prove that H has finite index in G.

27. Let G = S_4 be the symmetric group on four letters. (a) Find a Sylow 2-subgroup of G. How many Sylow 2-subgroups are there, and why? (b) Find a Sylow 3-subgroup of G. How many Sylow 3-subgroups are there, and why?

28. Let H be a subgroup of a group G. Prove or disprove that the normalizer N(H) of H in G is a normal subgroup of G.

29. How many elements of order 7 are there in a simple group of order 168?

30. Let G be a group of order pq^2, where p and q are primes with p < q. Let S_p and S_q be Sylow subgroups for the primes p and q. Prove that G is a semidirect product of S_p and S_q with S_q normal.

31. Suppose that G is a finite group and that H is a subgroup whose index in G is a prime p. By considering the action of G on the set of subgroups conjugate to H and considering the possibilities for the normalizer N(H), determine the possibilities for the number of subgroups conjugate to H.

32. Let G be a group of order 24, let H be a subgroup of order 8, and assume that H is not normal. (a) Using the Sylow Theorems, explain why H has exactly 3 conjugates in G, counting H itself as one. (b) Show how to use the conjugates in (a) to define a homomorphism of G into the symmetric group S_3 on three letters. (c) Use the homomorphism of (b) to conclude that G is not simple.

33. Let G be a group of order 36. Arguing in the style of the previous problem, show that there is a nontrivial homomorphism of G into the symmetric group S_4.

34. Let G be a group of order 2pq, where p and q are primes with 2 < p < q. (a) Prove that if q + 1 ≠ 2p, then a Sylow q-subgroup is normal. (b) Suppose that q + 1 = 2p, let H be a Sylow p-subgroup, and let K be a Sylow q-subgroup. Prove that at least one of H and K is normal, that the set HK of products is a subgroup, and that the subgroup HK is cyclic of index 2 in G.
Problems 35–38 concern the detection of isomorphisms among semidirect products. For the first two of the problems, let H and K be groups, and let ϕ_1 : H → Aut K and ϕ_2 : H → Aut K be homomorphisms.

35. Suppose that ϕ_2 = ϕ_1 ∘ ϕ for some automorphism ϕ of H. Define ψ : H ×_{ϕ_2} K → H ×_{ϕ_1} K by ψ(h, k) = (ϕ(h), k). Prove that ψ is an isomorphism.


36. Suppose that ϕ_2 = ϕ ∘ ϕ_1 for some inner automorphism ϕ of Aut K in the sense of Problem 15, i.e., ϕ : Aut K → Aut K is to be given by ϕ(x) = axa^{−1} with a in Aut K. Define ψ : H ×_{ϕ_1} K → H ×_{ϕ_2} K by ψ(h, k) = (h, a(k)). Prove that ψ is an isomorphism.

37. Suppose that p and q are primes and that the cyclic group C_p acts on C_q by automorphisms with a nontrivial action. Prove that p divides q − 1.

38. Suppose that p and q are primes such that p divides q − 1. Let τ_1 and τ_2 be nontrivial homomorphisms from C_p to Aut C_q. Prove that C_p ×_{τ_1} C_q ≅ C_p ×_{τ_2} C_q, and conclude that there is only one nonabelian semidirect product C_p ×_τ C_q up to isomorphism.

Problems 39–44 discuss properties of groups of order 8, obtaining a classification of these groups up to isomorphism.

39. Prove that the five groups C_8, C_4 × C_2, C_2 × C_2 × C_2, D_4, and H_8 are mutually nonisomorphic and that the first three exhaust the abelian groups of order 8, apart from isomorphisms.

40. (a) Find a composition series for the 8-element dihedral group D_4. (b) Find a composition series for the 8-element quaternion group H_8.

41. (a) Prove that every subgroup of the quaternion group H_8 is normal. (b) Identify the conjugacy classes in H_8. (c) Compute the order of Aut H_8.

42. Suppose that G is a nonabelian group of order 8. Prove that G has an element of order 4 but no element of order 8.

43. Let G be a nonabelian group of order 8, and let K be the copy of C_4 generated by some element of order 4. If G has some element of order 2 that is not in K, prove that G ≅ D_4.

44. Let G be a nonabelian group of order 8, and let K be the copy of C_4 generated by some element of order 4. If G has no element of order 2 that is not in K, prove that G ≅ H_8.

Problems 45–48 classify groups of order 12, making use of Proposition 4.61, Problem 15, and Problems 35–38. Let G be a group of order 12, let H be a Sylow 3-subgroup, and let K be a Sylow 2-subgroup. Proposition 4.61 says that at least one of H and K is normal. Consequently there are three cases, and these are addressed by the first three of the problems.

45. Verify that there are only two possibilities for G up to isomorphism if G is abelian.

46. Suppose that K is normal, so that G ≅ H ×_τ K. Prove that either
(i) τ is trivial or
(ii) τ is nontrivial and K ≅ C_2 × C_2,
and deduce that G is abelian if (i) holds and that G ≅ A_4 if (ii) holds.


47. Suppose that H is normal, so that $G = K \times_\tau H$. Prove that one of the conditions
(i) τ is trivial,
(ii) $K \cong C_2 \times C_2$ and τ is nontrivial,
(iii) $K \cong C_4$ and τ is nontrivial
holds, and deduce that G is abelian if (i) holds, that $G \cong D_6$ if (ii) holds, and that G is nonabelian and is not isomorphic to $A_4$ or $D_6$ if (iii) holds.

48. In the setting of the previous problem, prove that there is one and only one group, up to isomorphism, satisfying condition (iii), and find the order of each of its elements.

Problems 49–52 assume that p and q are primes with p < q. The problems go in the direction of classifying finite groups of order $p^2 q$.

49. If G is a group of order $p^2 q$, prove that either $p^2 q = 12$ or a Sylow q-subgroup is normal.

50. If $p^2$ divides q − 1, exhibit three nonabelian groups of order $p^2 q$ that are mutually nonisomorphic.

51. If p divides q − 1 but $p^2$ does not divide q − 1, exhibit two nonabelian groups of order $p^2 q$ that are not isomorphic.

52. If p does not divide q − 1, prove that any group of order $p^2 q$ is abelian.

Problems 53–54 concern nonabelian groups of order 27.

53. (a) Show that multiplication by the elements 1, 4, 7 mod 9 defines a nontrivial action of Z/3Z on Z/9Z by automorphisms.
(b) Show from (a) that there exists a nonabelian group of order 27.
(c) Show that the group in (b) is generated by elements a and b that satisfy $a^9 = b^3 = b^{-1}aba^{-4} = 1$.

54. Show that any nonabelian group of order 27 having a subgroup H isomorphic to $C_9$ and an element of order 3 not lying in H is isomorphic to the group constructed in the previous problem.

Problems 55–62 give a construction of infinitely many simple groups, some of them finite and some infinite. Let F be a field. For n ≥ 2, let SL(n, F) be the special linear group for the space $F^n$ of n-dimensional column vectors. The center Z of SL(n, F) consists of the scalar multiples of the identity, the scalar being an $n^{\mathrm{th}}$ root of 1. Let PSL(n, F) = SL(n, F)/Z. It is known that PSL(n, F) is simple except for PSL(2, $F_2$) and PSL(2, $F_3$). These problems will show that PSL(2, F) is simple if |F| > 5 and F is not of characteristic 2. Most of the argument will consider SL(2, F), and the passage to PSL will occur only at the very end. In Problems 56–61, G denotes a normal subgroup of SL(2, F) that is not contained in the center Z, and it is to be proved that G = SL(2, F).
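For readers who want a numerical sanity check on the orders that appear in these problems, the following is a minimal brute-force sketch. It is not part of the text: it assumes that F is the prime field Z/pZ (so that arithmetic is just arithmetic mod p), and the function names are hypothetical. It enumerates SL(2, F) directly, counts the scalar matrices in it, and divides to get the order of the quotient SL(2, F)/Z.

```python
from itertools import product

def sl2_elements(p):
    """All 2x2 matrices (a, b, c, d) over Z/pZ with ad - bc = 1; p is assumed prime."""
    return [(a, b, c, d)
            for a, b, c, d in product(range(p), repeat=4)
            if (a * d - b * c) % p == 1]

def center_size(p):
    """Number of scalar matrices x*I in SL(2, Z/pZ), i.e., solutions of x^2 = 1."""
    return sum(1 for x in range(p) if (x * x) % p == 1)

def psl2_order(p):
    """Order of SL(2, F)/Z, obtained by dividing the two counts (Lagrange)."""
    return len(sl2_elements(p)) // center_size(p)

if __name__ == "__main__":
    for p in (3, 5, 7):
        print(p, len(sl2_elements(p)), center_size(p), psl2_order(p))
```

The enumeration costs $p^4$ steps, so it is only meant for very small primes; it checks counts and proves nothing.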

IV. Groups and Group Actions


55. Suppose that F is a finite field with q elements.
(a) By considering the possibilities for the first column of a matrix and then considering the possibilities for the second column when the first column is fixed, compute |GL(2, F)| as a function of q.
(b) By using the determinant homomorphism, compute |SL(2, F)| in terms of |GL(2, F)|.
(c) Taking into account that F does not have characteristic 2, prove that $|PSL(2, F)| = \tfrac{1}{2}|SL(2, F)|$.
(d) Show for a suitable finite field F with more than 5 elements that PSL(2, F) has order 168.

56. Let M be a member of G that is not in Z. Since M is not scalar, there exists a column vector u with Mu not a multiple of u. Define v = Mu, so that (u, v) is an ordered basis of $F^2$. By rewriting all matrices with the ordered basis (u, v), show that there is no loss in generality in assuming that G contains a matrix $A = \begin{pmatrix} 0 & -1 \\ 1 & c \end{pmatrix}$ if it is ultimately shown that G = SL(2, F).

57. Let a be a member of the multiplicative group $F^\times$ to be chosen shortly, and let B be the member
$$B = \begin{pmatrix} ca & a^{-1} \\ -a & 0 \end{pmatrix}$$
of SL(2, F). Prove that
(a) $B^{-1}A^{-1}BA$ is upper triangular and is in G,
(b) $B^{-1}A^{-1}BA$ has unequal diagonal entries if $a^4 \neq 1$,
(c) the condition in (b) can be satisfied for a suitable choice of a under the assumption that |F| > 5.

58. Suppose that $C = \begin{pmatrix} x & y \\ 0 & x^{-1} \end{pmatrix}$ is a member of G for some $x \neq \pm 1$ and some y. Taking $D = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and forming $CDC^{-1}D^{-1}$, show that G contains a matrix $E = \begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}$ with $\lambda \neq 0$.

59. By conjugating E by $\begin{pmatrix} \alpha & 0 \\ 0 & \alpha^{-1} \end{pmatrix}$, show that the set of λ in F such that $\begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}$ is in G is closed under multiplication by squares and under addition and subtraction.

60. Using the identity $x = \tfrac{1}{4}(x + 1)^2 - \tfrac{1}{4}(x - 1)^2$, deduce from Problems 56–59 that G contains all matrices $\begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}$ with λ ∈ F.

61. Show that $\begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}$ is conjugate to $\begin{pmatrix} 1 & 0 \\ -\lambda & 1 \end{pmatrix}$, and show that the set of all matrices $\begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 & 0 \\ \lambda & 1 \end{pmatrix}$ generates SL(2, F). Conclude that G = SL(2, F).

62. Using the First Isomorphism Theorem, conclude that the only normal subgroup of PSL(2, F) other than {1} is PSL(2, F) itself.

Problems 63–73 briefly introduce the theory of error-correcting codes. Let F be the finite field Z/2Z. The vector space $F^n$ over F will be called Hamming space, and its members are regarded as “words” (potential messages consisting of 0’s and 1’s). The weight wt(c) of a word c is the number of nonzero entries in c. The Hamming

12. Problems


distance d(a, b) between words $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$ is the weight of a − b, i.e., the number of indices i with 1 ≤ i ≤ n and $a_i \neq b_i$. A code is a nonempty subset C of $F^n$, and the minimal distance δ(C) of a code is the smallest value of d(a, b) for a and b in C with a ≠ b. By convention if |C| = 1, take δ(C) = n + 1. One imagines that members of C, which are called code words, are allowable messages, i.e., words that can be stored and retrieved, or transmitted and received. A code with minimal distance δ can then detect up to δ − 1 errors in a word ostensibly from C that has been retrieved from storage or has been received in a transmission. The code can correct up to (δ − 1)/2 errors because no word of $F^n$ can be at distance ≤ (δ − 1)/2 from more than one word in C, by Problem 63 below. The interest is in linear codes, those for which C is a vector subspace. It is desirable that each message have a high percentage of content and a relatively low percentage of further information used for error correction; thus a fundamental theoretical problem for linear codes is to find the maximum dimension of a linear code if n and a lower bound on the minimal distance for the code are given. As a practical matter, information is likely to be processed in packets of a standard length, such as some power of 2. In many situations packets can be reprocessed if they have been found to have errors. The initial interest is therefore in codes that can recognize and possibly correct a small number of errors. The problems in this set are continued at the ends of Chapters VII and IX.

63. Prove that the Hamming distance satisfies d(a, b) ≤ d(a, c) + d(c, b), and conclude that if a word w in $F^n$ is at distance ≤ (D − 1)/2 from two distinct members of the linear code C, then δ(C) < D.

64. Explain why the minimal distance δ(C) of a linear code C ≠ {0} is given by the minimal weight of the nonzero words in C.

65. Fix n ≥ 2.
List δ(C) and dim C for the following elementary linear codes:
(a) C = 0.
(b) C = $F^n$.
(c) (Repetition code) C = {0, (1, 1, . . . , 1)}.
(d) (Parity-check code) C = {c ∈ $F^n$ | wt(c) is even}. (Educational note: To use this code, one sends the message in the first n − 1 bits and adjusts the last bit so that the word is in C. If there is at most one error in the word, this parity bit will tell when there is an error, but it will not tell where the error occurs.)

66. One way to get a sense of what members of a linear code C in $F^n$ have small weight starts by making a basis for the code into the row vectors of a matrix and row reducing the matrix.
(a) Taking into account the distinction between corner variables and independent variables in the process of row reduction, show that every basis vector of C has weight at most the sum of 1 and the number of independent variables. Conclude that dim C + δ(C) ≤ n + 1.
(b) Give an example of a linear code with δ(C) = 2 for which equality holds.


(c) Examining the argument for (a) more closely, show that 2 ≤ dim C ≤ n − 2 implies dim C + δ(C) ≤ n.
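The definitions of weight, Hamming distance, and minimal distance given before Problem 63 can be tried out on small codes by brute force. The sketch below is an illustration only and is not part of the text: the function names are hypothetical, words are assumed to be tuples of 0's and 1's, and the sample basis is the one that appears in Problem 67. It lists all F-linear combinations of a basis and computes δ(C) straight from the definition.

```python
from itertools import product

def weight(c):
    """Number of nonzero entries of the word c."""
    return sum(1 for x in c if x != 0)

def distance(a, b):
    """Hamming distance: number of positions in which a and b differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def span(basis):
    """All linear combinations over Z/2Z of the given words."""
    n = len(basis[0])
    words = set()
    for coeffs in product((0, 1), repeat=len(basis)):
        w = [0] * n
        for c, b in zip(coeffs, basis):
            if c:
                w = [(x + y) % 2 for x, y in zip(w, b)]
        words.add(tuple(w))
    return words

def minimal_distance(code, n):
    """Smallest distance between distinct code words; n + 1 if |C| = 1."""
    code = list(code)
    if len(code) == 1:
        return n + 1
    return min(distance(a, b)
               for i, a in enumerate(code) for b in code[i + 1:])

if __name__ == "__main__":
    basis = [(1, 0, 0, 1, 1, 0), (0, 1, 0, 1, 0, 1), (0, 0, 1, 0, 1, 1)]
    C = span(basis)
    print(len(C), minimal_distance(C, 6))
```

Since the search compares all pairs of code words, it is practical only for codes of small dimension.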

67. Let C be a linear code with a basis consisting of the rows of
$$\begin{pmatrix} 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix}.$$
Show that δ(C) = 3. Educational note: Thus for n = 6 and δ(C) = 3, we always have dim C ≤ 3, and equality is possible.

68. (Hamming codes) The Hamming code $C_7$ of order 7 is a certain linear code having dim $C_7$ = 4 that will be seen to have δ($C_7$) = 3. The code words of a basis, with their commas removed, may be taken as 1110000, 1001100, 0101010, 1101001. The basis may be described as follows. Bits 1, 2, 4 are used as checks. The remaining bits are used to form the standard basis of $F^4$. What is put in bits 1, 2, 4 is the binary representation of the position of the nonzero entry in positions 3, 5, 6, 7. When all 16 members of $C_7$ are listed in the order dictated by the bits in positions 3, 5, 6, 7, the resulting list is

  Decimal value                  Decimal value
  in 3, 5, 6, 7    Code word     in 3, 5, 6, 7    Code word
        0          0000000             8          1110000
        1          1101001             9          0011001
        2          0101010            10          1011010
        3          1000011            11          0110011
        4          1001100            12          0111100
        5          0100101            13          1010101
        6          1100110            14          0010110
        7          0001111            15          1111111

For the general members of $C_7$, not just the basis vectors, the check bits in positions 1, 2, 4 may be described as follows: the bit in position 1 is a parity bit for the positions among 3, 5, 6, 7 having a 1 in their binary expansions, the bit in position 2 is a parity bit for the positions among 3, 5, 6, 7 having a 2 in their binary expansions, and the bit in position 4 is a parity bit for the positions among 3, 5, 6, 7 having a 4 in their binary expansions. The Hamming code $C_8$ of order 8 is obtained from $C_7$ by adjoining a parity bit in position 8.
(a) Prove that δ($C_7$) = 3. (Educational note: Thus for n = 7 and δ(C) = 3, we always have dim C ≤ 4, and equality is possible.)
(b) Prove that δ($C_8$) = 4.
(c) Describe how to form a generalization that replaces n = 8 by $n = 2^r$ with r ≥ 3. The Hamming codes that are obtained will be called $C_{2^r-1}$ and $C_{2^r}$.
(d) Prove that $\dim C_{2^r-1} = \dim C_{2^r} = 2^r - r - 1$, $\delta(C_{2^r-1}) = 3$, and $\delta(C_{2^r}) = 4$.
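The description of $C_7$ above is concrete enough to be checked by machine. The following sketch is an illustration only: the names `BASIS`, `code_words`, and `syndrome` are hypothetical, and words are assumed to be strings of 0's and 1's. It generates the 16 code words from the stated basis, adjoins the parity bit to get $C_8$, and restates the three parity checks as a single XOR of the positions that hold a 1. That restatement is our own reading of the rule for bits 1, 2, 4, not wording from the text.

```python
from itertools import product

# Basis of the Hamming code of order 7, as listed in Problem 68.
BASIS = ["1110000", "1001100", "0101010", "1101001"]

def add(u, v):
    """Sum of two words over Z/2Z, position by position."""
    return "".join(str((int(a) + int(b)) % 2) for a, b in zip(u, v))

def code_words(basis):
    """All linear combinations over Z/2Z of the basis words."""
    words = []
    for coeffs in product((0, 1), repeat=len(basis)):
        w = "0" * len(basis[0])
        for c, b in zip(coeffs, basis):
            if c:
                w = add(w, b)
        words.append(w)
    return words

def syndrome(word):
    """XOR of the 1-based positions holding a 1; this is 0 exactly when
    each of the three parity checks on binary digits 1, 2, 4 is satisfied."""
    s = 0
    for pos, bit in enumerate(word, start=1):
        if bit == "1":
            s ^= pos
    return s

C7 = code_words(BASIS)
C8 = [w + str(w.count("1") % 2) for w in C7]  # adjoin a parity bit in position 8

if __name__ == "__main__":
    print(sorted(C7))
    print(min(w.count("1") for w in C7 if "1" in w),
          min(w.count("1") for w in C8 if "1" in w))
```

Under these assumptions, flipping the bit in position i of a code word of $C_7$ changes the value of `syndrome` from 0 to i, which is one way to locate a single error.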


69. The matrix
$$H = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix},$$
when multiplied by any column vector c in the Hamming code $C_7$, performs the three parity checks done by bits 1, 2, 4 and described in the previous problem. Therefore such a c must have Hc = 0.
(a) Prove that the condition works in the reverse direction as well, i.e., that Hc = 0 only if c is in $C_7$.
(b) Deduce that if a received word r is not in $C_7$ and if r is assumed to match some word of $C_7$ except in the $i^{\mathrm{th}}$ position, then Hr matches the $i^{\mathrm{th}}$ column of H and this fact determines the integer i. (Educational note: Thus there is a simple procedure for testing whether a received word is a code word and for deciding, in the case that it is not a code word, what unique bit to change to convert it into a code word.)

70. Let r ≥ 4. Prove for $2^{r-1} \le n \le 2^r - 1$ that any linear code C in $F^n$ with δ(C) ≥ 3 has dim C ≤ n − r. Observe that equality holds for $C = C_{2^r-1}$.

71. The weight enumerator polynomial of a linear code C is the polynomial $W_C(X, Y)$ in Z[X, Y] given by $W_C(X, Y) = \sum_{k=0}^{n} N_k(C) X^{n-k} Y^k$, where $N_k(C)$ is the number of words of weight k in C.
(a) Compute $W_C(X, Y)$ for the following linear codes C: the 0 code, the code $F^n$, the repetition code, the parity code, the code in Problem 67, the Hamming code $C_7$, and the Hamming code $C_8$.
(b) Why is the coefficient of $X^n$ in $W_C(X, Y)$ necessarily equal to 1?
(c) Show that $W_C(X, Y) = \sum_{c \in C} X^{n-\mathrm{wt}(c)} Y^{\mathrm{wt}(c)}$.

72. (Cyclic redundancy codes) Cyclic redundancy codes treat blocks of data as coefficients of polynomials in F[X]. With the size n of data blocks fixed, one fixes a monic generating polynomial $G(X) = 1 + a_1 X + \cdots + a_{g-1} X^{g-1} + X^g$ with a nonzero constant term and with degree g suitably less than n. Data to be transmitted are provided as members $(b_0, b_1, \dots, b_{n-g-1})$ of $F^{n-g}$ and are converted into polynomials $B(X) = b_0 + b_1 X + \cdots + b_{n-g-1} X^{n-g-1}$. Then the n-tuple of coefficients of G(X)B(X) is transmitted. To decode a polynomial P(X) that is received, one writes P(X) = G(X)Q(X) + R(X) via the division algorithm.
If R(X) = 0, it is assumed that P(X) is a code word. Otherwise P(X) is definitely not a code word. Thus the code C amounts to the system of coefficients of all polynomials G(X)B(X) with B(X) = 0 or deg B(X) ≤ n − g − 1. A basis of C is obtained by letting B(X) run through the monomials $1, X, \dots, X^{n-g-1}$, and therefore dim C = n − g. Take $G(X) = 1 + X + X^2 + X^4$ and n ≥ 8. Prove that δ(C) = 2.

73. (CRC-8) The cyclic redundancy code C bearing the name CRC-8 has $G(X) = 1 + X + X^2 + X^8$. Prove that if 8 ≤ n ≤ 19, then δ(C) = 4. (Educational note: It will follow from the theory of finite fields in Chapter IX, together with the problems on coding theory at the end of that chapter, that n = 255 plays a special role for this code, and δ(C) = 4 in that case.)
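The encoding and decoding steps of Problem 72 amount to multiplication and division with remainder in F[X] with F = Z/2Z. The sketch below is an illustration and not the text's own procedure: it assumes a polynomial is stored as a Python integer whose bit i is the coefficient of $X^i$, so that addition is XOR, and the function names are hypothetical. It uses the generating polynomial $1 + X + X^2 + X^4$ of Problem 72 with n = 8.

```python
def poly_mul(a, b):
    """Product in (Z/2Z)[X]; bit i of an int is the coefficient of X^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # adding polynomials mod 2 is XOR
        a <<= 1              # multiply a by X
        b >>= 1
    return result

def poly_mod(p, g):
    """Remainder R(X) in P(X) = G(X)Q(X) + R(X) over Z/2Z; g must be nonzero."""
    deg_g = g.bit_length() - 1
    while p and p.bit_length() - 1 >= deg_g:
        p ^= g << (p.bit_length() - 1 - deg_g)  # cancel the leading term of p
    return p

def encode(data_bits, g):
    """Coefficients of G(X)B(X), where B(X) has coefficients b_0, b_1, ..."""
    b = sum(bit << i for i, bit in enumerate(data_bits))
    return poly_mul(g, b)

def is_code_word(p, g):
    """A received polynomial is accepted exactly when its remainder is 0."""
    return poly_mod(p, g) == 0

if __name__ == "__main__":
    G = 0b10111                      # 1 + X + X^2 + X^4
    sent = encode([1, 1, 0, 1], G)   # B(X) = 1 + X + X^3
    print(bin(sent), is_code_word(sent, G), is_code_word(sent ^ 0b100, G))
```

With this data word the product G(X)B(X) is $1 + X^7$, a code word of weight 2, and changing any single coefficient of it yields a polynomial with nonzero remainder.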


Problems 74–77 concern categories and functors. Problem 75 assumes knowledge of point-set topology.

74. Let C be the category of all sets, the morphisms being the functions between sets. Verify that the disjoint union of sets is a coproduct.

75. Let C be the category of all topological spaces, the morphisms being the continuous functions. Let S be a nonempty set, and let $X_s$ be a topological space for each s in S.
(a) Show that the Cartesian product of the spaces $X_s$, with the product topology, is a product of the $X_s$’s.
(b) Show that the disjoint union of the spaces $X_s$, topologized so that a set E is open if and only if its intersection with each $X_s$ is open, is a coproduct of the $X_s$’s.

76. Taking a cue from the example of a category in which products need not exist, exhibit a category in which coproducts need not exist.

77. Let C be a category having just one object, say X, and suppose that every member of Morph(X, X) is an isomorphism. Prove that Morph(X, X) is a group under the law of composition for the category. Can every group be realized in this way, up to isomorphism?

Problems 78–80 introduce a notion of duality in category theory and use it to derive Proposition 4.64 from Proposition 4.63. If C is a category, then the opposite category $C^{\mathrm{opp}}$ is defined to have $\mathrm{Obj}(C^{\mathrm{opp}}) = \mathrm{Obj}(C)$ and $\mathrm{Morph}_{C^{\mathrm{opp}}}(A, B) = \mathrm{Morph}_{C}(B, A)$. If ◦ denotes the law of composition in C, then the law of composition $\circ^{\mathrm{opp}}$ in $C^{\mathrm{opp}}$ is defined by $g \circ^{\mathrm{opp}} f = f \circ g$ for $f \in \mathrm{Morph}_{C^{\mathrm{opp}}}(A, B)$ and $g \in \mathrm{Morph}_{C^{\mathrm{opp}}}(B, C)$.

78. Verify that $C^{\mathrm{opp}}$ is indeed a category, that $(C^{\mathrm{opp}})^{\mathrm{opp}} = C$, and that to pass from a diagram involving objects and morphisms in C to a corresponding diagram involving the same objects and morphisms considered as in $C^{\mathrm{opp}}$, one leaves all the vertices and labels alone and reverses the directions of all the arrows. Verify also that the diagram of C commutes if and only if the diagram in $C^{\mathrm{opp}}$ commutes.

79. Let C be the category of all sets, the morphisms in $\mathrm{Morph}_{C}(A, B)$ being all functions from A to B. Show that the morphisms in $\mathrm{Morph}_{C^{\mathrm{opp}}}(A, B)$ cannot necessarily all be regarded as functions from A to B.

80. Suppose that S is a nonempty set and that $\{X_s\}_{s \in S}$ is an object in C.
(a) Prove that if $(X, \{p_s\}_{s \in S})$ is a product of $\{X_s\}_{s \in S}$ in C, then $(X, \{p_s\}_{s \in S})$ is a coproduct of $\{X_s\}_{s \in S}$ in $C^{\mathrm{opp}}$, and that if $(X, \{p_s\}_{s \in S})$ is a coproduct of $\{X_s\}_{s \in S}$ in C, then $(X, \{p_s\}_{s \in S})$ is a product of $\{X_s\}_{s \in S}$ in $C^{\mathrm{opp}}$.
(b) Show that Proposition 4.64 for C follows from the validity of Proposition 4.63 for $C^{\mathrm{opp}}$.

CHAPTER V Theory of a Single Linear Transformation

Abstract. The goal of this chapter is to find finitely many canonical representatives of each similarity class of square matrices with entries in a field and correspondingly of each isomorphism class of linear maps from a finite-dimensional vector space to itself.

Section 1 frames the problem in more detail. Section 2 develops the theory of determinants over a commutative ring with identity in order to be able to work easily with characteristic polynomials det(λI − A). The discussion is built around the principle of “permanence of identities,” which allows for passage from certain identities with integer coefficients to identities with coefficients in the ring in question.

Section 3 introduces the minimal polynomial of a square matrix or linear map. The Cayley–Hamilton Theorem establishes that such a matrix satisfies its characteristic equation, and it follows that the minimal polynomial divides the characteristic polynomial. It is proved that a matrix is similar to a diagonal matrix if and only if its minimal polynomial is the product of distinct factors of degree 1. In combination with the fact that two diagonal matrices are similar if and only if their diagonal entries are permutations of one another, this result solves the canonical-form problem for matrices whose minimal polynomial is the product of distinct factors of degree 1.

Section 4 introduces general projection operators from a vector space to itself and relates them to vector-space direct-sum decompositions with finitely many summands. The summands of a direct-sum decomposition are invariant under a linear map if and only if the linear map commutes with each of the projections associated to the direct-sum decomposition.

Section 5 concerns the Primary Decomposition Theorem, whose subject is the operation of a linear map L : V → V with V finite-dimensional. The statement is that if L has minimal polynomial $P_1(\lambda)^{l_1} \cdots P_k(\lambda)^{l_k}$ with the $P_j(\lambda)$ distinct monic prime, then V has a unique direct-sum decomposition in which the respective summands are the kernels of the linear maps $P_j(L)^{l_j}$, and moreover the minimal polynomial of the restriction of L to the $j^{\mathrm{th}}$ summand is $P_j(\lambda)^{l_j}$.

Sections 6–7 concern Jordan canonical form. For the case that the prime factors of the minimal polynomial of a square matrix all have degree 1, the main theorem gives a canonical form under similarity, saying that a given matrix is similar to one in “Jordan form” and that the Jordan form is completely determined up to permutation of the constituent blocks. The theorem applies to all square matrices if the field is algebraically closed, as is the case for C. The theorem is stated and proved in Section 6, and Section 7 shows how to make computations in two different ways.

1. Introduction

This chapter will work with vector spaces over a common field of “scalars,” which will be called K. As was observed near the end of Section IV.5, all the results

concerning vector spaces in Chapter II remain valid when the scalars are taken from K rather than just Q or R or C. The ring of polynomials in one indeterminate X over K will be denoted by K[X]. For the field C of complex numbers, every nonconstant polynomial in C[X] has a root, according to the Fundamental Theorem of Algebra (Theorem 1.18). Because of this fact some results in this chapter will take an especially simple form when K = C, and this simple form will persist for any field with this same property. Accordingly, we make a definition. Let us say that a field K is algebraically closed if every nonconstant polynomial in K[X] has a root. We shall work hard in Chapter IX to obtain examples of algebraically closed fields beyond K = C, but let us mention now what a few of them are.

EXAMPLES.
(1) The subset of C of all roots of polynomials with rational coefficients is an algebraically closed field.
(2) For each prime p, we have seen that any finite field of characteristic p has $p^n$ elements for some n. It turns out that there is one and only one field of $p^n$ elements, up to isomorphism, for each n. If we align them suitably for fixed p and take their union on n, then the result is an algebraically closed field.
(3) If K is any field, then there exists an algebraically closed field having K as a subfield. We shall prove this existence in Chapter IX by means of Zorn’s Lemma (which appears in Section A5 of the appendix).

The general problem to be addressed in this chapter is to find “canonical forms” for linear maps from finite-dimensional vector spaces to themselves, special ways of realizing the linear maps that bring out some of their properties. Let us phrase a specific problem of this kind completely in terms of linear algebra at first. Then we can rephrase it in terms of a combination of linear algebra and group theory, and we shall see how it fits into a more general context.
In terms of matrices, the specific problem is to find a way of deciding whether two square matrices represent the same linear map in different bases. We know from Proposition 2.17 that if L : V → V is linear on the finite-dimensional vector space V and if A is the matrix of L relative to a particular ordered basis in domain and range, then the matrix B of L in another ordered basis is of the form $B = C^{-1}AC$ for some invertible matrix C, i.e., A and B are similar. Thus one kind of solution to the problem would be to specify one representative of each similarity class of square matrices. But this is not a convenient kind of answer to look for; in fact, the matrices $A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$ and $B = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$ are similar via $C = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, but there is no particular reason to prefer one of A or B to the other. Thus a “canonical form” for detecting similarity will allow more than one representative of each similarity class (but typically only finitely many such representatives), and a supplementary statement will tell us when two such are similar.

So far, the best information that we have about solving this problem concerning square matrices comes from Section II.8. In that section the discussion of eigenvalues gave us some necessary conditions for similarity, but we did not obtain a useful necessary and sufficient condition.

In terms of linear maps, what we seek for a linear L : V → V is to use the geometry of L to construct an ordered basis of V such that L acts in a particularly simple way on that ordered basis. Ideally the description of how L acts on the ordered basis is to be detailed enough so that the matrix of L in that ordered basis is completely determined by the description, even though the ordered basis may not be determined by it. For example, if L were to have a basis of eigenvectors, then the description could be that “L has an ordered basis of eigenvectors with eigenvalues $x_1, \dots, x_n$.” In any ordered basis with this property, the matrix of L would then be diagonal with diagonal entries $x_1, \dots, x_n$.

Suppose then that we have this kind of detailed description of how a linear map L acts on some ordered basis. To what extent is L completely determined? The answer is that L is determined up to an isomorphism of the underlying vector space. In fact, suppose

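that L and M are linear maps from V to itself such that $\begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix} = A = \begin{pmatrix} M \\ \Delta\Delta \end{pmatrix}$ for some ordered bases Γ and Δ. Then
$$\begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix} = A = \begin{pmatrix} M \\ \Delta\Delta \end{pmatrix} = \begin{pmatrix} I \\ \Delta\Gamma \end{pmatrix} \begin{pmatrix} M \\ \Gamma\Gamma \end{pmatrix} \begin{pmatrix} I \\ \Gamma\Delta \end{pmatrix} = \begin{pmatrix} S^{-1} \\ \Gamma\Gamma \end{pmatrix} \begin{pmatrix} M \\ \Gamma\Gamma \end{pmatrix} \begin{pmatrix} S \\ \Gamma\Gamma \end{pmatrix} = \begin{pmatrix} S^{-1}MS \\ \Gamma\Gamma \end{pmatrix},$$
where S : V → V is the invertible linear map defined by $\begin{pmatrix} S \\ \Gamma\Gamma \end{pmatrix} = \begin{pmatrix} I \\ \Gamma\Delta \end{pmatrix}$. Hence $L = S^{-1}MS$ and SL = MS. In other words, if we think of having two copies of V, one called $V_1$ and the other called $V_2$, that are isomorphic via $S : V_1 \to V_2$, then the effect of M in $V_2$ corresponds under S to the effect of L in $V_1$. In this sense, L is determined up to an isomorphism of V. Thus we are looking for a geometric description that determines linear maps up to isomorphism. Two linear maps L and M that are related in this way have $L = S^{-1}MS$ for some invertible linear map S. Passing to matrices with respect to some basis, we see that the matrices of L and M are to be similar. Consequently our two problems, one to characterize similarity for matrices and the other to characterize isomorphism for linear maps, come to the same thing.

These two problems have an interpretation in terms of group theory. In the case of n-by-n matrices, the group GL(n, K) of invertible matrices acts on the set of all square matrices of size n by conjugation via $(g, x) \mapsto gxg^{-1}$; the similarity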


classes are exactly the orbits of this group action, and the canonical form is to single out ﬁnitely many representatives from each orbit. In the case of linear maps, the group GL(V ) of invertible linear maps on the ﬁnite-dimensional vector space V acts by conjugation on the set of all linear maps from V into itself; the isomorphism classes of linear maps on V are the orbits, and the canonical form is to single out ﬁnitely many representatives from each orbit. The above problem, whether for matrices or for linear maps, does not have a unique acceptable solution. Nevertheless, the text of this chapter will ultimately concentrate on one such solution, known as the “Jordan canonical form.” Now that we have brought group theory into the statement of the problem, we can put matters in a more general context: The situation is that some “important” group G acts in an important way on an “interesting” vector space of matrices. The canonical-form problem for this situation is to single out ﬁnitely many representatives of each orbit and give a way of deciding, in terms of these representatives, whether two of the given matrices lie in the same orbit. We shall not pursue the more general problem in the text at this time. However, Problem 1 at the end of the chapter addresses one version beyond the one concerning similarity: to ﬁnd a canonical form for the action of GL(m, K) × GL(n, K) on m-by-n matrices by ((g, h), x) = gxh −1 . Some other groups that are important in this sense, besides products of general linear groups, are introduced in Chapter VI, and a problem at the end of Chapter VI reinterprets two theorems of that chapter as further canonical-form theorems under the action of a general linear group. Let us return to the canonical-form problems for similarity of matrices and isomorphism of linear maps. The basic tool in studying these problems is the characteristic polynomial of a matrix or a linear map, as in Chapter II. 
However, we subtly used a special feature of Q and R and C in working with characteristic polynomials in Chapter II: we passed back and forth between the characteristic polynomial det(λI − A) as a polynomial in one indeterminate (defined by its expression after expanding it out) and as a polynomial function of λ, defined for each value of λ in Q or R or C, one value at a time. This passage was legitimate because the homomorphism of the ring of polynomials in one indeterminate over a field to the ring of polynomial functions is one-one when the field is infinite, by Proposition 4.28c or Corollary 1.14. Some care is required, however, in working with general fields, and we begin by supplying the necessary details for justifying manipulations with determinants in a more general setting than earlier.

2. Determinants over Commutative Rings with Identity

Throughout this section let R be a commutative ring with identity. The main case of interest for us at this time will be that R = K[λ] is the polynomial ring in one indeterminate λ over a field K.
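To see concretely why the passage between polynomials and polynomial functions needs care over general fields, as remarked at the end of Section 1, one can look at a finite field. The sketch below is an illustration only: it assumes the field is Z/pZ for a prime p, it stores a polynomial as its list of coefficients, and the function names are hypothetical. It evaluates the nonzero polynomial $X^p - X$ at every element of the field.

```python
def evaluate(coeffs, x, p):
    """Value at x of c_0 + c_1 X + ... + c_d X^d in Z/pZ, by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % p
    return result

def is_zero_function(coeffs, p):
    """True if the polynomial vanishes at every element of Z/pZ."""
    return all(evaluate(coeffs, x, p) == 0 for x in range(p))

def x_to_the_p_minus_x(p):
    """Coefficient list of X^p - X over Z/pZ (index = degree)."""
    return [0, (-1) % p] + [0] * (p - 2) + [1]

if __name__ == "__main__":
    p = 5
    f = x_to_the_p_minus_x(p)
    print(f, is_zero_function(f, p))
```

The coefficient list is not the zero list, so the polynomial is nonzero, yet the associated function is identically 0 on Z/pZ. The map from polynomials to polynomial functions therefore fails to be one-one for this finite field.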


The set of n-by-n matrices with entries in R is an abelian group under entry-by-entry addition, and matrix multiplication makes it into a ring with identity. Following tradition, we shall usually write $M_n(R)$ rather than $M_{nn}(R)$ for this ring. In this section we shall define a determinant function $\det : M_n(R) \to R$ and establish some of its properties. For the case that R is a field, some of our earlier proofs concerning determinants used vector-space concepts (bases, dimensions, and so forth), and these are not available for general R. Yet most of the properties of determinants remain valid for general R because of a phenomenon known as permanence of identities. We shall not try to state a general theorem about this principle but instead will be content to observe a pattern in how the relevant identities are proved.

If A is in $M_n(R)$, we define its determinant to be
$$\det A = \sum_{\sigma \in S_n} (\operatorname{sgn} \sigma) A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)},$$
in effect converting into a definition the formula obtained in Theorem 2.34d when R is a field. A sample of the kind of identity we have in mind is the formula
$$\det(AB) = \det A \det B \qquad \text{for } A \text{ and } B \text{ in } M_n(R).$$
The key is that this formula says that two polynomials in $2n^2$ variables, with integer coefficients, are equal whenever arbitrary members of R are substituted for the variables. Thus let us introduce $2n^2$ indeterminates $X_{11}, X_{12}, \dots, X_{nn}$ and $Y_{11}, Y_{12}, \dots, Y_{nn}$ to correspond to these variables. Forming the commutative ring $S = Z[X_{11}, X_{12}, \dots, X_{nn}, Y_{11}, Y_{12}, \dots, Y_{nn}]$, we assemble the matrices $X = [X_{ij}]$, $Y = [Y_{ij}]$, and $XY = \big[\sum_k X_{ik} Y_{kj}\big]$ in $M_n(S)$. Consider the two members of S given by
$$\det X \det Y = \Big( \sum_{\sigma \in S_n} (\operatorname{sgn} \sigma) X_{1\sigma(1)} X_{2\sigma(2)} \cdots X_{n\sigma(n)} \Big) \Big( \sum_{\sigma \in S_n} (\operatorname{sgn} \sigma) Y_{1\sigma(1)} Y_{2\sigma(2)} \cdots Y_{n\sigma(n)} \Big)$$
and
$$\det(XY) = \sum_{\sigma \in S_n} (\operatorname{sgn} \sigma) (XY)_{1\sigma(1)} (XY)_{2\sigma(2)} \cdots (XY)_{n\sigma(n)},$$

where $(XY)_{ij} = \sum_k X_{ik} Y_{kj}$. If we fix arbitrary elements $x_{11}, x_{12}, \dots, x_{nn}$ and $y_{11}, y_{12}, \dots, y_{nn}$ of Z, then Proposition 4.30 gives us a unique substitution homomorphism $\Psi : S \to Z$ such that $\Psi(1) = 1$, $\Psi(X_{ij}) = x_{ij}$, and $\Psi(Y_{ij}) = y_{ij}$ for all i and j. Writing $x = [x_{ij}]$ and $y = [y_{ij}]$ and using that matrices with integer entries have det(xy) = det x det y because Z is a subset of the field Q, we see that $\Psi(\det(XY)) = \Psi(\det X \det Y)$ for each choice of x and y. Since Z is an infinite integral domain and since x and y are arbitrary, Corollary 4.32 allows us to deduce that det(XY) = det X det Y as an equality in S.

Now we pass from an identity in S to an identity in R. Let $1_R$ be the identity in R. Proposition 4.19 gives us a unique homomorphism of rings $\varphi_1 : Z \to R$ such that $\varphi_1(1) = 1_R$. If we fix arbitrary elements $A_{11}, A_{12}, \dots, A_{nn}$ and $B_{11}, B_{12}, \dots, B_{nn}$ of R, then Proposition 4.30 gives us a unique substitution homomorphism $\Phi : S \to R$ such that $\Phi(1) = \varphi_1(1) = 1_R$, $\Phi(X_{ij}) = A_{ij}$ for all i and j, and $\Phi(Y_{ij}) = B_{ij}$ for all i and j. Applying $\Phi$ to the equality det(XY) = det X det Y, we obtain the identity we sought, namely
$$\det(AB) = \det A \det B \qquad \text{for } A \text{ and } B \text{ in } M_n(R).$$
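The permutation-sum definition of the determinant makes sense verbatim over a ring that is not a field, and the identity det(AB) = det A det B can be spot-checked there. The following sketch is an illustration only: it assumes R = Z/mZ with m not prime (so R has zero divisors), and the function names are hypothetical. It is a numerical check on examples, not a substitute for the permanence-of-identities argument.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images (inversion count)."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(A, m):
    """Determinant over Z/mZ from the sum over all permutations."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total % m

def mat_mul(A, B, m):
    """Matrix product over Z/mZ."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % m for j in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    m = 6
    A = [[2, 3, 1], [4, 0, 5], [1, 1, 3]]
    B = [[1, 2, 0], [3, 3, 4], [5, 1, 2]]
    print(det(mat_mul(A, B, m), m), (det(A, m) * det(B, m)) % m)
```

The sum has n! terms, so this direct use of the definition is sensible only for small n.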

Proposition 5.1. If R is a commutative ring with identity, then the determinant function $\det : M_n(R) \to R$ has the following properties:
(a) det(AB) = det A det B,
(b) det I = 1,
(c) $\det A^t = \det A$,
(d) det C = det A + det B if A, B, and C match in all rows but the $j^{\mathrm{th}}$ and if the $j^{\mathrm{th}}$ row of C is the sum of the $j^{\mathrm{th}}$ rows of A and B,
(e) det B = r det A if A and B match in all rows but the $j^{\mathrm{th}}$ and if the $j^{\mathrm{th}}$ row of B is equal entry by entry to r times the $j^{\mathrm{th}}$ row of A for some r in R,
(f) det A = 0 if A has two equal rows,
(g) $\det \begin{pmatrix} A & B \\ 0 & D \end{pmatrix} = \det A \det D$ if A is in $M_k(R)$, D is in $M_l(R)$, and k + l = n.

REMARK. Properties (d), (e), and (f) imply that usual steps in manipulating determinants by row reduction continue to be valid.

PROOF. Part (a) was proved above, and parts (c) through (f) may be proved in the same way from the corresponding facts about integer matrices in Section II.7. Part (b) is immediate from the definition. For (g), we first prove the result when the entries are in Q, and then we argue in the same way as with (a) above. When the entries are in Q, row reduction of D allows us to reduce to the case either that D has a row of 0’s or that D is the identity. If D has a row of 0’s, then $\det \begin{pmatrix} A & B \\ 0 & D \end{pmatrix}$ and det A det D are both 0 and hence are equal. If D is the identity, then further row reduction shows that $\det \begin{pmatrix} A & B \\ 0 & I \end{pmatrix} = \det \begin{pmatrix} A & 0 \\ 0 & I \end{pmatrix}$, and the right side equals det A = det A det I, as required.


Proposition 5.2 (expansion in cofactors). Let R be a commutative ring with identity, let A be in $M_n(R)$, and let $\widehat{A}_{ij}$ be the member of $M_{n-1}(R)$ obtained by deleting the $i^{\mathrm{th}}$ row and the $j^{\mathrm{th}}$ column from A. Then
(a) for any j, $\det A = \sum_{i=1}^{n} (-1)^{i+j} A_{ij} \det \widehat{A}_{ij}$, i.e., det A may be calculated by “expansion in cofactors” about the $j^{\mathrm{th}}$ column,
(b) for any i, $\det A = \sum_{j=1}^{n} (-1)^{i+j} A_{ij} \det \widehat{A}_{ij}$, i.e., det A may be calculated by “expansion in cofactors” about the $i^{\mathrm{th}}$ row.

PROOF. This may be derived in the same way from Proposition 2.36 by using the principle of permanence of identities.

Corollary 5.3 (Vandermonde matrix and determinant). If $r_1, \dots, r_n$ lie in a commutative ring R with identity, then
$$\det \begin{pmatrix} 1 & 1 & \cdots & 1 \\ r_1 & r_2 & \cdots & r_n \\ r_1^2 & r_2^2 & \cdots & r_n^2 \\ \vdots & \vdots & \ddots & \vdots \\ r_1^{n-1} & r_2^{n-1} & \cdots & r_n^{n-1} \end{pmatrix} = \prod_{j > i} (r_j - r_i).$$

PROOF. The derivation of this from Proposition 5.2 is the same as the derivation of Corollary 2.37 from Proposition 2.36.

Proposition 5.4 (Cramer’s rule). Let R be a commutative ring with identity, let A be in $M_n(R)$, and define $A^{\mathrm{adj}}$ in $M_n(R)$ to be the classical adjoint of A, namely the matrix with entries $A^{\mathrm{adj}}_{ij} = (-1)^{i+j} \det \widehat{A}_{ji}$, where $\widehat{A}_{kl}$ is defined as in the statement of Proposition 5.2. Then $A A^{\mathrm{adj}} = A^{\mathrm{adj}} A = (\det A) I$.

PROOF. This may be derived from Proposition 2.38 in the same way as for Propositions 5.1 and 5.2 using the principle of permanence of identities.

Corollary 5.5. Let R be a commutative ring with identity, and let A be in $M_n(R)$. If det A is a unit in R, then A has a two-sided inverse in $M_n(R)$. Conversely if A has a one-sided inverse in $M_n(R)$, then det A is a unit in R.

REMARK. If R is a field, then A and any associated linear map are often called nonsingular if invertible, singular otherwise. When R is not a field, terminology varies for what to call a noninvertible matrix whose determinant is not 0.

PROOF. If det A is a unit in R, let r be its multiplicative inverse. Then Proposition 5.4 shows that $r A^{\mathrm{adj}}$ is a two-sided inverse of A. Conversely if A has, say, a left inverse B, then BA = I implies (det B)(det A) = det I = 1, and det B is an inverse for det A. A similar argument applies if A has a right inverse.
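Proposition 5.4 and Corollary 5.5 translate directly into a procedure for inverting a matrix over a ring in which the determinant is a unit. The sketch below is an illustration only: it assumes R = Z/mZ, computes determinants by expansion in cofactors along the first row as in Proposition 5.2b, and uses hypothetical function names. The modular inverse of the determinant comes from Python's built-in `pow(d, -1, m)` (Python 3.8 or later), which raises `ValueError` when d is not a unit mod m.

```python
def minor(A, i, j):
    """Matrix obtained from A by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Integer determinant by expansion in cofactors about the first row."""
    if not A:
        return 1  # determinant of the empty matrix
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def adjugate(A):
    """Classical adjoint: entry (i, j) is (-1)^(i+j) det of A with row j, column i deleted."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

def mat_mul(A, B, m=None):
    """Matrix product over Z, or over Z/mZ when a modulus m is given."""
    n = len(A)
    C = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    return C if m is None else [[x % m for x in row] for row in C]

def inverse_mod(A, m):
    """Two-sided inverse r * adj(A) over Z/mZ, where r is the inverse of det A mod m."""
    r = pow(det(A) % m, -1, m)  # ValueError if det A is not a unit mod m
    return [[(r * x) % m for x in row] for row in adjugate(A)]

if __name__ == "__main__":
    A = [[2, 5, 1], [1, 3, 0], [4, 1, 7]]
    print(det(A), mat_mul(A, adjugate(A)))
    print(mat_mul(A, inverse_mod(A, 9), 9))
```

For this sample matrix the determinant is −4, which is a unit mod 9 but not mod 26, so the inverse exists over Z/9Z and the call with modulus 26 fails. This matches the two directions of Corollary 5.5.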


3. Characteristic and Minimal Polynomials

Again let K be a field. If A is in $M_n(K)$, the characteristic polynomial of A is defined to be the member of the ring K[λ] of polynomials in one indeterminate λ given by F(λ) = det(λI − A). The material of Section 2 shows that F(λ) is well defined, being the determinant of a member of $M_n(K[\lambda])$. It is apparent from the definition of determinant in Section 2 that F(λ) is a monic polynomial of degree n with coefficient $-\operatorname{Tr} A = -\sum_{j=1}^{n} A_{jj}$ for $\lambda^{n-1}$. Evaluating F(λ) at 0, we see that the constant term is $(-1)^n \det A$. Since the determinant of a product in $M_n(K[\lambda])$ is the product of the determinants (Proposition 5.1a) and since $C^{-1}(\lambda I - A)C = \lambda I - C^{-1}AC$, we have
$$\det(\lambda I - C^{-1}AC) = (\det C)^{-1} \det(\lambda I - A)(\det C) = \det(\lambda I - A).$$
Thus similar matrices have equal characteristic polynomials. If V is an n-dimensional vector space over K and L : V → V is linear, then the matrices of L in any two ordered bases of V (the domain basis being assumed equal to the range basis) are similar, and their characteristic polynomials are the same. Consequently we can define the characteristic polynomial of L to be the characteristic polynomial of any matrix of L.

The development of characteristic polynomials has thus been redone in a way that is valid over any field K without making use of the ring homomorphism from polynomials in one indeterminate over K to polynomial functions from K into itself. The discussion in Section II.8 of eigenvectors and eigenvalues for members A of $M_n(K)$ and for linear maps L : V → V with V finite-dimensional over K is now meaningful, and there is no need to repeat it. In particular, the eigenvalues of A and L are exactly the roots of their characteristic polynomial, no matter what K is. If K is algebraically closed, then the characteristic polynomial has a root, and consequently A and L each have at least one eigenvalue.
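The definition F(λ) = det(λI − A) can be carried out literally by applying the permutation-sum determinant of Section 2 to a matrix whose entries are polynomials in λ. The sketch below is an illustration only: it assumes integer entries, stores a polynomial as its list of coefficients with index equal to degree, and uses hypothetical function names. It checks on examples that F is monic, that the coefficient of $\lambda^{n-1}$ is −Tr A, and that similar matrices give the same polynomial.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def poly_add(p, q):
    """Sum of two coefficient lists."""
    length = max(len(p), len(q))
    p = p + [0] * (length - len(p))
    q = q + [0] * (length - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_mul(p, q):
    """Product of two coefficient lists."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

def char_poly(A):
    """Coefficients of det(lambda*I - A), lowest degree first."""
    n = len(A)
    # Entry (i, j) of lambda*I - A as a polynomial in lambda.
    M = [[[-A[i][j], 1] if i == j else [-A[i][j]] for j in range(n)]
         for i in range(n)]
    total = [0]
    for perm in permutations(range(n)):
        term = [sign(perm)]
        for i in range(n):
            term = poly_mul(term, M[i][perm[i]])
        total = poly_add(total, term)
    return total

def mat_mul(A, B):
    """Matrix product over the integers."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    C, C_inv = [[1, 1], [0, 1]], [[1, -1], [0, 1]]
    print(char_poly(A), char_poly(mat_mul(mat_mul(C_inv, A), C)))
```

In the usage lines the matrix C and its inverse are chosen with integer entries so that the conjugate $C^{-1}AC$ stays in integer arithmetic.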
If $L : V \to V$ is linear and $V$ is finite-dimensional, then a vector subspace $U$ of $V$ is said to be invariant under $L$ if $L(U) \subseteq U$. In this case $L|_U$ is a well-defined linear map from $U$ to itself. Since $L(U) \subseteq U$, Proposition 2.25 shows that $L : V \to V$ factors through $V/U$ as a linear map $\overline{L} : V/U \to V/U$. We shall use this construction, the existence of eigenvalues in the algebraically closed case, and an induction to prove the following.

Proposition 5.6. If $K$ is an algebraically closed field, if $V$ is a finite-dimensional vector space over $K$, and if $L : V \to V$ is linear, then $V$ has an ordered basis in which the matrix of $L$ is upper triangular. Consequently any member of $M_n(K)$ is similar to an upper triangular matrix.

REMARKS. For an upper triangular matrix

$$A = \begin{pmatrix} c_1 & & * \\ & \ddots & \\ 0 & & c_n \end{pmatrix}$$

in $M_n(K)$, the characteristic polynomial is $\prod_{j=1}^{n}(\lambda - c_j)$ because the only nonzero term in the definition of $\det(\lambda I - A)$ is the one corresponding to the identity permutation. Triangular form is not yet the canonical form we seek for a square matrix because a particular square matrix may be similar to infinitely many matrices in triangular form.

PROOF. We proceed by induction on $n = \dim V$, with the base case $n = 1$ being clear. Suppose that the result holds for all linear maps from spaces of dimension $< n$ to themselves. Given $L : V \to V$ with $\dim V = n$, let $v_1$ be an eigenvector of $L$. This exists by the remarks before the proposition since $K$ is algebraically closed. Let $U$ be the vector subspace $Kv_1$. Then $L(U) \subseteq U$, and Proposition 2.25 shows that $L : V \to V$ factors through $V/U$ as a linear map $\overline{L} : V/U \to V/U$. Since $\dim V/U = n - 1$, the inductive hypothesis produces an ordered basis $(\bar v_2, \dots, \bar v_n)$ of $V/U$ such that the matrix of $\overline{L}$ is upper triangular in this basis. This condition means that $\overline{L}(\bar v_j) = \sum_{i=2}^{j} c_{ij}\bar v_i$ for $j \ge 2$. Select coset representatives $v_2, \dots, v_n$ of $\bar v_2, \dots, \bar v_n$ so that $\bar v_j = v_j + U$ for $j \ge 2$. Then $\overline{L}(v_j + U) = \sum_{i=2}^{j} c_{ij}(v_i + U)$ for $j \ge 2$, and hence $L(v_j)$ lies in the coset $\sum_{i=2}^{j} c_{ij}v_i + U$ for $j \ge 2$. For each $j \ge 1$, we then have $L(v_j) = \sum_{i=2}^{j} c_{ij}v_i + c_{1j}v_1$ for some scalar $c_{1j}$, and we see that $(v_1, \dots, v_n)$ is the required ordered basis.

Let us return to the situation in which $K$ is any field. For a matrix $A$ in $M_n(K)$ and a polynomial $P$ in $K[\lambda]$, it is meaningful to form $P(A)$. We can do so by two equivalent methods, both useful. The concrete way of forming $P(A)$ is as $P(A) = c_nA^n + \cdots + c_1A + c_0I$ if $P(\lambda) = c_n\lambda^n + \cdots + c_1\lambda + c_0$. The abstract way is to form the subring $T$ of $M_n(K)$ generated by $KI$ and $A$. This subring is commutative. We let $\varphi : K \to T$ be given by $\varphi(c) = cI$. Then the universal mapping property of $K[\lambda]$ given in Proposition 4.24 produces a unique ring homomorphism $\Phi : K[\lambda] \to T$ such that $\Phi(c) = cI$ for all $c \in K$ and $\Phi(\lambda) = A$. The value of $P(A)$ is the element $\Phi(P)$ of $T$.

For $A$ in $M_n(K)$, let us study all polynomials $P$ such that $P(A) = 0$. For any polynomial $P$ and any invertible matrix $C$, we have $P(C^{-1}AC) = C^{-1}P(A)C$ because if $P(\lambda) = c_n\lambda^n + \cdots + c_1\lambda + c_0$, then

$$P(C^{-1}AC) = c_n(C^{-1}AC)^n + \cdots + c_1C^{-1}AC + c_0I = C^{-1}(c_nA^n + \cdots + c_1A + c_0I)C.$$

Consequently if $P(A) = 0$, then $P(C^{-1}AC) = 0$, and the set of matrices with $P(A) = 0$ is closed under similarity. We shall make use of this observation a little later in this section.

Proposition 5.7. If $A$ is in $M_n(K)$, then there exists a nonzero polynomial $P$ in $K[\lambda]$ such that $P(A) = 0$.

PROOF. The $K$ vector space $M_n(K)$ has dimension $n^2$. Therefore the $n^2 + 1$ matrices $I, A, A^2, \dots, A^{n^2}$ are linearly dependent, and we have

$$c_0I + c_1A + c_2A^2 + \cdots + c_{n^2}A^{n^2} = 0$$

for some set of scalars not all 0. Then $P(A) = 0$ for the polynomial $P(\lambda) = c_0 + c_1\lambda + c_2\lambda^2 + \cdots + c_{n^2}\lambda^{n^2}$; this $P$ is not the 0 polynomial since at least one of the coefficients is not 0.

ALTERNATIVE PROOF IF $K$ IS ALGEBRAICALLY CLOSED. Since the set of polynomials $P$ with $P(A) = 0$ depends only on the similarity class of $A$, Proposition 5.6 shows that there is no loss of generality in assuming that $A$ is upper triangular, say of the form

$$\begin{pmatrix} \lambda_1 & & * \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}.$$

Then $A - \lambda_jI$ is upper triangular with 0 in the $j$th diagonal entry, and $\prod_{j=1}^{n}(A - \lambda_jI)$ is upper triangular with 0 in all diagonal entries. Therefore $\bigl(\prod_{j=1}^{n}(A - \lambda_jI)\bigr)^n = 0$.

With $A$ fixed, we continue to consider the set of all polynomials $P(\lambda)$ such that $P(A) = 0$. Let us think of $P(A)$ as being computed by the abstract procedure described above, namely as the image of $P$ under the ring homomorphism $\Phi : K[\lambda] \to T$ such that $\Phi(c) = cI$ for all $c \in K$ and $\Phi(\lambda) = A$, where $T$ is the commutative subring of $M_n(K)$ generated by $KI$ and $A$. Then the set of all polynomials $P(\lambda)$ with $P(A) = 0$ is the kernel of the ring homomorphism $\Phi$. This set is therefore an ideal, and Proposition 5.7 shows that the ideal is nonzero. We shall apply the following proposition to this ideal.

Proposition 5.8. If $I$ is a nonzero ideal in $K[\lambda]$, then there exists a unique monic polynomial of lowest degree in $I$, and every member of $I$ is the product of this particular polynomial by some other polynomial.

PROOF. Let $B(\lambda)$ be a nonzero member of $I$ of lowest possible degree; adjusting $B$ by a scalar factor, we may assume that $B$ is monic. If $A$ is in $I$, then Proposition 1.12 produces polynomials $Q$ and $R$ such that $A = BQ + R$ and either $R = 0$ or $\deg R < \deg B$. Since $I$ is an ideal, $BQ$ is in $I$ and hence $R = A - BQ$ is in $I$. From minimality of the degree of $B$, we conclude that $R = 0$. Hence $A = BQ$,
and $A$ is exhibited as the product of $B$ and some other polynomial $Q$. If $B_1$ is a second monic polynomial of lowest degree in $I$, then we can take $A = B_1$ to see that $B_1 = QB$. Since $\deg B_1 = \deg B$, we conclude that $\deg Q = 0$. Thus $Q$ is a constant polynomial. Comparing the leading coefficients of $B$ and $B_1$, we see that $Q(\lambda) = 1$.

With $A$ fixed in $M_n(K)$, let us apply Proposition 5.8 to the ideal of all polynomials $P$ in $K[\lambda]$ with $P(A) = 0$. The unique monic polynomial of lowest degree in this ideal is called the minimal polynomial of $A$. Let us try to identify this minimal polynomial.

Theorem 5.9 (Cayley–Hamilton Theorem). If $A$ is in $M_n(K)$ and if $F(\lambda) = \det(\lambda I - A)$ is its characteristic polynomial, then $F(A) = 0$.

PROOF. Let $T$ be the commutative subring of $M_n(K)$ generated by $KI$ and $A$, and define a member $B(\lambda)$ of the ring $T[\lambda]$ by $B(\lambda) = \lambda I - A$. The $(i,j)$th entry of $B(\lambda)$ is $B_{ij}(\lambda) = \delta_{ij}\lambda - A_{ij}$, and $F(\lambda) = \det B(\lambda)$. Let $C(\lambda) = B(\lambda)^{\mathrm{adj}}$ denote the classical adjoint of $B(\lambda)$ as a member of $T[\lambda]$; the form of $C(\lambda)$ is given in the statement of Cramer's rule (Proposition 5.4), and that proposition says that

$$B(\lambda)C(\lambda) = (\det B(\lambda))I = F(\lambda)I.$$

The equality in the $(i,j)$th entry is the equality

$$\delta_{ij}F(\lambda) = \sum_k B_{ik}(\lambda)C_{kj}(\lambda)$$

of members of $K[\lambda]$. Application of the substitution homomorphism $\lambda \mapsto A$ gives

$$\delta_{ij}F(A) = \sum_k B_{ik}(A)C_{kj}(A) = \sum_k (\delta_{ik}A - A_{ik}I)C_{kj}(A).$$

Multiplying on the right by the $i$th standard basis vector $e_i$ and summing on $i$, we obtain the equality of vectors

$$F(A)e_j = \sum_i \sum_k (\delta_{ik}Ae_i - A_{ik}e_i)C_{kj}(A) = \sum_k C_{kj}(A) \sum_i (\delta_{ik}Ae_i - A_{ik}e_i)$$

since $C_{kj}(A)$ is a scalar. But $\sum_i (\delta_{ik}Ae_i - A_{ik}e_i) = Ae_k - \sum_i A_{ik}e_i = 0$ for all $k$, and therefore $F(A)e_j = 0$. Since $j$ is arbitrary, $F(A) = 0$.

Corollary 5.10. If $A$ is in $M_n(K)$, then the minimal polynomial of $A$ divides the characteristic polynomial of $A$.

PROOF. Theorem 5.9 shows that the characteristic polynomial of $A$ lies in the ideal of all polynomials vanishing on $A$. Then the corollary follows from Proposition 5.8.
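
The concrete recipe for $P(A)$ and the Cayley–Hamilton Theorem can be checked numerically on a small case. The sketch below evaluates a polynomial at a matrix by Horner's rule and tests $F(A) = 0$ for a 2-by-2 integer matrix, using the facts recorded earlier that $F(\lambda)$ has coefficient $-\operatorname{Tr} A$ for $\lambda^{n-1}$ and constant term $(-1)^n\det A$. The helper names `mat_mul`, `poly_at_matrix`, and `char_poly_2x2` are hypothetical, and the check is an illustration on one example, not a proof.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def poly_at_matrix(coeffs, A):
    """Evaluate P(A) by Horner's rule; coeffs = [c_0, ..., c_d], lowest degree first."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)      # multiply the running value by A
        for i in range(n):
            result[i][i] += c            # then add c*I
    return result


def char_poly_2x2(A):
    """Coefficients of lambda^2 - (Tr A) lambda + det A, lowest degree first."""
    (a, b), (c, d) = A
    return [a * d - b * c, -(a + d), 1]


A = [[1, 2], [3, 4]]
F = char_poly_2x2(A)          # lambda^2 - 5*lambda - 2
print(poly_at_matrix(F, A))   # the zero matrix, as Theorem 5.9 predicts
```
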

For our matrix $A$ in $M_n(K)$, let $F(\lambda)$ be the characteristic polynomial, and let $M(\lambda)$ be the minimal polynomial. By unique factorization (Theorem 1.17), the monic polynomial $F(\lambda)$ has a factorization into powers of distinct prime monic polynomials of the form

$$F(\lambda) = P_1(\lambda)^{k_1} \cdots P_r(\lambda)^{k_r},$$

and this factorization is unique up to the order of the factors. Since $M(\lambda)$ is a monic polynomial dividing $F(\lambda)$, we must have

$$M(\lambda) = P_1(\lambda)^{l_1} \cdots P_r(\lambda)^{l_r}$$

with $l_1 \le k_1, \dots, l_r \le k_r$, by the same argument that deduced Corollary 1.7 from unique factorization in the ring of integers. We shall see shortly that $k_j > 0$ implies $l_j > 0$ if $P_j(\lambda)$ is of degree 1, i.e., if $P_j(\lambda)$ is of the form $\lambda - \lambda_0$; in other words, if $\lambda_0$ is an eigenvalue of $A$, then $\lambda - \lambda_0$ divides its minimal polynomial. We return to this point in a moment. Problem 31 at the end of the chapter will address the same question when $P_j(\lambda)$ has degree $> 1$.

EXAMPLES.

(1) In the 2-by-2 case, $\begin{pmatrix} c & 0 \\ 0 & c \end{pmatrix}$ has minimal polynomial $M(\lambda) = \lambda - c$, and $\begin{pmatrix} c & 1 \\ 0 & c \end{pmatrix}$ has $M(\lambda) = (\lambda - c)^2$. Both matrices have characteristic polynomial $F(\lambda) = (\lambda - c)^2$.

(2) The $k$-by-$k$ matrix

$$\begin{pmatrix} c & 1 & 0 & \cdots & 0 & 0 \\ 0 & c & 1 & \cdots & 0 & 0 \\ & & & \ddots & & \\ 0 & 0 & 0 & \cdots & c & 1 \\ 0 & 0 & 0 & \cdots & 0 & c \end{pmatrix}$$

with $c$ in every diagonal entry, with 1 in every entry just above the diagonal, and with 0 elsewhere has minimal polynomial $M(\lambda) = (\lambda - c)^k$ and characteristic polynomial $F(\lambda) = (\lambda - c)^k$.

(3) If a matrix $A$ is made up exclusively of several blocks of the type in Example 2 with the same $c$ in each case, the $i$th block being of size $k_i$, then the minimal polynomial is $M(\lambda) = (\lambda - c)^{\max_i k_i}$, and the characteristic polynomial is $F(\lambda) = (\lambda - c)^{\sum_i k_i}$.

(4) If $A$ is made up exclusively of several blocks as in Example 3 but with $c$ different for each block, then the minimal and characteristic polynomials for $A$ are obtained by multiplying the minimal and characteristic polynomials obtained from Example 3 for the various $c$'s.
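
Examples 1 and 2 can be reproduced by machine with the idea in the proof of Proposition 5.7: look for the first power $A^d$ that is a linear combination of $I, A, \dots, A^{d-1}$. The sketch below does this over $\mathbb{Q}$ with exact `Fraction` arithmetic; the helper names `mat_mul`, `solve`, `minimal_polynomial`, and `jordan_block` are hypothetical, and the Gaussian elimination is an assumption of the illustration rather than a method taken from the text.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def solve(cols, target):
    """Return x with sum_j x[j]*cols[j] == target, or None if no solution.

    Assumes the vectors in cols are linearly independent.
    """
    m, d = len(target), len(cols)
    rows = [[cols[j][i] for j in range(d)] + [target[i]] for i in range(m)]
    r = 0
    for c in range(d):
        p = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if p is None:
            return None
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [x / pivot for x in rows[r]]
        for i in range(m):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    if any(row[-1] != 0 for row in rows[r:]):
        return None                      # inconsistent system
    return [rows[j][-1] for j in range(d)]


def minimal_polynomial(A):
    """Monic polynomial of lowest degree vanishing on A, lowest degree first."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    powers = [[x for row in P for x in row]]       # flattened I
    while True:
        P = mat_mul(P, A)                          # next power of A
        flat = [x for row in P for x in row]
        x = solve(powers, flat)
        if x is not None:                          # A^d = sum x_i A^i
            return [-c for c in x] + [Fraction(1)]
        powers.append(flat)


def jordan_block(c, k):
    """The k-by-k matrix of Example 2."""
    return [[c if i == j else 1 if j == i + 1 else 0 for j in range(k)]
            for i in range(k)]


print(minimal_polynomial([[2, 0], [0, 2]]))     # lambda - 2
print(minimal_polynomial([[2, 1], [0, 2]]))     # (lambda - 2)^2
print(minimal_polynomial(jordan_block(2, 3)))   # (lambda - 2)^3
```

The loop terminates after at most $n$ steps, because the Cayley–Hamilton Theorem puts a monic polynomial of degree $n$ in the ideal.
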

To proceed further, let us change our point of view, working with linear maps $L : V \to V$, where $V$ is a finite-dimensional vector space over $K$. We have already defined the characteristic polynomial of $L$ to be the characteristic polynomial of the matrix of $L$ in any ordered basis; this is well defined because similar matrices have the same characteristic polynomial. In analogous fashion we can define the minimal polynomial of $L$ to be the minimal polynomial of the matrix of $L$ in any ordered basis; this is well defined since, as we have seen, the set of polynomials $P$ in one indeterminate with $P(A) = 0$ is the same as the set with $P(C^{-1}AC) = 0$ if $C$ is invertible.

Another way of approaching the matter of the minimal polynomial of $L$ is to define $P(L)$ for any polynomial $P$ in one indeterminate. As with matrices, we can define $P(L)$ either concretely by substituting $L$ for $\lambda$ in the expression for $P(\lambda)$, or we can define $P(L)$ abstractly by appealing to the universal mapping property in Proposition 4.24. For the latter we work with the subring $T$ of linear maps from $V$ to itself generated by $KI$ and $L$. This subring is commutative. We let $\varphi : K \to T$ be given by $\varphi(c) = cI$, and we use Proposition 4.24 to obtain the unique ring homomorphism $\Phi : K[\lambda] \to T$ such that $\Phi(c) = cI$ for all $c \in K$ and $\Phi(\lambda) = L$. Then $P(L)$ is the element $\Phi(P)$ of $T$. Once $P(L)$ is defined, we observe that the set of polynomials $P(\lambda)$ such that $P(L) = 0$ is a nonzero ideal in $K[\lambda]$; Proposition 5.8 yields a unique monic polynomial of lowest degree in this ideal, and that is the minimal polynomial of $L$.

Linear maps enable us to make convenient use of invariant subspaces. Recall from earlier in the section that a vector subspace $U$ of $V$ is said to be invariant under the linear map $L : V \to V$ if $L(U) \subseteq U$; in this case we obtain associated linear maps $L|_U : U \to U$ and $\overline{L} : V/U \to V/U$. Relationships among the characteristic polynomials and minimal polynomials of these linear maps are given in the next two propositions.

Proposition 5.11. Let $V$ be a finite-dimensional vector space over $K$, let $L : V \to V$ be linear, let $U$ be a proper nonzero invariant subspace under $L$, and let $\overline{L} : V/U \to V/U$ be the induced linear map on $V/U$. Then the characteristic polynomials of $L$, $L|_U$, and $\overline{L}$ are related by

$$\det(\lambda I - L) = \det(\lambda I - L|_U)\,\det(\lambda I - \overline{L}).$$

PROOF. Let $\Gamma_U = (v_1, \dots, v_k)$ be an ordered basis of $U$, and extend $\Gamma_U$ to an ordered basis $\Gamma = (v_1, \dots, v_n)$ of $V$. Then $\overline{\Gamma} = (v_{k+1} + U, \dots, v_n + U)$ is an ordered basis of $V/U$. Since $U$ is invariant under $L$, the matrix of $L$ in the ordered basis $\Gamma$ is of the form $\begin{pmatrix} A & B \\ 0 & D \end{pmatrix}$, where $A$ is the matrix of $L|_U$ in the ordered basis $\Gamma_U$ and $D$ is the matrix of $\overline{L}$ in the ordered basis $\overline{\Gamma}$. Passing to the characteristic polynomials and applying Proposition 5.1g, we obtain the desired conclusion.

Proposition 5.12. Let $V$ be a finite-dimensional vector space over $K$, let $L : V \to V$ be linear, let $U$ be a proper nonzero invariant subspace under $L$, and let $\overline{L} : V/U \to V/U$ be the induced linear map on $V/U$. Then the minimal polynomials of $L|_U$ and $\overline{L}$ divide the minimal polynomial of $L$.

PROOF. Let $N(\lambda)$ be the minimal polynomial of $L|_U$. Then $N(\lambda)$ is the unique monic polynomial of lowest degree in the ideal of all polynomials $P(\lambda)$ such that $P(L)u = 0$ for all $u$ in $U$. The minimal polynomial $M(\lambda)$ of $L$ has this property because $M(L)v = 0$ for all $v$ in $V$. Therefore $M(\lambda)$ is in the ideal and is the product of $N(\lambda)$ and some other polynomial.

Among linear maps $S$ from $V$ into $V$ carrying $U$ into itself, the function $S \mapsto \overline{S}$ sending $S$ to the linear map $\overline{S}$ induced on $V/U$ is a homomorphism of rings. It follows that if $P(\lambda)$ is a polynomial with $P(L) = 0$, then $P(\overline{L}) = 0$. Taking $P(\lambda)$ to be the minimal polynomial of $L$, we see that the minimal polynomial of $L$ is in the ideal of polynomials vanishing on $\overline{L}$. Therefore it is the product of the minimal polynomial of $\overline{L}$ and some other polynomial.

Let us come back to the unproved assertion before the examples, namely that $k_j > 0$ implies $l_j > 0$ if $P_j(\lambda)$ has degree 1. We prove the linear-function version of this statement as a corollary of Proposition 5.12.

Corollary 5.13. If $L : V \to V$ is linear on a finite-dimensional vector space over $K$ and if a first-degree polynomial $\lambda - \lambda_0$ divides the characteristic polynomial of $L$, then $\lambda - \lambda_0$ divides the minimal polynomial of $L$.

PROOF. If $\lambda - \lambda_0$ divides the characteristic polynomial, then $\lambda_0$ is an eigenvalue of $L$, say with $v$ as an eigenvector. Then $U = Kv$ is an invariant subspace under $L$, and the characteristic and minimal polynomials of $L|_U$ are both $\lambda - \lambda_0$. By Proposition 5.12, $\lambda - \lambda_0$ divides the minimal polynomial of $L$.

Theorem 5.14. If $L : V \to V$ is linear on a finite-dimensional vector space over $K$, then $L$ has a basis of eigenvectors if and only if the minimal polynomial $M(\lambda)$ of $L$ is the product of distinct factors of degree 1; in this case, $M(\lambda)$ equals $(\lambda - \lambda_1) \cdots (\lambda - \lambda_k)$, where $\lambda_1, \dots, \lambda_k$ are the distinct eigenvalues of $L$. Consequently a matrix $A$ in $M_n(K)$ is similar to a diagonal matrix if and only if its minimal polynomial is the product of distinct factors of degree 1.

PROOF. For the easy direction, suppose that $v_1, \dots, v_n$ are the members of a basis of eigenvectors for $L$ with respective eigenvalues $\mu_1, \dots, \mu_n$. In this case, let $\lambda_1, \dots, \lambda_k$ be the distinct members of the set of eigenvalues, with $\mu_i = \lambda_{j(i)}$ for some function $j : \{1, \dots, n\} \to \{1, \dots, k\}$. Then $(L - \lambda_jI)(v) = 0$ for $v$ equal to any $v_i$ with $j(i) = j$. Since the linear maps $L - \lambda_jI$ commute as $j$ varies, $\prod_{j=1}^{k}(L - \lambda_jI)(v) = 0$ for $v$ equal to each of $v_1, \dots, v_n$, hence for all $v$. Therefore
the minimal polynomial $M(\lambda)$ of $L$ divides $\prod_{j=1}^{k}(\lambda - \lambda_j)$. On the other hand, Corollary 5.13 shows that $\deg M(\lambda) \ge k$. Hence $M(\lambda) = \prod_{j=1}^{k}(\lambda - \lambda_j)$.

Conversely suppose that $M(\lambda) = \prod_{j=1}^{k}(\lambda - \lambda_j)$ with the $\lambda_j$ distinct. If $S_1$ is the linear map $S_1 = \prod_{j=2}^{k}(L - \lambda_jI)$, then the formula for $M(\lambda)$ shows that $(L - \lambda_1I)S_1(v) = 0$ for all $v$ in $V$, and hence $\operatorname{image} S_1$ is a vector subspace of the eigenspace of $L$ for the eigenvalue $\lambda_1$. If $v$ is in $\ker S_1 \cap \operatorname{image} S_1$, we then have

$$0 = S_1(v) = \prod_{j=2}^{k}(L - \lambda_jI)(v) = \prod_{j=2}^{k}(\lambda_1 - \lambda_j)v.$$

Since $\lambda_1$ is distinct from $\lambda_2, \dots, \lambda_k$, we conclude that $v = 0$, hence that $\ker S_1 \cap \operatorname{image} S_1 = 0$. Since $\dim \ker S_1 + \dim \operatorname{image} S_1 = \dim V$, Corollary 2.29 therefore gives

$$\dim V = \dim \ker S_1 + \dim \operatorname{image} S_1 = \dim(\ker S_1 + \operatorname{image} S_1) + \dim(\ker S_1 \cap \operatorname{image} S_1) = \dim(\ker S_1 + \operatorname{image} S_1).$$

Hence $V = \ker S_1 + \operatorname{image} S_1$. Since $\ker S_1 \cap \operatorname{image} S_1 = 0$, we conclude that $V = \ker S_1 \oplus \operatorname{image} S_1$.

Actually, the same calculation of $S_1(v)$ as above shows that $\operatorname{image} S_1$ is the full eigenspace of $L$ for the eigenvalue $\lambda_1$. In fact, if $L(v) = \lambda_1v$, then $S_1(v) = \prod_{j=2}^{k}(\lambda_1 - \lambda_j)v$, and hence $v$ equals the image under $S_1$ of $\bigl(\prod_{j=2}^{k}(\lambda_1 - \lambda_j)\bigr)^{-1}v$.

Next, since $L$ commutes with $S_1$, $\ker S_1$ is an invariant subspace under $L$, and $\lambda_1$ is not an eigenvalue of $L|_{\ker S_1}$. Thus $\lambda - \lambda_1$ does not divide the minimal polynomial of $L|_{\ker S_1}$. On the other hand, $S_1$ vanishes on the eigenspaces of $L$ for eigenvalues $\lambda_2, \dots, \lambda_k$, and Corollary 5.13 shows for $j \ge 2$ that $\lambda - \lambda_j$ divides the minimal polynomial of $L|_{\ker S_1}$. Taking Proposition 5.12 into account, we conclude that $L|_{\ker S_1}$ has minimal polynomial $\prod_{j=2}^{k}(\lambda - \lambda_j)$. We have succeeded in splitting off the eigenspace of $L$ under $\lambda_1$ as a direct summand and reducing the theorem to the case of $k - 1$ eigenvalues. Thus induction shows that $\ker S_1$ is the direct sum of its eigenspaces for the eigenvalues $\lambda_2, \dots, \lambda_k$, and $L$ thus has a basis of eigenvectors.
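
Theorem 5.14 gives a finite test for diagonalizability once the distinct eigenvalues in $K$ are known: by Corollary 5.13 each $\lambda - \lambda_j$ divides $M(\lambda)$, so $M(\lambda) = \prod_j(\lambda - \lambda_j)$ exactly when $\prod_j(A - \lambda_jI) = 0$. The sketch below applies this test to two small integer matrices. It assumes the caller supplies the distinct eigenvalues; the helper names `mat_mul`, `product_of_shifts`, and `has_eigenbasis` are hypothetical.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def product_of_shifts(A, scalars):
    """Return the product of the matrices A - s*I for s in scalars."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for s in scalars:
        shifted = [[A[i][j] - (s if i == j else 0) for j in range(n)]
                   for i in range(n)]
        result = mat_mul(result, shifted)
    return result


def has_eigenbasis(A, distinct_eigenvalues):
    """True when the product of the A - lambda_j*I vanishes.

    Assumption: distinct_eigenvalues lists the distinct eigenvalues of A in K.
    """
    P = product_of_shifts(A, distinct_eigenvalues)
    return all(x == 0 for row in P for x in row)


print(has_eigenbasis([[2, 1], [0, 3]], [2, 3]))   # distinct eigenvalues 2 and 3
print(has_eigenbasis([[2, 1], [0, 2]], [2]))      # the second matrix of Example 1
```

The first matrix passes the test, and the second does not, since its minimal polynomial is $(\lambda - 2)^2$.
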

Theorem 5.14 comes close to solving the canonical-form problem for similarity in the case of one kind of square matrix: if the minimal polynomial of $A$ is the product of distinct factors of degree 1, then $A$ is similar to a diagonal matrix. To complete the solution for this case, all we have to do is to say when two diagonal matrices are similar to each other; this step is handled by the following easy proposition.

Proposition 5.15. Two diagonal matrices $A$ and $A'$ in $M_n(K)$ with respective diagonal entries $d_1, \dots, d_n$ and $d_1', \dots, d_n'$ are similar if and only if there is a permutation $\sigma$ in $S_n$ such that $d_j' = d_{\sigma(j)}$ for all $j$.

PROOF. The respective characteristic polynomials are $\prod_{j=1}^{n}(\lambda - d_j)$ and $\prod_{j=1}^{n}(\lambda - d_j')$. If $A$ and $A'$ are similar, then the characteristic polynomials are equal, and unique factorization (Theorem 1.17) shows that the factors $\lambda - d_j$ match the factors $\lambda - d_j'$ up to order. Conversely if there is a permutation $\sigma$ in $S_n$ such that $d_j' = d_{\sigma(j)}$ for all $j$, then the matrix $C$ whose $j$th column is $e_{\sigma(j)}$ has the property that $A' = C^{-1}AC$.

To proceed further with obtaining canonical forms for matrices under similarity and for linear maps under isomorphism, we shall use linear maps in ways that we have not used them before. In particular, it will be convenient to be able to recognize direct-sum decompositions from properties of linear maps. We take up this matter in the next section.

4. Projection Operators

In this section we shall see how to recognize direct-sum decompositions of a vector space $V$ from the associated projection operators, and we shall relate these operators to invariant subspaces under a linear map $L : V \to V$.

If $V = U_1 \oplus U_2$, then the function $E_1$ defined by $E_1(u_1 + u_2) = u_1$ when $u_1$ is in $U_1$ and $u_2$ is in $U_2$ is linear, satisfies $E_1^2 = E_1$, and has $\operatorname{image} E_1 = U_1$ and $\ker E_1 = U_2$. We call $E_1$ the projection of $V$ on $U_1$ along $U_2$. A decomposition of $V$ as the direct sum of two vector spaces, when the first of the two spaces is singled out, therefore determines a projection operator uniquely. A converse is as follows.

Proposition 5.16. If $V$ is a vector space and $E_1 : V \to V$ is a linear map such that $E_1^2 = E_1$, then there exists a direct-sum decomposition $V = U_1 \oplus U_2$ such that $E_1$ is the projection of $V$ on $U_1$ along $U_2$. In this case, $(I - E_1)^2 = I - E_1$, and $I - E_1$ is the projection of $V$ on $U_2$ along $U_1$.

PROOF. Define $U_1 = \operatorname{image} E_1$ and $U_2 = \ker E_1$. If $v$ is in $\operatorname{image} E_1 \cap \ker E_1$, then $E_1(v) = 0$ since $v$ is in $\ker E_1$ and $v = E_1(w)$ for some $w$ in $V$ since $v$ is in $\operatorname{image} E_1$. Then $0 = E_1(v) = E_1^2(w) = E_1(w) = v$, and therefore $\operatorname{image} E_1 \cap \ker E_1 = 0$. If $v \in V$ is given, write $v = E_1(v) + (I - E_1)(v)$. Then $E_1(v)$ is in $\operatorname{image} E_1$, and the computation

$$E_1(I - E_1)(v) = (E_1 - E_1^2)(v) = (E_1 - E_1)(v) = 0$$

shows that $(I - E_1)(v)$ is in $\ker E_1$. Consequently $V = \operatorname{image} E_1 + \ker E_1$, and we conclude that $V = \operatorname{image} E_1 \oplus \ker E_1$. Hence $V = U_1 \oplus U_2$, where $U_1 = \operatorname{image} E_1$ and $U_2 = \ker E_1$. In this notation, $E_1$ is 0 on $U_2$. If $v$ is in $U_1$, then $v = E_1(w)$ for some $w$, and we have
$v = E_1(w) = E_1^2(w) = E_1(E_1(w)) = E_1(v)$. Thus $E_1$ is the identity on $U_1$ and is the projection as asserted.

For $(I - E_1)^2$, we have $(I - E_1)^2 = I - 2E_1 + E_1^2 = I - 2E_1 + E_1 = I - E_1$, and $I - E_1$ is a projection. It is 1 on $U_2$ and is 0 on $U_1$, hence is the projection of $V$ on $U_2$ along $U_1$.

Let us generalize these considerations to the situation that $V$ is the direct sum of $r$ vector subspaces. The following facts about the situation in Proposition 5.16, with the definition $E_2 = I - E_1$, are relevant to formulating the generalization:
(i) $E_1$ and $E_2$ have $E_1^2 = E_1$ and $E_2^2 = E_2$,
(ii) $E_1E_2 = E_2E_1 = 0$,
(iii) $E_1 + E_2 = I$.

Suppose that $V = U_1 \oplus \cdots \oplus U_r$. Define $E_j(u_1 + \cdots + u_r) = u_j$. Then $E_j$ is linear from $V$ to itself with $E_j^2 = E_j$, and Proposition 5.16 shows that $E_j$ is the projection of $V$ on $U_j$ along the direct sum of the remaining $U_i$'s. The linear maps $E_1, \dots, E_r$ then satisfy
(i′) $E_j^2 = E_j$ for $1 \le j \le r$,
(ii′) $E_jE_i = 0$ if $i \ne j$,
(iii′) $E_1 + \cdots + E_r = I$.
A converse is as follows.

Proposition 5.17. If $V$ is a vector space and $E_j : V \to V$ for $1 \le j \le r$ are linear maps such that
(a) $E_jE_i = 0$ if $i \ne j$, and
(b) $E_1 + \cdots + E_r = I$,
then $E_j^2 = E_j$ for $1 \le j \le r$ and the vector subspaces $U_j = \operatorname{image} E_j$ have the properties that $V = U_1 \oplus \cdots \oplus U_r$ and that $E_j$ is the projection of $V$ on $U_j$ along the direct sum of all $U_i$ but $U_j$.

PROOF. Multiplying (b) through by $E_j$ on the left and applying (a) to each term on the left side except the $j$th, we obtain $E_j^2 = E_j$. Therefore, for each $j$, $E_j$ is a projection on $U_j$ along some vector subspace depending on $j$. If $v$ is in $V$, then (b) gives $v = E_1(v) + \cdots + E_r(v)$ and shows that $V = U_1 + \cdots + U_r$. Suppose that $v$ is in the intersection of $U_j$ with the sum of the other $U_i$'s. Write $v = \sum_{i \ne j} u_i$ with $u_i = E_i(w_i)$ in $U_i$. Applying $E_j$ and using the fact that $v$ is in $U_j$, we obtain $v = E_j(v) = \sum_{i \ne j} E_jE_i(w_i)$. Every term of the right side is 0 by (a), and hence $v = 0$. Thus $V = U_1 \oplus \cdots \oplus U_r$.

Since $E_jE_i = 0$ for $i \ne j$, $E_j$ is 0 on each $U_i$ for $i \ne j$. Therefore the sum of all $U_i$ except $U_j$ is contained in the kernel of $E_j$. Since the image and kernel of $E_j$ intersect in 0, the sum of all $U_i$ except $U_j$ is exactly equal to the kernel of $E_j$. This completes the proof.
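
A concrete instance of Proposition 5.17 may help fix ideas. In the sketch below, $E_1$ and $E_2$ are two hypothetical 2-by-2 integer matrices chosen for illustration; the code checks hypotheses (a) and (b) and then confirms the conclusion $E_j^2 = E_j$. The helper names `mat_mul`, `satisfies_a_and_b`, and `all_idempotent` are ours.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def satisfies_a_and_b(Es):
    """Check (a) E_j E_i = 0 for i != j and (b) E_1 + ... + E_r = I."""
    n = len(Es[0])
    zero = [[0] * n for _ in range(n)]
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    cross_terms_vanish = all(
        mat_mul(Es[j], Es[i]) == zero
        for i in range(len(Es)) for j in range(len(Es)) if i != j)
    total = [[sum(E[i][j] for E in Es) for j in range(n)] for i in range(n)]
    return cross_terms_vanish and total == identity


def all_idempotent(Es):
    """Check the conclusion E_j^2 = E_j for every j."""
    return all(mat_mul(E, E) == E for E in Es)


# Hypothetical example: image E1 = K(1, 0) and image E2 = K(1, 1).
E1 = [[1, -1], [0, 0]]
E2 = [[0, 1], [0, 1]]
print(satisfies_a_and_b([E1, E2]), all_idempotent([E1, E2]))
```

Here $K^2$ is the direct sum of the images of $E_1$ and $E_2$, which are the lines spanned by $(1, 0)$ and $(1, 1)$.
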

Proposition 5.18. Suppose that a vector space $V$ is a direct sum $V = U_1 \oplus \cdots \oplus U_r$ of vector subspaces, that $E_1, \dots, E_r$ are the corresponding projections, and that $L : V \to V$ is linear. Then all the subspaces $U_j$ are invariant under $L$ if and only if $LE_j = E_jL$ for all $j$.

PROOF. If $L(U_j) \subseteq U_j$ for all $j$, then $i \ne j$ implies $E_iL(U_j) \subseteq E_i(U_j) = 0$ and $LE_i(U_j) = L(0) = 0$. Also, $v \in U_j$ implies $E_jL(v) = L(v) = LE_j(v)$. Hence $E_iL = LE_i$ for all $i$. Conversely if $E_jL = LE_j$ and if $v$ is in $U_j$, then $E_jL(v) = LE_j(v) = L(v)$ shows that $L(v)$ is in $U_j$. Therefore $L(U_j) \subseteq U_j$ for all $j$.
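
The criterion of Proposition 5.18 is easy to test on matrices. The sketch below takes the coordinate decomposition $K^2 = Ke_1 \oplus Ke_2$ with its two projections and compares a map that preserves both lines with one that interchanges them. The matrices and the helper names `mat_mul` and `commutes` are hypothetical choices made for the illustration.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def commutes(L, E):
    """True when L E = E L."""
    return mat_mul(L, E) == mat_mul(E, L)


E1 = [[1, 0], [0, 0]]      # projection on K e_1 along K e_2
E2 = [[0, 0], [0, 1]]      # projection on K e_2 along K e_1
L_diag = [[2, 0], [0, 3]]  # carries each coordinate line into itself
L_swap = [[0, 1], [1, 0]]  # sends e_1 to e_2, so K e_1 is not invariant

print(commutes(L_diag, E1), commutes(L_diag, E2))
print(commutes(L_swap, E1))
```

The diagonal map commutes with both projections, while the swap fails to commute with $E_1$, matching the fact that it does not carry $Ke_1$ into itself.
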

5. Primary Decomposition

For the case that the minimal polynomial of a linear map $L : V \to V$ is the product of distinct factors of degree 1, Theorem 5.14 showed that $V$ is a direct sum of its eigenspaces. The proof used elementary vector-space techniques from Chapter II but did not take full advantage of the machinery developed in the present chapter for passing back and forth between polynomials in one indeterminate and the values of polynomials on $L$. Let us therefore rework the proof of that theorem, taking into account the discussion of projections in Section 4.

We seek an eigenspace decomposition $V = V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_k}$ relative to $L$. Proposition 5.17 suggests looking for the corresponding decomposition of the identity operator as a sum of projections: $I = E_1 + \cdots + E_k$. According to that proposition, we obtain a direct-sum decomposition as soon as we obtain this kind of sum of linear maps such that $E_iE_j = 0$ for $i \ne j$. The $E_j$'s will automatically be projections.

The proof of Theorem 5.14 showed that $S_1 = \prod_{j=2}^{k}(L - \lambda_jI)$ has image equal to the kernel of $L - \lambda_1I$, i.e., equal to the eigenspace for eigenvalue $\lambda_1$. If $v$ is in this eigenspace, then $S_1(v) = \prod_{j=2}^{k}(\lambda_1 - \lambda_j)v$. Hence $E_1 = c_1S_1$, where $c_1^{-1} = \prod_{j=2}^{k}(\lambda_1 - \lambda_j)$. The linear map $S_1$ equals $Q_1(L)$, where $Q_1(\lambda) = \prod_{j=2}^{k}(\lambda - \lambda_j)$. Thus $E_1 = c_1Q_1(L)$. Similar remarks apply to the other eigenspaces, and therefore the required decomposition of the identity operator has to be of the form

$$I = c_1Q_1(L) + \cdots + c_kQ_k(L)$$

with $c_1, \dots, c_k$ equal to certain scalars. The polynomials $Q_1(\lambda), \dots, Q_k(\lambda)$ are at hand from the start, each containing all but one factor of the minimal polynomial. Moreover, $i \ne j$ implies that

$$Q_i(L)Q_j(L) = \Bigl(\prod_{l=1}^{k}(L - \lambda_lI)\Bigr)\Bigl(\prod_{l \ne i,j}(L - \lambda_lI)\Bigr).$$

The first factor on the right side is the value of the minimal polynomial of $L$ with $L$ substituted for $\lambda$. Hence the right side is 0, and we see that our linear maps $E_1, \dots, E_k$ have $E_iE_j = 0$ for $i \ne j$.

As soon as we allow nonconstant coefficients in place of the $c_j$'s in the above argument, we obtain a generalization of Theorem 5.14 to the situation that the minimal polynomial of $L$ is arbitrary. The prime factors of the minimal polynomial need not even be of degree 1. Hence the theorem applies to all $L$'s even if $K$ is not algebraically closed.

Theorem 5.19 (Primary Decomposition Theorem). Let $L : V \to V$ be linear on a finite-dimensional vector space over $K$, and let $M(\lambda) = P_1(\lambda)^{l_1} \cdots P_k(\lambda)^{l_k}$ be the unique factorization of the minimal polynomial $M(\lambda)$ of $L$ into the product of powers of distinct monic prime polynomials $P_j(\lambda)$. Define $U_j = \ker(P_j(L)^{l_j})$ for $1 \le j \le k$. Then
(a) $V = U_1 \oplus \cdots \oplus U_k$,
(b) the projection $E_j$ of $V$ on $U_j$ along the sum of the other $U_i$'s is of the form $T_j(L)$ for some polynomial $T_j$,
(c) each vector subspace $U_j$ is invariant under $L$,
(d) any linear map from $V$ to itself that commutes with $L$ carries each $U_j$ into itself,
(e) any vector subspace $W$ invariant under $L$ has the property that $W = (W \cap U_1) \oplus \cdots \oplus (W \cap U_k)$,
(f) the minimal polynomial of $L_j = L|_{U_j}$ is $P_j(\lambda)^{l_j}$.

REMARKS. The decomposition in (a) is called the primary decomposition of $V$ under $L$, and the vector subspaces $U_j$ are called the primary subspaces of $V$ under $L$.

PROOF. For $1 \le j \le k$, define $Q_j(\lambda) = M(\lambda)/P_j(\lambda)^{l_j}$. The ideal in $K[\lambda]$ generated by $Q_1(\lambda), \dots, Q_k(\lambda)$ consists of all products of a single monic polynomial $D(\lambda)$ by arbitrary polynomials, according to Proposition 5.8, and $D(\lambda)$ has to divide each $Q_j(\lambda)$. Since $Q_j(\lambda) = \prod_{i \ne j} P_i(\lambda)^{l_i}$, $D(\lambda)$ cannot be divisible by any $P_j(\lambda)$, and consequently $D(\lambda) = 1$. Thus there exist polynomials $R_1(\lambda), \dots, R_k(\lambda)$ such that

$$1 = Q_1(\lambda)R_1(\lambda) + \cdots + Q_k(\lambda)R_k(\lambda).$$

Define $E_j = Q_j(L)R_j(L)$, so that $E_1 + \cdots + E_k = I$. If $i \ne j$, then $Q_i(\lambda)Q_j(\lambda) = M(\lambda)\prod_{r \ne i,j} P_r(\lambda)^{l_r}$. Since $M(L) = 0$, we see that $E_iE_j = 0$. Proposition 5.17 says that each $E_j$ is a projection. Also, it says that if $U_j$ denotes $\operatorname{image} E_j$, then $V = U_1 \oplus \cdots \oplus U_k$, and $E_j$ is the projection on $U_j$ along
the sum of the other $U_i$'s. With this definition of the $U_j$'s (rather than the one in the statement of the theorem), we have therefore shown that (a) and (b) hold.

Let us see that conclusions (c), (d), and (e) follow from (b). Conclusion (c) holds by Proposition 5.18 since $L$ commutes with $T_j(L)$ whenever $T_j$ is a polynomial. For (d), if $J : V \to V$ is a linear map commuting with $L$, then $J$ commutes with each $E_j$ since (b) shows that each $E_j$ is of the form $T_j(L)$. From Proposition 5.18 we conclude that each $U_j$ is invariant under $J$. For (e), the subspace $W$ certainly contains $(W \cap U_1) \oplus \cdots \oplus (W \cap U_k)$. For the reverse containment suppose $w$ is in $W$. Since $E_j$ is of the form $T_j(L)$ and since $W$ is invariant under $L$, $E_j(w)$ is in $W$. But also $E_j(w)$ is in $U_j$. Therefore the expansion $w = \sum_j E_j(w)$ exhibits $w$ as the sum of members of the spaces $W \cap U_j$.

Next let us prove that $U_j$, as we have defined it, is given also by the definition in the statement of the theorem. In other words, let us prove that

$$\operatorname{image} E_j = \ker(P_j(L)^{l_j}). \tag{$*$}$$

We need a preliminary fact. The polynomial $P_j(\lambda)^{l_j}$ has the property that $M(\lambda) = P_j(\lambda)^{l_j}Q_j(\lambda)$. Hence $P_j(L)^{l_j}Q_j(L) = M(L) = 0$. Multiplying by $R_j(L)$, we obtain

$$P_j(L)^{l_j}E_j = 0. \tag{$**$}$$

Now suppose that $v$ is in $\operatorname{image} E_j$. Then $P_j(L)^{l_j}(v) = P_j(L)^{l_j}E_j(v) = 0$ by ($**$), and hence $\operatorname{image} E_j \subseteq \ker(P_j(L)^{l_j})$. For the reverse inclusion, let $v$ be in $\ker(P_j(L)^{l_j})$. For $i \ne j$, $Q_i(\lambda)R_i(\lambda) = \bigl(\prod_{r \ne i,j} P_r(\lambda)^{l_r}\bigr)R_i(\lambda)P_j(\lambda)^{l_j}$ and hence

$$E_i(v) = \Bigl(\prod_{r \ne i,j} P_r(L)^{l_r}\Bigr)R_i(L)P_j(L)^{l_j}(v) = 0.$$

Writing $v = E_1(v) + \cdots + E_k(v)$, we see that $v = E_j(v)$. Thus $\ker(P_j(L)^{l_j}) \subseteq \operatorname{image} E_j$. Therefore ($*$) holds, and $U_j$ is as in the statement of the theorem.

Finally let us prove (f). Let $M_j(\lambda)$ be the minimal polynomial of $L_j = L|_{U_j}$. From ($**$) we see that $P_j(L_j)^{l_j} = 0$. Hence $M_j(\lambda)$ divides $P_j(\lambda)^{l_j}$. For the reverse divisibility we have $M_j(L_j) = 0$. Then certainly $M_j(L_j)Q_j(L_j)R_j(L_j)$, which equals $M_j(L)E_j$ on $U_j$, is 0 on $U_j$. Consider $M_j(L)E_j$ on $U_i = \operatorname{image} E_i$ when $i \ne j$. Since $E_jE_i = 0$, $M_j(L)E_j$ equals 0 on all $U_i$ other than $U_j$. We conclude that $M_j(L)E_j$ equals 0 on $V$, i.e., $M_j(L)Q_j(L)R_j(L) = 0$. Since $M(\lambda)$ is the minimal polynomial of $L$, $M(\lambda)$ divides

$$M_j(\lambda)Q_j(\lambda)R_j(\lambda) = M_j(\lambda)\Bigl(1 - \sum_{i \ne j} Q_i(\lambda)R_i(\lambda)\Bigr), \tag{$\dagger$}$$

and the factor $P_j(\lambda)^{l_j}$ of $M(\lambda)$ must divide the right side of ($\dagger$). On that right side, $P_j(\lambda)^{l_j}$ divides each $Q_i(\lambda)$ with $i \ne j$. Since $P_j(\lambda)$ does not divide 1, $P_j(\lambda)$ does not divide the factor $1 - \sum_{i \ne j} Q_i(\lambda)R_i(\lambda)$. Since $P_j(\lambda)$ is prime, $P_j(\lambda)^{l_j}$ and $1 - \sum_{i \ne j} Q_i(\lambda)R_i(\lambda)$ are relatively prime. We know that $P_j(\lambda)^{l_j}$ divides the product of $M_j(\lambda)$ and $1 - \sum_{i \ne j} Q_i(\lambda)R_i(\lambda)$, and consequently $P_j(\lambda)^{l_j}$ divides $M_j(\lambda)$. This proves the reverse divisibility and completes the proof of (f).
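
The construction in the proof can be carried out by hand on a small example and then checked by machine. Take the hypothetical matrix $A$ below, whose minimal polynomial is $M(\lambda) = (\lambda - 1)^2(\lambda - 2)$, so that $Q_1(\lambda) = \lambda - 2$ and $Q_2(\lambda) = (\lambda - 1)^2$. Since $(\lambda - 1)^2 = \lambda(\lambda - 2) + 1$, one admissible choice is $R_1(\lambda) = -\lambda$ and $R_2(\lambda) = 1$. The sketch evaluates $E_j = Q_j(A)R_j(A)$; the helper names `mat_mul` and `poly_at_matrix` are ours, and the example is an illustration rather than part of the proof.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def poly_at_matrix(coeffs, A):
    """Evaluate P(A) by Horner's rule; coeffs = [c_0, ..., c_d], lowest degree first."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += c
    return result


# Hypothetical example with minimal polynomial (lambda - 1)^2 (lambda - 2).
A = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 2]]

T1 = [0, 2, -1]    # Q_1 R_1 = -lambda (lambda - 2) = -lambda^2 + 2 lambda
T2 = [1, -2, 1]    # Q_2 R_2 = (lambda - 1)^2

E1 = poly_at_matrix(T1, A)   # projection on U_1 = ker (A - I)^2
E2 = poly_at_matrix(T2, A)   # projection on U_2 = ker (A - 2I)
print(E1)
print(E2)
print(mat_mul(E1, E2))
```

The two projections come out as the diagonal matrices with diagonal entries $(1, 1, 0)$ and $(0, 0, 1)$, so $U_1$ is spanned by $e_1, e_2$ and $U_2$ by $e_3$, and $E_1E_2 = 0$ as the proof requires.
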

6. Jordan Canonical Form

Now we can return to the canonical-form problem for similarity of square matrices and isomorphism of linear maps from a finite-dimensional vector space to itself. The answer obtained in this section will solve the problem completely if $K$ is algebraically closed but only partially if $K$ fails to be algebraically closed. Problems 32–40 at the end of the chapter extend the content of this section to give a complete answer for general $K$.

The present theorem is most easily stated in terms of matrices. A square matrix is called a Jordan block if it is of the form

$$\begin{pmatrix} c & 1 & 0 & \cdots & 0 & 0 \\ & c & 1 & \cdots & 0 & 0 \\ & & c & \ddots & \vdots & \vdots \\ & & & \ddots & 1 & 0 \\ & & & & c & 1 \\ & & & & & c \end{pmatrix},$$

of some size and for some $c$ in $K$, as in Example 2 of Section 3, with 0 everywhere below the diagonal. A square matrix is in Jordan form, or Jordan normal form, if it is block diagonal and each block is a Jordan block. One can insist on grouping the blocks for which the constant $c$ is the same and arranging the blocks for given $c$ in some order, but these refinements are inessential.

Theorem 5.20 (Jordan canonical form).
(a) If the field $K$ is algebraically closed, then every square matrix over $K$ is similar to a matrix in Jordan form, and two matrices in Jordan form are similar to each other if and only if their Jordan blocks can be permuted so as to match exactly.
(b) For a general field $K$, a square matrix $A$ is similar to a matrix in Jordan form if and only if each prime factor of its minimal polynomial has degree 1. Two matrices in Jordan form are similar to each other if and only if their Jordan blocks can be permuted so as to match exactly.


The first step in proving existence of a matrix in Jordan form similar to a given matrix is to use the Primary Decomposition Theorem (Theorem 5.19). We think of the matrix $A$ as operating on the space $\mathbb{K}^n$ of column vectors in the usual way. The primary subspaces are uniquely defined vector subspaces of $\mathbb{K}^n$, and we introduce an ordered basis, yet to be specified in full detail, within each primary subspace. The union of these ordered bases gives an ordered basis of $\mathbb{K}^n$, and we change from the standard basis to this one. The result is that the given matrix has been conjugated so that its appearance is block diagonal, each block having minimal polynomial equal to a power of a prime polynomial and the prime polynomials all being different. Let us call these blocks primary blocks. The effect of Theorem 5.19 has been to reduce matters to a consideration of each primary block separately.

The hypothesis either that $\mathbb{K}$ is algebraically closed or, more generally, that the prime divisors of the minimal polynomial all have degree 1 means that the minimal polynomial of the primary block under study may be taken to be $(\lambda - c)^l$ for some $c$ in $\mathbb{K}$ and some integer $l \geq 1$. In terms of Jordan form, we have isolated, for each $c$ in $\mathbb{K}$, what will turn out to be the subspace of $\mathbb{K}^n$ corresponding to Jordan blocks with $c$ in every diagonal entry.

Let us write $B$ for a primary block with minimal polynomial $(\lambda - c)^l$. We certainly have $(B - cI)^l = 0$, and it follows that the matrix $N = B - cI$ has $N^l = 0$. A matrix $N$ with $N^l = 0$ for some integer $l \geq 0$ is said to be nilpotent. To prove the existence part of Theorem 5.20, it is enough to prove the following theorem.

Theorem 5.21. For any field $\mathbb{K}$, each nilpotent matrix $N$ in $M_n(\mathbb{K})$ is similar to a matrix in Jordan form.

The proof of Theorem 5.21 and of the uniqueness statements in Theorem 5.20 will occupy the remainder of this section.
It is implicit in Theorem 5.21 that a nilpotent matrix in $M_n(\mathbb{K})$ has 0 as a root of its characteristic polynomial with multiplicity $n$, in particular that the only prime polynomials dividing the characteristic polynomial are the ones dividing the minimal polynomial. We proved such a fact about divisibility earlier for general square matrices when the prime factor has degree 1, but we did not give a proof for general degree. We pause for a moment to give a direct proof in the nilpotent case.

Lemma 5.22. If $N$ is a nilpotent matrix in $M_n(\mathbb{K})$, then $N$ has characteristic polynomial $\lambda^n$ and satisfies $N^n = 0$.

PROOF. If $N^l = 0$, then

\[
(\lambda I - N)(\lambda^{l-1} I + \lambda^{l-2} N + \cdots + \lambda^2 N^{l-3} + \lambda N^{l-2} + N^{l-1}) = \lambda^l I - N^l = \lambda^l I.
\]

Taking determinants and using Proposition 5.1 in the ring $R = \mathbb{K}[\lambda]$, we obtain

\[
\det(\lambda I - N)\,\det(\text{other factor}) = \det(\lambda^l I) = \lambda^{ln}.
\]


Thus $\det(\lambda I - N)$ divides $\lambda^{ln}$. By unique factorization in $\mathbb{K}[\lambda]$, $\det(\lambda I - N)$ is a constant times a power of $\lambda$. Since $\det(\lambda I - N)$ is monic of degree $n$, we must have $\det(\lambda I - N) = \lambda^n$. Applying the Cayley–Hamilton Theorem (Theorem 5.9), we obtain $N^n = 0$.

Let us now prove the uniqueness statements in Theorem 5.20; this step will in fact help orient us for the proof of Theorem 5.21. In (b), one thing we are to prove is that if $A$ is similar to a matrix in Jordan form, then every prime polynomial dividing the minimal polynomial has degree 1. Since characteristic and minimal polynomials are unchanged under similarity, we may assume that $A$ is itself in Jordan form. The characteristic and minimal polynomials of $A$ are computed in the four examples of Section 3. Since the minimal polynomial is the product of polynomials of degree 1, the only primes dividing it have degree 1.

In both (a) and (b) of Theorem 5.20, we are to prove that the Jordan form is unique up to permutation of the Jordan blocks. The matrix $A$ determines its characteristic polynomial, which determines the roots of the characteristic polynomial, which are the diagonal entries of the Jordan form. Thus the sizes of the primary blocks within the Jordan form are determined by $A$. Within each primary block, we need to see that the sizes of the various Jordan blocks are completely determined. Thus we may assume that $N$ is nilpotent and that $C^{-1}NC = J$ is in Jordan form with 0's on the diagonal. Although we shall make statements that apply in all cases, the reader may be helped by referring to the particular matrix $J$ in Figure 5.1 and its powers in Figure 5.2.

\[
J = \begin{pmatrix}
\begin{smallmatrix} 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&0 \end{smallmatrix} & & & & \\
& \begin{smallmatrix} 0&1&0 \\ 0&0&1 \\ 0&0&0 \end{smallmatrix} & & & \\
& & \begin{smallmatrix} 0&1 \\ 0&0 \end{smallmatrix} & & \\
& & & \begin{smallmatrix} 0&1 \\ 0&0 \end{smallmatrix} & \\
& & & & 0
\end{pmatrix}
\]

FIGURE 5.1. Example of a nilpotent matrix in Jordan form. Blank entries are 0; the matrix is 12-by-12 with Jordan blocks of sizes 4, 3, 2, 2, 1.

Each block of the Jordan form $J$ contributes 1 to the dimension of the kernel (or null space really) of $J$ via the first column of the block, and hence

\[
\dim(\ker J) = \#\{\text{Jordan blocks in } J\}.
\]

In Figure 5.1 this number is 5.


\[
J^2 = \begin{pmatrix}
\begin{smallmatrix} 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \end{smallmatrix} & & & & \\
& \begin{smallmatrix} 0&0&1 \\ 0&0&0 \\ 0&0&0 \end{smallmatrix} & & & \\
& & \begin{smallmatrix} 0&0 \\ 0&0 \end{smallmatrix} & & \\
& & & \begin{smallmatrix} 0&0 \\ 0&0 \end{smallmatrix} & \\
& & & & 0
\end{pmatrix}
\quad\text{and}\quad
J^3 = \begin{pmatrix}
\begin{smallmatrix} 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{smallmatrix} & & & & \\
& \begin{smallmatrix} 0&0&0 \\ 0&0&0 \\ 0&0&0 \end{smallmatrix} & & & \\
& & \begin{smallmatrix} 0&0 \\ 0&0 \end{smallmatrix} & & \\
& & & \begin{smallmatrix} 0&0 \\ 0&0 \end{smallmatrix} & \\
& & & & 0
\end{pmatrix}
\]

FIGURE 5.2. Powers of the nilpotent matrix in Figure 5.1.

When $J$ is squared, the 1's in $J$ move up and to the right one more step beyond the diagonal except that blocks of size 2 become 0. When $J$ is cubed, the 1's in $J$ move up and to the right one further step except that blocks of size 3 become 0. Each time $J$ is raised to a new power one higher than before, each block that is nonzero in the old power contributes an additional 1 to the dimension of the kernel. Thus we have

\[
\dim(\ker J^2) - \dim(\ker J) = \#\{\text{Jordan blocks of size} \geq 2\}
\]

and

\[
\dim(\ker J^3) - \dim(\ker J^2) = \#\{\text{Jordan blocks of size} \geq 3\};
\]

in the general case,

\[
\dim(\ker J^k) - \dim(\ker J^{k-1}) = \#\{\text{Jordan blocks of size} \geq k\} \qquad \text{for } k \geq 1.
\]

Lemma 5.22 says that $J^k = 0$ when $k$ is $\geq$ the size of $J$, and the differences need not be computed beyond that point. For Figure 5.2 the values by inspection are $\dim(\ker J^2) = 9$ and $\dim(\ker J^3) = 11$; also $J^4 = 0$ and hence $\dim(\ker J^4) = 12$. The numbers of Jordan blocks of size $\geq k$ for $k = 1, 2, 3, 4$ are 5, 4, 2, 1, and these numbers indeed match the differences $5 - 0$, $9 - 5$, $11 - 9$, $12 - 11$, as predicted by the above formula.

Since $C^{-1}NC = J$, we have $C^{-1}N^kC = J^k$ and $N^kC = CJ^k$. The matrix $C$ is invertible, and therefore

\[
\dim(\ker J^k) = \dim(\ker CJ^k) = \dim(\ker N^kC) = \dim(\ker N^k).
\]

Hence

\[
\dim(\ker N^k) - \dim(\ker N^{k-1}) = \#\{\text{Jordan blocks of size} \geq k\} \qquad \text{for } k \geq 1,
\]

and the number of Jordan blocks of each size is uniquely determined by properties of $N$. This completes the proof of all the uniqueness statements in Theorem 5.20.
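The counting formula just proved can be checked numerically. The following Python sketch computes $\dim(\ker N^k) - \dim(\ker N^{k-1})$ for successive $k$, using exact rational arithmetic from the standard library. The function names (`rank`, `matmul`, `blocks_of_size_at_least`, `nilpotent_jordan`) are hypothetical helpers introduced for this illustration, and the sketch assumes that its input matrix is nilpotent with rational entries.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix over the rationals, by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def blocks_of_size_at_least(n_mat):
    """For a nilpotent matrix, return the list whose entry k-1 is
    dim ker N^k - dim ker N^(k-1), i.e. #{Jordan blocks of size >= k}."""
    n = len(n_mat)
    counts, power, prev_dim = [], n_mat, 0
    for _ in range(n):
        dim = n - rank(power)      # dim ker N^k by rank-nullity
        if dim == prev_dim:        # kernels have stopped growing
            break
        counts.append(dim - prev_dim)
        prev_dim = dim
        power = matmul(power, n_mat)
    return counts


def nilpotent_jordan(sizes):
    """Nilpotent matrix in Jordan form with blocks of the given sizes."""
    n = sum(sizes)
    m = [[0] * n for _ in range(n)]
    start = 0
    for s in sizes:
        for i in range(start, start + s - 1):
            m[i][i + 1] = 1
        start += s
    return m


J = nilpotent_jordan([4, 3, 2, 2, 1])   # the matrix of Figure 5.1
print(blocks_of_size_at_least(J))       # [5, 4, 2, 1]
```

The number of blocks of size exactly $k$ is the difference of two consecutive entries of the returned list, so the list determines the Jordan form of a nilpotent matrix up to permutation of blocks.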


Now let us turn to the proof of Theorem 5.21, first giving the idea. The argument involves a great many choices, and it may be helpful to understand it in the context of Figures 5.1 and 5.2. Let $(e_1, \ldots, e_{12})$ be the standard ordered basis of $\mathbb{K}^{12}$. The matrix $J$, when operating by multiplication on the left, moves basis vectors to other basis vectors or to 0. Namely,

\[
\begin{aligned}
&Je_1 = 0, \quad Je_2 = e_1, \quad Je_3 = e_2, \quad Je_4 = e_3, \\
&Je_5 = 0, \quad Je_6 = e_5, \quad Je_7 = e_6, \\
&Je_8 = 0, \quad Je_9 = e_8, \\
&Je_{10} = 0, \quad Je_{11} = e_{10}, \\
&Je_{12} = 0,
\end{aligned}
\]

with each line describing what happens for a single Jordan block. Let us think of the given nilpotent matrix $N$ as equal to the matrix of some linear map $L$ in the standard ordered basis. We want to find a new ordered basis $(v_1, \ldots, v_{12})$ in which the matrix of $L$ is $J$. In the expression $C^{-1}NC = J$, the matrix $C$ is the change-of-basis matrix whose columns are the expressions for $v_1, \ldots, v_{12}$ in the standard ordered basis, i.e., $Ce_i = v_i$. For each index $i$, we have $Je_i = e_{i-1}$ or $Je_i = 0$. The formula $NC = CJ$, when applied to $e_i$, therefore says that

\[
Nv_i = NCe_i = CJe_i =
\begin{cases}
Ce_{i-1} = v_{i-1} & \text{if } Je_i = e_{i-1}, \\
0 & \text{if } Je_i = 0.
\end{cases}
\]

Thus we are looking for an ordered basis such that $N$ sends each member of the basis either into the previous member or into 0. The procedure in this example will be to pick out $v_4$ as a vector not annihilated by $N^3$, obtain $v_3, v_2, v_1$ from it by successively applying $N$, pick out $v_7$ as a vector not annihilated by $N^2$ and independent of what has been found, obtain $v_6, v_5$ from it by successively applying $N$, and so on. It is necessary to check that the appropriate linear independence can be maintained, and that step will be what the proof is really about.

The proof of Theorem 5.21 will now be given in the general case. The core of the argument concerns linear maps and appears as three lemmas. Afterward the results of the lemmas will be interpreted in terms of matrices. For all the lemmas let $V$ be an $n$-dimensional vector space over $\mathbb{K}$, and let $N : V \to V$ be linear with $N^n = 0$. Define $K_j = \ker N^j$, so that

\[
0 = K_0 \subseteq K_1 \subseteq K_2 \subseteq \cdots \subseteq K_n = V.
\]

Lemma 5.23. Suppose $j \geq 1$ and suppose $S_j$ is any vector subspace of $V$ such that $K_{j+1} = K_j \oplus S_j$. Then $N$ is one-one from $S_j$ into $K_j$ and $N(S_j) \cap K_{j-1} = 0$.


PROOF. Since $N(\ker N^{j+1}) \subseteq \ker N^j$, we obtain $N(S_j) \subseteq K_j$; thus $N$ indeed sends $S_j$ into $K_j$. To see that $N$ is one-one from $S_j$ into $K_j$, suppose that $s$ is a member of $S_j$ with $N(s) = 0$. Then $s$ is in $K_1$. Since $j \geq 1$, $K_1 \subseteq K_j$. Thus $s$ is in $K_j$. Since $K_j \cap S_j = 0$, $s$ is 0. Hence $N$ is one-one from $S_j$ into $K_j$. To see that $N(S_j) \cap K_{j-1} = 0$, suppose $s$ is a member of $S_j$ with $N(s)$ in $K_{j-1}$. Then $0 = N^{j-1}(N(s)) = N^j(s)$ shows that $s$ is in $K_j$. Since $K_j \cap S_j = 0$, $s$ equals 0.

Lemma 5.24. Define $U_n = W_n = 0$. For $0 \leq j \leq n - 1$, there exist vector subspaces $U_j$ and $W_j$ of $K_{j+1}$ such that

\[
K_{j+1} = K_j \oplus U_j \oplus W_j, \qquad U_j = N(U_{j+1} \oplus W_{j+1}), \qquad \text{and} \qquad N : U_{j+1} \oplus W_{j+1} \to U_j \ \text{is one-one}.
\]

PROOF. Define $U_{n-1} = N(U_n \oplus W_n) = 0$, and let $W_{n-1}$ be a vector subspace such that $V = K_n = K_{n-1} \oplus W_{n-1}$. Put $S_{n-1} = U_{n-1} \oplus W_{n-1}$. Proceeding inductively downward, suppose that $U_n, U_{n-1}, \ldots, U_{j+1}$ and $W_n, W_{n-1}, \ldots, W_{j+1}$ have been defined so that $U_k = N(U_{k+1} \oplus W_{k+1})$, $N : U_{k+1} \oplus W_{k+1} \to U_k$ is one-one, and $K_{k+1} = K_k \oplus U_k \oplus W_k$ whenever $k$ satisfies $j < k \leq n - 1$. We put $S_k = U_k \oplus W_k$ for these values of $k$, and then $S_k$ satisfies the hypothesis of Lemma 5.23 whenever $k$ satisfies $j < k \leq n - 1$. We now construct $U_j$ and $W_j$. We put $U_j = N(S_{j+1})$. Since $S_{j+1}$ satisfies the hypothesis of Lemma 5.23, we see that $U_j \subseteq K_{j+1}$, $N$ is one-one from $S_{j+1}$ into $U_j$, and $U_j \cap K_j = 0$. Thus we can find a vector subspace $W_j$ with $K_{j+1} = K_j \oplus U_j \oplus W_j$, and the inductive construction is complete.

Lemma 5.25. The vector subspaces of Lemma 5.24 satisfy

\[
V = U_0 \oplus W_0 \oplus U_1 \oplus W_1 \oplus \cdots \oplus U_{n-1} \oplus W_{n-1}.
\]

PROOF. Iterated use of Lemma 5.24 gives

\[
\begin{aligned}
V = K_n &= K_{n-1} \oplus (U_{n-1} \oplus W_{n-1}) \\
&= K_{n-2} \oplus (U_{n-2} \oplus W_{n-2}) \oplus (U_{n-1} \oplus W_{n-1}) \\
&= \cdots = K_0 \oplus (U_0 \oplus W_0) \oplus \cdots \oplus (U_{n-1} \oplus W_{n-1}) \\
&= (U_0 \oplus W_0) \oplus \cdots \oplus (U_{n-1} \oplus W_{n-1}),
\end{aligned}
\]

the last step holding since $K_0 = 0$, $K_0$ being the kernel of the identity function.


PROOF OF THEOREM 5.21. We regard $N$ as acting on $V = \mathbb{K}^n$ by multiplication on the left, and we describe an ordered basis in which the matrix of $N$ is in Jordan form. For $0 \leq j \leq n - 1$, form a basis of the vector subspace $W_j$ of Lemma 5.24, and let $v^{(j)}$ be a typical member of this basis. Each $v^{(j)}$ will be used as the last basis vector corresponding to a Jordan block of size $j + 1$. The full ordered basis for that Jordan block will therefore be $N^j v^{(j)}, N^{j-1} v^{(j)}, \ldots, N v^{(j)}, v^{(j)}$. The theorem will be proved if we show that the union of these sets as $j$ and $v^{(j)}$ vary is a basis of $\mathbb{K}^n$ and that $N^{j+1} v^{(j)} = 0$ for all $j$ and $v^{(j)}$.

From the first conclusion of Lemma 5.24 we see for $j \geq 0$ that $W_j \subseteq K_{j+1}$, and hence $N^{j+1}(W_j) = 0$. Therefore $N^{j+1} v^{(j)} = 0$ for all $j$ and $v^{(j)}$.

Let us prove by induction downward on $j$ that a basis of $U_j \oplus W_j$ consists of all $v^{(j)}$ and all $N^k v^{(j+k)}$ for $k > 0$. The base case of the induction is $j = n - 1$, and the statement holds in that case since $U_{n-1} = 0$ and since the vectors $v^{(n-1)}$ form a basis of $W_{n-1}$. The inductive hypothesis is that all $v^{(j+1)}$ and all $N^k v^{(j+1+k)}$ for $k > 0$ together form a basis of $U_{j+1} \oplus W_{j+1}$. The second and third conclusions of Lemma 5.24 together show that all $N v^{(j+1)}$ and all $N^{k+1} v^{(j+1+k)}$ for $k > 0$ together form a basis of $U_j$. In other words, all $N^k v^{(j+k)}$ with $k > 0$ together form a basis of $U_j$. The vectors $v^{(j)}$ by construction form a basis of $W_j$, and $U_j \cap W_j = 0$. Therefore the union of these separate bases is a basis for $U_j \oplus W_j$, and the induction is complete.

Taking the union of the bases of $U_j \oplus W_j$ for all $j$ and applying Lemma 5.25, we see that we have a basis of $V = \mathbb{K}^n$. This shows that the desired set is a basis of $\mathbb{K}^n$ and completes the proof of Theorem 5.21.
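The construction in Lemmas 5.23 through 5.25 is effectively an algorithm, and it can be sketched in Python. The code below is one possible reading of the proof, not a reproduction of anything in the text: it works over the rationals with the standard library's `Fraction`, it assumes the input matrix is nilpotent, and every function name (`rref`, `rank`, `nullspace`, `matvec`, `matmul`, `jordan_chains`) is a hypothetical helper. At stage $j$ it takes $U_j$ to be the image of the previously chosen vectors, extends a basis of $K_j \oplus U_j$ to a basis of $K_{j+1}$ to obtain $W_j$, and records the chain $N^j v, \ldots, Nv, v$ for each new vector $v$ in $W_j$.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form over the rationals, with pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def rank(vectors):
    return len(rref(vectors)[1]) if vectors else 0


def nullspace(a):
    """Basis of {x : a x = 0}, one vector per free variable."""
    n = len(a[0])
    m, pivots = rref(a)
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row, pc in zip(m, pivots):
            v[pc] = -row[free]
        basis.append(v)
    return basis


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def jordan_chains(n_mat):
    """For nilpotent n_mat, return chains [N^j v, ..., N v, v]; taken in
    order, their vectors are the columns of a matrix C with C^-1 N C in
    Jordan form."""
    n = len(n_mat)
    powers = [[[Fraction(int(i == j)) for j in range(n)] for i in range(n)]]
    for _ in range(n):
        powers.append(matmul(powers[-1], n_mat))
    kernels = [nullspace(p) for p in powers]     # kernels[j] spans K_j
    chains, s_above = [], []                     # s_above spans U_{j+1} + W_{j+1}
    for j in range(n - 1, -1, -1):
        u_j = [matvec(n_mat, s) for s in s_above]
        span, w_j = kernels[j] + u_j, []
        for cand in kernels[j + 1]:              # extend to a basis of K_{j+1}
            if rank(span + w_j + [cand]) > rank(span + w_j):
                w_j.append(cand)
        for v in w_j:                            # each v starts a block of size j+1
            chain = [v]
            for _ in range(j):
                chain.insert(0, matvec(n_mat, chain[0]))
            chains.append(chain)
        s_above = u_j + w_j
    return chains


A = [[-1, 1, 0], [-1, 1, 0], [-1, 1, 0]]
print([len(chain) for chain in jordan_chains(A)])   # [2, 1]
```

The choices of $W_j$ are not unique, so the basis produced here need not agree with a basis found by hand; only the chain lengths, that is, the Jordan block sizes, are forced.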

7. Computations with Jordan Form

Let us illustrate the computation of Jordan form and the change-of-basis matrix with a few examples. We are given a matrix $A$ and we seek $J$ and $C$ with $J = C^{-1}AC$. We regard $A$ as the matrix of some linear $L$ in the standard ordered basis, and we regard $J$ as the matrix of $L$ in some other ordered basis. Then $C$ is the change-of-basis matrix from the new ordered basis to the standard one, and so the columns of $C$ give the members of the new ordered basis written as ordinary column vectors (in the standard ordered basis).

EXAMPLE 1. This example will be a nilpotent matrix, and we shall compute $J$ and $C$ merely by interpreting the proof of Theorem 5.21 in concrete terms. Let

\[
A = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 1 & 0 \\ -1 & 1 & 0 \end{pmatrix}.
\]


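The first step is to compute the characteristic polynomial, which is

\[
\det(\lambda I - A) = \det \begin{pmatrix} \lambda + 1 & -1 & 0 \\ 1 & \lambda - 1 & 0 \\ 1 & -1 & \lambda \end{pmatrix}
= \lambda \det \begin{pmatrix} \lambda + 1 & -1 \\ 1 & \lambda - 1 \end{pmatrix} = \lambda^3.
\]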

Then $A^3 = 0$ by the Cayley–Hamilton Theorem (Theorem 5.9), and $A$ is indeed nilpotent. The diagonal entries of $J$ are thus all 0, and we have to compute the sizes of the various Jordan blocks. To do so, we compute the dimension of the kernel of each power of $A$. The dimension of the kernel of a matrix equals the number of independent variables when we solve $AX = 0$ by row reduction. With the first power of $A$, the variable $x_1$ is dependent, and $x_2$ and $x_3$ are independent. Also, $A^2 = 0$. Thus

\[
\dim(\ker A^0) = 0, \qquad \dim(\ker A) = 2, \qquad \text{and} \qquad \dim(\ker A^2) = 3.
\]

Hence

\[
\begin{aligned}
\#\{\text{Jordan blocks of size} \geq 1\} &= \dim(\ker A) - \dim(\ker A^0) = 2 - 0 = 2, \\
\#\{\text{Jordan blocks of size} \geq 2\} &= \dim(\ker A^2) - \dim(\ker A) = 3 - 2 = 1.
\end{aligned}
\]

From these equalities we see that one Jordan block has size 2 and the other has size 1. Thus

\[
J = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]

We want to set up vector subspaces as in Lemma 5.24 so that $K_{j+1} = K_j \oplus U_j \oplus W_j$ and $U_j = A(U_{j+1} \oplus W_{j+1})$ for $0 \leq j \leq 2$. Since $K_3 = K_2$, the equations begin with $K_2 = \cdots$ and are

\[
K_2 = K_1 \oplus 0 \oplus W_1, \qquad U_0 = A(0 \oplus W_1), \qquad K_1 = K_0 \oplus U_0 \oplus W_0.
\]

Here $K_2 = \mathbb{K}^3$ and $K_1$ is the subspace of all $X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$ such that $AX = 0$. The space $W_1$ is to satisfy $K_2 = K_1 \oplus W_1$, and we see that $W_1$ is 1-dimensional. Let $\{v^{(1)}\}$ be a basis of the 1-dimensional vector subspace $W_1$. Then $U_0$ is 1-dimensional with basis $\{Av^{(1)}\}$. The subspace $K_1$ is 2-dimensional and contains $U_0$. The space $W_0$ is to satisfy $K_1 = U_0 \oplus W_0$, and we see that $W_0$ is 1-dimensional. Let $\{v^{(0)}\}$ be a basis of $W_0$. Then the respective columns of $C$ may be taken to be

\[
Av^{(1)}, \qquad v^{(1)}, \qquad v^{(0)}.
\]

Let us compute these vectors.


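If we extend a basis of $K_1$ to a basis of $K_2$, then $W_1$ may be taken to be the linear span of the added vector. To obtain a basis of $K_1$, we compute that the reduced row-echelon form of $A$ is

\[
\begin{pmatrix} 1 & -1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\]

and the resulting system consists of the single equation $x_1 - x_2 = 0$. Thus $x_1 = x_2$, and

\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = x_2 \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + x_3 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
\]

The coefficients of $x_2$ and $x_3$ on the right side form a basis of $K_1$, and we are to choose a vector that is not a linear combination of these. Thus we can take $v^{(1)} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$ as the basis vector of $W_1$. Then $U_0 = A(W_1)$ has

\[
Av^{(1)} = A \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \\ -1 \end{pmatrix}
\]

as a basis, and the basis of $W_0$ may be taken as any vector in $K_1$ but not $U_0$. We can take this basis to consist of $v^{(0)} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$.

Lining up our three basis vectors as the columns of $C$ gives us

\[
C = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 0 \\ -1 & 0 & 1 \end{pmatrix}.
\]

Computation gives

\[
C^{-1} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & -1 & 0 \\ 0 & -1 & 1 \end{pmatrix},
\]

and we readily check that $C^{-1}AC = J$.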
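The final check in Example 1 is a pair of matrix multiplications, and a short Python sketch can carry it out. The matrices are copied from the example; the helper `matmul` is a hypothetical name introduced here, and plain integer lists suffice because every entry is an integer.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


A = [[-1, 1, 0], [-1, 1, 0], [-1, 1, 0]]
C = [[-1, 1, 0], [-1, 0, 0], [-1, 0, 1]]
C_inv = [[0, -1, 0], [1, -1, 0], [0, -1, 1]]
J = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# C_inv really is the inverse of C, and conjugating A by C gives J.
print(matmul(C_inv, C) == identity)          # True
print(matmul(matmul(C_inv, A), C) == J)      # True
```

An equivalent check that avoids computing $C^{-1}$ is to verify $AC = CJ$ together with the invertibility of $C$.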

EXAMPLE 2. We continue with $A$ and $J$ as in Example 1, but we compute the columns of $C$ without directly following the proof of Theorem 5.21. The method starts from the fact that each Jordan block corresponds to a 1-dimensional space of eigenvectors, and then we backtrack to find vectors corresponding to the other columns. For this particular $A$, we know that the three columns of $C$ are to be of the form $v_1 = Av^{(1)}$, $v_2 = v^{(1)}$, and $v_3 = v^{(0)}$. The vectors $v_1$ and $v_3$ together span the 0 eigenspace of $A$. We find all the 0 eigenvectors, writing them as a two-parameter family. This eigenspace is just $K_1 = \ker A$, and we found in Example 1 that

\[
K_1 = \left\{ \begin{pmatrix} x_2 \\ x_2 \\ x_3 \end{pmatrix} \right\}.
\]

One of these vectors is to be $v_1$, and it has to equal $Av_2$. Thus we solve $Av_2 = \begin{pmatrix} x_2 \\ x_2 \\ x_3 \end{pmatrix}$. Applying the solution procedure yields the augmented matrix

\[
\left( \begin{array}{ccc|c} 1 & -1 & 0 & -x_2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_3 - x_2 \end{array} \right).
\]

This system has no solutions unless $x_3 - x_2 = 0$. If we take $x_2 = x_3 = -1$, then we obtain the same first two columns of $C$ as in Example 1, and any vector in $K_1$ independent of $\begin{pmatrix} -1 \\ -1 \\ -1 \end{pmatrix}$ may be taken as the third column.
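The solvability condition $x_3 - x_2 = 0$ in Example 2 can also be seen by comparing ranks: a system $Ax = b$ has a solution exactly when $A$ and the augmented matrix $(A \mid b)$ have the same rank. The Python sketch below applies that standard criterion to the matrix of Examples 1 and 2. The names `rank` and `is_consistent` are hypothetical helpers, and exact rational arithmetic is assumed.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix over the rationals, by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def is_consistent(a, b):
    """True when a x = b has a solution, i.e. rank(a) == rank([a | b])."""
    augmented = [list(row) + [rhs] for row, rhs in zip(a, b)]
    return rank(a) == rank(augmented)


A = [[-1, 1, 0], [-1, 1, 0], [-1, 1, 0]]
print(is_consistent(A, [-1, -1, -1]))   # True:  x2 = x3 = -1
print(is_consistent(A, [1, 1, 0]))      # False: x2 = 1, x3 = 0
```

Only eigenvectors that pass this test can serve as the first column of a Jordan block of size 2, which is why the choice of $v_1$ is constrained while the choice of $v_3$ is not.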


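EXAMPLE 3. Let

\[
A = \begin{pmatrix} 2 & 1 & 0 \\ -1 & 4 & 0 \\ -1 & 2 & 2 \end{pmatrix}.
\]

Direct calculation shows that the characteristic polynomial is $\det(\lambda I - A) = \lambda^3 - 8\lambda^2 + 21\lambda - 18 = (\lambda - 2)(\lambda - 3)^2$. The possibilities for $J$ are therefore

\[
\begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{pmatrix}
\qquad \text{and} \qquad
\begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{pmatrix};
\]

the first one will be correct if the dimension of the eigenspace for the eigenvalue 3 is 2, and the second one will be correct if that dimension is 1. The third column of $C$ corresponds to an eigenvector for the eigenvalue 2, hence to a nonzero solution of $(A - 2I)v = 0$. The solutions are $v = k \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$, and we can therefore use $\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$.

For the first two columns of $C$, we have to find $\ker(A - 3I)$ no matter which of the methods we use, the one in Example 1 or the one in Example 2. Solving the system of equations, we obtain all vectors in the space $\left\{ z \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}$. The dimension of the space is 1, and the second possibility for the Jordan form is the correct one.

Following the method of Example 1 to find the columns of $C$ means that we pick a basis of this kernel and extend it to a basis of $\ker(A - 3I)^2$. A basis of $\ker(A - 3I)$ consists of the vector $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$. The matrix $(A - 3I)^2$ is

\[
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & -1 & 1 \end{pmatrix},
\]

and the solution procedure leads to the formula

\[
\begin{pmatrix} a \\ b \\ c \end{pmatrix} = a \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + c \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}
\]

for its kernel. The vector $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ arises from $a = 1$ and $c = 1$. We are to make an independent choice, say $a = 1$ and $c = 0$. Then the second basis vector to use is $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$. This becomes the second column of $C$, and the first column then has to be

\[
(A - 3I) \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \\ -1 \end{pmatrix}.
\]

The result is that

\[
C = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 0 \\ -1 & 0 & 1 \end{pmatrix}.
\]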

Following the method of Example 2 for this example means that we retain the entire kernel of $A - 3I$, namely all vectors $v_1 = z \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$, as candidates for the first column of $C$. The second column is to satisfy $(A - 3I)v_2 = v_1$. Solving leads to

\[
v_2 = z \begin{pmatrix} -1 \\ 0 \\ 0 \end{pmatrix} + c \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.
\]

In contrast to Example 2, there is no potential contradictory equation. So we choose $z$ and then $c$. If we take $z = 1$ and $c = 0$, we find that the first two columns of $C$ are to be $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ 0 \\ 0 \end{pmatrix}$. Then

\[
C = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 1 \end{pmatrix}.
\]
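Both methods in Example 3 should produce a matrix $C$ with $C^{-1}AC = J$, even though the two matrices $C$ differ. The Python sketch below checks this without inverting anything: it verifies $AC = CJ$ and that $\det C \neq 0$. The helper names `matmul` and `det3` are hypothetical, and the matrices are copied from the example.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def det3(m):
    """Determinant of a 3-by-3 matrix by cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[2, 1, 0], [-1, 4, 0], [-1, 2, 2]]
J = [[3, 1, 0], [0, 3, 0], [0, 0, 2]]
C_method_1 = [[-1, 1, 0], [-1, 0, 0], [-1, 0, 1]]   # method of Example 1
C_method_2 = [[1, -1, 0], [1, 0, 0], [1, 0, 1]]     # method of Example 2

for C in (C_method_1, C_method_2):
    # A C = C J with C invertible is equivalent to C^-1 A C = J.
    print(matmul(A, C) == matmul(C, J), det3(C) != 0)   # True True
```

The two answers differ by the sign of the first two columns, which illustrates that $C$ is far from unique even when $J$ is fixed.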

For any example in which we can factor the characteristic polynomial exactly, either of the two methods used above will work. The ﬁrst method appears complicated but uses numbers throughout; it tends to be more efﬁcient with large examples involving high-degree minimal polynomials. The second method appears direct but requires solving equations with symbolic variables; it tends to be more efﬁcient for relatively simple examples.

8. Problems

In Problems 1–25 all vector spaces are assumed finite-dimensional, and all linear transformations are assumed defined from such spaces into themselves. Unless information is given to the contrary, the underlying field $\mathbb{K}$ is assumed arbitrary.

1. Let $M_{mn}(\mathbb{C})$ be the vector space of $m$-by-$n$ complex matrices. The group $GL(m, \mathbb{C}) \times GL(n, \mathbb{C})$ acts on $M_{mn}(\mathbb{C})$ by $((g, h), x) \mapsto gxh^{-1}$, where $gxh^{-1}$ denotes a matrix product.
(a) Verify that this is indeed a group action.
(b) Prove that two members of $M_{mn}(\mathbb{C})$ lie in the same orbit if and only if they have the same rank.
(c) For each possible rank, give an example of a member of $M_{mn}(\mathbb{C})$ with that rank.

2. Prove that a member of $M_n(\mathbb{K})$ is invertible if and only if the constant term of its minimal polynomial is different from 0.

3. Suppose that $L : V \to V$ is a linear map with minimal polynomial $M(\lambda) = P_1(\lambda)^{l_1} \cdots P_k(\lambda)^{l_k}$ and that $V = U \oplus W$ with $U$ and $W$ both invariant under $L$. Let $P_1(\lambda)^{r_1} \cdots P_k(\lambda)^{r_k}$ and $P_1(\lambda)^{s_1} \cdots P_k(\lambda)^{s_k}$ be the respective minimal polynomials of $L|_U$ and $L|_W$. Prove that $l_j = \max(r_j, s_j)$ for $1 \leq j \leq k$.

4. (a) If $A$ and $B$ are in $M_n(\mathbb{K})$, if $P(\lambda)$ is a polynomial such that $P(AB) = 0$, and if $Q(\lambda) = \lambda P(\lambda)$, prove that $Q(BA) = 0$.
(b) What can be inferred from (a) about the relationship between the minimal polynomials of $AB$ and of $BA$?

5. (a) Suppose that $D$ and $D'$ are in $M_n(\mathbb{K})$, are similar to diagonal matrices, and have $DD' = D'D$. Prove that there is a matrix $C$ such that $C^{-1}DC$ and $C^{-1}D'C$ are both diagonal.
(b) Give an example of two nilpotent matrices $N$ and $N'$ in $M_n(\mathbb{K})$ with $NN' = N'N$ such that there is no $C$ with $C^{-1}NC$ and $C^{-1}N'C$ both in Jordan form.

6. (a) Prove that the matrix of a projection is similar to a diagonal matrix. What are the eigenvalues?
(b) Give a necessary and sufficient condition for two projections involving the same $V$ to be given by similar matrices.

7. Let $E : V \to V$ and $F : V \to V$ be projections. Prove that $E$ and $F$ have
(a) the same image if and only if $EF = F$ and $FE = E$,
(b) the same kernel if and only if $EF = E$ and $FE = F$.

8. Let $E : V \to V$ and $F : V \to V$ be projections. Prove that $EF$ is a projection if $EF = FE$. Prove or disprove a converse.

9. An involution on $V$ is a linear map $U : V \to V$ such that $U^2 = I$. Show that the equation $U = 2E - I$ establishes a one-one correspondence between all projections $E$ and all involutions $U$.

10. Let $L : V \to V$ be linear. Prove that there exist vector subspaces $U$ and $W$ of $V$ such that
(i) $V = U \oplus W$,
(ii) $L(U) \subseteq U$ and $L(W) \subseteq W$,
(iii) $L$ is nilpotent on $U$,
(iv) $L$ is nonsingular on $W$.

11. Prove that the vector subspaces $U$ and $W$ in the previous problem are uniquely characterized by (i) through (iv).

12. Let $L : V \to V$ be a linear map, and suppose that its minimal polynomial is of the form $M(\lambda) = \prod_{j=1}^{k} (\lambda - \lambda_j)^{l_j}$ with the $\lambda_j$ distinct. Let $V = U_1 \oplus \cdots \oplus U_k$ be the corresponding primary decomposition of $V$, and define $D : V \to V$ by $D = \lambda_1 E_1 + \cdots + \lambda_k E_k$, where $E_1, \ldots, E_k$ are the projections associated with the primary decomposition. Finally put $N = L - D$. Prove that
(a) $L = D + N$,
(b) $D$ has a basis of eigenvectors,
(c) $N$ is nilpotent,
(d) $DN = ND$,
(e) $D$ and $N$ are given by polynomials in $L$,
(f) the minimal polynomial of $D$ is $\prod_{j=1}^{k} (\lambda - \lambda_j)$,
(g) the minimal polynomial of $N$ is $\lambda^{\max l_j}$.


13. In the previous problem with $L$ given, prove that the decomposition $L = D + N$ is uniquely determined by properties (a) through (d).

14. Let $A$ be a nilpotent square matrix. Prove that $\det(I + A) = 1$.

15. For the complex matrix $A = \begin{pmatrix} -5 & 9 \\ -4 & 7 \end{pmatrix}$, find a Jordan-form matrix $J$ and an invertible matrix $C$ such that $J = C^{-1}AC$.

16. For the complex matrix $A = \begin{pmatrix} 4 & 1 & -1 \\ -8 & -2 & 2 \\ 8 & 2 & -2 \end{pmatrix}$, find a Jordan-form matrix $J$ and an invertible matrix $C$ such that $J = C^{-1}AC$.

17. For the upper triangular matrix

\[
A = \begin{pmatrix}
2 & 0 & 0 & 1 & 1 & 0 & 0 \\
 & 2 & 0 & 0 & 0 & 1 & 1 \\
 & & 2 & 0 & 1 & 0 & 0 \\
 & & & 2 & 0 & 1 & 2 \\
 & & & & 2 & 1 & 1 \\
 & & & & & 2 & 1 \\
 & & & & & & 3
\end{pmatrix},
\]

find a Jordan-form matrix $J$ and an invertible matrix $C$ such that $J = C^{-1}AC$.

18. (a) For $M_3(\mathbb{C})$, prove that any two matrices with the same minimal polynomial and the same characteristic polynomial must be similar.
(b) Is the same thing true for $M_4(\mathbb{C})$?

19. Suppose that $\mathbb{K}$ has characteristic 0 and that $J$ is a Jordan block with nonzero eigenvalue and with size $> 1$. Prove that there is no $n \geq 1$ such that $J^n$ is diagonal.

20. Classify up to similarity all members $A$ of $M_n(\mathbb{C})$ with $A^n = I$.

21. How many similarity classes are there of 3-by-3 matrices $A$ with entries in $\mathbb{C}$ such that $A^3 = A$? Explain.

22. Let $n \geq 2$, and let $N$ be a member of $M_n(\mathbb{K})$ with $N^n = 0$ but $N^{n-1} \neq 0$. Prove that there is no $n$-by-$n$ matrix $A$ with $A^2 = N$.

23. For a Jordan block $J$, prove that $J^t$ is similar to $J$.

24. Prove that if $A$ is in $M_n(\mathbb{C})$, then $A^t$ is similar to $A$.

25. Let $N$ be the 2-by-2 matrix $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, and let $A$ and $B$ be the 4-by-4 matrices $A = \begin{pmatrix} N & 0 \\ 0 & N \end{pmatrix}$ and $B = \begin{pmatrix} N & N \\ 0 & N \end{pmatrix}$. Prove that $A$ and $B$ are similar.

Problems 26–31 concern cyclic vectors. Fix a linear map $L : V \to V$ from a finite-dimensional vector space $V$ to itself. For $v$ in $V$, let $\mathcal{P}(v)$ denote the set of all vectors $Q(L)(v)$ in $V$ for $Q(\lambda)$ in $\mathbb{K}[\lambda]$; $\mathcal{P}(v)$ is a vector subspace and is invariant under $L$. If $U$ is an invariant subspace of $V$, we say that $U$ is a cyclic subspace if there is some


$v$ in $U$ such that $\mathcal{P}(v) = U$; in this case, $v$ is said to be a cyclic vector for $U$, and $U$ is called the cyclic subspace generated by $v$. For $v$ in $V$, let $I_v$ be the ideal of all polynomials $Q(\lambda)$ in $\mathbb{K}[\lambda]$ with $Q(L)v = 0$. The monic generator of $v$ is the unique monic polynomial $M_v(\lambda)$ such that $M_v(\lambda)$ divides every member of $I_v$.

26. For $v \in V$, explain why $I_v$ is nonzero and why $M_v(\lambda)$ therefore exists.

27. For $v \in V$, prove that
(a) the degree of the monic generator $M_v(\lambda)$ equals the dimension of the cyclic subspace $\mathcal{P}(v)$,
(b) the vectors $v, L(v), L^2(v), \ldots, L^{\deg M_v - 1}(v)$ form a vector-space basis of $\mathcal{P}(v)$,
(c) the minimal and characteristic polynomials of $L|_{\mathcal{P}(v)}$ are both equal to $M_v(\lambda)$.

28. Suppose that $M_v(\lambda) = c_0 + c_1\lambda + \cdots + c_{d-1}\lambda^{d-1} + \lambda^d$. Prove that the matrix of $L|_{\mathcal{P}(v)}$ in a suitable ordered basis is

\[
\begin{pmatrix}
-c_{d-1} & 1 & 0 & \cdots & 0 & 0 & 0 \\
-c_{d-2} & 0 & 1 & \cdots & 0 & 0 & 0 \\
-c_{d-3} & 0 & 0 & \ddots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \ddots & \vdots & \vdots \\
-c_2 & 0 & 0 & \cdots & 0 & 1 & 0 \\
-c_1 & 0 & 0 & \cdots & 0 & 0 & 1 \\
-c_0 & 0 & 0 & \cdots & 0 & 0 & 0
\end{pmatrix}.
\]

29. Suppose that $v$ is in $V$, that $M_v(\lambda)$ is a power of a prime polynomial $P(\lambda)$, and that $Q(\lambda)$ is a nonzero polynomial with $\deg Q(\lambda) < \deg P(\lambda)$. Prove that $\mathcal{P}(Q(L)(v)) = \mathcal{P}(v)$.

30. Let $P(\lambda)$ be a prime polynomial.
(a) Prove by induction on $\dim V$ that if the minimal polynomial of $L$ is $P(\lambda)$, then the characteristic polynomial of $L$ is a power of $P(\lambda)$.
(b) Prove by induction on $l$ that if the minimal polynomial of $L$ is $P(\lambda)^l$, then the characteristic polynomial of $L$ is a power of $P(\lambda)$.
(c) Conclude that if the minimal polynomial of $L$ is a power of $P(\lambda)$, then $\deg P(\lambda)$ divides $\dim V$.

31. (a) Prove that every prime factor of the characteristic polynomial of $L$ divides the minimal polynomial of $L$.
(b) In Problem 12 prove that $D$ and $L$ have the same characteristic polynomial.

Problems 32–40 continue the study of cyclic vectors begun in Problems 26–31, using the same notation. The goal is to obtain a canonical-form theorem like Theorem 5.20 for $L$ but with no assumption on $\mathbb{K}$ or $P(\lambda)$, namely that each primary subspace for $L$ is the direct sum of cyclic subspaces and the resulting decomposition is unique up to isomorphism. This result and the Fundamental Theorem of Finitely Generated


Abelian Groups (Theorem 4.56) will be seen in Chapter VIII to be special cases of a single more general theorem. Still another canonical form for matrices and linear maps is an analog of the result with elementary divisors mentioned in the remarks with Theorem 4.56 and is valid here; it is called rational canonical form, but we shall not pursue it until the problems at the end of Chapter VIII.

The proof in Problems 32–40 uses ideas similar to those used for Theorem 5.21 except that the hypothesis will now be that the minimal polynomial of $L$ is $P(\lambda)^l$ with $P(\lambda)$ prime, rather than just $\lambda^l$. Define $K_j = \ker(P(L)^j)$ for $j \geq 0$, so that $K_0 = 0$, $K_j \subseteq K_{j+1}$ for all $j$, $K_l = V$, and each $K_j$ is an invariant subspace under $L$. Define $d = \deg P(\lambda)$.

32. Suppose $j \geq 1$, and suppose $S_j$ is any vector subspace of $V$ such that $K_{j+1} = K_j \oplus S_j$. Prove that $P(L)$ is one-one from $S_j$ into $K_j$ and $P(L)(S_j) \cap K_{j-1} = 0$.

33. Define $U_l = W_l = 0$. For $0 \leq j \leq l - 1$, prove that there exist vector subspaces $U_j$ and $W_j$ of $K_{j+1}$ such that

\[
K_{j+1} = K_j \oplus U_j \oplus W_j, \qquad U_j = P(L)(U_{j+1} \oplus W_{j+1}), \qquad P(L) : U_{j+1} \oplus W_{j+1} \to U_j \ \text{is one-one}.
\]

34. Prove that the vector subspaces of the previous problem satisfy

\[
V = U_0 \oplus W_0 \oplus U_1 \oplus W_1 \oplus \cdots \oplus U_{l-1} \oplus W_{l-1}.
\]

35. For $v \neq 0$ in $W_j$, prove that the set of all $L^r P(L)^s(v)$ with $0 \leq r \leq d - 1$ and $0 \leq s \leq j$ is a vector-space basis of $\mathcal{P}(v)$.

36. Going back over the construction in Problem 33, prove that each $W_j$ can be chosen to have a basis consisting of vectors $L^r(v_i^{(j)})$ for $1 \leq i \leq (\dim W_j)/d$ and $0 \leq r \leq d - 1$.

37. Let the index $i$ used in the previous problem with $j$ be denoted by $i_j$ for $1 \leq i_j \leq (\dim W_j)/d$. Prove that a vector-space basis of $U_j \oplus W_j$ consists of all $L^r P(L)^k(v_{i_{j+k}}^{(j+k)})$ for $0 \leq r \leq d - 1$, $k \geq 0$, $1 \leq i_{j+k} \leq (\dim W_{j+k})/d$.

38. Prove that $V$ is the direct sum of cyclic subspaces under $L$. Prove specifically that each $v_{i_j}^{(j)}$ generates a cyclic subspace and that the sum of all these vector subspaces, with $0 \leq j \leq l$ and $1 \leq i_j \leq (\dim W_j)/d$, is a direct sum and equals $V$.

39. In the decomposition of the previous problem, each cyclic subspace generated by some $v_{i_j}^{(j)}$ has minimal polynomial $P(\lambda)^{j+1}$. Prove that

\[
\#\left\{ \begin{array}{l} \text{direct summands with minimal polynomial} \\ P(\lambda)^k \text{ for some } k \geq j + 1 \end{array} \right\} = (\dim K_{j+1} - \dim K_j)/d.
\]


40. Prove that the formula of the previous problem persists for any decomposition of $V$ as the direct sum of cyclic subspaces, and conclude from Problem 28 that the decomposition into cyclic subspaces is unique up to isomorphism.

Problems 41–46 concern systems of ordinary differential equations with constant coefficients. The underlying field is taken to be $\mathbb{C}$, and differential calculus is used. For $A$ in $M_n(\mathbb{C})$ and $t$ in $\mathbb{R}$, define $e^{tA} = \sum_{k=0}^{\infty} \frac{t^k A^k}{k!}$. Take for granted that the series defining $e^{tA}$ converges entry by entry, that the series may be differentiated term by term to yield $\frac{d}{dt}(e^{tA}) = Ae^{tA} = e^{tA}A$, and that $e^{sA + tB} = e^{sA}e^{tB}$ if $A$ and $B$ commute.

41. Calculate $e^{tA}$ for $A$ equal to
(a) $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$,
(b) $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$,
(c) the diagonal matrix with diagonal entries $d_1, \ldots, d_n$.

42. (a) Calculate $e^{tJ}$ when $J$ is a nilpotent $n$-by-$n$ Jordan block.
(b) Use (a) to calculate $e^{tJ}$ when $J$ is a general $n$-by-$n$ Jordan block.

43. Let $y_1, \ldots, y_n$ be unknown functions from $\mathbb{R}$ to $\mathbb{C}$, and let $y$ be the vector-valued function formed by arranging $y_1, \ldots, y_n$ in a column. Suppose that $A$ is in $M_n(\mathbb{C})$. Prove for each vector $v \in \mathbb{C}^n$ that $y(t) = e^{tA}v$ is a solution of the system of differential equations $\frac{dy}{dt} = Ay(t)$.

44. With notation as in the previous problem and with $v$ fixed in $\mathbb{C}^n$, use $e^{-tA}y(t)$ to show, for each open interval of $t$'s containing 0, that the only solution of $\frac{dy}{dt} = Ay(t)$ on that interval such that $y(0) = v$ is $y(t) = e^{tA}v$.

45. For $C$ invertible, prove that $e^{tC^{-1}AC} = C^{-1}e^{tA}C$, and deduce a relationship between solutions of $\frac{dy}{dt} = Ay(t)$ and solutions of $\frac{dy}{dt} = (C^{-1}AC)y(t)$.

46. Let $A = \begin{pmatrix} 2 & 1 & 0 \\ -1 & 4 & 0 \\ -1 & 2 & 2 \end{pmatrix}$. Taking into account Example 3 in Section 7 and Problems 42 through 45 above, find all solutions for $t$ in $(-1, 1)$ to the system $\frac{dy}{dt} = Ay(t)$ such that $y(0) = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$.

CHAPTER VI Multilinear Algebra

Abstract. This chapter studies, in the setting of vector spaces over a ﬁeld, the basics concerning multilinear functions, tensor products, spaces of linear functions, and algebras related to tensor products. Sections 1–5 concern special properties of bilinear forms, all vector spaces being assumed to be ﬁnite-dimensional. Section 1 associates a matrix to each bilinear form in the presence of an ordered basis, and the section shows the effect on the matrix of changing the ordered basis. It then addresses the extent to which the notion of “orthogonal complement” in the theory of inner-product spaces applies to nondegenerate bilinear forms. Sections 2–3 treat symmetric and alternating bilinear forms, producing bases for which the matrix of such a form is particularly simple. Section 4 treats a related subject, Hermitian forms when the ﬁeld is the complex numbers. Section 5 discusses the groups that leave some particular bilinear and Hermitian forms invariant. Section 6 introduces the tensor product of two vector spaces, working with it in a way that does not depend on a choice of basis. The tensor product has a universal mapping property—that bilinear functions on the product of the two vector spaces extend uniquely to linear functions on the tensor product. The tensor product turns out to be a vector space whose dual is the vector space of all bilinear forms. One particular application is that tensor products provide a basis-independent way of extending scalars for a vector space from a ﬁeld to a larger ﬁeld. The section includes a number of results about the vector space of linear mappings from one vector space to another that go hand in hand with results about tensor products. These have convenient formulations in the language of category theory as “natural isomorphisms.” Section 7 begins with the tensor product of three and then n vector spaces, carefully considering the universal mapping property and the question of associativity. 
The section defines an algebra over a field as a vector space with a bilinear multiplication, not necessarily associative. If $E$ is a vector space, the tensor algebra $T(E)$ of $E$ is the direct sum over $n \ge 0$ of the $n$-fold tensor product of $E$ with itself. This is an associative algebra with a universal mapping property relative to any linear mapping of $E$ into an associative algebra $A$ with identity: the linear map extends to an algebra homomorphism of $T(E)$ into $A$ carrying 1 into 1. Sections 8–9 define the symmetric and exterior algebras of a vector space $E$. The symmetric algebra $S(E)$ is a quotient of $T(E)$ with the following universal mapping property: any linear mapping of $E$ into a commutative associative algebra $A$ with identity extends to an algebra homomorphism of $S(E)$ into $A$ carrying 1 into 1. The symmetric algebra is commutative. Similarly the exterior algebra $\bigwedge(E)$ is a quotient of $T(E)$ with this universal mapping property: any linear mapping $l$ of $E$ into an associative algebra $A$ with identity such that $l(v)^2 = 0$ for all $v \in E$ extends to an algebra homomorphism of $\bigwedge(E)$ into $A$ carrying 1 into 1. The problems at the end of the chapter introduce some other algebras that are of importance in applications, and the problems relate some of these algebras to tensor, symmetric, and exterior algebras. Among the objects studied are Lie algebras, universal enveloping algebras, Clifford algebras, Weyl algebras, Jordan algebras, and the division algebra of octonions.

VI. Multilinear Algebra


1. Bilinear Forms and Matrices

This chapter will work with vector spaces over a common field of “scalars,” which will be called $K$. In Section 6 a field containing $K$ as a subfield will briefly play a role, and that will be called $L$. If $V$ is a vector space over $K$, a bilinear form on $V$ is a function from $V \times V$ into $K$ that is linear in each variable when the other variable is held fixed.

EXAMPLES.
(1) For general $K$, take $V = K^n$. Any matrix $A$ in $M_n(K)$ determines a bilinear form by the rule $\langle v, w\rangle = v^t A w$.
(2) For $K = R$, let $V$ be an inner-product space, in the sense of Chapter III, with inner product $(\,\cdot\,,\cdot\,)$. Then $(\,\cdot\,,\cdot\,)$ is a bilinear form on $V$.

Multilinear functionals on a vector space of row vectors, also called $k$-linear functionals or $k$-multilinear functionals, were defined in the course of working with determinants in Section II.7, and that definition transparently extends to general vector spaces. A bilinear form on a general vector space is then just a 2-linear functional. From the point of view of definitions, the words “functional” and “form” are interchangeable here, but the word “form” is more common in the bilinear case because of a certain homogeneity that it suggests and that comes closer to the surface in Corollary 6.12 and in Section 7.

For the remainder of this section, all vector spaces will be finite-dimensional. Bilinear forms, i.e., 2-linear functionals, are of special interest relative to $k$-linear functionals for general $k$ because of their relationships with matrices and linear mappings. To begin with, each bilinear form, in the presence of an ordered basis, is given by a matrix. In more detail let $V$ be a finite-dimensional vector space, and let $\langle\,\cdot\,,\cdot\,\rangle$ be a bilinear form on $V$. If an ordered basis $\Gamma = (v_1, \dots, v_n)$ of $V$ is specified, then the bilinear form determines the matrix $B$ with entries $B_{ij} = \langle v_i, v_j\rangle$. Conversely we can recover the bilinear form from $B$ as follows: Write $v = \sum_i a_i v_i$ and $w = \sum_j b_j v_j$. Then

$$\langle v, w\rangle = \Big\langle \sum_i a_i v_i,\ \sum_j b_j v_j \Big\rangle = \sum_{i,j} a_i \langle v_i, v_j\rangle b_j.$$

In other words, $\langle v, w\rangle = a^t B b$, where $a = \binom{v}{\Gamma}$ and $b = \binom{w}{\Gamma}$ in the notation of Section II.3. Therefore

$$\langle v, w\rangle = \binom{v}{\Gamma}^{t} B \binom{w}{\Gamma}.$$


Consequently we see that all bilinear forms on a finite-dimensional vector space reduce to Example 1 above, once we choose an ordered basis.

Let us examine the effect of a change of ordered basis. Suppose that $\Gamma = (v_1, \dots, v_n)$ and $\Delta = (w_1, \dots, w_n)$, and let $B$ and $C$ be the matrices of the bilinear form in these two ordered bases: $B_{ij} = \langle v_i, v_j\rangle$ and $C_{ij} = \langle w_i, w_j\rangle$. Let the two bases be related by $w_j = \sum_i a_{ij} v_i$, i.e., let $[a_{ij}] = \binom{I}{\Gamma\Delta}$. Then we have

$$C_{ij} = \langle w_i, w_j\rangle = \Big\langle \sum_k a_{ki} v_k,\ \sum_l a_{lj} v_l \Big\rangle = \sum_{k,l} a_{ki} a_{lj} \langle v_k, v_l\rangle = \sum_{k,l} a_{ki} B_{kl} a_{lj}.$$

Translating this formula into matrix form, we obtain the following proposition.

Proposition 6.1. Let $\langle\,\cdot\,,\cdot\,\rangle$ be a bilinear form on a finite-dimensional vector space $V$, let $\Gamma$ and $\Delta$ be ordered bases of $V$, and let $B$ and $C$ be the respective matrices of $\langle\,\cdot\,,\cdot\,\rangle$ relative to $\Gamma$ and $\Delta$. Then

$$C = \binom{I}{\Gamma\Delta}^{t} B \binom{I}{\Gamma\Delta}.$$

The qualitative conclusion about the matrices may be a little unexpected. It is not that they are similar but that they are related by $C = S^t B S$ for some nonsingular square matrix $S$. In particular, $B$ and $C$ need not have the same determinant.

Guided by the circle of ideas around the Riesz Representation Theorem for inner products (Theorem 3.12), let us examine what happens when we fix one of the variables of a bilinear form and work with the resulting linear map. Thus again let $\langle\,\cdot\,,\cdot\,\rangle$ be a bilinear form on $V$. For fixed $u$ in $V$, $v \mapsto \langle u, v\rangle$ is a linear functional on $V$, thus a member of the dual space $V'$ of $V$. If we write $L(u)$ for this linear functional, then $L$ is a function from $V$ to $V'$ satisfying $L(u)(v) = \langle u, v\rangle$. The formula for $L$ shows that $L$ is in fact a linear function. We define the left radical, lrad, of $\langle\,\cdot\,,\cdot\,\rangle$ to be the kernel of $L$; thus

$$\operatorname{lrad}\langle\,\cdot\,,\cdot\,\rangle = \{u \in V \mid \langle u, v\rangle = 0 \text{ for all } v \in V\}.$$

Similarly we let $R : V \to V'$ be the linear map $R(v)(u) = \langle u, v\rangle$, and we define the right radical, rrad, of $\langle\,\cdot\,,\cdot\,\rangle$ to be the kernel of $R$; thus

$$\operatorname{rrad}\langle\,\cdot\,,\cdot\,\rangle = \{v \in V \mid \langle u, v\rangle = 0 \text{ for all } u \in V\}.$$

EXAMPLE 1, CONTINUED. The vector space $V$ is the space $K^n$ of $n$-dimensional column vectors, the dual $V'$ is the space of $n$-dimensional row vectors, $A$ is


an $n$-by-$n$ matrix with entries in $K$, and $\langle\,\cdot\,,\cdot\,\rangle$ is given by $\langle u, v\rangle = u^t A v = L(u)(v) = R(v)(u)$ for $u$ and $v$ in $K^n$. Explicit formulas for $L$ and $R$ are given by

$$L(u) = u^t A = (A^t u)^t \qquad\text{and}\qquad R(v) = (Av)^t.$$

Thus

$$\operatorname{lrad}\langle\,\cdot\,,\cdot\,\rangle = \ker L = \text{null space}(A^t), \qquad \operatorname{rrad}\langle\,\cdot\,,\cdot\,\rangle = \ker R = \text{null space}(A).$$

Since $A$ is square and since the row rank and column rank of $A$ are equal, the dimensions of the null spaces of $A$ and $A^t$ are equal. Hence $\dim \operatorname{lrad}\langle\,\cdot\,,\cdot\,\rangle = \dim \operatorname{rrad}\langle\,\cdot\,,\cdot\,\rangle$. This equality of dimensions for the case of $K^n$ extends to general $V$, as is noted in the next proposition.

Proposition 6.2. If $\langle\,\cdot\,,\cdot\,\rangle$ is any bilinear form on a finite-dimensional vector space $V$, then

$$\dim \operatorname{lrad}\langle\,\cdot\,,\cdot\,\rangle = \dim \operatorname{rrad}\langle\,\cdot\,,\cdot\,\rangle.$$

PROOF. We saw above that computations with bilinear forms on $V$ reduce, once we choose an ordered basis for $V$, to computations with matrices, row vectors, and column vectors. Thus the argument just given in the continuation of Example 1 is completely general, and the proposition is proved.

A bilinear form $\langle\,\cdot\,,\cdot\,\rangle$ is said to be nondegenerate if its left radical is 0. In view of Proposition 6.2, it is equivalent to require that the right radical be 0. When the radicals are 0, the associated linear maps $L$ and $R$ from $V$ to $V'$ are one-one. Since $\dim V = \dim V'$, it follows that $L$ and $R$ are onto $V'$. Thus a nondegenerate bilinear form on $V$ sets up two canonical isomorphisms of $V$ with its dual $V'$.

For definiteness let us work with the linear mapping $L : V \to V'$ given by $L(u)(v) = \langle u, v\rangle$. If $U \subseteq V$ is a vector subspace, define

$$U^\perp = \{u \in V \mid \langle u, v\rangle = 0 \text{ for all } v \in U\}.$$

It is apparent from the definitions that

$$U \cap U^\perp = \operatorname{lrad}\big(\langle\,\cdot\,,\cdot\,\rangle\big|_{U\times U}\big).$$


In contrast to the special case that $K = R$ and the bilinear form is an inner product, $U \cap U^\perp$ may be nonzero even if $\langle\,\cdot\,,\cdot\,\rangle$ is nondegenerate. For example let $V = R^2$, define

$$\Big\langle \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \Big\rangle = x_1 y_1 - x_2 y_2,$$

and suppose that $U$ is the 1-dimensional vector subspace $U = \big\{ \begin{pmatrix} x_1 \\ x_1 \end{pmatrix} \big\}$. The matrix of the bilinear form in the standard ordered basis is $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$; since the matrix is nonsingular, the bilinear form is nondegenerate. Direct calculation shows that $U^\perp = \big\{ \begin{pmatrix} y_1 \\ y_1 \end{pmatrix} \big\} = U$, so that $U \cap U^\perp \ne 0$.

Nevertheless, in the nondegenerate case the dimensions of $U$ and $U^\perp$ behave as if $U^\perp$ were an orthogonal complement. The precise result is as follows.

Proposition 6.3. If $\langle\,\cdot\,,\cdot\,\rangle$ is a nondegenerate bilinear form on the finite-dimensional vector space $V$ and if $U$ is a vector subspace of $V$, then

$$\dim V = \dim U + \dim U^\perp.$$

PROOF. Define $\psi : V \to U'$ by $\psi(v)(u) = \langle v, u\rangle$ for $v \in V$ and $u \in U$. The definition of $U^\perp$ shows that $\ker \psi = U^\perp$. To see that $\operatorname{image} \psi = U'$, choose a vector subspace $U_1$ of $V$ with $V = U \oplus U_1$, let $u'$ be in $U'$, and define $v'$ in $V'$ by

$$v' = \begin{cases} u' & \text{on } U, \\ 0 & \text{on } U_1. \end{cases}$$

Since $\langle\,\cdot\,,\cdot\,\rangle$ is nondegenerate, the linear mapping $L : V \to V'$ is onto $V'$. Thus we can choose $v \in V$ with $L(v) = v'$. Then

$$\psi(v)(u) = \langle v, u\rangle = L(v)(u) = v'(u) = u'(u)$$

for all $u$ in $U$, and hence $\psi(v) = u'$. Therefore $\operatorname{image} \psi = U'$, and we conclude that

$$\dim V = \dim(\ker \psi) + \dim(\operatorname{image} \psi) = \dim U^\perp + \dim U' = \dim U^\perp + \dim U.$$

Corollary 6.4. If $\langle\,\cdot\,,\cdot\,\rangle$ is a nondegenerate bilinear form on the finite-dimensional vector space $V$ and if $U$ is a vector subspace of $V$, then $V = U \oplus U^\perp$ if and only if $\langle\,\cdot\,,\cdot\,\rangle\big|_{U\times U}$ is nondegenerate.

PROOF. Corollary 2.29 and Proposition 6.3 together give

$$\dim(U + U^\perp) + \dim(U \cap U^\perp) = \dim U + \dim U^\perp = \dim V.$$

Thus $U + U^\perp = V$ if and only if $U \cap U^\perp = 0$, if and only if $\langle\,\cdot\,,\cdot\,\rangle\big|_{U\times U}$ is nondegenerate. The result therefore follows from Proposition 2.30.
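The matrix statements of this section can be checked on a small case. The following Python sketch is an illustration only and is not part of the text: the helper names `transpose`, `matmul`, `form`, and `det2` and the second basis $w_1 = (2,0)$, $w_2 = (1,1)$ are assumptions chosen for the example. It computes the matrix of the form $x_1y_1 - x_2y_2$ in the second basis both from the definition and from the formula $C = S^tBS$ of Proposition 6.1, compares determinants, and confirms that the vector $(1,1)$ is orthogonal to itself.

```python
from fractions import Fraction as F

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def form(B, v, w):
    """Evaluate <v, w> = v^t B w for coordinate lists v and w."""
    return sum(v[i] * B[i][j] * w[j] for i in range(len(v)) for j in range(len(w)))

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Matrix of <x, y> = x1*y1 - x2*y2 in the standard ordered basis of R^2.
B = [[F(1), F(0)], [F(0), F(-1)]]

# Hypothetical second basis w1 = (2, 0), w2 = (1, 1); these are the columns of S.
S = [[F(2), F(1)], [F(0), F(1)]]
w = transpose(S)  # w[j] is the j-th vector of the second basis

# Matrix in the second basis, entry by entry from the definition ...
C_direct = [[form(B, w[i], w[j]) for j in range(2)] for i in range(2)]
# ... and from the change-of-basis formula C = S^t B S.
C_formula = matmul(matmul(transpose(S), B), S)

u = [F(1), F(1)]  # spans the subspace U of the example above
print(C_direct == C_formula)         # the two computations agree
print(det2(B), det2(C_formula))      # the determinants differ by det(S)^2
print(form(B, u, u))                 # u is orthogonal to itself
```

The determinants differ by the factor $(\det S)^2$, which is consistent with the remark that $B$ and $C$ need not have the same determinant.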


2. Symmetric Bilinear Forms

We continue with the setting in which $K$ is a field and all vector spaces of interest are defined over $K$ and are finite-dimensional.

A bilinear form $\langle\,\cdot\,,\cdot\,\rangle$ on $V$ is said to be symmetric if $\langle u, v\rangle = \langle v, u\rangle$ for all $u$ and $v$ in $V$, skew-symmetric if $\langle u, v\rangle = -\langle v, u\rangle$ for all $u$ and $v$ in $V$, and alternating if $\langle u, u\rangle = 0$ for all $u$ in $V$.

“Alternating” always implies “skew-symmetric.” In fact, if $\langle\,\cdot\,,\cdot\,\rangle$ is alternating, then $0 = \langle u + v, u + v\rangle = \langle u, u\rangle + \langle u, v\rangle + \langle v, u\rangle + \langle v, v\rangle = \langle u, v\rangle + \langle v, u\rangle$; thus $\langle\,\cdot\,,\cdot\,\rangle$ is skew-symmetric. If $K$ has characteristic different from 2, then the converse is valid: “skew-symmetric” implies “alternating.” In fact, if $\langle\,\cdot\,,\cdot\,\rangle$ is skew-symmetric, then $\langle u, u\rangle = -\langle u, u\rangle$ and hence $2\langle u, u\rangle = 0$; thus $\langle u, u\rangle = 0$, and $\langle\,\cdot\,,\cdot\,\rangle$ is alternating.

Let us examine further the effect of the characteristic of $K$. If, on the one hand, $K$ has characteristic different from 2, the most general bilinear form $\langle\,\cdot\,,\cdot\,\rangle$ is the sum of the symmetric form $\langle\,\cdot\,,\cdot\,\rangle_s$ and the alternating form $\langle\,\cdot\,,\cdot\,\rangle_a$ given by

$$\langle u, v\rangle_s = \tfrac12\big(\langle u, v\rangle + \langle v, u\rangle\big), \qquad \langle u, v\rangle_a = \tfrac12\big(\langle u, v\rangle - \langle v, u\rangle\big).$$

In this sense the symmetric and alternating bilinear forms are the extreme cases among all bilinear forms, and we shall study the two cases separately.

If, on the other hand, $K$ has characteristic 2, then “alternating” implies “skew-symmetric” but not conversely. “Alternating” is a serious restriction, and we shall be able to deal with it. However, “symmetric” and “skew-symmetric” are equivalent since $1 = -1$, and thus neither condition is much of a restriction; we shall not attempt to say anything insightful in these cases.

In this section we study symmetric bilinear forms, obtaining results when $K$ has characteristic different from 2. From the symmetry it is apparent that the left and right radicals of a symmetric bilinear form are the same, and we call this vector subspace the radical of the form. By way of an example, here is a continuation of Example 1 from the previous section.

EXAMPLE. Let $V = K^n$, let $A$ be a symmetric $n$-by-$n$ matrix (i.e., one with $A^t = A$), and let $\langle u, v\rangle = u^t A v$. The computation

$$\langle v, u\rangle = v^t A u = (v^t A u)^t = u^t A^t v = u^t A v = \langle u, v\rangle$$

shows that the bilinear form $\langle\,\cdot\,,\cdot\,\rangle$ is symmetric; the second equality $v^t A u = (v^t A u)^t$ holds since $v^t A u$ is a 1-by-1 matrix.

Again the example is completely general. In fact, if $\Gamma = (v_1, \dots, v_n)$ is an ordered basis of a vector space $V$ and if $\langle\,\cdot\,,\cdot\,\rangle$ is a given symmetric bilinear form on $V$, then the matrix of the form has entries $A_{ij} = \langle v_i, v_j\rangle$, and these evidently satisfy $A_{ij} = A_{ji}$. So $A$ is a symmetric matrix, and computations with the bilinear form are reduced to those used in the example.
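In matrix terms, the decomposition of a general bilinear form into symmetric and alternating parts in characteristic different from 2 is the splitting $A = \tfrac12(A + A^t) + \tfrac12(A - A^t)$. The following Python sketch illustrates this over the rationals; the function name `split` and the sample matrix are assumptions made for the illustration and do not come from the text.

```python
from fractions import Fraction as F

def transpose(m):
    return [list(row) for row in zip(*m)]

def split(A):
    """Return (S, T) with A = S + T, S symmetric and T alternating.

    Requires a field of characteristic different from 2, since it divides by 2.
    """
    At = transpose(A)
    n = len(A)
    half = F(1, 2)
    S = [[half * (A[i][j] + At[i][j]) for j in range(n)] for i in range(n)]
    T = [[half * (A[i][j] - At[i][j]) for j in range(n)] for i in range(n)]
    return S, T

# Hypothetical matrix of a bilinear form on Q^2, neither symmetric nor alternating.
A = [[F(1), F(2)], [F(5), F(3)]]
S, T = split(A)

print(S == transpose(S))                                   # S is symmetric
print(all(T[i][i] == 0 for i in range(2)))                 # T has zero diagonal
print(T == [[-x for x in row] for row in transpose(T)])    # T is skew-symmetric
```

The division by 2 is exactly where the hypothesis on the characteristic enters.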


Theorem 6.5 (Principal Axis Theorem). Suppose that $K$ has characteristic different from 2.
(a) If $\langle\,\cdot\,,\cdot\,\rangle$ is a symmetric bilinear form on a finite-dimensional vector space $V$, then there exists an ordered basis of $V$ in which the matrix of $\langle\,\cdot\,,\cdot\,\rangle$ is diagonal.
(b) If $A$ is an $n$-by-$n$ symmetric matrix, then there exists a nonsingular $n$-by-$n$ matrix $M$ such that $M^t A M$ is diagonal.

REMARKS. Because computations with general symmetric bilinear forms reduce to computations in the special case of a symmetric matrix and because Proposition 6.1 tells the effect of a change of ordered basis, (a) and (b) amount to the same result; nevertheless, we give two proofs of Theorem 6.5: a proof via matrices and a proof via linear maps. A hint of the validity of the theorem comes from the case that $K = R$. For the field $R$ when the bilinear form is an inner product, the Spectral Theorem (Theorem 3.21) says that there is an orthonormal basis of eigenvectors and hence that (a) holds. When $K = R$, the same theorem says that there exists an orthogonal matrix $M$ with $M^{-1} A M$ diagonal; since any orthogonal matrix $M$ satisfies $M^{-1} = M^t$, the Spectral Theorem is saying that (b) holds.

PROOF VIA MATRICES. If $A$ is an $n$-by-$n$ symmetric matrix, we seek a nonsingular $M$ with $M^t A M$ diagonal. We induct on the size of $A$, the base case of the induction being $n = 1$, where there is nothing to prove. Assume the result to be known for size $n - 1$, and write the given $n$-by-$n$ matrix $A$ in block form as $A = \begin{pmatrix} a & b \\ b^t & d \end{pmatrix}$ with $d$ of size 1-by-1. If $d \ne 0$, let $x$ be the column vector $-d^{-1} b$. Then

$$\begin{pmatrix} I & x \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ b^t & d \end{pmatrix} \begin{pmatrix} I & 0 \\ x^t & 1 \end{pmatrix} = \begin{pmatrix} * & 0 \\ 0 & d \end{pmatrix},$$

and the induction goes through. If $d = 0$, we argue in a different way. We may assume that $b \ne 0$ since otherwise the result is immediate by induction. Say $b_i \ne 0$ with $1 \le i \le n - 1$. Let $y$ be an $(n-1)$-dimensional row vector with $i$th entry a member $\delta$ of $K$ to be specified and with other entries 0. Then

$$\begin{pmatrix} I & 0 \\ y & 1 \end{pmatrix} \begin{pmatrix} a & b \\ b^t & 0 \end{pmatrix} \begin{pmatrix} I & y^t \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} * & * \\ * & y a y^t + b^t y^t + y b \end{pmatrix} = \begin{pmatrix} * & * \\ * & \delta^2 a_{ii} + 2\delta b_i \end{pmatrix}.$$

Since $K$ has characteristic different from 2, $2b_i$ is not 0; thus there is some value of $\delta$ for which $\delta^2 a_{ii} + 2\delta b_i \ne 0$. Then we are reduced to the case $d \ne 0$, which we have already handled, and the induction goes through.

PROOF VIA LINEAR MAPS. We may assume that the given symmetric bilinear form is not identically 0, since otherwise any basis will do. Let the radical of the form be denoted by $\operatorname{rad} = \operatorname{rad}\langle\,\cdot\,,\cdot\,\rangle$. Choose a vector subspace $S$ of $V$ such that $V = \operatorname{rad} \oplus S$, and put $[\,\cdot\,,\cdot\,] = \langle\,\cdot\,,\cdot\,\rangle\big|_{S\times S}$. Then $[\,\cdot\,,\cdot\,]$ is a symmetric


bilinear form on $S$, and it is nondegenerate. In fact, $[u, \cdot\,] = 0$ means $\langle u, v\rangle = 0$ for all $v \in S$; since $\langle u, v\rangle = 0$ for $v$ in rad anyway, $\langle u, v\rangle = 0$ for all $v \in V$, $u$ is in rad as well as $S$, and $u = 0$.

Since $\langle\,\cdot\,,\cdot\,\rangle$ is not identically 0, the subspace $S$ is not 0. Thus the nondegenerate symmetric bilinear form $[\,\cdot\,,\cdot\,]$ on $S$ is not 0. Since

$$[u, v] = \tfrac12\big([u + v, u + v] - [u, u] - [v, v]\big),$$

it follows that $[v, v] \ne 0$ for some $v$ in $S$. Put $U_1 = Kv$. Then $[\,\cdot\,,\cdot\,]\big|_{U_1\times U_1}$ is nondegenerate, and Corollary 6.4 implies that $S = U_1 \oplus U_1^\perp$. Applying the converse direction of the same corollary to $U_1^\perp$, we see that $[\,\cdot\,,\cdot\,]\big|_{U_1^\perp\times U_1^\perp}$ is nondegenerate. Repeating this construction with $U_1^\perp$ and iterating, we obtain

$$V = \operatorname{rad} \oplus U_1 \oplus \cdots \oplus U_k$$

with $\langle U_i, U_j\rangle = 0$ for $i \ne j$ and with $\dim U_i = 1$ for all $i$. This completes the proof.

Theorem 6.5 fails in characteristic 2. Problem 2 at the end of the chapter illustrates the failure.

Let us examine the matrix version of Theorem 6.5 more closely when $K$ is $C$ or $R$. The theorem says that if $A$ is $n$-by-$n$ symmetric, then we can find a nonsingular $M$ with $B = M^t A M$ diagonal. Taking $D$ diagonal and nonsingular and forming $C = D^t B D$, we see that we can adjust the diagonal entries of $B$ by arbitrary nonzero squares. Over $C$, we can therefore arrange that $C$ is of the form $\operatorname{diag}(1, \dots, 1, 0, \dots, 0)$. The number of 1’s equals the rank, and this has to be the same as the rank of the given matrix $A$. The form is nondegenerate if and only if there are no 0’s. Thus we understand everything about the diagonal form.

Over $R$, matters are more subtle. We can arrange that $C$ is of the form $\operatorname{diag}(\pm 1, \dots, \pm 1, 0, \dots, 0)$, the various signs ostensibly not being correlated. Replacing $C$ by $P^t C P$ with $P$ a permutation matrix, we may assume that our diagonal matrix is of the form

$$\operatorname{diag}(+1, \dots, +1, -1, \dots, -1, 0, \dots, 0).$$

The number of $+1$’s and $-1$’s together is again the rank of $A$, and the form is nondegenerate if and only if there are no 0’s. But what about the separate numbers of $+1$’s and $-1$’s? The triple given by

$$(p, m, z) = \big(\#(+1)\text{’s},\ \#(-1)\text{’s},\ \#(0)\text{’s}\big)$$

is called the signature of $A$ when $K = R$. A similar notion can be defined in the case of a symmetric bilinear form over $R$.

Theorem 6.6 (Sylvester’s Law). The signature of an $n$-by-$n$ symmetric matrix over $R$ is well defined.
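Before turning to the proof, it may help to see the matrix proof of Theorem 6.5 carried out as a procedure. The Python sketch below is an illustration under stated assumptions, not the author's algorithm verbatim: it works over the rationals with exact `Fraction` arithmetic, processes the last row and column first as in the block decomposition above, and tries $\delta = 1$ and then $\delta = -1$ in the case $d = 0$ (one of the two always works, since $a_{ii} + 2b_i = 0$ forces $a_{ii} - 2b_i = -4b_i \ne 0$). The names `diagonalize` and `signature` are hypothetical. Because rescaling the diagonal to $\pm 1$ would need real square roots, `signature` only counts the signs of the diagonal entries.

```python
from fractions import Fraction as F

def transpose(m):
    return [list(r) for r in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(r, c)) for c in zip(*b)] for r in a]

def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def diagonalize(A):
    """Return (M, D) with M nonsingular and D = M^t A M diagonal, A symmetric over Q."""
    n = len(A)
    M = identity(n)
    B = [[F(x) for x in row] for row in A]
    for k in range(n - 1, 0, -1):          # work on the leading (k+1)-by-(k+1) block
        if B[k][k] == 0:                   # case d = 0
            i = next((i for i in range(k) if B[i][k] != 0), None)
            if i is None:
                continue                   # b = 0 as well: nothing to clear
            delta = F(1) if B[i][i] + 2 * B[i][k] != 0 else F(-1)
            E = identity(n)
            E[i][k] = delta                # new corner entry: delta^2 a_ii + 2 delta b_i
            B = matmul(matmul(transpose(E), B), E)
            M = matmul(M, E)
        E = identity(n)                    # case d != 0: take x = -d^{-1} b
        for i in range(k):
            E[k][i] = -B[i][k] / B[k][k]
        B = matmul(matmul(transpose(E), B), E)
        M = matmul(M, E)
    return M, B

def signature(A):
    """Return (p, m, z): numbers of positive, negative, zero diagonal entries."""
    _, D = diagonalize(A)
    diag = [D[i][i] for i in range(len(D))]
    return (sum(d > 0 for d in diag), sum(d < 0 for d in diag), sum(d == 0 for d in diag))

A = [[0, 1], [1, 0]]                       # d = 0 at the first step
S = [[1, 2], [3, 7]]                       # a hypothetical nonsingular matrix
A2 = matmul(matmul(transpose(S), A), S)    # same form in another basis
print(signature(A), signature(A2))
```

The two printed triples agree even though the intermediate diagonal matrices differ, which is the content of Sylvester's Law for this example.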


PROOF. The integer $p + m$ is the rank, which does not change under a transformation $A \mapsto M^t A M$ if $M$ is nonsingular. Thus we may take $z$ as known. Let $(p', m', z)$ and $(p, m, z)$ be two signatures for a symmetric matrix $A$, with $p' \le p$. Define the corresponding symmetric bilinear form on $R^n$ by $\langle u, v\rangle = u^t A v$. Let $(v_1', \dots, v_n')$ and $(v_1, \dots, v_n)$ be ordered bases of $R^n$ diagonalizing the bilinear form and exhibiting the resulting signature, i.e., having $\langle v_i', v_j'\rangle = \langle v_i, v_j\rangle = 0$ for $i \ne j$ and having

$$\langle v_j', v_j'\rangle = \begin{cases} +1 & \text{for } 1 \le j \le p', \\ -1 & \text{for } p' + 1 \le j \le n - z, \\ 0 & \text{for } n - z + 1 \le j \le n, \end{cases}
\qquad
\langle v_j, v_j\rangle = \begin{cases} +1 & \text{for } 1 \le j \le p, \\ -1 & \text{for } p + 1 \le j \le n - z, \\ 0 & \text{for } n - z + 1 \le j \le n. \end{cases}$$

We shall prove that $\{v_1, \dots, v_p, v'_{p'+1}, \dots, v'_n\}$ is linearly independent. This set has $p + (n - p')$ members, and then we must have $p' \ge p$. Reversing the roles of $p$ and $p'$, we see that $p' = p$ and $m' = m$, and the theorem is proved. Thus suppose we have a linear dependence:

$$a_1 v_1 + \cdots + a_p v_p = b_{p'+1} v'_{p'+1} + \cdots + b_n v'_n.$$

Let $v$ be the common value of the two sides of this equation. Then

$$\langle v, v\rangle = \langle a_1 v_1 + \cdots + a_p v_p,\ a_1 v_1 + \cdots + a_p v_p\rangle = \sum_{j=1}^{p} a_j^2 \ge 0$$

and

$$\langle v, v\rangle = \langle b_{p'+1} v'_{p'+1} + \cdots + b_n v'_n,\ b_{p'+1} v'_{p'+1} + \cdots + b_n v'_n\rangle = -\sum_{j=p'+1}^{n-z} b_j^2 \le 0.$$

We conclude that $\langle v, v\rangle = 0$, $\sum_{j=1}^{p} a_j^2 = 0$, and $a_1 = \cdots = a_p = 0$. Thus $v = 0$ and $b_{p'+1} v'_{p'+1} + \cdots + b_n v'_n = 0$. Since $\{v'_{p'+1}, \dots, v'_n\}$ is linearly independent, we obtain also $b_{p'+1} = \cdots = b_n = 0$. Therefore $\{v_1, \dots, v_p, v'_{p'+1}, \dots, v'_n\}$ is a linearly independent set, and the proof is complete.

3. Alternating Bilinear Forms

We continue with the setting in which $K$ is a field and all vector spaces of interest are defined over $K$ and are finite-dimensional.


In this section we study alternating bilinear forms, imposing no restriction on the characteristic of $K$. From the skew symmetry of any alternating bilinear form it is apparent that the left and right radicals of such a form are the same, and we call this vector subspace the radical of the form.

First let us consider examples given in terms of matrices. Temporarily let us separate matters according to the characteristic.

EXAMPLE 1 OF SECTION 1 WITH K OF CHARACTERISTIC ≠ 2. Let $V = K^n$, let $A$ be a skew-symmetric $n$-by-$n$ matrix (i.e., one with $A^t = -A$), and let $\langle u, v\rangle = u^t A v$. The computation

$$\langle v, u\rangle = v^t A u = (v^t A u)^t = u^t A^t v = -u^t A v = -\langle u, v\rangle$$

shows that the bilinear form $\langle\,\cdot\,,\cdot\,\rangle$ is skew-symmetric, hence alternating.

EXAMPLE 1 OF SECTION 1 WITH K OF CHARACTERISTIC = 2. Let $V = K^n$, let $A$ be an $n$-by-$n$ matrix, and define $\langle u, v\rangle = u^t A v$. We suppose that $A$ is skew-symmetric; it is the same to assume that $A$ is symmetric since the characteristic is 2. In order to have $\langle e_i, e_i\rangle = 0$ for each standard basis vector, we shall assume that $A_{ii} = 0$ for all $i$. If $u$ is a column vector with entries $u_1, \dots, u_n$, then

$$\langle u, u\rangle = u^t A u = \sum_{i,j} u_i A_{ij} u_j = \sum_{i \ne j} u_i A_{ij} u_j = \sum_{i<j} (A_{ij} u_i u_j + A_{ji} u_i u_j) = \sum_{i<j} 2 A_{ij} u_i u_j = 0.$$

Hence the bilinear form $\langle\,\cdot\,,\cdot\,\rangle$ is alternating.

Again the examples are completely general. In fact, if $\Gamma = (v_1, \dots, v_n)$ is an ordered basis of a vector space $V$ and if $\langle\,\cdot\,,\cdot\,\rangle$ is a given alternating bilinear form, then the matrix of the form has entries $A_{ij} = \langle v_i, v_j\rangle$ that evidently satisfy $A_{ij} = -A_{ji}$ and $A_{ii} = 0$. So $A$ is a skew-symmetric matrix with 0’s on the diagonal, and computations with the bilinear form are reduced to those used in the examples. To keep the terminology parallel, let us say that a square matrix is alternating if it is skew-symmetric and has 0’s on the diagonal.

Theorem 6.7.
(a) If $\langle\,\cdot\,,\cdot\,\rangle$ is an alternating bilinear form on a finite-dimensional vector space $V$, then there exists an ordered basis of $V$ in which the matrix of $\langle\,\cdot\,,\cdot\,\rangle$ has the form

$$\begin{pmatrix}
0 & 1 \\
-1 & 0 \\
& & 0 & 1 \\
& & -1 & 0 \\
& & & & \ddots \\
& & & & & 0 & 1 \\
& & & & & -1 & 0 \\
& & & & & & & 0 \\
& & & & & & & & \ddots \\
& & & & & & & & & 0
\end{pmatrix}.$$


If $\langle\,\cdot\,,\cdot\,\rangle$ is nondegenerate, then $\dim V$ is even.
(b) If $A$ is an $n$-by-$n$ alternating matrix, then there exists a nonsingular $n$-by-$n$ matrix $M$ such that $M^t A M$ is as in (a).

PROOF. It is enough to prove (a). Let rad be the radical of the given form $\langle\,\cdot\,,\cdot\,\rangle$, and choose a vector subspace $S$ of $V$ with $V = \operatorname{rad} \oplus S$. The restriction of $\langle\,\cdot\,,\cdot\,\rangle$ to $S$ is then alternating and nondegenerate. We may now proceed by induction on $\dim V$ under the assumption that $\langle\,\cdot\,,\cdot\,\rangle$ is nondegenerate.

For $\dim V = 1$, the form is degenerate. For $\dim V = 2$, we can find $u$ and $v$ with $\langle u, v\rangle \ne 0$, and we can normalize one of the vectors to make $\langle u, v\rangle = 1$. Then $(u, v)$ is the required ordered basis.

Assuming the result in the nondegenerate case for dimension $< n$, suppose that $\dim V = n$. Again choose $u$ and $v$ with $\langle u, v\rangle = 1$, and define $U = Ku \oplus Kv$. Then $\langle\,\cdot\,,\cdot\,\rangle\big|_{U\times U}$ has matrix $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ and is nondegenerate. By Corollary 6.4, $V = U \oplus U^\perp$, and an application of the converse of the corollary shows that $\langle\,\cdot\,,\cdot\,\rangle\big|_{U^\perp\times U^\perp}$ is nondegenerate. The induction hypothesis applies to $U^\perp$, and we obtain the desired matrix for the given form.
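The inductive step of this proof can be turned into a procedure for producing a basis of the required kind. The Python sketch below is a variant of the argument rather than a transcription of it: instead of splitting off the radical first, it sets aside a vector as soon as it is found to be orthogonal to everything that remains. After choosing $u$ and $v$ with $\langle u, v\rangle = 1$, it replaces each remaining vector $w$ by $w - \langle w, v\rangle u + \langle w, u\rangle v$, which lies in $U^\perp$ for $U = Ku \oplus Kv$. The function names and the sample matrix are assumptions made for the illustration, and the arithmetic is exact over the rationals.

```python
from fractions import Fraction as F

def transpose(m):
    return [list(r) for r in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(r, c)) for c in zip(*b)] for r in a]

def form(A, u, v):
    """Evaluate <u, v> = u^t A v for coordinate lists u and v."""
    n = len(A)
    return sum(u[i] * A[i][j] * v[j] for i in range(n) for j in range(n))

def symplectic_basis(A):
    """Return (pairs, radical) for an alternating matrix A over Q.

    Each pair (u, v) satisfies <u, v> = 1, distinct pairs are orthogonal,
    and the vectors in `radical` are orthogonal to every vector returned.
    """
    n = len(A)
    vecs = [[F(int(i == j)) for i in range(n)] for j in range(n)]  # standard basis
    pairs, radical = [], []
    while vecs:
        u = vecs.pop(0)
        k = next((k for k, w in enumerate(vecs) if form(A, u, w) != 0), None)
        if k is None:
            radical.append(u)              # u pairs to 0 with everything left
            continue
        v = vecs.pop(k)
        c = form(A, u, v)
        v = [x / c for x in v]             # normalize so that <u, v> = 1
        projected = []
        for w in vecs:                     # move each remaining w into U-perp
            a, b = form(A, w, v), form(A, w, u)
            projected.append([w[i] - a * u[i] + b * v[i] for i in range(n)])
        vecs = projected
        pairs.append((u, v))
    return pairs, radical

A = [[0, 1, 2], [-1, 0, 3], [-2, -3, 0]]   # hypothetical alternating matrix of rank 2
pairs, radical = symplectic_basis(A)
columns = [x for pair in pairs for x in pair] + radical
M = transpose(columns)                     # basis vectors as the columns of M
print(matmul(matmul(transpose(M), A), M))
```

For this matrix the procedure finds one pair and one radical vector, so $M^tAM$ consists of one 2-by-2 block followed by a single 0, as in part (a) of the theorem.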

4. Hermitian Forms

In this section the field will be $C$, and $V$ will be a finite-dimensional vector space over $C$. A sesquilinear form $\langle\,\cdot\,,\cdot\,\rangle$ on $V$ is a function from $V \times V$ into $C$ that is linear in the first variable and conjugate linear in the second.$^1$ Sesquilinear forms do not make sense for general fields because of the absence of a universal analog of complex conjugation, and we shall consequently work only with the field $C$ in this section.$^2$

A sesquilinear form $\langle\,\cdot\,,\cdot\,\rangle$ is Hermitian if $\langle u, v\rangle = \overline{\langle v, u\rangle}$ for all $u$ and $v$ in $V$. The form is skew-Hermitian if instead $\langle u, v\rangle = -\overline{\langle v, u\rangle}$ for all $u$ and $v$ in $V$. Hermitian and skew-Hermitian forms are the extreme types of sesquilinear forms since any sesquilinear form $\langle\,\cdot\,,\cdot\,\rangle$ is the sum of a Hermitian form $\langle\,\cdot\,,\cdot\,\rangle_h$ and a skew-Hermitian form $\langle\,\cdot\,,\cdot\,\rangle_{sh}$ given by

$$\langle u, v\rangle_h = \tfrac12\big(\langle u, v\rangle + \overline{\langle v, u\rangle}\big), \qquad \langle u, v\rangle_{sh} = \tfrac12\big(\langle u, v\rangle - \overline{\langle v, u\rangle}\big).$$

$^1$ Some authors, particularly in mathematical physics, reverse the roles of the two variables and assume the conjugate linearity in the first variable instead of the second.

$^2$ Sesquilinear forms make sense in number fields like $Q(\sqrt{2})$ that have an automorphism of order 2 (see Section IV.1), but sesquilinear forms in this kind of setting will not concern us here.


In addition, any skew-Hermitian form becomes a Hermitian form simply by multiplying by $i$. Specifically if $\langle\,\cdot\,,\cdot\,\rangle_{sh}$ is skew-Hermitian, then $i\langle\,\cdot\,,\cdot\,\rangle_{sh}$ is sesquilinear and Hermitian, as is readily checked. Consequently the study of skew-Hermitian forms immediately reduces to the study of Hermitian forms.

EXAMPLE. Let $V = C^n$, and let $A$ be a Hermitian matrix, i.e., one with $A^* = A$, where $A^*$ is the conjugate transpose of $A$. Then it is a simple matter to check that $\langle u, v\rangle = v^* A u$ defines a Hermitian form on $C^n$.
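The check mentioned in the example can be carried out numerically for a particular matrix. The Python sketch below is an illustration only; the function name `hform`, the matrix, and the vectors are assumptions chosen with small Gaussian-integer entries so that Python's built-in complex arithmetic is exact. It tests the Hermitian property, the reality of $\langle u, u\rangle$, and conjugate linearity in the second variable.

```python
def hform(A, u, v):
    """Evaluate <u, v> = v* A u: linear in u and conjugate linear in v."""
    n = len(A)
    return sum(v[i].conjugate() * A[i][j] * u[j] for i in range(n) for j in range(n))

# Hypothetical Hermitian matrix (equal to its conjugate transpose) and test vectors.
A = [[2 + 0j, 1 + 1j], [1 - 1j, 3 + 0j]]
u = [1 + 2j, 3 + 0j]
v = [1j, 1 - 1j]

# Hermitian property: <u, v> is the complex conjugate of <v, u>.
print(hform(A, u, v) == hform(A, v, u).conjugate())
# In particular <u, u> is real.
print(hform(A, u, u).imag == 0)
# Conjugate linearity in the second variable: <u, i v> = -i <u, v>.
iv = [1j * x for x in v]
print(hform(A, u, iv) == -1j * hform(A, u, v))
```

With the opposite convention mentioned in the first footnote, the roles of the two arguments of `hform` would be interchanged.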

Again the example with a matrix is completely general. In fact, let $\langle\,\cdot\,,\cdot\,\rangle$ be a Hermitian form on $V$, let $\Gamma = (v_1, \dots, v_n)$ be an ordered basis of $V$, and define $A_{ij} = \langle v_i, v_j\rangle$. Then $A$ is a Hermitian matrix, and $\langle u, v\rangle = u^t A \bar v$ when $u$ and $v$ are written as column vectors of coordinates relative to $\Gamma$, where $\bar v$ is the entry-by-entry complex conjugate of $v$.

If $\Delta = (w_1, \dots, w_n)$ is a second ordered basis, then the formula for changing basis may be derived as follows: Write $w_j = \sum_i c_{ij} v_i$, so that $[c_{ij}]$ is the matrix $\binom{I}{\Gamma\Delta}$. If $B_{ij} = \langle w_i, w_j\rangle$, then $B_{ij} = \langle w_i, w_j\rangle = \sum_{k,l} c_{ki} \langle v_k, v_l\rangle \bar c_{lj}$, and hence

$$B = \binom{I}{\Gamma\Delta}^{t} A\, \overline{\binom{I}{\Gamma\Delta}}.$$

Thus two Hermitian matrices $A$ and $B$ represent the same Hermitian form in different bases if and only if $B = M^* A M$ for some nonsingular matrix $M$.

Proposition 6.8.
(a) If $\langle\,\cdot\,,\cdot\,\rangle$ is a Hermitian form on a finite-dimensional vector space $V$ over $C$, then there exists an ordered basis of $V$ in which the matrix of $\langle\,\cdot\,,\cdot\,\rangle$ is diagonal with real entries.
(b) If $A$ is an $n$-by-$n$ Hermitian matrix, then there exists a nonsingular $n$-by-$n$ matrix $M$ such that $M^* A M$ is diagonal.

PROOF. The above considerations show that (a) and (b) are reformulations of the same result. Hence it is enough to prove (b). By the Spectral Theorem (Theorem 3.21), there exists a unitary matrix $U$ such that $U^{-1} A U$ is diagonal with real entries. Since $U$ is unitary, $U^{-1} = U^*$. Thus we can take $M = U$ to prove (b).

Just as with symmetric bilinear forms over $R$, we can do a little better than Proposition 6.8 indicates. If $B$ is Hermitian and diagonal with diagonal entries $b_i$, and if $D$ is diagonal with positive entries $d_i$, then $C = D^* B D$ is diagonal with diagonal entries $d_i^2 b_i$. Choosing $D$ suitably and then replacing $C$ by $P^t C P$ for a suitable permutation matrix $P$, we may assume that $P^t C P$ is of the


form $\operatorname{diag}(+1, \dots, +1, -1, \dots, -1, 0, \dots, 0)$. The number of $+1$’s and $-1$’s together is the rank of $A$, and the form is nondegenerate if and only if there are no 0’s. The triple given by

$$(p, m, z) = \big(\#(+1)\text{’s},\ \#(-1)\text{’s},\ \#(0)\text{’s}\big)$$

is again called the signature of $A$. A similar notion can be defined in the case of a Hermitian form, as opposed to a Hermitian matrix.

Theorem 6.9 (Sylvester’s Law). The signature of an $n$-by-$n$ Hermitian matrix is well defined.

The proof is the same as for Theorem 6.6 except for adjustments in notation.

5. Groups Leaving a Bilinear Form Invariant

Although it is not logically necessary to do so, we digress in this section to introduce some important groups that are defined by means of bilinear or Hermitian forms. These groups arise in many areas of mathematics, both pure and applied, and their detailed structure constitutes a topic in the fields of Lie groups, algebraic groups, and finite groups that is beyond the scope of this book. Thus the best place to define them seems to be now.

We limit our comments on applications to just these: When the underlying field in the definition of these groups is $R$ or $C$, the group is quite often a “simple Lie group,” one of the basic building blocks of the theory of the continuous groups that so often arise in topology, geometry, differential equations, and mathematical physics. When the underlying field is a number field in the sense of Example 9 of Section IV.1, the group quite often plays a role in algebraic number theory. When the underlying field is a finite field, the group is often closely related to a finite simple group; an example of this relationship occurred in Problems 55–62 at the end of Chapter IV, where it was shown that the group $\mathrm{PSL}(2, K)$, built in an easy way from the general linear group $\mathrm{GL}(2, K)$, is simple if the field $K$ has more than 5 elements. More general examples of finite simple groups produced by analogous constructions are said to be of “Lie type.” A celebrated theorem of the late twentieth century classified the finite simple groups, establishing that the only such groups are the cyclic groups of prime order, the alternating groups on 5 or more letters, the simple groups of Lie type, and 26 so-called sporadic simple groups.

If $\langle\,\cdot\,,\cdot\,\rangle$ is a bilinear form on an $n$-dimensional vector space $V$ over a field $K$, a nonsingular linear map $g : V \to V$ is said to leave the bilinear form invariant if

$$\langle g(u), g(v)\rangle = \langle u, v\rangle$$


for all $u$ and $v$ in $V$. Fix an ordered basis $\Gamma$ of $V$, let $A$ be the matrix of the bilinear form in this basis, let $g' = \binom{g}{\Gamma\Gamma}$ be the member of $\mathrm{GL}(n, K)$ corresponding to $g$, and abbreviate $\binom{w}{\Gamma}$ as $w'$ for any $w$ in $V$. To translate the invariance condition into one concerning matrices, we use the formula $\langle u, v\rangle = u'^t A v'$, the corresponding formula for $\langle g(u), g(v)\rangle$, and the formula $g(w)' = g'(w')$ from Theorem 2.14. Then we obtain $u'^t g'^t A g' v' = u'^t A v'$. Taking $u$ to be the $i$th member of the ordered basis and $v$ to be the $j$th member, we obtain equality of the $(i, j)$th entry of the two matrices $g'^t A g'$ and $A$. Thus the matrix form of the invariance condition is that a nonsingular matrix $g'$ satisfy

$$g'^t A g' = A.$$

We know that changing the ordered basis amounts to replacing $A$ by $M^t A M$ for some nonsingular matrix $M$. If $g'$ satisfies the invariance condition $g'^t A g' = A$ relative to $A$, then $M^{-1} g' M$ satisfies

$$(M^{-1} g' M)^t (M^t A M)(M^{-1} g' M) = M^t A M.$$

Thus we are led to a conjugate subgroup within $\mathrm{GL}(n, K)$. A conjugate subgroup is not something substantially new, and thus we might as well make a convenient choice of basis so that $A$ looks particularly special.

The interesting cases are that the given bilinear form is symmetric or alternating, hence that the matrix $A$ is symmetric or alternating. Let us restrict our attention to them. The left and right radicals coincide in these cases, and the first thing to do is to take the two-sided radical into account. Returning to the original bilinear form, we write $V = \operatorname{rad} \oplus S$, where rad is the radical and $S$ is some vector subspace of $V$, and we choose an ordered basis $(v_1, \dots, v_p, v_{p+1}, \dots, v_n)$ such that $v_1, \dots, v_p$ are in $S$ and $v_{p+1}, \dots, v_n$ are in rad. Then $\langle v_i, v_j\rangle = 0$ if $i > p$ or $j > p$, and consequently $A$ has its only nonzero entries in the upper left $p$-by-$p$ block. The same argument as in the proofs of Theorems 6.5 and 6.7 shows that the restriction of the bilinear form to $S$ is nondegenerate, and consequently the upper left $p$-by-$p$ block of $A$ is nonsingular.

Changing notation slightly, suppose that $g$ is an $n$-by-$n$ matrix written in block form as $g = \begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix}$ with $g_{11}$ of size $p$-by-$p$, suppose that $\begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix}$ is another matrix written in the same block form, suppose that the $p$-by-$p$ matrix $A$ is nonsingular, and suppose that

$$g^t \begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix} g = \begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix}.$$

Making a brief computation, we find that necessary and sufficient conditions on $g$ are that $g_{11}$ be nonsingular and have $g_{11}^t A g_{11} = A$, that $g_{12} = 0$, that $g_{22}$ be arbitrary nonsingular, and that $g_{21}$ be arbitrary. In other


words, the only interesting condition $g_{11}^t A g_{11} = A$ is a reflection of what happens in the nonsingular case.

Consequently the interesting cases are that the given bilinear form is nondegenerate, as well as either symmetric or alternating. If $A$ is symmetric and nonsingular, then the group of all nonsingular matrices $g$ such that $g^t A g = A$ is called the orthogonal group relative to $A$. If $A$ is alternating and nonsingular, then the group of all nonsingular matrices $g$ such that $g^t A g = A$ is called the symplectic group relative to $A$.

For the symplectic case it is customary to invoke Theorem 6.7 and take $A$ to be

$$J = \begin{pmatrix}
0 & 1 \\
-1 & 0 \\
& & 0 & 1 \\
& & -1 & 0 \\
& & & & \ddots \\
& & & & & 0 & 1 \\
& & & & & -1 & 0
\end{pmatrix},$$

except possibly for a permutation of the rows and columns and possibly for a multiplication by $-1$. Two conflicting notations are in common use for the symplectic group, namely $\mathrm{Sp}(n, K)$ and $\mathrm{Sp}(\tfrac12 n, K)$, and one always has to check a particular author’s definitions.

For the orthogonal case the notation is less standardized. Theorem 6.5 says that we may take $A$ to be diagonal except when $K$ has characteristic 2. But the theorem does not tell us exactly which $A$’s are representative of the same bilinear form. When $K = C$, we know that we can take $A$ to be the identity matrix $I$. The group is known as the complex orthogonal group and is denoted by $\mathrm{O}(n, C)$. When $K = R$, we can take $A$ to be diagonal with diagonal entries $\pm 1$. Sylvester’s Law (Theorem 6.6) says that the form determines the number of $+1$’s and the number of $-1$’s. The groups are called indefinite orthogonal groups and are denoted by $\mathrm{O}(p, q)$, where $p$ is the number of $+1$’s and $q$ is the number of $-1$’s. When $q = 0$, we obtain the ordinary orthogonal group of matrices relative to an inner product.

A similar analysis applies to Hermitian forms. The field is now $C$, the invariance condition with the form is still $\langle g(u), g(v)\rangle = \langle u, v\rangle$, and the corresponding condition with matrices is $g'^t A\, \overline{g'} = A$. The interesting case is that the Hermitian form is nondegenerate. Proposition 6.8 and Sylvester’s Law (Theorem 6.9) together show that we may take $A$ to be diagonal with diagonal entries $\pm 1$ and that the Hermitian form determines the number of $+1$’s and the number of $-1$’s. The groups are the indefinite unitary groups and are denoted by $\mathrm{U}(p, q)$, where $p$ is the number of $+1$’s and $q$ is the number of $-1$’s. When $q = 0$, we obtain the ordinary unitary group of matrices relative to an inner product.
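The matrix form $g^tAg = A$ of the invariance condition is easy to test on small examples. The Python sketch below is an illustration only; the helper `preserves` and the particular matrices are assumptions chosen for the example. It checks a rational member of $\mathrm{O}(1,1)$, a member of the symplectic group of the 2-by-2 matrix $J$, and the earlier remark that conjugating by a change-of-basis matrix $M$ carries the group for $A$ to the group for $M^tAM$.

```python
from fractions import Fraction as F

def transpose(m):
    return [list(r) for r in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(r, c)) for c in zip(*b)] for r in a]

def preserves(g, A):
    """True when g^t A g == A, the matrix form of the invariance condition."""
    return matmul(matmul(transpose(g), A), g) == A

# diag(+1, -1): the symmetric form whose invariance group is O(1, 1).
A = [[F(1), F(0)], [F(0), F(-1)]]
g = [[F(5, 3), F(4, 3)], [F(4, 3), F(5, 3)]]     # (5/3)^2 - (4/3)^2 = 1

# The 2-by-2 alternating matrix J and a matrix of determinant 1.
J = [[F(0), F(1)], [F(-1), F(0)]]
h = [[F(2), F(3)], [F(1), F(2)]]

# Change of basis by a hypothetical M: A becomes M^t A M and g becomes M^{-1} g M.
M = [[F(1), F(1)], [F(0), F(1)]]
M_inv = [[F(1), F(-1)], [F(0), F(1)]]
A_new = matmul(matmul(transpose(M), A), M)
g_new = matmul(matmul(M_inv, g), M)

print(preserves(g, A))            # g lies in O(1, 1)
print(preserves(h, J))            # h lies in the symplectic group of J
print(preserves(g_new, A_new))    # the conjugate preserves the transformed form
print(preserves(h, A))            # h does not preserve diag(+1, -1)
```

For 2-by-2 matrices one has $h^tJh = (\det h)J$, so the symplectic condition in this size is just $\det h = 1$.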


VI. Multilinear Algebra

6. Tensor Product of Two Vector Spaces

If E is a vector space over K, then the set of all bilinear forms on E is a vector space under addition and scalar multiplication of the values, i.e., it is a vector subspace of the set of all functions from E × E into K. In this section we introduce a vector space called the "tensor product" of E with itself, whose dual, even if E is infinite-dimensional, is canonically isomorphic to this vector space of bilinear forms.

Matters will be clearer if we work initially with something slightly more general than bilinear forms on a single vector space E. Thus fix a field K, and let E and F be vector spaces over K. A function from E × F into a vector space U over K is said to be bilinear if it is linear in each of the two variables when the other one is held fixed. The space of all such bilinear functions is a vector space over K under addition and scalar multiplication of the values. The bilinear functions are called bilinear forms when the range space U is K itself. More generally, if $E_1, \dots, E_k$ are vector spaces over K, a function from $E_1 \times \cdots \times E_k$ into a vector space over K is said to be k-linear or k-multilinear if it is linear in each of its k variables when the other k − 1 variables are held fixed. Again the word "form" is used in the scalar-valued case, and all of these spaces of multilinear functions are vector spaces over K.

In this section we shall introduce the tensor product of two vector spaces E and F over K, ultimately denoting it by $E \otimes_K F$. The dual of this tensor product will be canonically isomorphic to the vector space of bilinear forms on E × F. More generally the space of linear functions from the tensor product into a vector space U will be canonically isomorphic to the vector space of bilinear functions on E × F with values in U.

Following the habit encouraged by Chapter IV, we want to arrange that tensor product is a functor.
If $\mathcal{V}$ denotes the category of vector spaces over K and if $\mathcal{V} \times \mathcal{V}$ denotes the category described in Section IV.11 as $\mathcal{V}^S$ for a two-element set S, then tensor product is to be a functor from $\mathcal{V} \times \mathcal{V}$ into $\mathcal{V}$. Hence we will want to examine the effect of tensor products on morphisms, i.e., on linear maps. As in similar constructions in Chapter IV, the effect of tensor product on linear maps is captured by defining the tensor product by means of a universal mapping property. The appropriate universal mapping property rephrases the statement above that the space of linear functions from the tensor product into any vector space U is canonically isomorphic to the vector space of bilinear functions on E × F with values in U.

If E and F are vector spaces over K, a tensor product of E and F is a pair (V, ι) consisting of a vector space V over K together with a bilinear function ι : E × F → V, with the following universal mapping property: whenever b is a bilinear mapping of E × F into a vector space U over K, then there exists a unique


linear mapping B of V into U such that the diagram in Figure 6.1 commutes, i.e., such that Bι = b holds in the diagram. When ι is understood, one frequently refers to V itself as the tensor product. The linear mapping B : V → U is called the linear extension of b to the tensor product.

\[
\begin{array}{ccc}
E \times F & \xrightarrow{\ b\ } & U \\
{\scriptstyle \iota}\big\downarrow & \nearrow{\scriptstyle B} & \\
V & &
\end{array}
\]

FIGURE 6.1. Universal mapping property of a tensor product.

Theorem 6.10. If E and F are vector spaces over K, then a tensor product of E and F exists and is unique up to canonical isomorphism in this sense: if $(V_1, \iota_1)$ and $(V_2, \iota_2)$ are tensor products, then there exists a unique linear mapping $B : V_2 \to V_1$ with $B\iota_2 = \iota_1$, and B is an isomorphism. Any tensor product is spanned linearly by the image of E × F in it.

REMARKS. As usual, uniqueness will follow readily from the universal mapping property. What is really needed is a proof of existence. This will be carried out by an explicit construction. Later, in Chapter X, we shall reintroduce tensor products, taking the basic construction to be that of the tensor product of two abelian groups, and then the tensor product of two vector spaces will in effect be obtained in a slightly different way. However, the exact construction does not matter, only the existence; the uniqueness allows us to match the results of any two constructions.

\[
\begin{array}{ccc}
E \times F & \xrightarrow{\ \iota_2\ } & V_2 \\
{\scriptstyle \iota_1}\big\downarrow & \nearrow{\scriptstyle B_2} & \\
V_1 & &
\end{array}
\qquad\text{and}\qquad
\begin{array}{ccc}
E \times F & \xrightarrow{\ \iota_1\ } & V_1 \\
{\scriptstyle \iota_2}\big\downarrow & \nearrow{\scriptstyle B_1} & \\
V_2 & &
\end{array}
\]

FIGURE 6.2. Diagrams for uniqueness of a tensor product.

PROOF OF UNIQUENESS. Let $(V_1, \iota_1)$ and $(V_2, \iota_2)$ be tensor products. Set up the diagrams in Figure 6.2, and use the universal mapping property to obtain linear maps $B_2 : V_1 \to V_2$ and $B_1 : V_2 \to V_1$ extending $\iota_2$ and $\iota_1$. Then $B_1 B_2 : V_1 \to V_1$ has $B_1 B_2 \iota_1 = B_1 \iota_2 = \iota_1$, and $1_{V_1} : V_1 \to V_1$ has $(1_{V_1})\iota_1 = \iota_1$. By the assumed uniqueness within the universal mapping property, $B_1 B_2 = 1_{V_1}$ on $V_1$. Similarly $B_2 B_1 = 1_{V_2}$ on $V_2$. Then $B_1 : V_2 \to V_1$ gives the canonical isomorphism. Because of the isomorphism the image of E × F will span an arbitrary tensor product if it spans some particular tensor product.


PROOF OF EXISTENCE. Let $V_1 = \bigoplus_{(e,f)} K(e, f)$, the direct sum being taken over all ordered pairs (e, f) with e ∈ E and f ∈ F. Then $V_1$ is a vector space over K with a basis consisting of all ordered pairs (e, f). We think of all identities that the elements of $V_1$ must satisfy to be a tensor product, writing each as some expression set equal to 0, and then we assemble those expressions into a vector subspace to factor out from $V_1$. Namely, let $V_0$ be the vector subspace of $V_1$ generated by all elements of any of the kinds

\[
\begin{aligned}
&(e_1 + e_2, f) - (e_1, f) - (e_2, f), \\
&(ce, f) - c(e, f), \\
&(e, f_1 + f_2) - (e, f_1) - (e, f_2), \\
&(e, cf) - c(e, f),
\end{aligned}
\]

the understanding being that c is in K, the elements $e, e_1, e_2$ are in E, and the elements $f, f_1, f_2$ are in F. Define $V = V_1/V_0$, and define $\iota : E \times F \to V_1/V_0$ by $\iota(e, f) = (e, f) + V_0$. We shall prove that (V, ι) is a tensor product of E and F. The definitions show that the image of ι spans V linearly.

Let b : E × F → U be given as in Figure 6.1. To see that a linear extension B exists and is unique, define $B_1$ on $V_1$ by

\[
B_1\Big(\sum_{\text{finite}} c_i (e_i, f_i)\Big) = \sum_{\text{finite}} c_i\, b(e_i, f_i).
\]

The bilinearity of b shows that $B_1$ maps $V_0$ to 0. By Proposition 2.25, $B_1$ descends to a linear map $B : V_1/V_0 \to U$, and we have Bι = b. Hence B exists as required. To check uniqueness of B, we observe again that the cosets $(e, f) + V_0$ within $V_1/V_0$ span V; since commutativity of the diagram in Figure 6.1 forces $B((e, f) + V_0) = B(\iota(e, f)) = b(e, f)$, B is unique. This completes the proof.

A tensor product of E and F is denoted by (E ⊗K F, ι), with the bilinear map ι given by ι(e, f ) = e ⊗ f ; the map ι is frequently dropped from the notation when there is no chance of ambiguity. The tensor product that was constructed in the proof of existence in Theorem 6.10 is not given any special notation to distinguish it from any other tensor product. The elements e ⊗ f span E ⊗K F, as was noted in the statement of the theorem. Elements of the form e ⊗ f are sometimes called pure tensors. Not every element need be a pure tensor, but every element in E ⊗K F is a ﬁnite sum of pure tensors. We shall see in Proposition 6.14 that if {u i } is a basis


of E and $\{v_j\}$ is a basis of F, then the pure tensors $u_i \otimes v_j$ form a basis of $E \otimes_K F$. In particular the dimension of the tensor product is the product of the dimensions of the factors. We could have defined the tensor product in this way, by taking bases and declaring that $u_i \otimes v_j$ is to be a basis of the desired space. The difficulty is that we would be forever wedded to our choice of those particular bases, or we would constantly have to prove that our definitions are independent of bases. The definition by means of Theorem 6.10 avoids this difficulty.

To make tensor product $(E, F) \mapsto E \otimes_K F$ into a functor, we have to describe the effect on linear mappings. To aid in that discussion, let us reintroduce some notation first used in Chapter II: if U and V are vector spaces over K, then $\operatorname{Hom}_K(U, V)$ is defined to be the vector space of K linear maps from U to V.

Corollary 6.11. If E, F, and V are vector spaces over K, then the vector space $\operatorname{Hom}_K(E \otimes_K F, V)$ is canonically isomorphic (via restriction to pure tensors) to the vector space of all V-valued bilinear functions on E × F.

PROOF. Restriction is a linear mapping from $\operatorname{Hom}_K(E \otimes_K F, V)$ to the vector space of all V-valued bilinear functions on E × F, and it is one-one since the image of E × F in $E \otimes_K F$ spans $E \otimes_K F$. It is onto since any bilinear function from E × F to V has a linear extension to $E \otimes_K F$, by Theorem 6.10.

Corollary 6.12. If E and F are vector spaces over K, then the vector space of all bilinear forms on E × F is canonically isomorphic to $(E \otimes_K F)'$, the dual of the vector space $E \otimes_K F$.

PROOF. This is the special case of Corollary 6.11 in which V = K.
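The remarks about pure tensors and about the basis $u_i \otimes v_j$ can be made concrete in coordinates. The sketch below assumes E = F = Q² with their standard bases and stores an element of the tensor product as a 2-by-2 array of coefficients of the $u_i \otimes v_j$; the function names are illustrative only. Under these assumptions a coefficient array is a pure tensor exactly when it has rank at most 1, which for a 2-by-2 array means that its determinant vanishes.

```python
def pure_tensor(e, f):
    """Coefficients of e ⊗ f relative to the basis u_i ⊗ v_j, as a (dim E)-by-(dim F) array."""
    return [[x * y for y in f] for x in e]

def add(s, t):
    """Sum of two elements of the tensor product, coefficient by coefficient."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(s, t)]

def is_pure_2x2(t):
    """For dim E = dim F = 2: t is a pure tensor exactly when its determinant is 0."""
    return t[0][0] * t[1][1] - t[0][1] * t[1][0] == 0

e1, e2, f = [1, 2], [3, -1], [4, 7]

# Bilinearity in the first variable: (e1 + e2) ⊗ f = e1 ⊗ f + e2 ⊗ f.
lhs = pure_tensor([a + b for a, b in zip(e1, e2)], f)
rhs = add(pure_tensor(e1, f), pure_tensor(e2, f))
print(lhs == rhs)

# u_1 ⊗ v_1 + u_2 ⊗ v_2 is a sum of two pure tensors but is not itself pure.
diagonal = add(pure_tensor([1, 0], [1, 0]), pure_tensor([0, 1], [0, 1]))
print(is_pure_2x2(pure_tensor(e1, f)), is_pure_2x2(diagonal))
```

The array has (dim E)(dim F) = 4 entries, matching the dimension count stated in the text.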

Corollary 6.13. If E, F, and V are vector spaces over K, then there is a canonical K linear isomorphism Φ of left side to right side in

\[
\operatorname{Hom}_K(E \otimes_K F, V) \cong \operatorname{Hom}_K(E, \operatorname{Hom}_K(F, V))
\]

such that $\Phi(\varphi)(e)(f) = \varphi(e \otimes f)$ for all $\varphi \in \operatorname{Hom}_K(E \otimes_K F, V)$, e ∈ E, and f ∈ F.

REMARK. This result is just a restatement of Corollary 6.11, but let us prove it anyway, writing the proof in the language of the statement.

PROOF. The map Φ is well defined and K linear, and it carries the left side to the right side. For ψ in the right side, define $\Psi(\psi)(e, f) = \psi(e)(f)$. Then $\Psi(\psi)$ is a bilinear map from E × F into V, and we let $\widetilde{\Psi}(\psi)$ be the linear extension from $E \otimes_K F$ into V given in Theorem 6.10. Then $\widetilde{\Psi}$ is a two-sided inverse to Φ, and the corollary follows.
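Corollary 6.13 is the linear-algebra form of currying. The following is a minimal sketch under simplifying assumptions that are not in the text: E = Q², F = Q³, V = Q, and elements of the tensor product are stored as 2-by-3 coefficient arrays. The names `curry` and `uncurry` are hypothetical labels for the map of the corollary and its inverse.

```python
def pure_tensor(e, f):
    """Coefficients of e ⊗ f relative to the basis u_i ⊗ v_j."""
    return [[x * y for y in f] for x in e]

def curry(phi):
    """Send a linear map phi on E ⊗ F to the map e -> (f -> phi(e ⊗ f))."""
    return lambda e: (lambda f: phi(pure_tensor(e, f)))

def uncurry(psi, dim_e, dim_f):
    """Inverse direction: extend (e, f) -> psi(e)(f) linearly from the basis tensors."""
    def unit(n, i):
        return [1 if k == i else 0 for k in range(n)]
    def phi(t):
        return sum(t[i][j] * psi(unit(dim_e, i))(unit(dim_f, j))
                   for i in range(dim_e) for j in range(dim_f))
    return phi

# Hypothetical example: the linear functional on Q^2 ⊗ Q^3 with coefficients c[i][j].
c = [[1, 2, 3], [4, 5, 6]]

def phi(t):
    return sum(c[i][j] * t[i][j] for i in range(2) for j in range(3))

psi = curry(phi)                 # element of Hom(E, Hom(F, K))
phi_again = uncurry(psi, 2, 3)   # back in Hom(E ⊗ F, K)

t = [[1, 0, -1], [2, 7, 0]]      # an element that is not a pure tensor
print(psi([1, 1])([0, 1, 0]), phi(t), phi_again(t))
```

The round trip returns the original functional, which is the two-sided inverse property used in the proof.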


Let us now make (E, F) → E ⊗K F into a covariant functor. If (E 1 , F1 ) and (E 2 , F2 ) are objects in V × V, i.e., if they are two ordered pairs of vector spaces, then a morphism from the ﬁrst to the second is a pair (L , M) of linear maps of the form L : E 1 → E 2 and M : F1 → F2 . To (L , M), we are to associate a linear map from E 1 ⊗K F1 into E 2 ⊗K F2 ; this linear map will be denoted by L ⊗ M. We use Corollary 6.11 to deﬁne L ⊗ M as the member of HomK (E 1 ⊗K F1 , E 2 ⊗K F2 ) that corresponds under restriction to the bilinear map (e1 , f 1 ) → L(e1 ) ⊗ M( f 1 ) of E 1 × F1 into E 2 ⊗K F2 . In terms of pure tensors, the map L ⊗ M satisﬁes (L ⊗ M)(e1 ⊗ f 1 ) = L(e1 ) ⊗ M( f 1 ), and this formula completely determines L ⊗ M because of the uniqueness of linear extensions of bilinear maps. To check that this deﬁnition of the effect of tensor product on pairs of linear maps makes (E, F) → E ⊗K F into a covariant functor, we have to check the effect on the identity map and the effect on composition. For the effect on the identity map (1 E1 , 1 F1 ) when E 1 = E 2 and F1 = F2 , we see from the above displayed formula that (1 E1 ⊗ 1 F1 )(e1 ⊗ f 1 ) = 1 E1 (e1 ) ⊗ 1 F1 ( f 1 ) = e1 ⊗ f 1 = 1 E1 ⊗K F1 (e1 ⊗ f 1 ). Since elements of the form e1 ⊗ f 1 span E 1 ⊗K F1 , we conclude that 1 E1 ⊗ 1 F1 = 1 E1 ⊗K F1 . For the effect on composition, let (L 1 , M1 ) : (E 1 , F1 ) → (E 2 , F2 ) and (L 2 , M2 ) : (E 2 , F2 ) → (E 3 , F3 ) be given. Then we have (L 2 ⊗ M2 )(L 1 ⊗ M1 )(e1 ⊗ f 1 ) = (L 2 ⊗ M2 )(L 1 (e1 ) ⊗ M1 ( f 1 )) = (L 2 L 1 )(e1 ) ⊗ (M2 M1 )( f 1 ) = (L 2 L 1 ⊗ M2 M1 )(e1 ⊗ f 1 ). Since elements of the form e1 ⊗ f 1 span E 1 ⊗K F1 , we conclude that (L 2 ⊗ M2 )(L 1 ⊗ M1 ) = L 2 L 1 ⊗ M2 M1 . Therefore (E, F) → E ⊗K F is a covariant functor. In particular, E → E ⊗K F and F → E ⊗K F are covariant functors from V into itself. 
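In coordinates, and assuming the basis $u_i \otimes v_j$ is ordered lexicographically, the matrix of L ⊗ M is the Kronecker product of the matrices of L and M. The sketch below uses illustrative helper names and arbitrary sample matrices to check the two functor identities just derived: the identity goes to the identity, and compositions go to compositions.

```python
def matmul(a, b):
    """Ordinary matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker product: rows indexed by pairs (i, k), columns by pairs (j, l)."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

# L1 : E1 -> E2 and L2 : E2 -> E3, all of dimension 2.
L1 = [[1, 2], [0, 1]]
L2 = [[3, 0], [1, -1]]
# M1 : F1 -> F2 (dimension 2 to 3) and M2 : F2 -> F3 (dimension 3 to 2).
M1 = [[1, 0], [2, 1], [0, 5]]
M2 = [[1, 1, 0], [0, 2, -1]]

# Effect on identities: 1 ⊗ 1 is the identity of the tensor product.
print(kron(identity(2), identity(3)) == identity(6))

# Effect on composition: (L2 ⊗ M2)(L1 ⊗ M1) = L2 L1 ⊗ M2 M1.
composed = matmul(kron(L2, M2), kron(L1, M1))
print(composed == kron(matmul(L2, L1), matmul(M2, M1)))
```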
For these two functors from V into itself, the effect on linear mappings is especially nice, namely that

\[
\begin{aligned}
L_1 \mapsto L_1 \otimes M_1 \quad &\text{is K linear from } \operatorname{Hom}_K(E_1, E_2) \text{ into } \operatorname{Hom}_K(E_1 \otimes_K F_1, E_2 \otimes_K F_2), \\
M_1 \mapsto L_1 \otimes M_1 \quad &\text{is K linear from } \operatorname{Hom}_K(F_1, F_2) \text{ into } \operatorname{Hom}_K(E_1 \otimes_K F_1, E_2 \otimes_K F_2).
\end{aligned}
\]

To prove the first of these assertions, for example, we observe that the sum of the linear extensions of

$(e_1, f_1) \mapsto L_1(e_1) \otimes M_1(f_1)$ and $(e_1, f_1) \mapsto L_1'(e_1) \otimes M_1(f_1)$


is a linear extension of $(e_1, f_1) \mapsto (L_1 + L_1')(e_1) \otimes M_1(f_1)$, and the uniqueness in the universal mapping property implies that $(L_1 + L_1') \otimes M_1 = L_1 \otimes M_1 + L_1' \otimes M_1$. Similar remarks apply to multiplication by scalars.

Let us mention some identities satisfied by $\otimes_K$. There is a canonical isomorphism

\[
E \otimes_K F \cong F \otimes_K E
\]

given by taking the linear extension of $(e, f) \mapsto f \otimes e$ as the map from left to right. The linear extension of $(f, e) \mapsto e \otimes f$ gives a two-sided inverse.

Category theory has a way of capturing the idea that this isomorphism is systematic, rather than randomly dependent on E and F. The two sides of the above isomorphism may be regarded as the values of the covariant functors $(E, F) \mapsto E \otimes_K F$ and $(E, F) \mapsto F \otimes_K E$. The notion in category theory capturing "systematic" is called "naturality." It makes precise the fact that the system of isomorphisms respects linear maps, as well as the vector spaces. Here is the general definition. Its usefulness will be examined later in this section.

Let C and D be two categories, and let Φ : C → D and Ψ : C → D be covariant functors. Suppose that for each X in Obj(C), a morphism $T_X$ in $\operatorname{Morph}_D(\Phi(X), \Psi(X))$ is given. Then the system $\{T_X\}$ is called a natural transformation of Φ into Ψ if for each pair of objects $X_1$ and $X_2$ in C and each h in $\operatorname{Morph}_C(X_1, X_2)$, the diagram in Figure 6.3 commutes. If furthermore each $T_X$ is an isomorphism, then it is immediate that the system $\{T_X^{-1}\}$ is a natural transformation of Ψ into Φ, and we say that $\{T_X\}$ is a natural isomorphism.

\[
\begin{array}{ccc}
\Phi(X_1) & \xrightarrow{\ \Phi(h)\ } & \Phi(X_2) \\
{\scriptstyle T_{X_1}}\big\downarrow & & \big\downarrow{\scriptstyle T_{X_2}} \\
\Psi(X_1) & \xrightarrow{\ \Psi(h)\ } & \Psi(X_2)
\end{array}
\]

FIGURE 6.3. Commutative diagram of a natural transformation $\{T_X\}$.

If Φ and Ψ are contravariant functors, then the system $\{T_X\}$ is called a natural transformation of Φ into Ψ if the diagram obtained from Figure 6.3 by reversing the horizontal arrows commutes. The system is a natural isomorphism if furthermore each $T_X$ is an isomorphism.

In the case we are studying, we have C = V × V and D = V. Objects X in C are pairs (E, F) of vector spaces, and Φ and Ψ are the covariant functors with $\Phi(E, F) = E \otimes_K F$ and $\Psi(E, F) = F \otimes_K E$. The mapping $T_{(E,F)} : E \otimes_K F \to F \otimes_K E$ is uniquely determined by the condition that $T_{(E,F)}(e \otimes f) = f \otimes e$ for all e ∈ E and f ∈ F. A morphism of pairs from $(E_1, F_1)$ to $(E_2, F_2)$ is of


the form h = (L, M) with $L \in \operatorname{Hom}_K(E_1, E_2)$ and $M \in \operatorname{Hom}_K(F_1, F_2)$. Our constructions above show that

\[
\Phi(L, M) = L \otimes M \in \operatorname{Hom}_K(E_1 \otimes_K F_1, E_2 \otimes_K F_2)
\quad\text{and}\quad
\Psi(L, M) = M \otimes L \in \operatorname{Hom}_K(F_1 \otimes_K E_1, F_2 \otimes_K E_2).
\]

In Figure 6.3 the two routes from top left to bottom right in the diagram have

\[
T_{(E_2,F_2)}\Phi(L, M)(e_1 \otimes f_1) = T_{(E_2,F_2)}(L \otimes M)(e_1 \otimes f_1) = T_{(E_2,F_2)}(L(e_1) \otimes M(f_1)) = M(f_1) \otimes L(e_1)
\]

and

\[
\Psi(L, M)T_{(E_1,F_1)}(e_1 \otimes f_1) = \Psi(L, M)(f_1 \otimes e_1) = (M \otimes L)(f_1 \otimes e_1) = M(f_1) \otimes L(e_1).
\]

The results are equal, and therefore the diagram commutes. Consequently the isomorphism $E \otimes_K F \cong F \otimes_K E$ is natural in the pair (E, F).

Another canonical isomorphism of interest is

\[
E \otimes_K K \cong E.
\]

Here the map from left to right is the linear extension of $(e, c) \mapsto ce$, while the map from right to left is $e \mapsto e \otimes 1$. In view of the previous canonical isomorphism, we have $K \otimes_K E \cong E$ also. Each of these isomorphisms is natural in E.

Next let us consider how $\otimes_K$ interacts with direct sums. The result is that tensor product distributes over direct sums, even infinite direct sums:

\[
E \otimes_K \Big(\bigoplus_{s \in S} F_s\Big) \cong \bigoplus_{s \in S} (E \otimes_K F_s).
\]

The map from left to right is the linear extension of the bilinear map $(e, \{f_s\}_{s \in S}) \mapsto \{e \otimes f_s\}_{s \in S}$. For the definition of the inverse, the constructions of Section II.6 show that we have only to define the map on each $E \otimes_K F_s$, where it is the linear extension of $(e, f_s) \mapsto e \otimes i_s(f_s)$; here $i_{s_0} : F_{s_0} \to \bigoplus_s F_s$ is the one-one linear map carrying the $s_0$th vector space into the direct sum. Once again it is possible to prove that the isomorphism is natural; we omit the details.

It follows from the displayed isomorphism and the isomorphism $E \otimes_K K \cong E$ that if $\{x_i\}$ is a basis of E and $\{y_j\}$ is a basis of F, then $\{x_i \otimes y_j\}$ is a basis of $E \otimes_K F$. This proves the following result.


Proposition 6.14. If E and F are vector spaces over K, then $\dim(E \otimes_K F) = (\dim E)(\dim F)$. If $\{y_j\}$ is a basis of F, then the most general member of $E \otimes_K F$ is of the form $\sum_j e_j \otimes y_j$ with all $e_j$ in E.

We turn to a consideration of $\operatorname{Hom}_K$ from the point of view of functors. In the examples in Section IV.11, we saw that $V \mapsto \operatorname{Hom}_K(U, V)$ is a covariant functor from $\mathcal{V}$ to itself and that $U \mapsto \operatorname{Hom}_K(U, V)$ is a contravariant functor from $\mathcal{V}$ to itself. If we are not squeamish about mixing the two types (covariant and contravariant), then we can consider $(U, V) \mapsto \operatorname{Hom}_K(U, V)$ as a functor³ from $\mathcal{V} \times \mathcal{V}$ to $\mathcal{V}$. At any rate if L is in $\operatorname{Hom}_K(U_1, U_2)$ and M is in $\operatorname{Hom}_K(V_1, V_2)$, then Hom(L, M) carries $\operatorname{Hom}_K(U_2, V_1)$ into $\operatorname{Hom}_K(U_1, V_2)$ and is given by

\[
\operatorname{Hom}(L, M)(h) = M h L \qquad \text{for } h \in \operatorname{Hom}_K(U_2, V_1).
\]

It is evident that the result is K linear as a function of h, and hence Hom(L, M) is in

\[
\operatorname{Hom}_K\big(\operatorname{Hom}_K(U_2, V_1), \operatorname{Hom}_K(U_1, V_2)\big).
\]

When we look for analogs for the functor $\operatorname{Hom}_K$ of the identity $E \otimes_K K \cong E$ for the functor $\otimes_K$, we are led to two identities. One is just the definition of the dual of a vector space: $\operatorname{Hom}_K(U, K) = U'$. The other is the natural isomorphism

\[
\operatorname{Hom}_K(K, V) \cong V.
\]

In the proof of the latter identity, the mapping from left to right is given by sending a linear h : K → V to h(1), and the mapping from right to left is given by sending v in V to h with h(c) = cv.

Next let us consider how $\operatorname{Hom}_K$ interacts with direct sums and direct products. The construction $\operatorname{Hom}_K(U, V)$ distributes over finite direct sums in each variable, but the situation with infinite direct sums or direct products is more subtle. Valid identities are

\[
\operatorname{Hom}_K\Big(\bigoplus_{s \in S} U_s, V\Big) \cong \prod_{s \in S} \operatorname{Hom}_K(U_s, V)
\]

and

\[
\operatorname{Hom}_K\Big(U, \prod_{s \in S} V_s\Big) \cong \prod_{s \in S} \operatorname{Hom}_K(U, V_s),
\]

3 Readers who prefer to be careful about this point can regard U as in the category V opp deﬁned in Problems 78–80 at the end of Chapter IV. Then (U, V ) → HomK (U, V ) is a covariant functor from V opp × V into V.


and these are natural isomorphisms. Proofs of these identities for all S and counterexamples related to them when S is infinite appear in Problems 7–8 at the end of the chapter.

We have already checked that the isomorphism $E \otimes_K F \cong F \otimes_K E$ is natural in (E, F), and we have asserted naturality in some other situations in which it is easy to check. The next proposition asserts naturality for the identity of Corollary 6.13, which combines $\otimes_K$ and $\operatorname{Hom}_K$ in a nontrivial way. After the proof of the result, we shall digress for a moment to indicate the usefulness of natural isomorphisms.

Proposition 6.15. Let E, F, V, $E_1$, $F_1$, and $V_1$ be vector spaces over K, and let $L_{E_1} : E_1 \to E$, $L_{F_1} : F_1 \to F$, and $L_V : V \to V_1$ be K linear maps. Then the isomorphism Φ of Corollary 6.13 is natural in the sense that the diagram

\[
\begin{array}{ccc}
\operatorname{Hom}_K(E \otimes_K F, V) & \xrightarrow{\ \Phi\ } & \operatorname{Hom}_K(E, \operatorname{Hom}_K(F, V)) \\
\big\downarrow{\scriptstyle \operatorname{Hom}(L_{E_1} \otimes L_{F_1},\, L_V)} & & \big\downarrow{\scriptstyle \operatorname{Hom}(L_{E_1},\, \operatorname{Hom}(L_{F_1}, L_V))} \\
\operatorname{Hom}_K(E_1 \otimes_K F_1, V_1) & \xrightarrow{\ \Phi\ } & \operatorname{Hom}_K(E_1, \operatorname{Hom}_K(F_1, V_1))
\end{array}
\]

commutes.

REMARKS. Observe that the first two linear maps $L_{E_1}$ and $L_{F_1}$ go in the opposite direction to the two vertical maps, while $L_V$ goes in the same direction as the vertical maps. This is a reflection of the fact that both sides of the identity in Corollary 6.13 are contravariant in the first two variables and covariant in the third variable.

PROOF. For φ in $\operatorname{Hom}_K(E \otimes_K F, V)$, $e_1$ in $E_1$, and $f_1$ in $F_1$, we have

\[
\begin{aligned}
(\operatorname{Hom}(L_{E_1}, \operatorname{Hom}(L_{F_1}, L_V)) \circ \Phi)(\varphi)(e_1)(f_1)
&= (\operatorname{Hom}(L_{F_1}, L_V) \circ \Phi(\varphi) \circ L_{E_1})(e_1)(f_1) \\
&= (\operatorname{Hom}(L_{F_1}, L_V) \circ (\Phi(\varphi) \circ L_{E_1}))(e_1)(f_1) \\
&= L_V\big(\Phi(\varphi)(L_{E_1}(e_1))(L_{F_1}(f_1))\big) \\
&= L_V\big(\varphi(L_{E_1}(e_1) \otimes L_{F_1}(f_1))\big) \\
&= (L_V \circ \varphi \circ (L_{E_1} \otimes L_{F_1}))(e_1 \otimes f_1) \\
&= (\operatorname{Hom}(L_{E_1} \otimes L_{F_1}, L_V)(\varphi))(e_1 \otimes f_1) \\
&= (\Phi \circ \operatorname{Hom}(L_{E_1} \otimes L_{F_1}, L_V))(\varphi)(e_1)(f_1).
\end{aligned}
\]

This proves the proposition.


Let us now discuss naturality in a wider context. In a general category D, if we have two objects U and U′ such that Morph(U, V) and Morph(U′, V) have the same cardinality for each object V, then we cannot really say anything about the relationship between U and U′. But under a hypothesis that the isomorphism of sets has a certain naturality to it, then, according to Proposition 6.16 below, U and U′ are isomorphic objects. Thus naturality of a system of weak-looking set-theoretic isomorphisms can lead to a much stronger-looking isomorphism. Corollary 6.17 goes on to make a corresponding assertion about functors. The assertion about functors in the corollary is a helpful tool for establishing natural isomorphisms of functors, and an example appears below in Proposition 6.20.

Proposition 6.16. Let D be a category, and suppose that U and U′ are objects in D with the following property: to each object V in D corresponds a one-one onto function $T_V : \operatorname{Morph}(U, V) \to \operatorname{Morph}(U', V)$ with the system $\{T_V\}$ natural in V in the sense that whenever σ is in Morph(V, V′), then the diagram

\[
\begin{array}{ccc}
\operatorname{Morph}(U, V) & \xrightarrow{\ T_V\ } & \operatorname{Morph}(U', V) \\
\big\downarrow{\scriptstyle \text{left-by-}\sigma} & & \big\downarrow{\scriptstyle \text{left-by-}\sigma} \\
\operatorname{Morph}(U, V') & \xrightarrow{\ T_{V'}\ } & \operatorname{Morph}(U', V')
\end{array}
\]

commutes. Then U is isomorphic to U′ as an object in D, an isomorphism from U to U′ being the member $T_{U'}^{-1}(1_{U'})$ of Morph(U, U′).

REMARKS.
(1) Another way of formulating this result is as follows: Let D be any category, let S be the category of sets, and let U and U′ be objects in D. Define a covariant functor $H_U : D \to S$ by $H_U(V) = \operatorname{Morph}_D(U, V)$ and $H_U(\sigma) = \text{left-by-}\sigma$ for $\sigma \in \operatorname{Morph}_D(V, V')$, and define $H_{U'}$ similarly. If $H_U$ and $H_{U'}$ are naturally isomorphic functors, then U and U′ are isomorphic objects in D.
(2) A similar result is valid when $H_U$ and $H_{U'}$ are contravariant functors, $H_U$ being defined by $H_U(V) = \operatorname{Hom}_D(V, U)$ and $H_U(\sigma) = \text{right-by-}\sigma$ for $\sigma \in \operatorname{Morph}_D(V, V')$. The result in this case follows immediately by applying Proposition 6.16 to the opposite category $D^{\mathrm{opp}}$ of D as defined in Problems 78–80 at the end of Chapter IV.

PROOF. Let φ be the element $T_{U'}^{-1}(1_{U'})$ of Morph(U, U′), and let ψ be the element $T_U(1_U)$ of Morph(U′, U). To prove the proposition, it is enough to show that $\varphi\psi = 1_{U'}$ and $\psi\varphi = 1_U$.


For σ in Morph(V, V′), form the commutative diagram in the statement of the proposition. The commutativity says that

\[
\sigma T_V(h) = T_{V'}(\sigma h) \qquad \text{for } h \in \operatorname{Morph}(U, V). \tag{$*$}
\]

Taking V = U, V′ = U′, σ = φ, and $h = 1_U$ in (∗) proves the second equality of the chain

\[
\varphi\psi = \varphi T_U(1_U) = T_{U'}(\varphi 1_U) = T_{U'}(\varphi) = 1_{U'}.
\]

Taking V = U′, V′ = U, σ = ψ, and h = φ in (∗) proves the first equality of the chain

\[
T_U(\psi\varphi) = \psi T_{U'}(\varphi) = \psi 1_{U'} = \psi = T_U(1_U).
\]

Applying $T_U^{-1}$, we obtain $\psi\varphi = 1_U$, as required.

Corollary 6.17. Let C and D be categories, and let F : C → D and G : C → D be covariant functors. Suppose that to each pair of objects (A, V) in C × D corresponds a one-one onto function $T_{A,V} : \operatorname{Morph}(F(A), V) \to \operatorname{Morph}(G(A), V)$ with the system $\{T_{A,V}\}$ natural in (A, V). Then the functors F and G are naturally isomorphic.

REMARKS. A similar result is valid if $T_{A,V}$ carries Morph(V, F(A)) to Morph(V, G(A)) and/or if F and G are contravariant. To handle these situations, we apply the corollary to the opposite categories $D^{\mathrm{opp}}$ and/or $C^{\mathrm{opp}}$, as defined in Problems 78–80 at the end of Chapter IV, instead of to the categories D and/or C.

PROOF. By Proposition 6.16 and the hypotheses, the member $T_{A,G(A)}^{-1}(1_{G(A)})$ of $\operatorname{Morph}_D(F(A), G(A))$ is an isomorphism. We are to prove that the system $\{T_{A,G(A)}^{-1}(1_{G(A)})\}$ is natural in A. If σ in $\operatorname{Morph}_C(A, A')$ is given, then the naturality of $T_{A,V}$ in the V variable implies that the diagram

\[
\begin{array}{ccc}
\operatorname{Morph}_D(F(A), G(A)) & \xrightarrow{\ T_{A,G(A)}\ } & \operatorname{Morph}_D(G(A), G(A)) \\
\big\downarrow{\scriptstyle \text{left-by-}G(\sigma)} & & \big\downarrow{\scriptstyle \text{left-by-}G(\sigma)} \\
\operatorname{Morph}_D(F(A), G(A')) & \xrightarrow{\ T_{A,G(A')}\ } & \operatorname{Morph}_D(G(A), G(A'))
\end{array}
\]

commutes. Evaluating at $T_{A,G(A)}^{-1}(1_{G(A)}) \in \operatorname{Morph}_D(F(A), G(A))$ the two equal compositions in the diagram, we obtain

\[
G(\sigma) = G(\sigma)1_{G(A)} = T_{A,G(A')}\big(G(\sigma)\,T_{A,G(A)}^{-1}(1_{G(A)})\big). \tag{$*$}
\]


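With σ as above, the naturality of $T_{A,V}$ in the A variable implies that the diagram

\[
\begin{array}{ccc}
\operatorname{Morph}_D(F(A'), G(A')) & \xrightarrow{\ T_{A',G(A')}\ } & \operatorname{Morph}_D(G(A'), G(A')) \\
\big\downarrow{\scriptstyle \text{right-by-}F(\sigma)} & & \big\downarrow{\scriptstyle \text{right-by-}G(\sigma)} \\
\operatorname{Morph}_D(F(A), G(A')) & \xrightarrow{\ T_{A,G(A')}\ } & \operatorname{Morph}_D(G(A), G(A'))
\end{array}
\]

commutes. Evaluating at $T_{A',G(A')}^{-1}(1_{G(A')}) \in \operatorname{Morph}_D(F(A'), G(A'))$ the two equal compositions in the diagram, we obtain

\[
G(\sigma) = 1_{G(A')}G(\sigma) = T_{A,G(A')}\big(T_{A',G(A')}^{-1}(1_{G(A')})\,F(\sigma)\big). \tag{$**$}
\]

Equations (∗) and (∗∗), together with the fact that $T_{A,G(A')}$ is invertible, say that

\[
G(\sigma)\,T_{A,G(A)}^{-1}(1_{G(A)}) = T_{A',G(A')}^{-1}(1_{G(A')})\,F(\sigma).
\]

In other words, the isomorphism $\widetilde{T}_A \in \operatorname{Morph}_D(F(A), G(A))$ given by $\widetilde{T}_A = T_{A,G(A)}^{-1}(1_{G(A)})$ makes the diagram

\[
\begin{array}{ccc}
F(A) & \xrightarrow{\ \widetilde{T}_A\ } & G(A) \\
\big\downarrow{\scriptstyle F(\sigma)} & & \big\downarrow{\scriptstyle G(\sigma)} \\
F(A') & \xrightarrow{\ \widetilde{T}_{A'}\ } & G(A')
\end{array}
\]

commute. Thus F is naturally isomorphic to G.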

Tensor product provides a device for converting a real vector space canonically into a complex vector space, so that a basis over R in the original space becomes a basis over C in the new space. If E is the given real vector space, then the complex vector space, called the complexiﬁcation of E, is the space E C = E ⊗R C with multiplication by a complex number c in E C deﬁned to be 1 ⊗ (z → cz). This construction works more generally when we have any inclusion of ﬁelds K ⊆ L. In this situation, L becomes a vector space over K if scalar multiplication K × L → L is deﬁned as the restriction of the multiplication L × L → L within L. For any vector space E over K, we deﬁne E L = E ⊗K L, initially as a vector space over K. For c ∈ L, we then deﬁne (multiplication by c in E ⊗K L) = 1 ⊗ (multiplication by c in L).


The above identities concerning tensor products of linear maps allow one easily to prove the following identities:

\[
c_1(c_2 v) = (c_1 c_2)v, \qquad c(u + v) = cu + cv, \qquad (c_1 + c_2)v = c_1 v + c_2 v, \qquad 1v = v.
\]

Together these identities say that $E^L = E \otimes_K L$, with its vector-space addition and the above definition of multiplication by scalars in L, is a vector space over L. The further identity

\[
c(e \otimes 1) = ce \otimes 1 \qquad \text{if } c \text{ is in } K \text{ and } e \text{ is in } E
\]

shows that its scalar multiplication is consistent with scalar multiplication in E when the scalars are in K and E is identified with the subset E ⊗ 1 of $E^L$.

Let us say that the pair $(E^L, \iota)$, where $\iota : E \to E^L$ is the mapping $e \mapsto e \otimes 1$, is obtained by extension of scalars. This construction is characterized by a universal mapping property as follows.

Proposition 6.18. Let K ⊆ L be an inclusion of fields, and let E be a vector space over K.
(a) If $(E^L, \iota)$ is formed by extension of scalars, then $(E^L, \iota)$ has the following universal mapping property: whenever U is a vector space over L and φ : E → U is a K linear map, there exists a unique L linear map $\Phi : E^L \to U$ such that Φι = φ.
(b) Suppose that (V, j) is any pair in which V is a vector space over L and j : E → V is a K linear function such that the following universal mapping property holds: whenever U is a vector space over L and φ : E → U is a K linear map, there exists a unique L linear map Φ : V → U such that Φj = φ. Then there exists a unique isomorphism $\Psi : E^L \to V$ of L vector spaces such that Ψι = j.

PROOF. In (a), for the uniqueness of Φ, we must have $\Phi(e \otimes c) = c\,\Phi(e \otimes 1) = c(\Phi\iota)(e) = c\varphi(e)$. Hence Φ is determined by φ on pure tensors in $E \otimes_K L$ and therefore everywhere. For existence let $\Phi : E \otimes_K L \to U$ be the K linear extension of the K bilinear function of E × L into U given by

\[
(e, c) \mapsto c\varphi(e) \qquad \text{for } e \in E \text{ and } c \in L.
\]


In the L vector space $E \otimes_K L$, multiplication by a member $c_0$ of L is defined to be 1 ⊗ (multiplication by $c_0$). On a pure tensor e ⊗ c, we therefore have

\[
\Phi(c_0(e \otimes c)) = \Phi(e \otimes c_0 c) = (c_0 c)\varphi(e) = c_0(c\varphi(e)) = c_0(\Phi(e \otimes c)).
\]

Since $E \otimes_K L$ is generated by pure tensors, Φ is L linear. By the construction of Φ, $\varphi(e) = \Phi(e \otimes 1) = (\Phi\iota)(e)$. Thus Φ has the required properties.

In (b), let (V, j) have the same universal mapping property as $(E^L, \iota)$. We apply the universal mapping property of $(E^L, \iota)$ to the K linear map j : E → V to obtain an L linear $\Psi : E^L \to V$ with Ψι = j, and we apply the universal mapping property of (V, j) to the K linear map $\iota : E \to E^L$ to obtain an L linear $\Phi : V \to E^L$ with Φj = ι. From (ΦΨ)ι = Φj = ι and $1_{E^L}\iota = \iota$, the uniqueness in the universal mapping property for $(E^L, \iota)$ implies $\Phi\Psi = 1_{E^L}$. Arguing similarly, we obtain $\Psi\Phi = 1_V$. Thus Ψ is an isomorphism with the required properties. If $\Psi' : E^L \to V$ is another isomorphism with Ψ′ι = j, then the argument just given shows that $\Phi\Psi' = 1_{E^L}$ and $\Psi'\Phi = 1_V$. Hence $\Psi' = \Phi^{-1} = \Psi$, and Ψ is unique.

To make $E \mapsto E^L$ into a covariant functor from vector spaces over K to vector spaces over L, we must examine the effect on linear maps. The tool is Proposition 6.18a. Thus let E and F be two vector spaces over K, and let M : E → F be a K linear map between them. We extend scalars for E and F. The proposition applies to the composition $E \to F \to F^L$ and shows that the composition extends uniquely to an L linear map from $E^L$ to $F^L$. A quick look at the proof shows that this L linear map is M ⊗ 1. Actually, we can see directly that M ⊗ 1 is indeed linear over L and not just over K: we just use our identity for compositions of tensor products to write

\[
(M \otimes 1)(I \otimes (\text{multiplication by } c)) = M \otimes (\text{multiplication by } c) = (I \otimes (\text{multiplication by } c))(M \otimes 1).
\]

In any event, the explicit form of the extended linear map as M ⊗ 1 shows immediately that the identity linear map goes to the identity and that compositions go to compositions.
Thus E → E L is a covariant functor. In the special case that the vector spaces are Kn and Km , extension of scalars has a particularly simple interpretation. The new spaces may be viewed as Ln and Lm . Thus column vectors with entries in K get replaced by column vectors with entries in L. What happens with linear mappings is even more transparent. A linear map M : E → F is given by an m-by-n matrix A with entries in K, and the linear map M ⊗ 1 : E L → F L is the one given by the same matrix A. Now the entries of A are to be regarded as members of the larger ﬁeld L. Viewed this


way, extension of scalars might look as if it is dependent on choices of bases, but the tensor-product formalism shows that it is not. A related notion to extension of scalars is that of restriction of scalars. Again with an inclusion K ⊆ L of ﬁelds, a vector space E over the larger ﬁeld L becomes a vector space E K over the smaller ﬁeld K by ignoring unnecessary scalar multiplications. Although this notion is related to extension of scalars, it is not inverse to it. For example, if the two ﬁelds are R and C and if we start with an n-dimensional vector space E over R, then E C is a complex vector space of dimension n and (E C )R is a real vector space of dimension 2n. We thus do not get back to the original space E.
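For the inclusion R ⊆ C, the matrix description of extension of scalars and the dimension count for restriction of scalars can be checked directly. The following is a small sketch with illustrative names and sample data that are not from the text: a real 3-by-2 matrix plays the role of M, the same matrix applied to complex column vectors plays the role of M ⊗ 1, and `restrict_scalars` records a vector of C^n by its 2n real coordinates.

```python
def apply(matrix, vector):
    """Apply a matrix (list of rows) to a column vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def restrict_scalars(v):
    """View a vector of C^n as a vector of R^(2n): real parts, then imaginary parts."""
    return [z.real for z in v] + [z.imag for z in v]

M = [[1, 2], [0, 3], [-1, 1]]   # real matrix: a linear map R^2 -> R^3
v = [1 + 2j, -1j]               # a vector in the complexification C^2
c = 2 - 1j                      # a complex scalar

# M ⊗ 1 commutes with multiplication by c, so it is linear over C, not just over R.
lhs = apply(M, [c * z for z in v])   # (M ⊗ 1)(c v)
rhs = [c * w for w in apply(M, v)]   # c (M ⊗ 1)(v)
print(lhs == rhs)

# Restriction of scalars doubles the real dimension: C^2 becomes R^4.
print(len(v), len(restrict_scalars(v)))
```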

7. Tensor Algebra

Just as polynomial rings are often used in the construction of more general commutative rings, so "tensor algebras" are often used in the construction of more general rings that may not be commutative. In this section we construct the "tensor algebra" of a vector space as a direct sum of iterated tensor products of the vector space with itself, and we establish its properties. We shall proceed with care, in order to provide a complete proof of the associativity of the multiplication.

Let A, B, and C be vector spaces over a field K. A triple tensor product $V = A \otimes_K B \otimes_K C$ is a vector space over K with a 3-linear map ι : A × B × C → V having the following universal mapping property: whenever t is a 3-linear mapping of A × B × C into a vector space U over K, then there exists a linear mapping T of V into U such that the diagram in Figure 6.4 commutes.

\[
\begin{array}{ccc}
A \times B \times C & \xrightarrow{\ t\ } & U \\
{\scriptstyle \iota}\big\downarrow & \nearrow{\scriptstyle T} & \\
V = A \otimes_K B \otimes_K C & &
\end{array}
\]

FIGURE 6.4. Commutative diagram of a triple tensor product.

The usual argument with universal mapping properties shows that there is at most one triple tensor product up to a well-determined isomorphism, and one can give an explicit construction of it that is similar to the one for ordinary tensor products $E \otimes_K F$. We shall not need that particular proof of existence since Proposition 6.19a below will give us an alternative argument. Once we have that statement, we shall use the uniqueness of triple tensor products to establish in Proposition 6.19b an associativity formula for ordinary iterated tensor products.


A shorter proof of Proposition 6.19b, which avoids Proposition 6.19a and uses naturality, will be given after the proof of Proposition 6.20.

Proposition 6.19. If $K$ is a field and $A$, $B$, $C$ are vector spaces over $K$, then
(a) $(A \otimes_K B) \otimes_K C$ and $A \otimes_K (B \otimes_K C)$ are triple tensor products,
(b) there exists a unique $K$ isomorphism $\Phi$ from left to right in
$$(A \otimes_K B) \otimes_K C \cong A \otimes_K (B \otimes_K C)$$
such that $\Phi((a \otimes b) \otimes c) = a \otimes (b \otimes c)$ for all $a \in A$, $b \in B$, and $c \in C$.

PROOF. In (a), consider $(A \otimes_K B) \otimes_K C$. Let $t : A \times B \times C \to U$ be 3-linear. For $c \in C$, define $t_c : A \times B \to U$ by $t_c(a, b) = t(a, b, c)$. Then $t_c$ is bilinear and hence extends to a linear $T_c : A \otimes_K B \to U$. Since $t$ is 3-linear, $t_{c_1+c_2} = t_{c_1} + t_{c_2}$ and $t_{xc} = x t_c$ for scalar $x$; thus uniqueness of the linear extension forces $T_{c_1+c_2} = T_{c_1} + T_{c_2}$ and $T_{xc} = x T_c$. Consequently $t' : (A \otimes_K B) \times C \to U$ given by $t'(d, c) = T_c(d)$ is bilinear and therefore extends to a linear $T : (A \otimes_K B) \otimes_K C \to U$. This $T$ proves existence of the linear extension of the given $t$. Uniqueness is trivial, since the elements $(a \otimes b) \otimes c$ span $(A \otimes_K B) \otimes_K C$. So $(A \otimes_K B) \otimes_K C$ is a triple tensor product. In a similar fashion, $A \otimes_K (B \otimes_K C)$ is a triple tensor product.

For (b), set up the diagram of the universal mapping property for a triple tensor product, using $V = (A \otimes_K B) \otimes_K C$, $U = A \otimes_K (B \otimes_K C)$, and $t(a, b, c) = a \otimes (b \otimes c)$. We have just seen in (a) that $V$ is a triple tensor product with $\iota(a, b, c) = (a \otimes b) \otimes c$. Thus there exists a linear $T : V \to U$ with $T\iota(a, b, c) = t(a, b, c)$. This equation means that $T((a \otimes b) \otimes c) = a \otimes (b \otimes c)$. Interchanging the roles of $(A \otimes_K B) \otimes_K C$ and $A \otimes_K (B \otimes_K C)$, we obtain a two-sided inverse for $T$. Thus $T$ will serve as $\Phi$ in (b), and existence is proved. Uniqueness is trivial, since the elements $(a \otimes b) \otimes c$ span $(A \otimes_K B) \otimes_K C$.

When there is no danger of confusion, Proposition 6.19 allows us to write a triple tensor product without parentheses as $A \otimes_K B \otimes_K C$.
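Proposition 6.19b can be checked in coordinates. The following Python sketch is only an illustration under stated assumptions: the spaces are finite-dimensional with fixed bases, a pure tensor is stored as its list of coordinates in the lexicographically ordered product basis, and the helper name `kron` is ours rather than anything from the text. With these conventions the isomorphism $\Phi$ acts as the identity on coordinate lists.

```python
def kron(x, y):
    """Coordinates of x (tensor) y in the lexicographically ordered product basis."""
    return [xi * yj for xi in x for yj in y]

# Hypothetical sample vectors in spaces of dimensions 2, 3, and 2.
a = [1, 2]
b = [3, 4, 5]
c = [6, 7]

left = kron(kron(a, b), c)    # coordinates of (a (x) b) (x) c
right = kron(a, kron(b, c))   # coordinates of a (x) (b (x) c)

print(left == right)
```

The two groupings agree entry by entry because the flattened index of the basis vector labeled $(i, j, k)$ is the same under either grouping.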
The same argument as in Corollaries 6.11 and 6.12 shows that the vector space of 3-linear forms on A × B ×C is canonically isomorphic to the dual of the vector space A ⊗K B ⊗K C. Just as with Corollary 6.13 and Proposition 6.15, the result of Proposition 6.19 can be improved by saying that the isomorphism is natural in the variables A, B, and C, as follows.


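Proposition 6.20. Let $A$, $B$, $C$, $A_1$, $B_1$, and $C_1$ be vector spaces over a field $K$, and let $L_A : A \to A_1$, $L_B : B \to B_1$, and $L_C : C \to C_1$ be linear maps. Then the isomorphism $\Phi$ of Proposition 6.19b is natural in the triple $(A, B, C)$ in the sense that the diagram
$$\begin{array}{ccc}
(A \otimes_K B) \otimes_K C & \xrightarrow{\ \Phi\ } & A \otimes_K (B \otimes_K C) \\
\Big\downarrow{\scriptstyle (L_A \otimes L_B) \otimes L_C} & & \Big\downarrow{\scriptstyle L_A \otimes (L_B \otimes L_C)} \\
(A_1 \otimes_K B_1) \otimes_K C_1 & \xrightarrow{\ \Phi\ } & A_1 \otimes_K (B_1 \otimes_K C_1)
\end{array}$$
commutes.

PROOF. We have
$$\begin{aligned}
((L_A \otimes (L_B \otimes L_C)) \circ \Phi)((a \otimes b) \otimes c)
&= (L_A \otimes (L_B \otimes L_C))(a \otimes (b \otimes c)) \\
&= L_A a \otimes (L_B \otimes L_C)(b \otimes c) \\
&= L_A a \otimes (L_B b \otimes L_C c) \\
&= \Phi((L_A a \otimes L_B b) \otimes L_C c) \\
&= \Phi((L_A \otimes L_B)(a \otimes b) \otimes L_C c) \\
&= (\Phi \circ ((L_A \otimes L_B) \otimes L_C))((a \otimes b) \otimes c),
\end{aligned}$$
and the proposition follows.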
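The naturality statement of Proposition 6.20 can also be tested numerically. The sketch below is a hedged illustration, assuming finite-dimensional spaces with fixed bases, linear maps stored as lists of rows, and lexicographically ordered product bases; the helper names `kron_vec`, `kron_mat`, and `apply` and the sample matrices are our own choices.

```python
def kron_vec(x, y):
    """Coordinates of the pure tensor x (x) y."""
    return [xi * yj for xi in x for yj in y]

def kron_mat(P, Q):
    """Matrix of P (x) Q relative to lexicographically ordered product bases."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical linear maps L_A : A -> A1, L_B : B -> B1, L_C : C -> C1.
L_A = [[1, 2], [0, 1]]          # dim 2 -> dim 2
L_B = [[1, 0, 3], [2, 1, 0]]    # dim 3 -> dim 2
L_C = [[4, 5]]                  # dim 2 -> dim 1

a, b, c = [1, -1], [2, 0, 1], [3, 7]

# Down the left side of the square, then across the bottom.
lhs = apply(kron_mat(kron_mat(L_A, L_B), L_C), kron_vec(kron_vec(a, b), c))
# Across the top, then down the right side.
rhs = kron_vec(apply(L_A, a), kron_vec(apply(L_B, b), apply(L_C, c)))

print(lhs == rhs)
```

In these coordinates both horizontal arrows are the identity on coordinate lists, so commutativity of the square reduces to the equality of the two matrices $(L_A \otimes L_B) \otimes L_C$ and $L_A \otimes (L_B \otimes L_C)$.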

The treatment of Propositions 6.19 and 6.20 can be shortened if we are willing to bypass the notion of a triple tensor product and use what was proved about naturality in the previous section. The result and the proof are as follows.

Proposition 6.20′. Let $A$, $B$, and $C$ be vector spaces over a field $K$. Then there is an isomorphism $\Phi : (A \otimes_K B) \otimes_K C \to A \otimes_K (B \otimes_K C)$ that is natural in the triple $(A, B, C)$ and satisfies $\Phi((a \otimes b) \otimes c) = a \otimes (b \otimes c)$.

PROOF. Writing $\cong$ for “naturally isomorphic in all variables” and applying Proposition 6.15 and other natural isomorphisms of the previous section repeatedly, we have
$$\begin{aligned}
\mathrm{Hom}_K((A \otimes_K B) \otimes_K C,\, V)
&\cong \mathrm{Hom}_K(A \otimes_K B,\, \mathrm{Hom}_K(C, V)) \\
&\cong \mathrm{Hom}_K(B,\, \mathrm{Hom}_K(A, \mathrm{Hom}_K(C, V))) \\
&\cong \mathrm{Hom}_K(B,\, \mathrm{Hom}_K(A \otimes_K C, V)) \\
&\cong \mathrm{Hom}_K(B,\, \mathrm{Hom}_K(C \otimes_K A, V)) \qquad \text{by symmetry} \\
&\cong \mathrm{Hom}_K((C \otimes_K B) \otimes_K A,\, V) \\
&\cong \mathrm{Hom}_K(A \otimes_K (C \otimes_K B),\, V) \\
&\cong \mathrm{Hom}_K(A \otimes_K (B \otimes_K C),\, V).
\end{aligned}$$


Then the existence of the natural isomorphism follows from Corollary 6.17. Using the explicit formula for the isomorphism in Proposition 6.16 and tracking matters down, we see that $\Phi((a \otimes b) \otimes c) = a \otimes (b \otimes c)$.

There is no difficulty in generalizing matters to $n$-fold tensor products by induction. An $n$-fold tensor product is to be universal for $n$-multilinear maps. Again it is unique up to canonical isomorphism, as one proves by an argument that runs along familiar lines. A direct construction of an $n$-fold tensor product is possible in the style of the proof for ordinary tensor products, but such a construction will not be needed. Instead, we can form an $n$-fold tensor product as the $(n-1)$-fold tensor product of the first $n-1$ spaces, tensored with the $n$th space. Proposition 6.19b allows us to regroup parentheses (inductively) in any fashion we choose, and the same argument as in Corollaries 6.11 and 6.12 yields the following proposition.

Proposition 6.21. If $E_1, \dots, E_n$, and $V$ are vector spaces over $K$, then the vector space $\mathrm{Hom}_K(E_1 \otimes_K \cdots \otimes_K E_n, V)$ is canonically isomorphic (via restriction to pure tensors) to the vector space of all $V$-valued $n$-multilinear functions on $E_1 \times \cdots \times E_n$. In particular the vector space of all $n$-multilinear forms on $E_1 \times \cdots \times E_n$ is canonically isomorphic to $(E_1 \otimes_K \cdots \otimes_K E_n)'$.

Iterated application of Proposition 6.20 shows that we get also a well-defined notion of a linear map $L_1 \otimes \cdots \otimes L_n$, the tensor product of $n$ linear maps. Thus $(E_1, \dots, E_n) \mapsto E_1 \otimes_K \cdots \otimes_K E_n$ is a functor. There is no need to write out the details.

We turn to the question of defining a multiplication operation on tensors. If $K$ is a field, an algebra$^4$ over $K$ is a vector space $V$ over $K$ with a multiplication or product operation $V \times V \to V$ that is $K$ bilinear. The additive part of the $K$ bilinearity means that the product operation satisfies the distributive laws
$$a(b + c) = ab + ac \quad \text{and} \quad (b + c)a = ba + ca \qquad \text{for all } a, b, c \text{ in } V,$$
and the scalar-multiplication part of the $K$ bilinearity means that
$$(ka)b = k(ab) = a(kb) \qquad \text{for all } k \text{ in } K \text{ and } a, b \text{ in } V.$$
Within the text of the book, we shall work mostly just with associative algebras, i.e., those algebras satisfying the usual associative law
$$a(bc) = (ab)c \qquad \text{for all } a, b, c \text{ in } V.$$

$^4$ Some authors use the term “algebra” to mean what we shall call an “associative algebra.”


An associative algebra is therefore a ring and a vector space, the scalar multiplication and the ring multiplication being linked by the requirement that $(ka)b = k(ab) = a(kb)$ for all scalars $k$. Some commutative examples of associative algebras over $K$ are any field $L$ containing $K$, the polynomial algebra $K[X_1, \dots, X_n]$, and the algebra of all $K$-valued functions on a nonempty set $S$. Two noncommutative examples of associative algebras over $K$ are the matrix algebra $M_n(K)$, with matrix multiplication as its product, and $\mathrm{Hom}_K(V, V)$ for any vector space $V$, with composition as its product. The division ring $\mathbb{H}$ of quaternions (Example 10 in Section IV.1) is another example of a noncommutative associative algebra over $\mathbb{R}$.

Despite our emphasis on algebras that are associative, certain kinds of nonassociative algebras are of great importance in applications, and consequently several problems at the end of the chapter make use of nonassociative algebras. A nonassociative algebra is determined by its vector-space structure and the multiplication table for the members of a $K$ basis. There is no restriction on the multiplication table; all multiplication tables define algebras. Perhaps the best-known nonassociative algebra is the 3-dimensional algebra over $\mathbb{R}$ determined by vector product in $\mathbb{R}^3$. A basis is $\{i, j, k\}$, the multiplication operation is denoted by $\times$, and the multiplication table is
$$\begin{array}{lll}
i \times i = 0, & i \times j = k, & i \times k = -j, \\
j \times i = -k, & j \times j = 0, & j \times k = i, \\
k \times i = j, & k \times j = -i, & k \times k = 0.
\end{array}$$
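The multiplication table for the vector product can be verified mechanically. The following Python sketch assumes the standard coordinate formula for the vector product in $\mathbb{R}^3$ with $i$, $j$, $k$ as the standard basis vectors; the function names `cross` and `neg` are ours.

```python
def cross(u, v):
    """Vector product of two vectors in R^3, given as 3-tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def neg(v):
    """Negative of a vector."""
    return tuple(-x for x in v)

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
zero = (0, 0, 0)

# A few entries of the multiplication table.
print(cross(i, j) == k, cross(i, k) == neg(j), cross(j, k) == i)

# The two groupings of i, i, k give different answers.
print(cross(i, cross(i, k)), cross(cross(i, i), k))
```

The last line exhibits $i \times (i \times k) = -k$ next to $(i \times i) \times k = 0$, so the product fails the associative law.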

Since $i \times (i \times k) = i \times (-j) = -k$ and $(i \times i) \times k = 0$, vector product is not associative. The vector-product algebra is a special case of a Lie algebra; Lie algebras are defined in Problems 31–35 at the end of the chapter.

Tensor algebras, which we shall now construct, will be associative algebras. Fix a vector space $E$ over $K$, and for integers $n \ge 1$, let $T^n(E)$ be the $n$-fold tensor product of $E$ with itself. In the case $n = 0$, we let $T^0(E)$ be the field $K$. Define, initially as a vector space, $T(E)$ to be the direct sum
$$T(E) = \bigoplus_{n=0}^{\infty} T^n(E).$$
The elements that lie in one or another $T^n(E)$ are called homogeneous. We define a bilinear multiplication on homogeneous elements
$$T^m(E) \times T^n(E) \to T^{m+n}(E)$$
to be the restriction of the canonical isomorphism
$$T^m(E) \otimes_K T^n(E) \to T^{m+n}(E)$$
resulting from iterating Proposition 6.19b. This multiplication, denoted by $\otimes$, is associative, as far as it goes, because the restriction of the $K$ isomorphism
$$T^l(E) \otimes_K (T^m(E) \otimes_K T^n(E)) \to (T^l(E) \otimes_K T^m(E)) \otimes_K T^n(E)$$
to $T^l(E) \times (T^m(E) \times T^n(E))$ factors through the map
$$T^l(E) \times (T^m(E) \times T^n(E)) \to (T^l(E) \times T^m(E)) \times T^n(E)$$
given by $(r, (s, t)) \mapsto ((r, s), t)$.

This much tells how to multiply homogeneous elements in $T(E)$. Since each element $t$ in $T(E)$ has a unique expansion as a finite sum $t = \sum_{k=0}^{n} t_k$ with $t_k \in T^k(E)$, we can define the product of this $t$ and the element $t' = \sum_{k'=0}^{n'} t'_{k'}$ to be the element
$$t \otimes t' = \sum_{l=0}^{n+n'} \sum_{k+k'=l} (t_k \otimes t'_{k'});$$
the expression $\sum_{k+k'=l} (t_k \otimes t'_{k'})$ is the component of the product in $T^l(E)$. Multiplication is thereby well defined in $T(E)$, and it satisfies the distributive laws and is associative. Thus $T(E)$ becomes an associative algebra with a (two-sided) identity, namely the element $1$ in $T^0(E)$. In the presence of the identification $\iota : E \to T^1(E)$, $T(E)$ is known as the tensor algebra of $E$. The pair $(T(E), \iota)$ has the universal mapping property given in Proposition 6.22 and pictured in Figure 6.5.

$$\begin{array}{ccc}
E & \xrightarrow{\ l\ } & A \\
{\scriptstyle \iota}\Big\downarrow & \nearrow{\scriptstyle L} & \\
T(E) & &
\end{array}$$

FIGURE 6.5. Universal mapping property of a tensor algebra.

Proposition 6.22. The pair $(T(E), \iota)$ has the following universal mapping property: whenever $l : E \to A$ is a linear map from $E$ into an associative algebra with identity, then there exists a unique associative algebra homomorphism $L : T(E) \to A$ with $L(1) = 1$ such that the diagram in Figure 6.5 commutes.

PROOF. Uniqueness is clear, since $E$ and $1$ generate $T(E)$ as an algebra. For existence we define $L^{(n)}$ on $T^n(E)$ to be the linear extension of the $n$-multilinear map
$$(v_1, v_2, \dots, v_n) \mapsto l(v_1) l(v_2) \cdots l(v_n),$$
and we let $L = \bigoplus_n L^{(n)}$ in obvious notation. Let $u_1 \otimes \cdots \otimes u_m$ be in $T^m(E)$ and $v_1 \otimes \cdots \otimes v_n$ be in $T^n(E)$. Then we have
$$\begin{aligned}
L^{(m)}(u_1 \otimes \cdots \otimes u_m) &= l(u_1) \cdots l(u_m), \\
L^{(n)}(v_1 \otimes \cdots \otimes v_n) &= l(v_1) \cdots l(v_n), \\
L^{(m+n)}(u_1 \otimes \cdots \otimes u_m \otimes v_1 \otimes \cdots \otimes v_n) &= l(u_1) \cdots l(u_m) l(v_1) \cdots l(v_n).
\end{aligned}$$


Hence $L^{(m)}(u_1 \otimes \cdots \otimes u_m)\, L^{(n)}(v_1 \otimes \cdots \otimes v_n) = L^{(m+n)}(u_1 \otimes \cdots \otimes u_m \otimes v_1 \otimes \cdots \otimes v_n)$. Taking linear combinations, we see that $L$ is a homomorphism.
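The construction of $T(E)$ and the homomorphism $L$ of Proposition 6.22 can be imitated on a computer when $E$ has a finite basis. The sketch below is only an illustration under our own conventions: basis vectors of $E$ are labeled $0, 1, \dots$, a pure tensor of basis vectors is a tuple of labels (a "word"), the empty tuple stands for $1 \in T^0(E)$, an element of $T(E)$ is a dictionary from words to integer coefficients, and the target algebra $A$ is taken to be a matrix algebra. The names `tmul`, `matmul`, and `extend` are hypothetical.

```python
from collections import defaultdict

def tmul(s, t):
    """Product in T(E): the bilinear extension of concatenation of words."""
    out = defaultdict(int)
    for w1, c1 in s.items():
        for w2, c2 in t.items():
            out[w1 + w2] += c1 * c2
    return {w: c for w, c in out.items() if c != 0}

def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def extend(l, t, n):
    """Value L(t), where L is the homomorphism into n-by-n matrices
    determined by sending basis vector j of E to the matrix l[j]."""
    identity = [[int(r == c) for c in range(n)] for r in range(n)]
    total = [[0] * n for _ in range(n)]
    for word, coeff in t.items():
        prod = identity                      # the empty word goes to 1
        for j in word:
            prod = matmul(prod, l[j])
        total = [[total[r][c] + coeff * prod[r][c] for c in range(n)]
                 for r in range(n)]
    return total

# Hypothetical linear map l on a 2-dimensional E, with values in M_2.
l = {0: [[0, 1], [0, 0]], 1: [[0, 0], [1, 0]]}

one = {(): 1}
s = {(0,): 1, (0, 1): 2}     # u_0 + 2 u_0 (x) u_1
t = {(): 3, (1,): 1}         # 3 + u_1

print(extend(l, tmul(s, t), 2) == matmul(extend(l, s, 2), extend(l, t, 2)))
```

Concatenation of words is associative and has the empty word as identity, which mirrors the associativity argument for $T(E)$; the final check is the multiplicativity of $L$ on one sample pair.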

Proposition 6.22 allows us to make $E \mapsto T(E)$ into a functor from the category of vector spaces over $K$ to the category of associative algebras with identity over $K$. To carry out the construction, we suppose that $\varphi : E \to F$ is a linear map between two vector spaces over $K$. If $i : E \to T(E)$ and $j : F \to T(F)$ are the inclusion maps, then $j\varphi$ is a linear map from $E$ into $T(F)$, and Proposition 6.22 produces a unique algebra homomorphism $\Phi : T(E) \to T(F)$ carrying $1$ to $1$ and satisfying $\Phi i = j\varphi$. Then the tensor-product functor is defined to carry the linear map $\varphi$ to the homomorphism $\Phi$ of associative algebras with identity.

For the situation in which $R$ is a commutative ring with identity, Section IV.5 introduced the ring $R[X_1, \dots, X_n]$ of polynomials in $n$ commuting indeterminates with coefficients in $R$. This ring was characterized by a universal mapping property saying that if a ring homomorphism of $R$ into a commutative ring with identity were given and if $n$ elements $t_1, \dots, t_n$ were given, then the ring homomorphism of $R$ could be extended uniquely to a ring homomorphism of $R[X_1, \dots, X_n]$ carrying $X_j$ into $t_j$ for each $j$. Proposition 6.22 yields a noncommutative version of this result, except that the ring of coefficients is assumed this time to be a field $K$. To arrange for $X_1, \dots, X_n$ to be noncommuting indeterminates, we form a vector space with $\{X_1, \dots, X_n\}$ as a basis. Thus we let $E = \bigoplus_{j=1}^{n} K X_j$. If $t_1, \dots, t_n$ are arbitrary elements of an associative algebra $A$ with identity, then the formulas $l(X_j) = t_j$ for $1 \le j \le n$ define a linear map $l : E \to A$. The associative-algebra homomorphism $L : T(E) \to A$ produced by the proposition extends the inclusion of $K$ into the subfield $K1$ of $A$ and carries each $X_j$ to $t_j$.

8. Symmetric Algebra

We continue to allow $K$ to be an arbitrary field. Let $E$ be a vector space over $K$, and let $T(E)$ be the tensor algebra. We begin by defining the symmetric algebra $S(E)$. This is to be a version of $T(E)$ in which the elements, which are called symmetric tensors, commute with one another. It will not be canonically an algebra of polynomials, as we shall see presently, and thus we make no use of polynomial rings in the construction.

Just as the vector space of $n$-multilinear forms $E \times \cdots \times E \to K$ is canonically the dual of $T^n(E)$, so the vector space of symmetric $n$-multilinear forms will be canonically the dual of $S^n(E)$. Here “symmetric” means that $f(x_1, \dots, x_n) = f(x_{\tau(1)}, \dots, x_{\tau(n)})$ for every permutation $\tau$ in the symmetric group $\mathfrak{S}_n$.

Since tensor algebras are supposed to be universal devices for constructing associative algebras over $K$, whether commutative or not, we seek to form $S(E)$ as a quotient of $T(E)$. If $q$ is the quotient homomorphism, we want to have $q(u \otimes v) = q(v \otimes u)$ in $S(E)$ whenever $u$ and $v$ are in $\iota(E) = T^1(E)$. Hence every element $u \otimes v - v \otimes u$ is to be in the kernel of the homomorphism. On the other hand, we do not want to impose any unnecessary conditions on our quotient, and so we factor out only what the elements $u \otimes v - v \otimes u$ force us to factor out. Thus we define the symmetric algebra by
$$S(E) = T(E)/I,$$
where
$$I = \big(\text{two-sided ideal generated by all } u \otimes v - v \otimes u \text{ with } u \text{ and } v \text{ in } T^1(E)\big).$$

Then $S(E)$ is an associative algebra with identity.

Let us see that the fact that the generators of the ideal $I$ are homogeneous elements (all being in $T^2(E)$) implies that
$$I = \bigoplus_{n=0}^{\infty} (I \cap T^n(E)).$$
In fact, each $I \cap T^n(E)$ is contained in $I$, and hence $I$ contains the right side. On the other hand, if $x$ is any element of $I$, then $x$ is a sum of terms of the form $a \otimes (u \otimes v - v \otimes u) \otimes b$, and we may assume that each $a$ and $b$ is homogeneous. Any individual term $a \otimes (u \otimes v - v \otimes u) \otimes b$ is in some $I \cap T^n(E)$, and $x$ is exhibited as a sum of members of the various intersections $I \cap T^n(E)$.

An ideal with the property $I = \bigoplus_{n=0}^{\infty} (I \cap T^n(E))$ is said to be homogeneous. Since $I$ is homogeneous,
$$S(E) = \bigoplus_{n=0}^{\infty} T^n(E)/(I \cap T^n(E)).$$
We write $S^n(E)$ for the $n$th summand on the right side, so that
$$S(E) = \bigoplus_{n=0}^{\infty} S^n(E).$$
Since $I \cap T^1(E) = 0$, the map of $E \to T^1(E) \to S^1(E)$ into first-order elements is one-one onto. The product operation in $S(E)$ is written without a product sign, the image in $S^n(E)$ of $v_1 \otimes \cdots \otimes v_n$ in $T^n(E)$ being written as $v_1 \cdots v_n$. If $a$ is in $S^m(E)$ and $b$ is in $S^n(E)$, then $ab$ is in $S^{m+n}(E)$. Moreover, $S^n(E)$ is generated by elements $v_1 \cdots v_n$ with all $v_j$ in $S^1(E) \cong E$, since $T^n(E)$ is generated by corresponding elements $v_1 \otimes \cdots \otimes v_n$. The defining relations for $S(E)$ make $v_i v_j = v_j v_i$ for $v_i$ and $v_j$ in $S^1(E)$, and it follows that the associative algebra $S(E)$ is commutative.

Proposition 6.23. Let $E$ be a vector space over the field $K$.
(a) Let $\iota$ be the $n$-multilinear function $\iota(v_1, \dots, v_n) = v_1 \cdots v_n$ of $E \times \cdots \times E$ into $S^n(E)$. Then $(S^n(E), \iota)$ has the following universal mapping property: whenever $l$ is any symmetric $n$-multilinear map of $E \times \cdots \times E$ into a vector space $U$, then there exists a unique linear map $L : S^n(E) \to U$ such that the diagram
$$\begin{array}{ccc}
E \times \cdots \times E & \xrightarrow{\ l\ } & U \\
{\scriptstyle \iota}\Big\downarrow & \nearrow{\scriptstyle L} & \\
S^n(E) & &
\end{array}$$
commutes.
(b) Let $\iota$ be the one-one linear function that embeds $E$ as $S^1(E) \subseteq S(E)$. Then $(S(E), \iota)$ has the following universal mapping property: whenever $l$ is any linear map of $E$ into a commutative associative algebra $A$ with identity, then there exists a unique algebra homomorphism $L : S(E) \to A$ with $L(1) = 1$ such that the diagram
$$\begin{array}{ccc}
E & \xrightarrow{\ l\ } & A \\
{\scriptstyle \iota}\Big\downarrow & \nearrow{\scriptstyle L} & \\
S(E) & &
\end{array}$$
commutes.

PROOF. In both cases uniqueness is trivial. For existence we use the universal mapping properties of $T^n(E)$ and $T(E)$ to produce $\widetilde{L}$ on $T^n(E)$ or $T(E)$. If we can show that $\widetilde{L}$ annihilates the appropriate subspace so as to descend to $S^n(E)$ or $S(E)$, then the resulting map can be taken as $L$, and we are done. For (a), we have $\widetilde{L} : T^n(E) \to U$, and we are to show that $\widetilde{L}(T^n(E) \cap I) = 0$, where $I$ is generated by all $u \otimes v - v \otimes u$ with $u$ and $v$ in $T^1(E)$. A member of $T^n(E) \cap I$ is thus of the form $\sum_i a_i \otimes (u_i \otimes v_i - v_i \otimes u_i) \otimes b_i$ with each term in $T^n(E)$. Each term here is a sum of pure tensors
$$x_1 \otimes \cdots \otimes x_r \otimes u_i \otimes v_i \otimes y_1 \otimes \cdots \otimes y_s - x_1 \otimes \cdots \otimes x_r \otimes v_i \otimes u_i \otimes y_1 \otimes \cdots \otimes y_s \tag{$*$}$$
with $r + 2 + s = n$. Since $l$ by assumption takes equal values on
$$x_1 \times \cdots \times x_r \times u_i \times v_i \times y_1 \times \cdots \times y_s \quad \text{and} \quad x_1 \times \cdots \times x_r \times v_i \times u_i \times y_1 \times \cdots \times y_s,$$
$\widetilde{L}$ vanishes on $(*)$, and it follows that $\widetilde{L}(T^n(E) \cap I) = 0$.

For (b) we are to show that $\widetilde{L} : T(E) \to A$ vanishes on $I$. Since $\ker \widetilde{L}$ is an ideal, it is enough to check that $\widetilde{L}$ vanishes on the generators of $I$. But $\widetilde{L}(u \otimes v - v \otimes u) = l(u)l(v) - l(v)l(u) = 0$ by the commutativity of $A$, and thus $\widetilde{L}(I) = 0$.

Corollary 6.24. If $E$ and $F$ are vector spaces over the field $K$, then the vector space $\mathrm{Hom}_K(S^n(E), F)$ is canonically isomorphic (via restriction to pure tensors) to the vector space of all $F$-valued symmetric $n$-multilinear functions on $E \times \cdots \times E$.

PROOF. Restriction is linear and one-one. It is onto by Proposition 6.23a.

Corollary 6.25. If $E$ is a vector space over the field $K$, then the dual $(S^n(E))'$ of $S^n(E)$ is canonically isomorphic (via restriction to pure tensors) to the vector space of symmetric $n$-multilinear forms on $E \times \cdots \times E$.

PROOF. This is a special case of Corollary 6.24.

If $\varphi : E \to F$ is a linear map between vector spaces, then we can use Proposition 6.23b to define a corresponding homomorphism $\Phi : S(E) \to S(F)$ of associative algebras with identity. In this way, we can make $E \mapsto S(E)$ into a functor from the category of vector spaces over $K$ to the category of commutative associative algebras with identity over $K$. The details appear in Problem 14 at the end of the chapter.

Next we shall identify a basis for $S^n(E)$ as a vector space. The union of such bases as $n$ varies will then be a basis of $S(E)$. Let $\{u_i\}_{i \in A}$ be a basis of $E$, possibly infinite. As noted in Section A5 of the appendix, a simple ordering on the index set $A$ is a partial ordering in which every pair of elements is comparable and in which $a \le b$ and $b \le a$ together imply $a = b$.

Proposition 6.26. Let $E$ be a vector space over the field $K$, let $\{u_i\}_{i \in A}$ be a basis of $E$, and suppose that a simple ordering has been imposed on the index set $A$. Then the set of all monomials $u_{i_1}^{j_1} \cdots u_{i_k}^{j_k}$ with $i_1 < \cdots < i_k$ and $\sum_m j_m = n$ is a basis of $S^n(E)$.


REMARK. In particular if $E$ is finite-dimensional with $(u_1, \dots, u_N)$ as an ordered basis, then the monomials $u_1^{j_1} \cdots u_N^{j_N}$ of total degree $n$ form a basis of $S^n(E)$.

PROOF. Since $S(E)$ is commutative and since $n$-fold products of elements $\iota(u_i)$ in $T^1(E)$ span $T^n(E)$, the indicated set of monomials spans $S^n(E)$. Let us see that the set is linearly independent. Take any finite subset $F \subseteq A$ of indices. The map $\sum_{i \in A} c_i u_i \mapsto \sum_{i \in F} c_i X_i$ of $E$ into the polynomial algebra $K[\{X_i\}_{i \in F}]$ is linear into a commutative algebra with identity. Its extension via Proposition 6.23b maps all monomials in the $u_i$ for $i \in F$ into distinct monomials in $K[\{X_i\}_{i \in F}]$, which are necessarily linearly independent. Hence any finite subset of the monomials in the statement of the proposition is linearly independent, and the whole set must be linearly independent. Therefore our spanning set is a basis.

The proof of Proposition 6.26 shows that $S(E)$ may be identified with polynomials in indeterminates identified with members of $E$ once a basis has been chosen, but this identification depends on the choice of basis. Indeed, if we think of $E$ as specified in advance, then the isomorphism was set up by mapping the set $\{X_i\}_{i \in A}$ to the specified basis of $E$, and the result certainly depended on what basis was used. Nevertheless, if $E$ is finite-dimensional, there is still an isomorphism that is independent of basis; it is between $S(E')$, where $E'$ is the dual of $E$, and a natural basis-free notion of “polynomials” on $E$. We return to this point after one application of Proposition 6.26.

Corollary 6.27. Let $E$ be a finite-dimensional vector space over $K$ of dimension $N$. Then
(a) $\dim S^n(E) = \binom{n+N-1}{N-1}$ for $0 \le n < \infty$,
(b) $S^n(E')$ is canonically isomorphic to $S^n(E)'$ in such a way that
$$(f_1 \cdots f_n)(w_1 \cdots w_n) = \sum_{\tau \in \mathfrak{S}_n} \prod_{j=1}^{n} f_j(w_{\tau(j)})$$
for any $f_1, \dots, f_n$ in $E'$ and any $w_1, \dots, w_n$ in $E$, provided $K$ has characteristic 0; here $\mathfrak{S}_n$ is the symmetric group on $n$ letters.

PROOF. For (a), a basis has been described in Proposition 6.26. To see its cardinality, we recognize that picking out $N - 1$ objects from $n + N - 1$ to label as dividers is a way of assigning exponents to the $u_j$’s in an ordered basis; thus the cardinality of the indicated basis is $\binom{n+N-1}{N-1}$.
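The count in part (a) can be confirmed by brute force for small cases. The Python sketch below is an illustration only: it lists the exponent vectors $(j_1, \dots, j_N)$ of the monomials $u_1^{j_1} \cdots u_N^{j_N}$ of total degree $n$ and compares their number with the binomial coefficient. The function name `symmetric_basis` is our own.

```python
from itertools import combinations_with_replacement
from math import comb

def symmetric_basis(N, n):
    """Exponent vectors (j_1, ..., j_N) with j_1 + ... + j_N = n."""
    basis = []
    # A nondecreasing choice of n indices corresponds to one monomial.
    for combo in combinations_with_replacement(range(N), n):
        exps = [0] * N
        for idx in combo:
            exps[idx] += 1
        basis.append(tuple(exps))
    return basis

for N in range(1, 5):
    for n in range(0, 6):
        assert len(symmetric_basis(N, n)) == comb(n + N - 1, N - 1)

print(symmetric_basis(2, 3))
```

For $N = 2$ and $n = 3$ the list has four entries, in agreement with $\binom{4}{1} = 4$.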


For (b), let $f_1, \dots, f_n$ be in $E'$ and $w_1, \dots, w_n$ be in $E$, and define
$$l_{f_1, \dots, f_n}(w_1, \dots, w_n) = \sum_{\tau \in \mathfrak{S}_n} \prod_{j=1}^{n} f_j(w_{\tau(j)}).$$
Then $l_{f_1, \dots, f_n}$ is symmetric $n$-multilinear from $E \times \cdots \times E$ into $K$ and extends by Proposition 6.23a to a linear $L_{f_1, \dots, f_n} : S^n(E) \to K$. Thus $l(f_1, \dots, f_n) = L_{f_1, \dots, f_n}$ defines a symmetric $n$-multilinear map of $E' \times \cdots \times E'$ into $S^n(E)'$. Its linear extension $L$ maps $S^n(E')$ into $S^n(E)'$.

To complete the proof, we shall show that $L$ carries basis to basis. Let $u_1, \dots, u_N$ be an ordered basis of $E$, and let $u'_1, \dots, u'_N$ be the dual basis. Part (a) shows that the elements $(u'_1)^{j_1} \cdots (u'_N)^{j_N}$ with $\sum_m j_m = n$ form a basis of $S^n(E')$ and that the elements $(u_1)^{k_1} \cdots (u_N)^{k_N}$ with $\sum_m k_m = n$ form a basis of $S^n(E)$. We show that $L$ of the basis of $S^n(E')$ is the dual basis of the basis of $S^n(E)$, except for positive-integer factors. Thus let all of $f_1, \dots, f_{j_1}$ be $u'_1$, let all of $f_{j_1+1}, \dots, f_{j_1+j_2}$ be $u'_2$, and so on. Similarly let all of $w_1, \dots, w_{k_1}$ be $u_1$, let all of $w_{k_1+1}, \dots, w_{k_1+k_2}$ be $u_2$, and so on. Then
$$\begin{aligned}
L((u'_1)^{j_1} \cdots (u'_N)^{j_N})((u_1)^{k_1} \cdots (u_N)^{k_N})
&= L(f_1 \cdots f_n)(w_1 \cdots w_n) \\
&= l(f_1, \dots, f_n)(w_1 \cdots w_n) \\
&= \sum_{\tau \in \mathfrak{S}_n} \prod_{i=1}^{n} f_i(w_{\tau(i)}).
\end{aligned}$$
For given $\tau$, the product on the right side is 0 unless, for each index $i$, an inequality $j_{m-1} + 1 \le i \le j_m$ implies that $k_{m-1} + 1 \le \tau(i) \le k_m$. In this case the product is 1; so the right side counts the number of such $\tau$’s. For given $\tau$, obtaining a nonzero product forces $k_m = j_m$ for all $m$. And when $k_m = j_m$ for all $m$, the choice $\tau = 1$ does lead to product 1. Hence the members of $L$ of the basis are positive-integer multiples of the members of the dual basis, as asserted.

Let us return to the question of introducing a basis-free notion of polynomials on the vector space $E$ under the assumption that $E$ is finite-dimensional. We take a cue from Corollary 4.32, which tells us that the evaluation homomorphism carrying $K[X_1, \dots, X_n]$ to the algebra of $K$-valued polynomial functions of $(t_1, \dots, t_n)$ is one-one if $K$ is an infinite field. We regard the latter as the algebra of polynomial functions on $K^n$, and we check what happens when we carry the vector space $E$ over to $K^n$ by fixing a basis. Let $\Gamma = \{x_1, \dots, x_n\}$ be a basis of $E$, and let $\Gamma' = \{x'_1, \dots, x'_n\}$ be the dual basis of $E'$. If $e = t_1 x_1 + \cdots + t_n x_n$ is the expansion of a member of $E$ in terms of $\Gamma$, then we have $x'_j(e) = t_j$. Thus the polynomial functions $t_j$ are given by the members of the dual basis. The vector space of all homogeneous first-degree polynomial functions is the set of linear combinations of the $t_j$’s, and these are given by arbitrary linear functionals on $E$. Thus the vector space of homogeneous first-degree polynomial functions on $E$ is just the dual space $E'$, and this conclusion does not depend on the choice of basis. The algebra of all polynomial functions on $E$ is then the algebra of all $K$-valued functions on $E$ generated by $E'$ and the constant functions.

This discussion tells us unambiguously what polynomial functions on $E$ are to be, and we want to backtrack to handle abstract polynomials on $E$. Although the evaluation homomorphism from $K[X_1, \dots, X_n]$ to the algebra of polynomial functions on $K^n$ may fail to be one-one if $K$ is a finite field, its restriction to homogeneous first-degree polynomials is one-one. Thus, whatever we might mean by the vector space of homogeneous first-degree polynomials on $E$, the evaluation mapping should exhibit this space as isomorphic to $E'$. Armed with these clues, we define the polynomial algebra $P(E)$ on $E$ to be the symmetric algebra $S(E')$ if $E$ is finite-dimensional.

We need an evaluation mapping for each point $e$ of $E$, and we obtain this from the universal mapping property of symmetric algebras (Proposition 6.23b): With $e$ fixed, we have a linear map $l$ from the vector space $E'$ to the commutative associative algebra $K$ given by $l(e') = e'(e)$. The universal mapping property gives us a unique algebra homomorphism $L : S(E') \to K$ that extends $l$ and carries $1$ to $1$. The algebra homomorphism $L$ is then a multiplicative linear functional on $P(E) = S(E')$ that carries $1$ to $1$ and agrees with evaluation at $e$ on homogeneous first-degree polynomials. We write this homomorphism as $p \mapsto p(e)$, and we define $P^n(E) = S^n(E')$; this is the vector space of homogeneous $n$th-degree polynomials on $E$.

A confirmation that $P(E)$ is indeed to be regarded as the algebra of abstract polynomials on $E$ comes from the following.

Proposition 6.28. If $E$ is a finite-dimensional vector space over the field $K$, then the system of evaluation homomorphisms $P(E) \to K$ on polynomials given by $p \mapsto \{p(e)\}_{e \in E}$ is an algebra homomorphism of $P(E)$ onto the algebra of $K$-valued polynomial functions on $E$ that carries the identity to the constant function 1, and it is one-one if $K$ is an infinite field.

PROOF. Certainly $p \mapsto \{p(e)\}_{e \in E}$ is an algebra homomorphism of $P(E)$ into the algebra of $K$-valued polynomial functions on $E$, and it carries the identity to the constant function 1. We have seen that the image of $P^1(E)$ is exactly $E'$, and hence the image of $P(E)$ is the algebra of $K$-valued functions on $E$ generated by $E'$ and the constants. This is exactly the algebra of all $K$-valued polynomial functions, and hence the mapping is onto.

Suppose that $K$ is infinite. The restriction of $p \mapsto \{p(e)\}_{e \in E}$ to the finite-dimensional subspace $P^n(E)$ of $P(E)$ maps into the finite-dimensional subspace of all polynomial functions on $E$ homogeneous of degree $n$, and this restriction must therefore be onto. We can read off the dimension of the space of all polynomial functions on $E$ homogeneous of degree $n$ from Corollary 4.32 and Corollary 6.27a. This dimension matches the dimension of $P^n(E)$, according to Corollary 6.27a. Since the mapping is onto and the finite dimensions match, the restricted mapping is one-one. Hence $p \mapsto \{p(e)\}_{e \in E}$ is one-one.

We have defined the symmetric algebra $S(E)$ as a quotient of the tensor algebra $T(E)$. Now let us suppose that $K$ has characteristic 0. With this hypothesis we shall be able to identify an explicit vector subspace of $T(E)$ that maps one-one onto $S(E)$ during the passage to the quotient. This subspace of $T(E)$ can therefore be viewed as a version of $S(E)$ for some purposes.

We define an $n$-multilinear function from $E \times \cdots \times E$ into $T^n(E)$ by
$$(v_1, \dots, v_n) \mapsto \frac{1}{n!} \sum_{\tau \in \mathfrak{S}_n} v_{\tau(1)} \otimes \cdots \otimes v_{\tau(n)},$$
and let $\sigma : T^n(E) \to T^n(E)$ be its linear extension. We call $\sigma$ the symmetrizer operator. The image of $\sigma$ in $T(E)$ is denoted by $\widetilde{S}^n(E)$, and the members of this subspace are called symmetrized tensors.

Proposition 6.29. Let the field $K$ have characteristic 0, and let $E$ be a vector space over $K$. Then the symmetrizer operator $\sigma$ satisfies $\sigma^2 = \sigma$. The kernel of $\sigma$ on $T^n(E)$ is exactly $T^n(E) \cap I$, and therefore
$$T^n(E) = \widetilde{S}^n(E) \oplus (T^n(E) \cap I).$$

REMARK. In view of this proposition, the quotient map $T^n(E) \to S^n(E)$ carries $\widetilde{S}^n(E)$ one-one onto $S^n(E)$. Thus $\widetilde{S}^n(E)$ can be viewed as a copy of $S^n(E)$ embedded as a direct summand of $T^n(E)$.

PROOF. We have
$$\begin{aligned}
\sigma^2(v_1 \otimes \cdots \otimes v_n)
&= \frac{1}{(n!)^2} \sum_{\rho, \tau \in \mathfrak{S}_n} v_{\rho\tau(1)} \otimes \cdots \otimes v_{\rho\tau(n)} \\
&= \frac{1}{(n!)^2} \sum_{\rho \in \mathfrak{S}_n} \sum_{\substack{\omega \in \mathfrak{S}_n, \\ (\omega = \rho\tau)}} v_{\omega(1)} \otimes \cdots \otimes v_{\omega(n)} \\
&= \frac{1}{n!} \sum_{\rho \in \mathfrak{S}_n} \sigma(v_1 \otimes \cdots \otimes v_n) \\
&= \sigma(v_1 \otimes \cdots \otimes v_n).
\end{aligned}$$
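The computation above can be replayed exactly in rational arithmetic. The following Python sketch is an illustration under our own conventions: $E$ has a finite basis labeled by integers, a tensor in $T^n(E)$ is a dictionary from words (tuples of $n$ labels) to coefficients, and the field is $\mathbb{Q}$. The function name `symmetrize` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def symmetrize(t, n):
    """Apply the symmetrizer sigma to a tensor in T^n(E),
    given as a dict {word: coefficient} with words of length n."""
    out = {}
    for word, coeff in t.items():
        for perm in permutations(range(n)):
            # The permuted pure tensor v_{tau(1)} (x) ... (x) v_{tau(n)}.
            w = tuple(word[perm[i]] for i in range(n))
            out[w] = out.get(w, 0) + Fraction(coeff, factorial(n))
    return {w: c for w, c in out.items() if c != 0}

t = {(0, 1, 1): 1, (2, 0, 1): 5}
once = symmetrize(t, 3)
twice = symmetrize(once, 3)
print(once == twice)

# An element of T^3(E) lying in the ideal I: x (x) (u (x) v - v (x) u).
in_ideal = {(2, 0, 1): 1, (2, 1, 0): -1}
print(symmetrize(in_ideal, 3))
```

The first check is the identity $\sigma^2 = \sigma$ on one sample tensor, and the second shows $\sigma$ vanishing on a generator-type element of $T^3(E) \cap I$, which is the easy inclusion $T^n(E) \cap I \subseteq \ker \sigma$.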


Hence $\sigma^2 = \sigma$. Thus $\sigma$ fixes any member of $\operatorname{image} \sigma$, and it follows that $\operatorname{image} \sigma \cap \ker \sigma = 0$. Consequently $T^n(E)$ is the direct sum of $\operatorname{image} \sigma$ and $\ker \sigma$. We are left with identifying $\ker \sigma$ as $T^n(E) \cap I$.

The subspace $T^n(E) \cap I$ is spanned by elements
$$x_1 \otimes \cdots \otimes x_r \otimes u \otimes v \otimes y_1 \otimes \cdots \otimes y_s - x_1 \otimes \cdots \otimes x_r \otimes v \otimes u \otimes y_1 \otimes \cdots \otimes y_s$$
with $r + 2 + s = n$, and the symmetrizer $\sigma$ certainly vanishes on such elements. Hence $T^n(E) \cap I \subseteq \ker \sigma$. Suppose that the inclusion is strict, say with $t$ in $\ker \sigma$ but $t$ not in $T^n(E) \cap I$. Let $q$ be the quotient map $T^n(E) \to S^n(E)$. The kernel of $q$ is $T^n(E) \cap I$, and thus $q(t) \ne 0$. From Proposition 6.26 the $T(E)$ monomials in basis elements from $E$ with increasing indices map onto a basis of $S(E)$. Since $K$ has characteristic 0, the symmetrized versions of these monomials map to nonzero multiples of the images of the initial monomials. Consequently $q$ carries $\widetilde{S}^n(E) = \operatorname{image} \sigma$ onto $S^n(E)$. Thus choose $t' \in \widetilde{S}^n(E)$ with $q(t') = q(t)$. Then $t' - t$ is in $\ker q = T^n(E) \cap I \subseteq \ker \sigma$. Since $\sigma(t) = 0$, we see that $\sigma(t') = 0$. Consequently $t'$ is in $\ker \sigma \cap \operatorname{image} \sigma = 0$, and we obtain $t' = 0$ and $q(t) = q(t') = 0$, contradiction.

9. Exterior Algebra

We turn to a discussion of the exterior algebra. Let $K$ be an arbitrary field, and let $E$ be a vector space over $K$. The construction, results, and proofs for the exterior algebra $\bigwedge(E)$ are similar to those for the symmetric algebra $S(E)$. The elements of $\bigwedge(E)$ are to be all the alternating tensors (= skew-symmetric if $K$ has characteristic $\ne 2$), and so we want to force $v \otimes v = 0$. Thus we define the exterior algebra by
$$\textstyle\bigwedge(E) = T(E)/I',$$
where
$$I' = \big(\text{two-sided ideal generated by all } v \otimes v \text{ with } v \text{ in } T^1(E)\big).$$
Then $\bigwedge(E)$ is an associative algebra with identity.

It is clear that $I'$ is homogeneous: $I' = \bigoplus_{n=0}^{\infty} (I' \cap T^n(E))$. Thus we can write
$$\textstyle\bigwedge(E) = \bigoplus_{n=0}^{\infty} T^n(E)/(I' \cap T^n(E)).$$
We write $\bigwedge^n(E)$ for the $n$th summand on the right side, so that
$$\textstyle\bigwedge(E) = \bigoplus_{n=0}^{\infty} \bigwedge^n(E).$$


Since $I' \cap T^1(E) = 0$, the map of $E$ into first-order elements $\bigwedge^1(E)$ is one-one onto. The product operation in $\bigwedge(E)$ is denoted by $\wedge$ rather than $\otimes$, the image in $\bigwedge^n(E)$ of $v_1 \otimes \cdots \otimes v_n$ in $T^n(E)$ being denoted by $v_1 \wedge \cdots \wedge v_n$. If $a$ is in $\bigwedge^m(E)$ and $b$ is in $\bigwedge^n(E)$, then $a \wedge b$ is in $\bigwedge^{m+n}(E)$. Moreover, $\bigwedge^n(E)$ is generated by elements $v_1 \wedge \cdots \wedge v_n$ with all $v_j$ in $\bigwedge^1(E) \cong E$, since $T^n(E)$ is generated by corresponding elements $v_1 \otimes \cdots \otimes v_n$. The defining relations for $\bigwedge(E)$ make $v_i \wedge v_j = -v_j \wedge v_i$ for $v_i$ and $v_j$ in $\bigwedge^1(E)$, and it follows that
$$a \wedge b = (-1)^{mn}\, b \wedge a \qquad \text{for } a \in \textstyle\bigwedge^m(E) \text{ and } b \in \bigwedge^n(E).$$

Proposition 6.30. Let $E$ be a vector space over the field $K$.
(a) Let $\iota$ be the $n$-multilinear function $\iota(v_1, \dots, v_n) = v_1 \wedge \cdots \wedge v_n$ of $E \times \cdots \times E$ into $\bigwedge^n(E)$. Then $(\bigwedge^n(E), \iota)$ has the following universal mapping property: whenever $l$ is any alternating $n$-multilinear map of $E \times \cdots \times E$ into a vector space $U$, then there exists a unique linear map $L : \bigwedge^n(E) \to U$ such that the diagram
$$\begin{array}{ccc}
E \times \cdots \times E & \xrightarrow{\ l\ } & U \\
{\scriptstyle \iota}\Big\downarrow & \nearrow{\scriptstyle L} & \\
\bigwedge^n(E) & &
\end{array}$$
commutes.
(b) Let $\iota$ be the function that embeds $E$ as $\bigwedge^1(E) \subseteq \bigwedge(E)$. Then $(\bigwedge(E), \iota)$ has the following universal mapping property: whenever $l$ is any linear map of $E$ into an associative algebra $A$ with identity such that $l(v)^2 = 0$ for all $v \in E$, then there exists a unique algebra homomorphism $L : \bigwedge(E) \to A$ with $L(1) = 1$ such that the diagram
$$\begin{array}{ccc}
E & \xrightarrow{\ l\ } & A \\
{\scriptstyle \iota}\Big\downarrow & \nearrow{\scriptstyle L} & \\
\bigwedge(E) & &
\end{array}$$
commutes.

PROOF. The proof is completely analogous to the proof of Proposition 6.23.

Corollary 6.31. If $E$ and $F$ are vector spaces over the field $K$, then the vector space $\mathrm{Hom}_K(\bigwedge^n(E), F)$ is canonically isomorphic (via restriction to pure tensors) to the vector space of all $F$-valued alternating $n$-multilinear functions on $E \times \cdots \times E$.

PROOF. Restriction is linear and one-one. It is onto by Proposition 6.30a.


Corollary 6.32. If $E$ is a vector space over the field $\mathbb{K}$, then the dual $(\bigwedge^n(E))'$ of $\bigwedge^n(E)$ is canonically isomorphic (via restriction to pure tensors) to the vector space of alternating $n$-multilinear forms on $E \times \cdots \times E$.

PROOF. This is a special case of Corollary 6.31.

If $\varphi : E \to F$ is a linear map between vector spaces, then we can use Proposition 6.30b to define a corresponding homomorphism $\Phi : \bigwedge(E) \to \bigwedge(F)$ of associative algebras with identity. In this way, we can make $E \mapsto \bigwedge(E)$ into a functor from the category of vector spaces over $\mathbb{K}$ to the category of associative algebras with identity over $\mathbb{K}$. We omit the details, which are similar to those for symmetric tensors.

Next we shall identify a basis for $\bigwedge^n(E)$ as a vector space. The union of such bases as $n$ varies will then be a basis of $\bigwedge(E)$.

Proposition 6.33. Let $E$ be a vector space over the field $\mathbb{K}$, let $\{u_i\}_{i \in A}$ be a basis of $E$, and suppose that a simple ordering has been imposed on the index set $A$. Then the set of all monomials $u_{i_1} \wedge \cdots \wedge u_{i_n}$ with $i_1 < \cdots < i_n$ is a basis of $\bigwedge^n(E)$.

PROOF. Since multiplication in $\bigwedge(E)$ satisfies $a \wedge b = (-1)^{mn}\, b \wedge a$ for $a \in \bigwedge^m(E)$ and $b \in \bigwedge^n(E)$ and since monomials span $T^n(E)$, the indicated set spans $\bigwedge^n(E)$. Let us see that the set is linearly independent. For $i \in A$, let $u_i'$ be the member of $E'$ with $u_i'(u_j)$ equal to 1 for $j = i$ and equal to 0 for $j \neq i$. Fix $r_1 < \cdots < r_n$, and define

$$l(w_1, \dots, w_n) = \det\{u_{r_i}'(w_j)\} \qquad \text{for } w_1, \dots, w_n \text{ in } E.$$

Then $l$ is alternating $n$-multilinear from $E \times \cdots \times E$ into $\mathbb{K}$ and extends by Proposition 6.30a to $L : \bigwedge^n(E) \to \mathbb{K}$. If $k_1 < \cdots < k_n$, then

$$L(u_{k_1} \wedge \cdots \wedge u_{k_n}) = l(u_{k_1}, \dots, u_{k_n}) = \det\{u_{r_i}'(u_{k_j})\},$$

and the right side is 0 unless $r_1 = k_1, \dots, r_n = k_n$, in which case it is 1. This proves that the $u_{r_1} \wedge \cdots \wedge u_{r_n}$ are linearly independent in $\bigwedge^n(E)$.

Corollary 6.34. Let $E$ be a finite-dimensional vector space over $\mathbb{K}$ of dimension $N$. Then

(a) $\dim \bigwedge^n(E) = \binom{N}{n}$ for $0 \leq n \leq N$ and $= 0$ for $n > N$,
(b) $\bigwedge^n(E')$ is canonically isomorphic to $\bigwedge^n(E)'$ by
$$(f_1 \wedge \cdots \wedge f_n)(w_1, \dots, w_n) = \det\{f_i(w_j)\}.$$
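The statements of Proposition 6.33 and Corollary 6.34 can be spot-checked numerically in a small case. The following is only an illustrative sketch under stated assumptions: it takes $E = \mathbb{Q}^N$ with the standard basis, represents a linear functional by its list of coefficients, and uses helper names (`sign`, `det`, `wedge_basis`, `pair`, `unit`) that are hypothetical and do not come from the text.

```python
from itertools import combinations, permutations
from math import comb


def sign(perm):
    """Sign of a permutation of 0..n-1, computed by counting inversions."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def det(rows):
    """Determinant via the Leibniz expansion (adequate for small matrices)."""
    n = len(rows)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= rows[i][perm[i]]
        total += term
    return total


def wedge_basis(N, n):
    """Index tuples i_1 < ... < i_n labelling the monomials of Proposition 6.33."""
    return list(combinations(range(N), n))


def pair(fs, ws):
    """Value det{f_i(w_j)} from Corollary 6.34b; f_i and w_j are coefficient lists."""
    return det([[sum(a * b for a, b in zip(f, w)) for w in ws] for f in fs])


def unit(N, i):
    """Standard basis vector u_i of Q^N (also used as the dual functional u_i')."""
    return [1 if k == i else 0 for k in range(N)]
```

For example, `len(wedge_basis(5, 2))` agrees with `comb(5, 2)`, and pairing the dual monomial indexed by `r` against the monomial indexed by `k` gives 1 when `r == k` and 0 otherwise, which is the computation used in the linear-independence argument above.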


PROOF. Part (a) is an immediate consequence of Proposition 6.33, and (b) is proved in the same way as Corollary 6.27b, using Proposition 6.30a as a tool. The "positive-integer multiples" that arise in the proof of Corollary 6.27b are all 1 in the current proof, and hence no restriction on the characteristic of $\mathbb{K}$ is needed.

Now let us suppose that $\mathbb{K}$ has characteristic 0. We define an $n$-multilinear function from $E \times \cdots \times E$ into $T^n(E)$ by

$$(v_1, \dots, v_n) \mapsto \frac{1}{n!} \sum_{\tau \in S_n} (\operatorname{sgn} \tau)\, v_{\tau(1)} \otimes \cdots \otimes v_{\tau(n)},$$

and let $\sigma : T^n(E) \to T^n(E)$ be its linear extension. We call $\sigma$ the antisymmetrizer operator. The image of $\sigma$ in $T^n(E)$ is denoted by $\widetilde{\bigwedge}{}^n(E)$, and the members of this subspace are called antisymmetrized tensors.

Proposition 6.35. Let the field $\mathbb{K}$ have characteristic 0, and let $E$ be a vector space over $\mathbb{K}$. Then the antisymmetrizer operator $\sigma$ satisfies $\sigma^2 = \sigma$. The kernel of $\sigma$ on $T^n(E)$ is exactly $T^n(E) \cap I$, and therefore

$$T^n(E) = \widetilde{\bigwedge}{}^n(E) \oplus (T^n(E) \cap I).$$

REMARK. In view of this proposition, the quotient map $T^n(E) \to \bigwedge^n(E)$ carries $\widetilde{\bigwedge}{}^n(E)$ one-one onto $\bigwedge^n(E)$. Thus $\widetilde{\bigwedge}{}^n(E)$ can be viewed as a copy of $\bigwedge^n(E)$ embedded as a direct summand of $T^n(E)$.

PROOF. We have

$$\begin{aligned}
\sigma^2(v_1 \otimes \cdots \otimes v_n) &= \frac{1}{(n!)^2} \sum_{\rho, \tau \in S_n} (\operatorname{sgn} \rho\tau)\, v_{\rho\tau(1)} \otimes \cdots \otimes v_{\rho\tau(n)} \\
&= \frac{1}{(n!)^2} \sum_{\rho \in S_n} \ \sum_{\substack{\omega \in S_n, \\ (\omega = \rho\tau)}} (\operatorname{sgn} \omega)\, v_{\omega(1)} \otimes \cdots \otimes v_{\omega(n)} \\
&= \frac{1}{n!} \sum_{\rho \in S_n} \sigma(v_1 \otimes \cdots \otimes v_n) \\
&= \sigma(v_1 \otimes \cdots \otimes v_n).
\end{aligned}$$

Hence $\sigma^2 = \sigma$. Consequently $T^n(E)$ is the direct sum of image $\sigma$ and $\ker \sigma$, and we are left with identifying $\ker \sigma$ as $T^n(E) \cap I$. The subspace $T^n(E) \cap I$ is spanned by elements

$$x_1 \otimes \cdots \otimes x_r \otimes v \otimes v \otimes y_1 \otimes \cdots \otimes y_s$$


with $r + 2 + s = n$, and the antisymmetrizer $\sigma$ certainly vanishes on such elements. Hence $T^n(E) \cap I \subseteq \ker \sigma$. Suppose that the inclusion is strict, say with $t$ in $\ker \sigma$ but $t$ not in $T^n(E) \cap I$. Let $q$ be the quotient map $T^n(E) \to \bigwedge^n(E)$. The kernel of $q$ is $T^n(E) \cap I$, and thus $q(t) \neq 0$. From Proposition 6.33 the $T(E)$ monomials with strictly increasing indices map onto a basis of $\bigwedge(E)$. Since $\mathbb{K}$ has characteristic 0, the antisymmetrized versions of these monomials map to nonzero multiples of the images of the initial monomials. Consequently $q$ carries $\widetilde{\bigwedge}{}^n(E) = \operatorname{image} \sigma$ onto $\bigwedge^n(E)$. Thus choose $t' \in \widetilde{\bigwedge}{}^n(E)$ with $q(t') = q(t)$. Then $t' - t$ is in $\ker q = T^n(E) \cap I \subseteq \ker \sigma$. Since $\sigma(t) = 0$, we see that $\sigma(t') = 0$. Consequently $t'$ is in $\ker \sigma \cap \operatorname{image} \sigma = 0$, and we obtain $t' = 0$ and $q(t) = q(t') = 0$, contradiction.
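The two facts used in this proof, that $\sigma^2 = \sigma$ and that $\sigma$ vanishes on tensors containing a factor $v \otimes v$, can be checked on small examples. The sketch below is an illustration only, under assumptions not made in the text: the field is $\mathbb{Q}$, a homogeneous tensor is stored as a dictionary from index tuples $(i_1, \dots, i_n)$ to the coefficient of $u_{i_1} \otimes \cdots \otimes u_{i_n}$, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def sign(perm):
    """Sign of a permutation of 0..n-1, computed by counting inversions."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def antisymmetrize(tensor):
    """Apply (1/n!) * sum over tau of sgn(tau) * v_tau(1) (x) ... (x) v_tau(n), linearly.

    `tensor` maps index tuples of one common length n to rational coefficients.
    Terms whose coefficient cancels to zero are dropped from the result.
    """
    result = {}
    for idx, coeff in tensor.items():
        n = len(idx)
        for tau in permutations(range(n)):
            key = tuple(idx[tau[k]] for k in range(n))
            contribution = Fraction(sign(tau)) * coeff / factorial(n)
            result[key] = result.get(key, 0) + contribution
    return {key: c for key, c in result.items() if c != 0}
```

For instance, `antisymmetrize({(0, 1): 1})` returns the tensor with coefficient 1/2 on `(0, 1)` and -1/2 on `(1, 0)`, while any input whose only index tuple has a repeated entry is sent to the empty dictionary, that is, to 0.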

10. Problems

1. Let $V$ be a vector space over a field $\mathbb{K}$, and let $\langle \cdot, \cdot \rangle$ be a nondegenerate bilinear form on $V$.
   (a) Prove that every member $v'$ of $V'$ is of the form $v'(w) = \langle v, w \rangle$ for one and only one member $v$ of $V$.
   (b) Suppose that $(\cdot, \cdot)$ is another bilinear form on $V$. Prove that there is some linear function $L : V \to V$ such that $(v, w) = \langle L(v), w \rangle$ for all $v$ and $w$ in $V$.

2. The matrix $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ with entries in $\mathbb{F}_2$ is symmetric. Prove that there is no nonsingular $M$ with $M^t A M$ diagonal.

3. This problem shows that one possible generalization of Sylvester's Law to other fields is not valid. Over the field $\mathbb{F}_3$, show that there is a nonsingular matrix $M$ such that $\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = M^t \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} M$. Conclude that the number of squares in $\mathbb{K}^\times$ among the diagonal entries of the diagonal form in Theorem 6.5 is not an invariant of the symmetric matrix.

4. Let $V$ be a complex $n$-dimensional vector space, let $(\cdot, \cdot)$ be a Hermitian form on $V$, let $V_{\mathbb{R}}$ be the $2n$-dimensional real vector space obtained from $V$ by restricting scalar multiplication to real scalars, and define $\langle \cdot, \cdot \rangle = \operatorname{Im}(\cdot, \cdot)$. Prove that
   (a) $\langle \cdot, \cdot \rangle$ is an alternating bilinear form on $V_{\mathbb{R}}$,
   (b) $\langle J(v_1), J(v_2) \rangle = \langle v_1, v_2 \rangle$ for all $v_1$ and $v_2$ if $J : V_{\mathbb{R}} \to V_{\mathbb{R}}$ is what multiplication by $i$ becomes when viewed as a linear map from $V_{\mathbb{R}}$ to itself,
   (c) $\langle \cdot, \cdot \rangle$ is nondegenerate on $V_{\mathbb{R}}$ if and only if $(\cdot, \cdot)$ is nondegenerate on $V$.

5. Let $W$ be a $2n$-dimensional real vector space, and let $\langle \cdot, \cdot \rangle$ be a nondegenerate alternating bilinear form on $W$. Suppose that $J : W \to W$ is a linear map such that $J^2 = -I$ and $\langle J(w_1), J(w_2) \rangle = \langle w_1, w_2 \rangle$ for all $w_1$ and $w_2$ in $W$. Prove that $W$ equals $V_{\mathbb{R}}$ for some $n$-dimensional complex vector space $V$ possessing a Hermitian form whose imaginary part is $\langle \cdot, \cdot \rangle$.

6. This problem sharpens the result of Theorem 6.7 in the nondegenerate case. Let $\langle \cdot, \cdot \rangle$ be a nondegenerate alternating bilinear form on a $2n$-dimensional vector space $V$ over $\mathbb{K}$. A vector subspace $S$ of $V$ is called an isotropic subspace if $\langle u, v \rangle = 0$ for all $u$ and $v$ in $S$. Prove that
   (a) any isotropic subspace of $V$ that is maximal under inclusion has dimension $n$,
   (b) for any maximal isotropic subspace $S_1$, there exists a second maximal isotropic subspace $S_2$ such that $S_1 \cap S_2 = 0$,
   (c) if $S_1$ and $S_2$ are maximal isotropic subspaces of $V$ such that $S_1 \cap S_2 = 0$, then the linear map $S_2 \to S_1'$ given by $s_2 \mapsto \langle \cdot, s_2 \rangle\big|_{S_1}$ is an isomorphism of $S_2$ onto the dual space $S_1'$,
   (d) if $S_1$ and $S_2$ are maximal isotropic subspaces of $V$ such that $S_1 \cap S_2 = 0$, then there exist bases $\{p_1, \dots, p_n\}$ of $S_1$ and $\{q_1, \dots, q_n\}$ of $S_2$ such that $\langle p_i, p_j \rangle = \langle q_i, q_j \rangle = 0$ and $\langle p_i, q_j \rangle = \delta_{ij}$ for all $i$ and $j$. (The resulting basis $\{p_1, \dots, p_n, q_1, \dots, q_n\}$ of $V$ is called a Weyl basis of $V$.)

7. Let $S$ be a nonempty set, and let $\mathbb{K}$ be a field. For $s$ in $S$, let $U_s$ and $V_s$ be vector spaces over $\mathbb{K}$, and let $U$ and $V$ be two further vector spaces over $\mathbb{K}$.
   (a) Prove that $\operatorname{Hom}_{\mathbb{K}}\bigl(\bigoplus_{s \in S} U_s, V\bigr) \cong \prod_{s \in S} \operatorname{Hom}_{\mathbb{K}}(U_s, V)$.
   (b) Prove that $\operatorname{Hom}_{\mathbb{K}}\bigl(U, \prod_{s \in S} V_s\bigr) \cong \prod_{s \in S} \operatorname{Hom}_{\mathbb{K}}(U, V_s)$.
   (c) Give examples to show that neither isomorphism in (a) and (b) need remain valid if all three direct products are changed to direct sums.

8. This problem continues Problem 1 at the end of Chapter V, which established a canonical-form theorem for an action of $GL(m, \mathbb{K}) \times GL(n, \mathbb{K})$ on $m$-by-$n$ matrices. For the present problem, the group $GL(n, \mathbb{K})$ acts on $M_n(\mathbb{K})$ by $(g, x) \mapsto g x g^t$.
   (a) Verify that this is indeed a group action and that the vector subspaces $A_{nn}(\mathbb{K})$ of alternating matrices and $S_{nn}(\mathbb{K})$ of symmetric matrices are mapped into themselves under the group action.
   (b) Prove that two members of $A_{nn}(\mathbb{K})$ lie in the same orbit if and only if they have the same rank, and that the rank must be even. For each even rank $\leq n$, find an example of a member of $A_{nn}(\mathbb{K})$ with that rank.
   (c) Prove that two members of $S_{nn}(\mathbb{C})$ lie in the same orbit if and only if they have the same rank, and for each rank $\leq n$, find an example of a member of $S_{nn}(\mathbb{C})$ with that rank.

9. Let $U$ and $V$ be vector spaces over $\mathbb{K}$, and let $U'$ be the dual of $U$. The bilinear map $(u', v) \mapsto u'(\,\cdot\,)v$ of $U' \times V$ into $\operatorname{Hom}_{\mathbb{K}}(U, V)$ extends to a linear map $T_{UV} : U' \otimes_{\mathbb{K}} V \to \operatorname{Hom}_{\mathbb{K}}(U, V)$.
   (a) Prove that $T_{UV}$ is one-one.
   (b) Prove that $T_{UV}$ is onto $\operatorname{Hom}_{\mathbb{K}}(U, V)$ if $U$ is finite-dimensional.
   (c) Give an example for which $T_{UV}$ is not onto $\operatorname{Hom}_{\mathbb{K}}(U, V)$.
   (d) Let $\mathcal{C}$ be the category of all vector spaces over $\mathbb{K}$, and let $\Phi$ and $\Psi$ be the functors from $\mathcal{C} \times \mathcal{C}$ into $\mathcal{C}$ whose effects on objects are $\Phi(U, V) = U' \otimes_{\mathbb{K}} V$ and $\Psi(U, V) = \operatorname{Hom}_{\mathbb{K}}(U, V)$. Prove that the system $\{T_{UV}\}$ is a natural transformation of $\Phi$ into $\Psi$.
   (e) In view of (c), can the system $\{T_{UV}\}$ be a natural isomorphism?

10. Let $\mathbb{K} \subseteq \mathbb{L}$ be an inclusion of fields, and let $\mathcal{V}_{\mathbb{K}}$ and $\mathcal{V}_{\mathbb{L}}$ be the categories of vector spaces over $\mathbb{K}$ and $\mathbb{L}$. Section 6 of the text defined extension of scalars as a covariant functor $\Phi(E) = E \otimes_{\mathbb{K}} \mathbb{L}$. Another definition of extension of scalars is $\Psi(E) = \operatorname{Hom}_{\mathbb{K}}(\mathbb{L}, E)$ with $(l\varphi)(l') = \varphi(ll')$. Verify that $\Psi(E)$ is a vector space over $\mathbb{L}$ and that $\Psi$ is a functor.

11. A linear map $L : E \to F$ between finite-dimensional complex vector spaces becomes a linear map $L_{\mathbb{R}} : E_{\mathbb{R}} \to F_{\mathbb{R}}$ when we restrict attention to real scalars. Explain how to express a matrix for $L_{\mathbb{R}}$ in terms of a matrix for $L$.

12. (Kronecker product of matrices) Let $L : E_1 \to E_2$ and $M : F_1 \to F_2$ be linear maps between finite-dimensional vector spaces over $\mathbb{K}$, let $\Gamma_1$ and $\Gamma_2$ be ordered bases of $E_1$ and $E_2$, and let $\Delta_1$ and $\Delta_2$ be ordered bases of $F_1$ and $F_2$. Define matrices $A$ and $B$ by $A = \begin{pmatrix} L \\ \Gamma_2 \Gamma_1 \end{pmatrix}$ and $B = \begin{pmatrix} M \\ \Delta_2 \Delta_1 \end{pmatrix}$. Use $\Gamma_1$, $\Gamma_2$, $\Delta_1$, and $\Delta_2$ to define ordered bases $\Omega_1$ and $\Omega_2$ of $E_1 \otimes_{\mathbb{K}} F_1$ and $E_2 \otimes_{\mathbb{K}} F_2$, and describe how the matrix $C = \begin{pmatrix} L \otimes M \\ \Omega_2 \Omega_1 \end{pmatrix}$ is related to $A$ and $B$.

13. Let $\mathbb{K}$ be a field, and let $E$ be the vector space $\mathbb{K}X \oplus \mathbb{K}Y$. Prove that the subalgebra of $T(E)$ generated by $1$, $Y$, and $X^2 + XY + Y^2$ is isomorphic as an algebra with identity to $T(F)$ for some vector space $F$.

Problems 14–17 concern the functors $E \mapsto T(E)$, $E \mapsto S(E)$, and $E \mapsto \bigwedge E$ defined for vector spaces over a field $\mathbb{K}$.

14. If $\varphi : E \to F$ is a linear map between vector spaces over $\mathbb{K}$, Section 8 of the text indicated how to define a corresponding homomorphism $\Phi : S(E) \to S(F)$ of associative algebras with identity over $\mathbb{K}$, using Proposition 6.23b.
   (a) Fill in the details of this application of Proposition 6.23b.
   (b) Establish the appropriate conditions on mappings that complete the proof that $E \mapsto S(E)$ is a functor.
   (c) Verify that $\Phi$ carries $S^n(E)$ linearly into $S^n(F)$ for all integers $n \geq 0$.

15. Suppose that a linear map $\varphi : E \to E$ is given. Let $\Phi : S(E) \to S(E)$ and $\widetilde{\Phi} : T(E) \to T(E)$ be the associated algebra homomorphisms of $S(E)$ into itself and of $T(E)$ into itself, and let $q : T(E) \to S(E)$ be the quotient homomorphism appearing in the definition of $S(E)$. These mappings are related by the equation $\Phi q(x) = q \widetilde{\Phi}(x)$ for $x$ in $T(E)$. Proposition 6.29 shows for each $n \geq 0$ that $T^n(E) = \widetilde{S}^n(E) \oplus (T^n(E) \cap I)$, where $\widetilde{S}^n(E)$ is the image of $T^n(E)$ under the symmetrizer mapping. The remark with the proposition observes that $q$ carries $\widetilde{S}^n(E)$ one-one onto $S^n(E)$. Prove that $\widetilde{\Phi}$ carries $\widetilde{S}^n(E)$ into itself and that $\widetilde{\Phi}\big|_{\widetilde{S}^n(E)}$ matches $\Phi\big|_{S^n(E)}$ in the sense that $q \widetilde{\Phi}(x) = \Phi q(x)$ for all $x$ in $\widetilde{S}^n(E)$.

16. With $E$ finite-dimensional let $\varphi : E \to E$ be a linear mapping, and define $\Phi : \bigwedge E \to \bigwedge E$ to be the corresponding algebra homomorphism of $\bigwedge E$ sending 1 into 1. This $\Phi$ carries each $\bigwedge^n E$ into itself. Prove that $\Phi$ acts as multiplication by the scalar $\det \varphi$ on the 1-dimensional space $\bigwedge^{\dim E}(E)$.

17. Suppose that $G$ is a group, that the vector space $E$ over $\mathbb{K}$ is finite-dimensional, and that $\varphi : G \to GL(E)$ is a representation of $G$ on $E$. The functors $E \mapsto T(E)$, $E \mapsto S(E)$, and $E \mapsto \bigwedge E$ yield, for each $\varphi(g)$, algebra homomorphisms of $T(E)$ into itself, $S(E)$ into itself, and $\bigwedge E$ into itself.
   (a) Show that as $g$ varies, the result in each case is a representation of $G$.
   (b) Suppose that $E = \mathbb{K}^n$. Give a formula for the representation of $G$ on a member of $P(\mathbb{K}^n) = S((\mathbb{K}^n)')$.

Problems 18–22 concern universal mapping properties. Let $\mathcal{A}$ and $\mathcal{V}$ be two categories, and let $\mathcal{F} : \mathcal{A} \to \mathcal{V}$ be a covariant functor. (In practice, $\mathcal{F}$ tends to be a relatively simple functor, such as one that simply ignores some of the structure of $\mathcal{A}$.) Let $E$ be in $\operatorname{Obj}(\mathcal{V})$. A pair $(S, \iota)$ with $S$ in $\operatorname{Obj}(\mathcal{A})$ and $\iota$ in $\operatorname{Morph}_{\mathcal{V}}(E, \mathcal{F}(S))$ is said to have the universal mapping property relative to $E$ and $\mathcal{F}$ if the following condition is satisfied: whenever $A$ is in $\operatorname{Obj}(\mathcal{A})$ and a member $l$ of $\operatorname{Morph}_{\mathcal{V}}(E, \mathcal{F}(A))$ is given, there exists a unique member $L$ of $\operatorname{Morph}_{\mathcal{A}}(S, A)$ such that $\mathcal{F}(L)\iota = l$.

18. (a) By suitably specializing $\mathcal{A}$, $\mathcal{V}$, $\mathcal{F}$, etc., show that the universal mapping property of the symmetric algebra of a vector space over $\mathbb{K}$ is an instance of what has been described.
   (b) How should the answer to (a) be adjusted so as to account for the universal mapping property of the exterior algebra of a vector space over $\mathbb{K}$?
   (c) How should the answer to (a) be adjusted so as to account for the universal mapping property of the coproduct of $\{X_j\}_{j \in J}$ in a category $\mathcal{C}$, the universal mapping property being as in Figure 4.12? (Educational note: For the product of $\{X_j\}_{j \in J}$ in $\mathcal{C}$, the above description does not apply directly because the morphisms go the wrong way. Instead, one applies the above description to the opposite categories $\mathcal{A}^{\mathrm{opp}}$ and $\mathcal{V}^{\mathrm{opp}}$, defined as in Problems 78–80 at the end of Chapter IV.)

19. If $(S, \iota)$ and $(S', \iota')$ are two pairs that each have the universal mapping property relative to $E$ and $\mathcal{F}$, prove that $S$ and $S'$ are canonically isomorphic as objects in $\mathcal{A}$. More specifically prove that there exists a unique $L$ in $\operatorname{Morph}_{\mathcal{A}}(S, S')$ such that $\mathcal{F}(L)\iota = \iota'$ and that $L$ is an isomorphism whose inverse $L'$ in $\operatorname{Morph}_{\mathcal{A}}(S', S)$ has $\mathcal{F}(L')\iota' = \iota$.


20. Suppose that the pair $(S, \iota)$ has the universal mapping property relative to $E$ and $\mathcal{F}$. Let $\mathcal{S}$ be the category of sets, and define functors $F : \mathcal{A} \to \mathcal{S}$ and $G : \mathcal{A} \to \mathcal{S}$ by $F(A) = \operatorname{Morph}_{\mathcal{A}}(S, A)$, $F(\varphi)$ equals composition on the left by $\varphi$ for $\varphi \in \operatorname{Morph}_{\mathcal{A}}(A, A')$, $G(A) = \operatorname{Morph}_{\mathcal{V}}(E, \mathcal{F}(A))$, and $G(\varphi)$ equals composition on the left by $\mathcal{F}(\varphi)$. Let $T_A : \operatorname{Morph}_{\mathcal{A}}(S, A) \to \operatorname{Morph}_{\mathcal{V}}(E, \mathcal{F}(A))$ be the one-one onto map given by the universal mapping property. Show that the system $\{T_A\}$ is a natural isomorphism of $F$ into $G$.

21. Suppose that $(S', \iota')$ is a second pair having the universal mapping property relative to $E$ and $\mathcal{F}$. Define $F' : \mathcal{A} \to \mathcal{S}$ by $F'(A) = \operatorname{Morph}_{\mathcal{A}}(S', A)$. Combining the previous problem and Proposition 6.16, obtain a second proof (besides the one in Problem 19) that $S$ and $S'$ are canonically isomorphic.

22. Suppose that for each $E$ in $\operatorname{Obj}(\mathcal{V})$, there is some pair $(S, \iota)$ with the universal mapping property relative to $E$ and $\mathcal{F}$. Fix such a pair $(S, \iota)$ for each $E$, calling it $(S(E), \iota_E)$. Making an appropriate construction for morphisms and carrying out the appropriate verifications, prove that $E \mapsto S(E)$ is a functor.

Problems 23–28 introduce the Pfaffian of a $(2n)$-by-$(2n)$ alternating matrix $X = [x_{ij}]$ with entries in a field $\mathbb{K}$. This is the polynomial in the entries of $X$ with integer coefficients given by

$$\operatorname{Pfaff}(X) = \sum_{\text{some } \tau\text{'s in } S_{2n}} (\operatorname{sgn} \tau) \prod_{k=1}^{n} x_{\tau(2k-1),\tau(2k)},$$

where the sum is taken over those permutations $\tau$ such that $\tau(2k-1) < \tau(2k)$ for $1 \leq k \leq n$ and such that $\tau(1) < \tau(3) < \cdots < \tau(2n-1)$. It will be seen that $\det X$ is the square of this polynomial. Examples of Pfaffians are

$$\operatorname{Pfaff}\begin{pmatrix} 0 & x \\ -x & 0 \end{pmatrix} = x \qquad \text{and} \qquad \operatorname{Pfaff}\begin{pmatrix} 0 & a & b & c \\ -a & 0 & d & e \\ -b & -d & 0 & f \\ -c & -e & -f & 0 \end{pmatrix} = af - be + cd.$$
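The defining sum can be evaluated directly for small matrices by filtering all permutations of $\{1, \dots, 2n\}$. The sketch below is a brute-force illustration of the definition, not an efficient algorithm; it uses 0-based indices, exact integer entries, and hypothetical helper names (`sign`, `pfaffian`, `det`).

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation of 0..m-1, computed by counting inversions."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def pfaffian(X):
    """Pfaffian of a (2n)-by-(2n) alternating matrix, straight from the defining sum."""
    m = len(X)  # m = 2n
    half = m // 2
    total = 0
    for tau in permutations(range(m)):
        # tau(2k-1) < tau(2k) for every pair (0-based: positions 2k, 2k+1)
        pairs_sorted = all(tau[2 * k] < tau[2 * k + 1] for k in range(half))
        # tau(1) < tau(3) < ... < tau(2n-1) (0-based: positions 0, 2, 4, ...)
        firsts_increasing = all(tau[2 * k] < tau[2 * k + 2] for k in range(half - 1))
        if pairs_sorted and firsts_increasing:
            term = sign(tau)
            for k in range(half):
                term *= X[tau[2 * k]][tau[2 * k + 1]]
            total += term
    return total


def det(X):
    """Determinant via the Leibniz expansion, for comparison with pfaffian(X) ** 2."""
    m = len(X)
    total = 0
    for perm in permutations(range(m)):
        term = sign(perm)
        for i in range(m):
            term *= X[i][perm[i]]
        total += term
    return total
```

With $a, \dots, f = 1, \dots, 6$ in the 4-by-4 example, the three admissible permutations contribute $af$, $-be$, and $cd$, so `pfaffian` returns $6 - 10 + 12 = 8$ and `det` returns 64.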

The problems in this set will be continued at the end of Chapter VIII.

23. For the matrix $J$ in Section 5, show that $\operatorname{Pfaff}(J) = 1$.

24. In the expansion $\det X = \sum_{\sigma \in S_{2n}} (\operatorname{sgn} \sigma) \prod_{l=1}^{2n} x_{l,\sigma(l)}$, prove that the value of the right side with $X$ as above is not changed if the sum is extended only over those $\sigma$'s whose expansion in terms of disjoint cycles involves only cycles of even length (and in particular no cycles of length 1).

25. Define $\sigma \in S_{2n}$ to be "good" if its expansion in terms of disjoint cycles involves only cycles of even length. If $\sigma$ is good, show that there uniquely exist two disjoint subsets $A$ and $B$ of $n$ elements each in $\{1, \dots, 2n\}$ such that $A$ contains the smallest-numbered index in each cycle and such that $\sigma$ maps each set onto the other.


26. In the notation of the previous problem with $\sigma$ good, let $y(\sigma)$ be the product of the monomials $x_{ab}$ such that $a$ is in $A$ and $b = \sigma(a)$. For each factor $x_{ij}$ of $y(\sigma)$ with $i > j$, replace the factor by $-x_{ji}$. In the resulting product, arrange the factors in order so that their first subscripts are increasing, and denote this expression by $s\, x_{i_1 i_2} x_{i_3 i_4} \cdots x_{i_{2n-1} i_{2n}}$, where $s$ is a sign. Let $\tau$ be the permutation that carries each $r$ to $i_r$, and define $s(\tau)$ to be the sign $s$. Similarly let $z(\sigma)$ be the product of the monomials $x_{ba}$ such that $b$ is in $B$ and $a = \sigma(b)$. For each factor $x_{ij}$ of $z(\sigma)$ with $i > j$, replace the factor by $-x_{ji}$. In the resulting product, arrange the factors in order so that their first subscripts are increasing, and denote this expression by $s'\, x_{j_1 j_2} x_{j_3 j_4} \cdots x_{j_{2n-1} j_{2n}}$, where $s'$ is a sign. Let $\tau'$ be the permutation that carries each $r$ to $j_r$, and define $s'(\tau')$ to be the sign $s'$. Prove, apart from signs, that the $\sigma$th term in the expansion of $\det X$ matches the product of the $\tau$th term of $\operatorname{Pfaff}(X)$ and the $\tau'$th term of $\operatorname{Pfaff}(X)$.

27. In the previous problem, take the signs $s(\tau)$ and $s'(\tau')$ into account and show that the signs of $\sigma$, $\tau$, and $\tau'$ work out so that the $\sigma$th term in the expansion of $\det X$ is the product of the $\tau$th and $\tau'$th terms of $\operatorname{Pfaff}(X)$.

28. Show that every term of the product of $\operatorname{Pfaff}(X)$ with itself is accounted for once and only once by the construction in the previous three problems, and conclude that the alternating matrix $X$ has $\det X = (\operatorname{Pfaff}(X))^2$.

Problems 29–30 concern filtrations and gradings. A vector space $V$ over $\mathbb{K}$ is said to be filtered when an increasing sequence of subspaces $V_0 \subseteq V_1 \subseteq V_2 \subseteq \cdots$ is specified with union $V$. In this case we put $V_{-1} = 0$ by convention. The space $V$ is graded if a sequence of subspaces $V^0, V^1, V^2, \dots$ is specified such that

$$V = \bigoplus_{n=0}^{\infty} V^n.$$

When $V$ is graded, there is a natural filtration of $V$ given by $V_n = \bigoplus_{k=0}^{n} V^k$. Examples of graded vector spaces are any tensor algebra $V = T(E)$, symmetric algebra $S(E)$, exterior algebra $\bigwedge(E)$, and polynomial algebra $P(E)$, the $n$th subspace of the grading consisting of those elements that are homogeneous of degree $n$. Any polynomial algebra $\mathbb{K}[X_1, \dots, X_n]$ is another example of a graded vector space, the grading being by total degree.

29. When $V$ is a filtered vector space as in (A.34), the associated graded vector space is $\operatorname{gr} V = \bigoplus_{n=0}^{\infty} V_n/V_{n-1}$. Let $V$ and $V^\#$ be two filtered vector spaces, and let $\varphi$ be a linear map between them such that $\varphi(V_n) \subseteq V_n^\#$ for all $n$. Since the restriction of $\varphi$ to $V_n$ carries $V_{n-1}$ into $V_{n-1}^\#$, this restriction induces a linear map $\operatorname{gr}^n \varphi : (V_n/V_{n-1}) \to (V_n^\#/V_{n-1}^\#)$. The direct sum of these linear maps is then a linear map $\operatorname{gr} \varphi : \operatorname{gr} V \to \operatorname{gr} V^\#$ called the associated graded map for $\varphi$. Prove that if $\operatorname{gr} \varphi$ is a vector-space isomorphism, then $\varphi$ is a vector-space isomorphism.


30. Let $A$ be an associative algebra over $\mathbb{K}$ with identity. If $A$ has a filtration $A_0, A_1, \dots$ of vector subspaces with $1 \in A_0$ such that $A_m A_n \subseteq A_{m+n}$ for all $m$ and $n$, then one says that $A$ is a filtered associative algebra; similarly if $A$ is graded as $A = \bigoplus_{n=0}^{\infty} A^n$ in such a way that $A^m A^n \subseteq A^{m+n}$ for all $m$ and $n$, then one says that $A$ is a graded associative algebra. If $A$ is a filtered associative algebra with identity, prove that the graded vector space $\operatorname{gr} A$ acquires a multiplication in a natural way, making it into a graded associative algebra with identity.

Problems 31–35 concern Lie algebras and their universal enveloping algebras. If $\mathbb{K}$ is a field, a Lie algebra $\mathfrak{g}$ over $\mathbb{K}$ is a nonassociative algebra whose product, called the Lie bracket and written $[x, y]$, is alternating as a function of the pair $(x, y)$ and satisfies the Jacobi identity $[x, [y, z]] + [y, [z, x]] + [z, [x, y]] = 0$ for all $x, y, z$ in $\mathfrak{g}$. The universal enveloping algebra $U(\mathfrak{g})$ of $\mathfrak{g}$ is the quotient $T(\mathfrak{g})/I$, where $I$ is the two-sided ideal generated by all elements $x \otimes y - y \otimes x - [x, y]$ with $x$ and $y$ in $T^1(\mathfrak{g})$. The grading for $T(\mathfrak{g})$ makes $U(\mathfrak{g})$ into a filtered associative algebra with identity. The product of $x$ and $y$ in $U(\mathfrak{g})$ is written $xy$.

31. If $A$ is an associative algebra over $\mathbb{K}$, prove that $A$ becomes a Lie algebra if the Lie bracket is defined by $[x, y] = xy - yx$. In particular, observe that $M_n(\mathbb{K})$ becomes a Lie algebra in this way.

32. Fix a matrix $A \in M_n(\mathbb{K})$, and let $\mathfrak{g}$ be the vector subspace of all members $x$ of $M_n(\mathbb{K})$ with $x^t A + A x = 0$.
   (a) Prove that $\mathfrak{g}$ is closed under the bracket operation of the previous problem and is therefore a Lie subalgebra of $M_n(\mathbb{K})$.
   (b) Deduce as a special case of (a) that the vector space of all skew-symmetric matrices in $M_n(\mathbb{K})$ is a Lie subalgebra of $M_n(\mathbb{K})$.

33. Let $\mathfrak{g}$ be a Lie algebra over $\mathbb{K}$, and let $\iota$ be the linear map obtained as the composition of $\mathfrak{g} \to T^1(\mathfrak{g})$ and the passage to the quotient $U(\mathfrak{g})$. Prove that $(U(\mathfrak{g}), \iota)$ has the following universal mapping property: whenever $l$ is any linear map of $\mathfrak{g}$ into an associative algebra $A$ with identity satisfying the condition of being a Lie algebra homomorphism, namely $l[x, y] = l(x)l(y) - l(y)l(x)$ for all $x$ and $y$ in $\mathfrak{g}$, then there exists a unique associative algebra homomorphism $L : U(\mathfrak{g}) \to A$ with $L(1) = 1$ such that $L \circ \iota = l$.

34. Let $\mathfrak{g}$ be a Lie algebra over $\mathbb{K}$, let $\{u_i\}_{i \in A}$ be a vector-space basis of $\mathfrak{g}$, and suppose that a simple ordering has been imposed on the index set $A$. Prove that the set of all monomials $u_{i_1}^{j_1} \cdots u_{i_k}^{j_k}$ with $i_1 < \cdots < i_k$ and $\sum_m j_m$ arbitrary is a spanning set for $U(\mathfrak{g})$.

35. For a Lie algebra $\mathfrak{g}$ over $\mathbb{K}$, the Poincaré–Birkhoff–Witt Theorem says that the spanning set for $U(\mathfrak{g})$ in the previous problem is actually a basis. Assuming this theorem, prove that $\operatorname{gr} U(\mathfrak{g})$ is isomorphic as a graded algebra to $S(\mathfrak{g})$.

Problems 36–40 introduce Clifford algebras. Let $\mathbb{K}$ be a field of characteristic $\neq 2$, let $E$ be a finite-dimensional vector space over $\mathbb{K}$, and let $\langle \cdot, \cdot \rangle$ be a symmetric bilinear form on $E$. The Clifford algebra $\operatorname{Cliff}(E, \langle \cdot, \cdot \rangle)$ is the quotient $T(E)/I$, where $I$ is the two-sided ideal generated by all elements$^5$ $v \otimes v + \langle v, v \rangle$ with $v$ in $E$. The grading for $T(E)$ makes $\operatorname{Cliff}(E, \langle \cdot, \cdot \rangle)$ into a filtered associative algebra with identity. Products in $\operatorname{Cliff}(E, \langle \cdot, \cdot \rangle)$ are written as $ab$ with no special symbol.

$^5$ Some authors factor out the elements $v \otimes v - \langle v, v \rangle$ instead. There is no generally accepted convention.

36. Let $\iota$ be the composition of the inclusion $E \subseteq T^1(E)$ and the passage to the quotient modulo $I$. Prove that $(\operatorname{Cliff}(E, \langle \cdot, \cdot \rangle), \iota)$ has the following universal mapping property: whenever $l$ is any linear map of $E$ into an associative algebra $A$ with identity such that $l(v)^2 = -\langle v, v \rangle 1$ for all $v \in E$, then there exists a unique algebra homomorphism $L : \operatorname{Cliff}(E, \langle \cdot, \cdot \rangle) \to A$ with $L(1) = 1$ and such that $L \circ \iota = l$.

37. Let $\{u_1, \dots, u_n\}$ be a basis of $E$. Prove that the $2^n$ elements of $\operatorname{Cliff}(E, \langle \cdot, \cdot \rangle)$ given by $u_{i_1} u_{i_2} \cdots u_{i_k}$ with $i_1 < \cdots < i_k$ form a spanning set of $\operatorname{Cliff}(E, \langle \cdot, \cdot \rangle)$.

38. Using the Principal Axis Theorem, fix a basis $\{e_1, \dots, e_n\}$ of $E$ such that $\langle e_i, e_j \rangle = d_i \delta_{ij}$ for all $j$. Introduce an algebra $C$ over $\mathbb{K}$ of dimension $2^n$ with generators $e_1, \dots, e_n$ and with a basis parametrized by subsets of $\{1, \dots, n\}$ and given by all elements $e_{i_1} e_{i_2} \cdots e_{i_k}$ with $i_1 < i_2 < \cdots < i_k$, with the multiplication that is implicit in the rules $e_i^2 = -d_i$ and $e_i e_j = -e_j e_i$ if $i \neq j$, namely, to multiply two monomials $e_{i_1} e_{i_2} \cdots e_{i_k}$ and $e_{j_1} e_{j_2} \cdots e_{j_l}$, put them end to end, replace any occurrence of two $e_k$'s by the scalar $-d_k$, and then permute the remaining $e_k$'s until their indices are in increasing order, introducing a minus sign each time two distinct $e_k$'s are interchanged. Prove that the algebra $C$ is associative.

39. Prove that the associative algebra $C$ of the previous problem is isomorphic as an algebra to $\operatorname{Cliff}(E, \langle \cdot, \cdot \rangle)$.

40. Prove that $\operatorname{gr} \operatorname{Cliff}(E, \langle \cdot, \cdot \rangle)$ is isomorphic as a graded algebra to $\bigwedge(E)$.

Problems 41–48 introduce finite-dimensional Heisenberg Lie algebras and the corresponding Weyl algebras. They make use of Problems 31–35 concerning Lie algebras and universal enveloping algebras. Let $V$ be a finite-dimensional vector space over the field $\mathbb{K}$, and let $\langle \cdot, \cdot \rangle$ be a nondegenerate alternating bilinear form on $V \times V$. Write $2n$ for the dimension of $V$. Introduce an indeterminate $X_0$. The Heisenberg Lie algebra $H(V)$ on $V$ is a Lie algebra whose underlying vector space is $\mathbb{K}X_0 \oplus V$ and whose Lie bracket is given by $[(cX_0, u), (dX_0, v)] = \langle u, v \rangle X_0$. Let $U(H(V))$ be its universal enveloping algebra. The Weyl algebra $W(V)$ on $V$ is the quotient of the tensor algebra $T(V)$ by the two-sided ideal generated by all $u \otimes v - v \otimes u - \langle u, v \rangle 1$ with $u$ and $v$ in $V$; as such, it is a filtered associative algebra.


41. Verify when the field is $\mathbb{K} = \mathbb{R}$ that an example of a $2n$-dimensional $V$ with its nondegenerate alternating bilinear form $\langle \cdot, \cdot \rangle$ is $V = \mathbb{C}^n$ with $\langle u, v \rangle = \operatorname{Im}(u, v)$, where $(\cdot, \cdot)$ is the usual inner product on $\mathbb{C}^n$. For this $V$, exhibit a Lie-algebra isomorphism of $H(V)$ with the Lie algebra of all complex $(n+1)$-by-$(n+1)$ matrices of the form

$$\begin{pmatrix} 0 & \bar{z}^t & ir \\ 0 & 0 & z \\ 0 & 0 & 0 \end{pmatrix} \qquad \text{with } z \in \mathbb{C}^n \text{ and } r \in \mathbb{R}.$$

42. In the general situation show that the linear map $\iota(cX_0, v) = c1 + v$ is a Lie algebra homomorphism of $H(V)$ into $W(V)$ and that its extension to an associative algebra homomorphism $\widetilde{\iota} : U(H(V)) \to W(V)$ is onto and has kernel equal to the two-sided ideal in $U(H(V))$ generated by $X_0 - 1$.

43. Prove that $W(V)$ has the following universal mapping property: whenever $\varphi : H(V) \to A$ is a Lie algebra homomorphism of $H(V)$ into an associative algebra $A$ with identity such that $\varphi(X_0) = 1$, then there exists a unique associative algebra homomorphism $\widetilde{\varphi}$ of $W(V)$ into $A$ such that $\varphi = \widetilde{\varphi} \circ \iota$.

44. Let $v_1, \dots, v_{2n}$ be any vector space basis of $V$. Prove that the elements $v_1^{k_1} \cdots v_{2n}^{k_{2n}}$ with integer exponents $\geq 0$ span $W(V)$.

45. For $\mathbb{K} = \mathbb{R}$, let $\mathcal{S}$ be the vector space of all real-valued functions $P(x)e^{-\pi|x|^2}$, where $P(x)$ is a polynomial in $n$ real variables. Show that $\mathcal{S}$ is mapped into itself by the linear operators $\partial/\partial x_i$ and $m_j = (\text{multiplication by } x_j)$.

46. With $\mathbb{K} = \mathbb{R}$, let $\{p_1, \dots, p_n, q_1, \dots, q_n\}$ be a Weyl basis of $V$ in the terminology of Problem 6. In the notation of Problem 45, let $\varphi : V \to \operatorname{Hom}_{\mathbb{R}}(\mathcal{S}, \mathcal{S})$ be the linear map given by $\varphi(p_i) = \partial/\partial x_i$ and $\varphi(q_j) = m_j$. Use Problem 43 to extend $\varphi$ to an algebra homomorphism $\widetilde{\varphi} : W(V) \to \operatorname{Hom}_{\mathbb{R}}(\mathcal{S}, \mathcal{S})$ with $\widetilde{\varphi}(1) = 1$, and use Problem 42 to obtain a representation of $H(V)$ on $\mathcal{S}$. Prove that this representation of $H(V)$ is irreducible in the sense that there is no proper nonzero vector subspace carried to itself by all members of $\widetilde{\varphi}(H(V))$.

47. In Problem 46 with $\mathbb{K} = \mathbb{R}$, prove that the associative algebra homomorphism $\widetilde{\varphi} : W(V) \to \operatorname{Hom}_{\mathbb{R}}(\mathcal{S}, \mathcal{S})$ is one-one. Conclude for $\mathbb{K} = \mathbb{R}$ that the elements $v_1^{k_1} \cdots v_{2n}^{k_{2n}}$ of Problem 44 form a vector-space basis of $W(V)$.

48. For $\mathbb{K} = \mathbb{R}$, prove that $\operatorname{gr} W(V)$ is isomorphic as a graded algebra to $S(V)$.

Problems 49–51 deal with Jordan algebras. Let $\mathbb{K}$ be a field of characteristic $\neq 2$. An algebra $J$ over $\mathbb{K}$ with multiplication $a \cdot b$ is called a Jordan algebra if the identities $a \cdot b = b \cdot a$ and $a^2 \cdot (b \cdot a) = (a^2 \cdot b) \cdot a$ are always satisfied; here $a^2$ is an abbreviation for $a \cdot a$.

49. Let $A$ be an associative algebra, and define $a \cdot b = \frac{1}{2}(ab + ba)$. Prove that $A$ becomes a Jordan algebra under this new multiplication.


50. In the situation of the previous problem, suppose that $a \mapsto a^t$ is a one-one linear mapping of $A$ onto itself such that $(ab)^t = b^t a^t$ for all $a$ and $b$. (For example, $a \mapsto a^t$ could be the transpose mapping if $A = M_n(\mathbb{K})$.) Prove that the vector subspace of all $a$ with $a^t = a$ is carried to itself by the Jordan product $a \cdot b$ and hence is a Jordan algebra.

51. Let $V$ be a finite-dimensional vector space over $\mathbb{K}$, and let $\langle \cdot, \cdot \rangle$ be a symmetric bilinear form on $V$. Define $A = \mathbb{K}1 \oplus V$ as a vector space, and define a multiplication in $A$ by $(c1, x) \cdot (d1, y) = \bigl((cd + \langle x, y \rangle)1,\ cy + dx\bigr)$. Prove that $A$ is a Jordan algebra under this definition of multiplication.

Problems 52–56 deal with the algebra $\mathbb{O}$ of real octonions, sometimes known as the Cayley numbers. This is a certain 8-dimensional nonassociative algebra with identity over $\mathbb{R}$ with an inner product such that $\|ab\| = \|a\|\,\|b\|$ for all $a$ and $b$ and such that the left and right multiplications by any element $a \neq 0$ are always invertible.

52. Let $A$ be an algebra over $\mathbb{R}$. Let $[a, b] = ab - ba$ and $[a, b, c] = (ab)c - a(bc)$.
   (a) The 3-multilinear function $(a, b, c) \mapsto [a, b, c]$ from $A \times A \times A$ to $A$ is called the associator in $A$. Observe that it is 0 if and only if $A$ is associative. Show that it is alternating if and only if $A$ always satisfies the limited associativity laws
   $$(aa)b = a(ab), \qquad (ab)a = a(ba), \qquad (ba)a = b(aa).$$
   In this case, $A$ is said to be alternative.
   (b) Show that $A$ is alternative if the first and third of the limited associativity laws in (a) are always satisfied.

53. (Cayley–Dickson construction) Suppose that $A$ is an algebra over $\mathbb{R}$ with a two-sided identity 1, and suppose that there is an $\mathbb{R}$ linear function $*$ from $A$ to itself (called "conjugation") such that $1^* = 1$, $a^{**} = a$, and $(ab)^* = b^* a^*$ for all $a$ and $b$ in $A$. Define an algebra $B$ over $\mathbb{R}$ to have the underlying real vector-space structure of $A \oplus A$ and to have multiplication and conjugation given by
   $$(a, b)(c, d) = (ac - db^*,\ a^* d + cb) \qquad \text{and} \qquad (a, b)^* = (a^*, -b).$$
   (a) Prove that $(1, 0)$ is a two-sided identity in $B$ and that the operation $*$ in $B$ satisfies the required properties of a conjugation.
   (b) Prove that if $a^* = a$ for all $a \in A$, then $A$ is commutative.
   (c) Prove that if $a^* = a$ for all $a \in A$, then $B$ is commutative.
   (d) Prove that if $A$ is commutative and associative, then $B$ is associative.
   (e) Verify the following outcomes of the above construction $A \to B$:
      (i) $A = \mathbb{R}$ yields $B = \mathbb{C}$,
      (ii) $A = \mathbb{C}$ yields $B = \mathbb{H}$, the algebra of quaternions.


54. Suppose that $A$ is an algebra over $\mathbb{R}$ with an identity and a conjugation as in the previous problem. Say that $A$ is nicely normed if
      (i) $a + a^*$ is always of the form $r1$ with $r$ real and
      (ii) $aa^*$ always equals $a^* a$ and for $a \neq 0$, is of the form $r1$ with $r$ real and positive.
   (a) Prove that if $A$ is nicely normed, then so is the algebra $B$ of the previous problem.
   (b) Prove that if $A$ is nicely normed, then $(a, b) = \frac{1}{2}(ab^* + ba^*)$ is an inner product on $A$ with norm $\|a\| = (aa^*)^{1/2} = (a^* a)^{1/2}$.
   (c) Prove that if $A$ is associative and nicely normed, then the algebra $B$ of the previous problem is alternative.

55. Starting from the real algebra $A = \mathbb{H}$, apply the construction of Problem 53, and let the resulting 8-dimensional real algebra be denoted by $\mathbb{O}$, the algebra of octonions.
   (a) Prove that $\mathbb{O}$ is an alternative algebra and is nicely normed.
   (b) Prove that $(xx^*)y = x(x^* y)$ and $x(yy^*) = (xy)y^*$ within $\mathbb{O}$.
   (c) Prove that $\|ab\|^2 = \|a\|^2 \|b\|^2$ within $\mathbb{O}$.
   (d) Conclude from (c) that the operations of left and right multiplication by any $a \neq 0$ within $\mathbb{O}$ are invertible.
   (e) Show that the inverse operators are left and right multiplication by $\|a\|^{-2} a^*$.
   (f) Denote the usual basis vectors of $\mathbb{H}$ by $1, i, j, k$. Write down a multiplication table for the eight basis vectors of $\mathbb{O}$ given by $(x, 0)$ and $(0, y)$ as $x$ and $y$ run through the basis vectors of $\mathbb{H}$.

56. What prevents the construction of Problem 53, when applied with $A = \mathbb{O}$, from yielding a 16-dimensional algebra $B$ in which $\|ab\|^2 = \|a\|^2 \|b\|^2$ and therefore in which the operations of left and right multiplication by any $a \neq 0$ within $B$ are invertible?
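The doubling formulas of Problem 53 are easy to experiment with. The following sketch is an illustration only, not a solution to any of the problems. It assumes integer coordinates, represents an element of the $k$-fold doubling of $\mathbb{Z}$ as a nested pair of depth $k$, and uses hypothetical function names; the multiplication and conjugation are exactly $(a, b)(c, d) = (ac - db^*, a^* d + cb)$ and $(a, b)^* = (a^*, -b)$.

```python
import random


def neg(x):
    """Additive inverse, componentwise."""
    if isinstance(x, int):
        return -x
    return (neg(x[0]), neg(x[1]))


def add(x, y):
    """Sum of two elements of the same depth, componentwise."""
    if isinstance(x, int):
        return x + y
    return (add(x[0], y[0]), add(x[1], y[1]))


def conj(x):
    """Conjugation (a, b)* = (a*, -b); the identity on integers."""
    if isinstance(x, int):
        return x
    a, b = x
    return (conj(a), neg(b))


def mul(x, y):
    """Product (a, b)(c, d) = (ac - d b*, a* d + c b) from Problem 53."""
    if isinstance(x, int):
        return x * y
    (a, b), (c, d) = x, y
    return (add(mul(a, c), neg(mul(d, conj(b)))), add(mul(conj(a), d), mul(c, b)))


def norm_sq(x):
    """Sum of the squares of all integer coordinates."""
    if isinstance(x, int):
        return x * x
    return norm_sq(x[0]) + norm_sq(x[1])


def random_element(depth, rng):
    """Random element with coordinates in [-3, 3]; depth 1, 2, 3, 4 give dimension 2, 4, 8, 16."""
    if depth == 0:
        return rng.randint(-3, 3)
    return (random_element(depth - 1, rng), random_element(depth - 1, rng))


def norm_is_multiplicative(depth, trials=200, seed=0):
    """Check norm_sq(xy) == norm_sq(x) * norm_sq(y) on random samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = random_element(depth, rng), random_element(depth, rng)
        if norm_sq(mul(x, y)) != norm_sq(x) * norm_sq(y):
            return False
    return True
```

On random samples the check succeeds for depths 1, 2, and 3 (the complex numbers, quaternions, and octonions with integer coordinates) and fails for depth 4, which is the phenomenon that Problem 56 asks the reader to explain.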

CHAPTER VII Advanced Group Theory

Abstract. This chapter continues the development of group theory begun in Chapter IV, the main topics being the use of generators and relations, representation theory for finite groups, and group extensions. Representation theory uses linear algebra and inner-product spaces in an essential way, and a structure-theory theorem for finite groups is obtained as a consequence. Group extensions introduce the subject of cohomology of groups.

Sections 1–3 concern generators and relations. The context for generators and relations is that of a free group on the set of generators, and the relations indicate passage to a quotient of this free group by a normal subgroup. Section 1 constructs free groups in terms of words built from an alphabet and shows that free groups are characterized by a certain universal mapping property. This universal mapping property implies that any group may be defined by generators and relations. Computations with free groups are aided by the fact that two reduced words yield the same element of a free group if and only if the reduced words are identical. Section 2 obtains the Nielsen–Schreier Theorem that subgroups of free groups are free. Section 3 enlarges the construction of free groups to the notion of the free product of an arbitrary set of groups. Free product is what coproduct is for the category of groups; free groups themselves may be regarded as free products of copies of the integers.

Sections 4–5 introduce representation theory for finite groups and give an example of an important application whose statement lies outside representation theory. Section 4 contains various results giving an analysis of the space $C(G, \mathbb{C})$ of all complex-valued functions on a finite group $G$. In this analysis those functions that are constant on conjugacy classes are shown to be linear combinations of the characters of the irreducible representations. Section 5 proves Burnside's Theorem as an application of this theory: any finite group of order $p^a q^b$ with $p$ and $q$ prime and with $a + b > 1$ has a nontrivial normal subgroup.

Section 6 introduces cohomology of groups in connection with group extensions. If $N$ is to be a normal subgroup of $G$ and $Q$ is to be isomorphic to $G/N$, the first question is to parametrize the possibilities for $G$ up to isomorphism. A second question is to parametrize the possibilities for $G$ if $G$ is to be a semidirect product of $N$ and $Q$.

1. Free Groups

This section and the next two introduce some group-theoretic notions that in principle apply to all groups but in practice are used with countable groups, often countably infinite groups that are nonabelian. The material is especially useful in applications in topology, particularly in connection with fundamental groups and covering spaces. But the formal development here will be completely algebraic, not making use of any definitions or theorems from topology.

In the case of abelian groups, every abelian group $G$ is a quotient of a suitable free abelian group, i.e., a suitable direct sum of copies of the additive group $\mathbb{Z}$ of integers.¹ Recall the discussion of Section IV.9: We introduce a copy $\mathbb{Z}_g$ of $\mathbb{Z}$ for each $g$ in $G$, define $\widetilde{G} = \bigoplus_{g \in G} \mathbb{Z}_g$, let $i_g : \mathbb{Z}_g \to \widetilde{G}$ be the standard embedding, and let $\varphi_g : \mathbb{Z}_g \to G$ be the group homomorphism written additively as $\varphi_g(n) = ng$. The universal mapping property of direct sums that was stated as Proposition 4.17 produces a unique group homomorphism $\varphi : \widetilde{G} \to G$ such that $\varphi \circ i_g = \varphi_g$ for all $g$, and $\varphi$ is the required homomorphism of a free abelian group onto $G$.

¹ Direct sum here is what coproduct, in the sense of Section IV.11, amounts to in the category of all abelian groups.

The goal in this section is to carry out an analogous construction for groups that are not necessarily abelian. The constructed groups, to be called "free groups," are to be rather concrete, and the family of all of them is to have the property that every group is the quotient of some member of the family.

If $S$ is any set, we construct a "free group $F(S)$ on the set $S$." Let us speak of $S$ as a set of "symbols" or as the members of an "alphabet," possibly infinite, with which we are working. If $S$ is empty, the group $F(S)$ is taken to be the one-element trivial group, and we shall therefore now assume that $S$ is not empty. If $a$ is a symbol in $S$, we introduce a new symbol $a^{-1}$ corresponding to it, and we let $S^{-1}$ denote the set of all such symbols $a^{-1}$ for $a \in S$. Define $S' = S \cup S^{-1}$.

A word is a finite string of symbols from $S'$, i.e., an ordered $n$-tuple for some $n$ of members of $S'$ with repetitions allowed. Words that are $n$-tuples are said to have length $n$. The empty word, with length 0, will be denoted by 1. Other words are usually written with the symbols juxtaposed and all commas omitted, as in $abca^{-1}cb^{-1}$. The set of words will be denoted by $W(S')$. We introduce a multiplication $W(S') \times W(S') \to W(S')$ by writing end-to-end the words that are to be multiplied:

$$(abca^{-1},\ cb^{-1}) \mapsto abca^{-1}cb^{-1}.$$

The length of a product is the sum of the lengths of the factors. It is plain that this multiplication is associative and that 1 is a two-sided identity. It is not a group operation, however, since most elements of $W(S')$ do not have inverses: multiplication never decreases length, and thus the only way that 1 can be a product of two elements is as the product $11$.

To obtain a group from $W(S')$, we shall introduce an equivalence relation in $W(S')$. Two words are said to be equivalent if one of the words can be obtained from the other by a finite succession of insertions and deletions of expressions $aa^{-1}$ or $a^{-1}a$ within the word; here $a$ is assumed to be an element of $S$. It will be convenient to refer to the pairs $aa^{-1}$ and $a^{-1}a$ together; therefore when $b = a^{-1}$ is in $S^{-1}$, let us define $b^{-1} = (a^{-1})^{-1}$ to be $a$. Then two words are equivalent if one of the words can be obtained from the other by a finite succession of insertions and deletions of expressions of the form $bb^{-1}$ with $b$ in $S'$. This definition is arranged so that "equivalent" is an equivalence relation. We write $x \sim y$ if $x$ and $y$ are words that are equivalent. The underlying set for the free group $F(S)$ will be taken to be the set of equivalence classes of members of $W(S')$.

Theorem 7.1. If $S$ is a set and $W(S')$ is the corresponding set of words built from $S' = S \cup S^{-1}$, then the product operation defined on $W(S')$ descends in a well-defined fashion to the set $F(S)$ of equivalence classes of members of $W(S')$, and $F(S)$ thereby becomes a group. Define $\iota : S \to F(S)$ to be the composition of the inclusion into words of length one followed by passage to equivalence classes. Then the pair $(F(S), \iota)$ has the following universal mapping property: whenever $G$ is a group and $\varphi : S \to G$ is a function, then there exists a unique group homomorphism $\widetilde{\varphi} : F(S) \to G$ such that $\varphi = \widetilde{\varphi} \circ \iota$.

REMARK. The group $F(S)$ is called the free group on $S$. Figure 7.1 illustrates its universal mapping property. The brief form in words of the property is that any function from $S$ into a group $G$ extends uniquely to a group homomorphism of $F(S)$ into $G$. This universal mapping property actually characterizes $F(S)$, as will be seen in Proposition 7.2.

$$\begin{array}{ccc} S & \xrightarrow{\ \varphi\ } & G \\ {\scriptstyle\iota}\big\downarrow & \nearrow{\scriptstyle\widetilde{\varphi}} & \\ F(S) & & \end{array}$$

FIGURE 7.1. Universal mapping property of a free group.

PROOF. Let us denote equivalence classes by brackets. We want to define multiplication in $F(S)$ by $[w_1][w_2] = [w_1 w_2]$. To see that this formula makes sense in $F(S)$, let $x_1$, $x_2$, and $y$ be words, and let $b$ be in $S'$. Define $x = x_1 x_2$ and $x' = x_1 b b^{-1} x_2$, so that $x' \sim x$. Then it is evident that $x'y \sim xy$ and $yx' \sim yx$. Iteration of this kind of relationship shows that $w_1' \sim w_1$ and $w_2' \sim w_2$ implies $w_1' w_2' \sim w_1 w_2$, and hence multiplication of equivalence classes is well defined.

Since multiplication in $W(S')$ is associative, we have

$$[w_1]([w_2][w_3]) = [w_1][w_2 w_3] = [w_1(w_2 w_3)] = [(w_1 w_2)w_3] = [w_1 w_2][w_3] = ([w_1][w_2])[w_3].$$

Thus multiplication is associative in $F(S)$. The class $[1]$ of the empty word 1 is a two-sided identity. If $b_1, \dots, b_n$ are in $S'$, then $b_n^{-1} \cdots b_2^{-1} b_1^{-1} b_1 b_2 \cdots b_n$ is equivalent to 1, and so is $b_1 b_2 \cdots b_n b_n^{-1} \cdots b_2^{-1} b_1^{-1}$. Consequently $[b_n^{-1} \cdots b_2^{-1} b_1^{-1}]$ is a two-sided inverse of $[b_1 b_2 \cdots b_n]$, and $F(S)$ is a group.

Now we address the universal mapping property, first proving the stated uniqueness of the homomorphism. Every member of $F(S)$ is the product of classes $[b]$ with $b$ in $S'$. In turn, if $b$ is of the form $a^{-1}$ with $a$ in $S$, then $[b] = [a]^{-1}$. Hence $F(S)$ is generated by all classes $[a]$ with $a$ in $S$, i.e., by $\iota(S)$. Any homomorphism of a group is determined by its values on the members of a generating set, and uniqueness therefore follows from the formula $\widetilde{\varphi}([a]) = \widetilde{\varphi}(\iota(a)) = \varphi(a)$.

For existence we begin by defining a function $\Phi : W(S') \to G$ such that

$$\begin{aligned} \Phi(a) &= \varphi(a) && \text{for } a \text{ in } S, \\ \Phi(a^{-1}) &= \varphi(a)^{-1} && \text{for } a^{-1} \text{ in } S^{-1}, \\ \Phi(w_1 w_2) &= \Phi(w_1)\Phi(w_2) && \text{for } w_1 \text{ and } w_2 \text{ in } W(S'). \end{aligned}$$

We use the formulas $\Phi(a) = \varphi(a)$ for $a$ in $S$ and $\Phi(a^{-1}) = \varphi(a)^{-1}$ for $a^{-1}$ in $S^{-1}$ as a definition of $\Phi(b)$ for $b$ in $S'$. Any member of $W(S')$ can be written uniquely as $b_1 \cdots b_n$ with each $b_j$ in $S'$, and we set $\Phi(b_1 \cdots b_n) = \Phi(b_1) \cdots \Phi(b_n)$. (If $n = 0$, the understanding is that $\Phi(1) = 1$.) Then $\Phi$ has the required properties.

Let us show that $w' \sim w$ implies $\Phi(w') = \Phi(w)$. If $b_1, \dots, b_n$ are in $S'$ and $b$ is in $S'$, then the question is whether

$$\Phi(b_1 \cdots b_k\, b b^{-1}\, b_{k+1} \cdots b_n) \overset{?}{=} \Phi(b_1 \cdots b_k b_{k+1} \cdots b_n).$$

If $g$ and $g'$ denote the elements $\Phi(b_1) \cdots \Phi(b_k)$ and $\Phi(b_{k+1}) \cdots \Phi(b_n)$ of $G$, then the two sides of the queried formula are

$$g\,\Phi(b)\Phi(b^{-1})\,g' \qquad \text{and} \qquad g g'.$$

Thus the question is whether $\Phi(b)\Phi(b^{-1})$ always equals 1 in $G$. If $b = a$ is in $S$, this equals $\varphi(a)\varphi(a)^{-1} = 1$, while if $b = a^{-1}$ is in $S^{-1}$, it equals $\varphi(a)^{-1}\varphi(a) = 1$. We conclude that $w' \sim w$ implies $\Phi(w') = \Phi(w)$.

We may therefore define $\widetilde{\varphi}([w]) = \Phi(w)$ for $[w]$ in $F(S)$. Since

$$\widetilde{\varphi}([w][w']) = \widetilde{\varphi}([ww']) = \Phi(ww') = \Phi(w)\Phi(w') = \widetilde{\varphi}([w])\,\widetilde{\varphi}([w']),$$

$\widetilde{\varphi}$ is a homomorphism of $F(S)$ into $G$. For $a$ in $S$, we have $\widetilde{\varphi}([a]) = \Phi(a) = \varphi(a)$. In other words, $\widetilde{\varphi}(\iota(a)) = \varphi(a)$. This completes the proof of existence.

Proposition 7.2. Let $S$ be a set, $F'$ be a group, and $\iota' : S \to F'$ be a function. Suppose that the pair $(F', \iota')$ has the following universal mapping property: whenever $G$ is a group and $\varphi : S \to G$ is a function, then there exists a unique group homomorphism $\widetilde{\varphi} : F' \to G$ such that $\varphi = \widetilde{\varphi} \circ \iota'$. Then there exists a unique group homomorphism $\Phi : F(S) \to F'$ such that $\iota' = \Phi \circ \iota$, and it is a group isomorphism.

REMARKS. Chapter VI is not a prerequisite for the present chapter. However, readers who have been through Chapter VI will recognize that Proposition 7.2 is a special case of Problem 19 at the end of that chapter.

PROOF. We apply the universal mapping property of $(F(S), \iota)$, as stated in Theorem 7.1, to the group $G = F'$ and the function $\varphi = \iota'$, obtaining a group homomorphism $\Phi : F(S) \to F'$ such that $\iota' = \Phi \circ \iota$. Then we apply the given universal mapping property of $(F', \iota')$ to the group $G = F(S)$ and the function $\varphi = \iota$, obtaining a group homomorphism $\Psi : F' \to F(S)$ such that $\iota = \Psi \circ \iota'$. The group homomorphism $\Psi \circ \Phi : F(S) \to F(S)$ has the property that $(\Psi \circ \Phi) \circ \iota = \Psi \circ (\Phi \circ \iota) = \Psi \circ \iota' = \iota$, and the identity $1_{F(S)}$ has this same property. By the uniqueness of the group homomorphism in Theorem 7.1, $\Psi \circ \Phi = 1_{F(S)}$. Similarly the group homomorphism $\Phi \circ \Psi : F' \to F'$ has the property that $(\Phi \circ \Psi) \circ \iota' = \iota'$, and the identity $1_{F'}$ has this same property. By the uniqueness of the group homomorphism in the assumed universal mapping property of $F'$, $\Phi \circ \Psi = 1_{F'}$. Therefore $\Phi$ is a group isomorphism.

We know that $\iota(S)$ generates $F(S)$. If $\Phi' : F(S) \to F'$ is another group isomorphism with $\iota' = \Phi' \circ \iota$, then $\Phi$ and $\Phi'$ agree on $\iota(S)$ and therefore have to agree everywhere. Hence $\Phi$ is unique.

Proposition 7.2 raises the question of recognizing candidates for the set $T = \iota'(S)$ in a given group $F'$ so as to be in a position to exhibit $F'$ as isomorphic to the free group $F(S)$. Certainly $T$ has to generate $F'$. But there is also an independence condition. The idea is that if we form words from the members of $T$, then two words are to lead to equal members of $F'$ only if they can be transformed into one another by the same rules that are allowed with free groups. What this problem amounts to in the case that $F' = F(S)$ is that we want a decision procedure for telling whether two given words are equivalent. This is the so-called word problem for the free group.

If we think about the matter for a moment, not much is instantly obvious. If $a_1$ and $a_2$ are two members of $S$ and if they are considered as words of length 1, are they equivalent? Equivalence allows for inserting pairs $bb^{-1}$ with $b$ in $S'$, as well as deleting them. Might it be possible to do some complicated iterated insertion and deletion of pairs to transform $a_1$ into $a_2$? Although the negative answer can be readily justified in this situation by a parity argument, it can be justified even more easily by the universal mapping property: there exist groups $G$ with more than one element; we can map $a_1$ to one element of $G$ and $a_2$ to another element of $G$, extend to a homomorphism $\widetilde{\varphi} : F(S) \to G$, see that $\widetilde{\varphi}(\iota(a_1)) \neq \widetilde{\varphi}(\iota(a_2))$, and conclude that $\iota(a_1) \neq \iota(a_2)$.

But what about the corresponding problem for two more-complicated words in a free group? Fortunately there is a decision procedure for the word problem in a free group. It involves the notion of "reduced" words. A word in $W(S')$ is said to be reduced if it contains no consecutive pair $bb^{-1}$ with $b$ in $S'$.
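The homomorphism argument just given can be tried out concretely. The following Python sketch is illustrative only and rests on several assumptions that are not in the text: a word is encoded as a list of (symbol, exponent) pairs with exponent +1 or -1, the target group $G$ is taken to be the permutations of {0, 1, 2} (any group with more than one element would serve), and `compose`, `inverse`, and `evaluate` are hypothetical helper names. The function `evaluate` multiplies out the images of the letters, in the manner of $\Phi(b_1 \cdots b_n) = \Phi(b_1) \cdots \Phi(b_n)$.

```python
# Permutations of {0, 1, 2} are stored as tuples p with p[i] the image of i.
IDENTITY = (0, 1, 2)


def compose(p, q):
    """Return the permutation 'p after q' (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def evaluate(word, phi):
    """Multiply out the images of the letters of a word.

    word: list of (symbol, exponent) pairs with exponent +1 or -1.
    phi:  dict sending each symbol of S to a permutation (an element of G).
    """
    result = IDENTITY
    for symbol, exponent in word:
        g = phi[symbol] if exponent == 1 else inverse(phi[symbol])
        result = compose(result, g)
    return result


# Assumed choice of phi on S = {a1, a2}: two different transpositions.
phi = {"a1": (1, 0, 2), "a2": (0, 2, 1)}

w = [("a1", 1), ("a2", -1), ("a1", 1)]
# Insert the pair a2 a2^{-1} after the first letter of w.
w_inserted = w[:1] + [("a2", 1), ("a2", -1)] + w[1:]

same_value = evaluate(w, phi) == evaluate(w_inserted, phi)
distinct_generators = evaluate([("a1", 1)], phi) != evaluate([("a2", 1)], phi)
print(same_value, distinct_generators)
```

Under these assumptions the value of a word is unchanged by inserting a pair $bb^{-1}$, and the one-letter words $a_1$ and $a_2$ have different images, which is the reason they cannot be equivalent.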

Proposition 7.3 (solution of the word problem for free groups). Let $S$ be a set, let $S' = S \cup S^{-1}$, and let $W(S')$ be the corresponding set of words. Then each word in $W(S')$ is equivalent to one and only one reduced word.

REMARK. To test whether two words are equivalent, the proposition says to delete pairs $bb^{-1}$ with $b \in S'$ as much as possible from each given word, and to check whether the resulting reduced words are identical.

PROOF. Removal of a pair $bb^{-1}$ with $b \in S'$ decreases the length of a word by 2, and the length has to remain $\geq 0$. Thus the process of successively removing such pairs has to stop after finitely many steps, and the result is a reduced word. This proves that each equivalence class contains a reduced word.

For uniqueness we shall associate to each word a finite sequence of reduced words such that the last member of the sequence is unchanged when we insert or delete within the given word any expression $bb^{-1}$ with $b \in S'$. Specifically if $w = b_1 \cdots b_n$, with each $b_i$ in $S'$, is a given word, we associate to $w$ the sequence of words $x_0, x_1, \dots, x_n$ defined inductively by

$$x_0 = 1, \qquad x_1 = b_1, \qquad x_i = \begin{cases} x_{i-1} b_i & \text{if } i \geq 2 \text{ and } x_{i-1} \text{ does not end in } b_i^{-1}, \\ y_{i-2} & \text{if } i \geq 2 \text{ and } x_{i-1} = y_{i-2} b_i^{-1}, \end{cases} \tag{$*$}$$

and we define $r(w) = x_n$. Let us see, by induction on $i \geq 0$, that $x_i$ is reduced. The base cases $i = 0$ and $i = 1$ are clear from the definition. Suppose that $i \geq 2$ and that $x_0, \dots, x_{i-1}$ are reduced. If $x_{i-1} = y_{i-2} b_i^{-1}$ for some $y_{i-2}$, then $x_{i-1}$ reduced forces $y_{i-2}$ to be reduced, and hence $x_i = y_{i-2}$ is reduced. If $x_{i-1}$ does not end in $b_i^{-1}$, then the last two symbols of $x_i = x_{i-1} b_i$ do not cancel, and no earlier pair can cancel since $x_{i-1}$ is assumed reduced; hence $x_i$ is reduced. This completes the induction and shows that $x_i$ is reduced for $0 \leq i \leq n$.

If the word $w = b_1 \cdots b_n$ is reduced, then each $x_i$ for $i \geq 2$ is determined by the first of the two choices in $(*)$, and hence $x_i = b_1 \cdots b_i$ for all $i$. Consequently $r(w) = w$ if $w$ is reduced. If we can prove for a general word $b_1 \cdots b_n$ that

$$r(b_1 \cdots b_n) = r(b_1 \cdots b_k\, b b^{-1}\, b_{k+1} \cdots b_n), \tag{$**$}$$

then it follows that every word $w'$ equivalent to a word $w$ has $r(w') = r(w)$. Since $r(w) = w$ for $w$ reduced, there can be only one reduced word in an equivalence class.

To prove $(**)$, let $x_0, \dots, x_n$ be the finite sequence associated with $b_1 \cdots b_n$, and let $x_0', \dots, x_{n+2}'$ be the sequence associated with $b_1 \cdots b_k\, b b^{-1}\, b_{k+1} \cdots b_n$. Certainly $x_i' = x_i$ for $i \leq k$. Let us compute $x_{k+1}'$ and $x_{k+2}'$. From $(*)$ we see that

$$x_{k+1}' = \begin{cases} x_k b & \text{if } x_k \text{ does not end in } b^{-1}, \\ y & \text{if } x_k = y b^{-1}. \end{cases}$$

In the first of these cases, $x_{k+1}'$ ends in $b$, and $(*)$ says therefore that $x_{k+2}' = x_k$. In the second of the cases, the fact that $x_k$ is reduced implies that $y$ does not end in $b$; hence $(*)$ says that $x_{k+2}' = y b^{-1} = x_k$. In other words, $x_{k+2}' = x_k$ in both cases. Since the inductive definition of any $x_i$ depends only on $x_{i-1}$, and similarly for $x_i'$, we see that $x_{k+2+i}' = x_{k+i}$ for $0 \leq i \leq n - k$. Therefore $x_{n+2}' = x_n$, and $(**)$ follows. This proves the proposition.
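The sequence $x_0, x_1, \dots, x_n$ in the proof is what a single left-to-right pass with a stack computes, so the decision procedure of Proposition 7.3 is short to write down. The Python sketch below is one possible rendering under assumptions of our own: letters are (symbol, exponent) pairs with exponent +1 or -1, and the names `reduce_word`, `equivalent`, and `parse` (which reads an upper-case letter as the inverse of the lower-case one) are hypothetical conveniences.

```python
def reduce_word(word):
    """Return r(w), the reduced word obtained by scanning w left to right.

    word is a sequence of (symbol, exponent) pairs with exponent +1 or -1.
    After i letters have been processed, `stack` holds the word x_i.
    """
    stack = []
    for symbol, exponent in word:
        if stack and stack[-1] == (symbol, -exponent):
            # x_{i-1} ends in the inverse of b_i: drop that last letter.
            stack.pop()
        else:
            # Otherwise x_i is x_{i-1} followed by b_i.
            stack.append((symbol, exponent))
    return stack


def equivalent(w1, w2):
    """Two words are equivalent exactly when their reduced words agree."""
    return reduce_word(w1) == reduce_word(w2)


def parse(text):
    """Hypothetical shorthand: 'abA' stands for the word a b a^{-1}."""
    return [(ch.lower(), 1 if ch.islower() else -1) for ch in text]


# The word a b c a^{-1} c b^{-1} used earlier in the section is reduced.
sample = parse("abcAcB")
sample_is_reduced = reduce_word(sample) == sample
print(sample_is_reduced, equivalent(parse("aAb"), parse("bcC")))
```

Each letter is pushed or popped at most once, so the pass takes time proportional to the length of the word.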

Let us return to the problem of recognizing candidates for the set $T = \iota'(S)$ in a given group $F'$ so that the subgroup generated by $T$ is a free group. Using the universal mapping property for the free group $F(T)$, we form the group homomorphism of $F(T)$ into $F'$ that extends the identity mapping on $T$. We want this homomorphism to be one-one, i.e., to have the property that the only way a word in $F'$ built from the members of $T$ can equal the identity is if it comes from the identity. Because of Proposition 7.3 the only reduced word in $F(T)$ that yields the identity is the empty word. Thus the condition that the homomorphism be one-one is that the only image in $F'$ of a reduced word in $F(T)$ that can equal the identity is the image of the empty word. Making this condition into a definition, we say that a subset $S = \{g_t \mid t \in T\}$ of $F'$ not containing 1 is free if no nonempty product $h_1 h_2 \cdots h_m$ in which each $h_i$ or $h_i^{-1}$ is in $S$ and each $h_{i+1}$ is different from $h_i^{-1}$ can be the identity. A free set in $F'$ that generates $F'$ is called a free basis for $F'$.

EXAMPLE. Within the free group $F(\{x, y\})$ on two generators $x$ and $y$, consider the subgroup generated by $u = x^2$, $v = y^2$, and $w = xy$. The claim is that the subset $\{u, v, w\}$ is free, so that the subgroup generated by $u$, $v$, and $w$ is isomorphic to a free group $F(\{u, v, w\})$ on three generators. We are to check that no nonempty reduced word in $u, v, w, u^{-1}, v^{-1}, w^{-1}$ can reduce to the empty word after substitution in terms of $x$ and $y$. We induct on the length of the $u, v, w$ word, the base case being length 0. Suppose that $v = y^2$ occurs somewhere in our reduced $u, v, w$ word that collapses to the empty word after substitution. Consider what is needed for the left-hand factor of $y$ in the $y^2$ to cancel. The cancellation must result from the presence of some $y^{-1}$. Suppose that this $y^{-1}$ occurs to the left of $y^2$.
Since passing to a reduced word need involve only deletions and not insertions of pairs, everything between $y^{-1}$ and $y^2$ must cancel. If the $y^{-1}$ has resulted from $w^{-1} = y^{-1}x^{-1}$, then the number of $x, y$ symbols between $y^{-1}$ and $y^2$ is odd, and an odd number of factors can never cancel. So the $y^{-1}$ must arise from the right-hand $y^{-1}$ in a factor $v^{-1} = y^{-2}$. The symbols between $y^{-2}$ and $y^2$ come from some reduced $u, v, w$ word, and induction shows that this word must be trivial. Then $y^{-2}$ and $y^2$ are adjacent, contradiction. Thus the left factor of $y^2$ must cancel because of some $y^{-1}$ on the right of $y^2$. If the $y^{-1}$ is part of $w^{-1} = y^{-1}x^{-1}$ or is the left $y^{-1}$ in $v^{-1} = y^{-2}$, then the number of $x, y$ symbols between the left $y$ and the $y^{-1}$ is odd, and we cannot get cancellation. So the $y^{-1}$ must be the right-hand $y^{-1}$ in a factor $y^{-2}$. Then we have an expression $y(y \cdots y^{-1})y^{-1}$ in which the symbols in parentheses cancel. The symbols $\cdots$ must cancel also; since these represent some reduced $u, v, w$ word, induction shows that $\cdots$ is empty. We conclude that $y^2$ and $y^{-2}$ are adjacent, contradiction. Thus our reduced $u, v, w$ word contains no factor $v$. Similarly examination of the right-hand factor $x$ in an occurrence of $x^2$ shows that our reduced $u, v, w$ word contains no factor $u$. It must therefore be a product of factors $w$ or a product of factors $w^{-1}$. Substitution of $w = xy$ leads directly without any cancellation to an $x, y$ reduced word, and we conclude that the $u, v, w$ word is empty. Thus the subset $\{u, v, w\}$ is free.

If $G$ is any group, the commutator subgroup $G'$ of $G$ is the subgroup generated by all elements $xyx^{-1}y^{-1}$ with $x \in G$ and $y \in G$.

Proposition 7.4. If $G$ is a group, then the commutator subgroup $G'$ is normal, and $G/G'$ is abelian. If $\varphi : G \to H$ is any homomorphism of $G$ into an abelian group $H$, then $\ker \varphi \supseteq G'$.

PROOF. The computation

$$a\,xyx^{-1}y^{-1}\,a^{-1} = (axa^{-1})(aya^{-1})(axa^{-1})^{-1}(aya^{-1})^{-1}$$

shows that $G'$ is normal. If $\psi : G \to G/G'$ is the quotient homomorphism, then $\psi(x)\psi(y) = xyG' = xy(y^{-1}x^{-1}yx)G' = yxG' = \psi(y)\psi(x)$, and therefore $G/G'$ is abelian. Finally if $\varphi : G \to H$ is a homomorphism of $G$ into an abelian group $H$, then the computation

$$\varphi(xyx^{-1}y^{-1}) = \varphi(x)\varphi(y)\varphi(x)^{-1}\varphi(y)^{-1} = \varphi(x)\varphi(x)^{-1}\varphi(y)\varphi(y)^{-1} = 1$$

shows that $G' \subseteq \ker \varphi$.

Corollary 7.5. If $F$ is the free group on a set $S$ and if $F'$ is the commutator subgroup of $F$, then $F/F'$ is isomorphic to the free abelian group $\bigoplus_{s \in S} \mathbb{Z}_s$.

PROOF. Let $H = \bigoplus_{s \in S} \mathbb{Z}_s$, and let $\varphi : S \to H$ be the function with $\varphi(s) = 1_s$, i.e., $\varphi(s)$ is to be the member of $H$ that is 1 in the $s$th coordinate and is 0 elsewhere.
Application of the universal mapping property of $F$ as given in Theorem 7.1 yields a group homomorphism $\widetilde{\varphi} : F \to H$ such that $\widetilde{\varphi} \circ \iota = \varphi$. Since the elements $\varphi(s)$, with $s$ in $S$, generate $H$, $\widetilde{\varphi}$ carries $F$ onto $H$. Since $H$ is abelian, Proposition 7.4 shows that $\ker \widetilde{\varphi} \supseteq F'$. Proposition 4.11 shows that $\widetilde{\varphi}$ descends to a homomorphism $\widetilde{\varphi}_0 : F/F' \to H$, and $\widetilde{\varphi}_0$ has to be onto $H$.

To complete the proof, we show that $\widetilde{\varphi}_0$ is one-one. Let $x$ be a member of $F$. Since the products of the elements $\iota(s)$ and their inverses generate $F$ and since $F/F'$ is abelian, we can write $xF' = s_{i_1}^{j_1} \cdots s_{i_n}^{j_n} F'$, where $s_{i_1}$ occurs a total of $j_1$ times in $x$, $\dots$, and $s_{i_n}$ occurs a total of $j_n$ times in $x$; it is understood that an occurrence of $s_{i_1}^{-1}$ is to contribute $-1$ toward $j_1$. Then we have $\widetilde{\varphi}_0(xF') = j_1 \varphi(s_{i_1}) + \cdots + j_n \varphi(s_{i_n})$. If $\widetilde{\varphi}_0(xF') = 0$, we obtain $j_1 \varphi(s_{i_1}) + \cdots + j_n \varphi(s_{i_n}) = 0$, and then $j_1 = \cdots = j_n = 0$ since the elements $\varphi(s_{i_1}), \dots, \varphi(s_{i_n})$ are members of a $\mathbb{Z}$ basis of $H$. Hence $xF' = F'$, $x$ is in $F'$, and $\widetilde{\varphi}_0$ is one-one.

Corollary 7.6. If $F_1$ and $F_2$ are isomorphic free groups on sets $S_1$ and $S_2$, respectively, then $S_1$ and $S_2$ have the same cardinality.

PROOF. Corollary 7.5 shows that an isomorphism of $F_1$ with $F_2$ induces an isomorphism of the free abelian groups $\bigoplus_{s \in S_1} \mathbb{Z}_s$ and $\bigoplus_{s \in S_2} \mathbb{Z}_s$. The rank of a free abelian group is a well-defined cardinal, and the result follows, almost. We did not completely prove this fact about the rank of a free abelian group in Section IV.9. Theorem 4.53 did prove, however, that rank is well defined for finitely generated free abelian groups. Thus the corollary follows if $S_1$ and $S_2$ are finite. If $S_1$ or $S_2$ is uncountable, then the cardinality of the corresponding free abelian group matches the cardinality of its $\mathbb{Z}$ basis; hence the corollary follows if $S_1$ or $S_2$ is uncountable. The only remaining case to eliminate is that one of the two free abelian groups, say the first of them, has a countably infinite $\mathbb{Z}$ basis and the other has finite rank $n$. The first of the groups then has a linearly independent set of $n + 1$ elements, and Lemma 4.54 shows that the span of these elements cannot be isomorphic to a subgroup of a free abelian group of rank $n$. This completes the proof in all cases.

Because of Corollary 7.6, it is meaningful to speak of the rank of a free group; it is the cardinality of any free basis. We shall see in the next section that any subgroup of a free group is free. In contrast to the abelian case, however, the rank may actually increase in passing from a free group to one of its subgroups: the example earlier in this section exhibited a free group of rank 3 as a subgroup of a free group of rank 2.
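The freeness of $\{x^2, y^2, xy\}$ inside $F(\{x, y\})$, which is the source of the rank-3 subgroup just mentioned, can be spot-checked by machine. The Python sketch below is a brute-force sanity check and not a proof, since it examines only words up to a fixed length. Its conventions are assumptions made for illustration: a lower-case letter is a generator, the corresponding upper-case letter is its inverse, and the names `reduce_word`, `is_reduced`, `find_collapsing_word`, and `XY_SUBSTITUTION` are hypothetical.

```python
from itertools import product


def reduce_word(word):
    """Freely reduce a string in which 'X' denotes the inverse of 'x'."""
    stack = []
    for ch in word:
        if stack and stack[-1] == ch.swapcase():
            stack.pop()
        else:
            stack.append(ch)
    return "".join(stack)


def is_reduced(word):
    """True if no letter is immediately followed by its inverse."""
    return all(a != b.swapcase() for a, b in zip(word, word[1:]))


def find_collapsing_word(substitution, max_length):
    """Search for a nonempty reduced word whose image reduces to the empty word.

    substitution maps each letter (and each inverse letter) to its image.
    Returns the first such word found, or None if there is none up to
    max_length.
    """
    for length in range(1, max_length + 1):
        for letters in product(substitution, repeat=length):
            word = "".join(letters)
            if not is_reduced(word):
                continue
            image = "".join(substitution[ch] for ch in word)
            if reduce_word(image) == "":
                return word
    return None


# Images of u, v, w and of their inverses U, V, W in F({x, y}).
XY_SUBSTITUTION = {
    "u": "xx", "U": "XX",
    "v": "yy", "V": "YY",
    "w": "xy", "W": "YX",
}

print(find_collapsing_word(XY_SUBSTITUTION, 6))
```

For the substitution above the search reports no collapsing word within the bound, as the example predicts. Feeding the same function a dependent family, such as letters standing for $x$ and $x^2$, does produce a collapsing word, which is a useful check that the search itself is doing something.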
We turn to a way of describing general groups, particularly groups that are at most countable. The method uses "generators," which we already understand, and "relations," which are defined in terms of free groups. Let $S$ be a set, let $R$ be a subset of $F(S)$, and let $N(R)$ be the smallest normal subgroup of $F(S)$ containing $R$. The group $G = F(S)/N(R)$ is sometimes written as $G = \langle S;\ R \rangle$ or as $G = \langle \text{elements of } S;\ \text{elements of } R \rangle$, with the elements of $S$ and $R$ listed rather than grouped as a set. Either of these expressions is called a presentation of $G$. The set $S$ is a set of generators, and the set $R$ is the corresponding set of relations. The following result implicit in the universal mapping property of Theorem 7.1 shows the scope of this definition.

Proposition 7.7. Each group $G$ is the homomorphic image of a free group.

PROOF. Let $S$ be a set of generators for $G$; for example, $S$ can be taken to be $G$ itself. Let $\varphi : S \to G$ be the inclusion of the set of generators into $G$, and let $\widetilde{\varphi} : F(S) \to G$ be the group homomorphism of Theorem 7.1 such that $\widetilde{\varphi}(\iota(s)) = \varphi(s)$ for all $s$ in $S$. The image of $\widetilde{\varphi}$ is a subgroup of $G$ that contains the generating set $S$ and is therefore equal to all of $G$. Thus $\widetilde{\varphi}$ is the required homomorphism.

If $G$ is any group and $\widetilde{\varphi} : F(S) \to G$ is the homomorphism given in Proposition 7.7, then the subgroup $R = \ker \widetilde{\varphi}$ has the property that $G \cong \langle S;\ R \rangle$. Consequently every group can be given by generators and relations. For example the proof of the proposition shows that one possibility is to take $S = G$ and $R$ equal to the set of all members of the multiplication table, but with the multiplication table entry $ss' = s''$ rewritten as the left side $ss'(s'')^{-1}$ of an equation $ss'(s'')^{-1} = 1$ specifying a combination of generators that maps to 1.

This is of course not a very practical example. Generators and relations are most useful when $S$ and $R$ are fairly small. One says that $G$ is finitely generated if $S$ can be chosen to be finite, finitely presented if both $S$ and $R$ can be chosen to be finite.

A frequently used device in working with generators and relations is the following simple proposition.

Proposition 7.8. Let $G = \langle S;\ R \rangle$ be a group given by generators and relations, let $G'$ be a second group, let $\varphi$ be a one-one function from $S$ onto a set of generators for $G'$, and let $\Phi : F(S) \to G'$ be the extension of $\varphi$ to a group homomorphism. If $\Phi(r) = 1$ for every member $r$ of $R$, then $\Phi$ descends to a homomorphism of $G$ onto $G'$. In particular, if $G = \langle S;\ R \rangle$ and $G' = \langle S;\ R' \rangle$ are groups given by generators and relations with $R \subseteq R'$, then the natural homomorphism of $F(S)$ onto $G'$ descends to a homomorphism of $G$ onto $G'$.

PROOF. The proposition follows immediately from the universal mapping property in Theorem 7.1 in combination with Proposition 4.11.
Now let us consider some examples of groups given by generators and relations. The case of one generator is something we already understand: the group has to be cyclic. A presentation of $\mathbb{Z}$ is as $\langle a;\ \rangle$, and a presentation of $C_n$ is as $\langle a;\ a^n \rangle$. But other presentations are possible with one generator, such as $\langle a;\ a^6, a^9 \rangle$ for $C_3$. Here is an example with two generators.

EXAMPLE. Let us prove that $D_n \cong \langle x, y;\ x^n, y^2, (xy)^2 \rangle$, where $D_n$ is the dihedral group of order $2n$. Concretely let us work with $D_n$ as the group of 2-by-2 real matrices generated by

$$\begin{pmatrix} \cos 2\pi/n & -\sin 2\pi/n \\ \sin 2\pi/n & \cos 2\pi/n \end{pmatrix} \qquad \text{and} \qquad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

The generated group indeed has order $2n$. If we identify $x$ with the first of these matrices and $y$ with the second, then $y^2 = 1$, and the formula

$$\begin{pmatrix} \cos 2\pi/n & -\sin 2\pi/n \\ \sin 2\pi/n & \cos 2\pi/n \end{pmatrix}^{k} = \begin{pmatrix} \cos 2\pi k/n & -\sin 2\pi k/n \\ \sin 2\pi k/n & \cos 2\pi k/n \end{pmatrix}$$

shows that $x^n = 1$. In addition,

$$xy = \begin{pmatrix} \cos 2\pi/n & \sin 2\pi/n \\ \sin 2\pi/n & -\cos 2\pi/n \end{pmatrix},$$

and the square of this is the identity. By Proposition 7.8, $D_n$ is a homomorphic image of $\widetilde{D}_n = \langle x, y;\ x^n, y^2, (xy)^2 \rangle$. To complete the identification, it is enough to show that the order of $\widetilde{D}_n$ is $\leq 2n$ because the homomorphism of $\widetilde{D}_n$ onto $D_n$ must then be one-one.

In $\langle x, y;\ x^n, y^2, (xy)^2 \rangle$, we compute that $y^{-1} = y$ and that $x(yx)y = 1$ implies $yx = x^{-1}y^{-1} = x^{-1}y$. Induction then yields $yx^k = x^{-k}y$ for $k > 0$. Multiplying left and right by $y$ gives $yx^{-k} = x^k y$ for $k > 0$. So $yx^l = x^{-l}y$ for every integer $l$. This means that every element is of the form $x^m$ or $x^m y$, and we may take $0 \leq m \leq n - 1$. Hence there are at most $2n$ elements.
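The relations and the count of $2n$ elements in this example can be checked by machine for particular values of $n$. The Python sketch below swaps in a different concrete model from the one in the text: instead of 2-by-2 real matrices it uses permutations of the vertices $0, 1, \dots, n-1$ of a regular $n$-gon, with $x$ the rotation $i \mapsto i + 1$ and $y$ the reflection $i \mapsto -i$ modulo $n$, so that all arithmetic is exact. This model, the restriction to $n \geq 3$, and the names `compose`, `power`, and `dihedral_check` are assumptions made for illustration.

```python
def compose(p, q):
    """Return 'p after q' for permutations stored as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))


def power(p, k):
    """Return the k-th power of the permutation p for k >= 0."""
    result = tuple(range(len(p)))
    for _ in range(k):
        result = compose(result, p)
    return result


def dihedral_check(n):
    """Check the relations and the normal form x^m, x^m y for n >= 3.

    Returns (relations_hold, number_of_normal_forms, forms_are_whole_group).
    """
    identity = tuple(range(n))
    x = tuple((i + 1) % n for i in range(n))   # rotation by one step
    y = tuple((-i) % n for i in range(n))      # a reflection
    xy = compose(x, y)
    relations_hold = (
        power(x, n) == identity
        and power(y, 2) == identity
        and power(xy, 2) == identity
    )
    # The candidate normal forms x^m and x^m y for 0 <= m <= n - 1.
    normal_forms = set()
    for m in range(n):
        xm = power(x, m)
        normal_forms.add(xm)
        normal_forms.add(compose(xm, y))
    # Close {identity} under right multiplication by x and y.
    generated = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in (x, y):
            h = compose(g, s)
            if h not in generated:
                generated.add(h)
                frontier.append(h)
    return relations_hold, len(normal_forms), generated == normal_forms


print(dihedral_check(5))
```

In this model the three relators evaluate to the identity, the elements $x^m$ and $x^m y$ are $2n$ distinct permutations, and they exhaust the group generated by $x$ and $y$, which matches the count obtained in the example.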

Without trying to be too precise, let us mention that the word problem for finitely presented groups is to give an algorithm for deciding whether two words represent the same element of the group. It is known that there is no such algorithm applicable to all finitely presented groups. Of course, there can be such an algorithm for certain special classes of presentations. For example, if there are no relations in the presentation, then the group is a free group, and Proposition 7.3 gives a solution in this case. There tends to be a solution for a class of groups if the groups all correspond rather concretely to some geometric situation, such as a tiling of Euclidean space or some other space. The example above with $D_n$ is of this kind.

By way of a concrete class of examples, one can identify any doubly generated group of the form $\langle x, y;\ x^a, y^b, (xy)^c \rangle$ if $a, b, c$ are integers $> 1$, and one can describe what words represent what elements in these groups. These groups all correspond to tilings in 2 dimensions. In fact, let $\gamma = a^{-1} + b^{-1} + c^{-1}$. If $\gamma > 1$, the tiling is of the Riemann sphere, and the group is finite. If $\gamma = 1$, the tiling is of the Euclidean plane $\mathbb{R}^2$, and the group is infinite. If $\gamma < 1$, the tiling is of the hyperbolic plane, and the group is infinite. In all cases one starts from a triangle in the appropriate geometry with angles $\pi/a$, $\pi/b$, and $\pi/c$, and a basic tile consists of the double of this triangle obtained by reflecting the triangle about any of its sides. The group elements $x$, $y$, and $xy$ are rotations, suitably oriented, about the vertices of the triangle through respective angles $2\pi/a$, $2\pi/b$, and $2\pi/c$. Further information about the cases $\gamma > 1$ and $\gamma = 1$ is obtained in Problems 37–46 at the end of the chapter.

We conclude with one further example of a presentation whose group we can readily identify concretely.

Proposition 7.9. Let $S$ be a set, and let $R = \{sts^{-1}t^{-1} \mid s \in S,\ t \in S\}$. Then the smallest normal subgroup of the free group $F(S)$ containing $R$ is the commutator subgroup $F(S)'$, and therefore $\langle S;\ R \rangle$ is isomorphic to the free abelian group $\bigoplus_{s \in S} \mathbb{Z}_s$.

PROOF. The members of $R$ are in $F(S)'$, the product of two members of $F(S)'$ is in $F(S)'$, and any conjugate of a member of $F(S)'$ is in $F(S)'$. Therefore the smallest normal subgroup $N(R)$ containing $R$ has $N(R) \subseteq F(S)'$. Let $\varphi : F(S) \to F(S)/N(R)$ be the quotient homomorphism. Elements of the quotient $F(S)/N(R)$ may be expressed as words in the elements $\varphi(s)$ and $\varphi(s)^{-1}$ for $s$ in $S$, and the factors commute because of the definition of $R$. Therefore $F(S)/N(R)$ is abelian. By Proposition 7.4, $N(R) \supseteq F(S)'$. Therefore $N(R) = F(S)'$. This proves the first conclusion, and the second conclusion follows from Corollary 7.5.

2. Subgroups of Free Groups

The main result of this section is that any subgroup of a free group is a free group. An example in the previous section shows that the rank can actually increase in the process of passing to the subgroup.

The proof of the main result is ostensibly subtle but is relatively easy to understand in topological terms. Although we shall give the topological interpretation, we shall not pursue it further, and the proof that we give may be regarded as a translation of the topological proof into the language of algebra, combined with some steps of beautification.

For purposes of the topological argument, let us think of the given free group for the moment as finitely generated, and let us suppose that the subgroup has finite index.
A free group on n symbols is the fundamental group of a bouquet of n circles, all joined at a single point, which we take as the base point. By the theory of covering spaces, any subgroup of index k is the fundamental group of some k-sheeted covering space of the bouquet of circles. This covering space is a 1-dimensional simplicial complex, and one can prove with standard tools that the fundamental group of any 1-dimensional simplicial complex is a free group. The theorem follows.


If the special hypotheses are dropped that the given free group is finitely generated and the subgroup has finite index, then the same proof is applicable as long as one allows a suitable generalization of the notion of simplicial complex. Thus the topological argument is completely general. The theorem then is as follows.

Theorem 7.10 (Nielsen–Schreier Theorem). Every subgroup of a free group is a free group.

REMARKS. The algebraic proof will occupy the remainder of the section but will occasionally be interrupted by comments about the example in the previous section. Let the given free group be $F$, let the subgroup be $H$, and form the right cosets $Hg$ in $F$. Let $C$ be a set of representatives for these cosets, with $1$ chosen as the representative of the identity coset; we shall impose further conditions on $C$ shortly.

EXAMPLE, continued. For the example in the previous section, we were given a free group $F$ with two generators $x, y$, and the subgroup $H$ is taken to have generators $x^2$, $xy$, $y^2$. In fact, one readily checks that $H$ is the subgroup formed from all words of even length, and we shall think of it that way. The set $C$ of coset representatives may be taken to be $\{1, x\}$ in this case. The argument we gave that $H$ is free has points of contact with the proof we give of Theorem 7.10 but is not an exact special case of it. One point of contact is that within each generator of $H$ that we identify, there is some particular factor that does not cancel when that generator appears in a word representing a member of the subgroup.

We define a function $\rho : F \to C$ by taking $\rho(x)$ to be the coset representative of the member $x$ of $F$. This function has the property that $\rho(hx) = \rho(x)$ for all $h$ in $H$ and $x$ in $F$. Also, $x \mapsto x\rho(x)^{-1}$ is a function from $F$ to $H$, and it is the identity function on $H$. The first lemma shows that a relatively small subset of the elements $x\rho(x)^{-1}$ is a set of generators of $H$.

Lemma 7.11. Let $S$ be the set of generators of $F$, and let $S' = S \cup S^{-1}$. Every element of $H$ is a product of elements of the form $gb\rho(gb)^{-1}$ with $g$ in $C$ and $b$ in $S'$. Furthermore the element $g' = \rho(gb)$ of $C$ has the properties that $g = \rho(g'b^{-1})$ and that $gb^{-1}\rho(gb^{-1})^{-1}$ is of the form $(g'b\rho(g'b)^{-1})^{-1}$. Consequently the elements $ga\rho(ga)^{-1}$ with $g$ in $C$ and $a$ in $S$ form a set of generators of $H$.


EXAMPLE, continued. In the example, we are taking $C = \{1, x\}$ and $S = \{x, y\}$. The elements $gb\rho(gb)^{-1}$ obtained with $g = 1$ and $b$ equal to $x, y, x^{-1}, y^{-1}$ are $1$, $yx^{-1}$, $x^{-1}x^{-1}$, and $y^{-1}x^{-1}$. The elements $gb\rho(gb)^{-1}$ obtained with $g = x$ and $b$ equal to $x, y, x^{-1}, y^{-1}$ are $xx$, $xy$, $1$, and $xy^{-1}$. The lemma says that $1$, $yx^{-1}$, $xx$, and $xy$ form a set of generators of $H$ and that the elements $x^{-1}x^{-1}$, $y^{-1}x^{-1}$, $1$, and $xy^{-1}$ are inverses of these generators in some order.

REMARK. The lemma needs no hypothesis that $F$ is free. A nontrivial application of the lemma with $F$ not free appears in Problem 43 at the end of the chapter.

PROOF. Any $h$ in $F$ can be written as a product $h = b_1 \cdots b_n$ with each $b_j$ in $S'$. Define $r_0 = 1$ and $r_k = \rho(b_1 \cdots b_k)$ for $1 \le k \le n$. Then

$$h r_n^{-1} = (r_0 b_1 r_1^{-1})(r_1 b_2 r_2^{-1}) \cdots (r_{n-1} b_n r_n^{-1}). \qquad (*)$$

Since $r_k = \rho(b_1 \cdots b_k) = \rho(b_1 \cdots b_{k-1} b_k) = \rho(\rho(b_1 \cdots b_{k-1}) b_k) = \rho(r_{k-1} b_k)$, we have $r_{k-1} b_k r_k^{-1} = gb\rho(gb)^{-1}$ with $g = r_{k-1}$ and $b = b_k$. Thus $(*)$ exhibits $h r_n^{-1}$ as a product of elements as in the first conclusion of the lemma. Since $r_n = \rho(b_1 \cdots b_n) = \rho(h)$, $r_n = 1$ if $h$ is in $H$. Therefore in this case, $h$ itself is a product of elements as in the statement of that conclusion, and that conclusion is now proved.

For the other conclusion, let $gb^{-1}\rho(gb^{-1})^{-1}$ be given, and put $g' = \rho(gb^{-1})$, so that $gb^{-1}g'^{-1} = h$ is in $H$. This equation implies that $g'b = h^{-1}g$. Hence $\rho(g'b) = \rho(h^{-1}g) = \rho(g) = g$, and it follows that $gb^{-1}\rho(gb^{-1})^{-1} = gb^{-1}g'^{-1} = (g'bg^{-1})^{-1} = (g'b\rho(g'b)^{-1})^{-1}$. This proves the lemma.

Lemma 7.12. With $F$ free it is possible to choose the set $C$ of coset representatives in such a way that all of its members have expansions in terms of $S'$ as $g = b_1 \cdots b_n$ in which
(a) $g = b_1 b_2 \cdots b_n$ is a reduced word as written,
(b) $b_1 b_2 \cdots b_{n-1}$ is also a member of $C$.

REMARKS. It is understood from the case of $n = 1$ in (b) that $1$ is the representative of the identity coset. When $C$ is chosen as in this lemma, $C$ is said to be a Schreier set. In the example, $C = \{1, x\}$ is a Schreier set. So is $C = \{1, y\}$, and hence the selection of a Schreier set may involve a choice.

PROOF. If $S'$ is finite or countably infinite, we enumerate it. In the uncountable case (which is of less practical interest), we introduce a well ordering in $S'$ by means of Zermelo's Well-Ordering Theorem as in Section A5 of the appendix.


The ordering of $S'$ will be used to define a lexicographic ordering of the set of all reduced words in the members of $S'$. If

$$x = b_1 \cdots b_m \quad\text{and}\quad y = b_1' \cdots b_n' \qquad (*)$$

are reduced words with $m \le n$, we say that $x < y$ if any of the following hold:
(i) $m < n$,
(ii) $m = n$ and $b_1 < b_1'$,
(iii) $m = n$, and for some $k < m$, $b_1 = b_1', \ldots, b_k = b_k'$, and $b_{k+1} < b_{k+1}'$.
With this definition the set of reduced words is well ordered, and hence any nonempty subset of reduced words has a least element.

Let us observe that if $x, y, z$ are reduced words with $x < y$ and if $yz$ is reduced as written, then $xz < yz$ after $xz$ has been reduced. In fact, let us assume that $x$ and $y$ are as in $(*)$ and that the length of $z$ is $r$. The assumption is that $yz$ has length $n + r$, and the length of $xz$ is at most $m + r$. If $m < n$, then certainly $xz < yz$. If $m = n$ and $xz$ fails to be reduced, then the length of $xz$ is less than the length of $yz$, and $xz < yz$. If $m = n$ and $xz$ is reduced, then the first inequality $b_k < b_k'$ with $x$ and $y$ shows that $xz < yz$.

To define the set $C$ of coset representatives, let the representative of $Hg$ be the least member of the set $Hg$, each element being written as a reduced word. Since the length of the empty word is $0$, the representative of the identity coset $H$ is $1$ under this definition. Thus all we have to check is that an initial segment of a member of $C$ is again in $C$. Suppose that $b_1 \cdots b_n$ is in $C$, so that $b_1 \cdots b_n$ is the least element of $H b_1 \cdots b_n$. Denote the least element of $H b_1 \cdots b_{n-1}$ by $g$. If $g = b_1 \cdots b_{n-1}$, we are done. Otherwise $g < b_1 \cdots b_{n-1}$, and then the fact that $b_1 \cdots b_n$ is reduced implies that $g b_n < b_1 \cdots b_n$. But $g b_n$ is in $H b_1 \cdots b_n$, and this inequality contradicts the minimality of $b_1 \cdots b_n$ in that coset. Thus we conclude that $g = b_1 \cdots b_{n-1}$. This proves the lemma.

For the remainder of the proof of Theorem 7.10, we assume, as we may by Lemma 7.12, that the set $C$ of coset representatives is a Schreier set. Typical elements of $S$ will be denoted by $a$, and typical elements of $S' = S \cup S^{-1}$ will be denoted by $b$.
Let us write $u$ for a typical element $ga\rho(ga)^{-1}$ with $g$ in $C$, and let us write $v$ for a typical element $gb\rho(gb)^{-1}$ with $g$ in $C$. The elements $u$ generate $H$ by Lemma 7.11, and each element $v$ is either an element $u$ or the inverse of an element $u$, according to the lemma. We shall prove that the elements $u$ not equal to $1$ are distinct and form a free basis of $H$.

First we prove that each of the elements $v = gb\rho(gb)^{-1}$ either is reduced as written or is equal to $1$. Put $g' = \rho(gb)$, so that $v = gbg'^{-1}$. Since $g$ and $g'$ are in the Schreier set $C$, they are reduced as written, and hence so are $g$ and $g'^{-1}$. Thus the only possible cancellation in $v$ occurs because the last factor of $g$ is $b^{-1}$ or the last factor of $g'$ is $b$. If the last factor of $g$ is $b^{-1}$, then $gb$ is an initial segment of $g$ and hence is in the Schreier set $C$; thus $\rho(gb) = gb$ and $v = gb\rho(gb)^{-1} = 1$. Similarly if the last factor of $g'$ is $b$, then $g'b^{-1}$ is an initial segment of $g'$ and hence is in the Schreier set $C$; thus $\rho(g'b^{-1}) = g'b^{-1}$, and Lemma 7.11 gives $v^{-1} = (gb\rho(gb)^{-1})^{-1} = g'b^{-1}\rho(g'b^{-1})^{-1} = 1$. Thus $v = gb\rho(gb)^{-1}$ either is reduced as written or is equal to $1$.

Next let us see that the elements $v$ other than $1$ are distinct. Suppose that $v = gb\rho(gb)^{-1} = g'b'\rho(g'b')^{-1}$ is different from $1$. Remembering that each of these expressions is reduced as written, we see that if $g$ is shorter than $g'$, then $gb$ is an initial segment of $g'$. Since $C$ is a Schreier set, $gb$ is in $C$ and $\rho(gb) = gb$; thus $v = gb\rho(gb)^{-1}$ equals $1$, contradiction. Similarly $g'$ cannot be shorter than $g$. So $g$ and $g'$ must have the same length $l$. In this case the first $l + 1$ factors must match in the two equal reduced words, and we conclude that $g = g'$ and $b = b'$. This proves the uniqueness. We know that each $v$ is either some $u$ or some $u^{-1}$, and this uniqueness shows that it cannot be both unless $v = 1$. Therefore the nontrivial $u$'s are distinct, and the nontrivial $v$'s consist of the $u$'s and their inverses, each appearing once.

Since an element $v$ not equal to $1$ therefore determines its $g$ and $b$, let us refer to the factor $b$ of $v = gb\rho(gb)^{-1}$ as the significant factor of $v$. This is the part that will not cancel out when we pass from a product of $v$'s to its reduced form.

Specifically suppose that we have $v = gb\rho(gb)^{-1}$ and $\bar v = \bar g\bar b\rho(\bar g\bar b)^{-1}$, that neither of these is $1$, and that $\bar v \ne v^{-1}$. Put $g' = \rho(gb)$ and $\bar g' = \rho(\bar g\bar b)$. The claim is that the cancellation in forming $v\bar v = gbg'^{-1}\bar g\bar b\bar g'^{-1}$ does not extend to either of the significant factors $b$ and $\bar b$. If it does, then one of three things happens:
(i) the $b$ in $bg'^{-1}$ gets canceled because the last factor of $g'$ is $b$, in which case $g'b^{-1}$ is an initial segment of $g'$, $g'b^{-1} = \rho(g'b^{-1}) = g$, and $v = gbg'^{-1} = 1$, or
(ii) the $\bar b$ in $\bar g\bar b$ gets canceled because the last factor of $\bar g$ is $\bar b^{-1}$, in which case $\bar g\bar b$ is an initial segment of $\bar g$, $\bar g\bar b = \rho(\bar g\bar b) = \bar g'$, and $\bar v = \bar g\bar b\bar g'^{-1} = 1$, or
(iii) $g'^{-1}\bar g = 1$ and $b\bar b = 1$, in which case $\bar g = g'$, $\bar b = b^{-1}$, and the middle conclusion of Lemma 7.11 allows us to conclude that $\bar v = v^{-1}$.
All three of these possibilities have been ruled out by our assumptions, and therefore neither of the significant factors in $v\bar v$ cancels.

As a consequence of this noncancellation, we can see that in any product $v_1 \cdots v_m$ of $v$'s in which no $v_k$ is $1$ and no $v_{k+1}$ equals $v_k^{-1}$, none of the significant factors cancel. In fact, the previous paragraph shows that the significant factors of $v_1$ and $v_2$ survive in forming $v_1 v_2$, the significant factors of $v_2$ and $v_3$ survive in right multiplying by $v_3$, and so on. Since the nontrivial $u$'s are distinct and the nontrivial $v$'s consist of the $u$'s and their inverses, each appearing once, we conclude that the set of nontrivial $u$'s is a free subset of $F$. Lemma 7.11 says that the $u$'s generate $H$, and therefore the set of nontrivial $u$'s is a free basis of $H$.
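The construction in the proof can be traced by machine for the running example. The following Python sketch is only an illustration under stated assumptions: words are strings in which an uppercase letter stands for the inverse of the corresponding lowercase generator, the helper names (`inv`, `reduce_word`, `schreier_generators`) are hypothetical, and the subgroup $H$ is supplied indirectly through a function `coset_of` that labels the coset $Hw$ of a reduced word $w$ (which presupposes finite index). The coset representatives are found breadth first with the letters of $S'$ tried in a fixed order, which is intended to mirror the least-element choice in the proof of Lemma 7.12.

```python
from collections import deque

def inv(letter):
    # Assumption: "x" and "X" denote a generator and its inverse.
    return letter.swapcase()

def reduce_word(word):
    """Cancel adjacent inverse pairs until the word is reduced."""
    out = []
    for b in word:
        if out and out[-1] == inv(b):
            out.pop()
        else:
            out.append(b)
    return "".join(out)

def inverse_word(word):
    return "".join(inv(b) for b in reversed(word))

def schreier_generators(gens, coset_of):
    """Return a set C of coset representatives and the nontrivial u = g a rho(g a)^-1."""
    letters = sorted(gens) + [inv(a) for a in sorted(gens)]  # fixed order on S'
    rep = {coset_of(""): ""}          # the identity coset is represented by 1
    queue = deque([""])
    while queue:                      # breadth-first search over the cosets
        g = queue.popleft()
        for b in letters:
            w = reduce_word(g + b)
            c = coset_of(w)
            if c not in rep:          # first word found in a coset becomes its representative
                rep[c] = w
                queue.append(w)
    C = sorted(rep.values(), key=lambda w: (len(w), w))
    us = {}
    for g in C:
        for a in sorted(gens):
            ga = reduce_word(g + a)
            u = reduce_word(g + a + inverse_word(rep[coset_of(ga)]))
            if u:                     # discard the elements u equal to 1
                us[(g, a)] = u
    return C, us

# Example from the text: H = words of even length in F(x, y).
C, us = schreier_generators(["x", "y"], lambda w: len(w) % 2)
print(C)    # ['', 'x']
print(us)   # {('', 'y'): 'yX', ('x', 'x'): 'xx', ('x', 'y'): 'xy'}
```

In this run the nontrivial generators are $yx^{-1}$, $xx$, and $xy$, in agreement with the list obtained in the example following Lemma 7.11.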

3. Free Products

The free abelian group on an index set $S$, as constructed in Section IV.9, has a universal mapping property that allows arbitrary functions from $S$ into any target abelian group to be extended to homomorphisms of the free abelian group into the target group. The construction of free groups in Section 1 was arranged to adapt the construction so that the target group in the universal mapping property could be any group, abelian or nonabelian.

In this section we make a similar adaptation of the construction of a direct sum of abelian groups so that the result is applicable in a context of arbitrary groups. Proposition 4.17 gave the universal mapping property of the external direct sum $\bigoplus_{s\in S} G_s$ of a set of abelian groups with associated embedding homomorphisms $i_{s_0} : G_{s_0} \to \bigoplus_{s\in S} G_s$. The statement is that if $H$ is any abelian group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : G_s \to H$, then there exists a unique group homomorphism $\varphi : \bigoplus_{s\in S} G_s \to H$ such that $\varphi \circ i_{s_0} = \varphi_{s_0}$ for all $s_0 \in S$. Example 2 of coproducts in Section IV.11 shows that direct sum is therefore the coproduct functor in the category of all abelian groups.

This universal mapping property of $\bigoplus_{s\in S} G_s$ fails when $H$ is a nonabelian group such as the symmetric group $\mathfrak{S}_3$. In fact, $\mathfrak{S}_3$ has an element of order 2 and an element of order 3 and hence admits nontrivial homomorphisms $\varphi_2 : C_2 \to \mathfrak{S}_3$ and $\varphi_3 : C_3 \to \mathfrak{S}_3$. But there is no homomorphism $\varphi : C_2 \oplus C_3 \to \mathfrak{S}_3$ such that $\varphi \circ i_2 = \varphi_2$ and $\varphi \circ i_3 = \varphi_3$ because the image of $\varphi$ has to be abelian but the images of $\varphi_2$ and $\varphi_3$ do not commute. Consequently direct sum cannot extend to a coproduct functor in the category of all groups. Instead, the appropriate group constructed from $C_2$ and $C_3$ for this kind of universal mapping property is the "free product" of $C_2$ and $C_3$, denoted by $C_2 * C_3$. In this section we construct the free product of any set of groups, finite or infinite. Also, we establish its universal mapping property and identify it in terms of generators and relations.

The prototype of a free product is the free group $F(S)$, which equals a free product of copies of $\mathbb{Z}$ indexed by $S$. A free product is always an infinite group if at least two of the factors are not 1-element groups. An important application of free products occurs in the theory of the fundamental group in topology: if $X$ is a topological space for which the theory of covering spaces is applicable, and if $A$ and $B$ are open subsets of $X$ with $X = A \cup B$ such that $A \cap B$ is nonempty, connected, and simply connected, then the fundamental group of $X$ is the free product of the fundamental group of $A$ and the fundamental group of $B$. This result, together with a generalization that no longer requires $A \cap B$ to be simply connected, is known as the Van Kampen Theorem.

Let $S$ be a nonempty set of groups $G_s$ for $s$ in $S$. The set $S$ is allowed to be infinite, but in practice it often has just two elements. We shall describe the group defined to be the free product $G = \ast_{s\in S} G_s$. We start from the set $W(\{G_s\})$ of all words built from the groups $G_s$. This consists of all finite sequences $g_1 \cdots g_n$ with each $g_i$ in some $G_s$ depending on $i$. The length of a word is the number of factors in it. The empty word is denoted by $1$. We multiply two words by writing them end to end, and the resulting operation of multiplication is associative. A word is said to be equivalent to a second word if the first can be obtained from the second by a finite sequence of steps of the following kinds and their inverses:
(i) drop a factor for which $g_i$ is the identity element of the group in which it lies,
(ii) collapse two factors $g_i g_{i+1}$ to a single one $g_i^*$ if $g_i$ and $g_{i+1}$ lie in the same $G_s$ and their product in that group is $g_i^*$.
The result is an equivalence relation, and the set of equivalence classes is the underlying set of $\ast_{s\in S} G_s$.


Theorem 7.13. If $S$ is a nonempty set of groups $G_s$ and $W(\{G_s\})$ is the set of all words from the groups $G_s$, then the product operation defined on $W(\{G_s\})$ descends in a well-defined fashion to the set $\ast_{s\in S} G_s$ of equivalence classes of members of $W(\{G_s\})$, and $\ast_{s\in S} G_s$ thereby becomes a group. For each $s_0$ in $S$, define $i_{s_0} : G_{s_0} \to \ast_{s\in S} G_s$ to be the group homomorphism obtained as the composition of the inclusion of $G_{s_0}$ into words of length 1 followed by passage to equivalence classes. Then the pair $\big(\ast_{s\in S} G_s, \{i_s\}\big)$ has the following universal mapping property: whenever $H$ is a group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : G_s \to H$, then there exists a unique group homomorphism $\varphi : \ast_{s\in S} G_s \to H$ such that $\varphi \circ i_{s_0} = \varphi_{s_0}$ for all $s_0 \in S$.
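To make the universal mapping property concrete, here is a small Python sketch for the case of $C_2$ and $C_3$ mapping into the symmetric group on three letters, the situation in which the direct sum fails. Everything in it is an illustrative assumption rather than part of the text: permutations are tuples acting on $\{0, 1, 2\}$, an element of $C_s$ is an integer exponent mod $s$, a word is a list of pairs `(s, k)`, and the names `compose`, `power`, `phi`, and `Phi` are hypothetical. The function `Phi` multiplies out the images of the factors of a word, and the checks confirm that it takes equal values on two equivalent words.

```python
from functools import reduce

IDENTITY = (0, 1, 2)
SWAP = (1, 0, 2)     # an element of order 2
CYCLE = (1, 2, 0)    # an element of order 3

def compose(p, q):
    """Return the permutation p∘q of {0, 1, 2} (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(3))

def power(p, k):
    result = IDENTITY
    for _ in range(k):
        result = compose(result, p)
    return result

# phi[s] is a homomorphism from C_s (integers mod s) into the symmetric group.
phi = {
    2: lambda k: power(SWAP, k % 2),
    3: lambda k: power(CYCLE, k % 3),
}

def Phi(word):
    """Image of a word [(s, k), ...]: the product of the images of its factors."""
    return reduce(compose, (phi[s](k) for s, k in word), IDENTITY)

# Two equivalent words: the second collapses the two C_3 factors
# and drops the identity factor of C_2.
w1 = [(2, 1), (3, 1), (3, 1), (2, 0), (2, 1)]
w2 = [(2, 1), (3, 2), (2, 1)]
print(Phi(w1) == Phi(w2))                              # True
print(Phi([(2, 1), (3, 1)]) == Phi([(3, 1), (2, 1)]))  # False: the images do not commute
```

The second check is the obstruction described earlier in the section: no homomorphism out of the abelian group $C_2 \oplus C_3$ could restrict to these two maps.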


$$\begin{array}{ccc}
G_{s_0} & \xrightarrow{\ \varphi_{s_0}\ } & H \\
{\scriptstyle i_{s_0}}\big\downarrow & \nearrow{\scriptstyle\varphi} & \\
\ast_{s\in S} G_s & &
\end{array}$$

FIGURE 7.2. Universal mapping property of a free product.

REMARKS. The group $\ast_{s\in S} G_s$ is called the free product of the groups $G_s$. Figure 7.2 illustrates its universal mapping property. This universal mapping property actually characterizes $\ast_{s\in S} G_s$, as will be seen in Proposition 7.14. One often writes $G_1 * \cdots * G_n$ when the set $S$ is finite; the order of listing the groups is immaterial. The proof of Theorem 7.13 is rather similar to the proof of Theorem 7.1, and we shall skip some details.

PROOF. Let us write $\sim$ for the equivalence relation on words, and let us denote equivalence classes by brackets. We want to define multiplication in $\ast_{s\in S} G_s$ by $[w_1][w_2] = [w_1 w_2]$. To see that this formula makes sense in $\ast_{s\in S} G_s$, let $x$, $x'$, and $y$ be words in $W(\{G_s\})$, and suppose that $x$ and $x'$ differ by only one operation of type (i) or type (ii) as above. Then $x \sim x'$, and it is evident that $xy \sim x'y$ and $yx \sim yx'$. Iteration of this kind of relationship shows that $w_1 \sim w_1'$ and $w_2 \sim w_2'$ implies $w_1 w_2 \sim w_1' w_2'$, and hence multiplication is well defined. The associativity of multiplication in $W(\{G_s\})$ implies that multiplication in $\ast_{s\in S} G_s$ is associative, and $[1]$ is a two-sided identity. We readily check that if $g = g_1 \cdots g_n$ is a word, then the word $g^{-1} = g_n^{-1} \cdots g_1^{-1}$ has the property that $[g^{-1}]$ is a two-sided inverse to $[g]$. Therefore $\ast_{s\in S} G_s$ is a group.

The uniqueness of the homomorphism $\varphi$ in the universal mapping property is no problem since all words are products of words of length 1 and since the subgroups $i_{s_0}(G_{s_0})$ together generate $\ast_{s\in S} G_s$.

For existence of $\varphi$, we begin by defining a function $\Phi : W(\{G_s\}) \to H$ such that

$$\begin{aligned}
\Phi(g_s) &= \varphi_s(g_s) && \text{for } g_s \text{ in } G_s \text{ when viewed as a word of length 1},\\
\Phi(w_1 w_2) &= \Phi(w_1)\Phi(w_2) && \text{for } w_1 \text{ and } w_2 \text{ in } W(\{G_s\}).
\end{aligned}$$

We take the formulas $\Phi(g_s) = \varphi_s(g_s)$ for $g_s$ in $G_s$ as a definition of $\Phi$ on words of length 1. Any member of $W(\{G_s\})$ can be written uniquely as $g_1 \cdots g_n$ with each $g_i$ in $G_{s_i}$, and we set $\Phi(g_1 \cdots g_n) = \Phi(g_1) \cdots \Phi(g_n)$. (If $n = 0$, the understanding is that $\Phi(1) = 1$.) Then $\Phi$ has the required properties.

Let us show that $w \sim w'$ implies $\Phi(w') = \Phi(w)$. The questions are whether
(i) if $g_1, \ldots, g_n$ are in various $G_s$'s with $g_i$ equal to the identity $1_{s_i}$ of $G_{s_i}$, then
$$\Phi(g_1 \cdots g_{i-1} 1_{s_i} g_{i+1} \cdots g_n) \overset{?}{=} \Phi(g_1 \cdots g_{i-1} g_{i+1} \cdots g_n),$$
(ii) if $g_1, \ldots, g_n$ are in various $G_s$'s with $G_{s_i} = G_{s_{i+1}}$ and if $g_i g_{i+1} = g_i^*$ in $G_{s_i}$, then
$$\Phi(g_1 \cdots g_{i-1} g_i g_{i+1} g_{i+2} \cdots g_n) \overset{?}{=} \Phi(g_1 \cdots g_{i-1} g_i^* g_{i+2} \cdots g_n).$$


In the case of (i), the question comes down to whether a certain $h\Phi(1_{s_i})h'$ in $H$ equals $hh'$, and this is true because $\Phi(1_{s_i}) = \varphi_{s_i}(1_{s_i})$ is the identity of $H$. In the case of (ii), the question comes down to whether $h\Phi(g_i)\Phi(g_{i+1})h'$ equals $h\Phi(g_i^*)h'$ if $G_{s_i} = G_{s_{i+1}}$ and $g_i g_{i+1} = g_i^*$ in $G_{s_i}$, and this is true because
$$\Phi(g_i)\Phi(g_{i+1}) = \varphi_{s_i}(g_i)\varphi_{s_i}(g_{i+1}) = \varphi_{s_i}(g_i g_{i+1}) = \varphi_{s_i}(g_i^*) = \Phi(g_i^*).$$
We conclude that $w \sim w'$ implies $\Phi(w') = \Phi(w)$. We may therefore define $\varphi([w]) = \Phi(w)$ for $[w]$ in $\ast_{s\in S} G_s$, and $\varphi$ is a homomorphism of $\ast_{s\in S} G_s$ into $H$ as a consequence of the property $\Phi(w_1 w_2) = \Phi(w_1)\Phi(w_2)$ of $\Phi$ on $W(\{G_s\})$. For $g_s$ in $G_s$, we have $\varphi([g_s]) = \Phi(g_s) = \varphi_s(g_s)$, i.e., $\varphi(i_s(g_s)) = \varphi_s(g_s)$. This completes the proof of existence.

Proposition 7.14. Let $S$ be a nonempty set of groups $G_s$. Suppose that $G'$ is a group and that $i_s' : G_s \to G'$ for $s \in S$ is a system of group homomorphisms with the following universal mapping property: whenever $H$ is a group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : G_s \to H$, then there exists a unique group homomorphism $\varphi : G' \to H$ such that $\varphi \circ i_s' = \varphi_s$ for all $s \in S$. Then there exists a unique group homomorphism $\Phi : \ast_{s\in S} G_s \to G'$ such that $i_s' = \Phi \circ i_s$ for all $s \in S$. Moreover, $\Phi$ is a group isomorphism, and the homomorphisms $i_s' : G_s \to G'$ are one-one.

REMARKS. As was true with Proposition 7.2, readers who have been through Chapter VI will recognize that Proposition 7.14 is a special case of Problem 19 at the end of that chapter.

PROOF. Put $G = \ast_{s\in S} G_s$. In the universal mapping property of Theorem 7.13, let $H = G'$ and $\varphi_s = i_s'$, and let $\Phi : G \to G'$ be the homomorphism $\varphi$ produced by that theorem. Then $\Phi$ satisfies $\Phi \circ i_s = i_s'$ for all $s$. Reversing the roles of $G$ and $G'$, we obtain a homomorphism $\Phi' : G' \to G$ with $\Phi' \circ i_s' = i_s$ for all $s$. Therefore $(\Phi' \circ \Phi) \circ i_s = \Phi' \circ i_s' = i_s$. Comparing $\Phi' \circ \Phi$ with the identity $1_G$ and applying the uniqueness in the universal mapping property for $G$, we see that $\Phi' \circ \Phi = 1_G$. Similarly the uniqueness in the universal mapping property of $G'$ gives $\Phi \circ \Phi' = 1_{G'}$. Thus $\Phi$ is a group isomorphism. It is uniquely determined by the given properties since the various subgroups $i_s(G_s)$ generate $G$. Since $i_s' = \Phi \circ i_s$ and since $\Phi$ and $i_s$ are one-one, $i_s'$ is one-one.

As was the case for free groups, we want a decision procedure for telling whether two given words in $W(\{G_s\})$ are equivalent. This is the so-called word problem for the free product. Solving it allows us to use free products concretely, just as Proposition 7.3 allowed us to use free groups concretely. A word in $W(\{G_s\})$ is said to be reduced if it
(i) contains no factor for which $g_i$ is the identity element of the group $G_s$ in which it lies,


(ii) contains no two consecutive factors $g_i$ and $g_{i+1}$ taken from the same group $G_s$.

Proposition 7.15 (solution of the word problem for free products). If $S$ is a nonempty set of groups $G_s$ and $W(\{G_s\})$ is the set of all words from the groups $G_s$, then each word in $W(\{G_s\})$ is equivalent to one and only one reduced word.

EXAMPLE. Consider the free product $C_2 * C_2$ of two cyclic groups, one with $x$ as generator and the other with $y$ as generator. Words consist of a finite sequence of factors of $x$, $y$, the identity of the first factor, and the identity of the second factor. A word is reduced if no factor is an identity and if no two $x$'s are adjacent and no two $y$'s are adjacent. Thus the reduced words consist of finite sequences whose terms are alternately $x$ and $y$. Those of length $\le 3$ are $1, x, y, xy, yx, xyx, yxy$, and in general there are two of each length $> 0$. The proposition tells us that all these reduced words give distinct members of $C_2 * C_2$. In particular, the group is infinite.

REMARK. More generally, to test whether two words are equivalent, the proposition says to eliminate factors of the identity and multiply consecutive factors in each word when they come from the same group, and repeat these steps until it is no longer possible to do either of these operations on either word. Then each of the given words has been replaced by a reduced word, and the two given words are equivalent if and only if the two reduced words are identical. Problems 37–46 at the end of the chapter concern $C_2 * C_3$, and some of these problems make use of the result of this proposition, namely that distinct reduced words are inequivalent.

PROOF OF PROPOSITION 7.15. Both operations (eliminating factors of the identity, and multiplying consecutive factors in each word when they come from the same group) reduce the length of a word.
Since the length has to remain $\ge 0$, the process of successively carrying out these two operations as much as possible has to stop after finitely many steps, and the result is a reduced word. This proves that each equivalence class of words contains a reduced word.

For uniqueness of the reduced word in an equivalence class, we proceed somewhat as with Proposition 7.3, associating to each word a finite sequence of reduced words such that the last member of the sequence is unchanged when we apply an operation to the word that preserves equivalence. However, there are considerably more details to check this time. If $w = g_1 \cdots g_n$ is a given word with each $g_i$ in $G_{s_i}$, then we associate to $w$ the sequence of reduced words $x_0, x_1, \ldots, x_n$ defined inductively by $x_0 = 1$,

$$x_1 = \begin{cases} g_1 & \text{if } g_1 \text{ is not the identity of } G_{s_1},\\ 1 & \text{if } g_1 \text{ is the identity of } G_{s_1}, \end{cases}$$

and the following formula for $i \ge 2$ if $x_{i-1}$ is of the reduced form $h_1 \cdots h_k$ with $h_j$ in $G_{t_j}$:

$$x_i = \begin{cases}
h_1 \cdots h_k g_i & \text{if } G_{s_i} \ne G_{t_k} \text{ and } g_i \text{ is not the identity } 1_{G_{s_i}} \text{ of } G_{s_i},\\
h_1 \cdots h_k & \text{if } g_i \text{ is the identity } 1_{G_{s_i}} \text{ of } G_{s_i},\\
h_1 \cdots h_{k-1} & \text{if } G_{t_k} = G_{s_i} \text{ with } h_k g_i = 1_{G_{s_i}},\\
h_1 \cdots h_{k-1} g_i^* & \text{if } G_{t_k} = G_{s_i} \text{ with } h_k g_i = g_i^* \ne 1_{G_{s_i}}.
\end{cases}$$

Put $r(w) = x_n$. We check inductively for $i \ge 0$ that each $x_i$ is reduced. In fact, $x_i$ for $i \ge 2$ begins in every case with $h_1 \cdots h_{k-1}$, which is assumed reduced. The only possible reduction for $x_i$ thus comes from factors that are adjoined or from interference with $h_{k-1}$, and all possibilities are addressed in the above choices. Thus $r(w) = x_n$ is necessarily reduced for each word $w$. If $g_1 \cdots g_n$ is reduced as given, then $x_i$ is determined by the first possible choice $h_1 \cdots h_k g_i$ every time, and hence $x_i = g_1 \cdots g_i$ for all $i$. Therefore we obtain $r(w) = w$ if $w$ is reduced.

Now consider the equivalent words

$$w = g_1 \cdots g_j g_{j+1} \cdots g_n \quad\text{and}\quad w' = g_1 \cdots g_j 1_{G_s} g_{j+1} \cdots g_n.$$

Form $x_0, \ldots, x_n$ for $w$ and $x_0', \ldots, x_{n+1}'$ for $w'$. Then we have $x_j = x_j'$; let $h_1 \cdots h_k$ be a reduced form of $x_j$. The formula for $x_{j+1}'$ is governed by the second choice in the display, and $x_{j+1}' = h_1 \cdots h_k = x_j$. Then $x_{j+i+1}' = x_{j+i}$ for $1 \le i \le n - j$ as well. Hence $x_{n+1}' = x_n$, and $r(w') = r(w)$.

Next suppose that $g_j^* = g_j g_{j+1}$ in $G_{s_j}$, and consider the equivalent words

$$w = g_1 \cdots g_{j-1} g_j^* g_{j+2} \cdots g_n \quad\text{and}\quad w' = g_1 \cdots g_{j-1} g_j g_{j+1} g_{j+2} \cdots g_n.$$

As above, form $x_0, \ldots, x_n$ for $w$ and $x_0', \ldots, x_{n+1}'$ for $w'$. Then we have $x_{j-1} = x_{j-1}'$, and we let $h_1 \cdots h_k$ be a reduced form of $x_{j-1}$. There are cases, subcases, and subsubcases.

First assume $G_{t_k} \ne G_{s_j}$. Then $x_j$ equals $h_1 \cdots h_k g_j^*$ or $h_1 \cdots h_k$ in the two subcases $g_j^* \ne 1_{G_{s_j}}$ and $g_j^* = 1_{G_{s_j}}$. In the first subcase, we have $g_j^* \ne 1_{G_{s_j}}$ and $x_j = h_1 \cdots h_k g_j^*$. Then $x_j'$ equals $h_1 \cdots h_k g_j$ or $h_1 \cdots h_k$ in the two subsubcases $g_j \ne 1_{G_{s_j}}$ and $g_j = 1_{G_{s_j}}$. In the first subsubcase, $x_{j+1}' = h_1 \cdots h_k g_j^* = x_j$ whether or not $g_{j+1} = 1_{G_{s_j}}$. In the second subsubcase, $g_{j+1} = g_j g_{j+1} = g_j^*$ cannot be $1_{G_{s_j}}$, and therefore $x_{j+1}' = h_1 \cdots h_k g_j^* = x_j$.

In the second subcase of the case $G_{t_k} \ne G_{s_j}$, we have $g_j^* = 1_{G_{s_j}}$ and $x_j = x_{j-1} = h_1 \cdots h_k$. Then $x_j'$ equals $h_1 \cdots h_k g_j$ or $h_1 \cdots h_k$ in the two subsubcases $g_j \ne 1_{G_{s_j}}$ and $g_j = 1_{G_{s_j}}$. In both subsubcases, $x_{j+1}' = h_1 \cdots h_k$, so that $x_{j+1}' = x_j$.


Now assume $G_{t_k} = G_{s_j}$. Then $x_j$ equals $h_1 \cdots h_{k-1} h_k^*$ or $h_1 \cdots h_{k-1}$ in the two subcases $h_k g_j^* = h_k^* \ne 1_{G_{s_j}}$ and $h_k g_j^* = 1_{G_{s_j}}$. In the first subcase, we have $h_k g_j^* = h_k^* \ne 1_{G_{s_j}}$ and $x_j = h_1 \cdots h_{k-1} h_k^*$. Then $x_j'$ equals $h_1 \cdots h_{k-1} h_k'$ or $h_1 \cdots h_{k-1}$ in the two subsubcases $h_k g_j = h_k' \ne 1_{G_{s_j}}$ and $h_k g_j = 1_{G_{s_j}}$. In the first subsubcase, $h_k' g_{j+1} = h_k g_j g_{j+1} = h_k g_j^* = h_k^*$ implies $x_{j+1}' = h_1 \cdots h_{k-1} h_k^* = x_j$. In the second subsubcase, we know that $h_k^*$ cannot be $1_{G_{s_j}}$ and hence that $g_{j+1} = h_k g_j g_{j+1} = h_k g_j^* = h_k^*$ cannot be $1_{G_{s_j}}$; thus $x_{j+1}' = h_1 \cdots h_{k-1} h_k^* = x_j$.

In the second subcase of the case $G_{t_k} = G_{s_j}$, we have $h_k g_j^* = 1_{G_{s_j}}$ and $x_j = h_1 \cdots h_{k-1}$. Then $x_j'$ equals $h_1 \cdots h_{k-1} h_k^*$ or $h_1 \cdots h_{k-1}$ in the two subsubcases $h_k g_j = h_k^* \ne 1_{G_{s_j}}$ and $h_k g_j = 1_{G_{s_j}}$. In the first subsubcase, $g_{j+1}$ cannot be $1_{G_{s_j}}$ but $h_k^* g_{j+1} = h_k g_j g_{j+1} = h_k g_j^* = 1_{G_{s_j}}$; hence $x_{j+1}' = h_1 \cdots h_{k-1} = x_j$. In the second subsubcase, $x_j' = h_1 \cdots h_{k-1}$ and $g_{j+1} = 1_{G_{s_j}}$, so that $x_{j+1}' = h_1 \cdots h_{k-1} = x_j$.

We conclude that $x_{j+1}' = x_j$ in all cases. Hence $x_{j+i+1}' = x_{j+i}$ for $0 \le i \le n - j$, $x_{n+1}' = x_n$, and $r(w') = r(w)$. Consequently the only reduced word that is equivalent to $w$ is $r(w)$.

Proposition 7.16. Let $S$ be a nonempty set of groups $G_s$, and suppose that $\langle S_s; R_s\rangle$ is a presentation of $G_s$, the sets $S_s$ being understood to be disjoint for $s \in S$. Then $\big\langle \bigcup_{s\in S} S_s;\ \bigcup_{s\in S} R_s \big\rangle$ is a presentation of the free product $\ast_{s\in S} G_s$.
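Propositions 7.15 and 7.16 can be illustrated together for a free product of cyclic groups such as $C_2 * C_3$, which the latter proposition presents as $\langle x, y;\ x^2, y^3\rangle$. The Python sketch below is a hedged illustration, not the text's own algorithm verbatim: it assumes each factor is a finite cyclic group, encodes a word as a list of pairs `(s, k)` meaning the $k$-th power of the generator of the group labeled `s`, and uses the hypothetical name `reduce_free_product`. Its loop follows the four-way choice that defines the sequence $x_0, x_1, \ldots, x_n$ in the proof of Proposition 7.15, so the returned list plays the role of $r(w)$.

```python
def reduce_free_product(word, orders):
    """Return the reduced word r(w) for a word in a free product of cyclic groups.

    word   -- list of (s, k): the k-th power of the generator of the group labeled s
    orders -- dict mapping each label s to the order of that cyclic group
    """
    out = []                                   # the current reduced word x_i
    for s, k in word:
        k %= orders[s]
        if k == 0:
            continue                           # g_i is an identity: x_i = x_{i-1}
        if out and out[-1][0] == s:
            merged = (out[-1][1] + k) % orders[s]
            out.pop()                          # h_k and g_i lie in the same group
            if merged:
                out.append((s, merged))        # h_k g_i = g_i^* is not the identity
        else:
            out.append((s, k))                 # different group: adjoin g_i
    return out

orders = {"x": 2, "y": 3}
w1 = [("x", 1), ("y", 1), ("y", 1), ("x", 1)]
w2 = [("x", 1), ("y", 2), ("x", 0), ("x", 1)]
print(reduce_free_product(w1, orders))   # [('x', 1), ('y', 2), ('x', 1)]
print(reduce_free_product(w1, orders) == reduce_free_product(w2, orders))   # True
```

Two words are equivalent exactly when their reduced forms coincide, so comparing the two outputs is the decision procedure described in the remark after Proposition 7.15.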


REMARK. One effect of this proposition is to make Proposition 7.8 available as a tool for use with free products. Using Proposition 7.8 may be easier than appealing to the universal mapping property in Theorem 7.13.

PROOF. Put $S' = \bigcup_{s\in S} S_s$ and $R' = \bigcup_{s\in S} R_s$, and define $G'$ to be a group given by generators and relations as $G' = \langle S'; R'\rangle$. Consider the function from $S_s$ into the quotient group $G' = F(S')/N(R')$ given by carrying $x$ in $S_s$ into the word $x$ in $S'$ and then passing to $F(S')$ and its quotient $G'$. Because of the universal mapping property of free groups, this function extends to a group homomorphism $\tilde{i}_s : F(S_s) \to G'$. If $r$ is a reduced word relative to $S_s$ representing a member of $R_s$, then $r$ is carried by $\tilde{i}_s$ into a member of the larger set $R'$ and then into the identity of $G'$. Since $\ker \tilde{i}_s$ is normal in $F(S_s)$, $\ker \tilde{i}_s$ contains the smallest normal subgroup $N(R_s)$ in $F(S_s)$ that contains $R_s$. Proposition 4.11 shows that $\tilde{i}_s$ descends to a group homomorphism $i_s : G_s \to G'$. We shall prove that $G'$ and the system $\{i_s\}$ have the universal mapping property of Proposition 7.14 that characterizes a free product. Then it will follow from that proposition that $G' \cong \ast_{s\in S} G_s$, and the proof will be complete.

Thus let $H$ be a group, and let $\{\varphi_s \mid s \in S\}$ be a system of group homomorphisms $\varphi_s : G_s \to H$. We are to produce a homomorphism $\Phi : G' \to H$ such that $\Phi \circ i_s = \varphi_s$ for all $s$, and we are to prove that such a homomorphism is unique. Let $q_s : F(S_s) \to G_s$ be the quotient homomorphism, and define $\tilde\varphi_s : F(S_s) \to H$ by $\tilde\varphi_s = \varphi_s \circ q_s$. Now define $\tilde\Phi : S' \to H$ as follows: if $x$ is in $S'$, then $x$ is in a set $S_s$ for a unique $s$ and thereby defines a member of $F(S_s)$ for that unique $s$; $\tilde\Phi(x)$ is taken to be $\tilde\varphi_s(x)$. The universal mapping property of the free group $F(S')$ allows us to extend $\tilde\Phi$ to a group homomorphism, which we continue to call $\tilde\Phi$, of $F(S')$ into $H$. Let $r$ be a nontrivial relation in $R' \subseteq F(S')$. Then $r$, by hypothesis of disjointness for the sets $S_s$, lies in a unique $R_s$. Hence $\tilde\Phi(r) = \tilde\varphi_s(r) = \varphi_s(q_s(r)) = \varphi_s(1_s) = 1_H$. Consequently the kernel of $\tilde\Phi$ contains the smallest normal subgroup $N(R')$ of $F(S')$ containing $R'$, and $\tilde\Phi$ descends to a homomorphism $\Phi : G' \to H$. This satisfies

$$\Phi \circ i_s \circ q_s = \Phi \circ \tilde{i}_s = \tilde\Phi\big|_{F(S_s)} = \tilde\varphi_s = \varphi_s \circ q_s.$$

Since the quotient homomorphism $q_s$ is onto $G_s$, we obtain $\Phi \circ i_s = \varphi_s$, and existence of the homomorphism $\Phi$ is established. For uniqueness, we observe that the identities $\Phi \circ i_s = \varphi_s$ imply that $\Phi$ is uniquely determined on the subgroup of $G'$ generated by the images of all $i_s$. Since $q_s$ is onto $G_s$, this subgroup is the same as the subgroup generated by the images of all $\tilde{i}_s$. This subgroup contains the image in $G'$ of every generator of $F(S')$ and hence is all of $G'$. Thus $\Phi$ is uniquely determined.

4. Group Representations

Group representations were defined in Section IV.6 as group actions on vector spaces by invertible linear functions. The underlying field of the vector space will be taken to be $\mathbb{C}$ in this section and the next, and the theory will then be especially tidy. The subject of group representations is one that uses a mix of linear algebra and group theory to reveal hidden structure within group actions. It has broad applications to algebra and analysis, but we shall be most interested in an application to finite groups known as Burnside's Theorem that will be proved in the next section.

Let us begin with the abelian case, taking $G$ for the moment to be a finite abelian group. A multiplicative character of $G$ is a homomorphism $\chi : G \to S^1 \subseteq \mathbb{C}^\times$ of $G$ into the multiplicative group of complex numbers of absolute value 1. The multiplicative characters form an abelian group $\widehat{G}$ under pointwise multiplication of their complex values: $(\chi\chi')(g) = \chi(g)\chi'(g)$. The identity of $\widehat{G}$ is the multiplicative character that is identically 1 on $G$, and the inverse of $\chi$ is the complex conjugate of $\chi$.

The notion of multiplicative character adapts to the case of a finite group the familiar exponential functions $x \mapsto e^{inx}$ on the line, which can be regarded as multiplicative characters of the additive group $\mathbb{R}/2\pi\mathbb{Z}$ of real numbers modulo $2\pi$. These functions have long been used to resolve a periodic function of time into its component frequencies: The device is the Fourier series of the function $f$. If $f$ is periodic of period $2\pi$, then the Fourier coefficients of $f$ are $c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx$, and the Fourier series of $f$ is the infinite series $\sum_{n=-\infty}^{\infty} c_n e^{inx}$. A portion of the subject of Fourier series looks for senses in which $f(x)$ is actually equal to the sum of its Fourier series. This is the problem of Fourier inversion.

A similar problem can be formulated when $\mathbb{R}/2\pi\mathbb{Z}$ is replaced by the finite abelian group $G$. The exponential functions are replaced by the multiplicative characters. One can form an analog of Fourier coefficients for the vector space $C(G, \mathbb{C})$ of complex-valued functions² defined on $G$, and then one can form the analog of the Fourier series of the function. The problem of Fourier inversion becomes one of linear algebra, once we take into account the known structure of all finite abelian groups (Theorem 4.56). The result is as follows.

Theorem 7.17 (Fourier inversion formula for finite abelian groups). Let $G$ be a finite abelian group, and introduce an inner product on the complex vector space $C(G, \mathbb{C})$ of all functions from $G$ to $\mathbb{C}$ by the formula

$$\langle F, F'\rangle = \sum_{g\in G} F(g)\overline{F'(g)},$$

the corresponding norm being $\|F\| = \langle F, F\rangle^{1/2}$. Then the members of $\widehat{G}$ form an orthogonal basis of $C(G, \mathbb{C})$, each $\chi$ in $\widehat{G}$ satisfying $\|\chi\|^2 = |G|$. Consequently $|\widehat{G}| = |G|$, and any function $F : G \to \mathbb{C}$ is given by the "sum of its Fourier series":

$$F(g) = \frac{1}{|G|} \sum_{\chi\in\widehat{G}} \Big( \sum_{h\in G} F(h)\overline{\chi(h)} \Big) \chi(g).$$

REMARKS. This theorem is one of the ingredients in the proof in Advanced Algebra of Dirichlet's theorem that if $a$ and $b$ are positive relatively prime integers, then there are infinitely many primes of the form $an + b$. In applications to engineering, the ordinary Fourier transform on the line is often approximated, for computational purposes, by a Fourier series on a large cyclic group, and then Theorem 7.17 is applicable. Such a Fourier series can be computed with unexpected efficiency using a special grouping of terms; this device is called the fast Fourier transform and is described in Problems 29–31 at the end of the chapter.

$^2$ The notation $C(G, \mathbb{C})$ is to be suggestive of what happens for $G = S^1$ and for $G = \mathbb{R}^1$, where one works in part with the space of continuous complex-valued functions vanishing off a bounded set. In any event, pointwise multiplication makes $C(G, \mathbb{C})$ into a commutative ring. Later in the section we introduce a second multiplication, called "convolution," that makes $C(G, \mathbb{C})$ into a ring in a different way. In Chapter VIII we shall introduce the "complex group algebra" $\mathbb{C}G$ of $G$. The vector space $C(G, \mathbb{C})$ is the dual vector space of $\mathbb{C}G$. However, $C(G, \mathbb{C})$ and $\mathbb{C}G$ are canonically isomorphic because they have distinguished bases, and the isomorphism respects the multiplication structures, namely convolution in $C(G, \mathbb{C})$ and the group-algebra multiplication in $\mathbb{C}G$.

VII. Advanced Group Theory

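PROOF. For orthogonality let $\chi$ and $\chi'$ be distinct members of $\widehat{G}$, and put $\chi'' = \chi\overline{\chi'} = \chi\chi'^{-1}$. Choose $g_0$ in $G$ with $\chi''(g_0) \neq 1$. Then
\[
\chi''(g_0) \sum_{g \in G} \chi''(g) = \sum_{g \in G} \chi''(g_0 g) = \sum_{g \in G} \chi''(g),
\]
so that
\[
[1 - \chi''(g_0)] \sum_{g \in G} \chi''(g) = 0
\]
and therefore
\[
\sum_{g \in G} \chi''(g) = 0.
\]
Consequently
\[
\langle \chi, \chi' \rangle = \sum_{g \in G} \chi(g)\overline{\chi'(g)} = \sum_{g \in G} \chi''(g) = 0.
\]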

are linearly independent, The orthogonality implies that the members of G 2 ≤ dim C(G, C) = |G|. Certainly χ2 = and we obtain |G| g∈G |χ (g)| = g∈G 1 = |G|. are a basis of C(G, C), we write G as a direct To see that the members of G sum of cyclic groups, by Theorem 4.56. A summand Z/mZ has at least m distinct multiplicative characters, given by j mod m → e2πi jr/m for 0 ≤ r ≤ m − 1, and these characters extend to G as 1 on the other direct summands of G. Taking products of such multiplicative characters from the different summands of G, ≥ |G|. Therefore |G| = |G|, and G is an orthogonal basis by we see that |G| Corollary 2.4. The formula for F(g) in the statement of the theorem follows by applying Theorem 3.11c. Now suppose that the ﬁnite group G is not necessarily abelian. Since S 1 is abelian, Proposition 7.4 shows that χ takes the value 1 on every member of the commutator subgroup G of G. Consequently there is no way that the multiplicative characters can form a basis for the vector space C(G, C) of complex-valued functions on G. The above analysis thus breaks down, and some adjustment is needed in order to extend the theory. The remedy is to use representations, as deﬁned in Section IV.6, on complex vector spaces of dimension > 1. We shall assume in the text that the vector space is ﬁnite-dimensional. The sense in which representations extend the theory of multiplicative characters is that any multiplicative character χ gives a representation R on the 1-dimensional vector space C by R(g)(z) = χ (g)z for g in G and z in C. Conversely any 1-dimensional representation gives a multiplicative character: if R is the representation on the 1-dimensional vector space V and if v0 = 0 is in V , then χ (g) is the scalar such that R(g)v0 = χ (g)v0 . It is enough to observe that the only elements of ﬁnite order in the multiplicative group C× are certain members of the circle S 1 , and then it follows that χ takes values in S 1 .

In the higher-dimensional case, the analog of the multiplicative character $\chi$ in passing to a 1-dimensional representation $R$ is a "matrix representation." A matrix representation of $G$ is a function $g \mapsto [\rho(g)_{ij}]$ from $G$ into invertible square matrices of some given size such that $\rho(g_1 g_2)_{ij} = \sum_{k=1}^{n} \rho(g_1)_{ik}\rho(g_2)_{kj}$. If a representation $R$ acts on the finite-dimensional complex vector space $V$, then the choice of an ordered basis $\Gamma$ for $V$ leads to a matrix representation by the formula
\[
[\rho(g)_{ij}] = \begin{pmatrix} R(g) \\ \Gamma\Gamma \end{pmatrix},
\]
the right side being the matrix of $R(g)$ relative to $\Gamma$. Conversely if a matrix representation $g \mapsto [\rho(g)_{ij}]$ and an ordered basis of $V$ are given, then the same formula may be used to obtain a representation $R$ of $G$ on $V$.

In contrast to the 1-dimensional case, the matrices that occur with a matrix representation of dimension $> 1$ need not be unitary. The correspondence between unitary linear maps and unitary matrices was discussed in Chapter III. When the finite-dimensional vector space $V$ has an inner product, a linear map was defined to be unitary if it satisfies the equivalent conditions of Proposition 3.18. A complex square matrix $A$ was defined to be unitary if $A^*A = I$. The matrix of a unitary linear map relative to an ordered orthonormal basis is unitary, and conversely when a unitary matrix and an ordered orthonormal basis are given, the associated linear map is unitary. We can thus speak of unitary representations and unitary matrix representations.

Some examples of representations appear in Section IV.6. One further pair of examples will be of interest to us. With the finite group $G$ fixed but not necessarily abelian, we continue to let $C(G, \mathbb{C})$ be the complex vector space of all functions $f : G \to \mathbb{C}$. We define two representations of $G$ on $C(G, \mathbb{C})$: the left regular representation $\ell$ given by $(\ell(g)f)(x) = f(g^{-1}x)$ and the right regular representation $r$ given by $(r(g)f)(x) = f(xg)$. The reason for the presence of an inverse in one case and not the other was discussed in Section IV.6. Relative to the inner product
\[
(f_1, f_2) = \sum_{x \in G} f_1(x)\overline{f_2(x)},
\]
both $\ell$ and $r$ are unitary. The argument for $\ell$ is that
\[
(\ell(g)f_1, \ell(g)f_2) = \sum_{x \in G} (\ell(g)f_1)(x)\overline{(\ell(g)f_2)(x)} = \sum_{x \in G} f_1(g^{-1}x)\overline{f_2(g^{-1}x)} = \sum_{y \in G} f_1(y)\overline{f_2(y)} = (f_1, f_2),
\]
the third equality holding under the change of variables $y = g^{-1}x$, and the argument for $r$ is completely analogous.

It will be convenient to abbreviate "representation $R$ on $V$" as "representation $(R, V)$." Let $(R, V)$ be a representation of the finite group $G$ on a finite-dimensional complex vector space. An invariant subspace $U$ of $V$ is a vector subspace such that $R(g)U \subseteq U$ for all $g$ in $G$. The representation is irreducible if $V \neq 0$ and if $V$ has no invariant subspaces other than 0 and $V$.

Two representations $(R_1, V_1)$ and $(R_2, V_2)$ on finite-dimensional complex vector spaces are equivalent if there exists a linear invertible function $A : V_1 \to V_2$ such that $A R_1(g) = R_2(g)A$ for all $g$ in $G$. In the terminology of Section IV.11, "equivalent" is the notion of "is isomorphic to" in the category of all finite-dimensional representations of $G$. In more detail a morphism from $(R_1, V_1)$ to $(R_2, V_2)$ in this category is an intertwining operator, namely a linear map $A : V_1 \to V_2$ such that $A R_1(g) = R_2(g)A$ for all $g$ in $G$. The condition for this equality to hold is that the diagram in Figure 7.3 commute.

\[
\begin{array}{ccc}
V_1 & \xrightarrow{\ A\ } & V_2 \\
{\scriptstyle R_1(g)}\big\downarrow & & \big\downarrow{\scriptstyle R_2(g)} \\
V_1 & \xrightarrow{\ A\ } & V_2
\end{array}
\]

FIGURE 7.3. An intertwining operator for two representations, i.e., a morphism in the category of finite-dimensional representations of $G$.

An example of a pair of representations that are equivalent is the left and right regular representations of $G$ on $C(G, \mathbb{C})$: in fact, if we define $(Af)(x) = f(x^{-1})$, then $(\ell(g)Af)(x) = (Af)(g^{-1}x) = f(x^{-1}g) = (r(g)f)(x^{-1}) = (A r(g)f)(x)$.

Proposition 7.18 (Schur's Lemma). If $(R_1, V_1)$ and $(R_2, V_2)$ are irreducible representations of the finite group $G$ on finite-dimensional complex vector spaces and if $A : V_1 \to V_2$ is an intertwining operator, then $A$ is invertible (and hence exhibits $R_1$ and $R_2$ as equivalent) or else $A = 0$. If $(R_1, V_1) = (R_2, V_2)$ and $A : V_1 \to V_2$ is an intertwining operator, then $A$ is scalar.

REMARK. The conclusion that $A$ is scalar makes essential use of the fact that the underlying field is $\mathbb{C}$.

PROOF. The equality $R_2(g)Av = A R_1(g)v$ shows that $\ker A$ and $\operatorname{image} A$ are invariant subspaces. By the assumed irreducibility, $\ker A$ equals 0 or $V_1$, and $\operatorname{image} A$ equals 0 or $V_2$. The first statement follows. When $(R_1, V_1) = (R_2, V_2)$, the identity $I : V_1 \to V_2$ is an intertwining operator. Since the field is $\mathbb{C}$, $A$ has an eigenvalue $\lambda$, and then $A - \lambda I$ is another intertwining operator. Since $A - \lambda I$ is not invertible when $\lambda$ is an eigenvalue of $A$, the first statement shows that $A - \lambda I$ must be 0. Thus $A = \lambda I$ is scalar.

Corollary 7.19. Every irreducible finite-dimensional representation of a finite abelian group $G$ is 1-dimensional.

PROOF. If $(R, V)$ is given and $g$ is in $G$, then the linear map $A = R(g)$ satisfies $A R(x) = R(gx) = R(xg) = R(x)A$ for all $x$ in $G$. By Schur's Lemma (Proposition 7.18), $A = R(g)$ is scalar. Since $g$ is arbitrary, every vector subspace of $V$ is invariant. Irreducibility therefore implies that $V$ is 1-dimensional.

Let $R$ be a representation of the finite group $G$ on the finite-dimensional complex vector space $V$, let $(\,\cdot\,,\,\cdot\,)_0$ be any inner product on $V$, and define
\[
(v_1, v_2) = \sum_{x \in G} (R(x)v_1, R(x)v_2)_0.
\]
Then we have
\[
(R(g)v_1, R(g)v_2) = \sum_{x \in G} (R(x)R(g)v_1, R(x)R(g)v_2)_0 = \sum_{x \in G} (R(xg)v_1, R(xg)v_2)_0 = \sum_{y \in G} (R(y)v_1, R(y)v_2)_0 = (v_1, v_2),
\]
the third equality holding by the change of variables $y = xg$. With respect to the inner product $(\,\cdot\,,\,\cdot\,)$, the representation $(R, V)$ is therefore unitary. In other words, we are always free to introduce an inner product to make a given finite-dimensional representation unitary. The significance of this construction is noted in the following proposition.

Proposition 7.20. If $(R, V)$ is a finite-dimensional representation of the finite group $G$ and if an inner product is introduced in $V$ that makes the representation unitary, then the orthogonal complement of an invariant subspace is invariant.

PROOF. Let $U$ be an invariant subspace. If $u$ is in $U$ and $u^\perp$ is in $U^\perp$, then $(R(g)u^\perp, u) = (R(g)^{-1}R(g)u^\perp, R(g)^{-1}u) = (u^\perp, R(g)^{-1}u) = 0$. Thus $u^\perp$ in $U^\perp$ implies $R(g)u^\perp$ is in $U^\perp$.

Corollary 7.21. Any finite-dimensional representation of the finite group $G$ is a direct sum of irreducible representations.

REMARK. That is, we can find a system of invariant subspaces such that the action of $G$ is irreducible on each of these subspaces and such that the whole vector space is the direct sum of these subspaces.
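The averaging construction above can be tried on a small example. The sketch below assumes a particular non-unitary representation of $\mathbb{Z}/2\mathbb{Z}$ on $\mathbb{R}^2$ (the matrix `R1` is a hypothetical choice with `R1 * R1 = I`) and records the averaged inner product by its Gram matrix $H = \sum_x R(x)^{T} R(x)$, so that $(v_1, v_2) = v_2^{T} H v_1$. Because all entries are real, transposes stand in for adjoints.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

I2 = [[1, 0], [0, 1]]
R1 = [[1, 1], [0, -1]]      # assumed example: R1 squared is the identity
REP = {0: I2, 1: R1}        # a representation of Z/2Z that is not orthogonal

def gram_of_averaged_product(rep):
    """Gram matrix H = sum_x R(x)^T R(x) of the averaged inner product."""
    n = len(next(iter(rep.values())))
    H = [[0] * n for _ in range(n)]
    for M in rep.values():
        P = matmul(transpose(M), M)
        H = [[H[i][j] + P[i][j] for j in range(n)] for i in range(n)]
    return H

def preserves(M, H):
    """True if M preserves the inner product with Gram matrix H."""
    return matmul(transpose(M), matmul(H, M)) == H

H = gram_of_averaged_product(REP)
print(H)                                            # [[2, 1], [1, 3]]
print(all(preserves(M, H) for M in REP.values()))   # True
print(preserves(R1, I2))                            # False
```

The last line shows that `R1` does not preserve the standard inner product, while every `R(x)` preserves the averaged one.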

PROOF. This is immediate by induction on the dimension. For dimension 0, the representation is the empty direct sum of irreducible representations. If the decomposition is known for dimension $< n$ and if $U$ is an invariant subspace under $R$ of smallest possible dimension $\ge 1$, then $U$ is irreducible under $R$, and Proposition 7.20 says that the subspace $U^\perp$, which satisfies $V = U \oplus U^\perp$, is invariant. It is therefore enough to decompose $U^\perp$, and induction achieves such a decomposition.

Proposition 7.22 (Schur orthogonality). For finite-dimensional representations of a finite group $G$ in which inner products have been introduced to make the representations unitary,
(a) if $(R_1, V_1)$ and $(R_2, V_2)$ are inequivalent and irreducible, then
\[
\sum_{x \in G} (R_1(x)v_1, v_1')\overline{(R_2(x)v_2, v_2')} = 0 \qquad \text{for all } v_1, v_1' \in V_1 \text{ and } v_2, v_2' \in V_2,
\]
(b) if $(R, V)$ is irreducible, then
\[
\sum_{x \in G} (R(x)v_1, v_1')\overline{(R(x)v_2, v_2')} = \frac{|G|\,(v_1, v_2)\overline{(v_1', v_2')}}{\dim V} \qquad \text{for } v_1, v_2, v_1', v_2' \in V.
\]

REMARKS. If $G$ is abelian, then $V_1$ and $V_2$ in (a) are 1-dimensional, and the conclusion of (a) reduces to the statement that the multiplicative characters are orthogonal. Conclusion (b) in this case reduces to a trivial statement.

PROOF. For (a), let $l : V_2 \to V_1$ be any linear map, and form the linear map
\[
L = \sum_{x \in G} R_1(x)\, l\, R_2(x^{-1}).
\]
Multiplying on the left by $R_1(g)$ and on the right by $R_2(g^{-1})$ and changing variables in the sum, we obtain $R_1(g)L R_2(g^{-1}) = L$, so that $R_1(g)L = L R_2(g)$ for all $g \in G$. By Schur's Lemma (Proposition 7.18) and the assumed irreducibility and inequivalence, $L = 0$. Thus $(Lv_2', v_1') = 0$. For the particular choice of $l$ as $l(w_2) = (w_2, v_2)v_1$, we have
\[
0 = (Lv_2', v_1') = \sum_{x \in G} (R_1(x)\, l\, R_2(x^{-1})v_2', v_1') = \sum_{x \in G} \big(R_1(x)(R_2(x^{-1})v_2', v_2)v_1, v_1'\big) = \sum_{x \in G} (R_1(x)v_1, v_1')(R_2(x^{-1})v_2', v_2),
\]
and (a) results since $(R_2(x^{-1})v_2', v_2) = \overline{(R_2(x)v_2, v_2')}$.

For (b), we proceed in the same way, starting from $l : V \to V$, and we obtain $L = \lambda I$ from Schur's Lemma. Taking the trace of both sides, we find that

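\[
\lambda \dim V = \operatorname{Tr} L = |G| \operatorname{Tr} l.
\]
Therefore $\lambda = |G|(\operatorname{Tr} l)/\dim V$. Since $L = \lambda I$,
\[
(Lv_2', v_1') = \frac{|G| \operatorname{Tr} l}{\dim V}\,(v_2', v_1').
\]
Again we make the particular choice of $l$ as $l(w_2) = (w_2, v_2)v_1$. Since $\operatorname{Tr} l = (v_1, v_2)$ and $(v_2', v_1') = \overline{(v_1', v_2')}$, we obtain
\[
\begin{aligned}
\frac{(v_1, v_2)\overline{(v_1', v_2')}}{\dim V} &= \frac{\operatorname{Tr} l}{\dim V}\,(v_2', v_1') = |G|^{-1}(Lv_2', v_1') \\
&= |G|^{-1} \sum_{x \in G} (R(x)\, l\, R(x^{-1})v_2', v_1') \\
&= |G|^{-1} \sum_{x \in G} \big(R(x)(R(x^{-1})v_2', v_2)v_1, v_1'\big) \\
&= |G|^{-1} \sum_{x \in G} (R(x)v_1, v_1')(R(x^{-1})v_2', v_2),
\end{aligned}
\]
and (b) results since $(R(x^{-1})v_2', v_2) = \overline{(R(x)v_2, v_2')}$.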
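Proposition 7.22b can be checked numerically on matrix entries. The sketch below assumes the standard 2-dimensional rotation and reflection matrices generating the dihedral group of order 6, a representation that is irreducible; with the standard basis, part (b) predicts $\sum_x \rho(x)_{ij}\overline{\rho(x)_{kl}} = (|G|/\dim V)\,\delta_{ik}\delta_{jl} = 3\,\delta_{ik}\delta_{jl}$. The helper names are illustrative only.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dihedral_matrices(n):
    """All 2n matrices x^k and x^k y, with x a rotation by 2*pi/n and y a reflection."""
    c, s = math.cos(2 * math.pi / n), math.sin(2 * math.pi / n)
    x = [[c, -s], [s, c]]
    y = [[1.0, 0.0], [0.0, -1.0]]
    powers = [[[1.0, 0.0], [0.0, 1.0]]]
    for _ in range(n - 1):
        powers.append(matmul(powers[-1], x))
    return powers + [matmul(p, y) for p in powers]

def schur_sum(mats, i, j, k, l):
    """Sum over the group of rho(x)_ij * conj(rho(x)_kl); the entries are real here."""
    return sum(M[i][j] * M[k][l] for M in mats)

mats = dihedral_matrices(3)
for idx in [(0, 0, 0, 0), (0, 1, 0, 1), (0, 0, 1, 1), (0, 1, 1, 0)]:
    # Expect about 3 when (i, j) == (k, l) and about 0 otherwise.
    print(idx, schur_sum(mats, *idx))
```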

Let us interpret Proposition 7.22 as a statement about the left and right regular representations $\ell$ and $r$ of $G$ on the inner-product space $C(G, \mathbb{C})$, the inner product being $\langle f, f' \rangle = \sum_{g \in G} f(g)\overline{f'(g)}$. Let $R$ be an irreducible representation of $G$ on the finite-dimensional vector space $V$, and introduce an inner product to make it unitary. A member of $C(G, \mathbb{C})$ of the form $g \mapsto (R(g)v, v')$ is called a matrix coefficient of $R$. Let $v_1, \dots, v_n$ be an orthonormal basis of $V$. The matrix representation of $G$ that corresponds to $R$ and this choice of orthonormal basis has $\rho(g)_{ij} = (R(g)v_j, v_i)$, and hence the entries of $[\rho(g)_{ij}]$, as functions on $G$, provide examples of matrix coefficients. These particular matrix coefficients are orthogonal, according to Proposition 7.22b, with
\[
\sum_{x \in G} |\rho(x)_{ij}|^2 = \sum_{x \in G} (R(x)v_j, v_i)\overline{(R(x)v_j, v_i)} = \frac{|G|\,(v_j, v_j)\overline{(v_i, v_i)}}{\dim V} = \frac{|G|}{\dim V}.
\]
Thus the functions $\sqrt{|G|^{-1} \dim V}\;\rho(x)_{ij}$ form an orthonormal basis of an $n^2$-dimensional subspace $V_R$ of $C(G, \mathbb{C})$, where $n = \dim V$. The vector subspace $V_R$ has the following properties:
(i) All matrix coefficients of $R$ are in $V_R$, as is seen by expanding $v = \sum_j c_j v_j$ and $v' = \sum_i d_i v_i$ and obtaining $(R(g)v, v') = \sum_{i,j} c_j \bar{d}_i (R(g)v_j, v_i) = \sum_{i,j} c_j \bar{d}_i\, \rho(g)_{ij}$.

(ii) $V_R$ is invariant under $\ell$ and $r$ because
\[
\begin{aligned}
\big(\ell(g)(R(\,\cdot\,)v, v')\big)(x) &= (R(g^{-1}x)v, v') = (R(x)v, R(g)v'), \\
\big(r(g)(R(\,\cdot\,)v, v')\big)(x) &= (R(xg)v, v') = (R(x)R(g)v, v').
\end{aligned}
\]
(iii) Any representation $R'$ equivalent to $R$ has $V_{R'} = V_R$.

Let us see how $V_R$ decomposes into irreducible subspaces under $r$. The computation with $r$ in (ii) above shows, for each $i$, that the vector space of all functions $x \mapsto (R(x)v, v_i)$ for $v \in V$ is invariant under $r$. This is the linear span of the matrix coefficients obtained from the $i$th row of $[\rho(x)_{ij}]$. Define a linear map $A$ from $V$ into this vector space by $Av = (R(\,\cdot\,)v, v_i)$. It is evident that $A$ is one-one onto, and moreover $A R(g)v = (R(\,\cdot\,)R(g)v, v_i) = r(g)(R(\,\cdot\,)v, v_i) = r(g)Av$. Thus $A$ exhibits this space, with $r$ as representation, as equivalent to $(R, V)$. The space $V_R$ is the direct sum of these spaces over $i$, and the summands are orthogonal, according to Proposition 7.22b. Thus $V_R$ decomposes under $r$ as the direct sum of $\dim V$ irreducible subspaces, each one equivalent to $(R, V)$.

One can make a similar analysis with $\ell$, using columns in place of rows. However, this analysis is a little more subtle since $V_R$, acted upon by $\ell$, is the direct sum of $\dim V$ copies of the "contragredient" of $(R, V)$, rather than $(R, V)$ itself. The details are left to Problems 32–36 at the end of the chapter.

As $R$ varies over inequivalent representations, these vector spaces $V_R$ are orthogonal, according to Proposition 7.22a. The claim is that their direct sum is the space $C(G, \mathbb{C})$ of all functions on $G$. In fact, the sum is invariant under $r$, and if it is not all of $C(G, \mathbb{C})$, then we can find a nonzero vector subspace $U = \{f(\,\cdot\,)\}$ of $C(G, \mathbb{C})$ orthogonal to all the spaces $V_R$ such that $U$ is invariant and irreducible under $r$. Let $u_1, \dots, u_m$ be an orthonormal basis of $U$. Then each function $x \mapsto (r(x)u_j, u_i)$ is a matrix coefficient of the irreducible representation $(r, U)$ and hence is orthogonal to $U$ by construction, i.e.,
\[
0 = \sum_{x \in G} (r(x)u_j, u_i)\overline{f(x)} \qquad \text{for all } f \text{ in } U.
\]

Applying the Riesz Representation Theorem (Theorem 3.12), choose a member $e$ of $U$ such that $f(1) = (f, e)$ for all $f$ in $U$. By definition of $r(x)$ and $e$, we find that $u(x) = (r(x)u)(1) = (r(x)u, e)$ for all $u$ in $U$. Substitution and use once more of Proposition 7.22b gives
\[
0 = \sum_{x \in G} (r(x)u_j, u_i)\overline{(r(x)u, e)} = \frac{|G|\,(u_j, u)\overline{(u_i, e)}}{\dim U}
\]
for all $i$ and $j$. Since we can take $u = u_j = u_1$ and since $i$ is arbitrary, this equation forces $e = 0$. Then $u(x) = (r(x)u, e) = 0$ for all $u$ in $U$ and all $x$, so that $U = 0$, and we have a contradiction. We conclude that the sum of all the spaces $V_R$ is all of $C(G, \mathbb{C})$. Let us state the result as a theorem.

Theorem 7.23. For the finite group $G$, let $\{(R_\alpha, U_\alpha)\}$ be a complete set of inequivalent irreducible finite-dimensional representations of $G$, and let $V_{R_\alpha}$ be the linear span of the matrix coefficients of $R_\alpha$. Then
(a) the spaces $V_{R_\alpha}$ are mutually orthogonal and are invariant under the left and right regular representations $\ell$ and $r$,
(b) the representation $(r, V_{R_\alpha})$ is equivalent to the direct sum of $\dim U_\alpha$ copies of $(R_\alpha, U_\alpha)$,
(c) the direct sum of the spaces $V_{R_\alpha}$ is the space $C(G, \mathbb{C})$ of all complex-valued functions on $G$.
Moreover,
(d) the number of $R_\alpha$'s is finite,
(e) $\dim V_{R_\alpha} = (\dim U_\alpha)^2$,
(f) any irreducible subspace of $(r, C(G, \mathbb{C}))$ that is equivalent to $(R_\alpha, U_\alpha)$ is contained in $V_{R_\alpha}$.

Corollary 7.24. Let $\{(R_\alpha, U_\alpha)\}$ be a complete set of inequivalent irreducible finite-dimensional representations of the finite group $G$, and let $d_\alpha = \dim U_\alpha$. In $U_\alpha$, introduce an inner product making $(R_\alpha, U_\alpha)$ unitary. For each $\alpha$, let $v_1^{(\alpha)}, \dots, v_{d_\alpha}^{(\alpha)}$ be an orthonormal basis of $U_\alpha$. Then the functions in $C(G, \mathbb{C})$ given by $\sqrt{|G|^{-1} d_\alpha}\,\big(R_\alpha(x)v_j^{(\alpha)}, v_i^{(\alpha)}\big)$ form an orthonormal basis of $C(G, \mathbb{C})$. Consequently every $f$ in $C(G, \mathbb{C})$ satisfies
\[
f(x) = \frac{1}{|G|} \sum_{\alpha} \sum_{i,j} d_\alpha \Big( \sum_{y \in G} f(y)\overline{\big(R_\alpha(y)v_j^{(\alpha)}, v_i^{(\alpha)}\big)} \Big) \big(R_\alpha(x)v_j^{(\alpha)}, v_i^{(\alpha)}\big)
\]
and
\[
\sum_{x \in G} |f(x)|^2 = \frac{1}{|G|} \sum_{\alpha} \sum_{i,j} d_\alpha \Big| \sum_{y \in G} f(y)\overline{\big(R_\alpha(y)v_j^{(\alpha)}, v_i^{(\alpha)}\big)} \Big|^2.
\]

REMARKS. The first displayed formula is the Fourier inversion formula for an arbitrary finite group $G$ and generalizes Theorem 7.17, which gives the result in the abelian case; in the abelian case all the dimensions $d_\alpha$ equal 1, and the functions $\big(R_\alpha(x)v_j^{(\alpha)}, v_i^{(\alpha)}\big)$ are just the multiplicative characters of $G$. The second displayed formula is known as the Plancherel formula, a result incorporating the conclusion about norms in Parseval's equality (Theorem 3.11d).

PROOF. This follows from (a), (c), and (e) in Theorem 7.23, together with Theorem 3.11 and the remarks made before the statement of Theorem 7.23.

Corollary 7.25. Let $\{(R_\alpha, U_\alpha)\}$ be a complete set of inequivalent irreducible finite-dimensional representations of the finite group $G$, and let $d_\alpha = \dim U_\alpha$. Then $\sum_\alpha d_\alpha^2 = |G|$.

PROOF. This follows by counting the number of members listed in the orthonormal basis of $C(G, \mathbb{C})$ given in Corollary 7.24.

We shall make use of a second multiplication on the vector space $C(G, \mathbb{C})$ besides the pointwise multiplication that itself makes $C(G, \mathbb{C})$ into a ring. The new multiplication is called convolution and is defined by
\[
(f_1 * f_2)(x) = \sum_{y \in G} f_1(y) f_2(y^{-1}x) = \sum_{y \in G} f_1(xy^{-1}) f_2(y),
\]
the two expressions on the right being equal by a change of variables. The first of the expressions on the right equals the value of the function $\sum_{y \in G} f_1(y)\,\ell(y) f_2$ at $x$ and shows that the convolution is an average of the left translates of $f_2$ weighted by $f_1$. Convolution is associative because
\[
\begin{aligned}
(f_1 * (f_2 * f_3))(x) &= \sum_{y} f_1(y)(f_2 * f_3)(y^{-1}x) = \sum_{y,z} f_1(y) f_2(y^{-1}xz^{-1}) f_3(z) \\
&= \sum_{z} (f_1 * f_2)(xz^{-1}) f_3(z) = ((f_1 * f_2) * f_3)(x),
\end{aligned}
\]
and one readily checks that $C(G, \mathbb{C})$ becomes a ring when convolution is used as the multiplication.

For any finite-dimensional representation $(R, V)$ and any $v$ in $V$, let us define $R(f)v = \sum_{x \in G} f(x)R(x)v$. Convolution has the property that $R(f_1 * f_2) = R(f_1)R(f_2)$ because
\[
\begin{aligned}
R(f_1 * f_2)v &= \sum_{x} (f_1 * f_2)(x)R(x)v = \sum_{x,y} f_1(xy^{-1}) f_2(y)R(x)v \\
&= \sum_{x,y} f_1(x) f_2(y)R(xy)v = \sum_{x} f_1(x)R(x) \sum_{y} f_2(y)R(y)v \\
&= \sum_{x} f_1(x)R(x)R(f_2)v = R(f_1)R(f_2)v.
\end{aligned}
\]
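The two identities just proved, associativity of convolution and $R(f_1 * f_2) = R(f_1)R(f_2)$, can be tested exactly with integer arithmetic. The sketch below assumes $G = S_3$ realized as permutation tuples and takes $R$ to be the 3-dimensional permutation representation; the functions `f1`, `f2`, `f3` are arbitrary hypothetical test data, and all names are illustrative.

```python
from itertools import permutations

G = list(permutations(range(3)))   # S_3; p[i] is the image of i under p

def mul(p, q):
    """Composition p after q."""
    return tuple(p[q[i]] for i in range(len(q)))

def inv(p):
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

def convolve(f1, f2):
    """(f1 * f2)(x) = sum_y f1(y) f2(y^{-1} x); functions are dicts keyed by G."""
    return {x: sum(f1[y] * f2[mul(inv(y), x)] for y in G) for x in G}

def perm_matrix(p):
    """Matrix sending the basis vector e_j to e_{p(j)}."""
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def R_of(f):
    """R(f) = sum_x f(x) R(x) for the permutation representation R."""
    total = [[0] * 3 for _ in range(3)]
    for x in G:
        M = perm_matrix(x)
        total = [[total[i][j] + f[x] * M[i][j] for j in range(3)]
                 for i in range(3)]
    return total

f1 = {x: k + 1 for k, x in enumerate(G)}
f2 = {x: (k * k) % 5 - 2 for k, x in enumerate(G)}
f3 = {x: 2 * k - 3 for k, x in enumerate(G)}

print(convolve(f1, convolve(f2, f3)) == convolve(convolve(f1, f2), f3))  # True
print(R_of(convolve(f1, f2)) == matmul(R_of(f1), R_of(f2)))              # True
```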

We shall combine the notion of convolution with the notion of a "character." If $(R, V)$ is a finite-dimensional representation of $G$, then the character of $(R, V)$ is the function $\chi_R$ given by $\chi_R(x) = \operatorname{Tr} R(x)$, with $\operatorname{Tr}$ denoting the trace. Equivalent representations have the same character since $\operatorname{Tr}(A R(x)A^{-1}) = \operatorname{Tr} R(x)$ if $A$ is invertible. Characters have the additional properties that
(i) $\chi_R(gxg^{-1}) = \chi_R(x)$ because $\operatorname{Tr} R(gxg^{-1}) = \operatorname{Tr}(R(g)R(x)R(g)^{-1}) = \operatorname{Tr} R(x)$,
(ii) $\chi_{R_1 \oplus \cdots \oplus R_n} = \chi_{R_1} + \cdots + \chi_{R_n}$ since the trace of a block-diagonal matrix is the sum of the traces of the blocks.

The character of a 1-dimensional representation is the associated multiplicative character. Here is an example of a character for a representation on a space of dimension more than 1; its values are not all in $S^1$.

EXAMPLE. The dihedral group $D_n$ with $2n$ elements, defined in Section IV.1, is isomorphic to the matrix group generated by
\[
x = \begin{pmatrix} \cos 2\pi/n & -\sin 2\pi/n \\ \sin 2\pi/n & \cos 2\pi/n \end{pmatrix} \qquad \text{and} \qquad y = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]
The map carrying each matrix of the group to itself is a representation of $D_n$ on $\mathbb{C}^2$. The value of the character of this representation is $2\cos 2\pi k/n$ on $x^k$ for $0 \le k \le n - 1$, and the value of the character is 0 on $y$ and on the remaining $n - 1$ elements of the group.

Computations with characters are sometimes aided by the use of inner products. If an inner product is imposed on a finite-dimensional complex vector space $V$ and if $\{v_i\}$ is an orthonormal basis, then the trace of a linear $A : V \to V$ is given by $\operatorname{Tr} A = \sum_i (Av_i, v_i)$. If $R$ is a representation on $V$, we consequently have $\chi_R(x) = \sum_i (R(x)v_i, v_i)$.

Proposition 7.26. Let $R$, $R_1$, and $R_2$ be irreducible finite-dimensional representations of a finite group $G$. Then their characters satisfy
(a) $\sum_{x \in G} |\chi_R(x)|^2 = |G|$,
(b) $\sum_{x \in G} \chi_{R_1}(x)\overline{\chi_{R_2}(x)} = 0$ if $R_1$ and $R_2$ are inequivalent.

PROOF. These follow from Schur orthogonality (Proposition 7.22): For (a), let $R$ act on the vector space $V$, let $d = \dim V$, introduce an inner product with respect to which $R$ is unitary, and let $\{v_i\}$ be an orthonormal basis of $V$. Then Proposition 7.22b gives
\[
\begin{aligned}
\sum_{x} |\chi_R(x)|^2 &= \sum_{x} \Big( \sum_{i} (R(x)v_i, v_i) \Big) \overline{\Big( \sum_{j} (R(x)v_j, v_j) \Big)} \\
&= \sum_{i,j} \sum_{x} (R(x)v_i, v_i)\overline{(R(x)v_j, v_j)} \\
&= \sum_{i,j} |G|\, d^{-1} \delta_{ij}\delta_{ij} = \sum_{i} |G|\, d^{-1} = |G|.
\end{aligned}
\]
Part (b) is proved in the same fashion, using Proposition 7.22a.
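The character of the dihedral example can be computed directly from the generating matrices, and Proposition 7.26a then gives a necessary condition for irreducibility: $\sum_x |\chi(x)|^2$ must equal $|G| = 2n$. The sketch below is illustrative only; the helper names are assumptions, and the matrices are the real ones from the example, so no complex conjugation is needed.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dihedral_matrices(n):
    """All 2n matrices x^k and x^k y for the generators x, y of the example."""
    c, s = math.cos(2 * math.pi / n), math.sin(2 * math.pi / n)
    x = [[c, -s], [s, c]]
    y = [[1.0, 0.0], [0.0, -1.0]]
    powers = [[[1.0, 0.0], [0.0, 1.0]]]
    for _ in range(n - 1):
        powers.append(matmul(powers[-1], x))
    return powers + [matmul(p, y) for p in powers]

def character(mats):
    """Trace of each matrix: 2cos(2*pi*k/n) on x^k and 0 on each x^k y."""
    return [M[0][0] + M[1][1] for M in mats]

def norm_squared(n):
    """Sum over the group of |chi(x)|^2 for the 2-dimensional representation."""
    return sum(t * t for t in character(dihedral_matrices(n)))

for n in (2, 3, 4, 5):
    # Compare the sum with |G| = 2n.
    print(n, 2 * n, norm_squared(n))
```

For $n \ge 3$ the sum agrees with $2n$ up to rounding, as Proposition 7.26a requires of an irreducible representation. For $n = 2$ the sum is 8 rather than 4, consistent with the fact that the two generating matrices are then diagonal, so that the representation splits into two 1-dimensional pieces.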

Let us now bring together the notions of convolution and character. A class function on $G$ is a function $f$ in $C(G, \mathbb{C})$ with $f(gxg^{-1}) = f(x)$ for all $g$ and $x$ in $G$. That is, class functions are the ones that are constant on each conjugacy class of the group. Every character is an example of a class function. The class functions form a vector subspace of $C(G, \mathbb{C})$, and the dimension of this vector subspace equals the number of conjugacy classes in $G$. Class functions are closed under convolution because if $f_1$ and $f_2$ are class functions, then
\[
(f_1 * f_2)(gxg^{-1}) = \sum_{y} f_1(gxg^{-1}y^{-1}) f_2(y) = \sum_{y} f_1(xg^{-1}y^{-1}g) f_2(g^{-1}yg) = \sum_{z} f_1(xz^{-1}) f_2(z) = (f_1 * f_2)(x).
\]
On an abelian group every member of $C(G, \mathbb{C})$ is a class function.

Theorem 7.27 (Fourier inversion formula for class functions). For the finite group $G$, let $\{(R_\alpha, U_\alpha)\}$ be a complete set of inequivalent irreducible finite-dimensional representations of $G$. If $f$ is a class function on $G$, then
\[
f(x) = \frac{1}{|G|} \sum_{\alpha} \Big( \sum_{y \in G} f(y)\overline{\chi_{R_\alpha}(y)} \Big) \chi_{R_\alpha}(x).
\]

REMARK. This result may be regarded as a second way (besides the one in Corollary 7.24) of generalizing Theorem 7.17 to the nonabelian case.

PROOF. Using the result and notation of Corollary 7.24, we have
\[
f(x) = |G|^{-1} \sum_{\alpha} d_\alpha \sum_{i,j} \Big( \sum_{y \in G} f(y)\overline{\big(R_\alpha(y)v_i^{(\alpha)}, v_j^{(\alpha)}\big)} \Big) \big(R_\alpha(x)v_i^{(\alpha)}, v_j^{(\alpha)}\big).
\]

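Replace $f(y)$ by $f(gyg^{-1})$ since $f$ is a class function, and then change variables and sum over $g$ in $G$ to see that $|G| f(x)$ is equal to
\[
|G|^{-1} \sum_{\alpha} d_\alpha \sum_{i,j} \Big( \sum_{g,y} f(y)\overline{\big(R_\alpha(y)R_\alpha(g)v_i^{(\alpha)}, R_\alpha(g)v_j^{(\alpha)}\big)} \Big) \big(R_\alpha(x)v_i^{(\alpha)}, v_j^{(\alpha)}\big).
\]
Within this expression we have
\[
\begin{aligned}
\sum_{g} \big(R_\alpha(y)R_\alpha(g)v_i^{(\alpha)}, R_\alpha(g)v_j^{(\alpha)}\big)
&= \sum_{g,k} \Big(R_\alpha(y)\big(R_\alpha(g)v_i^{(\alpha)}, v_k^{(\alpha)}\big)v_k^{(\alpha)}, R_\alpha(g)v_j^{(\alpha)}\Big) \\
&= \sum_{g,k} \big(R_\alpha(g)v_i^{(\alpha)}, v_k^{(\alpha)}\big)\overline{\big(R_\alpha(g)v_j^{(\alpha)}, R_\alpha(y)v_k^{(\alpha)}\big)} \\
&= \frac{|G|}{d_\alpha} \sum_{k} \big(v_i^{(\alpha)}, v_j^{(\alpha)}\big)\big(R_\alpha(y)v_k^{(\alpha)}, v_k^{(\alpha)}\big) \qquad \text{by Schur orthogonality} \\
&= \frac{|G|}{d_\alpha} \big(v_i^{(\alpha)}, v_j^{(\alpha)}\big)\chi_{R_\alpha}(y) \\
&= \frac{|G|}{d_\alpha}\, \delta_{ij}\, \chi_{R_\alpha}(y).
\end{aligned}
\]
Substituting, we obtain the formula of the theorem.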
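Theorem 7.27 can be illustrated on the symmetric group $S_3$. The sketch below assumes the standard character values of $S_3$ on its three conjugacy classes (identity, transpositions, 3-cycles), which are stated here as known data rather than derived: the trivial character, the sign character, and the character of the 2-dimensional representation. A class function is stored as its list of values on the three classes, and sums over $G$ are grouped by class size. All names are illustrative.

```python
from fractions import Fraction

# Assumed data for S_3: sizes of the classes of (1), (1 2), (1 2 3).
CLASS_SIZES = [1, 3, 2]
CHARACTERS = {
    "trivial": [1, 1, 1],
    "sgn": [1, -1, 1],
    "two_dim": [2, 0, -1],
}
ORDER = sum(CLASS_SIZES)   # |G| = 6

def coefficient(f, chi):
    """sum over y in G of f(y) * conj(chi(y)); these characters are real-valued."""
    return sum(size * fv * cv for size, fv, cv in zip(CLASS_SIZES, f, chi))

def invert(f):
    """Rebuild the class function f from its character coefficients."""
    return [sum(Fraction(coefficient(f, chi), ORDER) * chi[c]
                for chi in CHARACTERS.values())
            for c in range(len(CLASS_SIZES))]

f = [7, -2, 5]             # an arbitrary class function (hypothetical values)
print(invert(f) == f)      # True
print(sum(chi[0] ** 2 for chi in CHARACTERS.values()) == ORDER)  # True
```

The second check is the statement of Corollary 7.25 for this group, since the value of a character at the identity is the dimension of the representation.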

Corollary 7.28. If $G$ is a finite group, then the number of irreducible finite-dimensional representations of $G$, up to equivalence, equals the number of conjugacy classes of $G$.

PROOF. Theorem 7.27 shows that the irreducible characters span the vector space of class functions. Proposition 7.26b shows that the irreducible characters are orthogonal and hence are linearly independent. Thus the number of irreducible characters equals the dimension of the space of class functions, which equals the number of conjugacy classes.

EXAMPLE. The above information already gives us considerable control over finding a complete set of inequivalent irreducible finite-dimensional representations of elementary groups. We know that the number of such representations equals the number of conjugacy classes and that the sum of the squares of their dimensions equals $|G|$. For the symmetric group $S_3$ of order 6, for example, the conjugacy classes are given by the cycle structures of the possible permutations, namely the cycle structures of $(1)$, $(1\ 2)$, and $(1\ 2\ 3)$. Hence there are three inequivalent irreducible representations. The sum of the squares of the three dimensions is to be 6; thus we have two of dimension 1 and one of dimension 2. The multiplicative characters 1 and $\operatorname{sgn}$ are the two of dimension 1, and the one of dimension 2 can be taken to be the 2-dimensional representation of $D_3$ whose character was computed in the example preceding Proposition 7.26.

One final constraint on the dimensions of the irreducible representations of a finite group $G$ is as follows.

Proposition 7.29. If $G$ is a finite group and $(R, V)$ is an irreducible finite-dimensional representation of $G$, then $\dim V$ divides $|G|$.

For example, if $|G| = p^2$ with $p$ prime, then it follows from Proposition 7.29 and Corollary 7.25 that every irreducible finite-dimensional representation of $G$ has dimension 1: each dimension divides $p^2$, the squares of the dimensions sum to $p^2$, and the trivial representation already contributes 1 to this sum, so that no dimension can be $p$ or $p^2$. One can easily conclude from this fact that $G$ is abelian. (See Problem 14 at the end of the chapter.)
Thus we recover as an immediate consequence the conclusion of Corollary 4.39 that groups of order $p^2$ are abelian.

The proof of Proposition 7.29 is surprisingly subtle. We shall obtain the proposition as a consequence of Theorem 7.31 below, a theorem that will be used also in the proof of Burnside's Theorem in the next section. Theorem 7.31 gives a little taste of the usefulness of algebraic number theory, and we shall see more of this usefulness in Chapter IX. The application to Burnside's Theorem will use the Fundamental Theorem of Galois Theory, whose proof is deferred to Chapter IX.

An algebraic integer is any complex number that is a root of a monic polynomial with coefficients in $\mathbb{Z}$. For example, $\sqrt{2}$ and $\frac{1}{2}(1 + i\sqrt{3})$ are algebraic integers because they are roots of $X^2 - 2$ and $X^2 - X + 1$, respectively. Any root of unity is an algebraic integer, being a root of some polynomial $X^n - 1$. The set of algebraic integers will be denoted in this chapter by $\mathcal{O}$. Before stating Theorem 7.31, let us establish two elementary facts about $\mathcal{O}$.

Lemma 7.30. The set $\mathcal{O}$ of algebraic integers is a ring, and $\mathcal{O} \cap \mathbb{Q} = \mathbb{Z}$.

PROOF. Suppose that $x$ and $y$ are complex numbers satisfying the polynomial equations $x^m + a_{m-1}x^{m-1} + \cdots + a_1 x + a_0 = 0$ and $y^n + b_{n-1}y^{n-1} + \cdots + b_1 y + b_0 = 0$, each with integer coefficients. Form the subset of $\mathbb{C}$ given by
\[
M = \sum_{k=0}^{m-1} \sum_{l=0}^{n-1} \mathbb{Z}\, x^k y^l.
\]
This is a finitely generated subgroup of the abelian group $\mathbb{C}$ under addition. It satisfies
\[
xM = \sum_{k=1}^{m} \sum_{l=0}^{n-1} \mathbb{Z}\, x^k y^l \subseteq M + \sum_{l=0}^{n-1} \mathbb{Z}\, y^l x^m = M + \sum_{l=0}^{n-1} \mathbb{Z}\, y^l (-a_{m-1}x^{m-1} - \cdots - a_1 x - a_0) \subseteq M,
\]
and similarly $yM \subseteq M$. Hence $(x \pm y)M \subseteq M$ and $(xy)M \subseteq M$.

To prove that $\mathcal{O}$ is a ring, it is enough to show that if $N$ is a nonzero finitely generated subgroup of the abelian group $\mathbb{C}$ under addition and if $z$ is a complex number with $zN \subseteq N$, then $z$ is an algebraic integer. By Theorem 4.56, $N$ is a direct sum of cyclic groups. Since every nonzero member of $\mathbb{C}$ has infinite order additively, these cyclic groups must be copies of $\mathbb{Z}$. So $N$ is free abelian. Let $z_1, \dots, z_n$ be a $\mathbb{Z}$ basis of $N$. Here $n > 0$. Since $zN \subseteq N$, we can find unique integers $c_{ij}$ such that
\[
z z_i = \sum_{j=1}^{n} c_{ij} z_j \qquad \text{for } 1 \le i \le n.
\]
This equation says that the matrix $C = [c_{ij}]$ has $\begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}$ as an eigenvector with eigenvalue $z$. Therefore the matrix $zI - C$ is singular, and $\det(zI - C) = 0$. Since $\det(zI - C)$ is a monic polynomial expression in $z$ with integer coefficients, $z$ is an algebraic integer.

To see that $\mathcal{O} \cap \mathbb{Q} = \mathbb{Z}$, let $p$ and $q$ be relatively prime integers with $q > 0$, and suppose that $p/q$ is a root of $X^n + a_{n-1}X^{n-1} + \cdots + a_1 X + a_0$ with $a_{n-1}, \dots, a_0$ in $\mathbb{Z}$. Substituting $p/q$ for $X$, setting the expression equal to 0, and clearing fractions, we obtain $p^n + a_{n-1}p^{n-1}q + \cdots + a_1 pq^{n-1} + a_0 q^n = 0$. Since $q$ divides every term here after the first, we conclude that $q$ divides $p^n$. Since $\mathrm{GCD}(p, q) = 1$, we conclude that $q = 1$. Thus $p/q$ is in $\mathbb{Z}$.
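The determinant argument in the proof of Lemma 7.30 can be carried out for a specific sum. The sketch below takes $x = \sqrt{2}$ and $y = \sqrt{3}$, so that $M$ has the $\mathbb{Z}$ basis $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$, and writes down by hand the integer matrix `C` of multiplication by $z = \sqrt{2} + \sqrt{3}$ in that basis. The characteristic polynomial is computed with the Faddeev-LeVerrier recursion, a standard method that is not part of the text; the function names are illustrative.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def char_poly(C):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(X*I - C) via Faddeev-LeVerrier."""
    n = len(C)
    M = [[0] * n for _ in range(n)]
    coeffs = [1]
    for k in range(1, n + 1):
        M = matmul(C, M)
        for i in range(n):
            M[i][i] += coeffs[-1]
        trace = sum(matmul(C, M)[i][i] for i in range(n))
        coeffs.append(-trace // k)   # the division is exact for integer matrices
    return coeffs

# Row i lists the coordinates of z * z_i in the basis (1, sqrt2, sqrt3, sqrt6),
# where z = sqrt2 + sqrt3; for instance z * sqrt2 = 2 + sqrt6.
C = [[0, 1, 1, 0],
     [2, 0, 0, 1],
     [3, 0, 0, 1],
     [0, 3, 2, 0]]

coeffs = char_poly(C)
print(coeffs)                        # [1, 0, -10, 0, 1]

z = math.sqrt(2) + math.sqrt(3)
value = sum(c * z ** (len(coeffs) - 1 - i) for i, c in enumerate(coeffs))
print(abs(value) < 1e-9)             # True: z is a root of X^4 - 10X^2 + 1
```

The monic integer polynomial $X^4 - 10X^2 + 1$ produced here exhibits $\sqrt{2} + \sqrt{3}$ as an algebraic integer, exactly as the proof predicts.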

Lemma 7.30 allows us to see that if $G$ is a finite group and $\chi$ is the irreducible character corresponding to an irreducible finite-dimensional representation $R$, then $\chi(x)$ is an algebraic integer for each $x$ in $G$. In fact, the subgroup $H$ of $G$ generated by $x$ is cyclic and is in particular abelian. Corollary 7.21 says that $R|_H$ is the direct sum of irreducible representations of $H$, and Corollary 7.19 says that each such irreducible representation is 1-dimensional. Thus in a suitable basis, $R|_H$ is diagonal. The diagonal entries must be roots of unity (in fact, $N$th roots of unity if $x$ has order $N$), and $\chi(x)$ is thus a sum of roots of unity. By Lemma 7.30, $\chi(x)$ is an algebraic integer.

Theorem 7.31. Let $G$ be a finite group, $(R, V)$ be an irreducible finite-dimensional representation of $G$, $\chi$ be the character of $R$, and $C$ be a conjugacy class in $G$. Denote by $\chi(C)$ the constant value of $\chi$ on the conjugacy class $C$. Then $|C|\chi(C)/\dim V$ is an algebraic integer.

PROOF. If $f$ is any class function on $G$, then $R(f)$ commutes with each $R(x)$ for $x$ in $G$ because $R(f) = \sum_y f(y)R(y)$ yields
\[
R(x)R(f)R(x)^{-1} = \sum_{y} f(y)R(x)R(y)R(x)^{-1} = \sum_{y} f(y)R(xyx^{-1}) = \sum_{z} f(x^{-1}zx)R(z) = \sum_{z} f(z)R(z) = R(f).
\]
By Schur's Lemma (Proposition 7.18), $R(f)$ is scalar. If $C$ is a conjugacy class, then the function $I_C$ that is 1 on $C$ and is 0 elsewhere is a class function, and hence $R(I_C)$ is a scalar $\lambda_C$. As $C$ varies, the functions $I_C$ form a vector-space basis of the space of class functions. The formula $(I_C * I_{C'})(x) = \sum_y I_C(y)I_{C'}(y^{-1}x)$ shows that $I_C * I_{C'}$ is integer-valued, and we have seen that the convolution of two class functions is a class function. Therefore $I_C * I_{C'} = \sum_{C''} n_{CC'C''} I_{C''}$ for suitable integers $n_{CC'C''}$. Application of $R$ gives $\lambda_C \lambda_{C'} = \sum_{C''} n_{CC'C''} \lambda_{C''}$. If we fix $C$ and let $A$ be the square matrix with entries $A_{C'C''} = n_{CC'C''}$, we obtain
\[
\lambda_C \lambda_{C'} = \sum_{C''} A_{C'C''} \lambda_{C''}.
\]
This equation says that the matrix $A$ has the column vector with entries $\lambda_{C''}$ as an eigenvector with eigenvalue $\lambda_C$; the vector is nonzero because its entry for the conjugacy class $\{1\}$ is 1. Therefore the matrix $\lambda_C I - A$ is singular, and $\det(\lambda_C I - A) = 0$. Since $\det(\lambda_C I - A)$ is a monic polynomial expression in $\lambda_C$ with integer coefficients, $\lambda_C$ is an algebraic integer. Taking the trace of the equation $R(I_C) = \lambda_C I$, we obtain $\sum_{x \in C} \chi(x) = \lambda_C \dim V$. Since $\chi(x) = \chi(C)$ for $x$ in $C$, the result is that $|C|\chi(C)/\dim V = \lambda_C$. Since $\lambda_C$ is an algebraic integer, $|C|\chi(C)/\dim V$ is an algebraic integer.
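For a small illustration of Theorem 7.31, the sketch below computes the scalars $\lambda_C = |C|\chi(C)/\dim V$ for $S_3$, again assuming its standard class sizes and character values as known data. In this case each $\lambda_C$ is a rational number, so by Lemma 7.30 it must be an ordinary integer. The code also checks the relation $\lambda_T^2 = 3\lambda_E + 3\lambda_Z$ that comes from the hand-computed convolution identity $I_T * I_T = 3I_E + 3I_Z$, where $E$, $T$, $Z$ are hypothetical labels for the classes of the identity, the transpositions, and the 3-cycles.

```python
from fractions import Fraction

# Assumed data for S_3: classes of (1), (1 2), (1 2 3) and their sizes.
CLASS_SIZES = [1, 3, 2]
CHARACTERS = {
    "trivial": [1, 1, 1],
    "sgn": [1, -1, 1],
    "two_dim": [2, 0, -1],
}

def lambdas(chi):
    """The scalars |C| * chi(C) / dim V, one for each conjugacy class C."""
    dim = chi[0]   # the character value at the identity is dim V
    return [Fraction(size * value, dim)
            for size, value in zip(CLASS_SIZES, chi)]

for name, chi in CHARACTERS.items():
    lam = lambdas(chi)
    # Each scalar should be an integer, and the structure relation should hold.
    print(name, [int(t) for t in lam],
          all(t.denominator == 1 for t in lam),
          lam[1] ** 2 == 3 * lam[0] + 3 * lam[2])
```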

VII. Advanced Group Theory

342

PROOF THAT THEOREM 7.31 IMPLIES PROPOSITION 7.29. Proposition 7.26a gives
\[
\frac{|G|}{\dim V} = \sum_{x \in G} \frac{|\chi(x)|^2}{\dim V} = \sum_C \sum_{x \in C} \frac{|\chi(x)|^2}{\dim V} = \sum_C \Big(\frac{|C|\chi(C)}{\dim V}\Big)\overline{\chi(C)}.
\]
Each term in parentheses on the right side is an algebraic integer, according to Theorem 7.31, and therefore Lemma 7.30 shows that $|G|/\dim V$ is an algebraic integer. Since $|G|/\dim V$ is in $\mathbb{Q}$, Lemma 7.30 shows that $|G|/\dim V$ is in $\mathbb{Z}$.

5. Burnside's Theorem

The theorem of this section is as follows.

Theorem 7.32 (Burnside's Theorem). If $G$ is a finite group of order $p^a q^b$ with $p$ and $q$ prime and with $a + b > 1$, then $G$ has a nontrivial normal subgroup.

The argument will use the result Theorem 7.31 from algebraic number theory, and also it will make use of a special case of the Fundamental Theorem of Galois Theory, whose proof is deferred to Chapter IX. That special case is the following statement, whose context was anticipated in Section IV.1, where groups of automorphisms of certain fields were discussed briefly. Since the set $\{1, e^{2\pi i/n}, e^{2\cdot 2\pi i/n}, e^{3\cdot 2\pi i/n}, \dots\}$ is linearly dependent over $\mathbb{Q}$, Proposition 4.1 in that section implies that the subring $\mathbb{Q}[e^{2\pi i/n}]$ of $\mathbb{C}$ generated by $\mathbb{Q}$ and $e^{2\pi i/n}$ is a subfield and is a finite-dimensional vector space over $\mathbb{Q}$. According to Example 9 of that section, the group $\Gamma = \mathrm{Gal}(\mathbb{Q}[e^{2\pi i/n}]/\mathbb{Q})$ of automorphisms of $\mathbb{Q}[e^{2\pi i/n}]$ fixing every element of $\mathbb{Q}$ is a finite group.

Proposition 7.33 (special case of the Fundamental Theorem of Galois Theory). Let $n > 0$ be an integer, and put $K = \mathbb{Q}[e^{2\pi i/n}]$. Let $\Gamma$ be the finite group of field automorphisms of $K$ fixing every element of $\mathbb{Q}$. Then the only members $\beta$ of $K$ such that $\sigma(\beta) = \beta$ for every $\sigma$ in $\Gamma$ are the members of $\mathbb{Q}$.

Lemma 7.34. Let $G$ be a finite group, $(R, V)$ be an irreducible finite-dimensional representation of $G$, $\chi$ be the character of $R$, and $C$ be a conjugacy class in $G$. If $\mathrm{GCD}(|C|, \dim V) = 1$ and if $x$ is in $C$, then either $R(x)$ is scalar or $\chi(x) = 0$.

PROOF. Define $\chi(C)$ to be the constant value of $\chi$ on $C$, and put $\alpha = \chi(x)/\dim V = \chi(C)/\dim V$. Since $\mathrm{GCD}(|C|, \dim V) = 1$, we can choose integers $m$ and $n$ with $m|C| + n \dim V = 1$. Multiplication by $\alpha$ yields
\[
m\,\frac{|C|\chi(C)}{\dim V} + n\chi(C) = \alpha.
\]

Theorem 7.31 shows that the coefficients $|C|\chi(C)/\dim V$ and $\chi(C)$ of $m$ and $n$ on the left side are algebraic integers, and therefore $\alpha$ is an algebraic integer. As we observed toward the end of the previous section, $\chi(x) = \chi(C)$ is the sum of $\dim V$ roots of unity. Since $\alpha = \chi(C)/\dim V$, we see that $|\alpha| \le 1$ with equality only if all the roots of unity are equal, in which case $R(x)$ is scalar. In view of the hypothesis, we may assume that $|\alpha| < 1$. We shall show that $\alpha = 0$.

Let $K = \mathbb{Q}[e^{2\pi i/|G|}]$ be the smallest subfield of $\mathbb{C}$ containing $\mathbb{Q}$ and the complex number $e^{2\pi i/|G|}$, and let $\Gamma$ be the group of field automorphisms of $K$ that fix every element of $\mathbb{Q}$. We know that $K$ is finite-dimensional over $\mathbb{Q}$ and that $\Gamma$ is a finite group, and Proposition 7.33 shows that the only members of $K$ fixed by every element of $\Gamma$ are the members of $\mathbb{Q}$.

Our element $x$ of $G$ has $x^{|G|} = 1$. Thus every root of unity contributing to $\chi(x)$ is a $|G|$th root of unity and is in $K$. Therefore the algebraic integer $\alpha$ is in $K$. If $\sigma$ is in $\Gamma$, each of the $|G|$th roots of unity is mapped by $\sigma$ to some complex number $z$ satisfying $z^{|G|} = 1$, and hence the member $\sigma(\alpha)$ of $K$ satisfies $|\sigma(\alpha)| \le 1$. Also, $\sigma(\alpha)$ is an algebraic integer, as we see by applying $\sigma$ to the monic equation with integer coefficients satisfied by $\alpha$, and we are assuming that $|\alpha| < 1$. Consequently $\beta = \prod_{\sigma \in \Gamma} \sigma(\alpha)$ is an algebraic integer and has absolute value $< 1$. A change of variables in the product shows that $\beta$ is fixed by every member of $\Gamma$, and we see from the previous paragraph that $\beta$ is in $\mathbb{Q}$. By Lemma 7.30, $\beta$ is in $\mathbb{Z}$. Being of absolute value less than 1, it is 0. Since $\beta = 0$, some factor $\sigma(\alpha)$ is 0. Thus $\alpha = 0$, and $\chi(x) = 0$.
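The dichotomy in Lemma 7.34 can be observed on a small example. The sketch below uses the character table of the alternating group $A_4$, which is a standard table assumed here rather than derived in the text, and the helper name `lemma_7_34_cases` is hypothetical. It relies on the fact noted in the proof that $R(x)$ is scalar exactly when $|\chi(x)| = \dim V$.

```python
import cmath
from math import gcd

# Character table of A4 (assumed standard table, not from the text).
# Classes: identity, double transpositions, and the two classes of 3-cycles.
omega = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity
class_sizes = [1, 3, 4, 4]
characters = [
    [1, 1, 1, 1],
    [1, 1, omega, omega**2],
    [1, 1, omega**2, omega],
    [3, -1, 0, 0],
]

def lemma_7_34_cases(tol=1e-9):
    """For each (chi, C) with GCD(|C|, dim V) = 1, record whether R(x) is
    scalar (|chi(x)| = dim V) and whether chi(x) = 0."""
    results = []
    for chi in characters:
        dim = chi[0]
        for size, value in zip(class_sizes, chi):
            if gcd(size, dim) != 1:
                continue  # the lemma makes no claim for this pair
            is_scalar = abs(abs(value) - dim) < tol
            is_zero = abs(value) < tol
            results.append((dim, size, is_scalar, is_zero))
    return results

for row in lemma_7_34_cases():
    print(row)
```

For the 3-dimensional character, the classes of size 4 are coprime to the dimension and the character vanishes on them. The class of size 3 is skipped because the hypothesis fails, and there the value $-1$ is neither zero nor of absolute value 3.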

Lemma 7.35. Let $G$ be a finite group, and let $C$ be a conjugacy class in $G$ such that $|C| = p^k$ for some prime $p$ and some integer $k > 0$. Then there exists an irreducible finite-dimensional representation $R \ne 1$ of $G$ with $R(x)$ scalar for every $x$ in $C$. Consequently $G$ is not simple.

PROOF. The conjugacy class $C$ cannot be $\{1\}$ because $|\{1\}| \ne p^k$ with $k > 0$. Let $\chi_{\mathrm{reg}}$ be the character of the right regular representation $r$ of $G$ on $C(G, \mathbb{C})$. If $I_g$ denotes the function that is 1 at $g$ and is 0 elsewhere, then the functions $I_g$ form an orthonormal basis of $C(G, \mathbb{C})$, and therefore $\chi_{\mathrm{reg}}(x) = \sum_{g \in G} (r(x)I_g, I_g) = \sum_{g \in G} (I_{gx^{-1}}, I_g)$. Every term on the right side is 0 if $x \ne 1$, and thus Theorem 7.23 gives
\[
0 = \chi_{\mathrm{reg}}(x) = 1 + \sum_{\chi \ne 1} d_\chi \chi(x) \qquad \text{for } x \in C, \tag{$*$}
\]

the sum being taken over all irreducible characters other than 1, with dχ being the dimension of an irreducible representation corresponding to χ. Let Rχ be an irreducible representation with character χ. Any χ such that p does not divide dχ has GCD(|C|, dχ ) = 1 since |C| is assumed to be a power of p. Arguing by

contradiction, we may assume that no such $\chi$ has $R_\chi(x)$ scalar, and then Lemma 7.34 says that $\chi(x) = 0$ for all such $\chi$. Hence $(*)$ simplifies to
\[
0 = 1 + \sum_{\chi \ne 1,\ p \text{ divides } d_\chi} d_\chi \chi(x) \qquad \text{for } x \in C. \tag{$**$}
\]

Since $\chi(x)$ is an algebraic integer, Lemma 7.30 shows that this equation is of the form $1 + p\beta = 0$, where $\beta$ is an algebraic integer. Then $\beta = -1/p$ shows that $-1/p$ is an algebraic integer. Since $-1/p$ is in $\mathbb{Q}$, Lemma 7.30 shows that it must be in $\mathbb{Z}$, and we have arrived at a contradiction. Thus there must have been some $\chi$ with $R_\chi(x)$ scalar for $x$ in $C$.

The set of $g$ in $G$ for which this $R_\chi$ has $R_\chi(g)$ scalar is a normal subgroup of $G$ that contains $x$ and cannot therefore be $\{1\}$. Assume by way of contradiction that $G$ is simple. Then $R_\chi(g)$ is scalar for all $g$ in $G$. Since $R_\chi$ is irreducible, $R_\chi$ is 1-dimensional. Then the commutator subgroup $G'$ of $G$ is contained in the kernel of $R_\chi$. Since $R_\chi \ne 1$, $G'$ is not all of $G$. Since $G'$ is normal, $G' = \{1\}$, and we conclude that $G$ is abelian. But the given $G$ has a conjugacy class with more than one element, and we have arrived at a contradiction.

PROOF OF THEOREM 7.32. Corollary 4.38 shows that a group of prime-power order has a center different from $\{1\}$, and we may therefore assume that $p \ne q$, $a > 0$, and $b > 0$. Let $H$ be a Sylow $q$-subgroup. Applying Corollary 4.38, let $x$ be a member of the center $Z_H$ of $H$ other than 1. The centralizer $Z_G(\{x\})$ is a subgroup containing $H$, and it therefore has order $p^{a'} q^b$ for some $a' \le a$. If $a' = a$, then $x$ is in the center of $G$, and the powers of $x$ form the desired proper normal subgroup of $G$. Thus we may assume that $a' < a$. By Proposition 4.37 the conjugacy class $C$ of $x$ has $|G|/p^{a'} q^b = p^{a - a'}$ elements with $a - a' > 0$. By Lemma 7.35, $G$ is not simple.

6. Extensions of Groups

In Section IV.8 we examined composition series for finite groups. For a given finite group, a composition series consists of a decreasing sequence of subgroups starting with the whole group and ending with $\{1\}$, each normal in the next larger one, such that the successive quotient groups are simple. The Jordan–Hölder Theorem (Corollary 4.50) assured us that the set of successive quotients, up to isomorphism, is independent of the choice of composition series.
This theorem raises the question of reconstructing the whole group from data of this kind. Consider a single step of the process. If we know the normal subgroup and the simple quotient that it yields at a certain stage, what are the possibilities for the next-larger subgroup? We study this question and some of its ramiﬁcations in this section, dropping any hypotheses that are not helpful in the analysis. Here is an example that we shall carry along.

EXAMPLE. Suppose that the normal subgroup is the cyclic group $C_4$ and that the quotient is the cyclic group $C_2$. The whole group has to be of order 8, and the classification of groups of order 8 done in Problems 39–44 at the end of Chapter IV tells us that there are four different possibilities for the whole group: the abelian groups $C_4 \times C_2$ and $C_8$, the dihedral group $D_4$, and the quaternion group $H_8$.

Let us establish a framework for the general problem. We start with a group $E$, a normal subgroup $N$, and the quotient $G = E/N$. We seek data that determine the group law in $E$ in terms of $N$ and $G$. For each member $u$ of $G$, fix a coset representative $\bar u$ in $E$ such that $\bar u N = u$. Since $N$ is normal, the element $\bar u$ of $E$ yields an automorphism $(\,\cdot\,)^u$ of $N$ defined by $x^u = \bar u x \bar u^{-1}$. In addition, the fact that $G$ is a group says that any two of our representatives $\bar u$ and $\bar v$ have
\[
\bar u \bar v = a(u, v)\,\overline{uv} \qquad \text{for some unique } a(u, v) \text{ in } N.
\]

The set of all elements $a(u, v)$ for this choice of coset representatives is called a factor set, and $E$ is called a group extension of $N$ by the group³ $G$. The automorphisms and the factor set constructed above have to satisfy two compatibility conditions, as follows:

(i) $(x^v)^u = a(u, v)\, x^{uv}\, a(u, v)^{-1}$ because
\[
(x^v)^u = \bar u (x^v) \bar u^{-1} = \bar u \bar v x \bar v^{-1} \bar u^{-1} = (a(u, v)\overline{uv})\, x\, (a(u, v)\overline{uv})^{-1} = a(u, v)\, x^{uv}\, a(u, v)^{-1},
\]
(ii) $a(v, w)^u a(u, vw) = a(u, v) a(uv, w)$ because
\[
(\bar u \bar v)\bar w = a(u, v)\overline{uv}\,\bar w = a(u, v) a(uv, w)\overline{uvw}
\quad\text{and}\quad
\bar u(\bar v \bar w) = \bar u\, a(v, w)\overline{vw} = a(v, w)^u\, \bar u\,\overline{vw} = a(v, w)^u a(u, vw)\overline{uvw}.
\]

Then the multiplication law in $E$ is given in terms of the automorphisms and the factor set by the formula

(iii) $(x \bar u)(y \bar v) = x y^u a(u, v)\overline{uv}$

by the computation $(x\bar u)(y\bar v) = x(\bar u y \bar u^{-1})\bar u \bar v = x y^u \bar u \bar v = x y^u a(u, v)\overline{uv}$.

Conversely, according to the proposition below, such data determine a group $E$ with a normal subgroup isomorphic to $N$ and a quotient $E/N$ isomorphic to $G$.

Proposition 7.36 (Schreier). Let two groups $N$ and $G$ be given, along with a family of automorphisms $x \mapsto x^u$ of $N$ parametrized by $u$ in $G$, as well as a function $a : G \times G \to N$ such that
(a) $(x^v)^u = a(u, v)\, x^{uv}\, a(u, v)^{-1}$ for all $u$ and $v$ in $G$,
(b) $a(v, w)^u a(u, vw) = a(u, v) a(uv, w)$ for all $u, v, w$ in $G$.
Then the set $N \times G$ becomes a group $E$ under the multiplication
(c) $(x, u)(y, v) = (x y^u a(u, v), uv)$,

³ Warning: Some authors say "group extension of $G$ by $N$."

and this group has a normal subgroup isomorphic to $N$ with quotient group isomorphic to $G$. More particularly, the identity of $E$ is $(a(1, 1)^{-1}, 1)$, the map $x \mapsto (x a(1, 1)^{-1}, 1)$ of $N$ into $E$ is a one-one homomorphism that exhibits $N$ as a normal subgroup of $E$, and the map $(x, u) \mapsto u$ of $E$ onto $G$ is a homomorphism that exhibits $G$ as isomorphic to $E/N$.

PROOF. Reverting to the earlier notation, let us write $x\bar u$ in place of $(x, u)$ for elements of $E$. Associativity of multiplication follows from the computation
\[
\begin{aligned}
(x\bar u\; y\bar v)(z\bar w) &= (x y^u a(u, v)\overline{uv})(z\bar w) && \text{by (c)} \\
&= x y^u a(u, v)\, z^{uv}\, a(uv, w)\overline{uvw} && \text{by (c)} \\
&= x y^u a(u, v)\, z^{uv}\, a(u, v)^{-1} a(u, v) a(uv, w)\overline{uvw} \\
&= x y^u a(u, v)\, z^{uv}\, a(u, v)^{-1} a(v, w)^u a(u, vw)\overline{uvw} && \text{by (b)} \\
&= x \big(y z^v a(v, w)\big)^u a(u, vw)\overline{uvw} && \text{by (a)} \\
&= (x\bar u)\big(y z^v a(v, w)\overline{vw}\big) && \text{by (c)} \\
&= (x\bar u)(y\bar v\; z\bar w) && \text{by (c).}
\end{aligned}
\]
The identity is to be $a(1, 1)^{-1}\bar 1$. Before checking this assertion, we prove three preliminary identities. Setting $u = v = 1$ in (a) and replacing $x^1$ by $x$ gives⁴
\[
x^1 = a(1, 1)\, x\, a(1, 1)^{-1} \qquad \text{for all } x \in N. \tag{$*$}
\]
Setting $v = w = 1$ in (b) gives $a(1, 1)^u a(u, 1) = a(u, 1) a(u, 1)$ and hence
\[
a(1, 1)^u = a(u, 1) \qquad \text{for all } u \in G. \tag{$\dagger$}
\]
Meanwhile, setting $u = v = 1$ in (b) gives $a(1, w)^1 a(1, w) = a(1, 1) a(1, w)$ and hence $a(1, w)^1 = a(1, 1)$ for all $w \in G$. The left side $a(1, w)^1$ of this last equality is equal to $a(1, 1) a(1, w) a(1, 1)^{-1}$ by $(*)$; canceling $a(1, 1)$ yields
\[
a(1, w) = a(1, 1) \qquad \text{for all } w \in G. \tag{$\dagger\dagger$}
\]
Using these identities, we check that $a(1, 1)^{-1}\bar 1$ is a two-sided identity by making the computations
\[
\begin{aligned}
(x\bar u)(a(1, 1)^{-1}\bar 1) &= x (a(1, 1)^{-1})^u a(u, 1)\bar u && \text{by (c)} \\
&= x (a(1, 1)^{-1})^u a(1, 1)^u \bar u && \text{by } (\dagger) \\
&= x\bar u
\end{aligned}
\]

⁴ The effect of the automorphism $x \mapsto x^1$ is not necessarily trivial since the coset representative $\bar 1$ of 1 is not assumed to be the identity. Thus we must distinguish between $x^1$ and $x$.

and
\[
\begin{aligned}
(a(1, 1)^{-1}\bar 1)(y\bar v) &= a(1, 1)^{-1} y^1 a(1, v)\bar v && \text{by (c)} \\
&= y\, a(1, 1)^{-1} a(1, v)\bar v && \text{by } (*) \\
&= y\bar v && \text{by } (\dagger\dagger).
\end{aligned}
\]
Let us check that a left inverse for $x\bar u$ is $a(1, 1)^{-1} a(u^{-1}, u)^{-1} (x^{u^{-1}})^{-1}\, \overline{u^{-1}}$. In fact,
\[
\begin{aligned}
\big(a(1, 1)^{-1} a(u^{-1}, u)^{-1} (x^{u^{-1}})^{-1}\, \overline{u^{-1}}\big)(x\bar u)
&= a(1, 1)^{-1} a(u^{-1}, u)^{-1} (x^{u^{-1}})^{-1} x^{u^{-1}} a(u^{-1}, u)\bar 1 && \text{by (c)} \\
&= a(1, 1)^{-1}\bar 1,
\end{aligned}
\]
as required. Thus multiplication is associative, there is a two-sided identity, and every element has a left inverse. It follows that $E$ is a group.

The map $x\bar u \mapsto u$ of $E$ into $G$ is a homomorphism by (c), and it is certainly onto $G$. Its kernel is evidently the subgroup of all elements $x a(1, 1)^{-1}\bar 1$ in $E$. Since
\[
\begin{aligned}
\big(x a(1, 1)^{-1}\bar 1\big)\big(y a(1, 1)^{-1}\bar 1\big) &= x a(1, 1)^{-1} (y a(1, 1)^{-1})^1 a(1, 1)\bar 1 && \text{by (c)} \\
&= x a(1, 1)^{-1} a(1, 1) (y a(1, 1)^{-1})\bar 1 && \text{by } (*) \\
&= x y\, a(1, 1)^{-1}\bar 1,
\end{aligned}
\]
the one-one map $x \mapsto x a(1, 1)^{-1}\bar 1$ of $N$ onto the kernel respects the group structures and is therefore an isomorphism. In other words, the embedded version of $N$ is the kernel. Being a kernel, it is a normal subgroup.

EXAMPLE, CONTINUED. Let $N = C_4 = \{1, r, r^2, r^3\}$ and $G = C_2 = \{1, u_0\}$ with $u_0^2 = 1$. The group $N$ has two automorphisms, the nontrivial one fixing 1 and $r^2$ while interchanging $r$ and $r^3$. The automorphism of $N$ from $1 \in G$ has to be trivial, while the automorphism of $N$ from $u_0 \in G$ can be trivial or nontrivial. In fact, the automorphism is trivial for $E = C_4 \times C_2$ and $E = C_8$, and it is nontrivial for $E = D_4$ and $E = H_8$. In each case the automorphism does not depend on the choice of coset representatives. The factor sets do depend on the choice of representatives, however. Let us fix $\bar 1$ as the identity of $E$ and make a particular choice of $\bar u_0$ for each $E$. Then

the definition of factor set shows that $a(1, 1) = a(u_0, 1) = a(1, u_0) = 1$, and the only part of the factor set yet to be determined is $a(u_0, u_0)$. Let us consider matters group by group.

For $C_4 \times C_2$, we can take $\bar u_0$ to be the generator of the $C_2$ factor; this has square 1, and hence $a(u_0, u_0) = 1$.

For $C_8 = \{1, \theta, \theta^2, \dots, \theta^7\}$, let us think of $N$ as embedded in $E$ with $r = \theta^2$. The element $\bar u_0$ can be any odd power of $\theta$; if we take $\bar u_0 = \theta$, then $(\bar u_0)^2 = \theta^2 = r$, and hence $a(u_0, u_0) = r$.

For $E = D_4$, the example following Proposition 7.8 shows that we may view the elements as the rotations $1, r, r^2, r^3$ and the reflections $s, rs, r^2 s, r^3 s$ for particular choices of $r$ and $s$. We can take $\bar u_0$ to be any of the reflections, and then $(\bar u_0)^2 = 1$ and $a(u_0, u_0) = 1$.

Finally for $E = H_8 = \{\pm 1, \pm i, \pm j, \pm k\}$, let us say that $N$ is embedded as $\{\pm 1, \pm i\}$. Then $\bar u_0$ can be any of the four elements $\pm j$ and $\pm k$. Each of these has square $-1$, and hence $a(u_0, u_0) = -1$.

For the choices we have made, we therefore have
\[
a(u_0, u_0) = \begin{cases} 1 & \text{for } E = C_4 \times C_2 \text{ and } E = D_4, \\ r & \text{for } E = C_8, \\ -1 & \text{for } E = H_8. \end{cases}
\]
The formula of Proposition 7.36a reduces to $(x^v)^u = x^{uv}$ since $N$ is abelian, and it is certainly satisfied. The formula for Proposition 7.36b is $a(v, w)^u a(u, vw) = a(u, v) a(uv, w)$. This is satisfied for $E = C_4 \times C_2$ and $E = D_4$ since $a(\,\cdot\,, \cdot\,)$ is identically 1. For the other two cases, either the automorphism is trivial (for $E = C_8$) or the values of $a(\,\cdot\,, \cdot\,)$ lie in the 2-element subgroup of $N$ that is fixed by the nontrivial automorphism (for $E = H_8$), and hence $a(v, w)^u = a(v, w)$ in every case. The formula to be checked reduces to $a(v, w) a(1, 1) = a(1, 1) a(v, w)$ by $(\dagger\dagger)$ if $u = 1$, to $a(1, 1) a(u, w) = a(1, 1) a(u, w)$ by $(\dagger)$ and $(\dagger\dagger)$ if $v = 1$, and to $a(1, 1) a(u, v) = a(u, v) a(1, 1)$ by $(\dagger)$ if $w = 1$. Thus all that needs checking is the case that $u = v = w = u_0$, and then the formula in question reduces to $a(u_0, u_0) a(1, 1) = a(u_0, u_0) a(1, 1)$ by $(\dagger)$ and $(\dagger\dagger)$.
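The four extensions in the example can be rebuilt mechanically from the multiplication law of Proposition 7.36c. The following is a minimal sketch under stated assumptions: $N = C_4$ is written additively as $\mathbb{Z}/4$ with $r$ corresponding to 1, $G = C_2$ is written as $\mathbb{Z}/2$ with $u_0$ corresponding to 1, and the function names (`make_extension`, `cocycle_ok`, `element_orders`) are illustrative, not from the text. The parameter `c` is the additive form of $a(u_0, u_0)$.

```python
from itertools import product

N = range(4)  # C4 as Z/4, with r <-> 1
G = range(2)  # C2 as Z/2, with u_0 <-> 1

def make_extension(nontrivial_action, c):
    """Build the law (x,u)(y,v) = (x + y^u + a(u,v), uv) in additive notation."""
    def act(u, y):
        # u_0 acts by inversion when the action is nontrivial
        return (-y) % 4 if (u == 1 and nontrivial_action) else y % 4
    def a(u, v):
        # normalized factor set: only a(u_0, u_0) can be nonzero
        return c % 4 if (u == 1 and v == 1) else 0
    def mul(p, q):
        (x, u), (y, v) = p, q
        return ((x + act(u, y) + a(u, v)) % 4, (u + v) % 2)
    return act, a, mul

def cocycle_ok(act, a):
    """Check condition (b): a(v,w)^u + a(u,vw) = a(u,v) + a(uv,w)."""
    return all(
        (act(u, a(v, w)) + a(u, (v + w) % 2)) % 4
        == (a(u, v) + a((u + v) % 2, w)) % 4
        for u, v, w in product(G, repeat=3)
    )

def element_orders(mul):
    """Sorted list of element orders; the identity is (0, 0) since a(1,1) = 0."""
    identity = (0, 0)
    orders = []
    for g in product(N, G):
        power, k = g, 1
        while power != identity:
            power, k = mul(power, g), k + 1
        orders.append(k)
    return sorted(orders)

cases = {"C4 x C2": (False, 0), "C8": (False, 1), "D4": (True, 0), "H8": (True, 2)}
for name, (nontrivial, c) in cases.items():
    act, a, mul = make_extension(nontrivial, c)
    print(name, cocycle_ok(act, a), element_orders(mul))
```

The lists of element orders distinguish the four groups: only $C_8$ has elements of order 8, $D_4$ has five elements of order 2, and $H_8$ has a single element of order 2. Choosing the nontrivial action with `c = 1` violates condition (b), so that choice does not define a group.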
Let us examine for a particular extension the dependence of the automorphisms and factor set on the choice of coset representatives. Returning to our original construction, suppose that we change the coset representatives of the members of $G$, associating a member $\tilde u$ to $u \in G$ in place of $\bar u$. We then obtain a new automorphism of $N$ corresponding to $u$, and we write it as $x \mapsto x^{u^*} = \tilde u x \tilde u^{-1}$ instead of $x \mapsto x^u = \bar u x \bar u^{-1}$. To quantify matters, we observe that $\tilde u$ lies in the same coset of $N$ as does $\bar u$. Thus $\tilde u = \alpha(u)\bar u$ for some function $\alpha : G \to N$, and the function $\alpha$ can be absolutely arbitrary. In terms of this function $\alpha$, the two automorphisms are related by
\[
x^{u^*} = \tilde u x \tilde u^{-1} = \alpha(u)\bar u x \bar u^{-1}\alpha(u)^{-1} = \alpha(u)\, x^u\, \alpha(u)^{-1}.
\]
If the factor set for the system $\{\tilde u\}$ of coset representatives is denoted by $\{b(u, v)\}$, then we have
\[
b(u, v)\alpha(uv)\overline{uv} = b(u, v)\widetilde{uv} = \tilde u \tilde v = \alpha(u)\bar u\,\alpha(v)\bar v = \alpha(u)\alpha(v)^u a(u, v)\overline{uv}.
\]
Equating coefficients of $\overline{uv}$, we obtain
\[
b(u, v) = \alpha(u)\alpha(v)^u a(u, v)\alpha(uv)^{-1}.
\]
Accordingly we say that a group extension of $N$ by $G$ determined by automorphisms $x \mapsto x^u$ and a factor set $a(u, v)$ is equivalent, or isomorphic, to a group extension of $N$ by $G$ determined by automorphisms $x \mapsto x^{u^*}$ and a factor set $b(u, v)$ if there is a function $\alpha : G \to N$ such that
\[
x^{u^*} = \alpha(u)\, x^u\, \alpha(u)^{-1} \qquad\text{and}\qquad b(u, v) = \alpha(u)\alpha(v)^u a(u, v)\alpha(uv)^{-1}
\]
for all $u$ and $v$ in $G$. It is immediate that equivalence of group extensions is an equivalence relation.

Proposition 7.37. Suppose that $E_1$ and $E_2$ are group extensions of $N$ by $G$ with respective inclusions $i_1 : N \to E_1$ and $i_2 : N \to E_2$ and with respective quotient homomorphisms $\varphi_1 : E_1 \to G$ and $\varphi_2 : E_2 \to G$. If there exists a group isomorphism $\Phi : E_1 \to E_2$ such that the two squares in Figure 7.4 commute, then the two group extensions are equivalent. Conversely if the two group extensions are equivalent, then there exists a group isomorphism $\Phi : E_1 \to E_2$ such that the two squares in Figure 7.4 commute.
\[
\begin{array}{ccccc}
N & \xrightarrow{\ i_1\ } & E_1 & \xrightarrow{\ \varphi_1\ } & G \\
\Big\| & & \Big\downarrow{\scriptstyle \Phi} & & \Big\| \\
N & \xrightarrow{\ i_2\ } & E_2 & \xrightarrow{\ \varphi_2\ } & G
\end{array}
\]
FIGURE 7.4. Equivalent group extensions.

REMARKS. The commutativity of the squares is important. Just because two group extensions of $N$ by $G$ are isomorphic as groups does not imply that they are equivalent group extensions. An example is given in Problem 19 at the end of the chapter.

PROOF. For the direct part, suppose that $\Phi$ exists. For each $u$ in $G$, select $\bar u$ in $E_1$ with $\varphi_1(\bar u) = u$. Then we can form the extension data $\{x' \mapsto x'^u\}$ and $\{a'(u, v)\}$ for $E_1$ relative to the normal subgroup $i_1(N)$ and the system $\{\bar u \mid u \in G\}$ of coset representatives. When reinterpreted in terms of $N$, $E_1$, and $G$, these data become $\{i_1^{-1}(x') \mapsto i_1^{-1}(x'^u)\}$ and $\{i_1^{-1}(a'(u, v))\}$.

Application of $\Phi$ to the coset $i_1(N)\bar u$ yields $i_2(N)\Phi(\bar u)$ since $\Phi i_1 = i_2$, and $\Phi(\bar u)$ is a member of $E_2$ with $\varphi_2(\Phi(\bar u)) = \varphi_1(\bar u) = u$. Setting $\tilde u = \Phi(\bar u)$, we see that $\Phi(i_1(N)\bar u)$ is the coset $i_2(N)\tilde u$ of $i_2(N)$ in $E_2$. Thus we can determine

extension data for $E_2$ relative to $i_2(N)$ and the system $\{\tilde u \mid u \in G\}$, and we can transform them by $i_2^{-1}$ to obtain data relative to $N$, $E_2$, and $G$.

The claim is that the data relative to $N$, $E_2$, and $G$ match those for $N$, $E_1$, and $G$. The automorphisms of $N$ from $E_2$ are the maps $i_2^{-1}(x'') \mapsto i_2^{-1}(x''^{u^*})$, where $x''^{u^*} = \tilde u x'' \tilde u^{-1}$. From $i_2 = \Phi i_1$ and the fact that each of these maps is one-one, we obtain $i_2^{-1} = i_1^{-1}\Phi^{-1}$ on $i_2(N)$. Substitution shows that the automorphisms of $N$ from $E_2$ are
\[
i_1^{-1}(\Phi^{-1}(x'')) \mapsto i_1^{-1}(\Phi^{-1}(x''^{u^*})) = i_1^{-1}(\Phi^{-1}(\tilde u x'' \tilde u^{-1})) = i_1^{-1}(\bar u\,\Phi^{-1}(x'')\,\bar u^{-1}) = i_1^{-1}((\Phi^{-1}(x''))^u).
\]
If we set $x'' = \Phi(x')$ with $x'$ in $i_1(N)$, then the automorphisms of $N$ from $E_2$ take the form $i_1^{-1}(x') \mapsto i_1^{-1}(x'^u)$. Thus they match the automorphisms of $N$ from $E_1$.

In the case of the factor sets, we have $\bar u \bar v = a'(u, v)\overline{uv}$. Application of $\Phi$ gives $\tilde u \tilde v = \Phi(a'(u, v))\widetilde{uv}$. Thus the factor set for $E_2$ relative to $N$ is $\{i_2^{-1}\Phi(a'(u, v))\}$. Since $i_2^{-1}\Phi = i_1^{-1}$, this matches the factor set for $E_1$ relative to $N$.

We turn to the converse part. Suppose that the multiplication law in $E_1$ is $(i_1(x)\bar u)(i_1(y)\bar v) = i_1(x) i_1(y)^u i_1(a(u, v))\overline{uv}$ for $x$ and $y$ in $N$, and that the multiplication law in $E_2$ is $(i_2(x)\tilde u)(i_2(y)\tilde v) = i_2(x) i_2(y)^{u^*} i_2(b(u, v))\widetilde{uv}$. Here $\bar u$ and $\bar v$ are preimages of $u$ and $v$ under $\varphi_1$, and $\tilde u$ and $\tilde v$ are preimages of $u$ and $v$ under $\varphi_2$. Define automorphisms of $N$ by $x^u = i_1^{-1}(i_1(x)^u)$ and $x^{u^*} = i_2^{-1}(i_2(x)^{u^*})$. We can then rewrite the multiplication laws as
\[
(i_1(x)\bar u)(i_1(y)\bar v) = i_1(x y^u a(u, v))\overline{uv}
\qquad\text{and}\qquad
(i_2(x)\tilde u)(i_2(y)\tilde v) = i_2(x y^{u^*} b(u, v))\widetilde{uv}.
\]
The assumption that $E_1$ is equivalent to $E_2$ as an extension of $N$ by $G$ means that there exists a function $\alpha : G \to N$ such that
\[
x^{u^*} = \alpha(u)\, x^u\, \alpha(u)^{-1} \qquad\text{and}\qquad b(u, v) = \alpha(u)\alpha(v)^u a(u, v)\alpha(uv)^{-1}
\]
for all $u$ and $v$ in $G$. Define $\Phi : E_1 \to E_2$ by
\[
\Phi(i_1(x)\bar u) = i_2(x\alpha(u)^{-1})\tilde u.
\]
Certainly $\Phi$ is one-one onto. It remains to check that $\Phi$ is a group homomorphism and that the squares commute in Figure 7.4.

To check that $\Phi : E_1 \to E_2$ is a group homomorphism, we compare
\[
\Phi(i_1(x)\bar u\; i_1(y)\bar v) = \Phi(i_1(x y^u a(u, v))\overline{uv}) = i_2(x y^u a(u, v)\alpha(uv)^{-1})\widetilde{uv}
\]

with the product
\[
\Phi(i_1(x)\bar u)\,\Phi(i_1(y)\bar v) = i_2(x\alpha(u)^{-1})\tilde u\; i_2(y\alpha(v)^{-1})\tilde v = i_2\big(x\alpha(u)^{-1}(y\alpha(v)^{-1})^{u^*} b(u, v)\big)\widetilde{uv}.
\]
Since
\[
\begin{aligned}
\alpha(u)^{-1}(y\alpha(v)^{-1})^{u^*} b(u, v) &= \alpha(u)^{-1}(y\alpha(v)^{-1})^{u^*}\alpha(u)\alpha(v)^u a(u, v)\alpha(uv)^{-1} \\
&= (y\alpha(v)^{-1})^u \alpha(v)^u a(u, v)\alpha(uv)^{-1} \\
&= y^u a(u, v)\alpha(uv)^{-1},
\end{aligned}
\]
these expressions are equal, and $\Phi$ is a group homomorphism. Thus $\Phi$ is a group isomorphism.

Now we check the commutativity of the squares. The computation
\[
\varphi_2(\Phi(i_1(x)\bar u)) = \varphi_2(i_2(x\alpha(u)^{-1})\tilde u) = u = \varphi_1(i_1(x)\bar u)
\]
shows that the right-hand square commutes. For the left-hand square we use the fact recorded in the statement of Proposition 7.36 that $i_1(a(1, 1)^{-1})\bar 1$ is the identity of $E_1$ and $i_2(b(1, 1)^{-1})\tilde 1$ is the identity of $E_2$. Therefore $\Phi(i_1(x)) = \Phi(i_1(x a(1, 1)^{-1})\bar 1) = i_2(x a(1, 1)^{-1}\alpha(1)^{-1})\tilde 1$. Since $i_2(x) = i_2(x b(1, 1)^{-1})\tilde 1$, the left-hand square commutes if $b(1, 1) = \alpha(1) a(1, 1)$. This formula follows from $(*)$ in the proof of Proposition 7.36 by the computation
\[
b(1, 1) = \alpha(1)\alpha(1)^1 a(1, 1)\alpha(1)^{-1} = \alpha(1) a(1, 1)\alpha(1)\alpha(1)^{-1} = \alpha(1) a(1, 1),
\]
and thus the left-hand square indeed commutes.
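The effect of changing coset representatives can be traced in the running example. The sketch below is an illustration under stated assumptions: it takes $E = C_8$ with $N = \mathbb{Z}/4$ written additively and the trivial action, starts from the factor set with $a(u_0, u_0) = r$ (additively 1, for $\bar u_0 = \theta$), and switches to the representative $\theta^3 = r\theta$, so that $\alpha(u_0) = r$. The names `alpha`, `b`, `mul`, and `phi` are chosen for this example only.

```python
from itertools import product

M = 4          # N = Z/4, written additively
G = range(2)   # G = Z/2, with u_0 <-> 1

def act(u, y):
    """Trivial action of G on N, as for E = C8."""
    return y % M

def a(u, v):
    """Factor set of C8 for the representative theta of u_0."""
    return 1 if (u == 1 and v == 1) else 0

# New representatives: alpha(1) = 0, alpha(u_0) = r, i.e. theta^3 = r * theta.
alpha = {0: 0, 1: 1}

def b(u, v):
    """Additive form of b(u,v) = alpha(u) alpha(v)^u a(u,v) alpha(uv)^{-1}."""
    return (alpha[u] + act(u, alpha[v]) + a(u, v) - alpha[(u + v) % 2]) % M

def mul(factor_set, p, q):
    """Multiplication law (c) for the given factor set."""
    (x, u), (y, v) = p, q
    return ((x + act(u, y) + factor_set(u, v)) % M, (u + v) % 2)

def phi(p):
    """Phi(x, u) = (x - alpha(u), u), the additive form of x alpha(u)^{-1}."""
    x, u = p
    return ((x - alpha[u]) % M, u)

elements = list(product(range(M), G))
is_hom = all(
    phi(mul(a, p, q)) == mul(b, phi(p), phi(q))
    for p in elements for q in elements
)
print(b(1, 1), is_hom)
```

The new factor set has $b(u_0, u_0) = r^3$, matching $(\theta^3)^2 = \theta^6 = r^3$, and the map $\Phi$ of Proposition 7.37 is checked by brute force to be a homomorphism between the two descriptions of the same group.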

For the remainder of this section, let us assume that $N$ is abelian. In this case Proposition 7.36a reduces to the identity $(x^v)^u = x^{uv}$ for all $u$ and $v$ in $G$ independently of the choice of representatives, just as it does in the example we studied with $N = C_4$ and $G = C_2$. In the terminology of Section IV.7, $G$ acts on $N$ by automorphisms.⁵ Suppose we fix such an action $\tau : G \to \mathrm{Aut}\, N$ by automorphisms and consider all extensions of $N$ by $G$ built from $\tau$. In our example we are thus to consider $E$ equal to $C_4 \times C_2$ or $C_8$, which are built with the trivial $\tau$, or else $E$ equal to $D_4$ or $H_8$, which are built with the nontrivial $\tau$ (in which the nontrivial element of $G$ acts by the nontrivial automorphism of $N$). Since $N$ is abelian, let us switch to additive notation for $N$ and to ordinary function notation for $\tau(w)$, rewriting the formula of Proposition 7.36b as
\[
\tau(u) a(v, w) + a(u, vw) = a(u, v) + a(uv, w).
\]

⁵ The formula $(x^v)^u = x^{uv}$ correctly corresponds to a group action with the group on the left as in Section IV.7.

This condition is preserved under addition of factor sets as long as $\tau$ does not change, it is satisfied by the 0 factor set, and the negative of a factor set is again a factor set. Therefore the factor sets for this $\tau$ form an abelian group. Two factor sets for this $\tau$ are equivalent (in the sense of yielding equivalent group extensions) if and only if their difference is equivalent to 0, and $a(u, v)$ is equivalent to 0 if and only if
\[
a(u, v) = \alpha(uv) - \alpha(u) - \tau(u)\alpha(v)
\]
for some function $\alpha : G \to N$. The set of factor sets for this $\tau$ that are equivalent to 0 is thus a subgroup,⁶ and we arrive at the following result.

Proposition 7.38. Let $G$ and $N$ be groups with $N$ abelian, and suppose that $\tau : G \to \mathrm{Aut}\, N$ is a homomorphism. Then the set of equivalence classes of group extensions of $N$ by $G$ corresponding to the action $\tau : G \to \mathrm{Aut}\, N$ is parametrized by the quotient of the abelian group of factor sets by the subgroup of factor sets equivalent to 0.

The extension $E$ corresponding to the 0 factor set is of special interest. In this case the multiplication law for the coset representatives is $\bar u \bar v = \overline{uv}$ since the member $a(u, v) = 0$ of $N$ is to be interpreted multiplicatively in this product formula. Consequently the map $u \mapsto \bar u$ of $G$ into $E$ is a group homomorphism, necessarily one-one, and we can regard $G$ as a subgroup of $E$. Proposition 4.44 allows us to conclude that $E$ is the semidirect product $G \times_\tau N$. The multiplication law for general elements of $E$, with multiplicative notation used for $N$, is
\[
(x\bar u)(y\bar v) = x(\tau(u)y)\overline{uv}.
\]
It is possible also to describe explicitly the extension one obtains from the sum of two factor sets corresponding to the same $\tau$, but we leave this matter to Problems 20–23 at the end of the chapter. The operation on extensions that corresponds to addition of factor sets in this way is called Baer multiplication. What we saw in the previous paragraph says that the group identity under Baer multiplication is the semidirect product.
The two conditions, the compatibility condition on a factor set given in Proposition 7.36b and the condition with $\alpha$ in it for equivalence to 0, are of a combinatorial type that occurs in many contexts in mathematics and is captured by the ideas of "homology" and "cohomology." For the current situation the notion is that of cohomology of groups, and we shall define it now. The subject of homological algebra, which is introduced in Advanced Algebra, puts cohomology of groups in a wider context and explains some of its mystery.

⁶ One can legitimately ask whether an arbitrary $\alpha : G \to N$ leads to a factor set under the definition $a(u, v) = \alpha(uv) - \alpha(u) - \tau(u)\alpha(v)$, and one easily checks that the answer is yes. Alternatively, one can refer to the case $n = 2$ in the upcoming Proposition 7.39.

We fix an abelian group $N$, a group $G$, and a group action $\tau$ of $G$ on $N$ by automorphisms. It is customary to suppress $\tau$ in the notation for the group action, and we shall follow that convention. For integers $n \ge 0$, one begins with the abelian group $C^n(G, N)$ of $n$-cochains of $G$ with coefficients in $N$. This is defined by
\[
C^n(G, N) = \begin{cases} N & \text{if } n = 0, \\ \big\{ f : \prod_{k=1}^n G \to N \big\} & \text{if } n > 0. \end{cases}
\]
In words, $C^n(G, N)$ is the set of all functions into $N$ from the $n$-fold direct product of $G$ with itself. The coboundary map $\delta_n : C^n(G, N) \to C^{n+1}(G, N)$ is the homomorphism of abelian groups defined by $(\delta_0 f)(g_1) = g_1 f - f$ and by
\[
(\delta_n f)(g_1, \dots, g_{n+1}) = g_1(f(g_2, \dots, g_{n+1})) + \sum_{i=1}^n (-1)^i f(g_1, \dots, g_{i-1}, g_i g_{i+1}, g_{i+2}, \dots, g_{n+1}) + (-1)^{n+1} f(g_1, \dots, g_n)
\]
for $n > 0$. We postpone to the end of this section the proof of the following result.

Proposition 7.39. $\delta_n \delta_{n-1} = 0$ for all $n \ge 1$.

It follows from Proposition 7.39 that $\mathrm{image}\, \delta_{n-1} \subseteq \ker \delta_n$ for all $n \ge 1$. Thus if we define abelian groups by
\[
Z^n(G, N) = \ker \delta_n, \qquad B^n(G, N) = \begin{cases} 0 & \text{for } n = 0, \\ \mathrm{image}\, \delta_{n-1} & \text{for } n > 0, \end{cases}
\]
then $B^n(G, N) \subseteq Z^n(G, N)$ for all $n$, and it makes sense to define the abelian groups
\[
H^n(G, N) = Z^n(G, N)/B^n(G, N) \qquad \text{for } n \ge 0.
\]
The elements of $Z^n(G, N)$ are called $n$-cocycles, the elements of $B^n(G, N)$ are called $n$-coboundaries, and $H^n(G, N)$ is called the $n$th cohomology group of $G$ with coefficients in $N$.

EXAMPLES IN LOW DEGREE.

DEGREE 0. Here $(\delta_0 f)(u) = uf - f$ with $f$ in $N$ and $u$ in $G$. The cocycle condition is that this is 0 for all $u$. Thus $f$ is to be fixed by $G$. We say that an $f$

fixed by $G$ is an invariant of the group action. The space of invariants is denoted by $N^G$. By convention above, we are taking $B^0(G, N) = 0$. Thus $H^0(G, N) = N^G$.

DEGREE 1. Here $(\delta_1 f)(u, v) = u(f(v)) - f(uv) + f(u)$ with $f$ a function from $G$ to $N$. The cocycle condition is that
\[
f(uv) = f(u) + u(f(v)) \qquad \text{for all } u, v \in G.
\]
A function $f$ satisfying this condition is called a crossed homomorphism of $G$ into $N$. A coboundary is a function $f : G \to N$ of the form $f(u) = (\delta_0 x)(u) = ux - x$ for some $x \in N$. Then $H^1(G, N)$ is the quotient of the group of crossed homomorphisms by this subgroup. In the special case that the action of $G$ on $N$ is trivial, the crossed homomorphisms reduce to ordinary homomorphisms of $G$ into $N$, and every coboundary is 0. Thus $H^1(G, N)$ is the group of homomorphisms of $G$ into $N$ if $G$ acts trivially on $N$.

DEGREE 2. Here $f$ is a function from $G \times G$ into $N$, and
\[
(\delta_2 f)(u, v, w) = u(f(v, w)) - f(uv, w) + f(u, vw) - f(u, v).
\]
The cocycle condition is that
\[
u(f(v, w)) + f(u, vw) = f(uv, w) + f(u, v) \qquad \text{for all } u, v, w \in G.
\]
This is the same as the condition that $\{f(u, v)\}$ be a factor set for extensions of $N$ by $G$ relative to the given action of $G$ on $N$ by automorphisms. A coboundary is a function $f : G \times G \to N$ of the form
\[
f(u, v) = (\delta_1 \alpha)(u, v) = u(\alpha(v)) - \alpha(uv) + \alpha(u) \qquad \text{for some } \alpha : G \to N.
\]
This is the same as the condition that $\{-f(u, v)\}$ be a factor set equivalent to 0. Thus we can restate Proposition 7.38 as follows.

Proposition 7.40. Let $G$ and $N$ be groups with $N$ abelian, and suppose that $\tau : G \to \mathrm{Aut}\, N$ is a homomorphism. Then the set of equivalence classes of group extensions of $N$ by $G$ corresponding to the action $\tau : G \to \mathrm{Aut}\, N$ is parametrized by $H^2(G, N)$.

Since group extensions have such a nice interpretation in terms of cohomology groups $H^2$, it is reasonable to look for a nice interpretation for $H^1$ as well. Indeed, $H^1$ has an interpretation in terms of uniqueness up to inner isomorphisms for semidirect-product decompositions. We continue with the abelian group $N$, a group $G$, and a group action $\tau$ of $G$ on $N$ by automorphisms. A semidirect product $E = G \times_\tau N$ is an allowable extension. Since $G$ embeds as a subgroup of $E$, we are given a one-one group homomorphism $u \mapsto \bar u$ of $G$ into $E$. The construction at the beginning of this section works with the set $\{\bar u\}$ of coset representatives, and they have $\bar u \bar v = \overline{uv}$.
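Proposition 7.40 can be tested by brute force in the running example $N = C_4$, $G = C_2$. The sketch below is only an illustration: $N$ is written additively as $\mathbb{Z}/4$, $G$ as $\mathbb{Z}/2$, and the function name `h2_order` is hypothetical. It enumerates all 2-cochains, counts those satisfying the degree-2 cocycle condition, and divides by the number of distinct coboundaries $\delta_1\alpha$.

```python
from itertools import product

M = 4        # N = Z/4, written additively
G = [0, 1]   # G = Z/2, written additively

def h2_order(nontrivial_action):
    """Return |Z^2(G, N)| / |B^2(G, N)| for the chosen action of G on N."""
    def act(u, y):
        # the nontrivial element acts by inversion when the action is nontrivial
        return (-y) % M if (u == 1 and nontrivial_action) else y % M

    pairs = list(product(G, repeat=2))
    triples = list(product(G, repeat=3))

    def is_cocycle(f):
        # (delta_2 f)(u,v,w) = u f(v,w) - f(uv,w) + f(u,vw) - f(u,v)
        return all(
            (act(u, f[(v, w)]) - f[((u + v) % 2, w)]
             + f[(u, (v + w) % 2)] - f[(u, v)]) % M == 0
            for u, v, w in triples
        )

    cocycles = 0
    for values in product(range(M), repeat=len(pairs)):
        if is_cocycle(dict(zip(pairs, values))):
            cocycles += 1

    coboundaries = set()
    for a0, a1 in product(range(M), repeat=2):
        alpha = {0: a0, 1: a1}
        # (delta_1 alpha)(u,v) = u alpha(v) - alpha(uv) + alpha(u)
        coboundaries.add(tuple(
            (act(u, alpha[v]) - alpha[(u + v) % 2] + alpha[u]) % M
            for u, v in pairs
        ))
    return cocycles // len(coboundaries)

print(h2_order(False), h2_order(True))
```

Both actions give a cohomology group of order 2. This matches the example: the trivial action accounts for the two inequivalent extensions $C_4 \times C_2$ and $C_8$, and the nontrivial action accounts for $D_4$ and $H_8$.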

Suppose that the semidirect product can be formed by a second one-one group homomorphism $u \mapsto \tilde{u}$ of $G$ into $E$. If we write $\tilde{u} = \alpha(u)\bar{u}$ for a function $\alpha : G \to N$, then we know from earlier in the section that the extensions formed from $\{\bar{u}\}$ and from $\{\tilde{u}\}$ are equivalent. Because $G$ maps homomorphically into $E$ for both systems, the factor sets are 0 in both cases. Consequently the function $\alpha$ must satisfy

\[
\alpha(uv) - \alpha(u) - \tau(u)\alpha(v) = 0.
\]

This is exactly the condition that $\alpha : G \to N$ be a 1-cocycle. Thus the group $Z^1(G, N)$ parametrizes all ways that we can embed $G$ as a complementary subgroup to $N$ in the semidirect product $E = G \times_\tau N$.

A relatively trivial way to construct a one-one group homomorphism $u \mapsto \tilde{u}$ from $u \mapsto \bar{u}$ is to form, in the usual multiplicative notation, $\tilde{u} = x_0^{-1}\bar{u}x_0$ for some $x_0 \in N$. Then $\tilde{u} = x_0^{-1}\bar{u}x_0\bar{u}^{-1}\bar{u} = x_0^{-1}(\tau(u)(x_0))\bar{u}$, and the additive notation for $\alpha(u)$ has $\alpha(u) = \tau(u)(x_0) - x_0$. Referring to our earlier computations in degree 1, we see that $\alpha$ is in the group $B^1(G, N)$ of coboundaries. The conclusion is that $H^1(G, N)$ parametrizes all ways, modulo relatively trivial ways, that we can embed $G$ as a complementary subgroup to $N$ in the semidirect product $E = G \times_\tau N$.

As promised, we now return to the proof of Proposition 7.39.

PROOF OF PROPOSITION 7.39. For $n = 1$, we have

\[
\begin{aligned}
(\delta_1\delta_0 f)(u, v) &= u((\delta_0 f)(v)) - (\delta_0 f)(uv) + (\delta_0 f)(u) \\
&= u(vf - f) - (uvf - f) + (uf - f) = 0.
\end{aligned}
\]

For $n > 1$, we begin with

\[
\begin{aligned}
(\delta_n\delta_{n-1} f)(g_1, \dots, g_{n+1}) &= g_1((\delta_{n-1} f)(g_2, \dots, g_{n+1})) \\
&\quad + \sum_{i=1}^{n} (-1)^i (\delta_{n-1} f)(g_1, \dots, g_i g_{i+1}, \dots, g_{n+1}) \\
&\quad + (-1)^{n+1} (\delta_{n-1} f)(g_1, \dots, g_n) \\
&= \mathrm{I} + \mathrm{II} + \mathrm{III}.
\end{aligned}
\]

Here

\[
\begin{aligned}
\mathrm{I} &= g_1 g_2 (f(g_3, \dots, g_{n+1})) + \sum_{i=2}^{n} (-1)^{i-1} g_1 (f(g_2, \dots, g_i g_{i+1}, \dots, g_{n+1})) + (-1)^n g_1 (f(g_2, \dots, g_n)) \\
&= \mathrm{IA} + \mathrm{IB} + \mathrm{IC},
\end{aligned}
\]

\[
\begin{aligned}
\mathrm{II} &= -(\delta_{n-1} f)(g_1 g_2, g_3, \dots, g_{n+1}) + \sum_{i=2}^{n} (-1)^i (\delta_{n-1} f)(g_1, \dots, g_i g_{i+1}, \dots, g_{n+1}) \\
&= \mathrm{IIA} + \mathrm{IIB},
\end{aligned}
\]

\[
\begin{aligned}
\mathrm{III} &= (-1)^{n+1} g_1 (f(g_2, \dots, g_n)) + (-1)^{n+1}(-1) f(g_1 g_2, g_3, \dots, g_n) \\
&\quad + (-1)^{n+1} \sum_{i=2}^{n-1} (-1)^i f(g_1, \dots, g_i g_{i+1}, \dots, g_n) + (-1)^{n+1}(-1)^n f(g_1, \dots, g_{n-1}) \\
&= \mathrm{IIIA} + \mathrm{IIIB} + \mathrm{IIIC} + \mathrm{IIID}.
\end{aligned}
\]

Terms IIA and IIB decompose further as

\[
\begin{aligned}
\mathrm{IIA} &= -g_1 g_2 (f(g_3, \dots, g_{n+1})) + f(g_1 g_2 g_3, g_4, \dots, g_{n+1}) \\
&\quad - \sum_{i=3}^{n} (-1)^{i+1} f(g_1 g_2, \dots, g_i g_{i+1}, \dots, g_{n+1}) - (-1)^n f(g_1 g_2, g_3, \dots, g_n) \\
&= \mathrm{IIAa} + \mathrm{IIAb} + \mathrm{IIAc} + \mathrm{IIAd},
\end{aligned}
\]

\[
\begin{aligned}
\mathrm{IIB} &= \sum_{i=2}^{n} (-1)^i g_1 (f(g_2, \dots, g_i g_{i+1}, \dots, g_{n+1})) \\
&\quad + (-1)^2 (-1) f(g_1 g_2 g_3, g_4, \dots, g_{n+1}) \\
&\quad + \sum_{i=3}^{n} (-1)^i (-1) f(g_1 g_2, \dots, g_i g_{i+1}, \dots, g_{n+1}) \\
&\quad + \sum_{i=2}^{n} (-1)^i \sum_{j=2}^{i-2} (-1)^j f(g_1, \dots, g_j g_{j+1}, \dots, g_i g_{i+1}, \dots, g_{n+1}) \\
&\quad + \sum_{i=3}^{n} (-1)^i (-1)^{i-1} f(g_1, \dots, g_{i-1} g_i g_{i+1}, \dots, g_{n+1}) \\
&\quad + \sum_{i=2}^{n-1} (-1)^i (-1)^i f(g_1, \dots, g_i g_{i+1} g_{i+2}, \dots, g_{n+1}) \\
&\quad + \sum_{i=2}^{n-2} (-1)^i \sum_{j=i+2}^{n} (-1)^{j-1} f(g_1, \dots, g_i g_{i+1}, \dots, g_j g_{j+1}, \dots, g_{n+1}) \\
&\quad + \sum_{i=2}^{n-1} (-1)^i (-1)^n f(g_1, \dots, g_i g_{i+1}, \dots, g_n) \\
&\quad + (-1)^n (-1)^n f(g_1, \dots, g_{n-1}) \\
&= \mathrm{IIBa} + \mathrm{IIBb} + \mathrm{IIBc} + \mathrm{IIBd} + \mathrm{IIBe} + \mathrm{IIBf} + \mathrm{IIBg} + \mathrm{IIBh} + \mathrm{IIBi}.
\end{aligned}
\]

Inspection shows that we have cancellation between term IA and term IIAa, term IB and term IIBa, term IC and term IIIA, term IIAb and term IIBb, term IIAc and term IIBc, term IIAd and term IIIB, term IIBd and term IIBg, term IIBe and term IIBf, term IIBh and term IIIC, and term IIBi and term IIID. All the terms cancel, and we conclude that $\delta_n\delta_{n-1} f = 0$.
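The cancellation in the proof can be spot-checked numerically. The following is a minimal sketch under assumptions that are not in the text: it takes $G = \mathbb{Z}/3$ acting on $N = \mathbb{Z}/7$ by $\tau(u)(x) = 2^u x$ (a valid action because 2 has order 3 modulo 7), stores cochains as dictionaries, and uses the hypothetical helper names `act`, `coboundary`, and `random_cochain`. It applies the coboundary formula twice to random cochains and reports whether the result vanishes.

```python
import itertools
import random

M = 3  # assumed example: G = Z/3 under addition
P = 7  # assumed example: N = Z/7 under addition


def act(u, x):
    """tau(u)(x) = 2^u * x mod 7; an action because 2^3 = 8 = 1 mod 7."""
    return (pow(2, u, P) * x) % P


def coboundary(f, k):
    """Return delta_k f as a dict on (k+1)-tuples, for a k-cochain f (dict on k-tuples)."""
    df = {}
    for g in itertools.product(range(M), repeat=k + 1):
        # leading term g_1 ( f(g_2, ..., g_{k+1}) )
        total = act(g[0], f[g[1:]])
        # middle terms (-1)^i f(g_1, ..., g_i g_{i+1}, ..., g_{k+1}) for i = 1..k
        for i in range(1, k + 1):
            merged = g[:i - 1] + ((g[i - 1] + g[i]) % M,) + g[i + 1:]
            total += (-1) ** i * f[merged]
        # final term (-1)^{k+1} f(g_1, ..., g_k)
        total += (-1) ** (k + 1) * f[g[:k]]
        df[g] = total % P
    return df


def random_cochain(k):
    """A random k-cochain G^k -> N."""
    return {g: random.randrange(P) for g in itertools.product(range(M), repeat=k)}


def double_coboundary_vanishes(k):
    """Check delta_{k+1} delta_k f = 0 for one random k-cochain f."""
    f = random_cochain(k)
    ddf = coboundary(coboundary(f, k), k + 1)
    return all(value == 0 for value in ddf.values())


if __name__ == "__main__":
    for k in range(4):
        print(k, double_coboundary_vanishes(k))
```

A check of this kind is only a sanity test on one small example; it does not replace the term-by-term cancellation above.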

7. Problems

1. Using Burnside's Theorem and Problem 34 at the end of Chapter IV, show that 60 is the smallest possible order of a nonabelian simple group.

2. A commutator in a group is any element of the form $xyx^{-1}y^{-1}$.
   (a) Prove that the inverse of a commutator is a commutator.
   (b) Prove that any conjugate of a commutator is a commutator.

3. Let $a$ and $b$ be elements of a group $G$. Prove that the subgroup generated by $a$ and $b$ is the same as the subgroup generated by $bab^2$ and $bab^3$.

4. A subgroup $H$ of a group $G$ is said to be characteristic if it is carried into itself by every automorphism of $G$.
   (a) Prove that characteristic implies normal.
   (b) Prove that the center $Z_G$ of $G$ is a characteristic subgroup.
   (c) Prove that the commutator subgroup $G'$ of $G$ is a characteristic subgroup.

5. In the terminology of the previous problem, which subgroups of the quaternion subgroup $H_8$ are characteristic?

6. Is every finite group finitely presented? Why or why not?

7. Let $G = SL(2, \mathbb{R})$, and let $G'$ be the commutator subgroup.
   (a) Prove that every element $\begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$ is in $G'$.
   (b) Prove that $G' = G$.
   (c) Prove that $\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$ is not a commutator even though it is in $G'$.

8. Problem 53 at the end of Chapter IV produced a group $G$ of order 27 generated by two elements $a$ and $b$ satisfying $a^9 = b^3 = b^{-1}aba^{-4} = 1$. Prove that $G$ is given by generators and relations as
   \[ G = \langle a, b;\ a^9,\ b^3,\ b^{-1}aba^{-4} \rangle. \]

9. Let $G_n$ be given by generators and a single relation as
   \[ G_n = \langle x_1, y_1, \dots, x_n, y_n;\ x_1 y_1 x_1^{-1} y_1^{-1} \cdots x_n y_n x_n^{-1} y_n^{-1} \rangle. \]
   Prove that $G_n/G_n'$ is free abelian of rank $2n$, and conclude that the groups $G_n$ are mutually nonisomorphic as $n$ varies. (Educational note related to topology: The group $G_n$ may be shown to be the fundamental group of a compact orientable 2-dimensional manifold without boundary and with $n$ handles.)

10. Prove that a free group of finite rank $n$ cannot be generated by fewer than $n$ elements.


11. Let $F$ be the free group on generators $a, b, c$, and let $H$ be the subgroup generated by all words of length 2.
   (a) Find coset representatives $g$ such that $F$ is the disjoint union of the cosets $Hg$.
   (b) Find a free basis of $H$.

12. For the free group on generators $x$ and $y$, prove that the elements $y, xyx^{-1}, x^2yx^{-2}, x^3yx^{-3}, \dots$ constitute a free basis of the subgroup that they generate. Conclude that a free group of rank 2 has a free subgroup of infinite rank.

13. Let $G = C_2 * C_2$. Prove that the only quotient groups of $G$, up to isomorphism, are $G$ itself, $\{1\}$, $C_2$, $C_2 \times C_2$, and the dihedral groups $D_n$ for $n \ge 3$.

14. Prove that if every irreducible finite-dimensional representation of a finite group $G$ is 1-dimensional, then $G$ is abelian.

15. Let $G$ be a finitely generated group, and let $H$ be a subgroup of finite index. Prove that $H$ is finitely generated.

16. Let $N$ be an abelian group, let $G$ be a group, let $\tau$ be an action of $G$ on $N$ by automorphisms, and let $n > 0$ be an integer.
   (a) Prove that if every element of $N$ has finite order dividing an integer $m$, then every member of $H^n(G, N)$ has finite order dividing $m$.
   (b) Suppose that $G$ is finite and that $f$ is an $n$-cocycle. Define an $(n-1)$-cochain $F$ by
   \[ F(g_1, \dots, g_{n-1}) = \sum_{g \in G} f(g_1, \dots, g_{n-1}, g). \]
   By summing the cocycle condition for $f$ over the last variable, express $|G| f(g_1, \dots, g_n)$ in terms of $F$, and deduce that $|G| f$ is a coboundary. Conclude that every member of $H^n(G, N)$ has order dividing $|G|$.

17. Let $G$ be a finite group. Suppose that $G$ has a normal abelian subgroup $N$, and suppose that $\mathrm{GCD}(|N|, |G/N|) = 1$. Prove that there exists a subgroup $H$ of $G$ such that $G$ is the semidirect product of $H$ and $N$.

18. Let $N$ be the cyclic group $C_2$, and let $G$ be an arbitrary group of order 4. Identify up to equivalence all group extensions of $N$ by $G$.

19. Let $N = C_2$, and let $E = \bigoplus_{n=1}^{\infty} (C_2 \oplus C_4)$. Regard $E$ as an extension of $N$ in two ways: first by embedding $N$ as one of the summands $C_2$ of $E$ and then by embedding $N$ as a subgroup of one of the summands $C_4$ of $E$. Show that the quotient groups $E/N$ in the two cases are isomorphic, that $E/N$ acts trivially on $N$ in both cases, and that the two group extensions are not equivalent.

Problems 20–22 concern Baer multiplication of extensions. Let $N$ be an abelian group, let $G$ be a group, let $\tau$ be an action of $G$ on $N$ by automorphisms, and let $E_1$ and $E_2$ be two extensions of $N$ by $G$ relative to $\tau$. Write $\varphi_1 : E_1 \to G$ and $\varphi_2 : E_2 \to G$ for the quotient mappings. Let $(E_1, E_2)$ denote the subgroup of all members $(e_1, e_2)$ of $E_1 \times E_2$ for which $\varphi_1(e_1) = \varphi_2(e_2)$. Writing the operation in $N$ multiplicatively, let $Q = \{(x, x^{-1}) \in E_1 \times E_2 \mid x \in N\}$. The Baer product of $E_1$ and $E_2$ is defined to be the quotient $(E_1, E_2)/Q$. A typical coset of the Baer product will be denoted by $(e_1, e_2)Q$.

20. Prove that the homomorphism $x \mapsto (x, 1)Q$ is one-one from $N$ into $(E_1, E_2)/Q$, that the homomorphism $\varphi : (E_1, E_2) \to G$ defined by $\varphi(e_1, e_2) = \varphi_1(e_1)$ has image $G$ and descends to the quotient $(E_1, E_2)/Q$, and that the kernel of the descended $\varphi$ is the embedded copy of $N$. (Therefore $(E_1, E_2)/Q$ is an extension of $N$ by $G$, evidently relative to $\tau$.)

21. For each $u \in G$, select $\bar{u} \in E_1$ and $\tilde{u} \in E_2$ with $\varphi_1(\bar{u}) = u = \varphi_2(\tilde{u})$, and define $a(u, v)$ and $b(u, v)$ for $u$ and $v$ in $G$ by $\bar{u}\bar{v} = a(u, v)\overline{uv}$ and $\tilde{u}\tilde{v} = b(u, v)\widetilde{uv}$. Show that $(\bar{u}, \tilde{u})Q$ has $\varphi((\bar{u}, \tilde{u})Q) = u$ and that the associated 2-cocycle for $(E_1, E_2)/Q$ is $a(u, v)b(u, v)$ if the group operation in $N$ is written multiplicatively.

22. Prove that Baer multiplication descends to a well-defined multiplication of equivalence classes of extensions of $N$ by $G$ relative to $\tau$, in the following sense: Suppose that $E_1$ and $E_1'$ are equivalent extensions and that $E_2$ and $E_2'$ are equivalent extensions. Let $(E_1, E_2)/Q$ and $(E_1', E_2')/Q'$ be the Baer products. Then $(E_1, E_2)/Q$ is equivalent to $(E_1', E_2')/Q'$. Conclude that if Baer multiplication is imposed on equivalence classes of extensions of $N$ by $G$ relative to $\tau$, then the correspondence stated in Proposition 7.40 of equivalence classes to members of $H^2(G, N)$ is a group isomorphism.

Problems 23–24 derive the Poisson summation formula for finite abelian groups. If $G$ is a finite abelian group and $\widehat{G}$ is its group of multiplicative characters, then the Fourier coefficient at $\chi \in \widehat{G}$ of a function $f$ in $C(G, \mathbb{C})$ is $\widehat{f}(\chi) = \sum_{g \in G} f(g)\overline{\chi(g)}$. The Fourier inversion formula in Theorem 7.17 says that $f(g) = |G|^{-1} \sum_{\chi \in \widehat{G}} \widehat{f}(\chi)\chi(g)$.

23. Let $G$ be a finite abelian group, let $H$ be a subgroup, and let $G/H$ be the quotient group. If $t$ is in $G$, write $\dot{t}$ for the coset of $t$ in $G/H$. Let $f$ be in $C(G, \mathbb{C})$, and define $F(\dot{t}) = \sum_{h \in H} f(t + h)$ as a function on $G/H$. Suppose that $\chi$ is a member of $\widehat{G}$ that is identically 1 on $H$, so that $\chi$ descends to a member $\dot{\chi}$ of $\widehat{G/H}$. Prove that $\widehat{f}(\chi) = \widehat{F}(\dot{\chi})$.

24. (Poisson summation formula) With $f$ and $F$ as in the previous problem, apply the Fourier inversion formula for $G/H$ to the function $F$, and derive the formula
   \[ \sum_{h \in H} f(t + h) = \frac{1}{|G/H|} \sum_{\omega \in \widehat{G},\ \omega|_H = 1} \widehat{f}(\omega)\,\omega(t). \]
   (Educational note: This formula is often applied with $t = 0$, in which case it reduces to $\sum_{h \in H} f(h) = \frac{1}{|G/H|} \sum_{\omega \in \widehat{G},\ \omega|_H = 1} \widehat{f}(\omega)$.)
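The identity in Problem 24 can be tested numerically on a small group before attempting the derivation. The sketch below is illustrative only and rests on assumptions not in the text: it takes $G = \mathbb{Z}/12$, $H = \{0, 4, 8\}$, characters $\chi_n(k) = e^{2\pi i nk/12}$, and Fourier coefficients defined with a complex conjugate on the character; the names `chi`, `fourier_coefficient`, and `poisson_sides` are hypothetical.

```python
import cmath

M = 12         # assumed example: G = Z/12 under addition
H = [0, 4, 8]  # assumed example: the subgroup of order 3


def chi(n, k):
    """The character chi_n evaluated at k: exp(2 pi i n k / M)."""
    return cmath.exp(2j * cmath.pi * n * k / M)


def fourier_coefficient(f, n):
    """Sum over g of f(g) times the conjugate of chi_n(g)."""
    return sum(f[g] * chi(n, g).conjugate() for g in range(M))


def poisson_sides(f, t):
    """Return (left side, right side) of the Poisson summation formula at t."""
    lhs = sum(f[(t + h) % M] for h in H)
    # characters that are identically 1 on H
    trivial_on_h = [n for n in range(M)
                    if all(abs(chi(n, h) - 1) < 1e-9 for h in H)]
    index = M // len(H)  # |G/H|
    rhs = sum(fourier_coefficient(f, n) * chi(n, t) for n in trivial_on_h) / index
    return lhs, rhs


if __name__ == "__main__":
    sample = [complex(t, (t * t) % 5) for t in range(M)]
    for t in range(M):
        left, right = poisson_sides(sample, t)
        print(t, abs(left - right) < 1e-9)
```

In this example the characters trivial on $H$ are $\chi_0, \chi_3, \chi_6, \chi_9$, so their number equals $|G/H| = 4$, as the formula requires.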


Problems 25–28 continue the introduction to error-correcting codes begun in Problems 63–73 at the end of Chapter IV, combining those results with the Poisson summation formula in the problems above and with notions from Section VI.1. Let $\mathbb{F}$ be the field $\mathbb{Z}/2\mathbb{Z}$, and form the Hamming space $\mathbb{F}^n$. Define a nondegenerate bilinear form on $\mathbb{F}^n$ by $(a, c) = \sum_{i=1}^{n} a_i c_i$ for $a$ and $c$ in $\mathbb{F}^n$. Recall from Chapter IV that a linear code $C$ is a vector subspace of $\mathbb{F}^n$. For such a $C$, let $C^\perp$ as in Section VI.1 be the set of all $a \in \mathbb{F}^n$ such that $(a, c) = 0$ for all $c \in C$; the linear code $C^\perp$ is called the dual code. A linear code is self dual if $C^\perp = C$.

25. (a) Show that the codes 0 and $\mathbb{F}^n$ are dual to each other.
   (b) Show that the repetition code and the parity-check code are dual to each other.
   (c) Show that the Hamming code of order 8 is self dual.
   (d) Show that any self-dual linear code $C$ has $\dim C = n/2$, and conclude that the Hamming code of order $2^r$ with $r > 3$ is not self dual.
   (e) Show that any member $c$ of a self-dual linear code $C$ has even weight.
   (f) Show that if a linear code $C$ has $C \subseteq C^\perp$ and if every member $c$ of $C$ has even weight, then $c \mapsto \tfrac{1}{2}\mathrm{wt}(c) \bmod 2$ is a group homomorphism of $C$ into $\mathbb{Z}/2\mathbb{Z}$. Here $\mathrm{wt}(c)$ denotes the weight of $c$.

26. Regard $\mathbb{F}^n$ as an additive group $G$ to which the Fourier inversion formula of Section 4 can be applied.
   (a) Show that one can map $\widehat{G}$ to $\mathbb{F}^n$ by $\chi \mapsto a_\chi$ with $\chi(c) = (-1)^{(a_\chi, c)}$ and that the result is a group isomorphism. (Therefore if $f$ is in $C(\mathbb{F}^n, \mathbb{C})$, we can henceforth regard $\widehat{f}$ as a function on $\mathbb{F}^n$.)
   (b) Show under the identification in (a) that if $f$ is in $C(\mathbb{F}^n, \mathbb{C})$, then $\widehat{f}(a) = \sum_{c \in \mathbb{F}^n} f(c)(-1)^{(a, c)}$ for $a$ in $\mathbb{F}^n$.
   (c) Suppose that the function $f \in C(\mathbb{F}^n, \mathbb{C})$ is of the special form $f(c) = \prod_{i=1}^{n} f_i(c_i)$ whenever $c = (c_1, \dots, c_n)$. Here each $f_i$ is a function on the 2-element group $\mathbb{F}$. Prove that $\widehat{f}(a) = \prod_{i=1}^{n} \widehat{f_i}(a_i)$ whenever $a = (a_1, \dots, a_n)$. Here $\widehat{f_i}$ is given by the formula of (b) for the case $n = 1$: $\widehat{f_i}(a_i) = \sum_{c_i \in \mathbb{F}} f_i(c_i)(-1)^{a_i c_i}$.

27. Fix two complex numbers $x$ and $y$. Define $f_0 : \mathbb{F} \to \mathbb{C}$ to be the function with $f_0(0) = x$ and $f_0(1) = y$. Define $f : \mathbb{F}^n \to \mathbb{C}$ to be the function with $f(c) = \prod_{i=1}^{n} f_0(c_i) = x^{n - \mathrm{wt}(c)} y^{\mathrm{wt}(c)}$, where $\mathrm{wt}(c)$ is the weight of $c$.
   (a) Show that $\widehat{f_0}(0) = x + y$ and $\widehat{f_0}(1) = x - y$.
   (b) Show that $\widehat{f}(a) = (x + y)^{n - \mathrm{wt}(a)} (x - y)^{\mathrm{wt}(a)}$.

28. Let $C$ be a linear code in $\mathbb{F}^n$. Take $G$ to be the additive group of $\mathbb{F}^n$ and $H$ to be the additive group of $C$. Regard $C^\perp$ as an additive group also.
   (a) Map $\widehat{G/H}$ to $C^\perp$ by $\dot{\chi} \mapsto a_\chi$ with $\chi(c) = (-1)^{(a_\chi, c)}$. Show that this mapping is a group isomorphism.


   (b) Applying the Poisson summation formula of Problem 24, prove that
   \[ \sum_{h \in C} f(h) = \frac{1}{|C^\perp|} \sum_{a \in C^\perp} \widehat{f}(a) \]
   for all $f$ in $C(\mathbb{F}^n, \mathbb{C})$.
   (c) (MacWilliams identity) Let $W_C(X, Y) = \sum_{k=0}^{n} N_k(C) X^{n-k} Y^k$, where $N_k(C)$ is the number of members of $C$ with weight $k$, be the weight-enumerator polynomial of $C$, and let $W_{C^\perp}(X, Y)$ be defined similarly. By applying (b) to the function $f$ in the previous problem, prove that $W_C(x, y) = |C^\perp|^{-1} W_{C^\perp}(x + y, x - y)$ for each $x$ and $y$. Conclude from Corollary 4.32 that weight-enumerator polynomials satisfy $W_C(X, Y) = |C^\perp|^{-1} W_{C^\perp}(X + Y, X - Y)$.
   (d) The polynomials $W_C(X, Y)$ were seen in Chapter IV to be $X^n$ for the 0 code, $(X + Y)^n$ for the code $\mathbb{F}^n$, $X^n + Y^n$ for the repetition code, $\tfrac{1}{2}((X + Y)^n + (X - Y)^n)$ for the parity-check code, and $X^8 + 14X^4Y^4 + Y^8$ for the Hamming code of order 8. Using relationships established in Problem 25, verify the result of (c) for each of these codes.
   (e) Suppose that $C$ is a self-dual linear code. Applying (c) in this case, exhibit $W_C(X, Y)$ as being invariant under a copy of the dihedral group $D_8$ of order 16. (Educational note: If the polynomial $W_C(X, Y)$ is invariant also under $X \mapsto iX$, as is true for the Hamming code of order 8, then $W_C(X, Y)$ is invariant under the group generated by $D_8$ and this transformation, which can be shown to have order 192.)

Problems 29–31 concern an unexpectedly fast method of computation of Fourier coefficients in the context of finite abelian groups, particularly in the context of cyclic groups. They show for a cyclic group of order $m = pq$ that the use of the idea behind the Poisson summation formula of Problem 24 makes it possible to compute the Fourier coefficients of a function in about $pq(p + q)$ steps rather than the expected $m^2 = p^2q^2$ steps. This savings may be iterated in the case of a cyclic group of order $2^n$ so that the Fourier coefficients are computed in about $n2^n$ steps rather than the expected $2^{2n}$ steps.
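The two-factor saving just described can be illustrated with a short sketch. It is not the book's algorithm verbatim but one plausible reading of it, under stated assumptions: the group is $\mathbb{Z}/m$ with $m = pq$, the subgroup is $H = \{0, q, \dots, (p-1)q\}$, Fourier coefficients use the conjugate character $e^{-2\pi i nt/m}$, and the function names `dft_direct` and `dft_two_factor` are hypothetical. For each residue $k$ modulo $p$, the function is twisted by a character, summed over cosets of $H$, and then transformed on the quotient of order $q$.

```python
import cmath


def dft_direct(f):
    """All m Fourier coefficients from the definition: about m^2 steps."""
    m = len(f)
    return [sum(f[t] * cmath.exp(-2j * cmath.pi * n * t / m) for t in range(m))
            for n in range(m)]


def dft_two_factor(f, p, q):
    """Fourier coefficients on Z/(pq) in about pq(p + q) steps."""
    m = p * q
    if len(f) != m:
        raise ValueError("len(f) must equal p * q")
    out = [0j] * m
    for k in range(p):
        # Twist f by the conjugate of chi_k, then sum over each coset s + H,
        # where H = {0, q, ..., (p-1)q}.  This costs about p*q steps.
        coset_sums = [
            sum(f[s + h * q] * cmath.exp(-2j * cmath.pi * k * (s + h * q) / m)
                for h in range(p))
            for s in range(q)
        ]
        # A direct transform on the quotient Z/q (about q^2 steps) gives the
        # coefficients of f at the indices n = j*p + k.
        for j in range(q):
            out[j * p + k] = sum(
                coset_sums[s] * cmath.exp(-2j * cmath.pi * j * s / q)
                for s in range(q)
            )
    return out


if __name__ == "__main__":
    sample = [complex((t * t) % 7, t % 3) for t in range(12)]
    slow = dft_direct(sample)
    fast = dft_two_factor(sample, 3, 4)
    print(max(abs(a - b) for a, b in zip(slow, fast)) < 1e-9)
```

The factors $p$ and $q$ need not be relatively prime; for instance `dft_two_factor(f, 2, 2)` handles a function on $\mathbb{Z}/4$. Replacing the inner direct transform by a recursive call is the iteration that leads to the fast Fourier transform.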
An organized algorithm to implement this method of computation is known as the fast Fourier transform. Write the cyclic group $C_m$ as the set $\{0, 1, 2, \dots, m-1\}$ of integers modulo $m$ under addition, and let $\zeta_m = e^{2\pi i/m}$. For $k$ in $C_m$ define a multiplicative character $\chi_n$ of $C_m$ by $\chi_n(k) = (\zeta_m^n)^k$. The resulting $m$ multiplicative characters satisfy $\chi_n\chi_{n'} = \chi_{n+n'}$, and they exhaust $\widehat{C_m}$ since distinct multiplicative characters are orthogonal. It will be convenient to identify $\chi_n$ with $\chi_n(1) = \zeta_m^n$.

29. In the setting of Problem 23, suppose that $G = C_m$ with $m = pq$; here $p$ and $q$ need not be relatively prime. Let $H = \{0, q, 2q, \dots, (p-1)q\}$ be the subgroup of $G$ isomorphic to $C_p$, so that $G/H = \{0, 1, 2, \dots, q-1\}$ is isomorphic to $C_q$. Prove that the characters $\chi$ of $G$ identified with $\zeta_m^0, \zeta_m^p, \zeta_m^{2p}, \dots, \zeta_m^{(q-1)p}$ are the ones that are identically 1 on $H$ and therefore descend to characters of $G/H$. Verify that the descended characters $\dot{\chi}$ are the ones identified with $\zeta_q^0, \zeta_q^1, \zeta_q^2, \dots, \zeta_q^{q-1}$. Consequently the formula $\widehat{f}(\chi) = \widehat{F}(\dot{\chi})$ of Problem 23 provides a way of computing $\widehat{f}$ at $\zeta_m^0, \zeta_m^p, \zeta_m^{2p}, \dots, \zeta_m^{(q-1)p}$ from the values of $F$. Show that if $\widehat{F}$ is computed from the definition of Fourier coefficients, then the number of steps involved in its computation is about $q^2$, apart from a constant factor. Show that the total number of steps in computing $\widehat{f}$ at these special values of $\chi$ is therefore on the order of $q^2 + pq$.

30. In the previous problem show for each $k$ with $0 \le k \le p-1$ that the value of $\widehat{f}$ at $\zeta_m^k, \zeta_m^{p+k}, \zeta_m^{2p+k}, \dots, \zeta_m^{(q-1)p+k}$ can be handled in the same way with a different $F$ by replacing $f$ by a suitable variant of $f$. Doing so for each $k$ requires $p$ times the number of steps detected in the previous problem, and therefore all of $\widehat{f}$ can be computed in about $p(q^2 + pq) = pq(p + q)$ steps.

31. Show how iteration of this process to compute the Fourier coefficients of each $F$, together with further iteration of this process, allows one to compute the Fourier coefficients for a function on $C_{m_1 m_2 \cdots m_r}$ in about $m_1 m_2 \cdots m_r (m_1 + m_2 + \cdots + m_r)$ steps.

Problems 32–36 concern contragredient representations and the decomposition of the left regular representation of a finite group $G$. They make use of Problems 24–28 in Chapter III, which introduce the complex conjugate $\overline{V}$ of a complex vector space $V$. In the case that $V$ is an inner-product space, those problems define $(u, v)_{\overline{V}} = (v, u)_V$, and they show that if $\ell_v \in V'$ is given by $\ell_v(u) = (u, v)_V = (v, u)_{\overline{V}}$, then the mapping $v \leftrightarrow \ell_v$ is an isomorphism of $\overline{V}$ with $V'$.

32. Show that the definition $(\ell_{v_1}, \ell_{v_2})_{V'} = (v_1, v_2)_{\overline{V}}$ makes the isomorphism of $\overline{V}$ with $V'$ preserve inner products.

33. If $R$ is a unitary representation of $G$ on the finite-dimensional complex vector space $V$, define the contragredient representation $R^c$ of $G$ on $V'$ by $R^c(x) = R(x^{-1})^t$. Prove that $R^c(x)\ell_v = \ell_{R(x)v}$ and that $R^c$ is unitary on $V'$.

34. Show that the matrix coefficients of $R^c$ are the complex conjugates of those of $R$ and that the characters satisfy $\chi_{R^c} = \overline{\chi_R}$.

35. Give an example of an irreducible representation of a finite group $G$ that is not equivalent to its contragredient.

36. Let $\ell$ be the left regular representation of $G$ on $C(G, \mathbb{C})$, and let $V_R$ be the linear span in $C(G, \mathbb{C})$ of the matrix coefficients of an irreducible representation $R$ of dimension $d$. Prove that the representation $(\ell, V_R)$ of $G$ is equivalent to the direct sum of $d$ copies of the contragredient $R^c$.

Problems 37–46 concern the free product $C_2 * C_3$ and its quotients. The problems make use of the group of matrices $SL(2, \mathbb{Z}/m\mathbb{Z})$ of determinant 1 over the commutative ring $\mathbb{Z}/m\mathbb{Z}$, as discussed in Section V.2. One of the quotients of $C_2 * C_3$ will be $PSL(2, \mathbb{Z}) = SL(2, \mathbb{Z})/\{\text{scalar matrices}\}$, and these problems show that the quotient mapping can be arranged to be an isomorphism. Other quotients will be the groups $G_m = \langle X, Y;\ X^2, Y^3, (XY)^m \rangle$ with $m \ge 2$. These arise in connection with tilings in 2-dimensional geometry. The isomorphism $C_2 * C_3 \cong PSL(2, \mathbb{Z})$ leads to a homomorphism that will be called $\sigma_m$ carrying $G_m$ onto $PSL(2, \mathbb{Z}/m\mathbb{Z}) = SL(2, \mathbb{Z}/m\mathbb{Z})/\{\text{scalar matrices}\}$, the image group being finite. The problems show that the homomorphism $\sigma_m : G_m \to PSL(2, \mathbb{Z}/m\mathbb{Z})$ is an isomorphism for the cases in which $G_m$ arises from spherical geometry, namely for $2 \le m \le 5$, and that the homomorphism is not an isomorphism for $m = 6$, the case in which $G_m$ arises from Euclidean geometry.

37. Show that the elements $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ and $\begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}$ generate $SL(2, \mathbb{Z})$ by arguing as follows: if the subgroup $\Gamma$ of $SL(2, \mathbb{Z})$ generated by these two elements is not $SL(2, \mathbb{Z})$, choose an element $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ outside $\Gamma$ having $\max(|a|, |b|)$ as small as possible, and derive a contradiction by showing that a suitable right multiple of it by elements of $\Gamma$ is in $\Gamma$.

38. By mapping $X \mapsto x = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \bmod \pm I$ and $Y \mapsto y = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix} \bmod \pm I$, produce a group homomorphism $\Phi$ of $C_2 * C_3 = \langle X, Y;\ X^2, Y^3 \rangle$ onto $PSL(2, \mathbb{Z})$.

39. Let $x$, $y$, and $\Phi : C_2 * C_3 \to PSL(2, \mathbb{Z})$ be as in the previous problem.
   (a) For any member $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \bmod \pm I$ of $PSL(2, \mathbb{Z})$, define $\mu\left(\begin{pmatrix} a & b \\ c & d \end{pmatrix} \bmod \pm I\right) = \max(|a|, |b|)$ and $\nu\left(\begin{pmatrix} a & b \\ c & d \end{pmatrix} \bmod \pm I\right) = \max(|c|, |d|)$. Prove that if $z = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \bmod \pm I$ in $PSL(2, \mathbb{Z})$ has $ab \le 0$, then $\mu(zyx) \ge \mu(z)$ and $\mu(zy^{-1}x) \ge \mu(z)$, while if $cd \le 0$, then $\nu(zyx) \ge \nu(z)$ and $\nu(zy^{-1}x) \ge \nu(z)$.
   (b) Prove that $\mu(zx) = \mu(z)$ and $\nu(zx) = \nu(z)$ for all $z$ in $PSL(2, \mathbb{Z})$.
   (c) Show that there are only 10 members $z$ of $PSL(2, \mathbb{Z})$ for which the two conditions $\mu(z) = 1$ and $\nu(z) = 1$ both hold.
   (d) A reduced word in $C_2 * C_3$ is a finite sequence of factors $X$, $Y$, and $Y^{-1}$, with no two consecutive factors equal and with no two consecutive factors $YY^{-1}$ or $Y^{-1}Y$. Prove for any reduced word $a_1 \cdots a_n$ in $C_2 * C_3$, where each $a_j$ is one of $X$, $Y$, and $Y^{-1}$, that $\mu(\Phi(a_1 \cdots a_n)) \ge \mu(\Phi(a_1 \cdots a_{n-1}))$ and that $\nu(\Phi(a_1 \cdots a_n)) \ge \nu(\Phi(a_1 \cdots a_{n-1}))$.
   (e) Deduce that the homomorphism $\Phi$ is an isomorphism.

40. Let $\Gamma(m)$ be the group of all matrices $M$ in $SL(2, \mathbb{Z})$ such that every entry of $M - I$ is divisible by $m$.
   (a) Prove that passage from a matrix in $SL(2, \mathbb{Z})$ to the same matrix with its entries considered modulo $m$ gives a homomorphism $\tilde{\sigma}_m : SL(2, \mathbb{Z}) \to SL(2, \mathbb{Z}/m\mathbb{Z})$ with $\ker \tilde{\sigma}_m = \Gamma(m)$.


   (b) Prove that if $\alpha$, $\beta$, and $m$ are positive integers with $\mathrm{GCD}(\alpha, \beta, m) = 1$, then there exists an integer $r$ such that $\mathrm{GCD}(\alpha + mr, \beta) = 1$. (One way of proceeding is to use Dirichlet's theorem on primes in arithmetic progressions.)
   (c) Prove that $\mathrm{image}\ \tilde{\sigma}_m = SL(2, \mathbb{Z}/m\mathbb{Z})$, i.e., $\tilde{\sigma}_m$ is onto.

41. Let $\Phi_m : C_2 * C_3 \to G_m$ be the homomorphism defined by the conditions $X \mapsto X$ and $Y \mapsto Y$. Let $H_m$ be the smallest normal subgroup of $PSL(2, \mathbb{Z})$ containing $(xy)^m \bmod \pm I$. Let $\tilde{\sigma}_m : SL(2, \mathbb{Z}) \to SL(2, \mathbb{Z}/m\mathbb{Z})$ be the homomorphism of the previous problem.
   (a) Why is $\Phi_m$ well defined?
   (b) Why is $H_m = \Phi(\ker \Phi_m)$?
   (c) Define $PSL(2, \mathbb{Z}/m\mathbb{Z}) = SL(2, \mathbb{Z}/m\mathbb{Z})/\{\text{scalar matrices}\}$. Why does the composition of $\tilde{\sigma}_m$ followed by passage to the quotient descend to a homomorphism $\sigma_m$ of $PSL(2, \mathbb{Z})$ onto $PSL(2, \mathbb{Z}/m\mathbb{Z})$?
   (d) If $K_m \subseteq PSL(2, \mathbb{Z})$ is the kernel of $\sigma_m$, why is $H_m \subseteq K_m$?
   (e) Show that if $t$ is any integer, then the following members of $K_m$ lie in the subgroup $H_m$:
   \[ \begin{pmatrix} 1 & tm \\ 0 & 1 \end{pmatrix} \bmod \pm I, \qquad \begin{pmatrix} 1 & 0 \\ tm & 1 \end{pmatrix} \bmod \pm I, \qquad \text{and} \qquad \begin{pmatrix} 1 + tm & tm \\ -tm & 1 - tm \end{pmatrix} \bmod \pm I. \]

42. With $G_m$ defined as above, exhibit homomorphisms of various groups $G_m$ onto the following finite groups:
   (a) $S_3$ when $m = 2$ by sending $X \mapsto (1\ 2)$ and $Y \mapsto (1\ 2\ 3)$.
   (b) $A_4$ when $m = 3$ by sending $X \mapsto (1\ 2)(3\ 4)$ and $Y \mapsto (1\ 2\ 3)$.
   (c) $S_4$ when $m = 4$ by sending $X \mapsto (1\ 2)$ and $Y \mapsto (2\ 3\ 4)$.
   (d) $A_5$ when $m = 5$ by sending $X \mapsto (1\ 2)(3\ 4)$ and $Y \mapsto (1\ 3\ 5)$.

43. This problem shows how to prove that $H_m = K_m$ for $2 \le m \le 5$, and it asks that the steps be carried out for $m = 2$ and $m = 3$. Recall from the remark with Lemma 7.11 that Lemma 7.11 is valid for all groups in determining a set of generators of a subgroup from generators of the whole group and a system of coset representatives. The lemma is to be applied to the group $PSL(2, \mathbb{Z})$ and the subgroup $K_m$. Generators of $PSL(2, \mathbb{Z})$ are taken as $b_1 = x \bmod \pm I$ and $b_2 = y \bmod \pm I$.
   (a) For the case $m = 2$, find members $g_1, \dots, g_6$ of $PSL(2, \mathbb{Z})$ such that the six cosets of $PSL(2, \mathbb{Z})/K_2$ are exactly $K_2 g_1, \dots, K_2 g_6$.
   (b) Still for the case $m = 2$, find $g_j b_i \rho(g_j b_i)^{-1}$ for $1 \le i \le 2$ and $1 \le j \le 6$. Lemma 7.11 says that these 12 elements generate $K_2$.
   (c) Using Problem 41e and any necessary variations of it, show that each of the 12 generators of $K_2$ in (b) lies in the subgroup $H_2$, and conclude that $H_2 = K_2$.


   (d) Repeat steps (a), (b), and (c) for $m = 3$. There are 12 cosets $K_3 g_j$ of $PSL(2, \mathbb{Z})/K_3$. (Educational note: There are 24 cosets for $PSL(2, \mathbb{Z})/K_4$ and 60 cosets for $PSL(2, \mathbb{Z})/K_5$.)

44. Take for granted that $H_m = K_m$ for $2 \le m \le 5$. Deduce the isomorphisms
   (a) $G_2 \cong PSL(2, \mathbb{Z}/2\mathbb{Z}) \cong S_3$.
   (b) $G_3 \cong PSL(2, \mathbb{Z}/3\mathbb{Z}) \cong A_4$. (This group is called the tetrahedral group.)
   (c) $G_4 \cong PSL(2, \mathbb{Z}/4\mathbb{Z}) \cong S_4$. (This group is called the octahedral group.)
   (d) $G_5 \cong PSL(2, \mathbb{Z}/5\mathbb{Z}) \cong A_5$. (This group is called the icosahedral group.)

45. A translation in the Euclidean plane $\mathbb{R}^2$ is any function $T_{(a,b)}(x, y) = (a + x, b + y)$, the rotation about the origin clockwise through the angle $\theta$ is the linear map $R_\theta$ given by the matrix $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$, and the rotation about $(x_0, y_0)$ clockwise through the angle $\theta$ is the linear map given by $(x, y) \mapsto R_\theta(x - x_0, y - y_0) + (x_0, y_0)$.
   (a) Prove that $R_\theta T_{(a,b)} R_\theta^{-1} = T_{R_\theta(a,b)}$.
   (b) Prove that the union of the set of translations and all the sets of rotations about points of $\mathbb{R}^2$ is a group by showing that it is the semidirect product of the subgroup of rotations about the origin and the normal subgroup of translations.

46. Fix a triangle $T$ in the Euclidean plane with vertices arranged counterclockwise at $a, b, c$ and with angles $\pi/2$ at $a$, $\pi/3$ at $b$, and $\pi/6$ at $c$. Let $r_a$ be rotation clockwise through $\pi$ at $a$, $r_b$ be rotation clockwise through $2\pi/3$ at $b$, and $r_c$ be rotation counterclockwise through $\pi/3$ at $c$.
   (a) Show that $r_a^2 = 1$, $r_b^3 = 1$, $r_c^6 = 1$, and $r_c = r_a r_b$.
   (b) Show that the member $r_b r_a r_b r_a r_b$ of the group generated by $r_a$ and $r_b$ is a nontrivial translation and therefore that the generated group is infinite.
   (c) Conclude that $G_6 \not\cong PSL(2, \mathbb{Z}/6\mathbb{Z})$. (Educational note: If $\widetilde{T}$ denotes the union of $T$ and the reflection of $T$ in one of the sides of $T$, it can be shown that the group generated by $r_a$ and $r_b$ is isomorphic to $G_6$ and tiles the plane with copies of $\widetilde{T}$.)

Problems 47–52 establish a harmonic analysis for arbitrary representations of finite groups on complex vector spaces, whether finite-dimensional or infinite-dimensional. Let $G$ be a finite group, and let $V$ be a complex vector space. For any representation $R$ of $G$ on $V$, one defines $R(f)v = \sum_{x \in G} f(x)R(x)v$ for $f$ in $C(G, \mathbb{C})$ and $v$ in $V$, just as in the case that $V$ is finite-dimensional. The same computation as in Section VII.4 shows that the formula $R(f_1 * f_2) = R(f_1)R(f_2)$ remains valid when $V$ is infinite-dimensional.

47. Let $(R_1, V_1)$ and $(R_2, V_2)$ be irreducible finite-dimensional representations of $G$ on complex vector spaces, and let $\chi_{R_1}$ and $\chi_{R_2}$ be their characters. Using Schur orthogonality, prove that
   (a) $\chi_{R_1} * \chi_{R_2} = 0$ if $R_1$ and $R_2$ are inequivalent,
   (b) $\chi_{R_1} * \chi_{R_1} = |G|\, d_{R_1}^{-1} \chi_{R_1}$, where $d_{R_1} = \dim V_1$.

48. With $(R, V)$ given, let $(R_\alpha, V_\alpha)$ be any irreducible finite-dimensional representation of $G$, and define $E_\alpha : V \to V$ by $E_\alpha = |G|^{-1} d_\alpha R(\overline{\chi_\alpha})$, where $\chi_\alpha$ is the character of $R_\alpha$ and where $d_\alpha = \dim V_\alpha$.
   (a) Prove that $E_\alpha^2 = E_\alpha$.
   (b) Prove that $E_\alpha E_\beta = E_\beta E_\alpha = 0$ if $(R_\beta, V_\beta)$ is an irreducible finite-dimensional representation of $G$ such that $R_\alpha$ and $R_\beta$ are inequivalent.

49. Observe for each $v$ in $V$ that $\{R(x)v \mid x \in G\}$ spans a finite-dimensional invariant subspace of $V$. By Corollary 7.21, each $v$ in $V$ lies in a finite direct sum of finite-dimensional invariant subspaces of $V$ on each of which $R$ acts irreducibly. Using Zorn's Lemma, prove that $V$ is the direct sum of finite-dimensional subspaces on each of which $R$ acts irreducibly. (If $V$ is infinite-dimensional, there will of course be infinitely many such subspaces.)

50. Suppose that $V_0$ is a finite-dimensional invariant subspace of $V$ such that $R|_{V_0}$ is equivalent to some $R_\alpha$, where $R_\alpha$ is as in Problem 48. Prove that $E_\alpha$ is the identity on $V_0$.

51. Deduce that if $\{(R_\beta, V_\beta)\}$ is a maximal collection of inequivalent finite-dimensional irreducible representations of $G$, then $\sum_\beta E_\beta = I$ on $V$ and the image of $E_\alpha$ is the set of all sums of vectors in $V$ lying in some finite-dimensional invariant subspace $V_0$ of $V$ such that $R|_{V_0}$ is equivalent to $R_\alpha$. (Educational note: Consequently $V$ is exhibited as the finite direct sum of the spaces $\mathrm{image}\ E_\alpha$, each space $\mathrm{image}\ E_\alpha$ is the direct sum of finite-dimensional irreducible invariant subspaces, and the restriction of $R$ to any finite-dimensional irreducible invariant subspace of $\mathrm{image}\ E_\alpha$ is equivalent with $R_\alpha$.)

52. Suppose that $(R_\alpha, V_\alpha)$ is a 1-dimensional representation of $G$ given by a multiplicative character $\omega$. Prove that the image of $E_\alpha$ consists of all vectors $v$ in $V$ such that $R(x)v = \omega(x)v$ for all $x$ in $G$.
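The operators $E_\alpha$ of Problems 48–52 can be examined concretely in a small case. The sketch below makes assumptions beyond the text: $G = \mathbb{Z}/4$ acts on $\mathbb{C}^4$ by cyclically shifting coordinates, every irreducible representation is a character $\omega_a(x) = e^{2\pi i ax/4}$ with $d_a = 1$, and $E_a$ is formed with the complex conjugate of the character. The helper names `shift`, `matmul`, `omega`, `projection`, and `close` are hypothetical.

```python
import cmath

N = 4  # assumed example: G = Z/4 acting on C^4 by cyclic shifts


def shift(x):
    """Matrix of R(x), which sends the basis vector e_j to e_{(j + x) mod N}."""
    return [[1 if (j + x) % N == i else 0 for j in range(N)] for i in range(N)]


def matmul(a, b):
    """Product of two N-by-N matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def omega(a, x):
    """The multiplicative character omega_a of Z/N evaluated at x."""
    return cmath.exp(2j * cmath.pi * a * x / N)


def projection(a):
    """E_a = |G|^{-1} * sum over x of conj(omega_a(x)) R(x), with d_a = 1."""
    e = [[0j] * N for _ in range(N)]
    for x in range(N):
        r = shift(x)
        c = omega(a, x).conjugate() / N
        for i in range(N):
            for j in range(N):
                e[i][j] += c * r[i][j]
    return e


def close(a, b, tol=1e-9):
    """Entrywise comparison of two N-by-N matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(N) for j in range(N))


if __name__ == "__main__":
    projections = [projection(a) for a in range(N)]
    identity = [[1 if i == j else 0 for j in range(N)] for i in range(N)]
    total = [[sum(p[i][j] for p in projections) for j in range(N)] for i in range(N)]
    print(close(total, identity))
    print(all(close(matmul(p, p), p) for p in projections))
```

In this example each $E_a$ is idempotent, distinct $E_a$ multiply to 0, the $E_a$ sum to the identity, and $R(x)E_a = \omega_a(x)E_a$, which is the behavior that Problems 48, 51, and 52 ask to be proved in general.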

CHAPTER VIII

Commutative Rings and Their Modules

Abstract. This chapter amplifies the theory of commutative rings that was begun in Chapter IV, and it introduces modules for any ring. Emphasis is on the topic of unique factorization.

Section 1 gives many examples of rings, some commutative and some noncommutative, and introduces the notion of a module for a ring.

Sections 2–4 discuss some of the tools related to questions of factorization in integral domains. Section 2 defines the field of fractions for an integral domain and gives its universal mapping property. Section 3 defines prime and maximal ideals and relates quotients of them to integral domains and fields. Section 4 introduces principal ideal domains, which are shown to have unique factorization, and it defines Euclidean domains as a special kind of principal ideal domain for which greatest common divisors can be obtained constructively.

Section 5 proves that if R is an integral domain with unique factorization, then so is the polynomial ring R[X]. This result is a consequence of Gauss's Lemma, which addresses what happens to the greatest common divisor of the coefficients when one multiplies two members of R[X]. Gauss's Lemma has several other consequences that relate factorization in R[X] to factorization in F[X], where F is the field of fractions of R. Still another consequence is Eisenstein's irreducibility criterion, which gives a sufficient condition for a member of R[X] to be irreducible.

Section 6 contains the theorem that every finitely generated unital module over a principal ideal domain is a direct sum of cyclic modules. The cyclic modules may be assumed to be primary in a suitable sense, and then the isomorphism types of the modules appearing in the direct-sum decomposition, together with their multiplicities, are uniquely determined. The main results transparently generalize the Fundamental Theorem for Finitely Generated Abelian Groups, and less transparently they generalize the existence and uniqueness of Jordan canonical form for square matrices with entries in an algebraically closed field.

Sections 7–11 contain foundational material related to factorization for the two subjects of algebraic number theory and algebraic geometry. Both these subjects rely heavily on the theory of commutative rings. Section 7 is a section of motivation, showing the analogy between a situation in algebraic number theory and a situation in algebraic geometry. Sections 8–10 introduce Noetherian rings, integral closures, and localizations. Section 11 uses this material to establish unique factorization of ideals for Dedekind domains, as well as some other properties.

1. Examples of Rings and Modules

Sections 4–5 of Chapter IV introduced rings and fields, giving a small number of examples of each. In the present section we begin by recalling those examples and giving further ones. Although Chapters VI and VII are not prerequisite for the present chapter, our list of examples will include some rings and fields that arose in those two chapters.

The theory to be developed in this chapter is intended to apply to commutative rings, especially to questions related to unique factorization in such rings. Despite this limitation it seems wise to include examples of noncommutative rings in the list below.

In the conventions of this book, a ring need not have an identity. Many rings that arise only in the subject of algebra have an identity, but there are important rings in the subject of real analysis that do not. From the point of view of category theory, one therefore distinguishes between the category of all rings, with ring homomorphisms as morphisms, and the category of all rings with identity, with ring homomorphisms carrying 1 to 1 as morphisms. In the latter case one may want to exclude the zero ring from being an object in the category under certain circumstances.

EXAMPLES OF RINGS.

(1) Basic commutative rings from Chapter IV. All of the structures $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$, $\mathbb{Z}/m\mathbb{Z}$, and $2\mathbb{Z}$ are commutative rings. All but the last have an identity. Of these, $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ are fields, and so is $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$ if $p$ is a prime number. The others are not fields.

(2) Polynomial rings. Let $R$ be a nonzero commutative ring with identity. In Section IV.5 we defined the commutative ring $R[X_1, \dots, X_n]$ of polynomials over $R$ in $n$ indeterminates. It has a universal mapping property with respect to substitution for the indeterminates and use of a homomorphism on the coefficients. Making substitutions from $R$ itself and mapping the coefficients by the identity homomorphism, we are led to the ring of all functions $(r_1, \dots, r_n) \mapsto f(r_1, \dots, r_n)$ for $r_1, \dots, r_n$ in $R$ and $f(X_1, \dots, X_n)$ in $R[X_1, \dots, X_n]$; this is called the ring of all polynomial functions in $n$ variables on $R$. Polynomials may be considered also in infinitely many variables, but we did not treat this case in any detail.
(3) Matrix rings over commutative rings. Let $R$ be a nonzero commutative ring with identity. The set $M_n(R)$ of all $n$-by-$n$ matrices with entries in $R$ is a ring under entry-by-entry addition and the usual definition of matrix multiplication: $(AB)_{ij} = \sum_{k=1}^{n} A_{ik}B_{kj}$. It has an identity, namely the identity matrix $I$ with $I_{ij} = \delta_{ij}$. In this setting, Section V.2 introduced a theory of determinants, and it was proved that a matrix has a one-sided inverse if and only if it has a two-sided inverse, if and only if its determinant is a member of the group $R^\times$ of units in $R$, i.e., elements of $R$ invertible under multiplication. The matrix ring $M_n(R)$ is always noncommutative if $n > 1$.

(4) Matrix rings over noncommutative rings. If $R$ is any ring, we can still make the set $M_n(R)$ of all $n$-by-$n$ matrices with entries in $R$ into a ring. However, if $R$ has no identity, $M_n(R)$ will have no identity. The theory of determinants does not directly apply if $R$ is noncommutative or if $R$ fails to have an identity,¹ and as a consequence, questions about the invertibility of matrices are more subtle than with the previous example.

(5) Spaces of linear maps from a vector space into itself. Let $V$ be a vector space over a field $K$. The vector space $\mathrm{End}_K(V) = \mathrm{Hom}_K(V, V)$ of all $K$ linear maps from $V$ to itself is initially a vector space over $K$. Composition provides a multiplication that makes $\mathrm{End}_K(V)$ into a ring with identity. In fact, associativity of multiplication is automatic for any kind of function, and so is the distributive law $(L_1 + L_2)L_3 = L_1L_3 + L_2L_3$. The distributive law $L_1(L_2 + L_3) = L_1L_2 + L_1L_3$ follows from the fact that $L_1$ is linear. This ring is isomorphic as a ring to $M_n(K)$ if $V$ is $n$-dimensional, an isomorphism being determined by specifying an ordered basis of $V$.

(6) Associative algebras over fields. These were defined in Section VI.7, knowledge of which is not being assumed now. Thus we repeat the definition. If $K$ is a field, then an associative algebra over $K$, or associative $K$ algebra, is a ring $A$ that is also a vector space over $K$ such that the multiplication $A \times A \to A$ is $K$-linear in each variable. The conditions of linearity concerning multiplication have two parts to them: an additive part saying that the usual distributive laws are valid and a scalar-multiplication part saying that

\[ (ka)b = k(ab) = a(kb) \qquad \text{for all } k \text{ in } K \text{ and } a, b \text{ in } A. \]

If $A$ has an identity, the displayed condition says that all scalar multiples of the identity lie in the center of $A$, i.e., commute with every element of $A$. In Examples 2 and 3, when $R$ is a field $K$, the polynomial rings and matrix rings over $K$ provide examples of associative algebras over $K$; scalar multiplication is to be done in entry-by-entry fashion. Example 5 is an associative algebra as well. If $L$ is any field such that $K$ is a subfield, then $L$ may be regarded as an associative algebra over $K$. An interesting commutative associative algebra over $\mathbb{C}$ without identity is the algebra $C_{\mathrm{com}}(\mathbb{R})$ of all continuous complex-valued functions on $\mathbb{R}$ that vanish outside a bounded interval; the vector-space operations are the usual pointwise operations, and the operation of multiplication is given by convolution

\[ (f * g)(x) = \int_{\mathbb{R}} f(x - y)g(y)\,dy. \]

Section VII.4 worked with an analog $C(G, \mathbb{C})$ of this algebra in the context that $\mathbb{R}$ is replaced by a finite group $G$.

¹ A limited theory of determinants applies in the noncommutative case, but it will not be helpful for our purposes.


(7) Division rings. A division ring is a nonzero ring with identity such that every nonzero element has a two-sided inverse under multiplication. A commutative division ring is just a field. The ring H of quaternions is the only explicit noncommutative division ring that we have encountered so far. It is an associative algebra over the field $\mathbb{R}$ of real numbers. More generally, if A is a division ring, then we can easily check that the center K of A is a field and that A is an associative algebra over K.²

² Use of the term “division algebra” requires some care. Some mathematicians understand division algebras to be associative, and others do not. The real algebra O of octonions, as defined in Problems 52–56 at the end of Chapter VI, is not associative, but it does have division.

(8) Tensor, symmetric, and exterior algebras. If E is a vector space over a field K, Chapter VI defined the tensor, symmetric, and exterior algebras of E over K, as well as the polynomial algebra on E in the case that E is finite-dimensional. These are all associative algebras with identity. Symmetric algebras and polynomial algebras are commutative. None of these algebras will be discussed further in this chapter.

(9) A field of 4 elements. This was constructed in Section IV.4. Further finite fields beyond the field of 4 elements and the fields $F_p = Z/pZ$ with p prime will be constructed in Chapter IX.

(10) Algebraic number fields Q[θ]. These were discussed in Sections IV.1 and IV.4. In defining Q[θ], we assume that θ is a complex number and that there exists an integer n > 0 such that the complex numbers $1, \theta, \theta^2, \dots, \theta^n$ are linearly dependent over Q. The set Q[θ] is defined to be the subset of C obtained by substitution of θ into all members of Q[X]. It coincides with the linear span over Q of $1, \theta, \theta^2, \dots, \theta^{n-1}$. Proposition 4.1 shows that it is closed under the arithmetic operations, including passage to multiplicative inverses of nonzero elements, and it is therefore a subfield of C. This example ties in with the notion of minimal polynomial in Chapter V because the members of Q[X] with θ as a root are all multiples of one nonzero such polynomial that exhibits the linear dependence. We return to this example occasionally later in this chapter, particularly in Sections 7–11, and then we treat it in more detail in Chapter IX.

(11) Algebraic integers in a number field Q[θ]. Algebraic integers were defined in Section VII.5 as the roots in C of monic polynomials in Z[X], and they were shown to form a commutative ring with identity. The set of algebraic integers in Q[θ] is therefore a commutative ring with identity, and it plays somewhat the same role for Q[θ] that Z plays for Q. We discuss this example further in Sections 7–11.

(12) Integral group rings. If G is a group, then we can make the free abelian group ZG on the elements of G into a ring by defining multiplication to be

$$\Big(\sum_i m_i g_i\Big)\Big(\sum_j n_j h_j\Big) = \sum_{i,j} (m_i n_j)(g_i h_j)$$

when the $m_i$ and $n_j$ are in Z and the $g_i$ and $h_j$ are in G. It is immediate that the result is a ring with identity, and ZG


is called the integral group ring of G. The group G is embedded as a subgroup of the group $(ZG)^\times$ of units of ZG, each element g of G being identified with a sum $\iota(g) = \sum_i m_i g_i$ in which the only nonzero term is 1g. The ring ZG has the universal mapping property illustrated in Figure 8.1 and described as follows: whenever ϕ : G → R is a group homomorphism of G into the group $R^\times$ of units of a ring R, then there exists a unique ring homomorphism Φ : ZG → R such that Φι = ϕ. The existence of Φ as a homomorphism of additive groups follows from the universal mapping property of free abelian groups, and then one readily checks that Φ respects multiplication.³

[Commutative diagram: ι : G → ZG, ϕ : G → R, Φ : ZG → R, with Φι = ϕ.]
FIGURE 8.1. Universal mapping property of the integral group ring of G.

³ Universal mapping properties are discussed systematically in Problems 18–22 at the end of Chapter VI. The subject of such a property, here the pair (ZG, ι), is always unique up to canonical isomorphism in a given category, but its existence has to be proved.

(13) Quotient rings. If R is a ring and I is a two-sided ideal, then we saw in Section IV.4 that the additive quotient R/I has a natural multiplication that makes it into a ring called a quotient ring of R. This in effect was the construction that obtained the ring Z/mZ from the ring Z.

(14) Direct product of rings. If $\{R_s \mid s \in S\}$ is a nonempty set of rings, then a direct product $\prod_{s\in S} R_s$ is a ring whose additive group is any direct product of the underlying additive groups and whose multiplication is given in entry-by-entry fashion. The resulting ring and the associated ring homomorphisms $p_{s_0} : \prod_{s\in S} R_s \to R_{s_0}$ amount to the product functor for the category of rings; if each $R_s$ has an identity, the result amounts also to the product functor for the category of rings with identity.

We give further examples of rings near the end of this section after we have defined modules and given some examples.

Informally a module is a vector space over a ring. But let us be more precise. If R is a ring, then a left R module⁴ M is an abelian group with the additional structure of a “scalar multiplication” R × M → M such that
(i) $r(r'm) = (rr')m$ for $r$ and $r'$ in R and m in M,
(ii) $(r + r')m = rm + r'm$ and $r(m + m') = rm + rm'$ if $r$ and $r'$ are in R and $m$ and $m'$ are in M.
In addition, if R has an identity, we say that M is unital if
(iii) 1m = m for all m in M.

⁴ Many algebra books write “R-module,” using a hyphen. However, when R is replaced by an expression, particularly in applications of the theory, the hyphen is often dropped. For an example, see “module” in Hall’s The Theory of Groups. The present book omits the hyphen in all cases in order to be consistent.

One may also speak of right R modules. For these the scalar multiplication is usually written as mr with m in M and r in R, and the expected analogs of (i) and (ii) are to hold. When R is commutative, it is immaterial which side is used for the scalar multiplication, and one speaks simply of an R module.

Let R be a ring, and let M and N be two left R modules. A homomorphism of left R modules, or more briefly an R homomorphism, is an additive group homomorphism ϕ : M → N such that ϕ(rm) = rϕ(m) for all r in R and m in M. Then we can form a category for fixed R in which the objects are the left R modules and the morphisms are the R homomorphisms from one left R module to another. Similarly the right R modules, along with the corresponding kind of R homomorphisms, form a category. If R has an identity, then the unital R modules form a subcategory in each case. These categories are fundamental to the subject of homological algebra, which we take up in Advanced Algebra.

EXAMPLES OF MODULES.

(1) Vector spaces. If R is a field, the unital R modules are exactly the vector spaces over R.

(2) Abelian groups. The unital Z modules are exactly the abelian groups. Scalar multiplication is given in the expected way: If n is a positive integer, the product nx is the n-fold sum of x with itself. If n = 0, the product nx is 0. If n < 0, the product nx is −((−n)x).

(3) Vector spaces as unital modules for the polynomial ring K[X]. Let V be a finite-dimensional vector space over the field K, and fix L in $\operatorname{End}_K(V)$. Then V becomes a unital K[X] module under the definition A(X)v = A(L)(v) whenever A(X) is a polynomial in K[X]; here A(L) is the member of $\operatorname{End}_K(V)$ defined as in Section V.3.
In Section 6 in this chapter we shall see that some of the deeper results in the theory of a single linear transformation, as developed in Chapter V, follow from the theory of unital K[X ] modules that will emerge from the present chapter. (4) Modules in the context of algebraic number ﬁelds. Let Q[θ] be a subﬁeld of C as in Example 10 of rings earlier in this section. It is assumed that the Q vector space Q[θ] is ﬁnite-dimensional. Let L be the member of EndQ (Q[θ]) given as left multiplication by θ on Q[θ]. As in the previous example, Q[θ] becomes a unital Q[X ] module. Chapter V deﬁnes a minimal polynomial for L, as well as a characteristic polynomial. These objects play a role in the study


to be carried out in Chapter IX of fields like Q[θ]. If θ is an algebraic integer as in Example 11 of rings earlier in this section, then we can get more refined information by replacing Q by Z in the above analysis; this technique plays a role in the theory to be developed in Sections 7–11.

(5) Rings and their quotients. If R is a ring, then R is a left R module and also a right R module. If I is a two-sided ideal in R, then the quotient ring R/I, as defined in Proposition 4.20, is a left R module and also a right R module. These modules are automatically unital if R has an identity. Later in this section we shall consider quotients of R by “one-sided ideals.”

(6) Spaces of rectangular matrices. If R is a ring, then the space $M_{mn}(R)$ of m-by-n matrices with entries in R is an abelian group under addition and becomes a left R module when multiplication by the scalar r is defined as left multiplication by r in each entry. Also, if we put $S = M_m(R)$, then $M_{mn}(R)$ is a left S module under the usual definition of matrix multiplication: $(sv)_{ij} = \sum_{k=1}^{m} s_{ik}v_{kj}$, where s is in S and v is in $M_{mn}(R)$.

(7) Direct product of R modules. If S is a nonempty set and $\{M_s\}_{s\in S}$ is a corresponding system of left R modules, then a direct product $\prod_{s\in S} M_s$ is obtained as an additive group by forming any direct product of the underlying additive groups of the $M_s$’s and defining scalar multiplication by members of R to be scalar multiplication in each coordinate. The associated abelian-group homomorphisms $p_{s_0} : \prod_{s\in S} M_s \to M_{s_0}$ become R homomorphisms under this definition of scalar multiplication on the direct product. Direct product amounts to the product functor for the category of left R modules; we omit the easy verification, which makes use of the corresponding fact about abelian groups. As in the case of abelian groups, we can speak of an external direct product as the result of a construction that starts with the product of the sets $M_s$, and we can speak of recognizing a direct product as internal when the $M_s$’s are contained in the direct product and the restriction of each $p_s$ to $M_s$ is the identity function.

(8) Direct sum of R modules. If S is a nonempty set and $\{M_s\}_{s\in S}$ is a corresponding system of left R modules, then a direct sum $\bigoplus_{s\in S} M_s$ is obtained as an additive group by forming any direct sum of the underlying additive groups of the $M_s$’s and defining scalar multiplication by members of R to be scalar multiplication in each coordinate. The associated abelian-group homomorphisms $i_{s_0} : M_{s_0} \to \bigoplus_{s\in S} M_s$ become R homomorphisms under this definition of scalar multiplication on the direct sum. Direct sum amounts to the coproduct functor for the category of left R modules; we omit the easy verification, which makes use of the corresponding fact about abelian groups. As in the case of abelian groups, we can speak of an external direct sum as the result of a construction that starts with a subset of the product of the sets $M_s$, and we can speak of recognizing a


direct sum as internal when the $M_s$’s are contained in the direct sum and each $i_s$ is the inclusion mapping.

(9) Free R modules. Let R be a nonzero ring with identity, and let S be a nonempty set. As in Example 5, let us regard R as a unital left R module. Then the left R module given as the direct sum $F(S) = \bigoplus_{s\in S} R$ is called a free R module, or free left R module. We define ι : S → F(S) by $\iota(s) = i_s(1)$, where $i_s$ is the usual embedding map for the direct sum of R modules. The left R module F(S) has a universal mapping property similar to the corresponding property of free abelian groups. This is illustrated in Figure 8.2 and is described as follows: whenever M is a unital left R module and ϕ : S → M is a function, then there exists a unique R homomorphism Φ : F(S) → M such that Φι = ϕ. The existence of Φ as an R homomorphism follows from the universal mapping property of direct sums (Example 8) as soon as the property is demonstrated for S equal to a singleton set. Thus let A be any unital left R module, and let a ∈ A be given; then it is evident that r ↦ ra is the unique R homomorphism of the left R module R into A carrying 1 to a.

[Commutative diagram: ι : S → F(S), ϕ : S → M, Φ : F(S) → M, with Φι = ϕ.]
FIGURE 8.2. Universal mapping property of a free left R module.

If R is a ring and M is a left R module, then an R submodule N of M is an additive subgroup of M that is closed under scalar multiplication, i.e., has rm in N when r is in R and m is in N. In situations in which there is no ambiguity, the use of “left” in connection with R submodules is not necessary.

EXAMPLES OF SUBMODULES. If V is a vector space over a field K, then a K submodule of V is a vector subspace of V. If M is an abelian group, then a Z submodule of M is a subgroup. In Example 6 of modules, in which $S = M_m(R)$, an example of a left S submodule of $M_{mn}(R)$ is all matrices with 0 in every entry of a specified subset of the n columns.

If the ring R has an identity and M is a unital left R module, then the R submodule of M generated by m ∈ M, i.e., the smallest R submodule containing m, is Rm, the set of products rm with r in R. In fact, the set of all rm is an abelian group since (r ± s)m = rm ± sm, it is closed under scalar multiplication since s(rm) = (sr)m, and it contains m since 1m = m. However, if the left R module M is not unital, then the R submodule generated by m may not equal Rm, and it was for that reason that R modules were assumed to be unital in the construction of free R modules in Example 9 of modules above. More generally the R submodule


of M generated by a finite set $\{m_1, \dots, m_n\}$ in M is $Rm_1 + \cdots + Rm_n$ if the left R module M is unital.

Example 5 of modules treated R as a left R module. In this setting the left R submodules are called left ideals in R. That is, a left ideal I is an additive subgroup of R such that ri is in I whenever r is in R and i is in I. As a special case of what was said in the previous paragraph, if the ring R has an identity, then the left R module R is automatically unital, and the left ideal of R generated by an element a is Ra, the set of all products ra with r in R. Similarly a right ideal in R is an additive subgroup I such that ir is in I whenever r is in R and i is in I. The right ideals are the right R submodules of the right R module R. If R is commutative, then left ideals, right ideals, and two-sided ideals are all the same.

Suppose that ϕ : M → N is an R homomorphism of left R modules. In this situation we readily verify that the kernel of ϕ, denoted by ker ϕ as usual, is an R submodule of M, and the image of ϕ, denoted by image ϕ as usual, is an R submodule of N. The R homomorphism ϕ is one-one if and only if ker ϕ = 0, as a consequence of properties of homomorphisms of abelian groups. A one-one R homomorphism of one left R module onto another is called an R isomorphism; its inverse is automatically an R isomorphism, and “is R isomorphic to” is an equivalence relation.

Still with R as a ring, suppose that M is a left R module and N is an R submodule. Then we can form the quotient M/N of abelian groups. This becomes a left R module under the definition r(m + N) = rm + N, as we readily check. We call M/N a quotient module. The quotient mapping m ↦ m + N of M to M/N is an R homomorphism onto. A particular example of a quotient module is R/I, where I is a left ideal in R.
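As a small illustration of the distinction between left ideals and right ideals, the following sketch takes the ring of 2-by-2 integer matrices and the subset I of matrices whose second column is 0, in the spirit of the matrix example of submodules. The helper names `matmul` and `in_I` and the sample matrices are choices made only for this illustration, and checking one product on each side is of course not a proof.

```python
# Illustrative sketch only: 2-by-2 integer matrices stored as nested tuples.

def matmul(a, b):
    """Usual matrix product (ab)_ij = sum_k a_ik * b_kj for 2-by-2 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def in_I(v):
    """Membership test for I = {matrices whose second column is zero}."""
    return v[0][1] == 0 and v[1][1] == 0

s = ((1, 2), (3, 4))   # an arbitrary element of the matrix ring
v = ((5, 0), (7, 0))   # an element of I

left = matmul(s, v)    # product with the ring element on the left
right = matmul(v, s)   # product with the ring element on the right

print(left, in_I(left))
print(right, in_I(right))
```

In this instance the product sv stays in I while vs does not, which is consistent with I being a left ideal that is not a right ideal.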
We can now go over the results on quotients of abelian groups in Section IV.2, specifically Proposition 4.11 through Theorem 4.14, and check that they extend immediately to results about left R modules. The statements appear below. The arguments are all routine, and there is no point in repeating them. In the special case that R is a field and the R modules are vector spaces, these results specialize to results proved in Sections II.5 and II.6.

Proposition 8.1. Let R be a ring, let $\varphi : M_1 \to M_2$ be an R homomorphism between left R modules, let $N_0 = \ker\varphi$, let N be an R submodule of $M_1$ contained in $N_0$, and define $q : M_1 \to M_1/N$ to be the R module quotient map. Then there exists an R homomorphism $\overline{\varphi} : M_1/N \to M_2$ such that $\varphi = \overline{\varphi}q$, i.e., $\overline{\varphi}(m_1 + N) = \varphi(m_1)$. It has the same image as $\varphi$, and $\ker\overline{\varphi} = \{h_0 + N \mid h_0 \in N_0\}$.

REMARK. As with groups, one says that $\varphi$ factors through $M_1/N$ or descends to $M_1/N$. Figure 8.3 illustrates matters.


[Commutative diagram: $q : M_1 \to M_1/N$, $\varphi : M_1 \to M_2$, $\overline{\varphi} : M_1/N \to M_2$, with $\overline{\varphi}q = \varphi$.]
FIGURE 8.3. Factorization of R homomorphisms via a quotient of R modules.

Corollary 8.2. Let R be a ring, let $\varphi : M_1 \to M_2$ be an R homomorphism between left R modules, and suppose that $\varphi$ is onto $M_2$ and has kernel N. Then $\varphi$ exhibits the left R module $M_1/N$ as canonically R isomorphic to $M_2$.

Theorem 8.3 (First Isomorphism Theorem). Let R be a ring, let $\varphi : M_1 \to M_2$ be an R homomorphism between left R modules, and suppose that $\varphi$ is onto $M_2$ and has kernel K. Then the map $N_1 \mapsto \varphi(N_1)$ gives a one-one correspondence between
(a) the R submodules $N_1$ of $M_1$ containing K and
(b) the R submodules of $M_2$.
Under this correspondence the mapping $m + N_1 \mapsto \varphi(m) + \varphi(N_1)$ is an R isomorphism of $M_1/N_1$ onto $M_2/\varphi(N_1)$.

REMARK. In the special case of the last statement that $\varphi : M_1 \to M_2$ is an R module quotient map q : M → M/K and N is an R submodule of M containing K, the last statement of the theorem asserts the R isomorphism $M/N \cong (M/K)/(N/K)$.

Theorem 8.4 (Second Isomorphism Theorem). Let R be a ring, let M be a left R module, and let $N_1$ and $N_2$ be R submodules of M. Then $N_1 \cap N_2$ is an R submodule of $N_1$, the set $N_1 + N_2$ of sums is an R submodule of M, and the map $n_1 + (N_1 \cap N_2) \mapsto n_1 + N_2$ is a well-defined canonical R isomorphism

$$N_1/(N_1 \cap N_2) \cong (N_1 + N_2)/N_2.$$

A quotient of a direct sum of R modules by the direct sum of R submodules is the direct sum of the quotients, according to the following proposition. The result generalizes Lemma 4.58, which treats the special case of abelian groups (unital Z modules).

Proposition 8.5. Let R be a ring, let $M = \bigoplus_{s\in S} M_s$ be a direct sum of left R modules, and for each s in S, let $N_s$ be a left R submodule of $M_s$. Then the natural map of $\bigoplus_{s\in S} M_s$ to the direct sum of quotients descends to an R isomorphism

$$\Big(\bigoplus_{s\in S} M_s\Big)\Big/\Big(\bigoplus_{s\in S} N_s\Big) \cong \bigoplus_{s\in S} (M_s/N_s).$$


PROOF. Let $\varphi : \bigoplus_{s\in S} M_s \to \bigoplus_{s\in S} (M_s/N_s)$ be the R homomorphism defined by $\varphi(\{m_s\}_{s\in S}) = \{m_s + N_s\}_{s\in S}$. The mapping $\varphi$ is onto $\bigoplus_{s\in S} (M_s/N_s)$, and the kernel is $\bigoplus_{s\in S} N_s$. Then Corollary 8.2 shows that $\varphi$ descends to the required R isomorphism.

EXAMPLES OF RINGS, CONTINUED.

(15) Associative algebras over commutative rings with identity. These directly generalize Example 6 of rings. Let R be a nonzero commutative ring with identity. An associative algebra over R, or associative R algebra, is a ring A that is also a left R module such that multiplication A × A → A is R linear in each variable. The conditions of R linearity in each variable mean that addition satisfies the usual distributive laws for a ring and that the following condition is to be satisfied relating multiplication and scalar multiplication:

$$(ra)b = r(ab) = a(rb) \qquad \text{for all } r \text{ in } R \text{ and } a, b \in A.$$

If A has an identity, the displayed condition says that all scalar multiples of the identity lie in the center of A, i.e., commute with every element of A. Examples 2 and 3, treating polynomial rings and matrix rings whose scalars lie in a commutative ring with identity, furnish examples. Every ring R is an associative Z algebra when the Z action is defined so as to make the abelian group underlying the additive structure of R into a Z module. All that needs to be checked is the displayed formula. For n = 1, we have (1a)b = 1(ab) = a(1b) since the Z module R is unital. If we also have (na)b = n(ab) = a(nb) for a positive integer n, then we can add and use the appropriate distributive laws to obtain ((n + 1)a)b = (n + 1)(ab) = a((n + 1)b). Induction therefore gives (na)b = n(ab) = a(nb) for all positive integers n, and this equality extends to all integers n by using additive inverses.

The associative R algebras form a category in which the morphisms from one such algebra to another are the ring homomorphisms that are also R homomorphisms. The product functor for this category is the direct product as in Example 14 with an overlay of scalar multiplication as in Example 7 of modules. The coproduct functor in the category of commutative associative R algebras with identity is more subtle and involves a tensor product over R, a notion we postpone introducing until Chapter X.

(16) Group algebra RG over R. If G is a group and R is a commutative ring with identity, then we can introduce a multiplication in the free R module RG on the elements of G by the definition

$$\Big(\sum_i r_i g_i\Big)\Big(\sum_j s_j h_j\Big) = \sum_{i,j} (r_i s_j)(g_i h_j)$$

when the $r_i$ and $s_j$ are in R and the $g_i$ and $h_j$ are in G. It is immediate that this multiplication makes the free R module into an associative R algebra with identity, and RG is called the group algebra of G over R. The special case R = Z leads to the integral group ring as in Example 12. The group G is embedded as a


subgroup of the group $(RG)^\times$ of units of RG, each element g of G being identified with a sum $\iota(g) = \sum_i r_i g_i$ in which the only nonzero term is 1g. The associative R algebra RG has a universal mapping property similar to that in Figure 8.1 and given in Figure 8.4 as follows: whenever ϕ : G → A is a group homomorphism of G into the group $A^\times$ of units of an associative R algebra A, then there exists a unique associative R algebra homomorphism Φ : RG → A such that Φι = ϕ.

[Commutative diagram: ι : G → RG, ϕ : G → A, Φ : RG → A, with Φι = ϕ.]
FIGURE 8.4. Universal mapping property of the group algebra RG.

(17) Scalar-valued functions of finite support on a group, with convolution as multiplication. If G is a group and R is a commutative ring with identity, denote by C(G, R) the R module of all functions from G into R that are of finite support in the sense that each function is 0 except on a finite subset of G. This R module readily becomes an associative R algebra if ring multiplication is taken to be pointwise multiplication, but the interest here is in a different definition of multiplication. Instead, multiplication is defined to be convolution with

$$(f_1 * f_2)(x) = \sum_{y\in G} f_1(xy^{-1})f_2(y) = \sum_{y\in G} f_1(y)f_2(y^{-1}x).$$

The sums in question are finite because of the finite support of $f_1$ and $f_2$, and the sums are equal by a change of variables. This multiplication was introduced in the special case R = C in Section VII.4, and the argument for associativity given there in the special case works in general. With convolution as multiplication, C(G, R) becomes an associative R algebra with identity. Problem 14 at the end of the chapter asks for a verification that the mapping $g \mapsto f_g$ with

$$f_g(x) = \begin{cases} 1 & \text{for } x = g, \\ 0 & \text{for } x \ne g, \end{cases}$$

extends to an R algebra isomorphism of RG onto C(G, R).
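The relationship between Examples 16 and 17 can be checked numerically in a small case. The sketch below takes R = Z and G equal to the symmetric group on three letters, stores an element of RG as a dictionary from group elements to coefficients, and compares the group-algebra product with the convolution formula. The representation of permutations as tuples and the names `compose`, `inverse`, `group_algebra_product`, and `convolve` are assumptions made for this illustration only; agreement on one pair of elements is a sanity check, not a proof.

```python
from itertools import permutations

# The group G: all permutations of (0, 1, 2), each stored as a tuple.
G = list(permutations(range(3)))

def compose(g, h):
    """Group product gh, meaning: apply h first, then g."""
    return tuple(g[h[i]] for i in range(3))

def inverse(g):
    """Inverse permutation of g."""
    inv = [0] * 3
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def group_algebra_product(a, b):
    """(sum r_i g_i)(sum s_j h_j) = sum (r_i s_j)(g_i h_j) on dicts g -> coefficient."""
    out = {}
    for g, r in a.items():
        for h, s in b.items():
            gh = compose(g, h)
            out[gh] = out.get(gh, 0) + r * s
    return {g: c for g, c in out.items() if c != 0}

def convolve(f1, f2):
    """(f1 * f2)(x) = sum over y in G of f1(x y^{-1}) f2(y)."""
    out = {}
    for x in G:
        total = sum(f1.get(compose(x, inverse(y)), 0) * f2.get(y, 0) for y in G)
        if total != 0:
            out[x] = total
    return out

# Two sample elements with integer coefficients.
a = {(1, 0, 2): 2, (0, 2, 1): -1}
b = {(1, 2, 0): 3, (0, 1, 2): 5}

ab = group_algebra_product(a, b)
ba = group_algebra_product(b, a)
print(ab == convolve(a, b))   # the two definitions of the product agree here
print(ab == ba)               # the algebra is not commutative for this G
```

The dictionary doubles as the finitely supported function $x \mapsto$ coefficient of x, which is the identification $g \mapsto f_g$ extended linearly.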

2. Integral Domains and Fields of Fractions

For the remainder of the chapter we work with commutative rings only. In several of the sections, including this one, the commutative ring will be an integral domain, i.e., a nonzero commutative ring with identity and with no zero divisors.


In this section we show how an integral domain can be embedded canonically in a field. This embedding is handy for recognizing certain facts about integral domains as consequences of facts about fields. For example Proposition 4.28b established that if R is a nonzero integral domain and if A(X) is a polynomial in R[X] of degree n > 0, then A(X) has at most n roots. Since the coefficients of the polynomial can be considered to be members of the larger field that contains R, this result is an immediate consequence of the corresponding fact about fields (Corollary 1.14).

The prototype is the construction of the field Q of rationals from the integral domain Z of integers as in Section A3 of the appendix, in which one thinks of $\frac{a}{b}$ as a pair (a, b) with b ≠ 0 and then identifies pairs by saying that $\frac{a}{b} = \frac{c}{d}$ if and only if ad = bc. We proceed in the same way in the general case. Thus let R be a nonzero integral domain, form the set

$$\widetilde{F} = \{(a, b) \mid a \in R,\ b \in R,\ b \ne 0\},$$

and impose the equivalence relation (a, b) ∼ (c, d) if ad = bc. The relation ∼ is certainly reflexive and symmetric. To see that it is transitive, suppose that (a, b) ∼ (c, d) and (c, d) ∼ (e, f). Then ad = bc and cf = de, and these together force adf = bcf = bde. In turn, this implies af = be since R is an integral domain and d is assumed ≠ 0. Thus ∼ is transitive and is an equivalence relation. Let F be the set of equivalence classes.

The definition of addition in $\widetilde{F}$ is (a, b) + (c, d) = (ad + bc, bd), the expression we get by naively clearing fractions, and we want to see that addition is consistent with the equivalence relation. In checking this, we need change only one of the pairs at a time. Thus suppose that $(a', b') \sim (a, b)$ and that (c, d) is given. We know that $a'b = ab'$, and we want to see that $(ad + bc, bd) \sim (a'd + b'c, b'd)$, i.e., that $(ad + bc)b'd = (a'd + b'c)bd$. In other words, we are to check that $adb'd = a'dbd$; we see immediately that this equality is valid since $ab' = a'b$. Consequently addition is consistent with the equivalence relation and descends to be defined on the set F of equivalence classes.

Taking into account the properties satisfied by members of an integral domain, we check directly that addition is commutative and associative on $\widetilde{F}$, and it follows that addition is commutative and associative on F. The element (0, 1) is a two-sided identity for addition in $\widetilde{F}$, and hence the class of (0, 1) is a two-sided identity for addition in F. We denote this class by 0. Let us identify this class. A pair (a, b) is in the class of (0, 1) if and only if 0 · b = 1 · a, hence if and only if a = 0. In other words, the class of (0, 1) consists of all (0, b) with b ≠ 0.

In $\widetilde{F}$, we have $(a, b) + (-a, b) = (ab + b(-a), bb) = (0, b^2) \sim (0, 1)$, and therefore the class of (−a, b) is a two-sided inverse to the class of (a, b) under


addition. Consequently F is an abelian group under addition.

The definition of multiplication in $\widetilde{F}$ is (a, b)(c, d) = (ac, bd), and it is routine to see that this definition is consistent with the equivalence relation. Therefore multiplication descends to be defined on F. We check by inspection that multiplication is commutative and associative on $\widetilde{F}$, and it follows that it is commutative and associative on F. The element (1, 1) is a two-sided identity for multiplication in $\widetilde{F}$, and the class of (1, 1) is therefore a two-sided identity for multiplication in F. We denote this class by 1. If (a, b) is not in the class 0, then a ≠ 0, as we saw above. Then ab ≠ 0, and we have (a, b)(b, a) = (ab, ab) ∼ (1, 1) = 1. Hence the class of (b, a) is a two-sided inverse of the class of (a, b) under multiplication. Consequently the nonzero elements of F form an abelian group under multiplication.

For one of the distributive laws, the computation

$$\begin{aligned}
(a, b)((c, d) + (e, f)) &= (a, b)(cf + de, df) = (a(cf + de), bdf) \\
&= (acf + ade, bdf) \sim (acbf + bdae, b^2df) \\
&= (ac, bd) + (ae, bf) = (a, b)(c, d) + (a, b)(e, f)
\end{aligned}$$

shows that the classes of (a, b)((c, d) + (e, f)) and of (a, b)(c, d) + (a, b)(e, f) are equal. The other distributive law follows from this one since F is commutative under multiplication. Therefore F is a field.

The field F is called the field of fractions of the integral domain R. The function η : R → F defined by saying that η(r) is the class of (r, 1) is easily checked to be a homomorphism of rings sending 1 to 1. It is one-one. Let us call it the canonical embedding of R into F. The pair (F, η) has the universal mapping property stated in Proposition 8.6 and illustrated in Figure 8.5.

[Commutative diagram: $\eta : R \to F$, $\varphi : R \to F'$, $\widetilde{\varphi} : F \to F'$, with $\widetilde{\varphi}\eta = \varphi$.]
FIGURE 8.5. Universal mapping property of the field of fractions of R.

Proposition 8.6. Let R be a nonzero integral domain, let F be its field of fractions, and let η be the canonical embedding of R into F. Whenever $\varphi$ is a one-one ring homomorphism of R into a field $F'$ carrying 1 to 1, then there exists a unique ring homomorphism $\widetilde{\varphi} : F \to F'$ such that $\varphi = \widetilde{\varphi}\eta$, and $\widetilde{\varphi}$ is one-one as a homomorphism of fields.

REMARK. We say that $\widetilde{\varphi}$ is the extension of $\varphi$ from R to F. Once this proposition has been proved, it is customary to drop η from the notation and regard R as a subring of its field of fractions.
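For readers who like to experiment, the construction of F that precedes Proposition 8.6 can be mimicked in a few lines for the integral domain R = Z. The class name `Frac` and the function name `eta` are hypothetical names chosen for this sketch; a pair is stored without reduction to lowest terms, and equality is tested by the relation ad = bc, exactly as in the definition of ∼.

```python
class Frac:
    """Class of a pair (a, b) with b != 0 over the integral domain Z (illustrative only)."""

    def __init__(self, a, b):
        if b == 0:
            raise ValueError("second entry must be nonzero")
        self.a, self.b = a, b

    def __eq__(self, other):
        # (a, b) ~ (c, d) if and only if ad = bc
        return self.a * other.b == self.b * other.a

    def __add__(self, other):
        # (a, b) + (c, d) = (ad + bc, bd)
        return Frac(self.a * other.b + self.b * other.a, self.b * other.b)

    def __mul__(self, other):
        # (a, b)(c, d) = (ac, bd)
        return Frac(self.a * other.a, self.b * other.b)

    def inverse(self):
        # the class of (b, a) inverts the class of (a, b) when a != 0
        return Frac(self.b, self.a)


def eta(r):
    """Canonical embedding: r goes to the class of (r, 1)."""
    return Frac(r, 1)


x, y, z = Frac(1, 2), Frac(2, 4), Frac(1, 3)
print(x == y)                      # equivalent pairs give the same class
print(x + z == y + z)              # addition respects the equivalence relation
print(x * x.inverse() == eta(1))   # (a, b)(b, a) ~ (1, 1)
print(eta(2) + eta(3) == eta(5))   # eta respects addition
```

Because representatives are never normalized, the sketch makes visible that every operation is defined on pairs and only afterward seen to be consistent with ∼.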


PROOF. If (a, b) with b ≠ 0 is a pair in $\widetilde{F}$, we define $\Phi(a, b) = \varphi(a)\varphi(b)^{-1}$. This is well defined since b ≠ 0 and since $\varphi$, being one-one, cannot have $\varphi(b) = 0$. Let us see that Φ is consistent with the equivalence relation, i.e., that $(a, b) \sim (a', b')$ implies $\Phi(a, b) = \Phi(a', b')$. Since $(a, b) \sim (a', b')$, we have $ab' = a'b$ and therefore also $\varphi(a)\varphi(b') = \varphi(a')\varphi(b)$ and $\Phi(a, b) = \varphi(a)\varphi(b)^{-1} = \varphi(a')\varphi(b')^{-1} = \Phi(a', b')$, as required.

We can thus define $\widetilde{\varphi}$ of the class of (a, b) to be Φ(a, b), and $\widetilde{\varphi}$ is well defined as a function from F to $F'$. If r is in R, then $\widetilde{\varphi}(\eta(r)) = \widetilde{\varphi}(\text{class of } (r, 1)) = \Phi(r, 1) = \varphi(r)\varphi(1)^{-1}$, and this equals $\varphi(r)$ since $\varphi$ is assumed to carry 1 into 1. Therefore $\widetilde{\varphi}\eta = \varphi$.

For uniqueness, let the class of (a, b) be given in F. Since b is nonzero, this class is the same as the class of $(a, 1)(b, 1)^{-1}$, which equals $\eta(a)\eta(b)^{-1}$. Since $(\widetilde{\varphi}\eta)(a) = \varphi(a)$ and $(\widetilde{\varphi}\eta)(b) = \varphi(b)$, we must have $\widetilde{\varphi}(\text{class of } (a, b)) = \widetilde{\varphi}(\eta(a))\widetilde{\varphi}(\eta(b))^{-1} = \varphi(a)\varphi(b)^{-1}$. Therefore $\varphi$ uniquely determines $\widetilde{\varphi}$.

If K is a field, then R = K[X] is an integral domain, and Proposition 8.6 applies to this R. The field of fractions consists in effect of formal rational expressions $P(X)Q(X)^{-1}$ in the indeterminate X, with the expected identifications made. We write K(X) for this field of fractions. More generally the field of fractions of the integral domain $K[X_1, \dots, X_n]$ consists of formal rational expressions in the indeterminates $X_1, \dots, X_n$, with the expected identifications made, and is denoted by $K(X_1, \dots, X_n)$.

3. Prime and Maximal Ideals

In this section, R will denote a commutative ring, not necessarily having an identity. We shall introduce the notions of “prime ideal” and “maximal ideal,” and we shall investigate relationships between these two notions.

A proper ideal I in R is prime if ab ∈ I implies a ∈ I or b ∈ I. The ideal I = R is not prime, by convention.⁵ We give three examples of prime ideals; a fourth example will be given in a proposition immediately afterward.

⁵ This convention is now standard. Books written before about 1960 usually regarded I = R as a prime ideal. Correspondingly they usually treated the zero ring as an integral domain.

EXAMPLES.

(1) For Z, it was shown in an example just before Proposition 4.21 that each ideal is of the form mZ for some integer m. We may assume that m ≥ 0. The prime ideals are 0 and all pZ with p prime. To see this latter fact, consider I = mZ with m ≥ 2. If m = ab nontrivially, then neither a nor b is in I, but ab is in I; hence I is not prime. Conversely if m is prime, and if ab is in I = mZ, then

382

VIII. Commutative Rings and Their Modules

ab = mc for some integer c. Since m is prime, Lemma 1.6 shows that m divides a or m divides b. Hence a is in I or b is in I. Therefore I is prime.
(2) If K is a field, then each ideal in R = K[X] is of the form A(X)K[X] with A(X) in K[X], and A(X)K[X] is prime if and only if A(X) is 0 or is a prime polynomial. In fact, each ideal is of the form A(X)K[X] by Proposition 5.8. If A(X) is not a constant polynomial, then the argument that A(X)K[X] is prime if and only if the polynomial A(X) is prime proceeds as in Example 1, using Lemma 1.16 in place of Lemma 1.6.
(3) In R = Z[X], the structure of the ideals is complicated, and we shall not attempt to list all ideals. Let us observe simply that the ideal I = XZ[X] is prime. In fact, if A(X)B(X) is in XZ[X], then A(X)B(X) = XC(X) for some C(X) in Z[X]. If the constant terms of A(X) and B(X) are a_0 and b_0, this equation says that a_0 b_0 = 0. Therefore a_0 = 0 or b_0 = 0. In the first case, A(X) = XP(X) for some P(X), and then A(X) is in I; in the second case, B(X) = XQ(X) for some Q(X), and then B(X) is in I. We conclude that I is prime.

Proposition 8.7. An ideal I in the commutative ring R is prime if and only if R/I is an integral domain.

PROOF. If a proper ideal I fails to be prime, choose ab in I with a ∉ I and b ∉ I. Then a + I and b + I are nonzero in R/I and have product 0 + I. So R/I is nonzero and has a zero divisor; by definition, R/I fails to be an integral domain. Conversely if R/I (is nonzero and) has a zero divisor, choose a + I and b + I nonzero with product 0 + I. Then neither a nor b is in I but ab is in I. Since I is certainly proper, I is not prime.

A proper ideal I in the commutative ring R is said to be maximal if R has no proper ideal J with I ⊊ J. If the commutative ring R has an identity, a simple way of testing whether an ideal I is proper is to check whether 1 is in I; in fact, if 1 is in I, then I ⊇ RI ⊇ R1 = R implies I = R.
Maximal ideals exist in abundance when R is nonzero and has an identity, as a consequence of the following result. Proposition 8.8. In a commutative ring R with identity, any proper ideal is contained in a maximal ideal. PROOF. This follows from Zorn’s Lemma (Section A5 of the appendix). Speciﬁcally let I be the given proper ideal, and form the set S of all proper ideals that contain I . This set is nonempty, containing I as a member, and we order it by inclusion upward. If we have a chain in S, then the union of the members of the chain is an ideal that contains all the ideals in the chain, and it is


proper since it does not contain 1. Therefore the union of the ideals in the chain is an upper bound for the chain. By Zorn’s Lemma the set S has a maximal element, and any such maximal element is a maximal ideal containing I.

Lemma 8.9. If R is a nonzero commutative ring with identity, then R is a field if and only if the only proper ideal in R is 0.

PROOF. If R is a field and I is a nonzero ideal in R, let a ≠ 0 be in I. Then 1 = aa^{−1} is in I, and consequently I = R. Conversely if the only ideals in R are 0 and R, let a ≠ 0 be given in R, and form the ideal I = aR. Since 1 is in R, a is in I. Thus I ≠ 0. Then I must be R. So there exists some b in R with 1 = ba, and a is exhibited as having the inverse b.

Proposition 8.10. If R is a commutative ring with identity, then an ideal I is maximal if and only if R/I is a field.

REMARK. One can readily give a direct proof, but it seems instructive to give a proof reducing the result to Lemma 8.9.

PROOF. We consider R and R/I as unital R modules, the ideals for each of R and R/I being the R submodules. The quotient ring homomorphism R → R/I is an R homomorphism. By the First Isomorphism Theorem for modules (Theorem 8.3), there is a one-one correspondence between the ideals in R containing I and the ideals in R/I. Then the result follows immediately from Lemma 8.9.

Corollary 8.11. If R is a commutative ring with identity, then every maximal ideal is prime.

PROOF. If I is maximal, then R/I is a field by Proposition 8.10. Hence R/I is an integral domain, and I must be prime by Proposition 8.7.

In the converse direction nonzero prime ideals need not be maximal, as the following example shows. However, Proposition 8.12 will show that nonzero prime ideals are necessarily maximal in certain important rings.

EXAMPLE. In R = Z[X], we have seen that I = XZ[X] is a prime ideal. But I is not maximal since XZ[X] + 2Z[X] is a proper ideal that strictly contains I.

Proposition 8.12.
In R = Z or R = K[X ] with K a ﬁeld, every nonzero prime ideal is maximal. PROOF. Examples 1 and 2 at the beginning of this section show that every nonzero prime ideal is of the form I = p R with p prime. If such an I is given and if J is any ideal strictly containing I , choose a in J with a not in I . Since a


is not in I = pR, it is not true that p divides a. So p and a are relatively prime, and there exist elements x and y in R with xp + ya = 1, by Proposition 1.2c or 1.15d. Since p and a are in J, so is 1. Therefore J = R, and I is not strictly contained in any proper ideal. So I is maximal.

EXAMPLE. Algebraic number fields Q[θ]. These were introduced briefly in Chapter IV and again in Section 1 as the Q linear span of all powers 1, θ, θ^2, … . Here θ is a nonzero complex number, and we make the assumption that Q[θ] is a finite-dimensional vector space over Q. Proposition 4.1 showed that Q[θ] is then indeed a field. Let us see how this conclusion relates to the results of the present section. In fact, write a nontrivial linear dependence of 1, θ, θ^2, … over Q in the form

c_0 + c_1θ + c_2θ^2 + ··· + c_{n−1}θ^{n−1} + θ^n = 0.

Without loss of generality, suppose that this particular linear dependence has n as small as possible among all such relations. Then θ is a root of

P(X) = c_0 + c_1X + c_2X^2 + ··· + c_{n−1}X^{n−1} + X^n.

Consider the substitution homomorphism E : Q[X] → C given by E(A(X)) = A(θ). This ring homomorphism carries Q[X] onto the ring Q[θ], and the kernel is some ideal I. Specifically I consists of all polynomials A(X) with A(θ) = 0, and P(X) is one of these of lowest possible degree. Proposition 5.8 shows that I consists of all multiples of some polynomial, and that polynomial may be taken to be P(X) by minimality of the integer n. Proposition 8.1 therefore shows that Q[θ] ≅ Q[X]/P(X)Q[X] as a ring. If P(X) were to have a nontrivial factorization as P(X) = Q_1(X)Q_2(X), then P(θ) = 0 would imply Q_1(θ) = 0 or Q_2(θ) = 0, and we would obtain a contradiction to the minimality of n. Therefore P(X) is prime. By Example 2 earlier in the section, I = P(X)Q[X] is a nonzero prime ideal, and Proposition 8.12 shows that it is maximal. By Proposition 8.10 the quotient ring Q[θ] ≅ Q[X]/P(X)Q[X] is a field.
These computations with Q[θ] underlie the ﬁrst part of the theory of ﬁelds that we shall develop in Chapter IX.
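As an informal check of how Propositions 8.7, 8.10, and 8.12 fit together for R = Z, the following Python sketch tests, for small m ≥ 2, whether Z/mZ has zero divisors and whether every nonzero class is invertible. The function names are illustrative choices made here, not notation from the text, and the brute-force search is intended only for small m.

```python
def zero_divisors(m):
    """Nonzero classes a in Z/mZ with a*b = 0 in Z/mZ for some nonzero class b."""
    return [a for a in range(1, m)
            if any((a * b) % m == 0 for b in range(1, m))]


def units(m):
    """Classes a in Z/mZ having a multiplicative inverse."""
    return [a for a in range(1, m)
            if any((a * b) % m == 1 for b in range(1, m))]


def is_prime(m):
    """Trial-division primality test for a small positive integer m."""
    return m >= 2 and all(m % d != 0 for d in range(2, int(m ** 0.5) + 1))


for m in range(2, 30):
    is_domain = not zero_divisors(m)        # Z/mZ is an integral domain
    is_field = len(units(m)) == m - 1       # every nonzero class is a unit
    # mZ is prime iff Z/mZ is a domain (8.7); mZ is maximal iff Z/mZ is a
    # field (8.10); for Z the two conditions coincide when m != 0 (8.12).
    assert is_domain == is_prime(m) == is_field
```

For m = 6 the classes of 2, 3, and 4 are zero divisors, in line with 6Z not being prime, while for m = 7 the quotient has no zero divisors and every nonzero class is a unit.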

4. Unique Factorization

We have seen that the positive members of Z and the nonzero members of K[X], when K is a field, factor into the products of “primes” and that these factorizations are unique up to order and up to adjusting each of the prime factors in K[X] by a unit. In this section we shall investigate this idea of unique factorization more generally. Zero divisors are problematic from the point of view of factorization, and it will be convenient to exclude them. Therefore we work exclusively with integral domains.


The first observation is that unique factorization is not a completely general notion for integral domains. Let us consider an example in detail.

EXAMPLE. R = Z[√−5]. This is the subring of C whose members are of the form a + b√−5 with a and b integers. Since

(a + b√−5)(c + d√−5) = (ac − 5bd) + (ad + bc)√−5,

R is closed under multiplication and is indeed a subring. Define

N(a + b√−5) = a^2 + 5b^2 = (a + b√−5)(a − b√−5),

the product of a + b√−5 with its complex conjugate. This is a nonnegative-integer-valued function on R and is 0 only on the 0 element of R. Since complex conjugation is an automorphism of C, we check immediately that

N((a + b√−5)(c + d√−5)) = N(a + b√−5) N(c + d√−5).

The group of units of R, i.e., of elements with inverses under multiplication, is denoted by R^× as usual. If r is in R^×, then rr^{−1} = 1, and so N(r)N(r^{−1}) = N(1) = 1. Consequently the units r of R all have N(r) = 1. Setting a^2 + 5b^2 = 1, we see that the units are ±1.

The product formula for N shows that if we start factoring a member of R, then factor its factors, and so on, and if we forbid factorizations into two factors when one is a unit, then the process of factorization has to stop at some point. So complete factorization makes sense. Now consider the equality

6 = (1 + √−5)(1 − √−5) = 2 · 3.

The factors here have N(1 + √−5) = N(1 − √−5) = 6, N(2) = 4, and N(3) = 9. Considering the possible values of a^2 + 5b^2, we see that N( · ) does not take on either of the values 2 and 3 on R. Consequently 1 + √−5, 1 − √−5, 2, and 3 do not have nontrivial factorizations. On the other hand, consideration of the values of N( · ) shows that 2 and 3 are not products of either of 1 ± √−5 with units. We conclude that the displayed factorizations of 6 show that unique factorization has failed. Thus unique factorization is not universal for integral domains.

It is time to be careful about terminology.
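The norm computations in the example R = Z[√−5] can be checked mechanically. The Python sketch below represents a + b√−5 as the integer pair (a, b); the pair encoding and the function names are assumptions made here for illustration. It confirms the two factorizations of 6, the multiplicativity of N on one sample pair, and the absence of elements of norm 2 or 3.

```python
from math import isqrt


def norm(x):
    """N(a + b*sqrt(-5)) = a^2 + 5 b^2 for x = (a, b)."""
    a, b = x
    return a * a + 5 * b * b


def mul(x, y):
    """(a + b*sqrt(-5))(c + d*sqrt(-5)) = (ac - 5bd) + (ad + bc)*sqrt(-5)."""
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)


def elements_of_norm(n):
    """All (a, b) with a^2 + 5 b^2 = n; |a| and |b| cannot exceed sqrt(n)."""
    bound = isqrt(n)
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if norm((a, b)) == n]


# The two factorizations 6 = (1 + sqrt(-5))(1 - sqrt(-5)) = 2 * 3.
assert mul((1, 1), (1, -1)) == (6, 0)
assert mul((2, 0), (3, 0)) == (6, 0)

# N is multiplicative on a sample pair of elements.
x, y = (1, 1), (2, -3)
assert norm(mul(x, y)) == norm(x) * norm(y)

# Units have norm 1; a proper factor of 2, 3, or 1 +/- sqrt(-5) would need norm 2 or 3.
assert elements_of_norm(1) == [(-1, 0), (1, 0)]
assert elements_of_norm(2) == [] and elements_of_norm(3) == []
```

Because N is multiplicative, a nontrivial factor of an element of norm 4, 6, or 9 would have norm 2 or 3, so the two empty lists are exactly what the irreducibility claim requires.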
With Z and K[X ], we have referred to the individual factors in a complete factorization as “primes.” Their deﬁning property in Chapter I was that they could not be factored further in nontrivial fashion. Primes in these rings were shown to have the additional property that if a prime divides a product then it divides one of the factors. It is customary to separate these two properties for general integral domains. Let us say that a nonzero element a divides b if b = ac for some c. In this case we say also that a is a factor of b. In an integral domain R, a nonzero element r that is not a unit is said to be irreducible if every factorization r = r1r2 in R has the property that either r1 or r2 is a unit. Nonzero nonunits that are not irreducible are said


to be reducible. A nonzero element p that is not a unit is said to be prime6 if the condition that p divides a product ab always implies that p divides a or p divides b.

Prime implies irreducible. In fact, if p is a prime that is reducible, let us write p = r_1 r_2 with neither r_1 nor r_2 equal to a unit. Since p is prime, p divides r_1 or r_2, say r_1. Then r_1 = pc with c in R, and we obtain p = r_1 r_2 = pcr_2. Since R is an integral domain, 1 = cr_2, and r_2 is exhibited as a unit with inverse c, in contradiction to the assumption that r_2 is not a unit.

On the other hand, irreducible does not imply prime. In fact, we saw in Z[√−5] that 1 + √−5 is irreducible. But 1 + √−5 divides 2 · 3, and 1 + √−5 does not divide either of 2 or 3. Therefore 1 + √−5 is not prime.

We shall see in a moment that the distinction between “irreducible” and “prime” lies at the heart of the question of unique factorization. Let us make a definition that helps identify our problem precisely. We say that an integral domain R is a unique factorization domain if R has the two properties
(UFD1) every nonzero nonunit of R is a finite product of irreducible elements,
(UFD2) the factorization in (UFD1) is always unique up to order and to multiplication of the factors by units.

The problem that arises for us for a given R is to decide whether R is a unique factorization domain. The following proposition shows the relevance of the distinction between “irreducible” and “prime.”

Proposition 8.13. In an integral domain R in which (UFD1) holds, the condition (UFD2) is equivalent to the condition
(UFD2′) every irreducible element is prime.

REMARKS. In fact, showing that irreducible implies prime was the main step in Chapter I in proving unique factorization for positive integers and for K[X] when K is a field. The mechanism for carrying out the proof that irreducible implies prime for those settings will be abstracted in Theorems 8.15 and 8.17.

PROOF.
Suppose that (UFD2) holds, that p is an irreducible element, and that p divides ab. We are to show that p divides a or p divides b. We may assume that ab ≠ 0. Write ab = pc, and let a = ∏_i p_i, b = ∏_j p′_j, and c = ∏_k q_k be factorizations via (UFD1) into products of irreducible elements. Then ∏_i p_i ∏_j p′_j = p ∏_k q_k. By (UFD2) one of the factors on the left side is εp for some unit ε. Then p either is of the form ε^{−1}p_i and then p divides a, or is of the form ε^{−1}p′_j and then p divides b. Hence (UFD2′) holds.

6 This definition enlarges the definition of “prime” in Z to include the negatives of the usual prime numbers. Unique factorization immediately extends to nonzero integers of either sign, but the prime factors are now determined only up to factors of ±1. In cases where confusion about the sign of an integer prime might arise, the text will henceforth refer to “primes of Z” or “integer primes” when both signs are allowed, and to “positive primes” or “prime numbers” when the primes are understood to be as in Chapter I.

Conversely suppose that (UFD2′) holds. Let the nonzero nonunit r have two factorizations into irreducible elements as r = p_1 p_2 ··· p_m = ε_0 q_1 q_2 ··· q_n with m ≤ n and with ε_0 a unit. We prove the uniqueness by induction on m, the case m = 0 being trivial and the case m = 1 following from the definition of “irreducible.” Inductively from (UFD2′) we know that p_m divides q_k for some k. Since q_k is irreducible, q_k = εp_m for some unit ε. Thus we can cancel p_m and obtain p_1 p_2 ··· p_{m−1} = ε_0 ε q_1 q_2 ··· q̂_k ··· q_n, the hat indicating an omitted factor. By induction the factors on the two sides here are the same except for order and units. Thus the same conclusion is valid when comparing the two sides of the equality p_1 p_2 ··· p_m = ε_0 q_1 q_2 ··· q_n. The induction is complete, and (UFD2) follows.

It will be convenient to simplify our notation for ideals. In any commutative ring R with identity, if a is in R, we let (a) denote the ideal Ra generated by a. An ideal of this kind with a single generator is called a principal ideal. More generally, if a_1, …, a_n are members of R, then (a_1, …, a_n) denotes the ideal Ra_1 + ··· + Ra_n generated by a_1, …, a_n. For example, in Z[X], (2, X) denotes the ideal 2Z[X] + XZ[X] of all polynomials whose constant term is even.

The following condition explains a bit the mystery of what it means for an element to be prime.

Proposition 8.14. A nonzero element p in an integral domain R is prime if and only if the ideal (p) in R is prime.

PROOF. Suppose that the element p is prime. Then the ideal (p) is not R; in fact, otherwise 1 would have to be of the form 1 = rp for some r ∈ R, r would be a multiplicative inverse of p, and p would be a unit.
Now suppose that a product ab is in the ideal (p). Then ab = pr for some r in R, and p divides ab. Since p is prime, p divides a or p divides b. Therefore the ideal (p) is prime. Conversely suppose that (p) is a prime ideal with p ≠ 0. Since (p) ≠ R, p is not a unit. If p divides the product ab, then ab = pc for some c in R. Hence ab is in (p). Since (p) is assumed prime, either a is in (p) or b is in (p). In the first case, p divides a, and in the second case, p divides b. Thus the element p is prime.

An integral domain R is called a principal ideal domain if every ideal in R is principal. At the beginning of Section 3, we saw a reminder that Z is a principal ideal domain and that so is K[X] whenever K is a field. It turns out that unique factorization for these cases is a consequence of this fact.


Theorem 8.15. Every principal ideal domain is a unique factorization domain.

REMARKS. Let R be the given principal ideal domain. Proposition 8.13 shows that it is enough to show that (UFD1) and (UFD2′) hold in R.

PROOF OF (UFD1). Let a_1 be a nonzero nonunit of R. If a_1 is not irreducible, then a_1 has a factorization a_1 = a_2 b_2 in which neither a_2 nor b_2 is a unit. If a_2 is not irreducible, then a_2 has a factorization a_2 = a_3 b_3 in which neither a_3 nor b_3 is a unit. We continue in this way as long as it is possible to do so. Let us see that this process cannot continue indefinitely. Assume the contrary. The equality a_1 = a_2 b_2 with b_2 not a unit says that a_1 is in the ideal (a_2) and a_2 is not in the ideal (a_1). Arguing in this way with a_2, a_3, and so on, we obtain

(a_1) ⊊ (a_2) ⊊ (a_3) ⊊ ··· .

Let I = ⋃_{n=1}^∞ (a_n). Then I is an ideal. Since R is a principal ideal domain, I = (a) for some a. This element a must be in (a_k) for some k, and then we have (a_k) = (a_{k+1}) = ··· = (a). This is a contradiction, and hence the process does not continue indefinitely.

Therefore some irreducible element c_1, namely the element a_k in the above argument, divides a_1. Write a_1 = c_1 a_2, and repeat the above argument with a_2. Iterating this construction, we obtain a_n = c_n a_{n+1} for each n with c_n irreducible. Thus a_1 = c_1 c_2 ··· c_n a_{n+1} with c_1, …, c_n irreducible. Let us see that this process cannot continue indefinitely. Assuming the contrary, we are led to the strict inclusions

(a_1) ⊊ (a_2) ⊊ (a_3) ⊊ ··· .

Again we cannot have such an infinite chain of strict inclusions in a principal ideal domain, and we must have (a_n) = (a_{n+1}) at some stage. Then c_n has to be a unit, contradiction. Thus a_n has no nontrivial factorization, and a_1 = c_1 ··· c_{n−1} a_n is the desired factorization. This proves (UFD1).

PROOF OF (UFD2′). If p is an irreducible element, we prove that the ideal (p) is maximal. Corollary 8.11 shows that (p) is prime, and Proposition 8.14 shows that p is prime.
Thus (UFD2′) will follow. The element p, being irreducible, is not a unit. Thus (p) is proper. Suppose that I is an ideal with I ⊋ (p). Since R is a principal ideal domain, I = (c) for some c. Then p = rc for some r in R. Since I ≠ (p), r cannot be a unit. Therefore the irreducibility of p implies that c is a unit. Then I = (c) = (1) = R, and we conclude that (p) is maximal.

Let us record what is essentially a corollary of the proof.


Corollary 8.16. In a principal ideal domain, every nonzero prime ideal is maximal.

PROOF. Let (p) be a nonzero prime ideal. Proposition 8.14 shows that p is prime, and prime elements are automatically irreducible. The proof of the uniqueness part of Theorem 8.15 then deduces in the context of a principal ideal domain that (p) is maximal.

Principal ideal domains arise comparatively infrequently, and recognizing them is not necessarily easy. The technique that was used with Z and K[X] generalizes slightly, and we take up that generalization now. An integral domain R is called a Euclidean domain if there exists a function δ : R → {integers ≥ 0} such that whenever a and b are in R with b ≠ 0, there exist q and r in R with a = bq + r and δ(r) < δ(b). The ring Z of integers is a Euclidean domain if we take δ(n) = |n|, and the ring K[X] for K a field is a Euclidean domain if we take δ(P(X)) to be 2^{deg P} if P(X) ≠ 0 and to be 0 if P(X) = 0.

Another example of a Euclidean domain is the ring Z[√−1] = Z + Z√−1 of Gaussian integers. It has δ(a + b√−1) = (a + b√−1)(a − b√−1) = a^2 + b^2, a and b being integers. Let us abbreviate √−1 as i. To see that δ has the required property, we first extend δ to Q[i], writing δ(x + yi) = (x + yi)(x − yi) = x^2 + y^2 if x and y are rational. We use the fact that

δ(zz′) = δ(z)δ(z′) for z and z′ in Q[i],

which follows from the computation δ(zz′) = (zz′)(z̄z̄′) = (zz̄)(z′z̄′) = δ(z)δ(z′), the bar denoting complex conjugation. For any real number u, let [u] be the greatest integer ≤ u. Every real u satisfies |[u + 1/2] − u| ≤ 1/2. Given a + bi and c + di with c + di ≠ 0, we write

(a + bi)/(c + di) = (a + bi)(c − di)/(c^2 + d^2) = (ac + bd)/(c^2 + d^2) + ((bc − ad)/(c^2 + d^2)) i.

Put p = [(ac + bd)/(c^2 + d^2) + 1/2], q = [(bc − ad)/(c^2 + d^2) + 1/2], and r + si = (a + bi) − (c + di)(p + qi). Then a + bi = (c + di)(p + qi) + (r + si), and

δ(r + si) = δ((a + bi) − (c + di)(p + qi)) = δ(c + di) δ((a + bi)/(c + di) − (p + qi)).

The complex number

x + yi = (a + bi)/(c + di) − (p + qi) = ((ac + bd)/(c^2 + d^2) − p) + ((bc − ad)/(c^2 + d^2) − q) i

has |x| ≤ 1/2 and |y| ≤ 1/2, and therefore δ(x + yi) = x^2 + y^2 ≤ 1/4 + 1/4 = 1/2. Hence δ(r + si) < δ(c + di), as required.
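The division procedure for Gaussian integers just described can be sketched in Python as follows. The pair encoding (a, b) for a + bi and the function names are assumptions made here for illustration; the rounding [u + 1/2] is carried out in exact integer arithmetic to avoid floating-point error.

```python
def delta(z):
    """delta(a + bi) = a^2 + b^2 for z = (a, b)."""
    a, b = z
    return a * a + b * b


def round_half_up(num, den):
    """[num/den + 1/2] for integers num and den > 0, computed exactly."""
    return (2 * num + den) // (2 * den)


def gauss_divmod(z, w):
    """Return ((p, q), (r, s)) with z = w*(p + qi) + (r + si), delta(r + si) < delta(w)."""
    (a, b), (c, d) = z, w
    n = c * c + d * d
    if n == 0:
        raise ZeroDivisionError("the divisor c + di must be nonzero")
    p = round_half_up(a * c + b * d, n)   # nearest integer to Re(z/w)
    q = round_half_up(b * c - a * d, n)   # nearest integer to Im(z/w)
    # (c + di)(p + qi) = (cp - dq) + (cq + dp)i
    r = a - (c * p - d * q)
    s = b - (c * q + d * p)
    return (p, q), (r, s)


# Example: 7 + 3i divided by 2 + i.
quotient, remainder = gauss_divmod((7, 3), (2, 1))
assert quotient == (3, 0) and remainder == (1, 0)
assert delta(remainder) < delta((2, 1))
```

The estimate in the text actually gives the stronger bound δ(r + si) ≤ δ(c + di)/2, which is what makes repeated division terminate quickly.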


Some further examples of this kind appear in Problems 13 and 25–26 at the end of the chapter. The matter is a little delicate. The ring Z[√−5] may seem superficially similar to Z[√−1]. But Z[√−5] does not have unique factorization, and the following theorem, in combination with Theorem 8.15, assures us that Z[√−5] cannot be a Euclidean domain.

Theorem 8.17. Every Euclidean domain is a principal ideal domain.

PROOF. Let I be an ideal in R. We are to show that I is principal. Without loss of generality, we may assume that I ≠ 0. Choose b ≠ 0 in I with δ(b) as small as possible. Certainly I ⊇ (b). If a ≠ 0 is in I, write a = bq + r with δ(r) < δ(b). Then r = a − bq is in I with δ(r) < δ(b). The minimality of δ(b) forces r = 0 and a = bq. Thus I ⊆ (b), and we conclude that I = (b).
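For an ideal given by two generators, an element of least δ as in the proof of Theorem 8.17 can be found by repeated division with remainder. The following Python sketch does this for R = Z with δ(n) = |n|; it is the familiar extended Euclidean algorithm, and the function name and return convention are choices made here for illustration.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with x*a + y*b == g, where g generates the ideal aZ + bZ."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r                      # division with remainder: old_r = q*r + rem
        old_r, r = r, old_r - q * r         # |rem| < |r|, so delta strictly decreases
        old_x, x = x, old_x - q * x         # invariant: old_x*a + old_y*b == old_r
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


g, x, y = extended_gcd(84, 30)
assert g == 6 and x * 84 + y * 30 == 6      # so (84, 30) = (6) in Z
assert 84 % g == 0 and 30 % g == 0          # both generators lie in (6)
```

The returned g may be negative for some inputs; this is harmless, since a generator of an ideal in Z is determined only up to the units ±1.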

5. Gauss’s Lemma

In the previous section we saw that every principal ideal domain has unique factorization. In the present section we shall establish that certain additional integral domains have unique factorization, namely any integral domain R[X] for which R is a unique factorization domain. A prototype is Z[X], which will be seen to have unique factorization even though there exist nonprincipal ideals like (2, X) in the ring. An important example for applications, particularly in algebraic geometry, is K[X_1, …, X_n], where K is a field; in this case our result is to be applied inductively, making use of the isomorphism K[X_1, …, X_n] ≅ K[X_1, …, X_{n−1}][X_n] given in Corollary 4.31.

For the conclusion that R[X] has unique factorization if R does, the heart of the proof is application of a result known as Gauss’s Lemma, which we shall prove in this section. Gauss’s Lemma has additional consequences for R[X] beyond unique factorization, and we give them as well.

Before coming to Gauss’s Lemma, let us introduce some terminology and prove one preliminary result. In any integral domain R, we call two nonzero elements a and b associates if a = bε for some ε in the group R^× of units. The property of being associates is an equivalence relation because R^× is a group.

Still with the nonzero integral domain R, let us define a greatest common divisor of two nonzero elements a and b to be any element c of R such that c divides both a and b and such that any divisor of a and b divides c. Any associate of a greatest common divisor of a and b is another greatest common divisor of a and b. Conversely if a and b have a greatest common divisor, then any two greatest common divisors are associates. In fact, if c and c′ are greatest common divisors, then each of them divides both a and b, and the definition forces each


of them to divide the other. Thus c′ = cε and c = c′ε′, and then c′ = c′ε′ε and 1 = ε′ε. Consequently ε is a unit, and c and c′ are associates.

If R is a unique factorization domain, then any two nonzero elements a and b have a greatest common divisor. In fact, we decompose a and b into the product of a unit by powers of nonassociate irreducible elements as a = ε ∏_{i=1}^m p_i^{k_i} and b = ε′ ∏_{j=1}^n p′_j^{l_j}. For each p′_j such that p′_j is associate to some p_i, we replace p′_j by p_i in the factorization of b, adjusting ε′ as necessary, and then we reorder the factors of a and b so that the common p_i’s are the ones for 1 ≤ i ≤ r. Then c = ∏_{i=1}^r p_i^{min(k_i, l_i)} is a greatest common divisor of a and b. We write GCD(a, b) for a greatest common divisor of a and b; as we saw above, this is well defined up to a factor of a unit.7

One should not read too much into the notation. In a principal ideal domain if a and b are nonzero, then, as we shall see momentarily, GCD(a, b) is defined by the condition on ideals that (GCD(a, b)) = (a, b). This condition implies that there exist elements x and y in R such that xa + yb = GCD(a, b). However, in the integral domain Z[X], in which GCD(2, X) = 1, there do not exist polynomials A(X) and B(X) with A(X)2 + B(X)X = 1.

To prove that (GCD(a, b)) = (a, b) in a principal ideal domain, write (c) for the principal ideal (a, b); c satisfies c = xa + yb for some x and y in R. Since a and b lie in (c), a = rc and b = r′c. Hence c divides both a and b. In the reverse direction if d divides a and b, then ds = a and ds′ = b. Hence c = xa + yb = (xs + ys′)d, and d divides c. So c is indeed a greatest common divisor of a and b.

In a unique factorization domain the definition of greatest common divisor immediately extends to apply to n nonzero elements, rather than just two. We readily check up to a unit that

GCD(a_1, …, a_n) = GCD(GCD(a_1, …, a_{n−1}), a_n).

Moreover, we can allow any of a_2, …, a_n to be 0, and there is no difficulty. In addition, we have

GCD(da_1, …, da_n) = d GCD(a_1, …, a_n) up to a unit

if d and a_1 are not 0.

Let R be a unique factorization domain. If A(X) is a nonzero element of R[X], we say that A(X) is primitive if the GCD of its coefficients is a unit. In this case no prime of R divides all the coefficients of A(X).

7 Greatest common divisors can exist for certain integral domains that fail to have unique factorization, but we shall not have occasion to work with any such domains.


Theorem 8.18 (Gauss’s Lemma). If R is a unique factorization domain, then the product of primitive polynomials is primitive.

PROOF #1. Arguing by contradiction, let A(X) = a_m X^m + ··· + a_0 and B(X) = b_n X^n + ··· + b_0 be primitive polynomials such that every coefficient of A(X)B(X) is divisible by some prime p. Since A(X) and B(X) are primitive, we may choose k and l as small as possible such that p does not divide a_k and does not divide b_l. The coefficient of X^{k+l} in A(X)B(X) is

a_0 b_{k+l} + a_1 b_{k+l−1} + ··· + a_k b_l + ··· + a_{k+l} b_0

and is divisible by p. Then all the individual terms, and their sum, are divisible by p except possibly for a_k b_l, and we conclude that p divides a_k b_l. Since p is prime and p divides a_k b_l, p must divide a_k or b_l, contradiction.

PROOF #2. Arguing by contradiction, let A(X) and B(X) be primitive polynomials such that every coefficient of A(X)B(X) is divisible by some prime p. Proposition 8.14 shows that the ideal (p) is prime, and Proposition 8.7 shows that R′ = R/(p) is an integral domain. Let ϕ : R → R′[X] be the composition of the quotient homomorphism R → R′ and the inclusion of R′ into constant polynomials in R′[X], and let Φ : R[X] → R′[X] be the corresponding substitution homomorphism of Proposition 4.24 that carries X to X. Since A(X) and B(X) are primitive, Φ(A(X)) and Φ(B(X)) are not zero. Their product Φ(A(X))Φ(B(X)) = Φ(A(X)B(X)) is 0 since p divides every coefficient of A(X)B(X), and this conclusion contradicts the assertion of Proposition 4.29 that R′[X] is an integral domain.

Let F be the field of fractions of the unique factorization domain R. The consequences of Theorem 8.18 exploit a simple relationship between R[X] and F[X], which we state below as Proposition 8.19. Once that proposition is in hand, we can state the consequences of Theorem 8.18.

If A(X) is a nonzero polynomial in R[X], let c(A) be the greatest common divisor of the coefficients, i.e.,

c(A) = GCD(a_n, …, a_1, a_0) if A(X) = a_n X^n + ··· + a_1 X + a_0.

The element c(A) is well defined up to a factor of a unit. In this notation the definition of “primitive” becomes: A(X) is primitive if and only if c(A) is a unit. If A(X) is not necessarily primitive, then at least c(A) divides each coefficient of A(X), and hence c(A)^{−1}A(X) is in R[X], say with coefficients b_n, …, b_1, b_0. Then we have

c(A) = GCD(a_n, …, a_1, a_0) = GCD(c(A)b_n, …, c(A)b_1, c(A)b_0) = c(A) GCD(b_n, …, b_1, b_0) = c(A) c(c(A)^{−1}A(X))

up to a unit factor, and hence c(c(A)^{−1}A(X)) is a unit. We conclude that

A(X) ∈ R[X] implies that c(A)^{−1}A(X) is primitive.
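For R = Z the content c(A) and the primitive polynomial c(A)^{−1}A(X) are easy to compute, and Gauss’s Lemma can be spot-checked on examples. The Python sketch below stores a polynomial as its list of integer coefficients, constant term first; this encoding and the function names are assumptions made here for illustration, and the content is normalized to be nonnegative, which fixes the unit ±1.

```python
from functools import reduce
from math import gcd


def content(coeffs):
    """c(A) for a nonzero integer polynomial, normalized to be positive."""
    return reduce(gcd, coeffs)


def primitive_part(coeffs):
    """Coefficients of c(A)^{-1} A(X); assumes the polynomial is nonzero."""
    c = content(coeffs)
    return [a // c for a in coeffs]


def poly_mul(A, B):
    """Coefficient list of the product A(X)B(X)."""
    out = [0] * (len(A) + len(B) - 1)
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            out[i + j] += a * b
    return out


A = [6, 10, -4]          # 6 + 10X - 4X^2, with c(A) = 2
B = [9, 15]              # 9 + 15X,        with c(B) = 3
A0, B0 = primitive_part(A), primitive_part(B)

assert content(A0) == 1 and content(B0) == 1
# Gauss's Lemma: the product of the two primitive polynomials is primitive.
assert content(poly_mul(A0, B0)) == 1
# Hence, for these examples, c(AB) = c(A)c(B) up to a unit.
assert content(poly_mul(A, B)) == content(A) * content(B)
```

A check of this kind on finitely many examples is of course no substitute for the proofs above; it is meant only to make the definitions concrete.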

Proposition 8.19. Let R be a unique factorization domain, and let F be its field of fractions. If A(X) is any nonzero polynomial in F[X], then there exist α in F and A_0(X) in R[X] such that A(X) = αA_0(X) with A_0(X) primitive. The scalar α and the polynomial A_0(X) are unique up to multiplication by units in R.

REMARK. We call A_0(X) the associated primitive polynomial to A(X). According to the proposition, it is unique up to a unit factor in R.

PROOF. Let A(X) = c_n X^n + ··· + c_1 X + c_0 with each c_k in F. We can write each c_k as a_k b_k^{−1} with a_k and b_k in R and b_k ≠ 0. We clear fractions. That is, we let β = ∏_{k=0}^n b_k. Then the k-th coefficient of βA(X) is a_k ∏_{l≠k} b_l and is in R. Hence βA(X) is in R[X]. The observation just before the proposition shows that c(βA)^{−1}βA is primitive. Thus A(X) = αA_0(X) with α = β^{−1}c(βA) and A_0(X) = c(βA)^{−1}βA(X), A_0(X) being primitive. This proves existence.

If α_1 A_1(X) = α_2 A_2(X) with α_1 and α_2 in F and with A_1(X) and A_2(X) primitive, choose r ≠ 0 in R such that rα_1 and rα_2 are in R. Up to unit factors in R, we then have rα_1 = rα_1 c(A_1) = c(rα_1 A_1) = c(rα_2 A_2) = rα_2 c(A_2) = rα_2. Hence, up to a unit factor in R, we have α_1 = α_2. This proves uniqueness.

Corollary 8.20. Let R be a unique factorization domain, and let F be its field of fractions.
(a) Let A(X) and B(X) be nonzero polynomials in R[X], and suppose that B(X) is primitive. If B(X) divides A(X) in F[X], then it divides A(X) in R[X].
(b) If A(X) is an irreducible polynomial in R[X] of degree > 0, then A(X) is irreducible in F[X].
(c) If A(X) is a monic polynomial in R[X] and if B(X) is a monic factor of A(X) within F[X], then B(X) is in R[X].
(d) If A(X), B(X), and C(X) are in R[X] with A(X) primitive and with A(X) = B(X)C(X), then B(X) and C(X) are primitive.

PROOF. In (a), write A(X) = B(X)Q(X) in F[X], and let Q(X) = ρQ_0(X) be a decomposition of Q(X) as in Proposition 8.19.
Since c(A)^{−1}A(X) is primitive, the corresponding decomposition of A(X) is A(X) = c(A)(c(A)^{−1}A(X)). The equality A(X) = ρB(X)Q_0(X) then reads c(A)(c(A)^{−1}A(X)) = ρB(X)Q_0(X). Since B(X)Q_0(X) is primitive according to Theorem 8.18, the uniqueness in Proposition 8.19 shows that c(A)^{−1}A(X) = B(X)Q_0(X) except possibly for a unit factor in R. Then B(X) divides A(X) with quotient c(A)Q_0(X), apart from a unit factor in R. Since c(A)Q_0(X) is in R[X], (a) is proved.


In (b), the condition that deg A(X) > 0 implies that A(X) is not a unit in F[X]. Arguing by contradiction, suppose that A(X) = B(X)Q(X) in F[X] with neither of B(X) and Q(X) of degree 0. Let B(X) = βB_0(X) be a decomposition of B(X) as in Proposition 8.19. Then we have A(X) = B_0(X)(βQ(X)), and (a) shows that βQ(X) is in R[X], in contradiction to the assumed irreducibility of A(X) in R[X].

In (c), write A(X) = B(X)Q(X), and let B(X) = βB_0(X) be a decomposition of B(X) as in Proposition 8.19. Then we have A(X) = B_0(X)(βQ(X)) with βQ(X) in F[X]. Conclusion (a) shows that βQ(X) is in R[X]. If b ∈ R is the leading coefficient of B_0(X) and if q ∈ R is the leading coefficient of βQ(X), then we have 1 = bq, and consequently b and q are units in R. Since B(X) = βB_0(X) and B(X) is monic, 1 = βb, and therefore β = b^{−1} is a unit in R. Hence B(X) is in R[X].

In (d), we argue along the same lines as in (a). We may take B(X) = c(B)(c(B)^{−1}B(X)) and C(X) = c(C)(c(C)^{−1}C(X)) as decompositions of B(X) and C(X) according to Proposition 8.19. Then we have A(X) = (c(B)c(C))[c(B)^{−1}B(X) c(C)^{−1}C(X)]. Theorem 8.18 says that the factor in brackets is primitive, and the uniqueness in Proposition 8.19 shows that 1 = c(B)c(C), up to unit factors. Therefore c(B) and c(C) are units in R, and B(X) and C(X) are primitive.

Corollary 8.21. If R is a unique factorization domain, then the ring R[X] is a unique factorization domain.

REMARK. As was mentioned at the beginning of the section, Z[X] and K[X_1, …, X_n], when K is a field, are unique factorization domains as a consequence of this result.

PROOF. We begin with the proof of (UFD1). Suppose that A(X) is a nonzero member of R[X]. We may take its decomposition according to Proposition 8.19 to be A(X) = c(A)(c(A)^{−1}A(X)). Consider divisors of c(A)^{−1}A(X) in R[X]. These are all primitive, according to (d). Hence those of degree 0 are units in R.
Thus any nontrivial factorization of $c(A)^{-1}A(X)$ is into two factors of strictly lower degree, both primitive. In a finite number of steps, this process of factorization with primitive factors has to stop. We can then factor $c(A)$ within $R$. Combining the factorizations of $c(A)$ and $c(A)^{-1}A(X)$, we obtain a factorization of $A(X)$.

For (UFD2′), let $P(X)$ be irreducible in $R[X]$. Since the factorization $P(X) = c(P)(c(P)^{-1}P(X))$ has to be trivial, either $c(P)$ is a unit, in which case $P(X)$ is primitive, or $c(P)^{-1}P(X)$ is a unit, in which case $P(X)$ has degree 0. In either case, suppose that $P(X)$ divides a product $A(X)B(X)$. In the first case, $P(X)$ is primitive. Since $F[X]$ is a principal ideal domain, hence a unique factorization domain, either $P(X)$ divides $A(X)$ in $F[X]$ or $P(X)$

5. Gauss's Lemma


divides $B(X)$ in $F[X]$. By symmetry we may assume that $P(X)$ divides $A(X)$ in $F[X]$. Then (a) shows that $P(X)$ divides $A(X)$ in $R[X]$.

In the second case, $P(X) = P$ has degree 0 and is prime in $R$. Write $A(X)B(X) = PQ(X)$ with $Q(X)$ in $R[X]$. Once more we argue along the same lines as in (a). We may take $A(X) = c(A)(c(A)^{-1}A(X))$, $B(X) = c(B)(c(B)^{-1}B(X))$, and $Q(X) = c(Q)(c(Q)^{-1}Q(X))$ as the decompositions of $A(X)$, $B(X)$, and $Q(X)$ according to Proposition 8.19. Then we have
\[ c(A)c(B)\,\bigl[c(A)^{-1}A(X)\,c(B)^{-1}B(X)\bigr] = Pc(Q)\,\bigl[c(Q)^{-1}Q(X)\bigr]. \]
Theorem 8.18 shows that the product in brackets on the left is primitive, and the uniqueness in Proposition 8.19 shows that we have $c(A)c(B) = Pc(Q)$ up to factors of units in $R$. Since $P$ is prime in $R$, $P$ divides $c(A)$ or $P$ divides $c(B)$. By symmetry we may assume that $P$ divides $c(A)$. Then $P$ divides $A(X)$ since $c(A)$ divides every coefficient of $A(X)$.

The final application, Eisenstein's irreducibility criterion, is proved somewhat in the style of Gauss's Lemma (Theorem 8.18). We shall give only the analog of Proof #1 of Gauss's Lemma, leaving the analog of Proof #2 to Problem 21 at the end of the chapter.

Corollary 8.22 (Eisenstein's irreducibility criterion). Let $R$ be a unique factorization domain, let $F$ be its field of fractions, and let $p$ be a prime in $R$. If $A(X) = a_N X^N + \cdots + a_1 X + a_0$ is a polynomial of degree $\ge 1$ in $R[X]$ such that $p$ divides $a_{N-1}, \dots, a_0$ but not $a_N$ and such that $p^2$ does not divide $a_0$, then $A(X)$ is irreducible in $F[X]$.

REMARK. The polynomial $A(X)$ will be irreducible in $R[X]$ also unless all its coefficients are divisible by some nonunit of $R$.

PROOF. Without loss of generality, we may replace $A(X)$ by $c(A)^{-1}A(X)$ and thereby assume that $A(X)$ is primitive. Corollary 8.20b shows that it is enough to prove irreducibility in $R[X]$.
Assuming the contrary, suppose that $A(X)$ factors in $R[X]$ as $A(X) = B(X)C(X)$ with $B(X) = b_m X^m + \cdots + b_1 X + b_0$, $C(X) = c_n X^n + \cdots + c_1 X + c_0$, and neither of $B(X)$ and $C(X)$ equal to a unit. Corollary 8.20d shows that $B(X)$ and $C(X)$ are primitive. In particular, $B(X)$ and $C(X)$ have to be nonconstant polynomials. Define $a_k = 0$ for $k > N$, $b_k = 0$ for $k > m$, and $c_k = 0$ for $k > n$. Since $p$ divides $a_0 = b_0 c_0$ and $p$ is prime, $p$ divides either $b_0$ or $c_0$. Without loss of generality, suppose that $p$ divides $b_0$. Since $p^2$ does not divide $a_0$, $p$ does not divide $c_0$.

We show, by induction on $k$, that $p$ divides $b_k$ for every $k < N$. The case $k = 0$ is the base case of the induction. If $p$ divides $b_j$ for $j < k$, then we have
\[ a_k = b_0 c_k + b_1 c_{k-1} + \cdots + b_{k-1} c_1 + b_k c_0. \]


Since $k < N$, the left side is divisible by $p$. The inductive hypothesis shows that $p$ divides every term on the right side except possibly the last. Consequently $p$ divides $b_k c_0$. Since $p$ does not divide $c_0$, $p$ divides $b_k$. This completes the induction. Since $C(X)$ is nonconstant, the degree of $B(X)$ is $< N$, and therefore we have shown that every coefficient of $B(X)$ is divisible by $p$. Then $c(B)$ is divisible by $p$, in contradiction to the fact that $B(X)$ is primitive.

EXAMPLES.

(1) Cyclotomic polynomials in $\mathbb{Q}[X]$. Let us see for each prime number $p$ that the polynomial $\Phi(X) = X^{p-1} + X^{p-2} + \cdots + X + 1$ is irreducible in $\mathbb{Q}[X]$. We have $X^p - 1 = (X - 1)\Phi(X)$. Replacing $X - 1$ by $Y$ gives $(Y + 1)^p - 1 = Y\,\Phi(Y + 1)$. The left side, by the Binomial Theorem, is $\sum_{k=1}^{p} \binom{p}{k} Y^k$. Hence $\Phi(Y + 1) = \sum_{k=1}^{p} \binom{p}{k} Y^{k-1}$. The binomial coefficient $\binom{p}{k}$ is divisible by $p$ for $1 \le k \le p - 1$ since $p$ is prime, and therefore the polynomial $\Psi(Y) = \Phi(Y + 1)$ satisfies the condition of Corollary 8.22 for the ring $\mathbb{Z}$. Hence $\Psi(Y)$ is irreducible in $\mathbb{Q}[Y]$. A nontrivial factorization of $\Phi(X)$ would yield a nontrivial factorization of $\Psi(Y)$, and hence $\Phi(X)$ is irreducible in $\mathbb{Q}[X]$.

(2) Certain polynomials in $K[X, Y]$ when $K$ is a field. Since $K[X, Y] \cong K[X][Y]$, it follows that $K[X, Y]$ is a unique factorization domain, and any member of $K[X, Y]$ can be written as $A(X, Y) = a_n(X)Y^n + \cdots + a_1(X)Y + a_0(X)$. The polynomial $X$ is prime in $K[X, Y]$, and Corollary 8.22 therefore says that $A(X, Y)$ is irreducible in $K(X)[Y]$ if $X$ does not divide $a_n(X)$ in $K[X]$, $X$ divides $a_{n-1}(X), \dots, a_0(X)$ in $K[X]$, and $X^2$ does not divide $a_0(X)$ in $K[X]$. The remark with the corollary points out that $A(X, Y)$ is irreducible in $K[X, Y]$ if also there is no nonconstant polynomial in $K[X]$ that divides every $a_k(X)$. For example, $Y^5 + XY^2 + XY + X$ is irreducible in $K[X, Y]$.

6. Finitely Generated Modules

The Fundamental Theorem of Finitely Generated Abelian Groups (Theorem 4.56) says that every finitely generated abelian group is a direct sum of cyclic groups. If we think of abelian groups as $\mathbb{Z}$ modules, we can ask whether this theorem has some analog in the context of $R$ modules. The answer is yes: the theorem readily extends to the case that $\mathbb{Z}$ is replaced by an arbitrary principal ideal domain.

The surprising addendum to the answer is that we have already treated a second special case of the generalized theorem. That case arises when the principal ideal domain is $K[X]$ for some field $K$. If $V$ is a finite-dimensional vector space over $K$ and $L : V \to V$ is a $K$ linear map, then $V$ becomes a $K[X]$ module under the definition $Xv = L(v)$. This module is finitely generated even without the $X$ present because $V$ is finite-dimensional, and the generalized theorem that we


prove in this section recovers the analysis of $L$ that we carried out in Chapter V. When $K$ is algebraically closed, we obtain the Jordan canonical form; for general $K$, we obtain a different canonical form involving cyclic subspaces that was worked out in Problems 32–40 at the end of Chapter V.

The definitions for the generalization of Theorem 4.56 are as follows. Let $R$ be a principal ideal domain. A subset $S$ of an $R$ module $M$ is called a set of generators of $M$ if $M$ is the smallest $R$ submodule of $M$ containing all the members of $S$. If $\{m_s \mid s \in S\}$ is a subset of $M$, then the set of all finite sums $\sum_{s \in S} r_s m_s$ is an $R$ submodule, but it need not contain the elements $m_s$ and therefore need not be the $R$ submodule generated by all the $m_s$. However, if $M$ is unital, then taking $r_{s_0} = 1$ and all other $r_s$ equal to 0 exhibits $m_{s_0}$ as in the $R$ submodule of all finite sums $\sum_{s \in S} r_s m_s$. For this reason we shall insist that all the $R$ submodules in this section be unital. We say that the $R$ module $M$ is finitely generated if it has a finite set of generators. The main theorem gives the structure of unital finitely generated $R$ modules when $R$ is a principal ideal domain.

We need to take a small preliminary step that eliminates technical complications from the discussion, the same step that was carried out in Lemma 4.51 and Proposition 4.52 in the case of $\mathbb{Z}$ modules, i.e., abelian groups.

Lemma 8.23. Let $R$ be a commutative ring with identity, and let $\varphi : M \to N$ be a homomorphism of unital $R$ modules. If $\ker \varphi$ and $\operatorname{image} \varphi$ are finitely generated, then $M$ is finitely generated.

PROOF. Let $\{x_1, \dots, x_m\}$ and $\{y_1, \dots, y_n\}$ be respective finite sets of generators for $\ker \varphi$ and $\operatorname{image} \varphi$. For $1 \le j \le n$, choose $x'_j$ in $M$ with $\varphi(x'_j) = y_j$. We shall prove that $\{x_1, \dots, x_m, x'_1, \dots, x'_n\}$ is a set of generators for $M$. Thus let $x$ be in $M$. Since $\varphi(x)$ is in $\operatorname{image} \varphi$, there exist $r_1, \dots, r_n$ in $R$ with $\varphi(x) = r_1 y_1 + \cdots + r_n y_n$. The element $x' = r_1 x'_1 + \cdots + r_n x'_n$ of $M$ has $\varphi(x') = r_1 y_1 + \cdots + r_n y_n = \varphi(x)$. Therefore $\varphi(x - x') = 0$, and there exist $s_1, \dots, s_m$ in $R$ such that $x - x' = s_1 x_1 + \cdots + s_m x_m$. Consequently
\[ x = s_1 x_1 + \cdots + s_m x_m + x' = s_1 x_1 + \cdots + s_m x_m + r_1 x'_1 + \cdots + r_n x'_n. \]

Proposition 8.24. If $R$ is a principal ideal domain, then any $R$ submodule of a finitely generated unital $R$ module is finitely generated. Moreover, any $R$ submodule of a singly generated unital $R$ module is singly generated.

PROOF. Let $M$ be unital and finitely generated with a set $\{m_1, \dots, m_n\}$ of $n$ generators, and define $M_k = Rm_1 + \cdots + Rm_k$ for $1 \le k \le n$. Then $M_n = M$ since $M$ is unital. We shall prove by induction on $k$ that every $R$ submodule of $M_k$ is finitely generated. The case $k = n$ then gives the proposition. For $k = 1$, suppose that $S$ is an $R$ submodule of $M_1 = Rm_1$. Since $S$ is an $R$ submodule


and every member of $S$ lies in $Rm_1$, the subset $I$ of all $r$ in $R$ with $rm_1$ in $S$ is an ideal with $Im_1 = S$. Since every ideal in $R$ is singly generated, we can write $I = (r_0)$. Then $S = Im_1 = Rr_0 m_1$, and the single element $r_0 m_1$ generates $S$.

Assume inductively that every $R$ submodule of $M_k$ is known to be finitely generated, and let $N_{k+1}$ be an $R$ submodule of $M_{k+1}$. Let $q : M_{k+1} \to M_{k+1}/M_k$ be the quotient $R$ homomorphism, and let $\varphi$ be the restriction $q|_{N_{k+1}}$, mapping $N_{k+1}$ into $M_{k+1}/M_k$. Then $\ker \varphi = N_{k+1} \cap M_k$ is an $R$ submodule of $M_k$ and is finitely generated by the inductive hypothesis. Also, $\operatorname{image} \varphi$ is an $R$ submodule of $M_{k+1}/M_k$, which is singly generated with generator equal to the coset of $m_{k+1}$. Since an $R$ submodule of a singly generated unital $R$ module was shown in the previous paragraph to be singly generated, $\operatorname{image} \varphi$ is finitely generated. Applying Lemma 8.23 to $\varphi$, we see that $N_{k+1}$ is finitely generated. This completes the induction and the proof.

According to the definition in Example 9 of modules in Section 1, a free $R$ module is a direct sum, finite or infinite, of copies of the $R$ module $R$. A free $R$ module is said to have finite rank if some direct sum is a finite direct sum. A unital $R$ module $M$ is said to be cyclic if it is singly generated, i.e., if $M = Rm_0$ for some $m_0$ in $M$. In this case, we have an $R$ isomorphism $M \cong R/I$, where $I$ is the ideal $\{r \in R \mid rm_0 = 0\}$.

Before coming to the statement of the theorem and the proof, let us discuss the heart of the matter, which is related to row reduction of matrices. We regard the space $M_{1n}(R)$ of all 1-row matrices with $n$ entries in $R$ as a free $R$ module. Suppose that $R$ is a principal ideal domain, and suppose that we have a particular 2-by-$n$ matrix with entries in $R$ and with the property that the two rows have nonzero elements $a$ and $b$, respectively, in the first column. We can regard the set of $R$ linear combinations of the two rows of our particular matrix as an $R$ submodule of the free $R$ module $M_{1n}(R)$.
Let $c = \mathrm{GCD}(a, b)$. This member of $R$ is defined only up to multiplication by a unit, but we make a definite choice of it. The idea is that we can do a kind of invertible row-reduction step that simultaneously replaces the two rows of our 2-by-$n$ matrix by a first row whose first entry is $c$ and a second row whose first entry is 0; in the process the corresponding $R$ submodule of $M_{1n}(R)$ will be unchanged. In fact, we saw in the previous section that the hypothesis on $R$ implies that there exist members $x$ and $y$ of $R$ with $xa + yb = c$. Since $c$ divides $a$ and $b$, we can rewrite this equality as $x(ac^{-1}) + y(bc^{-1}) = 1$. Then the 2-by-2 matrix $M = \begin{pmatrix} x & y \\ -bc^{-1} & ac^{-1} \end{pmatrix}$ with entries in $R$ has the property that

\[ \begin{pmatrix} x & y \\ -bc^{-1} & ac^{-1} \end{pmatrix} \begin{pmatrix} a & * \\ b & * \end{pmatrix} = \begin{pmatrix} c & * \\ 0 & * \end{pmatrix}. \]
This equation shows explicitly that the rows of $\begin{pmatrix} c & * \\ 0 & * \end{pmatrix}$ lie in the $R$ linear span of the rows of $\begin{pmatrix} a & * \\ b & * \end{pmatrix}$. The key fact about $M$ is that its determinant $x(ac^{-1}) + y(bc^{-1})$ is 1 and that $M$ is therefore invertible with entries in $R$: the inverse is just $M^{-1} = \begin{pmatrix} ac^{-1} & -y \\ bc^{-1} & x \end{pmatrix}$. This invertibility shows that the rows of $\begin{pmatrix} a & * \\ b & * \end{pmatrix}$ lie in the $R$ linear span of the rows of $\begin{pmatrix} c & * \\ 0 & * \end{pmatrix}$. Consequently the $R$ linear span of the rows of our given 2-by-$n$ matrix is preserved under left multiplication by $M$.

In effect we can do the same kind of row reduction of matrices over $R$ as we did with matrices over $\mathbb{Z}$ in the proof of Theorem 4.56. The only difference is that this time we do not see constructively how to find the $x$ and $y$ that relate $a$, $b$, and $c$. Thus we would lack some information if we actually wanted to follow through and calculate a particular example. We were able to make calculations to imitate the proof of Theorem 4.56 because we were able to use the Euclidean algorithm to arrive at what $x$ and $y$ are. In the present context we would be able to make explicit calculations if $R$ were a Euclidean domain.

Theorem 8.25 (Fundamental Theorem of Finitely Generated Modules). If $R$ is a principal ideal domain, then
(a) the number of $R$ summands in a free $R$ module of finite rank is independent of the direct-sum decomposition,
(b) any $R$ submodule of a free $R$ module of finite rank $n$ is a free $R$ module of rank $\le n$,
(c) any finitely generated unital $R$ module is the finite direct sum of cyclic modules.

REMARK. Because of (a), it is meaningful to speak of the rank of a free $R$ module of finite rank; it is the number of $R$ summands. By convention the 0 module is a free $R$ module of rank 0. Then the statement of (b) makes sense. Statement (c) will be amplified in Corollary 8.29 below.

PROOF. Let $M$ be a free $R$ module of the form $Rx_1 \oplus \cdots \oplus Rx_n$, and suppose that $y_1, \dots, y_m$ are elements of $M$ such that no nontrivial combination $r_1 y_1 + \cdots + r_m y_m$ is 0. Define an $m$-by-$n$ matrix $C$ with entries in $R$ by $y_i = \sum_{j=1}^{n} C_{ij} x_j$ for $1 \le i \le m$.
If $F$ is the field of fractions of $R$, then we can regard $C$ as a matrix with entries in $F$. As such, the matrix has rank $\le n$. If $m > n$, then the rows are linearly dependent, and we can find members $q_1, \dots, q_m$ of $F$, not all 0, such that $\sum_{i=1}^{m} q_i C_{ij} = 0$ for $1 \le j \le n$. Clearing fractions, we obtain members $r_1, \dots, r_m$ of $R$, not all 0, such that $\sum_{i=1}^{m} r_i C_{ij} = 0$ for $1 \le j \le n$. Then
\[ \sum_{i=1}^{m} r_i y_i = \sum_{i=1}^{m} r_i \sum_{j=1}^{n} C_{ij} x_j = \sum_{j=1}^{n} \Bigl( \sum_{i=1}^{m} r_i C_{ij} \Bigr) x_j = \sum_{j=1}^{n} 0\,x_j = 0, \]
in contradiction to the assumed independence property of $y_1, \dots, y_m$. Therefore we must have $m \le n$.
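For $R = \mathbb{Z}$ the dependence argument can be carried out by machine. The following Python sketch is only an illustration under that assumption: the function name `integer_row_dependency` is hypothetical and not from the text. It row-reduces the transpose of $C$ over $\mathbb{Q}$ to find rational $q_i$ with $\sum_i q_i C_{ij} = 0$, then clears denominators to obtain integers $r_i$.

```python
from fractions import Fraction
from math import gcd


def integer_row_dependency(C):
    """Given an m-by-n integer matrix C with m > n, return integers
    r_1..r_m, not all 0, with sum_i r_i * C[i][j] == 0 for every j.
    (Hypothetical helper; illustrates the case R = Z only.)"""
    m, n = len(C), len(C[0])
    if m <= n:
        raise ValueError("need more rows than columns")
    # Solve q C = 0 over Q, i.e. (C transpose) q = 0: an n-by-m system.
    A = [[Fraction(C[i][j]) for i in range(m)] for j in range(n)]
    pivots = []
    row = 0
    for col in range(m):
        if row == n:
            break
        pr = next((r for r in range(row, n) if A[r][col] != 0), None)
        if pr is None:
            continue
        A[row], A[pr] = A[pr], A[row]
        piv = A[row][col]
        A[row] = [v / piv for v in A[row]]
        for r in range(n):
            if r != row and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[row])]
        pivots.append(col)
        row += 1
    # At most n pivot columns exist, so some column index is free.
    free = next(c for c in range(m) if c not in pivots)
    q = [Fraction(0)] * m
    q[free] = Fraction(1)
    for r, pc in enumerate(pivots):
        q[pc] = -A[r][free]
    # Clear fractions: multiply by a common denominator.
    d = 1
    for v in q:
        d = d * v.denominator // gcd(d, v.denominator)
    return [int(v * d) for v in q]


C = [[1, 2], [3, 4], [5, 6]]
r = integer_row_dependency(C)
print(r)
```

Any vector returned this way is nonzero and annihilates the columns of $C$, which is exactly the relation used above to contradict independence when $m > n$.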


If we apply this conclusion to a set $x_1, \dots, x_n$ that exhibits $M$ as free and to another set, possibly infinite, that does the same thing, we find that the second set has $\le n$ members. Reversing the roles of the two sets, we find that they both have $n$ members. This proves (a).

For (b) and (c), we shall reduce the result to a lemma saying that a certain kind of result can be achieved by row and column reduction of matrices with entries in $R$. Let $M$ be a free $R$ module of rank $n$, defined by a subset $x_1, \dots, x_n$ of $M$, and let $N$ be an $R$ submodule of $M$. Proposition 8.24 shows that $N$ is finitely generated. We let $y_1, \dots, y_m$ be generators, not necessarily with any independence property. Define an $m$-by-$n$ matrix $C$ with entries in $R$ by $y_i = \sum_{j=1}^{n} C_{ij} x_j$. We can recover $M$ as the set of $R$ linear combinations of $x_1, \dots, x_n$, and we can recover $N$ as the set of $R$ linear combinations of $y_1, \dots, y_m$.

If $B$ is an $n$-by-$n$ matrix with entries in $R$ and with determinant in the group $R^\times$ of units, then Corollary 5.5 shows that $B^{-1}$ exists and has entries in $R$. If we define $x'_i = \sum_{j=1}^{n} B_{ij} x_j$, then any $R$ linear combination of $x'_1, \dots, x'_n$ is an $R$ linear combination of $x_1, \dots, x_n$. Also, the computation
\[ \sum_{i=1}^{n} (B^{-1})_{ki} x'_i = \sum_{i,j} (B^{-1})_{ki} B_{ij} x_j = \sum_{j} \delta_{kj} x_j = x_k \]
shows that any $R$ linear combination of $x_1, \dots, x_n$ is an $R$ linear combination of $x'_1, \dots, x'_n$. Thus we can recover the same $M$ and $N$ if we replace $C$ by $CB$. Arguing in the same way with $y_1, \dots, y_m$ and $y'_1, \dots, y'_m$, we see that we can recover the same $M$ and $N$ if we replace $CB$ by $ACB$, where $A$ is an $m$-by-$m$ matrix with entries in $R$ and with determinant in $R^\times$.

Lemma 8.26 below will say that we can find $A$ and $B$ such that the nonzero entries of $D = ACB$ are exactly the diagonal ones $D_{kk}$ for $1 \le k \le l$, where $l$ is a certain integer with $0 \le l \le \min(m, n)$. That is, the resulting equations expressing $y'_1, \dots, y'_m$ in terms of $x'_1, \dots, x'_n$ will be of the form
\[ (*) \qquad y'_k = \begin{cases} D_{kk} x'_k & \text{for } 1 \le k \le l, \\ 0 & \text{for } l + 1 \le k \le m. \end{cases} \]

Now let us turn to (b) and (c). For (b), the claim is that the elements $y'_k$ with $1 \le k \le l$ exhibit $N$ as a free $R$ module. We know that $y'_1, \dots, y'_m$ generate $N$ and hence that $y'_1, \dots, y'_l$ generate $N$. For the independence, suppose we can find members $r_1, \dots, r_l$ not all 0 in $R$ such that $\sum_{k=1}^{l} r_k y'_k = 0$. Then substitution gives $\sum_{k=1}^{l} r_k D_{kk} x'_k = 0$, and the independence of $x'_1, \dots, x'_l$ forces $r_k D_{kk} = 0$ for $1 \le k \le l$. Since $R$ is an integral domain and $D_{kk} \ne 0$, $r_k = 0$ for such $k$, a contradiction. Thus indeed the elements $y'_k$ with $1 \le k \le l$ exhibit $N$ as a free $R$ module. Since $l \le \min(m, n)$, the rank of $N$ is at most the rank of $M$.

For (c), let $Q$ be a finitely generated unital $R$ module, say with $n$ generators. By the universal mapping property of free $R$ modules (Example 9 in Section 1), there exists a free $R$ module $M$ of rank $n$ with $Q$ as quotient. Let $x_1, \dots, x_n$ be generators of $M$ that exhibit $M$ as free, and let $N$ be the kernel of the quotient $R$ homomorphism $M \to Q$, so that $Q \cong M/N$. Then (b) shows that $N$ is a free $R$ module of rank $m \le n$. Let $y_1, \dots, y_m$ be generators of $N$ that exhibit $N$ as free, and define an $m$-by-$n$ matrix $C$ with entries in $R$ by $y_i = \sum_{j=1}^{n} C_{ij} x_j$ for $1 \le i \le m$. The result is that we are reduced to the situation we have just considered, and we can obtain equations of the form $(*)$ relating their respective generators, namely $y'_1, \dots, y'_m$ for $N$ and $x'_1, \dots, x'_n$ for $M$. For $1 \le k \le n$, define $M_k = Rx'_k$ and
\[ N_k = \begin{cases} Ry'_k = RD_{kk} x'_k & \text{for } 1 \le k \le l, \\ 0 & \text{for } l + 1 \le k \le n, \end{cases} \]
so that $N \cong N_1 \oplus \cdots \oplus N_n$. Then $M_k/N_k$ is $R$ isomorphic to the cyclic $R$ module $R/(D_{kk})$ if $1 \le k \le l$, while $M_k/N_k = M_k$ is isomorphic to the cyclic $R$ module $R$ if $l + 1 \le k \le n$. Applying Proposition 8.5, we obtain
\[ M/N = (M_1 \oplus \cdots \oplus M_n)/(N_1 \oplus \cdots \oplus N_n) \cong (M_1/N_1) \oplus \cdots \oplus (M_n/N_n). \]
Thus $M/N$ is exhibited as a direct sum of cyclic $R$ modules.

To complete the proof of Theorem 8.25, we are left with proving the following lemma, which is where row and column reduction take place.

Lemma 8.26. Let $R$ be a principal ideal domain. If $C$ is an $m$-by-$n$ matrix with entries in $R$, then there exist an $m$-by-$m$ matrix $A$ with entries in $R$ and with determinant in $R^\times$ and an $n$-by-$n$ matrix $B$ with entries in $R$ and with determinant in $R^\times$ such that for some $l$ with $0 \le l \le \min(m, n)$, the nonzero entries of $D = ACB$ are exactly the diagonal entries $D_{11}, D_{22}, \dots, D_{ll}$.

PROOF. The matrices $A$ and $B$ will be constructed as products of matrices of determinant $\pm 1$, and then $\det A$ and $\det B$ equal $\pm 1$ by Proposition 5.1a. The matrix $A$ will correspond to row operations on $C$, and $B$ will correspond to column operations. Each factor will be the identity except in some 2-by-2 block. Among the row and column operations of interest are the interchange of two rows or two columns, in which the 2-by-2 block is $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. Another row operation of interest replaces two rows having respective $j$th entries $a$ and $b$ by $R$ linear combinations of them in which $a$ and $b$ are replaced by $c = \mathrm{GCD}(a, b)$ and 0. If $x(ac^{-1}) + y(bc^{-1}) = 1$, then the 2-by-2 block is $\begin{pmatrix} x & y \\ -bc^{-1} & ac^{-1} \end{pmatrix}$. A similar operation is possible with columns.

The reduction involves an induction that successively constructs the entries $D_{11}, D_{22}, \dots, D_{ll}$, stopping when the part of $C$ involving rows and columns


numbered $\ge l + 1$ has been replaced by 0. We start by interchanging rows and columns to move a nonzero entry into position $(1, 1)$. By a succession of row operations as in the previous paragraph, we can reduce the entry in position $(1, 1)$ to the greatest common divisor of the entries of $C$ in the first column, while reducing the remaining entries of the first column to 0. Next we do the same thing with column operations, reducing the entry in position $(1, 1)$ to the greatest common divisor of the members of the first row, while reducing the remaining entries of the first row to 0. Then we go back and repeat the process with row operations and with column operations as many times as necessary until all the entries of the first row and column other than the one in position $(1, 1)$ are 0.

We need to check that this process indeed terminates at some point. If the entries that appear in position $(1, 1)$ as the iterations proceed are $c_1, c_2, c_3, \dots$, then we have $(c_1) \subseteq (c_2) \subseteq (c_3) \subseteq \cdots$. The union of these ideals is an ideal, necessarily a principal ideal of the form $(c)$, and $c$ occurs in one of the ideals in the union; the chain of ideals must be constant after that stage. Once the corner entry becomes constant, the matrices $\begin{pmatrix} x & y \\ -bc^{-1} & ac^{-1} \end{pmatrix}$ for the row operations can be chosen to be of the form $\begin{pmatrix} 1 & 0 \\ -ba^{-1} & 1 \end{pmatrix}$, and the result is that the row operations do not change the entries of the first row. Similar remarks apply to the matrices for the column operations. The upshot is that we can reduce $C$ in this way so that all entries of the first row and column are 0 except the one in position $(1, 1)$.

This handles the inductive step, and we can proceed until at some $l$th stage we have only the 0 matrix to process. This completes the proof of Theorem 8.25.
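The reduction in Lemma 8.26 can be made fully explicit when $R$ is a Euclidean domain. The Python sketch below assumes $R = \mathbb{Z}$ and is an illustration rather than a transcription of the text's proof: the names `ext_gcd`, `bezout_block`, and `diagonalize` are hypothetical. It uses only row and column interchanges and the 2-by-2 blocks $\begin{pmatrix} x & y \\ -bc^{-1} & ac^{-1} \end{pmatrix}$, choosing the block $\begin{pmatrix} 1 & 0 \\ -ba^{-1} & 1 \end{pmatrix}$ whenever $a$ already divides $b$, which is the choice that forces termination.

```python
def ext_gcd(a, b):
    """Return (c, x, y) with x*a + y*b == c, where c >= 0 is a gcd of a, b."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q = a // b
        a, b = b, a - q * b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    if a < 0:
        a, x0, y0 = -a, -x0, -y0
    return a, x0, y0


def bezout_block(a, b):
    """Entries (x, y, p, q) of a determinant-1 block [[x, y], [p, q]]
    sending the column (a, b) to (c, 0); requires a != 0."""
    if b % a == 0:
        # Corner already divides b: use [[1, 0], [-b/a, 1]].
        return 1, 0, -(b // a), 1
    c, x, y = ext_gcd(a, b)
    return x, y, -(b // c), a // c


def diagonalize(C):
    """Return D = A C B with A, B integer matrices of determinant +-1,
    all off-diagonal entries of D equal to 0 (case R = Z only)."""
    D = [list(row) for row in C]
    m, n = len(D), len(D[0])
    for t in range(min(m, n)):
        pos = next(((i, j) for i in range(t, m) for j in range(t, n)
                    if D[i][j] != 0), None)
        if pos is None:
            break  # only the 0 matrix is left to process
        i0, j0 = pos
        D[t], D[i0] = D[i0], D[t]              # row interchange
        for row in D:                          # column interchange
            row[t], row[j0] = row[j0], row[t]
        changed = True
        while changed:
            changed = False
            for i in range(t + 1, m):          # clear column t by row steps
                if D[i][t] != 0:
                    x, y, p, q = bezout_block(D[t][t], D[i][t])
                    D[t], D[i] = (
                        [x * u + y * v for u, v in zip(D[t], D[i])],
                        [p * u + q * v for u, v in zip(D[t], D[i])],
                    )
                    changed = True
            for j in range(t + 1, n):          # clear row t by column steps
                if D[t][j] != 0:
                    x, y, p, q = bezout_block(D[t][t], D[t][j])
                    for row in D:
                        u, v = row[t], row[j]
                        row[t], row[j] = x * u + y * v, p * u + q * v
                    changed = True
    return D


print(diagonalize([[2, 4, 4], [-6, 6, 12], [10, -4, -16]]))
```

The sketch stops at a diagonal matrix, as the lemma requires; it does not go on to arrange that each diagonal entry divides the next. Reading off the diagonal entries $D_{kk}$ then gives the cyclic summands $\mathbb{Z}/(D_{kk})$ of the quotient of $\mathbb{Z}^n$ by the row span of $C$, as in the proof of Theorem 8.25c.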
In Theorem 4.56, in which we considered the special case of abelian groups, we obtained a better conclusion than in Theorem 8.25c: we showed that the direct sum of cyclic groups could be written as the direct sum of copies of $\mathbb{Z}$ and of cyclic groups of prime-power order, and that in this case the decomposition was unique up to the order of the summands. We shall now obtain a corresponding better conclusion in the setting of Theorem 8.25.

The existence of the decomposition into cyclic modules of a special kind uses a very general form of the Chinese Remainder Theorem, whose classical statement appears as Corollary 1.9. The generalization below makes use of the following operations of addition and multiplication of ideals in a commutative ring with identity: if $I$ and $J$ are ideals, then $I + J$ denotes the set of sums $x + y$ with $x \in I$ and $y \in J$, and $IJ$ denotes the set of all finite sums of products $xy$ with $x \in I$ and $y \in J$; the sets $I + J$ and $IJ$ are ideals.

Theorem 8.27 (Chinese Remainder Theorem). Let $R$ be a commutative ring with identity, and let $I_1, \dots, I_n$ be ideals in $R$ such that $I_i + I_j = R$ whenever $i \ne j$.


(a) If elements $x_1, \dots, x_n$ of $R$ are given, then there exists $x$ in $R$ such that $x \equiv x_j \bmod I_j$, i.e., $x - x_j$ is in $I_j$, for all $j$. The element $x$ is unique if $I_1 \cap \cdots \cap I_n = 0$.
(b) The map $\varphi : R \to \prod_{j=1}^{n} R/I_j$ given by $\varphi(r) = (\dots, r + I_j, \dots)$ is an onto ring homomorphism, its kernel is $\bigcap_{j=1}^{n} I_j$, and the homomorphism descends to a ring isomorphism
\[ R \Big/ \bigcap_{j=1}^{n} I_j \;\cong\; R/I_1 \times \cdots \times R/I_n. \]
(c) The intersection $\bigcap_{j=1}^{n} I_j$ and the product $I_1 \cdots I_n$ coincide.
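For orientation, the classical case $R = \mathbb{Z}$ with $I_j = (n_j)$ and the $n_j$ pairwise relatively prime can be computed directly. The Python sketch below is an illustration under that assumption, with hypothetical names `ext_gcd` and `crt`. For each index $i$ it builds an integer $y_i$ with $y_i \equiv 1 \bmod n_i$ and $y_i \equiv 0 \bmod n_j$ for $j \ne i$, and then forms $x = \sum_i x_i y_i$.

```python
from math import gcd, prod


def ext_gcd(a, b):
    """Return (c, x, y) with x*a + y*b == c, where c >= 0 is a gcd of a, b."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q = a // b
        a, b = b, a - q * b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    if a < 0:
        a, x0, y0 = -a, -x0, -y0
    return a, x0, y0


def crt(residues, moduli):
    """Solve x = residues[i] mod moduli[i] for pairwise coprime moduli.
    Returns the solution in the range 0 <= x < product of the moduli."""
    for i in range(len(moduli)):
        for j in range(i + 1, len(moduli)):
            if gcd(moduli[i], moduli[j]) != 1:
                raise ValueError("moduli must be pairwise relatively prime")
    N = prod(moduli)
    x = 0
    for x_i, n_i in zip(residues, moduli):
        N_i = N // n_i                 # generates the product of the other ideals
        _, a, b = ext_gcd(n_i, N_i)    # a*n_i + b*N_i == 1
        y_i = b * N_i                  # y_i = 1 mod n_i, y_i = 0 mod n_j (j != i)
        x += x_i * y_i
    return x % N


print(crt([2, 3, 2], [3, 5, 7]))  # prints 23
```

In this case the intersection $(3) \cap (5) \cap (7)$ equals the product ideal $(105)$, in line with (c), so the solution is determined modulo 105.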

PROOF. For existence in (a) when $n = 1$, we take $x = x_1$. For existence when $n = 2$, the assumption $I_1 + I_2 = R$ implies that there exist $a_1 \in I_1$ and $a_2 \in I_2$ with $a_1 + a_2 = 1$. Given $x_1$ and $x_2$, we put $x = x_1 a_2 + x_2 a_1$, and then $x \equiv x_1 a_2 \equiv x_1 \bmod I_1$ and $x \equiv x_2 a_1 \equiv x_2 \bmod I_2$.

For general $n$, the assumption $I_1 + I_j = R$ for $j \ge 2$ implies that there exist $a_j \in I_1$ and $b_j \in I_j$ with $a_j + b_j = 1$. If we expand out the product $1 = \prod_{j=2}^{n} (a_j + b_j)$, then all terms but one on the right side involve some $a_j$ and are therefore in $I_1$. That one term is $b_2 b_3 \cdots b_n$, and it is in $\bigcap_{j=2}^{n} I_j$. Thus $I_1 + \bigcap_{j=2}^{n} I_j = R$. The case $n = 2$, which was proved above, yields an element $y_1$ in $R$ such that
\[ y_1 \equiv 1 \bmod I_1 \quad \text{and} \quad y_1 \equiv 0 \bmod \bigcap_{j \ne 1} I_j. \]
Repeating this process for index $i$ and using the assumption $I_i + I_j = R$ for $j \ne i$, we obtain an element $y_i$ in $R$ such that
\[ y_i \equiv 1 \bmod I_i \quad \text{and} \quad y_i \equiv 0 \bmod \bigcap_{j \ne i} I_j. \]
If we put $x = x_1 y_1 + \cdots + x_n y_n$, then we have $x \equiv x_i y_i \bmod I_i \equiv x_i \bmod I_i$ for each $i$, and the proof of existence is complete.

For uniqueness in (a), if we have two elements $x$ and $x'$ satisfying the congruences, then their difference $x - x'$ lies in $I_j$ for every $j$, hence is 0 under the assumption that $I_1 \cap \cdots \cap I_n = 0$.

In (b), the map $\varphi$ is certainly a ring homomorphism. The existence result in (a) shows that $\varphi$ is onto, and the proof of the uniqueness result identifies the kernel. The isomorphism follows.

For (c), consider the special case that $I$ and $J$ are ideals with $I + J = R$. Certainly $IJ \subseteq I \cap J$. For the reverse inclusion, choose $x \in I$ and $y \in J$ with $x + y = 1$; this is possible since $I + J = R$. If $z$ is in $I \cap J$, then $z = zx + zy$ with $zx$ in $JI$ and $zy$ in $IJ$. Thus $z$ is exhibited as in $IJ$. Consequently $I_1 I_2 = I_1 \cap I_2$.

Suppose inductively that $I_1 \cdots I_k = I_1 \cap \cdots \cap I_k$. We saw in the proof of (a) that $I_{k+1} + \bigcap_{j \ne k+1} I_j = R$, and thus we certainly have


$I_{k+1} + \bigcap_{j=1}^{k} I_j = R$. The special case in the previous paragraph, in combination with the inductive hypothesis, shows that $I_{k+1} I_1 \cdots I_k = I_{k+1} \cdot \bigcap_{j=1}^{k} I_j = \bigcap_{j=1}^{k+1} I_j$. This completes the induction and the proof.

Corollary 8.28. Let $R$ be a principal ideal domain, and let $a = \varepsilon p_1^{k_1} \cdots p_n^{k_n}$ be a factorization of a nonzero nonunit element $a$ into the product of a unit and powers of nonassociate primes. Then there is a ring isomorphism
\[ R/(a) \cong R/(p_1^{k_1}) \times \cdots \times R/(p_n^{k_n}). \]

PROOF. Let $I_j = (p_j^{k_j})$ in Theorem 8.27. For $i \ne j$, we have $\mathrm{GCD}(p_i^{k_i}, p_j^{k_j}) = 1$. Since $R$ is a principal ideal domain, there exist $a$ and $b$ in $R$ with $a p_i^{k_i} + b p_j^{k_j} = 1$, and consequently $(p_i^{k_i}) + (p_j^{k_j}) = R$. The theorem applies, and the corollary follows.

Corollary 8.29. If $R$ is a principal ideal domain, then any finitely generated unital $R$ module $M$ is the direct sum of a nonunique free $R$ submodule $\bigoplus_{i=1}^{s} R$ of a well-defined finite rank $s \ge 0$ and the $R$ submodule $T$ of all members $m$ of $M$ such that $rm = 0$ for some $r \ne 0$ in $R$. In turn, the $R$ submodule $T$ is isomorphic to a direct sum
\[ T \cong \bigoplus_{j=1}^{n} R/(p_j^{k_j}), \]
where the $p_j$ are primes in $R$ and the ideals $(p_j^{k_j})$ are not necessarily distinct. The number of summands $R/(p^k)$ for each class of associate primes $p$ and each positive integer $k$ is uniquely determined by $M$.

PROOF. Theorem 8.25c gives $M = F \oplus \bigoplus_{j=1}^{n} Ra_j$, where $F$ is a free $R$ submodule of some finite rank $s$ and the $a_j$'s are nonzero members of $M$ that are each annihilated by some nonzero member of $R$. The set $T$ of all $m$ with $rm = 0$ for some $r \ne 0$ in $R$ is exactly $\bigoplus_{j=1}^{n} Ra_j$. Then $F$ is $R$ isomorphic to $M/T$, hence is isomorphic to the same free $R$ module independently of what direct-sum decomposition of $M$ is used. By Theorem 8.25a, $s$ is well defined.

The cyclic $R$ module $Ra_j$ is isomorphic to $R/(b_j)$, where $(b_j)$ is the ideal of all elements $r$ in $R$ with $ra_j = 0$. The ideal $(b_j)$ is nonzero by assumption and is not all of $R$ since the element $r = 1$ has $1a_j = a_j \ne 0$. Applying Corollary 8.28 for each $j$ and adding the results, we obtain $T \cong \bigoplus_{i=1}^{n} R/(p_i^{k_i})$ for suitable primes $p_i$ and powers $k_i$. The isomorphism in Corollary 8.28 is given as a ring isomorphism, and we are reinterpreting it as an $R$ isomorphism. The primes $p_i$ that arise for fixed $(b_j)$ are distinct, but there may be repetitions in the pairs $(p_i, k_i)$ as $j$ varies. This proves existence of the decomposition.


If $p$ is a prime in $R$, then the elements $m$ of $T$ such that $p^k m = 0$ for some $k$ are the ones corresponding to the sum of the terms in $\bigoplus_{j=1}^{n} R/(p_j^{k_j})$ in which $p_j$ is an associate of $p$. Thus, to complete the proof, it is enough to show that the $R$ isomorphism class of the $R$ module $N = R/(p^{l_1}) \oplus \cdots \oplus R/(p^{l_m})$ with $p$ fixed and with $0 < l_1 \le \cdots \le l_m$ completely determines the integers $l_1, \dots, l_m$.

For any unital $R$ module $L$, we can form the sequence of $R$ submodules $p^j L$. The element $p$ carries $p^j L$ into $p^{j+1} L$, and thus each $p^j L / p^{j+1} L$ is an $R$ module on which $p$ acts as 0. Consequently each $p^j L / p^{j+1} L$ is an $R/(p)$ module. Corollary 8.16 and Proposition 8.10 together show that $R/(p)$ is a field, and therefore we can regard each $p^j L / p^{j+1} L$ as an $R/(p)$ vector space. We shall show that the dimensions $\dim_{R/(p)} (p^j N / p^{j+1} N)$ of these vector spaces determine the integers $l_1, \dots, l_m$.

We start from $p^j N = p^j R/(p^{l_1}) \oplus \cdots \oplus p^j R/(p^{l_m})$. The term $p^j R/(p^{l_k})$ is 0 if $j \ge l_k$. Thus
\[ p^j N = \bigoplus_{j < l_k} p^j R/(p^{l_k}) = \bigoplus_{j < l_k} p^j R / p^{l_k} R. \]

[…] 3. Since $[L : \mathbb{Q}]$ is a divisor of 6 greater than 3, $[L : \mathbb{Q}] = 6$. Thus $[K : L] = 1$, and $K = L$.

(2) $k = \mathbb{Q}$ and $F(X) = X^3 - X - \tfrac{1}{3}$. Application of Corollary 8.20c to the polynomial $G(X) = -3X^3 F(1/X) = X^3 + 3X^2 - 3$ shows that $G(X)$ has no degree-one factor and hence is irreducible over $\mathbb{Q}$. Then it follows that $F(X)$ is irreducible over $\mathbb{Q}$. The proof of Theorem 9.12 takes $k_1 = \mathbb{Q}(r)$, where $r^3 - r - \tfrac{1}{3} = 0$. Then division gives
\[ X^3 - X - \tfrac{1}{3} = (X - r)\bigl(X^2 + rX + (r^2 - 1)\bigr). \]
The discriminant $b^2 - 4ac$ of the quadratic factor is
\[ r^2 - 4(r^2 - 1) = 4 - 3r^2 = \frac{r^2}{(1 + 2r)^2}, \]
the right-hand equality following from direct computation. This discriminant is a square in $k_1 = \mathbb{Q}(r)$, and hence $X^2 + rX + (r^2 - 1)$ factors into degree-one factors in $\mathbb{Q}(r)$ without passing to an extension field. Therefore $L = \mathbb{Q}(r)$ with $[L : \mathbb{Q}] = 3$.

Theorem 9.13 (uniqueness of splitting field). If $F(X)$ is a nonconstant polynomial in $k[X]$, then any two splitting fields of $F(X)$ over $k$ are $k$ isomorphic.


IX. Fields and Galois Theory

The idea of the proof is simple enough, but carrying out the idea runs into a technical complication. The idea is to proceed by induction, using the uniqueness result for simple algebraic extensions (Theorem 9.11) repeatedly until all the roots have been addressed. The difficulty is that after one step the coefficients of the two quotient polynomials end up in two distinct but $k$ isomorphic fields. Thus at the second step Theorem 9.11 does not apply directly. What is needed is the reformulated version given below as Theorem 9.11′, which lends itself to this kind of induction. In addition, as soon as the induction involves at least three steps, the above statement of Theorem 9.13 does not lend itself to a direct inductive proof. For this reason we shall instead prove a reformulated version Theorem 9.13′ of Theorem 9.13 that is ostensibly more general than Theorem 9.13.

Recall from Proposition 4.24 that a general substitution homomorphism that starts from a polynomial ring can have two ingredients. One is the substitution of some element, such as $x$, for the indeterminate $X$, and the other is a homomorphism that is made to act on the coefficients. If the homomorphism is $\sigma$, let us write $F^\sigma(X)$ to indicate the polynomial obtained by applying $\sigma$ to each coefficient of $F(X)$.

Theorem 9.11′. Let $k$ and $k'$ be fields, and let $\sigma : k \to k'$ be a field isomorphism. Suppose that $F(X)$ is a monic prime polynomial in $k[X]$ and that $K = k(x)$ and $K' = k'(x')$ are simple algebraic extensions such that $F(x) = 0$ and $F^\sigma(x') = 0$. Then there exists a field isomorphism $\varphi : k(x) \to k'(x')$ such that $\varphi|_k = \sigma$ and $\varphi(x) = x'$.

PROOF. The argument is essentially unchanged from the proof of Theorem 9.11. We start from the substitution homomorphism $G(X) \mapsto G^\sigma(x')$ that replaces $X$ by $x'$ and that operates by $\sigma$ on the coefficients. This descends to a field map of $k[x]$ into $k'[x']$, and the homomorphism must be onto $k'[x']$ by a count of dimensions.

Theorem 9.13′. Let $k$ and $k'$ be fields, and let $\sigma : k \to k'$ be a field isomorphism. If $F(X)$ is a nonconstant polynomial in $k[X]$ and if $L$ and $L'$ are respective splitting fields for $F(X)$ over $k$ and for $F^\sigma(X)$ over $k'$, then there exists a field isomorphism $\varphi : L \to L'$ such that $\varphi|_k = \sigma$ and such that $\varphi$ sends the set of roots of $F(X)$ to the set of roots of $F^\sigma(X)$.

PROOF. We proceed by induction on $n = \deg F(X)$, the case $n = 1$ being evident. Assume the result for degree $n - 1$. Let $G(X)$ be a prime factor of $F(X)$ over $k$. Then $G^\sigma(X)$ is a prime factor of $F^\sigma(X)$ over $k'$. The polynomials $G(X)$ and $G^\sigma(X)$ have roots in $L$ and $L'$, respectively. Fix one such root for each, say $x_1$ and $x'_1$. By Theorem 9.11′, there exists a field isomorphism $\sigma_1 : k(x_1) \to k'(x'_1)$ extending $\sigma$ and satisfying $\sigma_1(x_1) = x'_1$. Write $F(X) = (X - x_1)H(X)$ with coefficients in $k(x_1)$, by the Factor Theorem (Corollary 1.13). Applying $\sigma_1$ to the coefficients, we obtain $F^\sigma(X) = (X - x'_1)H^{\sigma_1}(X)$ with coefficients in $k'(x'_1)$. Then $L$ and $L'$ are splitting fields for $H(X)$ and $H^{\sigma_1}(X)$ over $k(x_1)$ and $k'(x'_1)$, respectively. By induction we can extend $\sigma_1$ to an isomorphism $\varphi : L \to L'$, and the theorem readily follows.

3. Finite Fields

In this section we shall use the results on splitting fields in Section 2 to classify finite fields up to isomorphism. So far, the examples of finite fields that we have encountered are the prime fields $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$ with $p$ elements, $p$ being any prime number, and the field of 4 elements in Example 3 of fields in Section IV.4. Every finite field has to contain a subfield isomorphic to one of the prime fields $\mathbb{F}_p$, and Proposition 4.33 observed as a consequence that any finite field necessarily has $p^n$ elements for some prime number $p$ and some integer $n > 0$.

Theorem 9.14. For each $p^n$ with $p$ a prime number and with $n$ a positive integer, there exists up to isomorphism one and only one field with $p^n$ elements. Such a field is a splitting field for $X^{p^n} - X$ over the prime field $\mathbb{F}_p$.

If $q = p^n$, it is customary to denote by $\mathbb{F}_q$ a field of order $q$. The theorem says that $\mathbb{F}_q$ exists and is unique up to isomorphism. Some authors refer to finite fields as Galois fields.

Some preparation is needed before we can come to the proof of the theorem. We need to carry over the simplest aspects of differential calculus to polynomials with coefficients in an arbitrary field $k$. First we give an informal definition of the derivative of a polynomial; then we give a more precise definition. For any polynomial $F(X) = \sum_{j=0}^{n} c_j X^j$ in $k[X]$, we informally define the derivative to be the polynomial

$$F'(X) = \sum_{j=1}^{n} j c_j X^{j-1} = \sum_{j=0}^{n-1} (j+1) c_{j+1} X^j.$$

The more precise definition uses the definition of members of $k[X]$ as infinite sequences of members of $k$ whose terms are 0 from some point on. In this notation if $F = (c_0, c_1, \dots, c_n, 0, \dots)$ with $c_j$ in the $j$th position for $j \le n$ and with 0 in the $j$th position for $j > n$, then $F' = (c_1, 2c_2, \dots, nc_n, 0, \dots)$ with $(j+1)c_{j+1}$ in the $j$th position for $j \le n - 1$ and with 0 in the $j$th position for $j > n - 1$. In any event, the mapping $F \mapsto F'$ is $k$ linear from $k[X]$ to itself. The operation is called differentiation.
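As an informal illustration of the sequence form of the definition, the following Python sketch computes formal derivatives over a prime field. The function names `derivative` and `x_pow_minus_x` and the list-of-coefficients representation are assumptions made for this example, not notation from the text.

```python
def derivative(coeffs, p):
    """Formal derivative over F_p of the polynomial with coefficient list coeffs.

    coeffs[j] is the coefficient c_j of X^j, as in the sequence
    (c_0, c_1, ..., c_n, 0, ...); p is assumed to be a prime.
    """
    # Position j of F' holds (j + 1) * c_{j+1}, reduced modulo p.
    result = [((j + 1) * coeffs[j + 1]) % p for j in range(len(coeffs) - 1)]
    # Strip trailing zeros so that the zero polynomial is the empty list.
    while result and result[-1] == 0:
        result.pop()
    return result


def x_pow_minus_x(q, p):
    """Coefficient list of X^q - X over F_p, for q >= 2."""
    coeffs = [0] * (q + 1)
    coeffs[1] = (-1) % p
    coeffs[q] = 1
    return coeffs


# Derivative of X^9 - X over F_3: the term 9 X^8 vanishes, leaving -1 = 2 in F_3.
print(derivative(x_pow_minus_x(9, 3), 3))
```

The last line illustrates a computation that reappears later in the section: in characteristic $p$, the derivative of $X^q - X$ with $q$ a power of $p$ is the constant $-1$.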


Proposition 9.15. Differentiation on $k[X]$ satisfies the product rule: $F = GH$ implies $F' = G'H + GH'$.

PROOF. Because of the $k$ linearity, it is enough to prove the result for monomials. Thus let $G(X) = X^m$ and $H(X) = X^n$, so that $F(X) = X^{m+n}$. Then $F'(X) = (m+n)X^{m+n-1}$, $G'(X)H(X) = mX^{m+n-1}$, and $G(X)H'(X) = nX^{m+n-1}$. Hence we indeed have $F'(X) = G'(X)H(X) + G(X)H'(X)$.
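The product rule can be spot-checked numerically over a prime field. The sketch below is only a sanity check under assumed conventions (coefficient lists indexed by degree, a prime modulus `p`); all helper names are hypothetical.

```python
def poly_mul(g, h, p):
    """Product of two coefficient lists over F_p (index j holds the X^j coefficient)."""
    if not g or not h:
        return []
    out = [0] * (len(g) + len(h) - 1)
    for i, a in enumerate(g):
        for j, b in enumerate(h):
            out[i + j] = (out[i + j] + a * b) % p
    return out


def poly_add(g, h, p):
    """Sum of two coefficient lists over F_p, padding the shorter one with zeros."""
    n = max(len(g), len(h))
    g = g + [0] * (n - len(g))
    h = h + [0] * (n - len(h))
    return [(a + b) % p for a, b in zip(g, h)]


def derivative(f, p):
    """Formal derivative: position j holds (j + 1) * c_{j+1} mod p."""
    return [((j + 1) * f[j + 1]) % p for j in range(len(f) - 1)]


def trim(f):
    """Remove trailing zero coefficients so that equal polynomials compare equal."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def product_rule_holds(g, h, p):
    """Check (GH)' == G'H + GH' for the given coefficient lists over F_p."""
    lhs = derivative(poly_mul(g, h, p), p)
    rhs = poly_add(poly_mul(derivative(g, p), h, p),
                   poly_mul(g, derivative(h, p), p), p)
    return trim(lhs) == trim(rhs)


print(product_rule_holds([1, 2, 3], [4, 0, 1, 5], 7))
```

A check of this kind proves nothing, of course; it only confirms the identity on particular inputs.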

Corollary 9.16. If $n$ is a positive integer, if $r$ is in $k$, and if $F(X) = (X - r)^n$ in $k[X]$, then $F'(X) = n(X - r)^{n-1}$.

PROOF. This is immediate by induction from Proposition 9.15 since the derivative of $X - r$ is 1.

Corollary 9.17. Let $r$ be in $k$, and let $F(X)$ be in $k[X]$. If $(X - r)^2$ divides $F(X)$, then $F(r) = F'(r) = 0$.

PROOF. Write $F(X) = (X - r)^2 G(X)$. If we substitute $r$ for $X$, we see that $F(r) = 0$. If instead we differentiate, using Proposition 9.15 and Corollary 9.16, then we obtain $F'(X) = 2(X - r)G(X) + (X - r)^2 G'(X)$. Substituting $r$ for $X$, we obtain $F'(r) = 0 + 0 = 0$.

Lemma 9.18. If $k$ is a field of characteristic $p \neq 0$, then the map $\varphi : k \to k$ given by $\varphi(x) = x^p$ is a field mapping.

REMARK. The map $x \mapsto x^p$ is often called the Frobenius map. If $k$ is a finite field, then the map must carry $k$ onto $k$ since one-one implies onto for functions from a finite set to itself; in this case the map is an automorphism of $k$.

PROOF. The computation $\varphi(uv) = (uv)^p = u^p v^p = \varphi(u)\varphi(v)$ shows that $\varphi$ respects products. If $u$ and $v$ are in $k$, then

$$\varphi(u + v) = (u + v)^p = \varphi(u) + \sum_{j=1}^{p-1} \binom{p}{j} u^{p-j} v^j + \varphi(v) = \varphi(u) + \varphi(v),$$

the last equality holding since the binomial coefficient $\binom{p}{j}$ has a factor of $p$ in the numerator and none in the denominator for $1 \le j \le p - 1$. Thus $\varphi$ is a ring homomorphism. Since $\varphi(1) = 1$, $\varphi$ is a field mapping.

PROOF OF UNIQUENESS IN THEOREM 9.14. Let $k$ be a finite field, say of characteristic $p$, and let $P$ be the prime field of order $p$ within $k$. We know that $P$ is isomorphic to $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$. Since $k$ is a finite-dimensional vector space over $P$, we know also that $k$ has order $q = p^n$ for some integer $n > 0$. The multiplicative group $k^\times$ of $k$ thus has order $q - 1$, and every $x \neq 0$ in $k$ therefore satisfies $x^{q-1} = 1$. Taking $x = 0$ into account, we see that every member of $k$ satisfies $x^q = x$. Forming the polynomial $X^q - X$ in $P[X]$, we see that every member of $k$ is a root of this polynomial. Iterated application $q$ times of the Factor Theorem (Corollary 1.13) shows that $X^q - X$ factors into degree-one factors in $k$. Since every member of $k$ is a root of $X^q - X$, $k$ is a splitting field of $X^q - X$ over $P$. Then the uniqueness of the prime field up to isomorphism, in combination with the uniqueness of the splitting field of $X^q - X$ given in Theorem 9.13′, shows that $k$ is uniquely determined up to isomorphism.

PROOF OF EXISTENCE IN THEOREM 9.14. Let $q = p^n$ be given, and define $k$ to be a splitting field of $X^q - X$ over $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$. The field $k$ exists by Theorem 9.12, and it has characteristic $p$. Since $X^q - X$ is monic of degree $q$, the definition of splitting field says that we can write

$$X^q - X = (X - u_1)(X - u_2) \cdots (X - u_q) \qquad \text{with all } u_j \in k.$$

Because of Lemma 9.18, the map $\varphi(u) = u^q$, which is the $n$th power of the map $u \mapsto u^p$, is a field mapping of $k$ into itself. The members of $k$ fixed by $\varphi$ form a subfield of $k$, and these elements of $k$ are exactly the members of the set $S = \{u_1, \dots, u_q\}$. Therefore $S$ is a subfield of $k$, necessarily containing $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$. Since $X^q - X$ splits in $S$ and since the roots of $X^q - X$ generate $S$, $S$ is a splitting field of $X^q - X$ over $\mathbb{F}_p$. In other words, $S = k$.

To complete the proof, it is enough to show that the elements $u_1, \dots, u_q$ are distinct, and then $k$ will be a field of $q$ elements. The question is therefore whether some root of $X^q - X$ has multiplicity at least 2, i.e., whether $(X - r)^2$ divides $X^q - X$ for some $r$ in $k$. Corollary 9.17 gives a necessary condition for this divisibility, saying that the derivative of $X^q - X$ must have $r$ as a root. However, the derivative of $X^q - X$ is $qX^{q-1} - 1 = -1$ because $q$ is 0 in $k$, and the constant polynomial $-1$ has no roots. We conclude that $k$ has $q$ elements.

Corollary 9.19. If $q$ and $r$ are integers with $2 \le q \le r$, then the finite field $\mathbb{F}_q$ is isomorphic to a subfield of the finite field $\mathbb{F}_r$ if and only if $r = q^n$ for some integer $n \ge 1$.

PROOF. If $\mathbb{F}_q$ is isomorphic to a subfield of $\mathbb{F}_r$, then we may consider $\mathbb{F}_r$ as a vector space over $\mathbb{F}_q$, say of dimension $n$. In this case, $\mathbb{F}_r$ has $q^n$ elements.

Conversely let $r = q^n$, and regard $\mathbb{F}_r$ as a splitting field of $X^{q^n} - X$ over the prime field $\mathbb{F}_p$, by Theorem 9.14. Let $S$ be the subset of $\mathbb{F}_r$ of all roots of $X^q - X$. Putting $a = q - 1$ and $k = \frac{q^n - 1}{q - 1} = q^{n-1} + q^{n-2} + \cdots + 1$, we have

$$X^{ka} - 1 = (X^a - 1)(X^{(k-1)a} + X^{(k-2)a} + \cdots + 1).$$

Multiplying by $X$, we see that $X^q - X$ is a factor of $X^{q^n} - X$. Since $X^{q^n} - X$ splits in $\mathbb{F}_r$ and has distinct roots, the same is true of $X^q - X$. Therefore $|S| = q$.
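The divisibility just used, that $X^q - X$ is a factor of $X^{q^n} - X$, can be checked in small cases by polynomial long division over $\mathbb{F}_p$. The following is a minimal sketch under assumed conventions (coefficient lists indexed by degree, prime modulus); the helper names are hypothetical, and the three-argument `pow` with exponent `-1` assumes Python 3.8 or later.

```python
def poly_divmod(f, g, p):
    """Divide f by g over F_p; return (quotient, remainder) as coefficient lists.

    Lists are indexed by degree. The leading coefficient g[-1] must be nonzero
    modulo p, and p must be prime so that this coefficient is invertible.
    """
    f = [c % p for c in f]
    inv_lead = pow(g[-1], -1, p)  # inverse of the leading coefficient mod p
    quot = [0] * max(len(f) - len(g) + 1, 0)
    # Cancel the top coefficient of f, one degree at a time.
    for shift in range(len(f) - len(g), -1, -1):
        factor = (f[shift + len(g) - 1] * inv_lead) % p
        quot[shift] = factor
        for i, c in enumerate(g):
            f[shift + i] = (f[shift + i] - factor * c) % p
    return quot, f[:len(g) - 1]


def x_pow_minus_x(m, p):
    """Coefficient list of X^m - X over F_p, for m >= 2."""
    coeffs = [0] * (m + 1)
    coeffs[1] = (-1) % p
    coeffs[m] = 1
    return coeffs


# q = 4, n = 2 over F_2: X^4 - X should divide X^16 - X.
_, rem = poly_divmod(x_pow_minus_x(16, 2), x_pow_minus_x(4, 2), 2)
print(all(c == 0 for c in rem))

# 8 is not a power of 4, and X^4 - X leaves a nonzero remainder against X^8 - X.
_, rem_bad = poly_divmod(x_pow_minus_x(8, 2), x_pow_minus_x(4, 2), 2)
print(any(c != 0 for c in rem_bad))
```

The second check is consistent with the corollary being proved: the field of 4 elements is not isomorphic to a subfield of the field of 8 elements.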


Let $q = p^m$. The $m$th power of the homomorphism of Lemma 9.18 on $k = \mathbb{F}_r$ is $x \mapsto x^q$, and the subset of $\mathbb{F}_r$ fixed by this homomorphism is a subfield. Thus $S$ is a subfield, and it has $q$ elements.

4. Algebraic Closure

Algebraically closed fields, those for which every nonconstant polynomial with coefficients in the field has a root in the field, were introduced in Section V.1, and it was mentioned at that time that every field is a subfield of some algebraically closed field. We shall prove that existence theorem in this section in a form lending itself to a uniqueness result.

Throughout this section let $k$ be a field. We begin by giving further descriptions of algebraically closed fields that take the theory of Sections 1–2 into account.

Proposition 9.20. The following conditions on the field $k$ are equivalent:
(a) $k$ has no nontrivial algebraic extensions,
(b) every irreducible polynomial in $k[X]$ has degree 1,
(c) every polynomial in $k[X]$ of positive degree has at least one root in $k$,
(d) every polynomial in $k[X]$ of positive degree factors over $k$ into polynomials of degree 1.

PROOF. If (a) holds, then (b) holds since any irreducible polynomial of degree greater than 1 would give a nontrivial simple algebraic extension (Theorem 9.10). If (b) holds and a polynomial of positive degree is given, apply (b) to an irreducible factor to see that the given polynomial has a root; thus (c) holds. Condition (c) implies condition (d) by induction and the Factor Theorem. If (d) holds and if $K$ is an algebraic extension of $k$, let $x$ be in $K$, and let $F(X)$ be the minimal polynomial of $x$ over $k$. Then $F(X)$ is irreducible over $k$, and (d) says that $F(X)$ has degree 1. Hence $x$ is in $k$, and we conclude that $K = k$.

A field satisfying the equivalent conditions of Proposition 9.20 is said to be algebraically closed.

EXAMPLES OF ALGEBRAICALLY CLOSED FIELDS.

(1) The Fundamental Theorem of Algebra (Theorem 1.18) says that $\mathbb{C}$ is algebraically closed. This theorem was not proved in Chapter I, but a proof will be given in this chapter in Section 10.

(2) Let $K$ be the subset of all members of $\mathbb{C}$ that are algebraic over $\mathbb{Q}$. By Corollary 9.9, $K$ is a subfield of $\mathbb{C}$. Example 1 shows that every polynomial in $\mathbb{Q}[X]$ splits in $K$, and Lemma 9.21 below then allows us to conclude that $K$ is algebraically closed.


(3) Fix a prime number $p$, and start with $k_0 = \mathbb{F}_p$ as the prime field $\mathbb{Z}/p\mathbb{Z}$. Enumerate the members of $\mathbb{F}_p[X]$, letting $F_n(X)$ be the $n$th such polynomial. We construct $k_n$ by induction on $n$ so that $k_n$ is a splitting field for $F_n(X)$ over $k_{n-1}$ when $n \ge 1$. Then $k_0 \subseteq k_1 \subseteq k_2 \subseteq \cdots$ is an increasing sequence of fields containing $\mathbb{F}_p$. Let $K$ be the union. Any two elements of $K$ lie in a single $k_n$, and it follows that $K$ is closed under the field operations. Any three elements lie in a single $k_n$, and it follows that any of the defining properties of a field is valid in $K$ because it is valid in $k_n$. Therefore $K$ is a field. This field is an extension of $\mathbb{F}_p$, and every polynomial in $\mathbb{F}_p[X]$ splits in $K$. As in Example 2, Lemma 9.21 below shows that $K$ is algebraically closed.

Lemma 9.21. If $K/k$ is an algebraic extension of fields and if every nonconstant polynomial in $k[X]$ splits into degree-one factors in $K$, then $K$ is algebraically closed.

PROOF. Let $K'$ be an algebraic extension of $K$, and let $x$ be in $K'$. Let $G(X)$ be the minimal polynomial of $x$ over $K$, and write $G(X)$ as

$$G(X) = X^n + c_{n-1}X^{n-1} + \cdots + c_0 \qquad \text{with all } c_i \in K.$$

Then $x$ is algebraic over $k(c_{n-1}, \dots, c_0)$, which is a finite extension of $k$ by Theorem 9.8. By Corollary 9.7, $x$ lies in a finite extension of $k$. Thus Proposition 9.4 shows that $x$ is algebraic over $k$. Let $F(X)$ be the minimal polynomial of $x$ over $k$. By assumption this splits over $K$, say as

$$F(X) = (X - x_1) \cdots (X - x_m) \qquad \text{with all } x_i \in K.$$

Evaluating at $x$ and using the fact that $F(x) = 0$, we see that $x = x_j$ for some $j$. Therefore $x$ is in $K$, and $K$ is algebraically closed.

An extension field $K/k$ is an algebraic closure of $k$ if $K$ is algebraic over $k$ and if $K$ is algebraically closed. Example 2 of algebraically closed fields above gives an algebraic closure of $\mathbb{Q}$, and Example 3 gives an algebraic closure of $\mathbb{F}_p$.

Theorem 9.22 (Steinitz). Every field $k$ has an algebraic closure, and this is unique up to $k$ isomorphism.

REMARKS. The proof of existence is modeled on the argument for Example 3 of algebraically closed fields. However, we are not free in general to use a simple union of a sequence of fields and have to work harder. Because there is no evident set of possibilities within which we are forming extension fields, Zorn's Lemma is inconvenient to use and tends to result in an unintuitive construction. Instead, we use Zermelo's Well-Ordering Theorem, whose use more closely parallels the inductive construction in Example 3.


PROOF OF EXISTENCE. With $k$ as the given field, let $S$ be the set of nonconstant polynomials $s(X)$ in $k[X]$, and introduce a well ordering into $S$ by means of Zermelo's Well-Ordering Theorem (Section A5 of the appendix). Let us write $\prec$ for "strictly precedes in the ordering" and $\preceq$ for "equals or strictly precedes." For each $s \in S$, let $\bar{s}$ be the successor of $s$, i.e., the first element among all elements $t$ with $s \prec t$. We write $s_0$ for the first element of $S$. Without loss of generality, we may assume that $S$ has a last element $s_\infty$. The idea is to construct simultaneously two kinds of things:
(i) an algebraic extension field $k_s/k$ for each $s \in S$ such that $k_{s_0} = k$ and such that $k_{\bar{s}}$ is a splitting field for $s(X)$ over $k_s$ whenever $s \prec s_\infty$,
(ii) a field mapping $\varphi_{ut} : k_t \to k_u$ for each ordered pair of elements $t$ and $u$ in $S$ having $t \preceq u$, such that $\varphi_{tt} = 1$ for all $t$ and such that $t \preceq u \preceq v$ implies $\varphi_{vt} = \varphi_{vu}\varphi_{ut}$.
These extension fields and mappings are to be such that $k_s = \bigcup_{t \prec s} \varphi_{st}(k_t)$ whenever $s$ is not a successor and is not $s_0$. If such a system of extension fields and field homomorphisms exists, then Lemma 9.21 applies to a splitting field over $k_{s_\infty}$ of the nonconstant polynomial $s_\infty(X)$ and shows that this splitting field is algebraically closed; since this splitting field is an algebraic extension of $k$, it is an algebraic closure of $k$.

A partial such system through $t_0$ means a system consisting of fields $k_s$ with $s \preceq t_0$ and field homomorphisms $\varphi_{ut}$ with $t \preceq u \preceq t_0$ such that the above conditions hold as far as they are applicable. A partial system exists through the first member $s_0$ of $S$ because we can take $k_{s_0} = k$ and $\varphi_{s_0 s_0} = 1$.

Arguing by contradiction, we suppose that such a system of extension fields and field homomorphisms fails to exist through some member of $S$. Let $t_0$ be the first member of $S$ such that there is no partial system through $t_0$.

Suppose that $t_0$ is the successor of some element $t_1$ in $S$. We know that a partial system exists through $t_1$. If we let $k_{t_0}$ be a splitting field for $t_1(X)$ over $k_{t_1}$, and if we define

$$\varphi_{t_0 t} = \begin{cases} \varphi_{t_0 t_1}\varphi_{t_1 t} & \text{for } t \preceq t_1, \\ 1 & \text{for } t = t_0, \end{cases}$$

where $\varphi_{t_0 t_1}$ is understood to be the inclusion of $k_{t_1}$ into $k_{t_0}$, then the enlarged system is a partial system through $t_0$, contradiction. Thus $t_0$ cannot be the successor of some element of $S$.

When $t_0$ is not a successor, at least $k_t$ is defined for $t \prec t_0$ and $\varphi_{ut}$ is defined for $t \preceq u \prec t_0$. We want to form a union, but we have to keep the field operations aligned properly in the process. Define a "$t$-allowable tuple" to be a function $u \mapsto x_u$ defined for $t \preceq u \prec t_0$ such that $x_u$ is in $k_u$ and $\varphi_{vu}(x_u) = x_v$ whenever $t \preceq u \preceq v \prec t_0$. If $x$ is in $k_t$, then an example of a $t$-allowable tuple is given by $u \mapsto \varphi_{ut}(x)$ for $t \preceq u \prec t_0$. If $t \prec t_0$ and $t' \prec t_0$, then we can apply field operations to the $t$-allowable tuple $u \mapsto x_u$ and to the $t'$-allowable tuple $u \mapsto y_u$, obtaining $\max(t, t')$-allowable tuples $u \mapsto x_u + y_u$, $u \mapsto -x_u$, $u \mapsto x_u y_u$, and $u \mapsto x_u^{-1}$ as long as $x_t \neq 0$. These operations are meaningful since each $\varphi_{vu}$ is a field mapping.

If $t \prec t_0$ and $t' \prec t_0$, we say that the $t$-allowable tuple $u \mapsto x_u$ is equivalent to the $t'$-allowable tuple $u \mapsto y_u$ if $x_u = y_u$ for $\max(t, t') \preceq u \prec t_0$. The result is an equivalence relation, and the equivalence relation respects the field operations in the previous paragraph. We define $k_{t_0}$ to be the set of equivalence classes of allowable tuples with the inherited field operations. The 0 element is the class of the $s_0$-allowable tuple $u \mapsto 0$, and the multiplicative identity is the class of the $s_0$-allowable tuple $u \mapsto 1$. It is a routine matter to check that $k_{t_0}$ is a field.

If $t \prec t_0$ is given, we define the function $\varphi_{t_0 t} : k_t \to k_{t_0}$ as follows: if $x$ is in $k_t$, we form the $t$-allowable tuple $u \mapsto \varphi_{ut}(x)$ and take its equivalence class, which is a member of $k_{t_0}$, as $\varphi_{t_0 t}(x)$. Then $\varphi_{t_0 t}$ is evidently a field mapping. It is evident also that $\varphi_{t_0 v}\varphi_{vu} = \varphi_{t_0 u}$ when $u \preceq v \prec t_0$. Defining $\varphi_{t_0 t_0}$ to be the identity, we have a complete system of field mappings $\varphi_{vu}$ for $k_{t_0}$.

The final step is to check that $k_{t_0}$ is the union of the images of the $\varphi_{t_0 t}$ for $t \prec t_0$. Thus choose a representative of an equivalence class in $k_{t_0}$. Let the representative be a $t$-allowable tuple $u \mapsto x_u$ for $t \preceq u \prec t_0$. The element $x_t$ is in $k_t$, and the condition $x_u = \varphi_{ut}(x_t)$ is just the condition that the class of $u \mapsto x_u$ be the image of $x_t$ under $\varphi_{t_0 t}$. Hence every member of $k_{t_0}$ is in the image of some $\varphi_{t_0 t}$ with $t \prec t_0$, and we have a contradiction to the hypothesis that a partial system through $t_0$ does not exist. This completes the proof of existence.

For the uniqueness in Theorem 9.22, we again need a serious application of the Axiom of Choice, but here Zorn's Lemma can be applied fairly routinely. The proof will show a little more than is needed, and in fact the uniqueness in Theorem 9.22 will be derived as a consequence of Theorem 9.23 below.

Theorem 9.23. Let $K'$ be an algebraically closed field, and let $K$ be an algebraic extension of a field $k$. If $\varphi$ is a field mapping of $k$ into $K'$, then $\varphi$ can be extended to a field mapping of $K$ into $K'$.

PROOF OF UNIQUENESS IN THEOREM 9.22 USING THEOREM 9.23. Let $K$ and $K'$ be algebraic closures of $k$, and let $\varphi : k \to K'$ be the inclusion mapping. Theorem 9.23 supplies a field mapping $\Phi : K \to K'$ such that $\Phi|_k = \varphi$, i.e., such that $\Phi$ fixes $k$. Since $K$ is an algebraic closure of $k$, so is $\Phi(K)$. Then $K'$ is an algebraic extension of the algebraically closed field $\Phi(K)$, and we must have $\Phi(K) = K'$. Thus $\Phi$ is a $k$ isomorphism of $K$ onto $K'$.

PROOF OF THEOREM 9.23. Let $S$ be the set of all triples $(L, L', \psi)$ such that $L$ is a field with $k \subseteq L \subseteq K$ and $\psi$ is a field mapping of $L$ onto the subfield $L'$ of $K'$ with $\psi|_k = \varphi$. The set $S$ is nonempty since $(k, \varphi(k), \varphi)$ is a member of it. Defining $(L_1, L_1', \psi_1) \subseteq (L_2, L_2', \psi_2)$ to mean that $L_1 \subseteq L_2$, that $L_1' \subseteq L_2'$, and that $\psi_1$ as a set of ordered pairs is a subset of $\psi_2$ as a set of ordered pairs, we partially order $S$ by inclusion upward. If $\{(L_\alpha, L_\alpha', \psi_\alpha)\}$ is a nonempty chain in $S$, form the triple $\big(\bigcup_\alpha L_\alpha, \bigcup_\alpha L_\alpha', \bigcup_\alpha \psi_\alpha\big)$, and put $\psi = \bigcup_\alpha \psi_\alpha$. Then $\psi\big(\bigcup_\alpha L_\alpha\big) = \bigcup_\alpha L_\alpha'$, and consequently $\big(\bigcup_\alpha L_\alpha, \bigcup_\alpha L_\alpha', \bigcup_\alpha \psi_\alpha\big)$ is an upper bound in $S$ for the chain. By Zorn's Lemma, $S$ has a maximal element $(L_0, L_0', \psi_0)$. We shall prove that $L_0 = K$, and the proof will be complete.

Fix $x$ in $K$, and let $F(X)$ be the minimal polynomial of $x$ over $L_0$. Since $\psi_0$ is an isomorphism of $L_0$ onto $L_0'$, the polynomial $F^{\psi_0}(X)$ is monic and prime in $L_0'[X]$. Since $K'$ is algebraically closed, $F^{\psi_0}(X)$ has a root $x'$ in $K'$. By Theorem 9.11′, $\psi_0 : L_0 \to L_0'$ can be extended to an isomorphism $\Psi_0 : L_0(x) \to L_0'(x')$ such that $\Psi_0(x) = x'$. Then $(L_0(x), L_0'(x'), \Psi_0)$ is in $S$ and contains $(L_0, L_0', \psi_0)$. This containment, if strict, would contradict the fact that $(L_0, L_0', \psi_0)$ is a maximal element of $S$. Thus equality must hold: $L_0(x) = L_0$. Therefore $x$ is in $L_0$, and we conclude that $L_0 = K$.

5. Geometric Constructions by Straightedge and Compass Classical Euclidean geometry attached a certain emphasis to constructions in the Euclidean plane that could be made by straightedge and compass. These are often referred to casually as constructions by “ruler and compass,” but one is not allowed to use the markings on a ruler. Thus “straightedge and compass” is a more accurate description. In these constructions the starting conﬁguration may be regarded as a line with two points marked on the line. Allowable constructions are the following: to form the line through a given point different from ﬁnitely many other lines through that point, to form the line through two distinct points, to form a circle with a given center and a radius different from that of ﬁnitely many other circles through the point, and to form a circle with a given center and radius. Intersections of a line or a circle with previous lines and circles establish new points for continuing the construction. For example a line perpendicular to a given line at a given point can be constructed by drawing any circle centered at the point, using the two intersection points as centers of new circles, drawing those circles so as to have radius larger than the ﬁrst circle, and forming the line between their two points of intersection. An angle at the point P of intersection between two intersecting lines A and B may be bisected by drawing any circle centered at P, selecting one of the points of intersection on each line so that P and the two new points Q and R describe the angle, drawing circles with that same radius centered at Q and R, and forming the line between the points of intersection of the two circles. And so on.


Three notable problems remained unsolved in antiquity:
(i) how to double a cube, i.e., how to construct the side of a cube of double the volume of a given cube,
(ii) how to trisect any constructible angle, i.e., how to divide the angle into three equal parts by means of constructed lines,
(iii) how to square a circle, i.e., how to construct the side of a square whose area equals that of a given disk.

In this section we shall use the elementary field theory of Sections 1–2 to show that doubling a cube and trisecting a 60-degree angle are impossible with straightedge and compass. As to (iii), we shall reduce a proof of the impossibility of squaring the circle to a proof that $\pi$ is transcendental over $\mathbb{Q}$. This latter proof we give in Section 14.

The first step is to translate the problem of geometric constructibility into a statement in algebra. Since we are given two points on a line, we can introduce Cartesian coordinates for the Euclidean plane, taking one of the points to be $(0, 0)$ and the other point to be $(1, 0)$. Points in the Euclidean plane are now determined by their Cartesian coordinates, which determine all distances. Distances in turn can be laid off on the $x$-axis from $(0, 0)$. Thus the question becomes, what points on the $x$-axis can be constructed?

[Figure: two intersecting lines with marked distances a, b, c, d]

FIGURE 9.1. Closure of positive constructible x coordinates under multiplication and division.

Let $C$ be the set of constructible $x$ coordinates. We are given that 0 and 1 are in $C$. Closure of $C$ under addition and subtraction is evident; the straightedge is not even necessary for this step. Figure 9.1 indicates why the positive elements of $C$ are closed under multiplication and division. In more detail we take two intersecting lines and mark three known positive members of $C$ as the distances $a$, $b$, $c$ in the figure. Then we form the line through the two points marking $a$ and $b$, and we form a line parallel to that line through the point marked off by the distance $c$. The intersection of this parallel line with the other original line defines a distance $d$. Then $a/b = c/d$, and so $d = bc/a$. By taking $a = 1$, we see that we can multiply any two members $b$ and $c$ in $C$, obtaining a result in $C$.


By instead taking c = 1, we see that we can divide. The conclusion is that C is a ﬁeld.
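The similar-triangles argument can be traced in coordinates. The sketch below makes simplifying assumptions that are not in the text: the two intersecting lines are taken to be the coordinate axes meeting at the origin, with $a$ and $c$ marked on the first axis and $b$ on the second, and the function name `fourth_proportional` is hypothetical.

```python
from fractions import Fraction


def fourth_proportional(a, b, c):
    """Return d with a/b = c/d, found by intersecting a parallel line.

    Assumed layout: A = (a, 0) and C = (c, 0) lie on the first line,
    B = (0, b) lies on the second line, and a is nonzero.
    """
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    # Direction vector of the line through A and B.
    dx, dy = -a, b
    # The parallel line through C is (c + t*dx, t*dy); it meets the
    # second line, where the first coordinate is 0, at t = -c/dx.
    t = -c / dx
    return t * dy


# Taking a = 1 multiplies b and c; taking c = 1 divides b by a.
print(fourth_proportional(1, 3, 4), fourth_proportional(5, 3, 1))
```

Exact rational arithmetic is used only to make the identity $d = bc/a$ visible without rounding.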

[Figure: circle construction with lengths a, b, c marked]

FIGURE 9.2. Closure of positive constructible x coordinates under square roots.

Figure 9.2 indicates why the positive elements of $C$ are closed under taking square roots. In more detail let $a$ and $b$ be positive members of $C$ with $a < b$. By forming a circle whose diameter is a segment of length $b$ and by forming a line perpendicular to that segment at the point marked by $a$, we determine the pictured right triangle with a side $c$ satisfying $a/c = c/b$. Then $c = \sqrt{ab}$. By taking one of $a$ and $b$ to be 1, we see that the square root of the other of $a$ and $b$ is in $C$. This completes the proof of the direct part of the following theorem.

Theorem 9.24. The set $C$ of $x$ coordinates that can be constructed from $x = 1$ and $x = 0$ by straightedge and compass forms a subfield of $\mathbb{R}$ such that the square root of any positive element of the field lies in the field. Conversely the members of $C$ are those real numbers lying in some subfield $F_n$ of $\mathbb{R}$ of the form

$$F_1 = \mathbb{Q}(\sqrt{a_0}),\quad F_2 = F_1(\sqrt{a_1}),\quad \dots,\quad F_n = F_{n-1}(\sqrt{a_{n-1}})$$

with each $a_j$ in $F_j$ (where $F_0 = \mathbb{Q}$) and with $a_0, \dots, a_{n-1}$ all $\ge 0$.

PROOF OF CONVERSE. Suppose we have a subfield $F = F_n$ of $\mathbb{R}$ of the kind described in the statement of the theorem. The possibilities for obtaining a new constructible point from $F$ by an additional construction arise from three situations: the intersection of two lines, each passing through two points of $F$; the intersection of a line and a circle, each determined by data from $F$; and the intersection of two circles, each determined by data from $F$.

In the case of two intersecting lines, each line is of the form $ax + by = c$ for suitable coefficients $a, b, c$ in $F$, and the intersection is a point $(x, y)$ in $F \times F$. So intersections of lines do not force us to enlarge $F$.

For a line and a circle, we assume that the line is given by $ax + by = c$ with $a, b, c$ in $F$, that the circle has radius in $F$ and center in $F \times F$, and that the line and the circle actually intersect. The circle is then given by $(x - h)^2 + (y - k)^2 = r^2$ with $h, k, r$ in $F$. Substitution of the equation of the line into the equation of the circle gives us a quadratic equation either for $x$, and $x$ then determines $y$, or for $y$, and $y$ then determines $x$. The quadratic equation has real roots, and thus its discriminant is $\ge 0$. The result is that $x$ and $y$ are in a field $F(\sqrt{l})$ for some $l \ge 0$ in $F$.

For two circles, without loss of generality, we may take their equations to be

$$x^2 + y^2 = r^2 \qquad \text{and} \qquad (x - h)^2 + (y - k)^2 = s^2$$

with $r, h, k, s$ in $F$. Subtracting gives $2xh + 2yk = h^2 + k^2 - s^2 + r^2$. With this equation and with $x^2 + y^2 = r^2$, we again have a line and circle that are being intersected. Thus the same remarks apply as in the previous paragraph.

The conclusion is that any new single construction of points of intersection by straightedge and compass leads from $F$ to $F(\sqrt{l})$ for some $l \ge 0$ in $F$. Thus every member of the set $C$ is as described in the theorem.

To apply the theorem to prove the impossibility of the three never-accomplished constructions that were described earlier in the section, we observe that $[F_i : F_{i-1}]$ in the theorem equals 1 or 2 for each $i$. Consequently every member of the constructible set $C$ lies in a finite algebraic extension of $\mathbb{Q}$ of degree $2^k$ for some $k$.

For the problem of doubling a cube, the question amounts to constructing $\sqrt[3]{2}$. We argue by contradiction. If $\sqrt[3]{2}$ lies in $F_n$ as in the theorem, then $\mathbb{Q}(\sqrt[3]{2}) \subseteq F_n$. With $k$ as the integer $\le n$ such that $[F_n : \mathbb{Q}] = 2^k$, Corollary 9.7 gives

$$2^k = [F_n : \mathbb{Q}] = [F_n : \mathbb{Q}(\sqrt[3]{2})]\,[\mathbb{Q}(\sqrt[3]{2}) : \mathbb{Q}] = 3\,[F_n : \mathbb{Q}(\sqrt[3]{2})].$$

Thus 3 must divide a power of 2, and we have arrived at a contradiction. We conclude that it is not possible to double a cube with straightedge and compass.

For the problem of trisecting any constructible angle, let us show that a $60^\circ$ angle cannot be trisected. A $60^\circ$ angle is itself constructible, being the angle between two sides in an equilateral triangle. Trisecting a $60^\circ$ angle amounts to constructing $\cos 20^\circ$; $\sin 20^\circ$ is then $(1 - \cos^2 20^\circ)^{1/2}$. To proceed, we derive an equation satisfied by $\cos 20^\circ$, starting from

$$(\cos 20^\circ + i \sin 20^\circ)^3 = \cos 60^\circ + i \sin 60^\circ = \tfrac{1}{2} + \tfrac{i\sqrt{3}}{2}.$$

We expand the left side and extract the real part of both sides to obtain $\cos^3 20^\circ - 3\cos 20^\circ \sin^2 20^\circ = \tfrac{1}{2}$. Substituting $\sin^2 20^\circ = 1 - \cos^2 20^\circ$ and simplifying, we see that $r = \cos 20^\circ$ satisfies $4r^3 - 3r - \tfrac{1}{2} = 0$.
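Both the cubic equation and its lack of rational roots can be sanity-checked numerically. The sketch below uses the rational root test on the cleared-denominator polynomial $8X^3 - 6X - 1$; that this matches the method of the corollary cited in the text is an assumption, and the function name `rational_roots` is hypothetical. A cubic over $\mathbb{Q}$ with no rational root has no degree-one factor and is therefore irreducible.

```python
import math
from fractions import Fraction

# Floating-point check that r = cos 20 degrees satisfies 4r^3 - 3r - 1/2 = 0.
r = math.cos(math.radians(20))
residual = 4 * r**3 - 3 * r - 0.5


def rational_roots(coeffs):
    """Rational roots of an integer polynomial, by the rational root test.

    coeffs[j] is the coefficient of X^j; the constant and leading
    coefficients are assumed to be nonzero integers.
    """
    def divisors(n):
        n = abs(n)
        return [d for d in range(1, n + 1) if n % d == 0]

    roots = set()
    # Any rational root num/den in lowest terms has num dividing the
    # constant term and den dividing the leading coefficient.
    for num in divisors(coeffs[0]):
        for den in divisors(coeffs[-1]):
            for cand in (Fraction(num, den), Fraction(-num, den)):
                if sum(c * cand**j for j, c in enumerate(coeffs)) == 0:
                    roots.add(cand)
    return roots


# 8X^3 - 6X - 1 is twice 4X^3 - 3X - 1/2, so the two have the same roots.
print(abs(residual) < 1e-12, rational_roots([-1, -6, 0, 8]))
```

The floating-point residual is only evidence for the trigonometric identity, whereas the root search is exact because it works in rational arithmetic.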


Arguing with Corollary 8.20 as in Example 2 of splitting fields in Section 2, we readily check that $4X^3 - 3X - \tfrac{1}{2}$ is irreducible over $\mathbb{Q}$. Hence $[\mathbb{Q}(\cos 20^\circ) : \mathbb{Q}] = 3$, and we are led to the same contradiction as for the problem of doubling the cube. Therefore it is not possible to trisect a $60^\circ$ angle with straightedge and compass.

For the problem of squaring a circle, let $A$ be the area of the circle, and let $r$ be the radius. If the square has side $x$, then $x^2 = A = \pi r^2$, with $r$ given. Thus $x = r\sqrt{\pi}$, and the essence of the matter is to construct $\sqrt{\pi}$. However, $\pi$ is known to be transcendental by a theorem of F. Lindemann (1882); we give a proof in Section 14. Since $\pi$ is transcendental, $\sqrt{\pi}$ is transcendental and therefore cannot lie in the constructible set $C$.

A fourth notable problem, which leads to further insights, concerns the construction of a regular polygon of outer radius 1 with $n$ sides. This construction is easy with straightedge and compass when $n$ is a power of 2 or is 3 times a power of 2, and Euclid showed that a construction is possible for $n = 5$. But a construction cannot be managed with straightedge and compass for $n = 9$, for example, because a central angle in this case is $40^\circ$ and the constructibility of $\cos 40^\circ$ would imply the constructibility of $\cos 20^\circ$. Thus the question is, for what values of $n$ can a regular $n$-gon be constructed with straightedge and compass?

The remarkable answer was given by Gauss. By a Fermat number is meant any integer of the form $2^{2^N} + 1$. A Fermat prime is a Fermat number that is prime. The Fermat numbers for $N = 0, 1, 2, 3, 4$ are 3, 5, 17, 257, 65537, and each is a Fermat prime. No larger Fermat primes are known.[2] The answer given by Gauss, which we shall prove in stages in Sections 6–9, is as follows.

Theorem 9.25 (Gauss).[3] A regular $n$-gon is constructible with straightedge and compass if and only if $n$ is the product of distinct Fermat primes and a power of 2.
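The criterion in the theorem is easy to tabulate for small $n$. The sketch below is illustrative only: it tests against the five known Fermat primes, so it is trustworthy only for $n$ small enough that no unknown Fermat prime could divide it, and the names `fermat_number`, `is_constructible_ngon`, and `pepin_says_prime` are assumptions of this example. The last function implements the congruence test mentioned in the footnote on Fermat numbers, for $N \ge 1$.

```python
def fermat_number(N):
    """The N-th Fermat number 2^(2^N) + 1."""
    return 2 ** (2 ** N) + 1


# The Fermat numbers for N = 0, ..., 4, all of which are prime.
KNOWN_FERMAT_PRIMES = [fermat_number(N) for N in range(5)]


def is_constructible_ngon(n):
    """Gauss's criterion, checked against the known Fermat primes only."""
    if n < 3:
        return False
    # Remove the power of 2.
    while n % 2 == 0:
        n //= 2
    # Each Fermat prime may divide the odd part at most once.
    for p in KNOWN_FERMAT_PRIMES:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return False
    return n == 1


def pepin_says_prime(N):
    """True when 3^(2^(2^N - 1)) is congruent to -1 modulo the N-th Fermat number."""
    F = fermat_number(N)
    return pow(3, 2 ** (2 ** N - 1), F) == F - 1


print([n for n in range(3, 21) if is_constructible_ngon(n)])
print(fermat_number(5) % 641 == 0, pepin_says_prime(5))
```

The second line of output reflects Euler's factor 641 of $2^{2^5} + 1$ and the failure of the congruence for $N = 5$.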
We can show the relevance of Fermat primes right now, and we can give an indication that if $n$ is a prime number, then a regular $n$-gon can be constructed if and only if $n$ is a Fermat prime. But a full proof even of this statement will make use of Galois groups, which we take up in the next three sections.

[2] Many Fermat numbers for $N \ge 5$ are known not to be prime, sometimes by the discovery of an explicit factor and sometimes by a verification that 3 to the power $2^{2^N - 1}$ is not congruent to $-1$ modulo $2^{2^N} + 1$. (Cf. Lemma 9.46.) For example Euler discovered that 641 divides $2^{2^5} + 1$.

[3] Gauss announced both the necessity and the sufficiency in this theorem in his Disquisitiones Arithmeticae in 1801, but he included a proof of only the sufficiency (partly in his articles 336 and 365). A proof of the necessity appeared in a paper of Pierre-Laurent Wantzel in 1837.

For the necessity let $n$ be prime, and suppose that a regular $n$-gon is constructible. Returning from degrees to radians, we observe that each central angle is $2\pi/n$. Thus the constructibility implies the constructibility of $\cos 2\pi/n$, and it follows that $e^{2\pi i/n} = \cos 2\pi/n + i \sin 2\pi/n$ is in the field $C + iC$ of constructible points in the complex plane. We have the factorization

$$X^n - 1 = (X - 1)(X^{n-1} + X^{n-2} + \cdots + X + 1),$$

and $e^{2\pi i/n}$ is a root of the second factor. The first example of Eisenstein's criterion (Corollary 8.22) in Section VIII.5 shows that the second factor is irreducible. According to the results of Section 1, $\mathbb{Q}(e^{2\pi i/n})$ is a simple algebraic extension of $\mathbb{Q}$ of degree $n - 1$. Applying Theorem 9.24, we see that $n - 1$ must be a power of two. Let us write $n - 1 = 2^m$. Suppose $m = a2^N$ with $a$ odd. If $a > 1$, then the equality $n = 2^{a2^N} + 1 = (2^{2^N})^a + 1^a$ exhibits $n$ as the sum of two $a$th powers, necessarily divisible by $2^{2^N} + 1$. Since $n$ is assumed prime, we conclude that $a = 1$. Therefore $n = 2^{2^N} + 1$, and $n$ is a Fermat prime.

We do not quite succeed in proving the converse at this point. If $n$ is the Fermat prime $2^{2^N} + 1$, then the above argument shows that the degree of $\mathbb{Q}(e^{2\pi i/n})$ over $\mathbb{Q}$ is $2^{2^N}$. However, we cannot yet conclude that $\mathbb{Q}(e^{2\pi i/n})$ can be built from $\mathbb{Q}$ by successively adjoining $2^N$ square roots, and thus the converse part of Theorem 9.24 is not immediately applicable. Once we have the theory of Galois groups in hand, we shall see that the existence of these intermediate extensions involving square roots is ensured, and then the constructibility follows.

6. Separable Extensions

The Galois group $\mathrm{Gal}(K/k)$ of a field extension $K/k$ is defined to be the set

$$\mathrm{Gal}(K/k) = \{k \text{ automorphisms of } K\}$$

with composition as group operation. An instance of this group was introduced in the context of Example 9 of Section IV.1; in this example the field $k$ was the field $\mathbb{Q}$ of rationals and the field $K$ was a number field $\mathbb{Q}[\theta]$, where $\theta$ is algebraic over $\mathbb{Q}$. In studying $\mathrm{Gal}(K/k)$ in this chapter, we ordinarily assume that $\dim_k K < \infty$, but there will be instances where we do not want to make such an assumption. Beginning in this section, we take up a study of Galois groups in general.
We shall be interested in relationships between fields $L$ with $k \subseteq L \subseteq K$ and subgroups of $\mathrm{Gal}(K/k)$. If $H$ is a subgroup of $\mathrm{Gal}(K/k)$, then
$$K^H = \{x \in K \mid \varphi(x) = x \text{ for all } \varphi \in H\}$$
is a field called the fixed field of $H$; it provides an example of an intermediate field $L$ and gives a hint of the relationships we shall investigate. We begin with some examples; in each case the base field $k$ is the field $\mathbb{Q}$ of rationals.


EXAMPLES OF GALOIS GROUPS.

(1a) $K = \mathbb{Q}(\sqrt{-1})$. If $\varphi$ is in $\mathrm{Gal}(K/\mathbb{Q})$, then we must have $\varphi|_{\mathbb{Q}} = 1$, and $\varphi(\sqrt{-1})$ must be a root of $X^2 + 1$. Thus $\varphi(\sqrt{-1}) = \pm\sqrt{-1}$. Since $\mathbb{Q}$ and $\sqrt{-1}$ generate $\mathbb{Q}(\sqrt{-1})$, there are at most two such $\varphi$'s. On the other hand, $\mathbb{Q}(\sqrt{-1})$ and $\mathbb{Q}(-\sqrt{-1})$ are simple extensions of $\mathbb{Q}$ such that $\sqrt{-1}$ and $-\sqrt{-1}$ have the same minimal polynomial. Theorem 9.11 therefore produces a $\mathbb{Q}$ automorphism of $\mathbb{Q}(\sqrt{-1})$ with $\varphi(\sqrt{-1}) = -\sqrt{-1}$, namely complex conjugation. We conclude that $\mathrm{Gal}(K/\mathbb{Q})$ has order 2, hence that $\mathrm{Gal}(K/\mathbb{Q}) \cong C_2$.

(1b) $K = \mathbb{Q}(\sqrt{2})$. The same argument applies as in Example 1a, and the conclusion is that $\mathrm{Gal}(K/\mathbb{Q}) \cong C_2$. The nontrivial element of the Galois group carries $\sqrt{2}$ into $-\sqrt{2}$ and is different from complex conjugation.

(2) $K = \mathbb{Q}(\sqrt[3]{2})$. If $\varphi$ is in $\mathrm{Gal}(K/\mathbb{Q})$, then $\varphi|_{\mathbb{Q}} = 1$, and $\varphi(\sqrt[3]{2})$ has to be a root of $X^3 - 2$. But $K$ is a subfield of $\mathbb{R}$, and there is only one root of $X^3 - 2$ in $\mathbb{R}$. Hence $\varphi(\sqrt[3]{2}) = \sqrt[3]{2}$. Since $\mathbb{Q}$ and $\sqrt[3]{2}$ generate $\mathbb{Q}(\sqrt[3]{2})$ as a field, we see that $\varphi = 1$. We conclude that $\mathrm{Gal}(K/\mathbb{Q})$ has order 1, i.e., is the trivial group.

(3) $K = \mathbb{Q}(r)$, where $r$ is a root of $X^3 - X - \tfrac{1}{3}$. Any $\varphi$ in $\mathrm{Gal}(K/\mathbb{Q})$ fixes $\mathbb{Q}$ and sends $r$ to a root of $X^3 - X - \tfrac{1}{3}$. In Example 2 of splitting fields in Section 2, we saw that all three complex roots of $X^3 - X - \tfrac{1}{3}$ lie in $K$. Arguing as in Example 1a, we see that $\mathrm{Gal}(K/\mathbb{Q})$ has order 3, hence that $\mathrm{Gal}(K/\mathbb{Q}) \cong C_3$.

(4) $K = \mathbb{Q}(e^{2\pi i/17})$. According to Section 5, this is the field we need to consider in addressing the constructibility of a regular 17-gon. We saw in that section that $[K : \mathbb{Q}] = 16$ and that the minimal polynomial of $e^{2\pi i/17}$ over $\mathbb{Q}$ is $X^{16} + X^{15} + \cdots + X + 1$. The other roots of the minimal polynomial in $\mathbb{C}$ are $e^{2\pi i l/17}$ for $2 \leq l \leq 16$, and these all lie in $K$. Theorem 9.11 therefore gives us a $\mathbb{Q}$ automorphism $\varphi_l$ of $K$ sending $e^{2\pi i/17}$ into $e^{2\pi i l/17}$ for each $l$ with $1 \leq l \leq 16$. Since $\mathbb{Q}$ and $e^{2\pi i/17}$ generate $K$, a $\mathbb{Q}$ automorphism of $K$ is completely determined by its effect on $e^{2\pi i/17}$. Thus the order of $\mathrm{Gal}(K/\mathbb{Q})$ is 16.

Let us determine the group structure. Since $\varphi_l$ sends $e^{2\pi i/17}$ into $e^{2\pi i l/17}$, it sends $e^{2\pi i r/17} = (e^{2\pi i/17})^r$ into $(e^{2\pi i l/17})^r = e^{2\pi i lr/17}$. If we drop the exponential from the notation, we can think of $\varphi_l$ as defined on the integers modulo 17, the formula being $\varphi_l(r) = rl \bmod 17$. From this viewpoint $\varphi_l$ is an automorphism of the additive group of $\mathbb{F}_{17}$. Lemma 4.45 shows that the group of additive automorphisms of $\mathbb{F}_{17}$ is isomorphic to $\mathbb{F}_{17}^\times$, and it follows from Corollary 4.27 that $\mathrm{Gal}(K/\mathbb{Q}) \cong C_{16}$.

For our application of constructibility of a regular 17-gon, we would like to know whether the elements of $K$ are constructible. Taking Theorem 9.24 into account, we therefore seek an intermediate field $L$ of which $K$ is a quadratic extension.
Since we know that $\mathrm{Gal}(K/\mathbb{Q})$ is cyclic, we can let $H \subseteq \mathrm{Gal}(K/\mathbb{Q}) \cong C_{16}$ be the 2-element subgroup, and it is natural to try the fixed field $L = K^H$. To understand this fixed field, we need to understand the isomorphism $\mathbb{F}_{17}^\times \cong C_{16}$ better. Modulo 17, we have
$$3^2 = 9, \qquad 3^4 = -2^2, \qquad 3^8 = 2^4 = -1, \qquad 3^{16} = 1.$$
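The congruences just displayed can be checked mechanically. The following is a minimal Python sketch, not part of the original exposition; the helper name `multiplicative_order` is hypothetical and was chosen only for this illustration. It recomputes the four powers of 3 and confirms that 3 has order 16 modulo 17.

```python
def multiplicative_order(a: int, n: int) -> int:
    """Smallest k >= 1 with a**k congruent to 1 modulo n (assumes gcd(a, n) == 1)."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

p = 17

# The exponents 2, 4, 8, 16 used in the text; residues are reported in 0..16,
# so -4 appears as 13 and -1 appears as 16.
powers_of_3 = [pow(3, e, p) for e in (2, 4, 8, 16)]

# 3 generates the cyclic group of units exactly when its order is p - 1 = 16.
order_of_3 = multiplicative_order(3, p)

# All generators of the unit group modulo 17 (illustration only).
generators = [g for g in range(1, p) if multiplicative_order(g, p) == p - 1]

# The 2-element subgroup {3**8, 1} = {-1, +1}.
two_element_subgroup = sorted({pow(3, 8 * j, p) for j in range(2)})
```

Since $3^8 \equiv -1$ is not 1, the order of 3 cannot divide 8, and a divisor of 16 that does not divide 8 must be 16 itself; the function above merely confirms this by brute force.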

Consequently 3 is a generator of the cyclic group $\mathbb{F}_{17}^\times$. Then $H = \{3^8, 1\} = \{\pm 1\}$, and $L = \{x \in K \mid \varphi_{-1}(x) = \varphi_{+1}(x) = x\}$. Since $\varphi_{-1}(e^{2\pi i r/17}) = e^{-2\pi i r/17} = \overline{e^{2\pi i r/17}}$ with the overbar indicating complex conjugation, we see that
$$L = K^H = \{x \in K \mid \bar{x} = x\}.$$
It is not hard to check that indeed $[K : L] = 2$. Next we need a subfield $L'$ of $L$ with $[L : L'] = 2$. We try $L' = K^{H'}$ with $H'$ equal to the 4-element cyclic subgroup of $\mathrm{Gal}(K/\mathbb{Q})$. Here we have a harder time checking whether $L$ is indeed a quadratic extension of $L'$, but we shall see in Section 8 that it is.$^4$ We continue in this way, and ultimately we end up with the chain of subfields that exhibits the members of $K$ as constructible.

We seek to formulate the kind of argument in the above examples as a general theorem. We have to rule out the bad behavior of $\mathbb{Q}(\sqrt[3]{2})$, where one root of the minimal polynomial lies in the field but others do not, and we shall do this by assuming that the extension field is a "normal" extension, in a sense to be defined in Section 7. In addition, our style of argument shows that we might run into trouble if our irreducible polynomials over $k$ can have repeated roots in $K$. We shall rule out this bad behavior by insisting that the extension be "separable," a condition that we introduce now. The extension will automatically be separable if $K$ has characteristic 0.

For the remainder of this section, fix the base field $k$. An irreducible polynomial $F(X)$ in $k[X]$ is called separable if it splits into distinct degree-one factors in its splitting field, i.e., if
$$F(X) = a_n(X - x_1) \cdots (X - x_n) \qquad \text{with } x_i \neq x_j \text{ for } i \neq j.$$
Once this splitting into distinct degree-one factors occurs in the splitting field, it occurs in any larger field as well.

Lemma 9.26. A polynomial $F(X)$ in $k[X]$ has no repeated roots in its splitting field $K$ if and only if $\mathrm{GCD}(F, F') = 1$, where $F'(X)$ is the derivative of $F(X)$.

$^4$ Actually, Section 8 will point out how Corollary 9.36 in Section 7 already handles this step. In fact, Corollary 9.37 handles this step with no supplementary argument.


PROOF. The polynomial $F(X)$ has repeated roots in $K$ if and only if $F(X)$ is divisible by $(X - r)^2$ for some $r \in K$, if and only if some $r \in K$ has $F(r) = F'(r) = 0$ (by Corollary 9.17), if and only if some $r \in K$ has $(X - r)$ dividing $F(X)$ and also $F'(X)$ (by the Factor Theorem), if and only if some $r \in K$ has $(X - r)$ dividing $\mathrm{GCD}(F, F')$ when the GCD is computed in $K$, if and only if $\mathrm{GCD}(F, F') \neq 1$ when the GCD is computed in $K$ (by unique factorization in $K[X]$). However, the Euclidean algorithm calculates $\mathrm{GCD}(F, F')$ without reference to the field, and the GCD is therefore the same when computed in $K$ as it is when computed in $k$. The lemma follows.

Proposition 9.27. An irreducible polynomial $F(X)$ in $k[X]$ is separable if and only if $F'(X) \neq 0$. In particular, every irreducible (necessarily nonconstant) polynomial is separable if $k$ has characteristic 0.

PROOF. Since the polynomial $F(X)$ is irreducible and $\mathrm{GCD}(F, F')$ divides $F(X)$, $\mathrm{GCD}(F, F')$ equals 1 or $F(X)$ in all cases. If $F'(X) = 0$, then $\mathrm{GCD}(F, F') = F(X)$, and Lemma 9.26 implies that $F(X)$ is not separable. Conversely if $F'(X) \neq 0$, then the facts that $\mathrm{GCD}(F, F')$ divides $F'(X)$ and that $\deg F' < \deg F$ together imply that $\mathrm{GCD}(F, F')$ cannot equal $F(X)$. So $\mathrm{GCD}(F, F') = 1$, and Lemma 9.26 implies that $F(X)$ is separable.

Fix an algebraic extension $K$ of $k$. We say that an element $x$ of $K$ is separable over $k$ if the minimal polynomial of $x$ over $k$ is separable. We say that $K$ is a separable extension of $k$ if every $x$ in $K$ is separable over $k$.

EXAMPLES OF SEPARABLE EXTENSIONS AND EXTENSIONS NOT SEPARABLE.

(1) In characteristic 0, every algebraic extension $K$ of $k$ is separable, by Proposition 9.27.

(2) Every algebraic extension $K$ of a finite field $k$ is separable. In fact, if $x$ is in $K$, then $[k(x) : k]$ is finite. Hence $k(x)$ is a finite field. Then we may assume that $K$ is a finite field, say of order $q = p^n$ with $p$ prime. Since the multiplicative group $K^\times$ has order $q - 1$, every nonzero element of $K$ is a root of $X^{q-1} - 1$, and every element of $K$ is therefore a root of $X^q - X$. The minimal polynomial $F(X)$ of $x$ over $k$ must then divide $X^q - X$. However, we know that $X^q - X$ splits over $K$ and has no repeated roots. Thus $F(X)$ splits over $K$ and has no repeated roots. Then $F(X)$ is separable over $k$, and $x$ is separable over $k$.

(3) Let $k = \mathbb{F}_p(x)$ be a transcendental extension of the finite field $\mathbb{F}_p$. Because this extension is transcendental, $X^p - x$ is irreducible over $k$. Let $K$ be the simple algebraic extension $k[X]/(X^p - x)$, which we can write more simply as $k(x^{1/p})$. The minimal polynomial of $x^{1/p}$ over $k$ is $X^p - x$, and its derivative is $pX^{p-1} = 0$ since the derivative of the constant $x$ is 0. By Proposition 9.27, $x^{1/p}$ is not separable over $k$.
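The criterion of Lemma 9.26 is easy to test by machine over a prime field. The following is a minimal Python sketch under stated assumptions: polynomials over $\mathbb{F}_p$ are represented as coefficient lists with the lowest degree first, and the helper names (`trim`, `derivative`, `poly_mod`, `poly_gcd`, `has_repeated_root`) are hypothetical names introduced only for this illustration. Note that over $\mathbb{F}_p$ itself a polynomial $X^p - a$ is reducible, so the last example below illustrates only the vanishing derivative, not an inseparable irreducible polynomial as in Example 3.

```python
def trim(f):
    """Drop trailing zero coefficients (lists are lowest degree first)."""
    while f and f[-1] == 0:
        f = f[:-1]
    return f

def derivative(f, p):
    """Formal derivative of f over F_p; the zero polynomial is the empty list."""
    return trim([(i * c) % p for i, c in enumerate(f)][1:])

def poly_mod(f, g, p):
    """Remainder of f on division by the nonzero polynomial g over F_p."""
    f = trim([c % p for c in f])
    inv_lead = pow(g[-1], -1, p)          # modular inverse (Python 3.8+)
    while len(f) >= len(g):
        factor = (f[-1] * inv_lead) % p
        shift = len(f) - len(g)
        f = trim([(c - factor * g[i - shift]) % p if i >= shift else c
                  for i, c in enumerate(f)])
    return f

def poly_gcd(f, g, p):
    """Monic GCD over F_p by the Euclidean algorithm (f must be nonzero)."""
    f, g = trim([c % p for c in f]), trim([c % p for c in g])
    while g:
        f, g = g, poly_mod(f, g, p)
    inv = pow(f[-1], -1, p)
    return [(c * inv) % p for c in f]

def has_repeated_root(f, p):
    """Lemma 9.26: repeated roots occur exactly when GCD(F, F') is not 1."""
    return poly_gcd(f, derivative(f, p), p) != [1]

# X^4 - 1 over F_5 (the polynomial X^(q-1) - 1 with q = 5): distinct roots.
no_repeat = has_repeated_root([4, 0, 0, 0, 1], 5)

# (X - 1)^2 (X + 1) = X^3 - X^2 - X + 1 over F_5: the GCD is X - 1.
common_factor = poly_gcd([1, 4, 4, 1], derivative([1, 4, 4, 1], 5), 5)

# X^3 + 1 = (X + 1)^3 over F_3: the derivative 3X^2 vanishes identically.
zero_derivative = derivative([1, 0, 0, 1], 3)
```

The computation uses only the Euclidean algorithm in $\mathbb{F}_p[X]$, which reflects the remark in the proof of Lemma 9.26 that the GCD can be found without reference to any extension field.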


The way that separability enters considerations with Galois groups is through the following theorem, explicitly or implicitly. One of the corollaries of the theorem is that if $K/k$ is an algebraic extension, then the set of elements in $K$ separable over $k$ is a subfield of $K$.

Theorem 9.28. Let $k \subseteq L \subseteq K$ be an inclusion of fields such that $K$ is a simple algebraic extension of $L$ of the form $K = L(\alpha)$, let $\overline{K}$ be an algebraic closure of $K$, and let $M(X)$ be the minimal polynomial of $\alpha$ over $L$. Then the number of field mappings of $K$ into $\overline{K}$ fixing $k$ is the product of the number of distinct roots of $M(X)$ in $\overline{K}$ by the number of field mappings of $L$ into $\overline{K}$ fixing $k$.

REMARKS. An algebraic closure $\overline{K}$ of $K$ exists by Theorem 9.22. Because $\overline{K}$ is known to exist, the present theorem reduces to Theorem 9.11 when $L = k$.

PROOF. Any field mapping $\varphi : K \to \overline{K}$ is uniquely determined by $\varphi|_L$ and $\varphi(\alpha)$. If $\sigma = \varphi|_L$, then the equality $M(\alpha) = 0$ implies that $M^\sigma(\varphi(\alpha)) = 0$, and thus $\varphi(\alpha)$ has to be a root of $M^\sigma(X)$. The number of distinct roots of $M^\sigma(X)$ in $\overline{K}$ equals the number of distinct roots of $M(X)$ in $\overline{K}$; hence the number of possibilities for $\varphi(\alpha)$ is at most the number of distinct roots of $M(X)$ in $\overline{K}$. Consequently the number of such $\varphi$'s fixing $k$ is bounded above by the product of the number of distinct roots of $M(X)$ in $\overline{K}$ times the number of field mappings $\sigma$ of $L$ into $\overline{K}$ fixing $k$.

For an inequality in the reverse direction, let $\sigma : L \to \overline{K}$ be any field mapping of $L$ into $\overline{K}$ fixing $k$, put $L' = \sigma(L)$, let $x$ be any root of $M^\sigma(X)$, and form the subfield $L'(x)$ of $\overline{K}$. Theorem 9.11 shows that there exists a field isomorphism $\varphi : L(\alpha) \to L'(x)$ with $\varphi|_L = \sigma$ and $\varphi(\alpha) = x$, and we can regard $\varphi$ as a field mapping of $K$ into $\overline{K}$ fixing $k$, extending $\sigma$, and having $\varphi(\alpha) = x$. Thus the number of field mappings $\varphi : K \to \overline{K}$ fixing $k$ is bounded below by the product of the number of distinct roots of $M(X)$ in $\overline{K}$ times the number of field homomorphisms $\sigma$ of $L$ into $\overline{K}$ fixing $k$.

Corollary 9.29. Let $K = k(\alpha_1, \ldots, \alpha_n)$ be a finite algebraic extension of the field $k$, and let $\overline{K}$ be an algebraic closure of $K$. Then the number of field mappings of $K$ into $\overline{K}$ fixing $k$ is $\leq [K : k]$. Moreover, the following conditions are equivalent:
(a) the number of field mappings of $K$ into $\overline{K}$ fixing $k$ equals $[K : k]$,
(b) each $\alpha_j$ is separable over $k(\alpha_1, \ldots, \alpha_{j-1})$ for $1 \leq j \leq n$,
(c) each $\alpha_j$ is separable over $k$ for $1 \leq j \leq n$.

PROOF. For $1 \leq j \leq n$, let $M_j(X)$ be the minimal polynomial of $\alpha_j$ over $k(\alpha_1, \ldots, \alpha_{j-1})$, let $d_j$ be the degree of $M_j(X)$, and let $s_j$ be the number of distinct roots of $M_j(X)$ in $\overline{K}$. Then $s_j \leq d_j$ with equality for a particular $j$ if and only if $\alpha_j$ is separable over $k(\alpha_1, \ldots, \alpha_{j-1})$, by definition. Also, $[K : k] = \prod_{j=1}^n d_j$ by Corollary 9.7, and the number of field mappings of $K$ into $\overline{K}$ fixing $k$ is $\prod_{j=1}^n s_j$ by iterated application of Theorem 9.28. From these facts, the first conclusion of the corollary is immediate, and so is the equivalence of (a) and (b). Condition (a) is independent of the order of enumeration of $\alpha_1, \ldots, \alpha_n$. Since we can always take any particular $\alpha_j$ to be first, we obtain the equivalence of (a) and (c).

Corollary 9.30. Let $K = k(\alpha_1, \ldots, \alpha_n)$ be a finite algebraic extension of the field $k$. If each $\alpha_j$ for $1 \leq j \leq n$ is separable over $k$, then $K/k$ is a separable extension.

PROOF. Let $\beta$ be in $K$. We apply the equivalence of (a) and (c) in Corollary 9.29 once to the set of generators $\{\alpha_1, \ldots, \alpha_n\}$ and once to the set of generators $\{\beta, \alpha_1, \ldots, \alpha_n\}$, and the result is immediate.

Corollary 9.31. If $K/k$ is an algebraic field extension, then the subset $L$ of elements of $K$ that are separable over $k$ is a subfield of $K$.

PROOF. If $\alpha$ and $\beta$ are given in $L$, we apply Corollary 9.30 to the extension $k(\alpha, \beta)$ of $k$ to see that $L$ contains the subfield generated by $k$ and the elements $\alpha$ and $\beta$.

Proposition 9.32. If $K/k$ is a separable algebraic extension and if $L$ is a field with $k \subseteq L \subseteq K$, then $K$ is separable over $L$, and $L$ is separable over $k$.

PROOF. The separability assertion about $L/k$ says the same thing about elements of $L$ that separability of $K/k$ says about those same elements, and it is therefore immediate that $L/k$ is separable. Next let us consider $K/L$. If $x$ is in $K$, let $F(X)$ be its minimal polynomial over $k$, and let $G(X)$ be its minimal polynomial over $L$. Since $F(X)$ is in $L[X]$ and $F(x) = G(x) = 0$, $G(X)$ divides $F(X)$. Since $K/k$ is separable, $F(X)$ splits into distinct degree-one factors in its splitting field $\mathbb{F}$. The field $\mathbb{F}$ contains a splitting field of $G(X)$, and thus the degree-one factors of $G(X)$ in $\mathbb{F}[X]$ are a subset of the degree-one factors of $F(X)$ in $\mathbb{F}[X]$.
There are no repeated factors for $F(X)$, and there can be no repeated factors for $G(X)$. Thus $x$ is separable over $L$, and $K/L$ is a separable extension.

In studying Galois groups, we shall be chiefly interested in the following situation in Corollary 9.29: $K$ is an algebraic field extension $K = k(\alpha_1, \ldots, \alpha_n)$ of $k$ for which every field mapping of $K$ into an algebraic closure that fixes $k$ actually carries $K$ into itself. We seek conditions under which this situation arises, and then we mine the consequences. As we did in the study begun in Theorem 9.28, we begin with the case of a simple algebraic extension.

Let $K = k(\gamma)$ be a simple algebraic extension of $k$, and let $F(X)$ be the minimal polynomial of $\gamma$ over $k$. Any member $\varphi$ of the Galois group $\mathrm{Gal}(K/k)$ carries $\gamma$ to another root $\gamma'$ of $F(X)$, and $\varphi$ is uniquely determined by $\gamma'$ since $k$ and $\gamma$ generate the field $K$. An element $\varphi$ of $\mathrm{Gal}(K/k)$ carrying $\gamma$ to $\gamma'$ can exist only if $\gamma'$ is in $K$. If $\gamma'$ is in $K$, then $k(\gamma) \supseteq k(\gamma')$, and the equal finite dimensionality of $k(\gamma)$ and $k(\gamma')$ forces $k(\gamma) = k(\gamma')$. In other words, if $\gamma'$ is in $K$, then the unique $k$ isomorphism $k(\gamma) \to k(\gamma')$ of Theorems 9.10 and 9.11 carrying $\gamma$ to $\gamma'$ is a member of $\mathrm{Gal}(K/k)$. Making a count of what happens to all the elements $\gamma'$, we see that we have proved the following.

Proposition 9.33. Let $K = k(\gamma)$ be a simple algebraic extension of $k$, and let $F(X)$ be the minimal polynomial of $\gamma$. Then $|\mathrm{Gal}(K/k)| \leq [K : k]$ with equality if and only if $F(X)$ is a separable polynomial and $K$ is a splitting field of $F(X)$ over $k$.

EXAMPLE. For $K = \mathbb{Q}(\sqrt[3]{2})$ with minimal polynomial $F(X)$, we know that $F(X)$ does not split in $K$; the nonreal roots of $F(X)$ do not lie in $K$. Proposition 9.33 gives us $|\mathrm{Gal}(K/\mathbb{Q})| < [K : \mathbb{Q}] = 3$, and a glance at the argument preceding Proposition 9.33 shows that $|\mathrm{Gal}(K/\mathbb{Q})|$ has to be 1.

It is possible to investigate the case of several generators directly, but it is more illuminating to reduce it to the case of a single generator as in Proposition 9.33. The tool for doing so is the following important theorem.

Theorem 9.34 (Theorem of the Primitive Element). Let $K/k$ be a separable algebraic extension with $[K : k] < \infty$. Then there exists an element $\gamma$ in $K$ such that $K = k(\gamma)$.

PROOF. We can write $K = k(x_1, \ldots, x_n)$, and we proceed by induction on $n$, the case $n = 1$ being trivial. For general $n$, let $L = k(x_1, \ldots, x_{n-1})$, so that $K = L(x_n)$.
By the inductive hypothesis, $L$ is of the form $L = k(\alpha)$ for some $\alpha$ in $K$, and thus $K = k(\alpha, x_n)$. Changing notation, we see that it is enough to prove that whenever $K$ is a separable algebraic extension of the form $K = k(\alpha, \beta)$, then $K$ is of the form $K = k(\gamma)$ for some $\gamma$. We shall show this for some $\gamma$ of the form $\gamma = \beta + c\alpha$ with $c$ in $k$. Because every finite extension of a finite field is separable (by Example 2 of separable extensions), we may assume that $k$ is an infinite field.


Let $F(X)$ and $G(X)$ be the minimal polynomials of $\alpha$ and $\beta$ over $k$, and let $K'$ be an extension in which $F(X)G(X)$ splits, i.e., in which $F(X)$ and $G(X)$ both split. Let $\alpha_1 = \alpha, \alpha_2, \ldots, \alpha_m$ and $\beta_1 = \beta, \beta_2, \ldots, \beta_n$ be the roots of $F(X)$ and $G(X)$ in $K'$, in each case necessarily distinct by definition of separability of $\alpha$ and $\beta$. Define $L = k(\gamma)$ with $\gamma = \beta + c\alpha$, where $c$ is a member of $k$ to be specified. For suitable $c$, we shall show that $\alpha$ is in $L$. Then $\beta = \gamma - c\alpha$ must be in $L$, and we obtain $K \subseteq L$. Since $\gamma$ is in $K$, the reverse inclusion is built into the construction, and thus we will have $K = L$.

We shall compute the minimal polynomial of $\alpha$ over $L$. We know that $\alpha$ is a root of $F(X)$, and we put $H(X) = G(\gamma - cX)$. Then $H(X)$ is in $L[X] \subseteq K'[X]$, and $G(\beta) = 0$ implies $H(\alpha) = 0$. Therefore $X - \alpha$ divides both $F(X)$ and $H(X)$ in the ring $K'[X]$. Let us determine $\mathrm{GCD}(F, H)$ in $K'[X]$. The separability of $\alpha$ says that $X - \alpha$ divides $F(X)$ only once. Since $F(X)$ splits in $K'[X]$, any other prime divisor of $\mathrm{GCD}(F, H)$ in $K'[X]$ has to be of the form $X - \alpha_j$ with $j \neq 1$. The definition of $H(X)$ gives $H(\alpha_j) = G(\gamma - c\alpha_j)$. If $G(\gamma - c\alpha_j) = 0$, then $\gamma - c\alpha_j = \beta_i$ for some $i$, with the consequence that $\beta + c\alpha - c\alpha_j = \beta_i$ and
$$c = (\beta_i - \beta)(\alpha - \alpha_j)^{-1}.$$
Only finitely many values of $c$ arise in this way. Since $k$ is an infinite field, some choice of $c$ in $k$ makes $\mathrm{GCD}(F, H) = X - \alpha$ in $K'[X]$. Then $\mathrm{GCD}(F, H) = X - \alpha$, up to a scalar factor, in $L[X]$ since $F(X)$ and $H(X)$ are in $L[X]$ and since the GCD can be computed without reference to the field containing both elements. The ratio of the constant term to the coefficient of $X$ has to be in $L$ independently of the scalar factor multiplying $X - \alpha$, and therefore $\alpha$ is in $L$. This completes the proof.
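As an illustration of the construction $\gamma = \beta + c\alpha$, consider the extension $\mathbb{Q}(\sqrt{2}, \sqrt{3})$ of $\mathbb{Q}$ with $\alpha = \sqrt{2}$, $\beta = \sqrt{3}$, and $c = 1$. This example is not taken from the text; it is a minimal Python sketch that does exact arithmetic in the basis $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$, and the names `mul`, `add`, and `scale` are hypothetical helpers introduced only here. It checks that $\sqrt{2}$ and $\sqrt{3}$ are polynomials in $\gamma = \sqrt{2} + \sqrt{3}$ with rational coefficients, so that $\mathbb{Q}(\gamma)$ is the whole field.

```python
from fractions import Fraction

# An element a + b*sqrt(2) + c*sqrt(3) + d*sqrt(6) is stored as the tuple (a, b, c, d).

def mul(u, v):
    """Product in Q(sqrt2, sqrt3), using sqrt2*sqrt3 = sqrt6, sqrt2*sqrt6 = 2*sqrt3,
    and sqrt3*sqrt6 = 3*sqrt2."""
    a1, b1, c1, d1 = u
    a2, b2, c2, d2 = v
    return (
        a1 * a2 + 2 * b1 * b2 + 3 * c1 * c2 + 6 * d1 * d2,
        a1 * b2 + b1 * a2 + 3 * (c1 * d2 + d1 * c2),
        a1 * c2 + c1 * a2 + 2 * (b1 * d2 + d1 * b2),
        a1 * d2 + d1 * a2 + b1 * c2 + c1 * b2,
    )

def add(u, v):
    return tuple(x + y for x, y in zip(u, v))

def scale(t, u):
    return tuple(t * x for x in u)

ONE = (Fraction(1), Fraction(0), Fraction(0), Fraction(0))
gamma = (Fraction(0), Fraction(1), Fraction(1), Fraction(0))   # sqrt2 + sqrt3

gamma2 = mul(gamma, gamma)     # 5 + 2*sqrt6
gamma3 = mul(gamma2, gamma)    # 11*sqrt2 + 9*sqrt3
gamma4 = mul(gamma3, gamma)    # 49 + 20*sqrt6

# alpha and beta recovered as rational polynomials in gamma.
sqrt2_from_gamma = scale(Fraction(1, 2), add(gamma3, scale(-9, gamma)))
sqrt3_from_gamma = scale(Fraction(1, 2), add(scale(11, gamma), scale(-1, gamma3)))

# gamma is a root of X^4 - 10 X^2 + 1.
min_poly_value = add(gamma4, add(scale(-10, gamma2), ONE))
```

In the notation of the proof, the excluded values $c = (\beta_i - \beta)(\alpha - \alpha_j)^{-1}$ with $\alpha_j = -\sqrt{2}$ and $\beta_i = \pm\sqrt{3}$ are $0$ and $-\sqrt{3}/\sqrt{2}$; the only rational one is $c = 0$, so every nonzero rational $c$ works here, and $c = 1$ is simply the most convenient choice.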

7. Normal Extensions

Proposition 9.33 suggests that the failure of equality to hold in the inequality $|\mathrm{Gal}(K/k)| \leq [K : k]$ has something to do with the failure of polynomials over $k$ to split fully in $K$ once they have at least one root in $K$. In the case of the extension $\mathbb{Q}(\sqrt[3]{2})/\mathbb{Q}$, where equality fails, the Galois group is trivial and therefore gives us no information about the extension. Thus it makes sense to regard the failure of equality to hold as an undesirable situation. Accordingly, we make a definition, choosing among several equivalent conditions one that is easy to apply.

A finite separable$^5$ algebraic extension $K$ of a field $k$ is said to be normal over $k$ if $K$ is the splitting field of some $F(X)$ in $k[X]$. The following proposition asserts some powerful consequences of this condition.

$^5$ A more advanced treatment might proceed without the assumption of separability for as long as possible. But it is unnecessary to do so in this volume, and the assumption of separability makes the Theorem of the Primitive Element available to us.


Proposition 9.35. Let $K$ be a finite separable algebraic extension of a field $k$, so that $|\mathrm{Gal}(K/k)| \leq [K : k]$. Then the following are equivalent.
(a) $K$ is the splitting field of some $F(X)$ in $k[X]$, i.e., $K$ is normal over $k$,
(b) every irreducible polynomial $F(X)$ in $k[X]$ with a root in $K$ splits in $K$, i.e., $K$ contains a splitting field for each such $F(X)$,
(c) $|\mathrm{Gal}(K/k)| = [K : k]$,
(d) $k = K^G$ for $G = \mathrm{Gal}(K/k)$.

REMARKS. We prove that (a) and (c) are equivalent, that the equivalent (a) and (c) imply (d), that (d) implies (b), and that (b) implies (a).

PROOF. By separability and Theorem 9.34 we can write $K = k(\gamma)$ throughout the proof for some $\gamma$ in $K$. Let $M(X)$ be the minimal polynomial of $\gamma$ over $k$.

Suppose (a) holds. We prove (c). Write $K = k(\gamma) = k(\alpha_1, \ldots, \alpha_n)$, where $\alpha_1, \ldots, \alpha_n$ are the roots of some $F(X)$ in $k[X]$ that splits over $K$. We may assume that $F(X)$ has no repeated prime factors and therefore, by separability of $K/k$, that $\alpha_1, \ldots, \alpha_n$ are distinct. Then $\gamma = H(\alpha_1, \ldots, \alpha_n)$ for some $H$ in $k[X_1, \ldots, X_n]$. Proposition 9.33 will establish (c) if we show that $M(X)$ splits over $K$. Let $K' \supseteq K$ be a finite extension in which $M(X)$ splits, and let $\gamma'$ be a root of $M(X)$ in $K'$. We are to show that $\gamma'$ is in $K$. Theorem 9.11 produces a $k$ isomorphism $\varphi : k(\gamma) \to k(\gamma')$ with $\varphi(\gamma) = \gamma'$. Since $\varphi(\alpha_i)$ is a root of $F(X)$ for each $i$, $\varphi(\alpha_i) = \alpha_{j(i)}$ for some $j = j(i)$ that is unique since $\alpha_1, \ldots, \alpha_n$ are distinct. Thus $\varphi$ permutes $\{\alpha_1, \ldots, \alpha_n\}$, and
$$\gamma' = \varphi(\gamma) = \varphi(H(\alpha_1, \ldots, \alpha_n)) = H_1(\alpha_1, \ldots, \alpha_n)$$
for some $H_1$ in $k[X_1, \ldots, X_n]$. Therefore $\gamma'$ is in $k(\alpha_1, \ldots, \alpha_n) = K$. This proves (c).

Suppose (c) holds. We prove (a). Proposition 9.33, in the presence of condition (c) and the given separability of $K/k$, implies that $K$ is a splitting field of $M(X)$ over $k$. Thus (a) holds.

Suppose (a) holds. We prove (d). Let $k' = K^G$. Since every member of $\mathrm{Gal}(K/k)$ fixes $k'$, $\mathrm{Gal}(K/k) \subseteq \mathrm{Gal}(K/k')$.
Meanwhile, (a) for $K/k$ implies (a) for $K/k'$, and $K$ is separable over $k'$ by Proposition 9.32. Since (a) implies (c), (c) holds for both $k'$ and $k$, and we have
$$[K : k] = |\mathrm{Gal}(K/k)| \leq |\mathrm{Gal}(K/k')| = [K : k'].$$
Since $k' \supseteq k$, the inequality of dimensions implies that $k' = k$. Thus (d) holds.

Suppose (d) holds. We prove (b). Let $F(X)$ be an irreducible polynomial in $k[X]$ having a root $r$ in $K$. The polynomial $F(X)$ is necessarily the minimal polynomial of $r$ over $k$. Enumerate $\{\varphi(r) \mid \varphi \in \mathrm{Gal}(K/k)\}$ as $r_1, \ldots, r_n$, with any possible repetitions included. If $J(X)$ is defined to be $\prod_{i=1}^n (X - r_i)$, then expansion of the product gives
$$J(X) = X^{|G|} - \Big(\sum_i r_i\Big) X^{|G|-1} + \Big(\sum_{i<j} r_i r_j\Big) X^{|G|-2} - \cdots \pm \prod_i r_i.$$
Each member $\varphi$ of $\mathrm{Gal}(K/k)$ carries each coefficient of $J(X)$ into itself since $\varphi$ permutes the elements $r_i$. Since $K^G = k$, we see therefore that $J(X)$ is in $k[X]$. Since $J(r) = 0$ and $F(X)$ is the minimal polynomial of $r$, $F(X)$ divides $J(X)$. Over $K$, $J(X)$ splits because of its definition. By unique factorization, $F(X)$ must split too. Thus (b) holds.

Finally if (b) holds, then $M(X)$, being irreducible over $k$ and having $\gamma$ as a root in $K$, splits in $K$. Thus $K$ is a splitting field for $M(X)$ over $k$, and (a) holds.

Corollary 9.36. If $K$ is a finite normal separable extension of $k$ and if $L$ is a field with $k \subseteq L \subseteq K$, then $K$ is a finite normal separable extension of $L$, and the subgroup $H = \mathrm{Gal}(K/L)$ of $\mathrm{Gal}(K/k)$ has
$$|H| \cdot [L : k] = |\mathrm{Gal}(K/k)|.$$

PROOF. The field $K$ is a separable extension of the intermediate field $L$ by Proposition 9.32, and it is a normal extension by Proposition 9.35a. Therefore Proposition 9.35c gives $|\mathrm{Gal}(K/L)| = [K : L]$, and we have
$$|H| \cdot [L : k] = |\mathrm{Gal}(K/L)| \cdot [L : k] = [K : L] \cdot [L : k] = [K : k] = |\mathrm{Gal}(K/k)|,$$
the last two equalities holding by Corollary 9.7 and Proposition 9.35c.

Corollary 9.37. Let $K/k$ be a separable algebraic extension, and suppose that $H$ is a finite subgroup of $\mathrm{Gal}(K/k)$. Then $K/K^H$ is a finite normal separable extension, $H$ is the subgroup $\mathrm{Gal}(K/K^H)$ of $\mathrm{Gal}(K/k)$, and $[K : K^H] = |H|$.

PROOF. Proposition 9.32 shows that $K$ is separable over $K^H$. For an arbitrary element $x$ of $K$, form the polynomial in $K[X]$ given by
$$F(X) = \prod_{\varphi \in H} (X - \varphi(x)).$$
If $\varphi_0$ is in $H$, then $F^{\varphi_0}$ is given by replacing each $\varphi(x)$ by $\varphi_0\varphi(x)$, and the product is unchanged. Therefore $F(X) = F^{\varphi_0}(X)$, and $F(X)$ is in $K^H[X]$. Thus $F(X)$ is a polynomial in $K^H[X]$ that has $x$ as a root and splits in $K$. The minimal polynomial $M(X)$ of $x$ over $K^H$ must divide $F(X)$, and it too has $x$ as a root.


By unique factorization in $K[X]$, $M(X)$ must split in $K$. Thus $K/K^H$ will be a normal extension if it is shown that $[K : K^H] < \infty$. The element $x$ has $[K^H(x) : K^H] = \deg M(X) \leq \deg F(X) = |H|$, and the claim is that $[K : K^H] \leq |H|$. Assuming the contrary, we would at some point have an inequality $[K^H(x_1, \ldots, x_n) : K^H] > |H|$ because every element of $K$ is algebraic over $k$. By the Theorem of the Primitive Element (Theorem 9.34), $K^H(x_1, \ldots, x_n) = K^H(z)$ for some element $z$, and therefore $[K^H(x_1, \ldots, x_n) : K^H] = [K^H(z) : K^H] \leq |H|$, contradiction. We conclude that $[K : K^H] \leq |H|$.

From the previous paragraph, $K/K^H$ is a finite separable normal extension. The definition of $K^H$ shows that $H \subseteq \mathrm{Gal}(K/K^H)$, and Proposition 9.35c gives $|\mathrm{Gal}(K/K^H)| = [K : K^H]$. Putting these facts together with the inequality $[K : K^H] \leq |H|$ from the previous paragraph, we have
$$|H| \leq |\mathrm{Gal}(K/K^H)| = [K : K^H] \leq |H|$$
with equality on the left only if $H = \mathrm{Gal}(K/K^H)$. Equality must hold throughout the displayed line since the ends are equal, and therefore $H = \mathrm{Gal}(K/K^H)$.

8. Fundamental Theorem of Galois Theory

We are now in a position to obtain the main result in Galois theory.

Theorem 9.38 (Fundamental Theorem of Galois Theory). If $K$ is a finite normal separable extension of $k$, then there is a one-one inclusion-reversing correspondence between the subgroups $H$ of $\mathrm{Gal}(K/k)$ and the subfields $L$ of $K$ that contain $k$, corresponding elements $H$ and $L$ being given by
$$L = K^H \qquad \text{and} \qquad H = \mathrm{Gal}(K/L).$$

The effect of the theorem is to take an extremely difﬁcult problem, namely ﬁnding intermediate ﬁelds, and reduce it to a problem that is merely difﬁcult, namely ﬁnding the Galois group. For example the ﬁniteness of Gal(K/k) implies that there are only ﬁnitely many subgroups of Gal(K/k), and the theorem therefore implies that there are only ﬁnitely many intermediate ﬁelds; this ﬁniteness of the number of intermediate ﬁelds is not so obvious without the theorem. As a reminder of the availability of Theorem 9.38, Proposition 9.35, and Corollary 9.36, it is customary to refer to a ﬁnite normal separable extension as a ﬁnite Galois extension. Before coming to the proof of the theorem, let us examine what the theorem says for the examples in Section 6. In each case the ﬁeld k is the ﬁeld Q of rationals. The extensions are separable because the characteristic is 0.


EXAMPLES.

(1a) $K = \mathbb{Q}(\sqrt{-1})$. This is a splitting field for $X^2 + 1$. Proposition 9.33 gives $|\mathrm{Gal}(K/\mathbb{Q})| = [K : \mathbb{Q}] = 2$. Thus $\mathrm{Gal}(K/\mathbb{Q}) \cong C_2$. There are no nontrivial subgroups, and there are consequently no intermediate fields. We knew this already since there cannot be any intermediate $\mathbb{Q}$ vector spaces between $\mathbb{Q}$ and $K$. Thus the theorem tells us nothing new.

(1b) $K = \mathbb{Q}(\sqrt{2})$. Similar remarks apply.

(2) $K = \mathbb{Q}(\sqrt[3]{2})$. This extension is not normal, and the theorem does not apply to $K$. If we adjoin $r$ to $K$ with $r^2 + (\sqrt[3]{2})r + (\sqrt[3]{2})^2 = 0$, we obtain a splitting field $K'$ for $X^3 - 2$ over $\mathbb{Q}$. Then $K'$ is a normal extension of $\mathbb{Q}$, and the theorem applies. Since each element of $\mathrm{Gal}(K'/\mathbb{Q})$ permutes the three roots of $X^3 - 2$ and is determined by its effect on these roots, $\mathrm{Gal}(K'/\mathbb{Q})$ is isomorphic to a subgroup of the symmetric group $S_3$. The Galois group $\mathrm{Gal}(K'/\mathbb{Q})$ has order $[K' : \mathbb{Q}] = 6$ and hence is isomorphic to the whole symmetric group $S_3$. The group $S_3$ has three subgroups of order 2 and one subgroup of order 3. Therefore $K'$ has three intermediate fields of degree 3 and one of degree 2. The intermediate fields of degree 3 are the three fields generated by $\mathbb{Q}$ and one of the three roots of $X^3 - 2$. The intermediate field of degree 2 corresponds to the alternating subgroup of order 3 and is the subfield generated by $\mathbb{Q}$ and the cube roots of 1. It is a splitting field for $X^2 + X + 1$ over $\mathbb{Q}$.

(3) $K = \mathbb{Q}(r)$, where $r$ is a root of $X^3 - X - \tfrac{1}{3}$. We know from Section 2 that $X^3 - X - \tfrac{1}{3}$ is irreducible over $\mathbb{Q}$ and splits in $K$, and $K$ by definition is therefore normal. Proposition 9.33 tells us that $\mathrm{Gal}(K/\mathbb{Q})$ has order 3 and hence is isomorphic to $C_3$. There are no nontrivial subgroups, and Theorem 9.38 tells us that there are no intermediate fields. We could have seen in more elementary fashion that there are no intermediate fields by using Corollary 9.7, since the corollary tells us that the degree of an intermediate field would have to divide 3.

(4) $K = \mathbb{Q}(e^{2\pi i/17})$. We have seen that $[K : \mathbb{Q}] = 16$ and that $\mathrm{Gal}(K/\mathbb{Q}) \cong \mathbb{F}_{17}^\times \cong C_{16}$. Let $c$ be a generator of the cyclic Galois group. Let $H_2 = \{1, c^8\}$, $H_4 = \{1, c^4, c^8, c^{12}\}$, and $H_8 = \{1, c^2, c^4, c^6, c^8, c^{10}, c^{12}, c^{14}\}$. Then put
$$L_2 = K^{H_2}, \qquad L_4 = K^{H_4}, \qquad L_8 = K^{H_8}.$$
The inclusions among our subgroups are
$$\{1\} \subseteq H_2 \subseteq H_4 \subseteq H_8 \subseteq \mathrm{Gal}(K/\mathbb{Q}),$$
and the theorem says that the correspondence with intermediate fields reverses inclusions. Then we have
$$K \supseteq L_2 \supseteq L_4 \supseteq L_8 \supseteq \mathbb{Q}.$$
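The chain of subgroups can be made concrete by identifying the Galois group with the units modulo 17 and taking $c$ to correspond to the generator 3 found in Section 6. The following minimal Python sketch rests on that identification; the function name `subgroup` and the variables `eta0`, `eta1` are hypothetical names introduced only for this illustration. It lists the three subgroups and checks numerically the classical fact that the sum of $e^{2\pi i h/17}$ over $h$ in $H_8$, an element fixed by $H_8$, satisfies $X^2 + X - 4 = 0$, which is consistent with $L_8$ being a quadratic extension of $\mathbb{Q}$.

```python
import cmath

p = 17
g = 3  # generator of the units modulo 17, as found in Section 6

def subgroup(step):
    """Cyclic subgroup generated by g**step inside the units modulo p."""
    return sorted({pow(g, step * j, p) for j in range((p - 1) // step)})

H2, H4, H8, G = subgroup(8), subgroup(4), subgroup(2), subgroup(1)

# Each subgroup has index 2 in the next one.
indices = [len(H4) // len(H2), len(H8) // len(H4), len(G) // len(H8)]

# Sums of the primitive 17th roots of unity over H8 and over its other coset.
zeta = cmath.exp(2j * cmath.pi / p)
eta0 = sum(zeta ** h for h in H8)
eta1 = sum(zeta ** r for r in G if r not in H8)

# eta0 and eta1 are the two roots of X^2 + X - 4, up to floating-point error.
sum_of_periods = eta0 + eta1        # approximately -1
product_of_periods = eta0 * eta1    # approximately -4
```

The numerical values are approximations in floating-point arithmetic and are offered only as a sanity check, not as a proof.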


Applying Corollary 9.36, we see that each of these subfields is a quadratic extension of the next-smaller one. Theorem 9.24 says that the members of $K$ are therefore constructible with straightedge and compass. Consequently a regular 17-gon is constructible with straightedge and compass.

The constructibility or nonconstructibility of regular $n$-gons for general $n$ will be settled in similar fashion in the next section. In Section 12 we return to the question of using Galois theory to guide us through the actual steps of the construction when it is possible.

PROOF OF THEOREM 9.38. The function $L \mapsto \mathrm{Gal}(K/L)$ has domain the set of all intermediate fields and range the set of all subgroups of $\mathrm{Gal}(K/k)$, since an element in $\mathrm{Gal}(K/L)$ is necessarily in $\mathrm{Gal}(K/k)$. Each such extension $K/L$ is separable by Proposition 9.32 and is normal by Proposition 9.35a. Thus Proposition 9.35d applies to each $K/L$ and shows that $L = K^{\mathrm{Gal}(K/L)}$. Consequently the function $L \mapsto \mathrm{Gal}(K/L)$ is one-one. If $H$ is a subgroup of $\mathrm{Gal}(K/k)$, then Corollary 9.37 shows that $L = K^H$ is an intermediate field for which $H = \mathrm{Gal}(K/L)$, and therefore the function $L \mapsto \mathrm{Gal}(K/L)$ is onto. It is immediate from the definition of Galois group that $L_1 \subseteq L_2$ implies $\mathrm{Gal}(K/L_1) \supseteq \mathrm{Gal}(K/L_2)$, and it is immediate from the formula $L = K^{\mathrm{Gal}(K/L)}$ that $\mathrm{Gal}(K/L_1) \supseteq \mathrm{Gal}(K/L_2)$ implies $L_1 \subseteq L_2$. This completes the proof.

Corollary 9.39. If $K$ is a finite Galois extension of $k$ and if $L$ is a subfield of $K$ that contains $k$, then $L$ is a normal extension of $k$ if and only if $\mathrm{Gal}(K/L)$ is a normal subgroup of $\mathrm{Gal}(K/k)$. In this case, the map $\mathrm{Gal}(K/k) \to \mathrm{Gal}(L/k)$ given by restriction from $K$ to $L$ is a group homomorphism that descends to a group isomorphism
$$\mathrm{Gal}(K/k)\big/\mathrm{Gal}(K/L) \cong \mathrm{Gal}(L/k).$$

PROOF. Let $L$ correspond to $H = \mathrm{Gal}(K/L)$ in Theorem 9.38, so that $L = K^H$. If $\varphi$ is in $\mathrm{Gal}(K/k)$, then
$$K^{\varphi H \varphi^{-1}} = \{k \in K \mid \varphi h \varphi^{-1}(k) = k \text{ for all } h \in H\}$$
$$= \{\varphi(k') \in K \mid \varphi h(k') = \varphi(k') \text{ for all } h \in H\}$$
$$= \{\varphi(k') \in K \mid h(k') = k' \text{ for all } h \in H\}$$
$$= \varphi(K^H) = \varphi(L).$$
Since the correspondence of Theorem 9.38 is one-one onto, $\varphi H \varphi^{-1} = H$ if and only if $\varphi(L) = L$. Therefore $H$ is a normal subgroup of $\mathrm{Gal}(K/k)$ if and only if $\varphi(L) = L$ for all $\varphi \in \mathrm{Gal}(K/k)$.

Now suppose that $H$ is a normal subgroup of $\mathrm{Gal}(K/k)$. We have just seen that $\varphi(L) = L$ for all $\varphi \in \mathrm{Gal}(K/k)$. Then each $\varphi$ defines by restriction a member $\bar{\varphi} = \varphi|_L$ of $\mathrm{Gal}(L/k)$, and $\varphi \mapsto \bar{\varphi}$ is certainly a group homomorphism. The kernel of $\varphi \mapsto \bar{\varphi}$ is the subgroup of $\mathrm{Gal}(K/k)$ given by $\{\varphi \in \mathrm{Gal}(K/k) \mid \varphi|_L = 1\}$, and this is just $\mathrm{Gal}(K/L)$. Thus $\varphi \mapsto \bar{\varphi}$ descends to a one-one homomorphism of $\mathrm{Gal}(K/k)/\mathrm{Gal}(K/L)$ into $\mathrm{Gal}(L/k)$, and we have $|\mathrm{Gal}(K/k)|/|\mathrm{Gal}(K/L)| \leq |\mathrm{Gal}(L/k)|$. We make use of Corollary 9.7 relating degrees of extensions. Applying Proposition 9.35c to $K/k$ and $K/L$, as well as Proposition 9.33 to $L/k$, we obtain
$$[L : k] = [K : k]\big/[K : L] = |\mathrm{Gal}(K/k)|\big/|\mathrm{Gal}(K/L)| \leq |\mathrm{Gal}(L/k)| \leq [L : k],$$
with equality at the first $\leq$ sign only if $\varphi \mapsto \bar{\varphi}$ is onto $\mathrm{Gal}(L/k)$ and with equality at the second $\leq$ sign only if $L$ is the splitting field over $k$ of the minimal polynomial of a certain element $\gamma$ of $L$. Equality must hold in both cases because the end members of the display are equal, and we conclude that $\varphi \mapsto \bar{\varphi}$ is onto and that $L/k$ is a normal extension.

We are left with proving that if $L/k$ is a normal extension, then $H$ is a normal subgroup of $\mathrm{Gal}(K/k)$. Thus let $L/k$ be normal. In view of the conclusion of the first paragraph of the proof, it is enough to prove that $\varphi(L) = L$ for all $\varphi \in \mathrm{Gal}(K/k)$. By definition of normal extension, $L$ is the splitting field of some polynomial $F(X)$ in $k[X]$. We may assume that $F(X)$ is monic. Let us write
$$F(X) = (X - x_1) \cdots (X - x_n) \qquad \text{with all } x_j \text{ in } L.$$
Applying a given member $\varphi$ of $\mathrm{Gal}(K/k)$ to the coefficients, we obtain $F(X) = (X - \varphi(x_1)) \cdots (X - \varphi(x_n))$, and here the $\varphi(x_j)$'s are known only to be in $K$. By unique factorization in $K[X]$, $\varphi(x_i) = x_{j(i)}$ for some $j = j(i)$. Therefore $\varphi(x_i)$ is in $L$ for all $i$. Since $L$ is the splitting field of $F(X)$ over $k$, $L = k(x_1, \ldots, x_n)$. Thus $\varphi$ maps $L$ into $L$.

The examples of Galois groups given in Section 6 all involved fields that are finite extensions of the rationals $\mathbb{Q}$. As we shall see in Section 17, it is important for the understanding of Galois groups of finite extensions of $\mathbb{Q}$ to be able to identify Galois groups of finite extensions of finite fields. This matter is addressed in the following proposition.


Proposition 9.40. Let $K$ be a finite extension of the finite field $\mathbb{F}_q$, where $q = p^a$ and $p$ is prime, and suppose that $[K : \mathbb{F}_q] = n$. Then $K$ is a Galois extension of $\mathbb{F}_q$, the Galois group $\mathrm{Gal}(K/\mathbb{F}_q)$ is cyclic of order $n$, and a generator is the $a$-th power of the Frobenius automorphism, namely $x \mapsto x^q = x^{p^a}$.

PROOF. Theorem 9.14 shows that $K$ is a splitting field for $X^{q^n} - X$ over $\mathbb{F}_p$. Hence it is a splitting field for $X^{q^n} - X$ over $\mathbb{F}_q$, and $K/\mathbb{F}_q$ is a normal extension. The polynomial $X^{q^n} - X$ has no multiple roots, and it follows that $K/\mathbb{F}_q$ is a separable extension. Define $\varphi$ by $\varphi(x) = x^q$. Lemma 9.18 shows that $\varphi$ is an automorphism of $K$. Since every member of $\mathbb{F}_q^\times$ has order dividing $q - 1$, every nonzero element of $\mathbb{F}_q$ is fixed by $\varphi$. The map $\varphi$ certainly carries 0 to 0, and thus $\varphi$ is in $\mathrm{Gal}(K/\mathbb{F}_q)$. By a similar argument, $\varphi^n$ fixes every element of $K$, and hence $\varphi^n = 1$. Corollary 4.27 shows that $K^\times$ is cyclic, hence that there exists an element $y$ in $K^\times$ such that $y^l \ne 1$ for $1 \le l < q^n - 1$. This $y$ has $y^l \ne y$ for $2 \le l \le q^n - 1$. Then $\varphi^k(y) = y^{q^k}$ cannot be $y$ for $1 \le k \le n - 1$, and $\varphi$ must have order exactly $n$. This shows that $\varphi$ generates a cyclic subgroup of order $n$ in $\mathrm{Gal}(K/\mathbb{F}_q)$. Since $n$ is an upper bound for the order of $\mathrm{Gal}(K/\mathbb{F}_q)$ by Proposition 9.33, this cyclic subgroup exhausts the Galois group.

EXAMPLE. Suppose that we are given a polynomial with coefficients in $\mathbb{F}_p$ and we want to find the Galois group of a splitting field. Since there are efficient computer programs for factoring the polynomial into irreducible polynomials, let us take that factorization as done. The Galois group will be cyclic of some order with generator the Frobenius automorphism $x \mapsto x^p$. For an irreducible polynomial of degree $n$, the splitting field has degree $n$, and the smallest power of $x \mapsto x^p$ that gives the identity is the $n$-th power. The conclusion is that the Galois group is cyclic of order equal to the least common multiple of the degrees of the irreducible constituents, a generator being the Frobenius automorphism.
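As a quick sanity check of Proposition 9.40 and of the example above, the following Python sketch computes the order of the Frobenius map on two small fields and applies the least-common-multiple rule. It is only an illustration: the helper names (`gf2_mul`, `frobenius_order`, `galois_group_order`), the bitmask encoding of elements of $\mathbb{F}_2[x]/(m(x))$, and the choice of $x^3 + x + 1$ and $x^4 + x + 1$ as irreducible moduli are assumptions made here and do not come from the text. It requires Python 3.9 or later for `math.lcm`.

```python
from math import lcm  # available in Python 3.9+


def gf2_mul(a, b, modulus, n):
    """Multiply a and b in F_2[x]/(modulus).

    Elements are bitmasks of polynomials of degree < n; `modulus` is the
    bitmask of an irreducible polynomial of degree n (the caller is assumed
    to supply an irreducible one).
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # add a times the current power of x
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> n) & 1:
            a ^= modulus         # reduce modulo the irreducible polynomial
    return result


def frobenius_order(modulus, n):
    """Order of x -> x^2 on the field with 2^n elements, by brute force."""
    identity = list(range(1 << n))
    images = identity[:]
    k = 0
    while True:
        images = [gf2_mul(v, v, modulus, n) for v in images]
        k += 1
        if images == identity:
            return k


def galois_group_order(degrees):
    """Order of the Galois group of a splitting field over F_p, given the
    degrees of the irreducible factors of the polynomial."""
    return lcm(*degrees)


print(frobenius_order(0b1011, 3))    # F_8  = F_2[x]/(x^3 + x + 1)
print(frobenius_order(0b10011, 4))   # F_16 = F_2[x]/(x^4 + x + 1)
print(galois_group_order([2, 3]))    # factors of degrees 2 and 3
```

The first two calls report 3 and 4, in agreement with $[K : \mathbb{F}_2] = n$, and the last reports 6.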

9. Application to Constructibility of Regular Polygons

In this section we use Galois theory to give a proof of Theorem 9.25 concerning the constructibility of regular n-gons. Let us recall the statement.

THEOREM 9.25 (Gauss). A regular n-gon is constructible with straightedge and compass if and only if n is the product of distinct Fermat primes and a power of 2.

PROOF OF SUFFICIENCY. First suppose that $n$ is a Fermat prime $n = 2^{2^N} + 1$. Let $K = \mathbb{Q}(e^{2\pi i/n})$. We saw in Section 5 that the degree $[K : \mathbb{Q}]$ is $2^{2^N}$, hence is a power of 2. Furthermore we know that $K$ is a separable extension of $\mathbb{Q}$, being of characteristic 0, and it is normal, being the splitting field for $X^n - 1$ over $\mathbb{Q}$. In Section 6 we saw that the Galois group $\mathrm{Gal}(K/\mathbb{Q})$ is cyclic of order $2^{2^N}$. Let $c$ be a generator of this group. For each integer $k$ with $0 \le k \le 2^N$, let $H_{2^k}$ be the unique cyclic subgroup of $\mathrm{Gal}(K/\mathbb{Q})$ of order $2^k$. For this subgroup, $c^{2^{2^N - k}}$ is a generator. Put $L_{2^k} = K^{H_{2^k}}$. Then we have inclusions

$$\{1\} \subseteq H_2 \subseteq H_{2^2} \subseteq \cdots \subseteq H_{2^k} \subseteq \cdots \subseteq H_{2^{2^N - 1}} \subseteq H_{2^{2^N}} = \mathrm{Gal}(K/\mathbb{Q}),$$

the index being 2 at each stage. Theorem 9.38 says that the correspondence with intermediate fields reverses inclusions and that the degree of each consecutive extension of subfields matches the index of the corresponding consecutive subgroups. The intermediate fields are therefore of the form

$$K \supseteq L_2 \supseteq L_{2^2} \supseteq \cdots \supseteq L_{2^k} \supseteq \cdots \supseteq L_{2^{2^N - 1}} \supseteq L_{2^{2^N}} = \mathbb{Q},$$

and the degree in each case is 2. In view of the formula for the roots of a quadratic polynomial, each extension is obtained by adjoining some square root. By Theorem 9.24 the members of $K$ are constructible with straightedge and compass. In particular, $e^{2\pi i/n}$ is constructible, and a regular n-gon is constructible.

Next suppose that $e^{2\pi i/r}$ and $e^{2\pi i/s}$ are both constructible and that $\mathrm{GCD}(r, s) = 1$. Choose integers $a$ and $b$ with $ar + bs = 1$, so that $\frac{a}{s} + \frac{b}{r} = \frac{1}{rs}$. Then the equality $(e^{2\pi i/s})^a (e^{2\pi i/r})^b = e^{2\pi i/(rs)}$ shows that $e^{2\pi i/(rs)}$ is constructible. This proves the sufficiency for any product of distinct Fermat primes. Bisection of an angle is always possible with straightedge and compass, as was observed in the third paragraph of Section 5, and the proof of the sufficiency in Theorem 9.25 is therefore complete.

REMARKS. The above proof shows that the construction is possible, but it gives little clue how to carry out the construction. We shall address this matter further in Section 12.
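The step of the sufficiency proof that combines coprime $r$ and $s$ can be checked with exact arithmetic. The sketch below is an illustration under stated assumptions: the function names `extended_gcd` and `combine_angles` are hypothetical, and the extended Euclidean algorithm is just one convenient way to produce integers $a$, $b$ with $ar + bs = 1$; the proof itself only needs their existence.

```python
from fractions import Fraction


def extended_gcd(r, s):
    """Return (g, a, b) with a*r + b*s == g == gcd(r, s)."""
    old_r, cur_r = r, s
    old_a, cur_a = 1, 0
    old_b, cur_b = 0, 1
    while cur_r != 0:
        q = old_r // cur_r
        old_r, cur_r = cur_r, old_r - q * cur_r
        old_a, cur_a = cur_a, old_a - q * cur_a
        old_b, cur_b = cur_b, old_b - q * cur_b
    return old_r, old_a, old_b


def combine_angles(r, s):
    """For coprime r and s, return (a, b, turn) where turn = a/s + b/r.

    `turn` is the fraction of a full turn reached by taking the angle
    2*pi/s a times and the angle 2*pi/r b times (negative means backward).
    """
    g, a, b = extended_gcd(r, s)
    if g != 1:
        raise ValueError("r and s must be relatively prime")
    return a, b, Fraction(a, s) + Fraction(b, r)


a, b, turn = combine_angles(3, 5)
print(a, b, turn)  # turn equals 1/15, the angle of a regular 15-gon
```

For $r = 3$ and $s = 5$ the combination yields one fifteenth of a full turn, which is how a regular 15-gon arises from a regular triangle and a regular pentagon.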
We turn our attention to the necessity: that $n$ has to be the product of distinct Fermat primes and a power of 2 if a regular n-gon is constructible. For the moment let $n \ge 1$ be any integer. Let us consider the distinct $n$-th roots of 1 in $\mathbb{C}$, which are $e^{2\pi i k/n}$ for $0 \le k < n$. The order of each of these elements divides $n$, and the order is exactly $n$ if and only if $\mathrm{GCD}(k, n) = 1$. In this case we say that $e^{2\pi i k/n}$ is a primitive $n$-th root of 1. Define the cyclotomic polynomial $\Phi_n(X)$ by

$$\Phi_n(X) = \prod_{\mathrm{GCD}(k,n)=1,\ 0 \le k < n} (X - e^{2\pi i k/n}).$$

[...] $k_j > 0$, then

$$\varphi(n) = \prod_{j=1}^{r} p_j^{k_j - 1}(p_j - 1).$$

For constructibility this must be a power of 2. Then each $p_j$ dividing $n$ must be 1 more than a power of 2, i.e., must be 2 or a Fermat prime, and the only $p_j$ allowed to have $p_j^2$ dividing $n$ is $p_j = 2$.
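The criterion just stated, that $\varphi(n)$ must be a power of 2, is easy to test numerically. The following sketch is illustrative only; the function names are hypothetical, and trial division is used for the Euler function simply because it is short, not because the text prescribes it.

```python
def euler_phi(n):
    """Euler's phi function, computed by trial division."""
    result = n
    m = n
    p = 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p   # multiply result by (1 - 1/p)
        p += 1
    if m > 1:                        # one prime factor larger than sqrt(n) remains
        result -= result // m
    return result


def is_power_of_two(m):
    """True when m is 1, 2, 4, 8, ..."""
    return m > 0 and (m & (m - 1)) == 0


def is_constructible(n):
    """Gauss's criterion from Theorem 9.25, phrased through phi(n)."""
    return is_power_of_two(euler_phi(n))


print([n for n in range(3, 21) if is_constructible(n)])
```

The printed list is 3, 4, 5, 6, 8, 10, 12, 15, 16, 17, 20; the values 7, 9, 11, 13, 14, 18, 19 are excluded, each having $\varphi(n)$ divisible by an odd prime.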

10. Application to Proving the Fundamental Theorem of Algebra

In this section we use Galois theory to give a proof of the Fundamental Theorem of Algebra. Let us recall the statement.

THEOREM 1.18 (Fundamental Theorem of Algebra). Any polynomial in $\mathbb{C}[X]$ with degree $\ge 1$ has at least one root.


We begin with a lemma that handles three easy special cases.

Lemma 9.43. There are no finite extensions of $\mathbb{R}$ of odd degree greater than 1, the only extension of $\mathbb{R}$ of degree 2 up to $\mathbb{R}$ isomorphism is $\mathbb{C}$, and there are no finite extensions of $\mathbb{C}$ of degree 2.

PROOF. If $K$ is a finite extension of $\mathbb{R}$ of odd degree and if $x$ is in $K$, then $[\mathbb{R}(x) : \mathbb{R}]$ is odd, and consequently the minimal polynomial $F(X)$ of $x$ over $\mathbb{R}$ is irreducible of odd degree. By Proposition 1.20, which is derived from the Intermediate Value Theorem of Section A3 of the appendix, $F(X)$ has at least one root in $\mathbb{R}$. Therefore $F(X)$ has degree 1, and $x$ is in $\mathbb{R}$.

If $F(X)$ is an irreducible polynomial in $\mathbb{R}[X]$ of degree 2, then $F(X)$ splits in $\mathbb{C}$ by the quadratic formula, and hence the only extension of $\mathbb{R}$ of degree 2 is $\mathbb{C}$, up to $\mathbb{R}$ isomorphism, by the uniqueness of splitting fields (Theorem 9.13).

Let $G(X) = X^2 + bX + c$ be a polynomial in $\mathbb{C}[X]$ of degree 2. Then $G(X)$ has a root $x$ in $\mathbb{C}$ given by the quadratic formula since every member of $\mathbb{C}$ has a square root⁶ in $\mathbb{C}$, and $G(X)$ cannot be irreducible. Since any finite extension of $\mathbb{C}$ of degree 2 would have to be of the form $\mathbb{C}(x)$, with $x$ equal to a root of an irreducible quadratic polynomial over $\mathbb{C}$, there can be no such extension.

PROOF OF THEOREM 1.18. First let us show that every irreducible member $F(X)$ of $\mathbb{R}[X]$ splits over $\mathbb{C}$. Let $K$ be a splitting field for $F(X)$. Say that $[K : \mathbb{R}] = 2^m N$ with $N$ odd. Then $K$ is a Galois extension of $\mathbb{R}$, and $|\mathrm{Gal}(K/\mathbb{R})| = 2^m N$. By the Sylow Theorems (particularly Theorem 4.59a), let $H$ be a Sylow 2-subgroup of $\mathrm{Gal}(K/\mathbb{R})$. This $H$ has $|H| = 2^m$. The field $L = K^H$ that corresponds to $H$ under Theorem 9.38 has $[L : \mathbb{R}] = N$ with $N$ odd, and the first conclusion of Lemma 9.43 shows that $N = 1$. Thus $|\mathrm{Gal}(K/\mathbb{R})| = 2^m$. Corollary 4.40 shows that $\mathrm{Gal}(K/\mathbb{R})$ has nested subgroups of all orders $2^{m-k}$ with $0 \le k \le m$, and Theorem 9.38 says that the corresponding fixed fields are nested and have respective degrees $2^k$ with $0 \le k \le m$. The extension field of $\mathbb{R}$ for $k = 1$ is necessarily $\mathbb{C}$ by Lemma 9.43, and Lemma 9.43 shows that there are no quadratic extensions of $\mathbb{C}$. Therefore $m = 0$ or $m = 1$, and the possible splitting fields for $F(X)$ are $\mathbb{R}$ and $\mathbb{C}$ in the two cases.

To complete the proof, suppose that $K$ is a finite algebraic extension of $\mathbb{C}$ of degree $n$. Then $K$ is a finite algebraic extension of $\mathbb{R}$ of degree $2n$. The Theorem of the Primitive Element allows us to write $K = \mathbb{R}(x)$ for some $x \in K$, and the minimal polynomial of $x$ over $\mathbb{R}$ necessarily has degree $2n$. The previous paragraph shows that this polynomial splits in $\mathbb{C}$. Thus $x$ is in $\mathbb{C}$, and $K = \mathbb{C}$. This completes the proof.

⁶ To see that every member of $\mathbb{C}$ has a square root in $\mathbb{C}$, let $c + di$ be given with $c$ and $d$ real and with $d \ne 0$. Let $a$ and $b$ be real numbers with $a^2 = \tfrac12\bigl(c + \sqrt{c^2 + d^2}\bigr)$, $b^2 = \tfrac12\bigl(-c + \sqrt{c^2 + d^2}\bigr)$, and $\operatorname{sgn}(ab) = \operatorname{sgn} d$. Then $(a + bi)^2 = c + di$.
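The square-root formula in the footnote to the proof of Lemma 9.43 can be exercised directly. The sketch below assumes floating-point arithmetic and picks the sign convention $a > 0$ with $b$ carrying the sign of $d$, which is one of the two choices allowed by $\operatorname{sgn}(ab) = \operatorname{sgn} d$; the function name `complex_sqrt` is hypothetical.

```python
import math


def complex_sqrt(c, d):
    """Return real (a, b) with (a + b*i)^2 = c + d*i, for real c, d with d != 0.

    Uses a^2 = (c + sqrt(c^2 + d^2))/2 and b^2 = (-c + sqrt(c^2 + d^2))/2,
    choosing a > 0 and giving b the sign of d so that sgn(ab) = sgn(d).
    """
    if d == 0:
        raise ValueError("this formula is stated for d != 0")
    modulus = math.hypot(c, d)            # sqrt(c^2 + d^2), strictly larger than |c|
    a = math.sqrt((c + modulus) / 2)
    b = math.sqrt((-c + modulus) / 2)
    if d < 0:
        b = -b
    return a, b


a, b = complex_sqrt(3.0, 4.0)
print(a, b)                 # a = 2 and b = 1, since (2 + i)^2 = 3 + 4i
print(complex(a, b) ** 2)   # recovers 3 + 4i up to rounding
```

Because $d \ne 0$ forces $\sqrt{c^2 + d^2} > |c|$, both radicands are positive, so no special handling is needed beyond the sign of $b$.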


11. Application to Unsolvability of Polynomial Equations with Nonsolvable Galois Group

The quadratic formula for finding the roots of a quadratic polynomial has in principle been known since the time of the Babylonians about 400 B.C.⁷ The corresponding problem of finding roots of cubics was unsolved until the sixteenth century, and Cardan’s formula was discovered at that time. The original formula assumes real coefficients and was in two parts, a first case corresponding to what we now view as one real root and two complex roots, the second case corresponding to what we view as three real roots.⁸ There is a similar formula, but more complicated, for solving quartics. Further centuries passed with no progress on finding a corresponding formula for the roots of a polynomial of degree 5 or higher. The introduction of Galois theory in the early nineteenth century made it possible to prove a surprising negative statement about all degrees beyond 4.

Suppose that we are given a polynomial equation with coefficients in the field $\mathbb{Q}$ or a more general field $k$ of characteristic 0. In this section we use Galois theory to address the question whether the roots of the equation in a splitting field can be expressed in terms of $k$ and the adjunction of finitely many $n$-th roots to the field, for various values of $n$. For the moment let us say in this case that the roots are “expressible in terms of the members of $k$ and radicals.” We shall make this notion more precise shortly.

Recall from Section IV.8 that with a finite group $G$, we can find a strictly decreasing sequence of subgroups starting with $G$ and ending with $\{1\}$ such that each subgroup is normal in the next larger one and each quotient group is simple. Such a series was defined to be a composition series for $G$. The Jordan–Hölder Theorem (Corollary 4.50) says that the respective consecutive quotients are isomorphic for any two composition series, apart from the order in which they appear.
We define the finite group $G$ to be solvable if each of the consecutive quotients is cyclic of prime order, rather than nonabelian. It is enough that the group have a normal series for which each of the consecutive quotients is abelian. Examples of solvable and nonsolvable groups are obtainable from the calculations in Section IV.8: abelian groups and groups of prime-power order are always solvable, the symmetric group $S_4$ and each of its subgroups are solvable, and the symmetric group $S_5$ is not solvable since a composition series is $S_5 \supseteq A_5 \supseteq \{1\}$ and the group $A_5$ is simple (Theorem 4.47).

⁷ The Babylonians did not actually have equations but had an algorithmic method that amounted to completing the square.

⁸ Cardan’s name was Girolamo Cardano. The solution in the first case of the cubic seems to have been discovered by Scipione dal Ferro and later by Nicolo Tartaglia. Dal Ferro died in 1526 and passed the secret method to his student Antonio Fior. In 1535 Fior engaged in a public contest with Tartaglia at solving cubics, and he lost. Cardano wheedled the solution method in the first case from Tartaglia, published it in 1539, and discovered and published the solution in the second case. Cardano’s student Lodovico Ferrari discovered how to solve quartics, and Cardano published that solution as well. See “St. Andrews” in the Selected References for more information.

Modulo a precise definition for a field $k$ of the words “expressible in terms of the members of $k$ and radicals,” the answer to our main question is as follows.

Theorem 9.44 (Abel, Galois).⁹ Let $k$ be a field of characteristic 0, let $F(X)$ be in $k[X]$, and let $K$ be a splitting field of $F(X)$ over $k$. Then the roots of $F(X)$ are expressible in terms of the members of $k$ and radicals if and only if the group $\mathrm{Gal}(K/k)$ is solvable.

EXAMPLE. With $k = \mathbb{Q}$, let $F(X)$ be the polynomial $F(X) = X^5 - 5X + 1$ in $\mathbb{Q}[X]$. We shall show that
(i) $F(X)$ is irreducible over $\mathbb{Q}$,
(ii) $F(X)$ has three roots in $\mathbb{R}$ and one pair of conjugate complex roots in $\mathbb{C}$,
(iii) the splitting field $K$ over $\mathbb{Q}$ of any polynomial of degree 5 for which (i) and (ii) hold has Galois group with $\mathrm{Gal}(K/\mathbb{Q}) \cong S_5$.
We know from Theorem 4.47 that $S_5$ is not solvable, and Theorem 9.44 therefore allows us to conclude that the roots of $X^5 - 5X + 1$ are not expressible in terms of the members of $\mathbb{Q}$ and radicals.

To prove (i), we apply Eisenstein’s criterion (Corollary 8.22) to the polynomial $F(X - 1) = X^5 - 5X^4 + 10X^3 - 10X^2 + 5$ and to the prime $p = 5$, and the irreducibility is immediate.

To prove (ii), we observe that $F(-2) < 0$, $F(0) > 0$, $F(1) < 0$, $F(2) > 0$. Applying the Intermediate Value Theorem (Section A3 of the appendix), we see that there are at least three roots in $\mathbb{R}$. Since $F'(X) = 5(X^4 - 1)$ has exactly the two roots $\pm 1$ in $\mathbb{R}$, $F(X)$ has at most three roots in $\mathbb{R}$ by an application of the Mean Value Theorem.

To prove (iii), label the roots 1, 2, 3, 4, 5 with 1 and 2 denoting the nonreal


To Susan and To My Algebra Teachers: Ralph Fox, John Fraleigh, Robert Gunning, John Kemeny, Bertram Kostant, Robert Langlands, Goro Shimura, Hale Trotter, Richard Williamson

CONTENTS

Contents of Advanced Algebra
List of Figures
Preface
Dependence Among Chapters
Standard Notation
Guide for the Reader

I. PRELIMINARIES ABOUT THE INTEGERS, POLYNOMIALS, AND MATRICES
   1. Division and Euclidean Algorithms
   2. Unique Factorization of Integers
   3. Unique Factorization of Polynomials
   4. Permutations and Their Signs
   5. Row Reduction
   6. Matrix Operations
   7. Problems

II. VECTOR SPACES OVER Q, R, AND C
   1. Spanning, Linear Independence, and Bases
   2. Vector Spaces Defined by Matrices
   3. Linear Maps
   4. Dual Spaces
   5. Quotients of Vector Spaces
   6. Direct Sums and Direct Products of Vector Spaces
   7. Determinants
   8. Eigenvectors and Characteristic Polynomials
   9. Bases in the Infinite-Dimensional Case
   10. Problems

III. INNER-PRODUCT SPACES
   1. Inner Products and Orthonormal Sets
   2. Adjoints
   3. Spectral Theorem
   4. Problems

IV. GROUPS AND GROUP ACTIONS
   1. Groups and Subgroups
   2. Quotient Spaces and Homomorphisms
   3. Direct Products and Direct Sums
   4. Rings and Fields
   5. Polynomials and Vector Spaces
   6. Group Actions and Examples
   7. Semidirect Products
   8. Simple Groups and Composition Series
   9. Structure of Finitely Generated Abelian Groups
   10. Sylow Theorems
   11. Categories and Functors
   12. Problems

V. THEORY OF A SINGLE LINEAR TRANSFORMATION
   1. Introduction
   2. Determinants over Commutative Rings with Identity
   3. Characteristic and Minimal Polynomials
   4. Projection Operators
   5. Primary Decomposition
   6. Jordan Canonical Form
   7. Computations with Jordan Form
   8. Problems

VI. MULTILINEAR ALGEBRA
   1. Bilinear Forms and Matrices
   2. Symmetric Bilinear Forms
   3. Alternating Bilinear Forms
   4. Hermitian Forms
   5. Groups Leaving a Bilinear Form Invariant
   6. Tensor Product of Two Vector Spaces
   7. Tensor Algebra
   8. Symmetric Algebra
   9. Exterior Algebra
   10. Problems

VII. ADVANCED GROUP THEORY
   1. Free Groups
   2. Subgroups of Free Groups
   3. Free Products
   4. Group Representations
   5. Burnside’s Theorem
   6. Extensions of Groups
   7. Problems

VIII. COMMUTATIVE RINGS AND THEIR MODULES
   1. Examples of Rings and Modules
   2. Integral Domains and Fields of Fractions
   3. Prime and Maximal Ideals
   4. Unique Factorization
   5. Gauss’s Lemma
   6. Finitely Generated Modules
   7. Orientation for Algebraic Number Theory and Algebraic Geometry
   8. Noetherian Rings and the Hilbert Basis Theorem
   9. Integral Closure
   10. Localization and Local Rings
   11. Dedekind Domains
   12. Problems

IX. FIELDS AND GALOIS THEORY
   1. Algebraic Elements
   2. Construction of Field Extensions
   3. Finite Fields
   4. Algebraic Closure
   5. Geometric Constructions by Straightedge and Compass
   6. Separable Extensions
   7. Normal Extensions
   8. Fundamental Theorem of Galois Theory
   9. Application to Constructibility of Regular Polygons
   10. Application to Proving the Fundamental Theorem of Algebra
   11. Application to Unsolvability of Polynomial Equations with Nonsolvable Galois Group
   12. Construction of Regular Polygons
   13. Solution of Certain Polynomial Equations with Solvable Galois Group
   14. Proof That π Is Transcendental
   15. Norm and Trace
   16. Splitting of Prime Ideals in Extensions
   17. Two Tools for Computing Galois Groups
   18. Problems

X. MODULES OVER NONCOMMUTATIVE RINGS
   1. Simple and Semisimple Modules
   2. Composition Series
   3. Chain Conditions
   4. Hom and End for Modules
   5. Tensor Product for Modules
   6. Exact Sequences
   7. Problems

APPENDIX
   A1. Sets and Functions
   A2. Equivalence Relations
   A3. Real Numbers
   A4. Complex Numbers
   A5. Partial Orderings and Zorn’s Lemma
   A6. Cardinality

Hints for Solutions of Problems
Selected References
Index of Notation
Index

CONTENTS OF ADVANCED ALGEBRA

I. Transition to Modern Number Theory
II. Wedderburn–Artin Ring Theory
III. Brauer Group
IV. Homological Algebra
V. Three Theorems in Algebraic Number Theory
VI. Reinterpretation with Adeles and Ideles
VII. Infinite Field Extensions
VIII. Background for Algebraic Geometry
IX. The Number Theory of Algebraic Curves
X. Methods of Algebraic Geometry

LIST OF FIGURES

2.1. The vector space of lines v + U in R² parallel to a given line U through the origin
2.2. Factorization of linear maps via a quotient of vector spaces
2.3. Three 1-dimensional vector subspaces of R² such that each pair has intersection 0
2.4. Universal mapping property of a direct product of vector spaces
2.5. Universal mapping property of a direct sum of vector spaces
3.1. Geometric interpretation of the parallelogram law
3.2. Resolution of a vector into a parallel component and an orthogonal component
4.1. Factorization of homomorphisms of groups via the quotient of a group by a normal subgroup
4.2. Universal mapping property of an external direct product of groups
4.3. Universal mapping property of a direct product of groups
4.4. Universal mapping property of an external direct sum of abelian groups
4.5. Universal mapping property of a direct sum of abelian groups
4.6. Factorization of homomorphisms of rings via the quotient of a ring by an ideal
4.7. Substitution homomorphism for polynomials in one indeterminate
4.8. Substitution homomorphism for polynomials in n indeterminates
4.9. A square diagram
4.10. Diagrams obtained by applying a covariant functor and a contravariant functor
4.11. Universal mapping property of a product in a category
4.12. Universal mapping property of a coproduct in a category
5.1. Example of a nilpotent matrix in Jordan form
5.2. Powers of the nilpotent matrix in Figure 5.1
6.1. Universal mapping property of a tensor product
6.2. Diagrams for uniqueness of a tensor product
6.3. Commutative diagram of a natural transformation {T_X}
6.4. Commutative diagram of a triple tensor product
6.5. Universal mapping property of a tensor algebra
7.1. Universal mapping property of a free group
7.2. Universal mapping property of a free product
7.3. An intertwining operator for two representations
7.4. Equivalent group extensions
8.1. Universal mapping property of the integral group ring of G
8.2. Universal mapping property of a free left R module
8.3. Factorization of R homomorphisms via a quotient of R modules
8.4. Universal mapping property of the group algebra RG
8.5. Universal mapping property of the field of fractions of R
8.6. Real points of the curve y² = (x − 1)x(x + 1)
8.7. Universal mapping property of the localization of R at S
9.1. Closure of positive constructible x coordinates under multiplication and division
9.2. Closure of positive constructible x coordinates under square roots
9.3. Construction of a regular pentagon
9.4. Construction of a regular 17-gon
10.1. Universal mapping property of a tensor product of a right R module and a left R module

PREFACE

Basic Algebra and its companion volume Advanced Algebra systematically develop concepts and tools in algebra that are vital to every mathematician, whether pure or applied, aspiring or established. These two books together aim to give the reader a global view of algebra, its use, and its role in mathematics as a whole. The idea is to explain what the young mathematician needs to know about algebra in order to communicate well with colleagues in all branches of mathematics.

The books are written as textbooks, and their primary audience is students who are learning the material for the first time and who are planning a career in which they will use advanced mathematics professionally. Much of the material in the books, particularly in Basic Algebra but also in some of the chapters of Advanced Algebra, corresponds to normal course work. The books include further topics that may be skipped in required courses but that the professional mathematician will ultimately want to learn by self-study. The test of each topic for inclusion is whether it is something that a plenary lecturer at a broad international or national meeting is likely to take as known by the audience.

The key topics and features of Basic Algebra are as follows:

• Linear algebra and group theory build on each other throughout the book. A small amount of linear algebra is introduced first, as the topic likely to be better known by the reader ahead of time, and then a little group theory is introduced, with linear algebra providing important examples.
• Chapters on linear algebra develop notions related to vector spaces, the theory of linear transformations, bilinear forms, classical linear groups, and multilinear algebra.
• Chapters on modern algebra treat groups, rings, fields, modules, and Galois groups, including many uses of Galois groups and methods of computation.
• Three prominent themes recur throughout and blend together at times: the analogy between integers and polynomials in one variable over a field, the interplay between linear algebra and group theory, and the relationship between number theory and geometry.
• The development proceeds from the particular to the general, often introducing examples well before a theory that incorporates them.
• More than 400 problems at the ends of chapters illuminate aspects of the text, develop related topics, and point to additional applications. A separate 90-page section “Hints for Solutions of Problems” at the end of the book gives detailed hints for most of the problems, complete solutions for many.
• Applications such as the fast Fourier transform, the theory of linear error-correcting codes, the use of Jordan canonical form in solving linear systems of ordinary differential equations, and constructions of interest in mathematical physics arise naturally in sequences of problems at the ends of chapters and illustrate the power of the theory for use in science and engineering.

Basic Algebra endeavors to show some of the interconnections between different areas of mathematics, beyond those listed above. Here are examples: Systems of orthogonal functions make an appearance with inner-product spaces. Covering spaces naturally play a role in the examination of subgroups of free groups. Cohomology of groups arises from considering group extensions. Use of the power-series expansion of the exponential function combines with algebraic numbers to prove that π is transcendental. Harmonic analysis on a cyclic group explains the mysterious method of Lagrange resolvents in the theory of Galois groups.

Algebra plays a singular role in mathematics by having been developed so extensively at such an early date. Indeed, the major discoveries of algebra even from the days of Hilbert are well beyond the knowledge of most nonalgebraists today. Correspondingly most of the subject matter of the present book is at least 100 years old. What has changed over the intervening years concerning algebra books at this level is not so much the mathematics as the point of view toward the subject matter and the relative emphasis on and generality of various topics. For example, in the 1920s Emmy Noether introduced vector spaces and linear mappings to reinterpret coordinate spaces and matrices, and she defined the ingredients of what was then called “modern algebra”: the axiomatically defined rings, fields, and modules, and their homomorphisms.
The introduction of categories and functors in the 1940s shifted the emphasis even more toward the homomorphisms and away from the objects themselves. The creation of homological algebra in the 1950s gave a unity to algebraic topics cutting across many fields of mathematics. Category theory underwent a period of great expansion in the 1950s and 1960s, followed by a contraction and a return more to a supporting role.

The emphasis in topics shifted. Linear algebra had earlier been viewed as a separate subject, with many applications, while group theory and the other topics had been viewed as having few applications. Coding theory, cryptography, and advances in physics and chemistry have changed all that, and now linear algebra and group theory together permeate mathematics and its applications. The other subjects build on them, and they too have extensive applications in science and engineering, as well as in the rest of mathematics.

Basic Algebra presents its subject matter in a forward-looking way that takes this evolution into account. It is suitable as a text in a two-semester advanced undergraduate or first-year graduate sequence in algebra. Depending on the graduate school, it may be appropriate to include also some material from Advanced Algebra. Briefly the topics in Basic Algebra are linear algebra and group theory, rings, fields, and modules. A full list of the topics in Advanced Algebra appears on page x; of these, the Wedderburn theory of semisimple algebras, homological algebra, and foundational material for algebraic geometry are the ones that most commonly appear in syllabi of first-year graduate courses.

A chart on page xvii tells the dependence among chapters and can help with preparing a syllabus. Chapters I–VII treat linear algebra and group theory at various levels, except that three sections of Chapter IV and one of Chapter V introduce rings and fields, polynomials, categories and functors, and determinants over commutative rings with identity. Chapter VIII concerns rings, with emphasis on unique factorization; Chapter IX concerns field extensions and Galois theory, with emphasis on applications of Galois theory; and Chapter X concerns modules and constructions with modules.

For a graduate-level sequence the syllabus is likely to include all of Chapters I–V and parts of Chapters VIII and IX, at a minimum. Depending on the knowledge of the students ahead of time, it may be possible to skim much of the first three chapters and some of the beginning of the fourth; then time may allow for some of Chapters VI and VII, or additional material from Chapters VIII and IX, or some of the topics in Advanced Algebra. For many of the topics in Advanced Algebra, parts of Chapter X of Basic Algebra are prerequisite.

For an advanced undergraduate sequence the first semester can include Chapters I through III except Section II.9, plus the first six sections of Chapter IV and as much as reasonable from Chapter V; the notion of category does not appear in this material.
The second semester will involve categories very gently; the course will perhaps treat the remainder of Chapter IV, the first five or six sections of Chapter VIII, and at least Sections 1–3 and 5 of Chapter IX.

More detailed information about how the book can be used with courses can be deduced by using the chart on page xvii in conjunction with the section “Guide for the Reader” on pages xix–xxii. In my own graduate teaching, I have built one course around Chapters I–III, Sections 1–6 of Chapter IV, all of Chapter V, and about half of Chapter VI. A second course dealt with the remainder of Chapter IV, a little of Chapter VII, Sections 1–6 of Chapter VIII, and Sections 1–11 of Chapter IX.

The problems at the ends of chapters are intended to play a more important role than is normal for problems in a mathematics book. Almost all problems are solved in the section of hints at the end of the book. This being so, some blocks of problems form additional topics that could have been included in the text but were not; these blocks may either be regarded as optional topics, or they may be treated as challenges for the reader. The optional topics of this kind usually either carry out further development of the theory or introduce significant applications. For example one block of problems at the end of Chapter VII carries the theory of representations of finite groups a little further by developing the Poisson summation formula and the fast Fourier transform. For a second example blocks of problems at the ends of Chapters IV, VII, and IX introduce linear error-correcting codes as an application of the theory in those chapters.

Not all problems are of this kind, of course. Some of the problems are really pure or applied theorems, some are examples showing the degree to which hypotheses can be stretched, and a few are just exercises. The reader gets no indication which problems are of which type, nor of which ones are relatively easy. Each problem can be solved with tools developed up to that point in the book, plus any additional prerequisites that are noted.

Beyond a standard one-variable calculus course, the most important prerequisite for using Basic Algebra is that the reader already know what a proof is, how to read a proof, and how to write a proof. This knowledge typically is obtained from honors calculus courses, or from a course in linear algebra, or from a first junior–senior course in real variables. In addition, it is assumed that the reader is comfortable with a small amount of linear algebra, including matrix computations, row reduction of matrices, solutions of systems of linear equations, and the associated geometry. Some prior exposure to groups is helpful but not really necessary.

The theorems, propositions, lemmas, and corollaries within each chapter are indexed by a single number stream. Figures have their own number stream, and one can find the page reference for each figure from the table on pages xi–xii. Labels on displayed lines occur only within proofs and examples, and they are local to the particular proof or example in progress.
Some readers like to skim or skip proofs on first reading; to facilitate this procedure, each occurrence of the word “PROOF” is matched by an occurrence at the right margin of a symbol that marks the end of that proof.

I am grateful to Ann Kostant and Steven Krantz for encouraging this project and for making many suggestions about pursuing it. I am especially indebted to an anonymous referee, who made detailed comments about many aspects of a preliminary version of the book, and to David Kramer, who did the copyediting. The typesetting was by AMS-TeX, and the figures were drawn with Mathematica.

I invite corrections and other comments from readers. I plan to maintain a list of known corrections on my own Web page.

A. W. KNAPP
August 2006

DEPENDENCE AMONG CHAPTERS

Below is a chart of the main lines of dependence of chapters on prior chapters. The dashed lines indicate helpful motivation but no logical dependence. Apart from that, particular examples may make use of information from earlier chapters that is not indicated by the chart.

[Chart of chapter dependence. Nodes: I, II; III; IV.1–IV.6; IV.7–IV.11; VII; V; VI; VIII.1–VIII.6; X; IX.1–IX.13; VIII.7–VIII.11; IX.14–IX.17.]


STANDARD NOTATION

See the Index of Notation, pp. 699–701, for symbols defined starting on page 1.

Item                                                  Meaning

$\#S$ or $|S|$                                        number of elements in $S$
$\varnothing$                                         empty set
$\{x \in E \mid P\}$                                  the set of $x$ in $E$ such that $P$ holds
$E^c$                                                 complement of the set $E$
$E \cup F,\ E \cap F,\ E - F$                         union, intersection, difference of sets
$\bigcup_\alpha E_\alpha,\ \bigcap_\alpha E_\alpha$   union, intersection of the sets $E_\alpha$
$E \subseteq F,\ E \supseteq F$                       $E$ is contained in $F$, $E$ contains $F$
$E \subsetneqq F,\ E \supsetneqq F$                   $E$ properly contained in $F$, properly contains $F$
$E \times F,\ \times_{s \in S} X_s$                   products of sets
$(a_1, \dots, a_n),\ \{a_1, \dots, a_n\}$             ordered $n$-tuple, unordered $n$-tuple
$f : E \to F,\ x \mapsto f(x)$                        function, effect of function
$f \circ g$ or $fg$, $f|_E$                           composition of $g$ followed by $f$, restriction to $E$
$f(\,\cdot\,, y)$                                     the function $x \mapsto f(x, y)$
$f(E),\ f^{-1}(E)$                                    direct and inverse image of a set
$\delta_{ij}$                                         Kronecker delta: 1 if $i = j$, 0 if $i \neq j$
$\binom{n}{k}$                                        binomial coefficient
$n$ positive, $n$ negative                            $n > 0$, $n < 0$
$\mathbb{Z}, \mathbb{Q}, \mathbb{R}, \mathbb{C}$      integers, rationals, reals, complex numbers
max (and similarly min)                               maximum of a finite subset of a totally ordered set
$\sum$ or $\prod$                                     sum or product, possibly with a limit operation
countable                                             finite or in one-one correspondence with $\mathbb{Z}$
$[x]$                                                 greatest integer $\le x$ if $x$ is real
$\operatorname{Re} z,\ \operatorname{Im} z$           real and imaginary parts of complex $z$
$\bar{z}$                                             complex conjugate of $z$
$|z|$                                                 absolute value of $z$
$1$                                                   multiplicative identity
$1$ or $I$                                            identity matrix or operator
$1_X$                                                 identity function on $X$
$\mathbb{Q}^n, \mathbb{R}^n, \mathbb{C}^n$            spaces of column vectors
$\operatorname{diag}(a_1, \dots, a_n)$                diagonal matrix
$\cong$                                               is isomorphic to, is equivalent to


GUIDE FOR THE READER

This section is intended to help the reader ﬁnd out what parts of each chapter are most important and how the chapters are interrelated. Further information of this kind is contained in the abstracts that begin each of the chapters. The book pays attention to at least three recurring themes in algebra, allowing a person to see how these themes arise in increasingly sophisticated ways. These are the analogy between integers and polynomials in one indeterminate over a ﬁeld, the interplay between linear algebra and group theory, and the relationship between number theory and geometry. Keeping track of how these themes evolve will help the reader understand the mathematics better and anticipate where it is headed. In Chapter I the analogy between integers and polynomials in one indeterminate over the rationals, reals, or complex numbers appears already in the ﬁrst three sections. The main results of these sections are theorems about unique factorization in each of the two settings. The relevant parts of the underlying structures for the two settings are the same, and unique factorization can therefore be proved in both settings by the same argument. Many readers will already know this unique factorization, but it is worth examining the parallel structure and proof at least quickly before turning to the chapters that follow. Before proceeding very far into the book, it is worth looking also at the appendix to see whether all its topics are familiar. Readers will ﬁnd Section A1 useful at least for its summary of set-theoretic notation and for its emphasis on the distinction between range and image for a function. This distinction is usually unimportant in analysis but becomes increasingly important as one studies more advanced topics in algebra. Readers who have not speciﬁcally learned about equivalence relations and partial orderings can learn about them from Sections A2 and A5. 
Sections A3 and A4 concern the real and complex numbers; the emphasis is on notation and the Intermediate Value Theorem, which plays a role in proving the Fundamental Theorem of Algebra. Zorn’s Lemma and cardinality in Sections A5 and A6 are usually unnecessary in an undergraduate course. They arise most importantly in Sections II.9 and IX.4, which are normally omitted in an undergraduate course, and in Proposition 8.8, which is invoked only in the last few sections of Chapter VIII. The remainder of this section is an overview of individual chapters and pairs of chapters.

Chapter I is in three parts. The first part, as mentioned above, establishes unique factorization for the integers and for polynomials in one indeterminate over the rationals, reals, or complex numbers. The second part defines permutations and shows that they have signs such that the sign of any composition is the product of the signs; this result is essential for defining general determinants in Section II.7. The third part will likely be a review for all readers. It establishes notation for row reduction of matrices and for operations on matrices, and it uses row reduction to show that a one-sided inverse for a square matrix is a two-sided inverse.

Chapters II–III treat the fundamentals of linear algebra. Whereas the matrix computations in Chapter I were concrete, Chapters II–III are relatively abstract. Much of this material is likely to be a review for graduate students. The geometric interpretation of vector spaces, subspaces, and linear mappings is not included in the chapter, being taken as known previously. The fundamental idea that a newly constructed object might be characterized by a “universal mapping property” appears for the first time in Chapter II, and it appears more and more frequently throughout the book. One aspect of this idea is that it is sometimes not so important what certain constructed objects are, but what they do. A related idea being emphasized is that the mappings associated with a newly constructed object are likely to be as important as the object, if not more so; at the least, one needs to stop and find what those mappings are. Section II.9 uses Zorn’s Lemma and can be deferred until Chapter IX if one wants.

Chapter III discusses special features of real and complex vector spaces endowed with inner products. The main result is the Spectral Theorem in Section 3. Many of the problems at the end of the chapter make contact with real analysis. The subject of linear algebra continues in Chapter V.
Chapter IV is the primary chapter on group theory and may be viewed as in three parts. Sections 1–6 form the ﬁrst part, which is essential for all later chapters in the book. Sections 1–3 introduce groups and some associated constructions, along with a number of examples. Many of the examples will be seen to be related to speciﬁc or general vector spaces, and thus the theme of the interplay between group theory and linear algebra is appearing concretely for the ﬁrst time. In practice, many examples of groups arise in the context of group actions, and abstract group actions are deﬁned in Section 6. Of particular interest are group representations, which are group actions on a vector space by linear mappings. Sections 4–5 are a digression to deﬁne rings, ﬁelds, and ring homomorphisms, and to extend the theories concerning polynomials and vector spaces as presented in Chapters I–II. The immediate purpose of the digression is to make prime ﬁelds, their associated multiplicative groups, and the notion of characteristic available for the remainder of the chapter. The deﬁnition of vector space is extended to allow scalars from any ﬁeld. The deﬁnition of polynomial is extended to allow coefﬁcients from any commutative ring with identity, rather than just the


rationals or reals or complex numbers, and to allow more than one indeterminate. Universal mapping properties for polynomial rings are proved.

Sections 7–10 form the second part of the chapter and are a continuation of group theory. The main result is the Fundamental Theorem of Finitely Generated Abelian Groups, which is in Section 9.

Section 11 forms the third part of the chapter. This section is a gentle introduction to categories and functors, which are useful for working with parallel structures in different settings within algebra. As S. Mac Lane says in his book, “Category theory asks of every type of Mathematical object: ‘What are the morphisms?’; it suggests that these morphisms should be described at the same time as the objects. . . . This emphasis on (homo)morphisms is largely due to Emmy Noether, who emphasized the use of homomorphisms of groups and rings.” The simplest parallel structure reflected in categories is that of an isomorphism. The section also discusses general notions of product and coproduct functors. Examples of products are direct products in linear algebra and in group theory. Examples of coproducts are direct sums in linear algebra and in abelian group theory, as well as disjoint unions in set theory. The theory in this section helps in unifying the mathematics that is to come in Chapters VI–VIII and X. The subject of group theory is continued in Chapter VII, which assumes knowledge of the material on category theory.

Chapters V and VI continue the development of linear algebra. Chapter VI uses categories, but Chapter V does not. Most of Chapter V concerns the analysis of a linear transformation carrying a finite-dimensional vector space over a field into itself. The questions are to find invariants of such transformations and to classify the transformations up to similarity.
Section 2 at the start extends the theory of determinants so that the matrices are allowed to have entries in a commutative ring with identity; this extension is necessary in order to be able to work easily with characteristic polynomials. The extension of this theory is carried out by an important principle known as the “permanence of identities.” Chapter VI largely concerns bilinear forms and tensor products, again in the context that the coefﬁcients are from a ﬁeld. This material is necessary in many applications to geometry and physics, but it is not needed in Chapters VII–IX. Many objects in the chapter are constructed in such a way that they are uniquely determined by a universal mapping property. Problems 18–22 at the end of the chapter discuss universal mapping properties in the general context of category theory, and they show that a uniqueness theorem is automatic in all cases. Chapter VII continues the development of group theory, making use of category theory. It is in two parts. Sections 1–3 concern free groups and the topic of generators and relations; they are essential for abstract descriptions of groups and for work in topology involving fundamental groups. Section 3 constructs a notion of free product and shows that it is the coproduct functor for the category of groups. Sections 4–6 continue the theme of the interplay of group theory and


linear algebra. Section 4 analyzes group representations of a ﬁnite group when the underlying ﬁeld is the complex numbers, and Section 5 applies this theory to obtain a conclusion about the structure of ﬁnite groups. Section 6 studies extensions of groups and uses them to motivate the subject of cohomology of groups. Chapter VIII introduces modules, giving many examples in Section 1, and then goes on to discuss questions of unique factorization in integral domains. Section 6 obtains a generalization for principal ideal domains of the Fundamental Theorem of Finitely Generated Abelian Groups, once again illustrating the ﬁrst theme—similarities between the integers and certain polynomial rings. Section 7 introduces the third theme, the relationship between number theory and geometry, as a more sophisticated version of the ﬁrst theme. The section compares a certain polynomial ring in two variables with a certain ring of algebraic integers that extends the ordinary integers. Unique factorization of elements fails for both, but the geometric setting has a more geometrically meaningful factorization in terms of ideals that is evidently unique. This kind of unique factorization turns out to work for the ring of algebraic integers as well. Sections 8–11 expand the examples in Section 7 into a theory of unique factorization of ideals in any integrally closed Noetherian domain whose nonzero prime ideals are all maximal. Chapter IX analyzes algebraic extensions of ﬁelds. The ﬁrst 13 sections make use only of Sections 1–6 in Chapter VIII. Sections 1–5 of Chapter IX give the foundational theory, which is sufﬁcient to exhibit all the ﬁnite ﬁelds and to prove that certain classically proposed constructions in Euclidean geometry are impossible. Sections 6–8 introduce Galois theory, but Theorem 9.28 and its three corollaries may be skipped if Sections 14–17 are to be omitted. 
Sections 9–11 give a ﬁrst round of applications of Galois theory: Gauss’s theorem about which regular n-gons are in principle constructible with straightedge and compass, the Fundamental Theorem of Algebra, and the Abel–Galois theorem that solvability of a polynomial equation with rational coefﬁcients in terms of radicals implies solvability of the Galois group. Sections 12–13 give a second round of applications: Gauss’s method in principle for actually constructing the constructible regular n-gons and a converse to the Abel–Galois theorem. Sections 14–17 make use of Sections 7–11 of Chapter VIII, proving that π is transcendental and obtaining two methods for computing Galois groups. Chapter X is a relatively short chapter developing further tools for dealing with modules over a ring with identity. The main construction is that of the tensor product over a ring of a unital right module and a unital left module, the result being an abelian group. The chapter makes use of material from Chapters VI and VIII, but not from Chapter IX.

Basic Algebra

CHAPTER I Preliminaries about the Integers, Polynomials, and Matrices

Abstract. This chapter is mostly a review, discussing unique factorization of positive integers, unique factorization of polynomials whose coefﬁcients are rational or real or complex, signs of permutations, and matrix algebra. Sections 1–2 concern unique factorization of positive integers. Section 1 proves the division and Euclidean algorithms, used to compute greatest common divisors. Section 2 establishes unique factorization as a consequence and gives several number-theoretic consequences, including the Chinese Remainder Theorem and the evaluation of the Euler ϕ function. Section 3 develops unique factorization of rational and real and complex polynomials in one indeterminate completely analogously, and it derives the complete factorization of complex polynomials from the Fundamental Theorem of Algebra. The proof of the fundamental theorem is postponed to Chapter IX. Section 4 discusses permutations of a ﬁnite set, establishing the decomposition of each permutation as a disjoint product of cycles. The sign of a permutation is introduced, and it is proved that the sign of a product is the product of the signs. Sections 5–6 concern matrix algebra. Section 5 reviews row reduction and its role in the solution of simultaneous linear equations. Section 6 deﬁnes the arithmetic operations of addition, scalar multiplication, and multiplication of matrices. The process of matrix inversion is related to the method of row reduction, and it is shown that a square matrix with a one-sided inverse automatically has a two-sided inverse that is computable via row reduction.

1. Division and Euclidean Algorithms

The first three sections give a careful proof of unique factorization for integers and for polynomials with rational or real or complex coefficients, and they give an indication of some first consequences of this factorization. For the moment let us restrict attention to the set $\mathbb{Z}$ of integers. We take addition, subtraction, and multiplication within $\mathbb{Z}$ as established, as well as the properties of the usual ordering in $\mathbb{Z}$.

A factor of an integer $n$ is a nonzero integer $k$ such that $n = kl$ for some integer $l$. In this case we say also that $k$ divides $n$, that $k$ is a divisor of $n$, and that $n$ is a multiple of $k$. We write $k \mid n$ for this relationship. If $n$ is nonzero, any product formula $n = k l_1 \cdots l_r$ is a factorization of $n$. A unit in $\mathbb{Z}$ is a divisor of 1, hence is either $+1$ or $-1$. The factorization $n = kl$ of $n \neq 0$ is called nontrivial if neither $k$ nor $l$ is a unit. An integer $p > 1$ is said to be prime if it has no nontrivial factorization $p = kl$.

The statement of unique factorization for positive integers, which will be given precisely in Section 2, says roughly that each positive integer is the product of primes and that this decomposition is unique apart from the order of the factors.¹ Existence will follow by an easy induction. The difficulty is in the uniqueness. We shall prove uniqueness by a sequence of steps based on the “Euclidean algorithm,” which we discuss in a moment. In turn, the Euclidean algorithm relies on the following.

¹ It is to be understood that the prime factorization of 1 is as the empty product.

Proposition 1.1 (division algorithm). If $a$ and $b$ are integers with $b \neq 0$, then there exist unique integers $q$ and $r$ such that $a = bq + r$ and $0 \le r < |b|$.

PROOF. Possibly replacing $q$ by $-q$, we may assume that $b > 0$. The integers $n$ with $bn \le a$ are bounded above by $|a|$, and there exists such an $n$, namely $n = -|a|$. Therefore there is a largest such integer, say $n = q$. Set $r = a - bq$. Then $0 \le r$ and $a = bq + r$. If $r \ge b$, then $r - b \ge 0$ says that $a = b(q + 1) + (r - b) \ge b(q + 1)$. The inequality $q + 1 > q$ contradicts the maximality of $q$, and we conclude that $r < b$. This proves existence. For uniqueness when $b > 0$, suppose $a = bq_1 + r_1 = bq_2 + r_2$. Subtracting, we obtain $b(q_1 - q_2) = r_2 - r_1$ with $|r_2 - r_1| < b$, and this is a contradiction unless $r_2 - r_1 = 0$.

Let $a$ and $b$ be integers not both 0. The greatest common divisor of $a$ and $b$ is the largest integer $d > 0$ such that $d \mid a$ and $d \mid b$. Let us see existence. The integer 1 divides $a$ and $b$. If $b$, for example, is nonzero, then any such $d$ has $|d| \le |b|$, and hence the greatest common divisor indeed exists. We write $d = \mathrm{GCD}(a, b)$.

Let us suppose that $b \neq 0$. The Euclidean algorithm consists of iterated application of the division algorithm (Proposition 1.1) to $a$ and $b$ until the remainder term $r$ disappears:
\[
\begin{aligned}
a &= b q_1 + r_1, & 0 &\le r_1 < b,\\
b &= r_1 q_2 + r_2, & 0 &\le r_2 < r_1,\\
r_1 &= r_2 q_3 + r_3, & 0 &\le r_3 < r_2,\\
&\;\;\vdots\\
r_{n-2} &= r_{n-1} q_n + r_n, & 0 &\le r_n < r_{n-1} \quad (\text{with } r_n \neq 0, \text{ say}),\\
r_{n-1} &= r_n q_{n+1}.
\end{aligned}
\]

The process must stop with some remainder term $r_{n+1}$ equal to 0 in this way since $b > r_1 > r_2 > \cdots \ge 0$. The last nonzero remainder term, namely $r_n$ above, will be of interest to us.

EXAMPLE. For $a = 13$ and $b = 5$, the steps read
\[
\begin{aligned}
13 &= 5 \cdot 2 + 3,\\
5 &= 3 \cdot 1 + 2,\\
3 &= 2 \cdot 1 + \boxed{1},\\
2 &= 1 \cdot 2.
\end{aligned}
\]
The last nonzero remainder term is written with a box around it.

Proposition 1.2. Let $a$ and $b$ be integers with $b \neq 0$, and let $d = \mathrm{GCD}(a, b)$. Then
(a) the number $r_n$ in the Euclidean algorithm is exactly $d$,
(b) any divisor $d'$ of both $a$ and $b$ necessarily divides $d$,
(c) there exist integers $x$ and $y$ such that $ax + by = d$.

EXAMPLE, CONTINUED. We rewrite the steps of the Euclidean algorithm, as applied in the above example with $a = 13$ and $b = 5$, so as to yield successive substitutions:
\[
\begin{aligned}
13 &= 5 \cdot 2 + 3, & 3 &= 13 - 5 \cdot 2,\\
5 &= 3 \cdot 1 + 2, & 2 &= 5 - 3 \cdot 1 = 5 - (13 - 5 \cdot 2) \cdot 1 = 5 \cdot 3 - 13 \cdot 1,\\
3 &= 2 \cdot 1 + \boxed{1}, & 1 &= 3 - 2 \cdot 1 = (13 - 5 \cdot 2) - (5 \cdot 3 - 13 \cdot 1) \cdot 1 = 13 \cdot 2 - 5 \cdot 5.
\end{aligned}
\]
Thus we see that $1 = 13x + 5y$ with $x = 2$ and $y = -5$. This shows for the example that the number $r_n$ works in place of $d$ in Proposition 1.2c, and the rest of the proof of the proposition for this example is quite easy. Let us now adjust this computation to obtain a complete proof of the proposition in general.

PROOF OF PROPOSITION 1.2. Put $r_0 = b$ and $r_{-1} = a$, so that
\[
r_{k-2} = r_{k-1} q_k + r_k \qquad \text{for } 1 \le k \le n. \tag{$*$}
\]

The argument proceeds in three steps.

Step 1. We show that $r_n$ is a divisor of both $a$ and $b$. In fact, from $r_{n-1} = r_n q_{n+1}$, we have $r_n \mid r_{n-1}$. Let $k \le n$, and assume inductively that $r_n$ divides $r_{k-1}, \dots, r_{n-1}, r_n$. Then $(*)$ shows that $r_n$ divides $r_{k-2}$. Induction allows us to conclude that $r_n$ divides $r_{-1}, r_0, \dots, r_{n-1}$. In particular, $r_n$ divides $a$ and $b$.

Step 2. We prove that $ax + by = r_n$ for suitable integers $x$ and $y$. In fact, we show by induction on $k$ for $k \le n$ that there exist integers $x$ and $y$ with $ax + by = r_k$. For $k = -1$ and $k = 0$, this conclusion is trivial. If $k \ge 1$ is given and if the result is known for $k - 2$ and $k - 1$, then we have
\[
\begin{aligned}
a x_2 + b y_2 &= r_{k-2},\\
a x_1 + b y_1 &= r_{k-1}
\end{aligned}
\tag{$**$}
\]
for suitable integers $x_2, y_2, x_1, y_1$. We multiply the second of the equalities of $(**)$ by $q_k$, subtract, and substitute into $(*)$. The result is
\[
r_k = r_{k-2} - r_{k-1} q_k = a(x_2 - q_k x_1) + b(y_2 - q_k y_1),
\]
and the induction is complete. Thus $ax + by = r_n$ for suitable $x$ and $y$.

Step 3. Finally we deduce (a), (b), and (c). Step 1 shows that $r_n$ divides $a$ and $b$. If $d' > 0$ divides both $a$ and $b$, the result of Step 2 shows that $d' \mid r_n$. Thus $d' \le r_n$, and $r_n$ is the greatest common divisor. This is the conclusion of (a); (b) follows from (a) since $d' \mid r_n$, and (c) follows from (a) and Step 2.

Corollary 1.3. Within $\mathbb{Z}$, if $c$ is a nonzero integer that divides a product $mn$ and if $\mathrm{GCD}(c, m) = 1$, then $c$ divides $n$.

PROOF. Proposition 1.2c produces integers $x$ and $y$ with $cx + my = 1$. Multiplying by $n$, we obtain $cnx + mny = n$. Since $c$ divides $mn$ and divides itself, $c$ divides both terms on the left side. Therefore it divides the right side, which is $n$.

Corollary 1.4. Within $\mathbb{Z}$, if $a$ and $b$ are nonzero integers with $\mathrm{GCD}(a, b) = 1$ and if both of them divide the integer $m$, then $ab$ divides $m$.

PROOF. Proposition 1.2c produces integers $x$ and $y$ with $ax + by = 1$. Multiplying by $m$, we obtain $amx + bmy = m$, which we rewrite in integers as $ab(m/b)x + ab(m/a)y = m$. Since $ab$ divides each term on the left side, it divides the right side, which is $m$.
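As an informal computational aside, the division algorithm of Proposition 1.1 and the inductive bookkeeping in Step 2 of the proof of Proposition 1.2 can be sketched in Python. The function names `division` and `extended_gcd` are illustrative choices and do not come from the text; the sketch also normalizes the sign at the end so that a negative $b$ is handled, which is an added convenience rather than part of the argument above.

```python
def division(a: int, b: int) -> tuple[int, int]:
    """Return (q, r) with a == b*q + r and 0 <= r < |b| (Proposition 1.1)."""
    if b == 0:
        raise ValueError("b must be nonzero")
    # For a positive divisor, Python's divmod already gives 0 <= r < |b|.
    q, r = divmod(a, abs(b))
    # If b is negative, replace q by -q, as at the start of the proof.
    return (-q if b < 0 else q), r


def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, x, y) with d = GCD(a, b) and a*x + b*y == d, for b != 0."""
    if b == 0:
        raise ValueError("b must be nonzero")
    # Invariants (Step 2 of the proof):
    #   prev_r == a*prev_x + b*prev_y   and   r == a*x + b*y.
    prev_r, r = a, b          # r_{-1} = a, r_0 = b
    prev_x, x = 1, 0
    prev_y, y = 0, 1
    while r != 0:
        q, rem = division(prev_r, r)       # r_{k-2} = r_{k-1} q_k + r_k
        prev_r, r = r, rem
        prev_x, x = x, prev_x - q * x      # x_2 - q_k x_1
        prev_y, y = y, prev_y - q * y      # y_2 - q_k y_1
    # prev_r is the last nonzero remainder; make it positive (added convenience).
    if prev_r < 0:
        prev_r, prev_x, prev_y = -prev_r, -prev_x, -prev_y
    return prev_r, prev_x, prev_y


if __name__ == "__main__":
    d, x, y = extended_gcd(13, 5)
    print(d, x, y)  # the example in the text: 1 = 13*2 + 5*(-5)
```

For $a = 13$ and $b = 5$ the loop reproduces the substitutions of the continued example and ends with $x = 2$ and $y = -5$.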

2. Unique Factorization of Integers

We come now to the theorem asserting unique factorization for the integers. The precise statement is as follows.

Theorem 1.5 (Fundamental Theorem of Arithmetic). Each positive integer $n$ can be written as a product of primes, $n = p_1 p_2 \cdots p_r$, with the integer 1 being written as an empty product. This factorization is unique in the following sense: if $n = q_1 q_2 \cdots q_s$ is another such factorization, then $r = s$ and, after some reordering of the factors, $q_j = p_j$ for $1 \le j \le r$.

The main step is the following lemma, which relies on Corollary 1.3.

Lemma 1.6. Within $\mathbb{Z}$, if $p$ is a prime and $p$ divides a product $ab$, then $p$ divides $a$ or $p$ divides $b$.

PROOF. Suppose that $p$ does not divide $a$. Since $p$ is prime, $\mathrm{GCD}(a, p) = 1$. Taking $m = a$, $n = b$, and $c = p$ in Corollary 1.3, we see that $p$ divides $b$.

PROOF OF EXISTENCE IN THEOREM 1.5. We induct on $n$, the case $n = 1$ being handled by an empty product expansion. If the result holds for $k = 1$ through $k = n - 1$, there are two cases: $n$ is prime and $n$ is not prime. If $n$ is prime, then $n = n$ is the desired factorization. Otherwise we can write $n = ab$ nontrivially with $a > 1$ and $b > 1$. Then $a \le n - 1$ and $b \le n - 1$, so that $a$ and $b$ have factorizations into primes by the inductive hypothesis. Putting them together yields a factorization into primes for $n = ab$.

PROOF OF UNIQUENESS IN THEOREM 1.5. Suppose that $n = p_1 p_2 \cdots p_r = q_1 q_2 \cdots q_s$ with all factors prime and with $r \le s$. We prove the uniqueness by induction on $r$, the case $r = 0$ being trivial and the case $r = 1$ following from the definition of “prime.” Inductively from Lemma 1.6 we have $p_r \mid q_k$ for some $k$. Since $q_k$ is prime, $p_r = q_k$. Thus we can cancel and obtain $p_1 p_2 \cdots p_{r-1} = q_1 q_2 \cdots \widehat{q_k} \cdots q_s$, the hat indicating an omitted factor. By induction the factors on the two sides here are the same except for order. Thus the same conclusion is valid when comparing the two sides of the equality $p_1 p_2 \cdots p_r = q_1 q_2 \cdots q_s$. The induction is complete, and the desired uniqueness follows.
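The existence half of Theorem 1.5 can be illustrated by a short Python sketch. It uses trial division, which is a swapped-in elementary technique and not the inductive argument of the proof; the function name `prime_factorization` and the choice of a dictionary of exponents are assumptions made for the illustration.

```python
def prime_factorization(n: int) -> dict[int, int]:
    """Return {p: k} such that n is the product of the powers p**k.

    The integer 1 gives the empty dictionary, matching the convention
    that 1 is written as an empty product.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        # Divide out p as often as possible; only primes can divide here,
        # because every smaller prime factor has already been removed.
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # Whatever remains has no divisor d with 1 < d*d <= n, so it is prime.
        factors[n] = factors.get(n, 0) + 1
    return factors


if __name__ == "__main__":
    print(prime_factorization(360))
```

By the uniqueness half of the theorem, any other correct procedure must return the same primes with the same exponents.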
In the product expansion of Theorem 1.5, it is customary to group factors that are equal, thus writing the positive integer $n$ as $n = p_1^{k_1} \cdots p_r^{k_r}$ with the primes $p_j$ distinct and with the integers $k_j$ all $\ge 0$. This kind of decomposition is unique up to order if all factors $p_j^{k_j}$ with $k_j = 0$ are dropped, and we call it a prime factorization of $n$.

Corollary 1.7. If $n = p_1^{k_1} \cdots p_r^{k_r}$ is a prime factorization of a positive integer $n$, then the positive divisors $d$ of $n$ are exactly all products $d = p_1^{l_1} \cdots p_r^{l_r}$ with $0 \le l_j \le k_j$ for all $j$.

REMARK. A general divisor of $n$ within $\mathbb{Z}$ is the product of a unit $\pm 1$ and a positive divisor.

PROOF. Certainly any such product divides $n$. Conversely if $d$ divides $n$, write $n = dx$ for some positive integer $x$. Apply Theorem 1.5 to $d$ and to $x$, form the resulting prime factorizations, and multiply them together. Then we see from the uniqueness for the prime factorization of $n$ that the only primes that can occur in the expansions of $d$ and $x$ are $p_1, \dots, p_r$ and that the sum of the exponents of $p_j$ in the expansions of $d$ and $x$ is $k_j$. The result follows.

If we want to compare prime factorizations for two positive integers, we can insert 0th powers of primes as necessary and thereby assume that the same primes appear in both expansions. Using this device, we obtain a formula for greatest common divisors.

Corollary 1.8. If two positive integers $a$ and $b$ have expansions as products of powers of $r$ distinct primes given by $a = p_1^{k_1} \cdots p_r^{k_r}$ and $b = p_1^{l_1} \cdots p_r^{l_r}$, then
\[
\mathrm{GCD}(a, b) = p_1^{\min(k_1, l_1)} \cdots p_r^{\min(k_r, l_r)}.
\]

PROOF. Let $d'$ be the right side of the displayed equation. It is plain that $d'$ is positive and that $d'$ divides $a$ and $b$. On the other hand, two applications of Corollary 1.7 show that the greatest common divisor of $a$ and $b$ is a number $d$ of the form $p_1^{m_1} \cdots p_r^{m_r}$ with the property that $m_j \le k_j$ and $m_j \le l_j$ for all $j$. Therefore $m_j \le \min(k_j, l_j)$ for all $j$, and $d \le d'$. Since any positive divisor of both $a$ and $b$ is $\le d$, we have $d' \le d$. Thus $d' = d$.

In special cases Corollary 1.8 provides a useful way to compute $\mathrm{GCD}(a, b)$, but the Euclidean algorithm is usually a more efficient procedure. Nevertheless, Corollary 1.8 remains a handy tool for theoretical purposes. Here is an example: Two nonzero integers $a$ and $b$ are said to be relatively prime if $\mathrm{GCD}(a, b) = 1$. It is immediate from Corollary 1.8 that two nonzero integers $a$ and $b$ are relatively prime if and only if there is no prime $p$ that divides both $a$ and $b$.

Corollary 1.9 (Chinese Remainder Theorem). Let $a$ and $b$ be positive relatively prime integers. To each pair $(r, s)$ of integers with $0 \le r < a$ and $0 \le s < b$ corresponds a unique integer $n$ such that $0 \le n < ab$, $a$ divides $n - r$, and $b$ divides $n - s$. Moreover, every integer $n$ with $0 \le n < ab$ arises from some such pair $(r, s)$.

REMARK. In notation for congruences that we introduce formally in Chapter IV, the result says that if $\mathrm{GCD}(a, b) = 1$, then the congruences $n \equiv r \bmod a$ and $n \equiv s \bmod b$ have one and only one simultaneous solution $n$ with $0 \le n < ab$.
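The correspondence $(r, s) \mapsto n$ in Corollary 1.9 can be made concrete with a small Python sketch. The function name `crt_pair` is an illustrative assumption, and the sketch relies on Python's built-in `pow(a, -1, b)` (available from Python 3.8) to obtain an integer $x$ with $ax \equiv 1 \bmod b$, which plays the role of the integers supplied by Proposition 1.2c.

```python
import math


def crt_pair(r: int, s: int, a: int, b: int) -> int:
    """Return the n with 0 <= n < a*b, a | (n - r), and b | (n - s).

    Assumes a and b are positive and relatively prime, with
    0 <= r < a and 0 <= s < b, as in Corollary 1.9.
    """
    if a <= 0 or b <= 0 or math.gcd(a, b) != 1:
        raise ValueError("a and b must be positive and relatively prime")
    # Choose x with a*x congruent to s - r modulo b, so that
    # a*x + r is congruent to r modulo a and to s modulo b.
    x = ((s - r) * pow(a, -1, b)) % b
    return (a * x + r) % (a * b)


if __name__ == "__main__":
    # n = 8 satisfies 8 = 3*2 + 2 and 8 = 5*1 + 3.
    print(crt_pair(2, 3, 3, 5))
    # Every n with 0 <= n < 15 arises from exactly one pair (r, s).
    values = sorted(crt_pair(r, s, 3, 5) for r in range(3) for s in range(5))
    print(values == list(range(15)))
```

The final check illustrates the last sentence of the corollary for $a = 3$ and $b = 5$: the 15 pairs produce the 15 integers from 0 to 14 without repetition.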

PROOF. Let us see that $n$ exists as asserted. Since $a$ and $b$ are relatively prime, Proposition 1.2c produces integers $x'$ and $y'$ such that $ax' - by' = 1$. Multiplying by $s - r$, we obtain $ax - by = s - r$ for suitable integers $x$ and $y$. Put $n' = ax + r = by + s$, and write by the division algorithm (Proposition 1.1) $n' = abq + n$ for some integer $q$ and for some integer $n$ with $0 \le n < ab$. Then $n - r = n' - abq - r = ax - abq$ is divisible by $a$, and similarly $n - s$ is divisible by $b$.

Suppose that $n$ and $n'$ both have the asserted properties. Then $a$ divides $n - n' = (n - r) - (n' - r)$, and $b$ divides $n - n' = (n - s) - (n' - s)$. Since $a$ and $b$ are relatively prime, Corollary 1.4 shows that $ab$ divides $n - n'$. But $|n - n'| < ab$, and the only integer $N$ with $|N| < ab$ that is divisible by $ab$ is $N = 0$. Thus $n - n' = 0$ and $n = n'$. This proves uniqueness.

Finally the argument just given defines a one-one function from a set of $ab$ pairs $(r, s)$ to a set of $ab$ elements $n$. Its image must therefore be all such integers $n$. This proves the corollary.

If $n$ is a positive integer, we define $\varphi(n)$ to be the number of integers $k$ with $0 \le k < n$ such that $k$ and $n$ are relatively prime. The function $\varphi$ is called the Euler $\varphi$ function.

Corollary 1.10. Let $N > 1$ be an integer, and let $N = p_1^{k_1} \cdots p_r^{k_r}$ be a prime factorization of $N$. Then
\[
\varphi(N) = \prod_{j=1}^{r} p_j^{k_j - 1} (p_j - 1).
\]

REMARK. The conclusion is valid also for $N = 1$ if we interpret the right side of the formula to be the empty product.

PROOF. For positive integers $a$ and $b$, let us check that
\[
\varphi(ab) = \varphi(a)\varphi(b) \qquad \text{if } \mathrm{GCD}(a, b) = 1. \tag{$*$}
\]
In view of Corollary 1.9, it is enough to prove that the mapping $(r, s) \mapsto n$ given in that corollary has the property that $\mathrm{GCD}(r, a) = \mathrm{GCD}(s, b) = 1$ if and only if $\mathrm{GCD}(n, ab) = 1$.

To see this property, suppose that $n$ satisfies $0 \le n < ab$ and $\mathrm{GCD}(n, ab) > 1$. Choose a prime $p$ dividing both $n$ and $ab$. By Lemma 1.6, $p$ divides $a$ or $p$ divides $b$. By symmetry we may assume that $p$ divides $a$. If $(r, s)$ is the pair corresponding to $n$ under Corollary 1.9, then the corollary says that $a$ divides $n - r$. Since $p$ divides $a$, $p$ divides $n - r$. Since $p$ divides $n$, $p$ divides $r$. Thus $\mathrm{GCD}(r, a) > 1$.

Conversely suppose that $(r, s)$ is a pair with $0 \le r < a$ and $0 \le s < b$ such that $\mathrm{GCD}(r, a) = \mathrm{GCD}(s, b) = 1$ is false. Without loss of generality, we may assume that $\mathrm{GCD}(r, a) > 1$. Choose a prime $p$ dividing both $r$ and $a$. If $n$ is the integer with $0 \le n < ab$ that corresponds to $(r, s)$ under Corollary 1.9, then the corollary says that $a$ divides $n - r$. Since $p$ divides $a$, $p$ divides $n - r$. Since $p$ divides $r$, $p$ divides $n$. Thus $\mathrm{GCD}(n, ab) > 1$. This completes the proof of $(*)$.

For a power $p^k$ of a prime $p$ with $k > 0$, the integers $n$ with $0 \le n < p^k$ such that $\mathrm{GCD}(n, p^k) > 1$ are the multiples of $p$, namely $0, p, 2p, \dots, p^k - p$. There are $p^{k-1}$ of them. Thus the number of integers $n$ with $0 \le n < p^k$ such that $\mathrm{GCD}(n, p^k) = 1$ is $p^k - p^{k-1} = p^{k-1}(p - 1)$. In other words,
\[
\varphi(p^k) = p^{k-1}(p - 1) \qquad \text{if } p \text{ is prime and } k \ge 1. \tag{$**$}
\]

To prove the corollary, we induct on r, the case r = 1 being handled by (∗∗). If the formula of the corollary is valid for r − 1, then (∗) allows us to combine that result with the formula for $\varphi(p_r^{k_r})$ given in (∗∗) to obtain the formula for ϕ(N).

We conclude this section by extending the notion of greatest common divisor to apply to more than two integers. If $a_1, \dots, a_t$ are integers not all 0, their greatest common divisor is the largest integer d > 0 that divides all of $a_1, \dots, a_t$. This exists, and we write $d = \mathrm{GCD}(a_1, \dots, a_t)$ for it. It is immediate that d equals the greatest common divisor of the nonzero members of the set $\{a_1, \dots, a_t\}$. Thus, in deriving properties of greatest common divisors, we may assume that all the integers are nonzero.

Corollary 1.11. Let $a_1, \dots, a_t$ be positive integers, and let d be their greatest common divisor. Then
(a) if for each j with 1 ≤ j ≤ t, $a_j = p_1^{k_{1,j}} \cdots p_r^{k_{r,j}}$ is an expansion of $a_j$ as a product of powers of r distinct primes $p_1, \dots, p_r$, it follows that
$$d = p_1^{\min_{1\le j\le t}\{k_{1,j}\}} \cdots p_r^{\min_{1\le j\le t}\{k_{r,j}\}},$$
(b) any divisor d′ of all of $a_1, \dots, a_t$ necessarily divides d,
(c) $d = \mathrm{GCD}\big(\mathrm{GCD}(a_1, \dots, a_{t-1}), a_t\big)$ if t > 1,
(d) there exist integers $x_1, \dots, x_t$ such that $a_1x_1 + \cdots + a_tx_t = d$.

PROOF. Part (a) is proved in the same way as Corollary 1.8 except that Corollary 1.7 is to be applied r times rather than just twice. Further application of Corollary 1.7 shows that any positive divisor d′ of $a_1, \dots, a_t$ is of the form $d' = p_1^{m_1} \cdots p_r^{m_r}$ with $m_1 \le k_{1,j}$ for all j, . . . , and with $m_r \le k_{r,j}$ for all j. Therefore $m_1 \le \min_{1\le j\le t}\{k_{1,j}\}$, . . . , and $m_r \le \min_{1\le j\le t}\{k_{r,j}\}$, and it follows that d′ divides d. This proves (b). Conclusion (c) follows by using the formula in (a), and (d) follows by combining (c), Proposition 1.2c, and induction.
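Parts (c) and (d) of Corollary 1.11 suggest a direct computation: fold the two-integer Euclidean algorithm over the list and carry the coefficients along. The following Python sketch is one possible way to do this. The function names `extended_gcd` and `gcd_with_coefficients` are illustrative choices rather than notation from the text, and the input is assumed to be a nonempty list of positive integers.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d = GCD(a, b) >= 0 and a*x + b*y == d."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:  # normalize so that the divisor is nonnegative
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def gcd_with_coefficients(numbers):
    """Return (d, [x_1, ..., x_t]) with sum(a_i * x_i) == d.

    Uses d = GCD(GCD(a_1, ..., a_{t-1}), a_t) as in part (c) and
    updates the coefficients at each step as in part (d).
    """
    d, coeffs = numbers[0], [1]
    for a in numbers[1:]:
        d, x, y = extended_gcd(d, a)
        # The old d was sum(c_i * a_i), so the new d is sum(c_i * x * a_i) + y * a.
        coeffs = [c * x for c in coeffs] + [y]
    return d, coeffs
```

For instance, `gcd_with_coefficients([12, 18, 30])` returns the divisor 6 together with one choice of coefficients; the coefficients are not unique, so only the identity $a_1x_1 + \cdots + a_tx_t = d$ should be relied on.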


3. Unique Factorization of Polynomials

This section establishes unique factorization for ordinary rational, real, and complex polynomials. We write Q for the set of rational numbers, R for the set of real numbers, and C for the set of complex numbers, each with its arithmetic operations. The rational numbers are constructed from the integers by a process reviewed in Section A3 of the appendix, the real numbers are defined from the rational numbers by a process reviewed in that same section, and the complex numbers are defined from the real numbers by a process reviewed in Section A4 of the appendix. Sections A3 and A4 of the appendix mention special properties of R and C beyond those of the arithmetic operations, but we shall not make serious use of these special properties here until nearly the end of the section, after unique factorization of polynomials has been established.

Let F denote any of Q, R, or C. The members of F are called scalars. We work with ordinary polynomials with coefficients in F. Informally these are expressions $P(X) = a_nX^n + \cdots + a_1X + a_0$ with $a_n, \dots, a_1, a_0$ in F. Although it is tempting to think of P(X) as a function with independent variable X, it is better to identify P with the sequence $(a_0, a_1, \dots, a_n, 0, 0, \dots)$ of coefficients, using expressions $P(X) = a_nX^n + \cdots + a_1X + a_0$ only for conciseness and for motivation of the definitions of various operations.

The precise definition therefore is that a polynomial in one indeterminate with coefficients in F is an infinite sequence of members of F such that all terms of the sequence are 0 from some point on. The indexing of the sequence is to begin with 0. We may refer to a polynomial P as P(X) if we want to emphasize that the indeterminate is called X. Addition, subtraction, and scalar multiplication are defined in coordinate-by-coordinate fashion:

$$\begin{aligned}
(a_0, a_1, \dots, a_n, 0, 0, \dots) + (b_0, b_1, \dots, b_n, 0, 0, \dots) &= (a_0 + b_0, a_1 + b_1, \dots, a_n + b_n, 0, 0, \dots),\\
(a_0, a_1, \dots, a_n, 0, 0, \dots) - (b_0, b_1, \dots, b_n, 0, 0, \dots) &= (a_0 - b_0, a_1 - b_1, \dots, a_n - b_n, 0, 0, \dots),\\
c\,(a_0, a_1, \dots, a_n, 0, 0, \dots) &= (ca_0, ca_1, \dots, ca_n, 0, 0, \dots).
\end{aligned}$$

Polynomial multiplication is defined so as to match multiplication of expressions $a_nX^n + \cdots + a_1X + a_0$ if the product is expanded out, powers of X are added, and then terms containing like powers of X are collected:

$$(a_0, a_1, \dots, 0, 0, \dots)(b_0, b_1, \dots, 0, 0, \dots) = (c_0, c_1, \dots, 0, 0, \dots), \qquad \text{where } c_N = \sum_{k=0}^{N} a_k b_{N-k}.$$

We take it as known that the usual associative, commutative, and distributive laws are then valid. The set of all polynomials in the indeterminate X is denoted by F[X].
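The coordinate-by-coordinate definitions above translate almost literally into code. The following Python sketch represents a polynomial by the finite list `[a_0, a_1, ..., a_n]` of its coefficients, with the zero polynomial as the empty list; the names `trim`, `poly_add`, and `poly_mul` are illustrative and not taken from the text.

```python
def trim(coeffs):
    """Drop trailing zeros, so the list ends at the last nonzero coefficient."""
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs


def poly_add(p, q):
    """Coordinate-by-coordinate sum of two coefficient lists."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim(a + b for a, b in zip(p, q))


def poly_mul(p, q):
    """Product with c_N = sum over k of a_k * b_(N-k)."""
    if not p or not q:
        return []
    c = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b
    return trim(c)
```

As a usage example, `poly_mul([1, 1], [-1, 1])` multiplies 1 + X by −1 + X and yields `[-1, 0, 1]`, the coefficient list of X² − 1.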


The polynomial with all entries 0 is denoted by 0 and is called the zero polynomial. For all polynomials $P = (a_0, \dots, a_n, 0, \dots)$ other than 0, the degree of P, denoted by deg P, is defined to be the largest index n such that $a_n \ne 0$. The constant polynomials are by definition the zero polynomial and the polynomials of degree 0. If P and Q are nonzero polynomials, then

$$\begin{aligned}
&P + Q = 0 \quad\text{or}\quad \deg(P + Q) \le \max(\deg P, \deg Q),\\
&\deg(cP) = \deg P \quad\text{if } c \ne 0,\\
&\deg(PQ) = \deg P + \deg Q.
\end{aligned}$$

In the formula for deg(P + Q), equality holds if deg P ≠ deg Q. Implicit in the formula for deg(PQ) is the fact that PQ cannot be 0 unless P = 0 or Q = 0. A cancellation law for multiplication is an immediate consequence:

$$PR = QR \text{ with } R \ne 0 \quad\text{implies}\quad P = Q.$$

In fact, PR = QR implies (P − Q)R = 0; since R ≠ 0, P − Q must be 0.

If $P = (a_0, \dots, a_n, 0, \dots)$ is a polynomial and r is in F, we can evaluate P at r, obtaining as a result the number $P(r) = a_nr^n + \cdots + a_1r + a_0$. Taking into account all values of r, we obtain a mapping $P \mapsto P(\,\cdot\,)$ of F[X] into the set of functions from F into F. Because of the way that the arithmetic operations on polynomials have been defined, we have

$$\begin{aligned}
(P + Q)(r) &= P(r) + Q(r),\\
(P - Q)(r) &= P(r) - Q(r),\\
(cP)(r) &= c\,P(r),\\
(PQ)(r) &= P(r)\,Q(r).
\end{aligned}$$

In other words, the mapping $P \mapsto P(\,\cdot\,)$ respects the arithmetic operations. We say that r is a root of P if P(r) = 0.

Now we turn to the question of unique factorization. The definitions and the proof are completely analogous to those for the integers. A factor of a polynomial A is a nonzero polynomial B such that A = BQ for some polynomial Q. In this case we say also that B divides A, that B is a divisor of A, and that A is a multiple of B. We write B | A for this relationship. If A is nonzero, any product formula $A = BQ_1 \cdots Q_r$ is a factorization of A. A unit in F[X] is a divisor of 1, hence is any polynomial of degree 0; such a polynomial is a constant polynomial A(X) = c with c equal to a nonzero scalar. The factorization A = BQ of A ≠ 0 is called nontrivial if neither B nor Q is a unit. A prime P in F[X] is a nonzero polynomial that is not a unit and has no nontrivial factorization P = BQ. Observe that the product of a prime and a unit is always a prime.


Proposition 1.12 (division algorithm). If A and B are polynomials in F[X] and if B is not the 0 polynomial, then there exist unique polynomials Q and R in F[X] such that
(a) A = BQ + R and
(b) either R is the 0 polynomial or deg R < deg B.

REMARK. This result codifies the usual method of dividing polynomials in high-school algebra. That method writes A/B = Q + R/B, and then one obtains the above result by multiplying by B. The polynomial Q is the quotient in the division, and R is the remainder.

PROOF OF UNIQUENESS. If $A = BQ + R = BQ_1 + R_1$, then $B(Q - Q_1) = R_1 - R$. Without loss of generality, $R_1 - R$ is not the 0 polynomial since otherwise $Q - Q_1 = 0$ also. Then
$$\deg B + \deg(Q - Q_1) = \deg(R_1 - R) \le \max(\deg R, \deg R_1) < \deg B,$$
and we have a contradiction.

PROOF OF EXISTENCE. If A = 0 or deg A < deg B, we take Q = 0 and R = A, and we are done. Otherwise we induct on deg A. Assume the result for degree ≤ n − 1, and let deg A = n. Write $A = a_nX^n + A_1$ with $A_1 = 0$ or $\deg A_1 < \deg A$. Let $B = b_kX^k + B_1$ with $B_1 = 0$ or $\deg B_1 < \deg B$. Put $Q_1 = a_nb_k^{-1}X^{n-k}$. Then
$$A - BQ_1 = a_nX^n + A_1 - a_nX^n - a_nb_k^{-1}X^{n-k}B_1 = A_1 - a_nb_k^{-1}X^{n-k}B_1$$
with the right side equal to 0 or of degree < deg A. Then the right side, by induction, is of the form $BQ_2 + R$, and $A = B(Q_1 + Q_2) + R$ is the required decomposition.

Corollary 1.13 (Factor Theorem). If r is in F and if P is a polynomial in F[X], then X − r divides P if and only if P(r) = 0.

PROOF. If P = (X − r)Q, then P(r) = (r − r)Q(r) = 0. Conversely let P(r) = 0. Taking B(X) = X − r in the division algorithm (Proposition 1.12), we obtain P = (X − r)Q + R with R = 0 or deg R < deg(X − r) = 1. Thus R is a constant polynomial, possibly 0. In any case we have 0 = P(r) = (r − r)Q(r) + R(r), and thus R(r) = 0. Since R is constant, we must have R = 0, and then P = (X − r)Q.

Corollary 1.14. If P is a nonzero polynomial with coefficients in F and if deg P = n, then P has at most n distinct roots.


REMARKS. Since there are infinitely many scalars in any of Q and R and C, the corollary implies that the function from F to F associated to P, namely $r \mapsto P(r)$, cannot be identically 0 if P ≠ 0. Starting in Chapter IV, we shall allow other F's besides Q and R and C, and then this implication can fail. For example, when F is the two-element "field" F = {0, 1} with 1 + 1 = 0 and with otherwise the expected addition and multiplication, then $P(X) = X^2 + X$ is not the zero polynomial but P(r) = 0 for r = 0 and r = 1. It is thus important to distinguish polynomials in one indeterminate from their associated functions of one variable.

PROOF. Suppose, on the contrary, that $r_1, \dots, r_{n+1}$ are distinct roots of P(X). By the Factor Theorem (Corollary 1.13), $X - r_1$ is a factor of P(X). We prove inductively on k that the product $(X - r_1)(X - r_2) \cdots (X - r_k)$ is a factor of P(X). Assume that this assertion holds for k, so that $P(X) = (X - r_1) \cdots (X - r_k)Q(X)$ and
$$0 = P(r_{k+1}) = (r_{k+1} - r_1) \cdots (r_{k+1} - r_k)\,Q(r_{k+1}).$$
Since the $r_j$'s are distinct, we must have $Q(r_{k+1}) = 0$. By the Factor Theorem, we can write $Q(X) = (X - r_{k+1})R(X)$ for some polynomial R(X). Substitution gives $P(X) = (X - r_1) \cdots (X - r_k)(X - r_{k+1})R(X)$, and $(X - r_1) \cdots (X - r_{k+1})$ is exhibited as a factor of P(X). This completes the induction. Consequently
$$P(X) = (X - r_1) \cdots (X - r_{n+1})S(X)$$
for some polynomial S(X). Comparing the degrees of the two sides, we find that deg S = −1, and we have a contradiction.

We can use the division algorithm in the same way as with the integers in Sections 1–2 to obtain unique factorization. Within the set of integers, we defined greatest common divisors so as to be positive, but their negatives would have worked equally well. That flexibility persists with polynomials; the essential feature of any greatest common divisor of polynomials is shared by any product of that polynomial by a unit. A greatest common divisor of polynomials A and B with B ≠ 0 is any polynomial D of maximum degree such that D divides A and D divides B. We shall see that D is indeed unique up to multiplication by a nonzero scalar.²

² For some purposes it is helpful to isolate one particular greatest common divisor by taking the coefficient of the highest power of X to be 1.


The Euclidean algorithm is the iterative process that makes use of the division algorithm in the form

$$\begin{aligned}
A &= BQ_1 + R_1, &\qquad R_1 &= 0 \text{ or } \deg R_1 < \deg B,\\
B &= R_1Q_2 + R_2, &\qquad R_2 &= 0 \text{ or } \deg R_2 < \deg R_1,\\
R_1 &= R_2Q_3 + R_3, &\qquad R_3 &= 0 \text{ or } \deg R_3 < \deg R_2,\\
&\;\;\vdots\\
R_{n-2} &= R_{n-1}Q_n + R_n, &\qquad R_n &= 0 \text{ or } \deg R_n < \deg R_{n-1},\\
R_{n-1} &= R_nQ_{n+1}.
\end{aligned}$$

In the above computation the integer n is defined by the conditions that $R_n \ne 0$ and that $R_{n+1} = 0$. Such an n must exist since $\deg B > \deg R_1 > \cdots \ge 0$. We can now obtain an analog for F[X] of the result for Z given as Proposition 1.2.

Proposition 1.15. Let A and B be polynomials in F[X] with B ≠ 0, and let $R_1, \dots, R_n$ be the remainders generated by the Euclidean algorithm when applied to A and B. Then
(a) $R_n$ is a greatest common divisor of A and B,
(b) any $D_1$ that divides both A and B necessarily divides $R_n$,
(c) the greatest common divisor of A and B is unique up to multiplication by a nonzero scalar,
(d) any greatest common divisor D has the property that there exist polynomials P and Q with AP + BQ = D.

PROOF. Conclusions (a) and (b) are proved in the same way that parts (a) and (b) of Proposition 1.2 are proved, and conclusion (d) is proved with $D = R_n$ in the same way that Proposition 1.2c is proved. If D is a greatest common divisor of A and B, it follows from (a) and (b) that D divides $R_n$ and that $\deg D = \deg R_n$. This proves (c).

Using Proposition 1.15, we can prove analogs for F[X] of the two corollaries of Proposition 1.2. But let us instead skip directly to what is needed to obtain an analog for F[X] of unique factorization as in Theorem 1.5.

Lemma 1.16. If A and B are nonzero polynomials with coefficients in F and if P is a prime polynomial such that P divides AB, then P divides A or P divides B.

PROOF. If P does not divide A, then 1 is a greatest common divisor of A and P, and Proposition 1.15d produces polynomials S and T such that AS + PT = 1. Multiplication by B gives ABS + PTB = B. Then P divides ABS because it divides AB, and P divides PTB because it divides P. Hence P divides B.
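The existence proof of Proposition 1.12 and the Euclidean algorithm are both constructive, so they can be sketched in code. The Python below is a minimal illustration over Q, using exact `Fraction` arithmetic and coefficient lists `[a_0, ..., a_n]` with the zero polynomial as `[]`. The names `trim`, `poly_divmod`, and `poly_gcd` are assumptions made for this sketch, and `poly_gcd` normalizes the result to leading coefficient 1, one of the possible choices mentioned in the footnote.

```python
from fractions import Fraction


def trim(coeffs):
    """Return the coefficient list [a_0, a_1, ...] without trailing zeros."""
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs


def poly_divmod(a, b):
    """Return (q, r) with a = b*q + r and either r == [] or deg r < deg b."""
    a = trim(Fraction(x) for x in a)
    b = trim(Fraction(x) for x in b)
    if not b:
        raise ZeroDivisionError("B must not be the zero polynomial")
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    r = a
    while len(r) >= len(b):
        shift = len(r) - len(b)      # n - k in the existence proof
        c = r[-1] / b[-1]            # a_n * b_k^(-1)
        q[shift] = c
        for i, b_i in enumerate(b):
            r[i + shift] -= c * b_i  # subtract c * X^(n-k) * B
        r = trim(r)                  # the leading term has cancelled
    return trim(q), r


def poly_gcd(a, b):
    """Euclidean algorithm: last nonzero remainder, scaled to be monic."""
    a = trim(Fraction(x) for x in a)
    b = trim(Fraction(x) for x in b)
    if not b:
        raise ValueError("B must not be the zero polynomial")
    while b:
        _, r = poly_divmod(a, b)
        a, b = b, r
    return [x / a[-1] for x in a]
```

Dividing X³ − 1 by X² − 1 with `poly_divmod([-1, 0, 0, 1], [-1, 0, 1])` gives quotient X and remainder X − 1, and `poly_gcd` on the same pair returns the coefficients of X − 1. Dividing by X − r and inspecting the remainder also gives a quick check of the Factor Theorem.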


Theorem 1.17 (unique factorization). Every member of F[X] of degree ≥ 1 is a product of primes. This factorization is unique up to order and up to multiplication of each prime factor by a unit, i.e., by a nonzero scalar.

PROOF. The existence follows in the same way as the existence in Theorem 1.5; induction on the integers is to be replaced by induction on the degree. The uniqueness follows from Lemma 1.16 in the same way that the uniqueness in Theorem 1.5 follows from Lemma 1.6.

We turn to a consideration of properties of polynomials that take into account special features of R and C. If F is R, then $X^2 + 1$ is prime. The reason is that a nontrivial factorization of $X^2 + 1$ would have to involve two first-degree real polynomials and then $r^2 + 1$ would have to be 0 for some real r, namely for r equal to the root of either of the first-degree polynomials. On the other hand, $X^2 + 1$ is not prime when F = C since $X^2 + 1 = (X + i)(X - i)$. The Fundamental Theorem of Algebra, stated below, implies that every prime polynomial over C is of degree 1. It is possible to prove the Fundamental Theorem of Algebra within complex analysis as a consequence of Liouville's Theorem or within real analysis as a consequence of the Heine–Borel Theorem and other facts about compactness. This text gives a proof of the Fundamental Theorem of Algebra in Chapter IX using modern algebra, specifically Sylow theory as in Chapter IV and Galois theory as in Chapter IX. One further fact is needed; this fact uses elementary calculus and is proved below as Proposition 1.20.

Theorem 1.18 (Fundamental Theorem of Algebra). Any polynomial in C[X] with degree ≥ 1 has at least one root.

Corollary 1.19. Let P be a nonzero polynomial of degree n in C[X], and let $r_1, \dots, r_k$ be the distinct roots. Then there exist unique integers $m_j > 0$ for 1 ≤ j ≤ k such that P(X) is a scalar multiple of $\prod_{j=1}^{k}(X - r_j)^{m_j}$. The numbers $m_j$ have $\sum_{j=1}^{k} m_j = n$.

PROOF. We may assume that deg P > 0. We apply unique factorization (Theorem 1.17) to P(X). It follows from the Fundamental Theorem of Algebra (Theorem 1.18) and the Factor Theorem (Corollary 1.13) that each prime polynomial with coefficients in C has degree 1. Thus the unique factorization of P(X) has to be of the form $c\prod_{l=1}^{n}(X - z_l)$ for some c ≠ 0 and for some complex numbers $z_l$ that are unique up to order. The $z_l$'s are roots, and every root is a $z_l$ by the Factor Theorem. Grouping like factors proves the desired factorization and its uniqueness. The numbers $m_j$ have $\sum_{j=1}^{k} m_j = n$ by a count of degrees.

The integers $m_j$ in the corollary are called the multiplicities of the roots of the polynomial P(X).


We conclude this section by proving the result from calculus that will enter the proof of the Fundamental Theorem of Algebra in Chapter IX.

Proposition 1.20. Any polynomial in R[X] with odd degree has at least one root.

PROOF. Without loss of generality, we may take the leading coefficient to be 1. Thus let the polynomial be
$$P(X) = X^{2n+1} + a_{2n}X^{2n} + \cdots + a_1X + a_0 = X^{2n+1} + R(X).$$
For |r| ≥ 1, the polynomial R satisfies $|R(r)| \le C|r|^{2n}$, where $C = |a_{2n}| + \cdots + |a_1| + |a_0|$. Thus |r| > max(C, 1) implies $|P(r) - r^{2n+1}| \le C|r|^{2n} < |r|^{2n+1}$, and it follows that P(r) has the same sign as $r^{2n+1}$ for |r| > max(C, 1). For $r_0 = \max(C, 1) + 1$, we therefore have $P(-r_0) < 0$ and $P(r_0) > 0$. By the Intermediate Value Theorem, given in Section A3 of the appendix, P(r) = 0 for some r with $-r_0 \le r \le r_0$.
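The proof of Proposition 1.20 gives an explicit interval $[-r_0, r_0]$ on which P changes sign, so a root can be approximated by bisection. The following Python sketch illustrates this under some assumptions: the polynomial is monic of odd degree, its lower coefficients are passed as `[a_0, ..., a_{2n}]`, and floating-point numbers stand in for real numbers, so the result is only an approximation. The name `odd_degree_root` and the tolerance parameter are choices made here, not part of the text.

```python
def odd_degree_root(coeffs, tol=1e-12):
    """Approximate a real root of X^(2n+1) + a_2n X^(2n) + ... + a_0.

    coeffs is [a_0, a_1, ..., a_2n]; the leading coefficient 1 is implied.
    """
    if len(coeffs) % 2 == 0:
        raise ValueError("expected an odd number of lower coefficients")
    full = list(coeffs) + [1.0]

    def P(r):
        value = 0.0
        for a in reversed(full):  # Horner evaluation
            value = value * r + a
        return value

    C = sum(abs(a) for a in coeffs)
    r0 = max(C, 1.0) + 1.0        # P(-r0) < 0 < P(r0) by the proof
    lo, hi = -r0, r0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if P(mid) < 0:
            lo = mid              # keep P(lo) < 0
        else:
            hi = mid              # keep P(hi) >= 0
    return (lo + hi) / 2
```

For example, `odd_degree_root([-5, -2, 0])` approximates a real root of X³ − 2X − 5, starting from the bound C = 7 and $r_0 = 8$. Bisection is only one way to realize the Intermediate Value Theorem numerically; it is used here because it follows the sign argument of the proof directly.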

4. Permutations and Their Signs

Let S be a finite nonempty set of n elements. A permutation of S is a one-one function from S onto S. The elements might be listed as $a_1, a_2, \dots, a_n$, but it will simplify the notation to view them simply as 1, 2, . . . , n. We use ordinary function notation for describing the effect of permutations. Thus the value of a permutation σ at j is σ(j), and the composition of τ followed by σ is σ ∘ τ or simply στ, with (στ)(j) = σ(τ(j)). Composition is automatically associative, i.e., (ρσ)τ = ρ(στ), because the effect of both sides on j, when we expand things out, is ρ(σ(τ(j))). The composition of two permutations is also called their product.

The identity permutation will be denoted by 1. Any permutation σ, being a one-one onto function, has a well-defined inverse permutation $\sigma^{-1}$ with the property that $\sigma\sigma^{-1} = \sigma^{-1}\sigma = 1$. One way of describing concisely the effect of a permutation is to list its domain values and to put the corresponding range values beneath them. Thus $\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 3 & 5 & 1 & 2 \end{pmatrix}$ is the permutation of {1, 2, 3, 4, 5} with σ(1) = 4, σ(2) = 3, σ(3) = 5, σ(4) = 1, and σ(5) = 2. The inverse permutation is obtained by interchanging the two rows to obtain $\begin{pmatrix} 4 & 3 & 5 & 1 & 2 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix}$ and then adjusting the entries in the rows so that the first row is in the usual order: $\sigma^{-1} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 5 & 2 & 1 & 3 \end{pmatrix}$.

If 2 ≤ k ≤ n, a k-cycle is a permutation σ that fixes each element in some subset of n − k elements and moves the remaining elements $c_1, \dots, c_k$ according to $\sigma(c_1) = c_2$, $\sigma(c_2) = c_3$, . . . , $\sigma(c_{k-1}) = c_k$, $\sigma(c_k) = c_1$. Such a cycle may be


denoted by $(c_1\ c_2\ \cdots\ c_{k-1}\ c_k)$ to stress its structure. For example take n = 5; then σ = (2 3 5) is the 3-cycle given in our earlier notation by $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 3 & 5 & 4 & 2 \end{pmatrix}$. The cycle (2 3 5) is the same as the cycle (3 5 2) and the cycle (5 2 3). It is sometimes helpful to speak of the identity permutation 1 as the unique 1-cycle.

A system of cycles is said to be disjoint if the sets that each of them moves are disjoint in pairs. Thus (2 3 5) and (1 4) are disjoint, but (2 3 5) and (1 3) are not. Any two disjoint cycles σ and τ commute in the sense that στ = τσ.

Proposition 1.21. Any permutation σ of {1, 2, . . . , n} is a product of disjoint cycles. The individual cycles in the decomposition are unique in the sense of being determined by σ.

EXAMPLE. $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 3 & 5 & 1 & 2 \end{pmatrix} = (2\ 3\ 5)(1\ 4)$.

PROOF. Let us prove existence. Working with {1, 2, . . . , n}, we show that any σ is the disjoint product of cycles in such a way that no cycle moves an element j unless σ moves j. We do so for all σ simultaneously by induction downward on the number of elements fixed by σ. The starting case of the induction is that σ fixes all n elements. Then σ is the identity, and we are regarding the identity as a 1-cycle.

For the inductive step suppose σ fixes the elements in a subset T of r elements of {1, 2, . . . , n} with r < n. Let j be an element not in T, so that σ(j) ≠ j. Choose k as small as possible so that some element is repeated among $j, \sigma(j), \sigma^2(j), \dots, \sigma^k(j)$. This condition means that $\sigma^l(j) = \sigma^k(j)$ for some l with 0 ≤ l < k. Then $\sigma^{k-l}(j) = j$, and we obtain a contradiction to the minimality of k unless k − l = k, i.e., l = 0. In other words, we have $\sigma^k(j) = j$. We may thus form the k-cycle $\gamma = (j\ \sigma(j)\ \sigma^2(j)\ \cdots\ \sigma^{k-1}(j))$. The permutation $\gamma^{-1}\sigma$ then fixes the r + k elements of T ∪ U, where U is the set of elements $j, \sigma(j), \sigma^2(j), \dots, \sigma^{k-1}(j)$. By the inductive hypothesis, $\gamma^{-1}\sigma$ is the product $\tau_1 \cdots \tau_p$ of disjoint cycles that move only elements not in T ∪ U. Since γ moves only the elements in U, γ is disjoint from each of $\tau_1, \dots, \tau_p$. Therefore $\sigma = \gamma\tau_1 \cdots \tau_p$ provides the required decomposition of σ.

For uniqueness we observe from the proof of existence that each element j generates a k-cycle $C_j$ for some k ≥ 1 depending on j. If we have two decompositions as in the proposition, then the cycle within each decomposition that contains j must be $C_j$. Hence the cycles in the two decompositions must match.

A 2-cycle is often called a transposition. The proposition allows us to see quickly that any permutation is a product of transpositions.
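The existence proof of Proposition 1.21 amounts to following each moved element j through σ(j), σ²(j), and so on until it returns to j. The Python sketch below carries this out for a permutation stored as a dictionary `{j: sigma(j)}`; this representation and the names `compose`, `inverse`, and `cycle_decomposition` are illustrative assumptions, not notation from the text.

```python
def compose(sigma, tau):
    """Return the product sigma tau, i.e. j -> sigma(tau(j))."""
    return {j: sigma[tau[j]] for j in tau}


def inverse(sigma):
    """Interchange the two rows of the two-row notation."""
    return {value: j for j, value in sigma.items()}


def cycle_decomposition(sigma):
    """Return the disjoint cycles of length >= 2 as tuples, fixed points omitted."""
    seen = set()
    cycles = []
    for j in sorted(sigma):
        if j in seen or sigma[j] == j:
            continue
        cycle = [j]
        seen.add(j)
        k = sigma[j]
        while k != j:             # follow j, sigma(j), sigma^2(j), ...
            cycle.append(k)
            seen.add(k)
            k = sigma[k]
        cycles.append(tuple(cycle))
    return cycles
```

With `sigma = {1: 4, 2: 3, 3: 5, 4: 1, 5: 2}`, the permutation of the example, `cycle_decomposition(sigma)` returns `[(1, 4), (2, 3, 5)]`, which is the product (2 3 5)(1 4) with the commuting factors listed in a different order. The identity permutation yields the empty list in this sketch, since fixed points are omitted.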


Corollary 1.22. Any k-cycle σ permuting {1, 2, . . . , n} is a product of k − 1 transpositions if k > 1. Therefore any permutation σ of {1, 2, . . . , n} is a product of transpositions.

PROOF. For the first statement, we observe that $(c_1\ c_2\ \cdots\ c_{k-1}\ c_k) = (c_1\ c_k)(c_1\ c_{k-1}) \cdots (c_1\ c_3)(c_1\ c_2)$. The second statement follows by combining this fact with Proposition 1.21.

Our final tasks for this section are to attach a sign to each permutation and to examine the properties of these signs. We begin with the special case that our underlying set S is {1, . . . , n}. If σ is a permutation of {1, . . . , n}, consider the numerical products
$$\prod_{1\le j<k\le n} |\sigma(k) - \sigma(j)| \qquad\text{and}\qquad \prod_{1\le j<k\le n} (\sigma(k) - \sigma(j)).$$

PROOF. We show that the determinant equals
$$\prod_{j>1}(r_j - r_1)\,\det\begin{pmatrix} 1 & \cdots & 1 \\ r_2 & \cdots & r_n \\ \vdots & \ddots & \vdots \\ r_2^{\,n-2} & \cdots & r_n^{\,n-2} \end{pmatrix},$$
and then the result follows by induction. In the given matrix, replace the n-th row by the sum of it and $-r_1$ times the (n − 1)st row, then the (n − 1)st row by the sum of it and $-r_1$ times the (n − 2)nd row, and so on. The resulting determinant is
$$\begin{aligned}
&\det\begin{pmatrix}
1 & 1 & \cdots & 1 \\
0 & r_2 - r_1 & \cdots & r_n - r_1 \\
\vdots & \vdots & \ddots & \vdots \\
0 & r_2^{\,n-2} - r_1r_2^{\,n-3} & \cdots & r_n^{\,n-2} - r_1r_n^{\,n-3} \\
0 & r_2^{\,n-1} - r_1r_2^{\,n-2} & \cdots & r_n^{\,n-1} - r_1r_n^{\,n-2}
\end{pmatrix}\\[4pt]
&\quad= \det\begin{pmatrix}
r_2 - r_1 & \cdots & r_n - r_1 \\
\vdots & \ddots & \vdots \\
r_2^{\,n-2} - r_1r_2^{\,n-3} & \cdots & r_n^{\,n-2} - r_1r_n^{\,n-3} \\
r_2^{\,n-1} - r_1r_2^{\,n-2} & \cdots & r_n^{\,n-1} - r_1r_n^{\,n-2}
\end{pmatrix}
\qquad\text{by Proposition 2.36a applied with } j = 1\\[4pt]
&\quad= (r_2 - r_1)\cdots(r_n - r_1)\,\det\begin{pmatrix} 1 & \cdots & 1 \\ r_2 & \cdots & r_n \\ \vdots & \ddots & \vdots \\ r_2^{\,n-2} & \cdots & r_n^{\,n-2} \end{pmatrix},
\end{aligned}$$
the last step following by multilinearity of the determinant in the columns (as a consequence of Proposition 2.35 and multilinearity in the rows).
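The identity in this proof can be checked numerically. The Python sketch below assumes that the statement being proved is the standard Vandermonde formula, namely that the matrix whose rows are the successive powers $r_j^{0}, r_j^{1}, \dots, r_j^{\,n-1}$ has determinant $\prod_{i<j}(r_j - r_i)$. It computes the determinant by row reduction with exact `Fraction` arithmetic and compares it with the product. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations


def determinant(matrix):
    """Determinant by row reduction over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((row for row in range(col, n) if m[row][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det               # a row interchange changes the sign
        det *= m[col][col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n):
                m[row][k] -= factor * m[col][k]
    return det


def vandermonde(rs):
    """Matrix whose i-th row (i = 0, ..., n-1) is (r_1^i, ..., r_n^i)."""
    return [[Fraction(r) ** i for r in rs] for i in range(len(rs))]


def vandermonde_product(rs):
    """Product of (r_j - r_i) over all pairs of positions i < j."""
    result = Fraction(1)
    for i, j in combinations(range(len(rs)), 2):
        result *= Fraction(rs[j]) - Fraction(rs[i])
    return result
```

For `rs = [1, 2, 4]` both computations give 6, and for a list with a repeated entry both give 0, which matches the fact that the matrix is invertible exactly when the scalars are distinct.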

II. Vector Spaces over Q, R, and C


The classical adjoint of the square matrix A, denoted by $A^{\mathrm{adj}}$, is the matrix with entries $A^{\mathrm{adj}}_{ij} = (-1)^{i+j}\det\widehat{A}_{ji}$ with $\widehat{A}_{kl}$ defined as in the statement of Proposition 2.36: $\widehat{A}_{kl}$ is the matrix A with the k-th row and l-th column deleted. In the 2-by-2 case, we have
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{\mathrm{adj}} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$
Thus we have $AA^{\mathrm{adj}} = A^{\mathrm{adj}}A = (\det A)I$ in the 2-by-2 case. Cramer's rule for solving simultaneous linear equations results from the n-by-n generalization of this formula.

Proposition 2.38 (Cramer's rule). If A is an n-by-n matrix, then $AA^{\mathrm{adj}} = A^{\mathrm{adj}}A = (\det A)I$, and thus det A ≠ 0 implies $A^{-1} = (\det A)^{-1}A^{\mathrm{adj}}$. Consequently if det A ≠ 0, then the unique solution of the simultaneous system Ax = b of n equations in n unknowns, in which $x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and $b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$, has
$$x_j = \frac{\det B_j}{\det A}$$
with $B_j$ equal to the n-by-n matrix obtained from A by replacing the j-th column of A by b.

REMARKS. If we think of the calculation of the determinant of an n-by-n matrix as requiring about $n^3$ steps, then application of Cramer's rule, at least if done in an unthinking fashion, suggests that solving an invertible system requires about $n^3(n + 1)$ steps, i.e., n + 1 determinants are involved in the explicit solution. Use of row reduction directly to solve the system is more efficient than proceeding this way. Thus Cramer's rule is more important for its theoretical applications than it is for making computations. One simple theoretical application is the observation that each entry of the inverse of a matrix is the quotient of a polynomial function of the entries divided by the determinant.

PROOF. The (i, j)-th entry of $A^{\mathrm{adj}}A$ is
$$(A^{\mathrm{adj}}A)_{ij} = \sum_{k=1}^{n} A^{\mathrm{adj}}_{ik}A_{kj} = \sum_{k=1}^{n} (-1)^{i+k}(\det\widehat{A}_{ki})A_{kj}.$$
If i = j, then expansion in cofactors about the j-th column (Proposition 2.36a) identifies the right side as det A. If i ≠ j, consider the matrix B obtained from A by replacing the i-th column of A by the j-th column. Then the i-th and j-th columns of B are equal, and hence det B = 0. Expanding det B in cofactors about the i-th column (Proposition 2.36a), we obtain
$$0 = \det B = \sum_{k=1}^{n} (-1)^{i+k}(\det\widehat{B}_{ki})B_{ki} = \sum_{k=1}^{n} (-1)^{i+k}(\det\widehat{A}_{ki})A_{kj}.$$


Thus $A^{\mathrm{adj}}A = (\det A)I$. A similar argument proves that $AA^{\mathrm{adj}} = (\det A)I$. For the application to Ax = b, we multiply both sides on the left by $A^{\mathrm{adj}}$ and obtain $(\det A)x = A^{\mathrm{adj}}b$. Hence
$$(\det A)x_j = \sum_{i=1}^{n} (A^{\mathrm{adj}})_{ji}\,b_i = \sum_{i=1}^{n} (-1)^{i+j}\,b_i\det\widehat{A}_{ij},$$
and the right side equals $\det B_j$ by expansion in cofactors of $\det B_j$ about the j-th column (Proposition 2.36a).

8. Eigenvectors and Characteristic Polynomials

A vector v ≠ 0 in $F^n$ is an eigenvector of the n-by-n matrix A if Av = λv for some scalar λ. We call λ the eigenvalue associated with v. When λ is an eigenvalue, the vector space of all v with Av = λv, i.e., the set consisting of the eigenvectors and the 0 vector, is called the eigenspace for λ.

If we think of A as giving a linear map L from $F^n$ to itself, an eigenvector takes on geometric significance as a vector mapped to a multiple of itself by L. Another geometric way of viewing matters is that the eigenvector yields a 1-dimensional subspace U = Fv that is invariant, or stable, under L in the sense of satisfying L(U) ⊆ U.

Proposition 2.39. An n-by-n matrix A has an eigenvector with eigenvalue λ if and only if det(λI − A) = 0. In this case the eigenspace for λ is the kernel of λI − A.

PROOF. We have Av = λv if and only if (λI − A)v = 0, if and only if v is in ker(λI − A). This kernel is nonzero if and only if det(λI − A) = 0.

With A fixed, the expression det(λI − A) is a polynomial in λ of degree n and is called the characteristic polynomial⁸ of A. To see that it is at least a polynomial function of λ, let us expand det(λI − A) as
$$\det\begin{pmatrix}
\lambda - A_{11} & -A_{12} & \cdots & -A_{1n} \\
-A_{21} & \lambda - A_{22} & \cdots & -A_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
-A_{n1} & -A_{n2} & \cdots & \lambda - A_{nn}
\end{pmatrix}
= \sum_{\sigma} (\operatorname{sgn}\sigma)\,\mathrm{term}_{1,\sigma(1)} \cdots \mathrm{term}_{n,\sigma(n)}.$$

⁸ Some authors call det(A − λI) the characteristic polynomial. This is the same polynomial as det(λI − A) if n is even and is the negative of it if n is odd. The choice made here has the slight advantage of always having leading coefficient 1, which is a handy property in some situations.


The term for the permutation σ = 1 has σ(k) = k for every k and gives $\prod_{j=1}^{n}(\lambda - A_{jj})$. All other σ's have σ(k) = k for at most n − 2 values of k, and λ therefore occurs at most n − 2 times. Thus the above expression is
$$\begin{aligned}
&= \prod_{j=1}^{n}(\lambda - A_{jj}) + \Big\{\text{other terms with powers of } \lambda \text{ at most } n - 2\Big\}\\
&= \lambda^n - \Big(\sum_{j=1}^{n} A_{jj}\Big)\lambda^{n-1} + \Big\{\text{terms with powers of } \lambda \text{ from } n - 2 \text{ to } 1\Big\} + (-1)^n\det A.
\end{aligned}$$
The constant term is $(-1)^n\det A$ as indicated because it is the value of the polynomial at λ = 0, which is det(−A). In any event, we now see that characteristic polynomials are polynomial functions and can even be treated as polynomials in an indeterminate λ in the sense of Section I.3.⁹ The negative of the coefficient of $\lambda^{n-1}$ is the trace of A, denoted by Tr A. Thus $\operatorname{Tr} A = \sum_{j=1}^{n} A_{jj}$. Trace is a linear functional on the vector space $M_{nn}(F)$ of n-by-n matrices.

EXAMPLE 1. For $A = \begin{pmatrix} 4 & 1 \\ -2 & 1 \end{pmatrix}$, the characteristic polynomial is
$$\det(\lambda I - A) = \det\begin{pmatrix} \lambda - 4 & -1 \\ 2 & \lambda - 1 \end{pmatrix} = (\lambda - 4)(\lambda - 1) + 2 = \lambda^2 - 5\lambda + 6 = (\lambda - 2)(\lambda - 3).$$
The roots, and hence the eigenvalues, are λ = 2 and λ = 3. The eigenvectors for λ = 2 are computed by solving (2I − A)v = 0. The method of row reduction gives
$$\left(\begin{array}{cc|c} 2 - 4 & -1 & 0 \\ 2 & 2 - 1 & 0 \end{array}\right) = \left(\begin{array}{cc|c} -2 & -1 & 0 \\ 2 & 1 & 0 \end{array}\right) \to \left(\begin{array}{cc|c} 1 & \tfrac12 & 0 \\ 0 & 0 & 0 \end{array}\right).$$
Thus we have $x_1 + \tfrac12x_2 = 0$ and $x_1 = -\tfrac12x_2$. So the eigenvectors for λ = 2 are the nonzero vectors of the form $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_2\begin{pmatrix} -\tfrac12 \\ 1 \end{pmatrix}$. Similarly we find the eigenvectors for λ = 3 by starting from (3I − A)v = 0 and solving. The result is that the eigenvectors for λ = 3 are the nonzero vectors of the form $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_2\begin{pmatrix} -1 \\ 1 \end{pmatrix}$. For this example, there is a basis of eigenvectors.

⁹ In Chapter V we will allow determinants of matrices whose entries are from any "commutative ring with identity," C[λ] being an example. Then we can think of det(λI − A) directly as involving an indeterminate λ and not initially as a function of a scalar λ.
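The expansion of det(λI − A) as a sum over permutations, used above to see that the characteristic polynomial really is a polynomial, can be carried out mechanically for small matrices. The following Python sketch treats each entry of λI − A as a coefficient list in λ and sums the signed products. It assumes the usual description of sgn σ as the parity of the number of inversions, and the names `sign`, `poly_mul`, and `characteristic_polynomial` are illustrative. The sum has n! terms, so this is meant for illustration rather than for efficient computation.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation of 0..n-1, from the parity of its inversion count."""
    n = len(perm)
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def poly_mul(p, q):
    """Multiply two nonempty coefficient lists (lowest power first)."""
    c = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b
    return c


def characteristic_polynomial(A):
    """Coefficients [c_0, ..., c_n] of det(lambda*I - A), lowest power first."""
    n = len(A)
    total = [0] * (n + 1)
    for perm in permutations(range(n)):
        term = [sign(perm)]
        for k in range(n):
            # Entry (k, perm[k]) of lambda*I - A as a polynomial in lambda.
            if perm[k] == k:
                entry = [-A[k][k], 1]
            else:
                entry = [-A[k][perm[k]]]
            term = poly_mul(term, entry)
        for i, c in enumerate(term):
            total[i] += c
    return total
```

For the matrix of Example 1, `characteristic_polynomial([[4, 1], [-2, 1]])` returns `[6, -5, 1]`, the coefficients of λ² − 5λ + 6. In general the entry at index n − 1 is −Tr A and the entry at index 0 is $(-1)^n\det A$.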


Corollary 2.40. An n-by-n matrix A has at most n eigenvalues.

PROOF. Since det(λI − A) is a polynomial of degree n, this follows from Proposition 2.39 and Corollary 1.14.

It will later be of interest that certain matrices A have a basis of eigenvectors. Such a basis exists for A as in Example 1 but not in general. One thing that can prevent a matrix from having a basis of eigenvectors is the failure of the characteristic polynomial to factor into first-degree factors. Thus, for example, $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ has characteristic polynomial $\lambda^2 + 1$, which does not factor into first-degree factors when F = R. Even when we do have a factorization into first-degree factors, we can still fail to have a basis of eigenvectors, as the following example shows.

EXAMPLE 2. For $A = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$, the characteristic polynomial is given by $\det(\lambda I - A) = \det\begin{pmatrix} \lambda - 1 & 1 \\ 0 & \lambda - 1 \end{pmatrix} = (\lambda - 1)^2$. When we solve for eigenvectors, we get $\left(\begin{array}{cc|c} 0 & 1 & 0 \\ 0 & 0 & 0 \end{array}\right)$, and $x_2 = 0$. Thus $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1\begin{pmatrix} 1 \\ 0 \end{pmatrix}$, and we do not have a basis of eigenvectors.

What happens is that the presence of a factor $(\lambda - c)^k$ in the characteristic polynomial ensures the existence of an r-parameter family of eigenvectors for eigenvalue c, with 1 ≤ r ≤ k, but not necessarily with r = k. Example 2 shows that r can be strictly less than k. For purposes of deciding whether there is a basis of eigenvectors, the positive result is that the different roots of the characteristic polynomial do not interfere with each other; this is a consequence of the following proposition.

Proposition 2.41. If A is an n-by-n matrix, then eigenvectors for distinct eigenvalues are linearly independent.

REMARK. It follows that if the characteristic polynomial of A has n distinct roots, then there is a basis of eigenvectors of A.

PROOF. Let $Av_1 = \lambda_1v_1, \dots, Av_k = \lambda_kv_k$ with $\lambda_1, \dots, \lambda_k$ distinct, and suppose that $c_1v_1 + \cdots + c_kv_k = 0$.


Applying A repeatedly gives
$$\begin{aligned}
c_1\lambda_1v_1 + \cdots + c_k\lambda_kv_k &= 0,\\
c_1\lambda_1^2v_1 + \cdots + c_k\lambda_k^2v_k &= 0,\\
&\;\;\vdots\\
c_1\lambda_1^{k-1}v_1 + \cdots + c_k\lambda_k^{k-1}v_k &= 0.
\end{aligned}$$
If the j-th entry of $v_i$ is denoted by $v_i^{(j)}$, this system of vector equations, taken together with the original relation $c_1v_1 + \cdots + c_kv_k = 0$, says that
$$\begin{pmatrix}
1 & \cdots & 1 \\
\lambda_1 & \cdots & \lambda_k \\
\vdots & \ddots & \vdots \\
\lambda_1^{k-1} & \cdots & \lambda_k^{k-1}
\end{pmatrix}
\begin{pmatrix} c_1v_1^{(j)} \\ \vdots \\ c_kv_k^{(j)} \end{pmatrix}
= \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}
\qquad\text{for } 1 \le j \le n.$$
The square matrix on the left side is a Vandermonde matrix, which is invertible by Corollary 2.37 since $\lambda_1, \dots, \lambda_k$ are distinct. Therefore $c_iv_i^{(j)} = 0$ for all i and j. Each $v_i$ is nonzero in some entry $v_i^{(j)}$ with j perhaps depending on i, and hence $c_i = 0$. Since all the coefficients $c_i$ have to be 0, $v_1, \dots, v_k$ are linearly independent.

The theory of eigenvectors and eigenvalues for square matrices allows us to develop a corresponding theory for linear maps L : V → V, where V is an n-dimensional vector space over F. If L is such a function, a vector v ≠ 0 in V is an eigenvector of L if L(v) = λv for some scalar λ. We call λ the eigenvalue. When λ is an eigenvalue, the vector space of all v with L(v) = λv is called the eigenspace for λ under L. We can compute the eigenvalues and eigenvectors

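of L by working in any ordered basis Γ of V. The equation L(v) = λv becomes $\begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix}\begin{pmatrix} v \\ \Gamma \end{pmatrix} = \lambda\begin{pmatrix} v \\ \Gamma \end{pmatrix}$ and is satisfied if and only if the column vector $\begin{pmatrix} v \\ \Gamma \end{pmatrix}$ is an eigenvector of the matrix $A = \begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix}$ with eigenvalue λ. Applying Proposition 2.39 and remembering that determinants are well defined on linear maps L : V → V, we see that L has an eigenvector with eigenvalue λ if and only if det(λI − L) = 0 and that in this case the eigenspace is the kernel of λI − L.

What happens if we make these computations in a different ordered basis Δ? We know from Proposition 2.17 that the matrices $A = \begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix}$ and $B = \begin{pmatrix} L \\ \Delta\Delta \end{pmatrix}$ are similar, related by $B = C^{-1}AC$, where $C = \begin{pmatrix} I \\ \Gamma\Delta \end{pmatrix}$. Computing with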


v A leads to u = as eigenvector for the eigenvalue λ. The corresponding −1 −1 −1 −1 −1 result for B is that

B(C u) = C ACC u = C Au = λC u. Thus I v v C −1 u = = is an eigenvector of B with eigenvalue λ, just as it should be. These considerations about eigenvalues suggest some facts about similar matrices that we can observe more directly without ﬁrst passing from matrices to linear maps: One is that similar matrices have the same characteristic polynomial. To see this, suppose that B = C −1 AC; then det(λI − B) = det(λI − C −1 AC) = det(C −1 (λI − A)C) = (det C −1 ) det(λI − A)(det C −1 ) = (det C −1 )(det C −1 ) det(λI − A) = det(λI − A). A second fact is that similar matrices have the same trace. In fact, the trace is the negative of the coefﬁcient of λn−1 in the characteristic polynomial, and the characteristic polynomials are the same. Because of these considerations we are free in the future to speak of the characteristic polynomial, the eigenvalues, and the trace of a linear map from a ﬁnitedimensional vector space to itself, as well as the determinant, and these notions do not depend on any choice of ordered basis. We can speak unambiguously also of the eigenvectors of such a linear map. For this notion the realization of the eigenvectors in an ordered basis as column vectors depends on the ordered basis, the dependence being given by the formulas two paragraphs before the present one. One ﬁnal remark is in order. When the scalars are taken to be the complex numbers C, the Fundamental Theorem of Algebra (Theorem 1.18) is applicable: every polynomial of degree ≥ 1 has at least one root. When applied to the characteristic polynomial of a square matrix or a linear map from a ﬁnite-dimensional vector space to itself, this theorem tells us that the matrix or linear map always has at least one eigenvalue, hence an eigenvector. We shall make serious use of this fact in Chapter III.
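The invariance of the characteristic polynomial and the trace under similarity can be spot-checked on a small case. The following is only an illustrative sketch in exact rational arithmetic; the matrices `A` and `C`, the vector `u`, and all helper names are choices made here for the illustration and do not come from the text.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def trace2(M):
    return M[0][0] + M[1][1]

def inverse2(M):
    """Inverse of an invertible 2-by-2 matrix."""
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def char_poly2(M):
    """Coefficients (1, -Tr M, det M) of det(lambda*I - M) for 2-by-2 M."""
    return (F(1), -trace2(M), det2(M))

# Illustrative data: the columns of C are eigenvectors of A.
A = [[F(4), F(1)], [F(2), F(3)]]
C = [[F(1), F(1)], [F(-2), F(1)]]
B = matmul(inverse2(C), matmul(A, C))      # B = C^{-1} A C

# u is an eigenvector of A for the eigenvalue 2; C^{-1} u should then be
# an eigenvector of B for the same eigenvalue.
u = [[F(1)], [F(-2)]]
w = matmul(inverse2(C), u)

print(B, char_poly2(A), char_poly2(B))
print(matmul(B, w), w)
```

In this instance $B$ comes out diagonal, and the two characteristic polynomials agree coefficient by coefficient, so the traces and determinants agree as well.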

9. Bases in the Infinite-Dimensional Case

So far in this chapter, the use of bases has been limited largely to vector spaces having a finite spanning set. In this case we know from Corollary 2.3 that the finite spanning set has a subset that is a basis, any linearly independent set can be extended to a basis, and any two bases have the same finite number of elements.


We called such spaces finite-dimensional and defined the dimension of the vector space to be the number of elements in a basis.

The first objective in this section is to prove analogs of these results in the infinite-dimensional case. We shall make use of Zorn’s Lemma as in Section A5 of the appendix, as well as the notion of cardinality discussed in Section A6 of the appendix. Once these analogs are in place, we shall examine the various results that we proved about finite-dimensional spaces to see the extent to which they remain valid for infinite-dimensional spaces.

Theorem 2.42. If $V$ is any vector space over $\mathbb{F}$, then
(a) any spanning set in $V$ has a subset that is a basis,
(b) any linearly independent set in $V$ can be extended to a basis,
(c) $V$ has a basis,
(d) any two bases have the same cardinality.

REMARKS. The common cardinality mentioned in (d) is called the dimension of the vector space $V$. In many applications it is enough to use $+\infty$ in place of each infinite cardinal in dimension formulas. This was the attitude conveyed in the remark with Corollary 2.24.

PROOF. For (b), let $E$ be the given linearly independent set, and let $S$ be the collection of all linearly independent subsets of $V$ that contain $E$. Partially order $S$ by inclusion upward. The set $S$ is nonempty because $E$ is in $S$. Let $T$ be a chain in $S$, and let $A$ be the union of the members of $T$. We show that $A$ is in $S$, and then $A$ is certainly an upper bound of $T$. Because of its definition, $A$ contains $E$, and we are to prove that $A$ is linearly independent. For $A$ to fail to be linearly independent would mean that there are vectors $v_1, \dots, v_n$ in $A$ with $c_1 v_1 + \cdots + c_n v_n = 0$ for some system of scalars not all 0. Let $v_j$ be in the member $A_j$ of the chain $T$. Since $A_1 \subseteq A_2$ or $A_2 \subseteq A_1$, $v_1$ and $v_2$ are both in $A_1$ or both in $A_2$. To keep the notation neutral, say they are both in $A_2$. Since $A_2 \subseteq A_3$ or $A_3 \subseteq A_2$, all of $v_1, v_2, v_3$ are in $A_2$ or they are all in $A_3$. Say they are all in $A_3$. Continuing in this way, we arrive at one of the sets $A_1, \dots, A_n$, say $A_n$, such that $v_1, \dots, v_n$ are all in $A_n$. The members of $A_n$ are linearly independent by assumption, and we obtain the contradiction $c_1 = \cdots = c_n = 0$. We conclude that $A$ is linearly independent. Thus the chain $T$ has an upper bound in $S$. By Zorn’s Lemma, $S$ has a maximal element, say $M$. By Proposition 2.1a, $M$ is a basis of $V$ containing $E$.

For (a), let $E$ be the given spanning set, and let $S$ be the collection of all linearly independent subsets of $V$ that are contained in $E$. Partially order $S$ by inclusion upward. The set $S$ is nonempty because $\varnothing$ is in $S$. Let $T$ be a chain in $S$, and let $A$ be the union of the members of $T$. We show that $A$ is in $S$, and then $A$ is certainly an upper bound of $T$. Because of its definition, $A$ is contained in


$E$, and the same argument as in the previous paragraph shows that $A$ is linearly independent. Thus the chain $T$ has an upper bound in $S$. By Zorn’s Lemma, $S$ has a maximal element, say $M$. Proposition 2.1a is not applicable, but its proof is easily adjusted to apply here to show that $M$ spans $V$ and hence is a basis: Given $v$ in $V$, we are to prove that $v$ lies in the linear span of $M$. First suppose that $v$ is in $E$. If $v$ is in $M$, there is nothing to prove. Otherwise, since $M \cup \{v\}$ is contained in $E$, the assumed maximality implies that $M \cup \{v\}$ is not linearly independent, and hence $cv + c_1 v_1 + \cdots + c_n v_n = 0$ for some scalars $c, c_1, \dots, c_n$ not all 0 and for some vectors $v_1, \dots, v_n$ in $M$. The scalar $c$ cannot be 0 since $M$ is linearly independent. Thus $v = -c^{-1}c_1 v_1 - \cdots - c^{-1}c_n v_n$, and $v$ is exhibited as in the linear span of $M$. Consequently every member of $E$ lies in the linear span of $M$. Now suppose that $v$ is not in $E$. Since every member of $V$ lies in the linear span of $E$, every member of $V$ lies in the linear span of $M$.

Conclusion (c) follows from (a) by taking the spanning set to be $V$; alternatively it follows from (b) by taking the linearly independent set to be $\varnothing$.

For (d), let $A = \{v_\alpha\}$ and $B = \{w_\beta\}$ be two bases of $V$. Each member $a$ of $A$ can be written as $a = c_1 w_{\beta_1} + \cdots + c_n w_{\beta_n}$ uniquely with the scalars $c_1, \dots, c_n$ nonzero and with each $w_{\beta_j}$ in $B$. Let $B_a$ be the finite subset $\{w_{\beta_1}, \dots, w_{\beta_n}\}$. Then we have associated to each member of $A$ a finite subset $B_a$ of $B$. Let us see that $\bigcup_{a \in A} B_a = B$. If $b$ is in $B$, then the linear span of $B - \{b\}$ is not all of $V$. Thus some $v$ in $V$ is not in this span. Expand $v$ in terms of $A$ as $v = d_1 v_{\alpha_1} + \cdots + d_m v_{\alpha_m}$ with all $d_j \ne 0$. Since $v$ is not in the linear span of $B - \{b\}$, some $a_0 = v_{\alpha_{j_0}}$ with $1 \le j_0 \le m$ is not in this linear span. Then $b$ is in $B_{a_0}$, and we conclude that $B = \bigcup_{a \in A} B_a$. By the corollary near the end of Section A6 of the appendix, $\operatorname{card} B \le \operatorname{card} A$. Reversing the roles of $A$ and $B$, we obtain $\operatorname{card} A \le \operatorname{card} B$. By the Schroeder–Bernstein Theorem, $A$ and $B$ have the same cardinality. This proves (d).

Now let us go through the results of the chapter and see how many of them extend to the infinite-dimensional case and why. It is possible but not very useful in the infinite-dimensional case to associate an infinite “matrix” to a linear map when bases or ordered bases are specified for the domain and range. Because this association is not very useful, we shall not attempt to extend any of the results concerning matrices. The facts concerning extensions of results just dealing with dimensions and linear maps are as follows:

COROLLARY 2.5. If $V$ is any vector space and $U$ is a vector subspace, then $\dim U \le \dim V$.

In fact, take a basis of $U$ and extend it to a basis of $V$; a basis of $U$ is then exhibited as a subset of a basis of $V$, and the conclusion about cardinal-number dimensions follows.
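When the sets involved are finite, the maximality arguments behind Theorem 2.42a and 2.42b reduce to a greedy procedure: run through the candidates and keep each vector that is independent of those already kept. The sketch below illustrates that finite analog over $\mathbb{Q}$ only; it does not use Zorn’s Lemma, and the function names and sample vectors are assumptions made for the illustration.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors over Q, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extract_basis(spanning, start=()):
    """Enlarge the independent list `start` by members of `spanning`,
    keeping a vector only if it is independent of those kept so far."""
    basis = list(start)
    for v in spanning:
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis

# Finite analog of (a): a spanning list of Q^3 contains a basis.
spanning = [(1, 1, 0), (2, 2, 0), (0, 1, 1), (1, 2, 1), (0, 0, 1)]
sub_basis = extract_basis(spanning)

# Finite analog of (b): an independent list extends to a basis.
standard = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
extended = extract_basis(standard, start=[(1, 2, 3)])

print(sub_basis)
print(extended)
```

The list that results is maximal among independent sublists of the candidates, which is exactly the property that the proof extracts from Zorn’s Lemma in the general case.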


PROPOSITION 2.13. Let $U$ and $V$ be vector spaces over $\mathbb{F}$, and let $\Gamma$ be a basis of $U$. Then to each function $\ell : \Gamma \to V$ corresponds one and only one linear map $L : U \to V$ such that $L|_\Gamma = \ell$.

In fact, the proof given in Section 3 is valid with no assumption about finite dimensionality.

COROLLARY 2.15. If $L : U \to V$ is a linear map between vector spaces over $\mathbb{F}$, then $\dim(\operatorname{domain}(L)) = \dim(\operatorname{kernel}(L)) + \dim(\operatorname{image}(L))$.

In fact, this formula remains valid, but the earlier proof via matrices has to be replaced. Instead, take a basis $\{v_\alpha \mid \alpha \in A\}$ of the kernel and extend it to a basis $\{v_\alpha \mid \alpha \in S\}$ of the domain. It is routine to check that $\{L(v_\alpha) \mid \alpha \in S - A\}$ is a basis of the image of $L$.

THEOREM 2.16 (part). The composition of two linear maps is linear.

In fact, the proof in Section 3 remains valid with no assumption about finite dimensionality.

PROPOSITION 2.18. Two vector spaces over $\mathbb{F}$ are isomorphic if and only if they have the same cardinal-number dimension.

In fact, this result follows from Proposition 2.13 just as it did in the finite-dimensional case; the only changes that are needed in the argument in Section 3 are small adjustments of the notation. Of course, one must not overinterpret this result on the basis of the remark with Theorem 2.42: two vector spaces with dimension $+\infty$ need not be isomorphic. Despite the apparently definitive sound of Proposition 2.18, one must not attach too much significance to it; vector spaces that arise in practice tend to have some additional structure, and an isomorphism based merely on equality of dimensions need not preserve the additional structure.

PROPOSITION 2.19. If $V$ is a vector space and $V'$ is its dual, then $\dim V \le \dim V'$. (In the infinite-dimensional case we do not have equality.)

In fact, take a basis $\{v_\alpha\}$ of $V$. If for each $\alpha$ we define $v'_\alpha(v_\beta) = \delta_{\alpha\beta}$ and use Proposition 2.13 to form the linear extension $v'_\alpha$, then the set $\{v'_\alpha\}$ is a linearly independent subset of $V'$ that is in one-one correspondence with the basis of $V$. Extending $\{v'_\alpha\}$ to a basis of $V'$, we obtain the result.

PROPOSITION 2.20. Let $V$ be a vector space, and let $U$ be a vector subspace of $V$. Then
(b) every linear functional on $U$ extends to a linear functional on $V$,
(c) whenever $v_0$ is a member of $V$ that is not in $U$, there exists a linear functional on $V$ that is 0 on $U$ and is 1 on $v_0$.


Conclusion (a) of the original Proposition 2.20, which concerns annihilators, does not extend to the infinite-dimensional case. To prove (b) without the finite dimensionality, let $u'$ be a given linear functional on $U$, let $\{u_\alpha\}$ be a basis of $U$, and let $\{v_\beta\}$ be a subset of $V$ such that $\{u_\alpha\} \cup \{v_\beta\}$ is a basis of $V$. Define $v'(u_\alpha) = u'(u_\alpha)$ for each $\alpha$ and $v'(v_\beta) = 0$ for each $\beta$. Using Proposition 2.13, let $v'$ be the linear extension to a linear functional on $V$. Then $v'$ has the required properties. To prove (c) without the finite dimensionality, we take a basis $\{u_\alpha\}$ of $U$ and extend $\{u_\alpha\} \cup \{v_0\}$ to a basis of $V$. Define $v'$ to equal 0 on each $u_\alpha$, to equal 1 on $v_0$, and to equal 0 on the remaining members of the basis of $V$. Then the linear extension of $v'$ to $V$ is the required linear functional.

PROPOSITION 2.22. If $V$ is any vector space over $\mathbb{F}$, then the canonical map $\iota : V \to V''$ is one-one. The canonical map is not onto $V''$ if $V$ is infinite-dimensional.

The proof that it is one-one given in Section 4 is applicable in the infinite-dimensional case since we know from Theorem 2.42 that any linearly independent subset of $V$ can be extended to a basis. For the second conclusion when $V$ has a countably infinite basis, see Problem 31 at the end of the chapter.

PROPOSITION 2.23 THROUGH COROLLARY 2.29. For these results about quotients, the only place that finite dimensionality played a role was in the dimension formulas, Corollaries 2.24 and 2.29. We restate these two results separately.

COROLLARY 2.24. If $V$ is a vector space over $\mathbb{F}$ and $U$ is a vector subspace, then
(a) $\dim V = \dim U + \dim(V/U)$,
(b) the subspace $U$ is the kernel of some linear map defined on $V$.

The proof in Section 5 requires no changes: Let $q$ be the quotient map. The linear map $q$ meets the conditions of (b). For (a), take a basis of $U$ and extend to a basis of $V$. Then the images under $q$ of the additional vectors form a basis of $V/U$.

COROLLARY 2.29. Let $M$ and $N$ be vector subspaces of a vector space $V$ over $\mathbb{F}$. Then $\dim(M + N) + \dim(M \cap N) = \dim M + \dim N$.

In fact, Corollary 2.24a gives us $\dim(M + N) = \dim((M + N)/M) + \dim M$. Substituting $\dim((M + N)/M) = \dim(N/(M \cap N))$ from Theorem 2.28 and adding $\dim(M \cap N)$ to both sides, we obtain $\dim(M + N) + \dim(M \cap N) = \dim(M \cap N) + \dim(N/(M \cap N)) + \dim M$. The first two terms on the right side add to $\dim N$ by Corollary 2.24a, and the result follows.
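The dimension formula of Corollary 2.29 can be checked on a finite-dimensional instance by computing all four dimensions independently. The sketch below works over $\mathbb{Q}$ in $\mathbb{Q}^4$; the subspaces are given by sample bases chosen here for illustration, and the intersection is found by solving $\sum a_i m_i = \sum b_j n_j$ rather than by using the formula itself.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form over Q; returns (matrix, pivot columns)."""
    M = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for col in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][col] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(col)
        r += 1
    return M, pivots

def rank(vectors):
    return len(rref(vectors)[1]) if vectors else 0

def nullspace(rows):
    """Basis of the solutions x of Mx = 0, one vector per free column."""
    M, pivots = rref(rows)
    n = len(M[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[free] = Fraction(1)
        for r, p in enumerate(pivots):
            x[p] = -M[r][free]
        basis.append(x)
    return basis

def intersection(M_basis, N_basis):
    """Vectors spanning the intersection of span(M_basis) and span(N_basis)."""
    n = len(M_basis[0])
    # One equation per coordinate: sum a_i m_i - sum b_j n_j = 0.
    system = [[m[t] for m in M_basis] + [-v[t] for v in N_basis]
              for t in range(n)]
    out = []
    for sol in nullspace(system):
        a = sol[:len(M_basis)]
        out.append([sum(a[i] * M_basis[i][t] for i in range(len(M_basis)))
                    for t in range(n)])
    return out

# Sample subspaces of Q^4 (illustrative data).
M_basis = [(1, 0, 1, 0), (0, 1, 0, 1), (1, 1, 0, 0)]
N_basis = [(1, 1, 1, 1), (0, 0, 1, 1), (1, 0, 0, 0)]

dim_M, dim_N = rank(M_basis), rank(N_basis)
dim_sum = rank(M_basis + N_basis)
dim_cap = rank(intersection(M_basis, N_basis))
print(dim_sum, dim_cap, dim_M, dim_N)
```

For these particular subspaces the two sides of the formula agree, as the corollary predicts.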


PROPOSITIONS 2.30 THROUGH 2.33. These results about direct products and direct sums did not assume any finite dimensionality.

The determinants of Sections 7–8 have no infinite-dimensional generalization, and Proposition 2.41 is the only result in those two sections with a valid infinite-dimensional analog. The valid analog in the infinite-dimensional case is that eigenvectors for distinct eigenvalues under a linear map are linearly independent. The proof given for Proposition 2.41 in Section 8 adapts to handle this analog, provided we interpret components $v_i^{(j)}$ of a vector $v_i$ as the coefficients needed to expand $v_i$ in a basis of the underlying vector space.
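The step in the proof of Proposition 2.41 that survives in the infinite-dimensional setting is the invertibility of the Vandermonde matrix for distinct $\lambda_1, \dots, \lambda_k$. As a hedged illustration, the sketch below compares an exactly computed determinant with the standard product formula $\prod_{i<j}(\lambda_j - \lambda_i)$; the function names and sample values are assumptions made for the example.

```python
from fractions import Fraction
from itertools import combinations

def det(rows):
    """Determinant over Q by Gaussian elimination with row swaps."""
    M = [[Fraction(x) for x in row] for row in rows]
    n, sign, result = len(M), 1, Fraction(1)
    for col in range(n):
        p = next((i for i in range(col, n) if M[i][col] != 0), None)
        if p is None:
            return Fraction(0)
        if p != col:
            M[col], M[p] = M[p], M[col]
            sign = -sign
        result *= M[col][col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            M[i] = [a - f * b for a, b in zip(M[i], M[col])]
    return sign * result

def vandermonde(values):
    """Matrix whose row i (i = 0, ..., k-1) lists the i-th powers."""
    return [[Fraction(v) ** i for v in values] for i in range(len(values))]

def product_of_differences(values):
    """Product of (values[j] - values[i]) over all pairs i < j."""
    out = Fraction(1)
    for (_, a), (_, b) in combinations(enumerate(values), 2):
        out *= Fraction(b) - Fraction(a)
    return out

distinct = [2, 3, 5]
repeated = [2, 3, 2]
print(det(vandermonde(distinct)), product_of_differences(distinct))
print(det(vandermonde(repeated)), product_of_differences(repeated))
```

The determinant is nonzero exactly when the values are distinct, which is what allows the coefficients $c_i v_i^{(j)}$ in the proof to be forced to 0.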

10. Problems

1. Determine bases of the following subsets of $\mathbb{R}^3$:
(a) the plane $3x - 2y + 5z = 0$,
(b) the line $x = 2t$, $y = -t$, $z = 4t$, where $-\infty < t < \infty$.

2. This problem shows that the associativity law in the definition of “vector space” implies certain more complicated formulas of which the stated law is a special case. Let $v_1, \dots, v_n$ be vectors in a vector space $V$. The only vector-space properties that are to be used in this problem are associativity of addition and the existence of the 0 element.
(a) Define $v_{(k)}$ inductively upward by $v_{(0)} = 0$ and $v_{(k)} = v_{(k-1)} + v_k$, and define $v^{(l)}$ inductively downward by $v^{(n+1)} = 0$ and $v^{(l)} = v_l + v^{(l+1)}$. Prove that $v_{(k)} + v^{(k+1)}$ is always the same element for $0 \le k \le n$.
(b) Prove that the same element of $V$ results from any way of inserting parentheses in the sum $v_1 + \cdots + v_n$ so that each step requires the addition of only two members of $V$.

3. This problem shows that the commutative and associative laws in the definition of “vector space” together imply certain more complicated formulas of which the stated commutative law is a special case. Let $v_1, \dots, v_n$ be vectors in a vector space $V$. The only vector-space properties that are to be used in this problem are commutativity of addition and the properties in the previous problem. Because of the previous problem, $v_1 + \cdots + v_n$ is a well-defined element of $V$, and it is not necessary to insert any parentheses in it. Prove that $v_1 + v_2 + \cdots + v_n = v_{\sigma(1)} + v_{\sigma(2)} + \cdots + v_{\sigma(n)}$ for each permutation $\sigma$ of $\{1, \dots, n\}$.

4. For the matrix $A = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 4 & 6 \\ 0 & 0 & -8 \end{pmatrix}$, find
(a) a basis for the row space,
(b) a basis for the column space, and
(c) the rank of the matrix.

5. Let $A$ be an $n$-by-$n$ matrix of rank one. Prove that there exist an $n$-dimensional column vector $c$ and an $n$-dimensional row vector $r$ such that $A = cr$.

6. Let $A$ be a $k$-by-$n$ matrix, and let $R$ be a reduced row-echelon form of $A$.
(a) Prove for each $r$ that the rows of $R$ whose first $r$ entries are 0 form a basis for the vector subspace of all members of the row space of $A$ whose first $r$ entries are 0.
(b) Prove that the reduced row-echelon form of $A$ is unique in the sense that any two sequences of steps of row reduction lead to the same reduced form.

7. Let $E$ be a finite set of $N$ points, let $V$ be the $N$-dimensional vector space of all real-valued functions on $E$, and let $n$ be an integer with $0 < n \le N$. Suppose that $U$ is an $n$-dimensional subspace of $V$. Prove that there exists a subset $D$ of $n$ points in $E$ such that the vector space of restrictions to $D$ of the members of $U$ has dimension $n$.

8. A linear map $L : \mathbb{R}^2 \to \mathbb{R}^2$ is given in the standard ordered basis by the matrix $\begin{pmatrix} -6 & -12 \\ 6 & 11 \end{pmatrix}$. Find the matrix of $L$ in the ordered basis $\begin{pmatrix} 3 \\ -2 \end{pmatrix}$, $\begin{pmatrix} -4 \\ 3 \end{pmatrix}$.

9. Let $V$ be the real vector space of all polynomials in $x$ of degree $\le 2$, and let $L : V \to V$ be the linear map $I - D^2$, where $I$ is the identity and $D$ is the differentiation operator $d/dx$. Prove that $L$ is invertible.

10. Let $A$ be in $M_{km}(\mathbb{C})$ and $B$ be in $M_{mn}(\mathbb{C})$. Prove that $\operatorname{rank}(AB) \le \max(\operatorname{rank} A, \operatorname{rank} B)$.

11. Let $A$ be in $M_{kn}(\mathbb{C})$ with $k > n$. Prove that there exists no $B$ in $M_{nk}(\mathbb{C})$ with $AB = I$.

12. Let $A$ be in $M_{kn}(\mathbb{C})$ and $B$ be in $M_{nk}(\mathbb{C})$. Give an example with $k = n$ to show that $\operatorname{rank}(AB)$ need not equal $\operatorname{rank}(BA)$.

13. With the differential equation $y''(t) = y(t)$ in Example 2 of Section 3, two examples of linear functionals on the vector space of solutions are given by $\ell_1(y) = y(0)$ and $\ell_2(y) = y'(0)$. Find a basis of the space of solutions such that $\{\ell_1, \ell_2\}$ is the dual basis.

14. Suppose that a vector space $V$ has a countably infinite basis. Prove that the dual $V'$ has an uncountable linearly independent set.

15. (a) Give an example of a vector space and three vector subspaces $L$, $M$, and $N$ such that $L \cap (M + N) \ne (L \cap M) + (L \cap N)$.
(b) Show that inclusion always holds in one direction in (a).
(c) Show that equality always holds in (a) if $L \supseteq M$.


16. Construct three vector subspaces $M$, $N_1$, and $N_2$ of a vector space $V$ such that $M \oplus N_1 = M \oplus N_2 = V$ but $N_1 \ne N_2$. What is the geometric picture corresponding to this situation?

17. Suppose that $x$, $y$, $u$, and $v$ are vectors in $\mathbb{R}^4$; let $M$ and $N$ be the vector subspaces of $\mathbb{R}^4$ spanned by $\{x, y\}$ and $\{u, v\}$, respectively. In which of the following cases is it true that $\mathbb{R}^4 = M \oplus N$?
(a) $x = (1, 1, 0, 0)$, $y = (1, 0, 1, 0)$, $u = (0, 1, 0, 1)$, $v = (0, 0, 1, 1)$;
(b) $x = (-1, 1, 1, 0)$, $y = (0, 1, -1, 1)$, $u = (1, 0, 0, 0)$, $v = (0, 0, 0, 1)$;
(c) $x = (1, 0, 0, 1)$, $y = (0, 1, 1, 0)$, $u = (1, 0, 1, 0)$, $v = (0, 1, 0, 1)$.

18. Section 6 gave definitions and properties of projections and injections associated with the direct sum of two vector spaces. Write down corresponding definitions and properties for projections and injections in the case of the direct sum of $n$ vector spaces, $n$ being an integer $> 2$.

19. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear map with $\ker T \cap \operatorname{image} T = 0$.
(a) Prove that $\mathbb{R}^n = \ker T \oplus \operatorname{image} T$.
(b) Prove that the condition $\ker T \cap \operatorname{image} T = 0$ is satisfied if $T^2 = T$.

20. If $V_1$ and $V_2$ are two vector spaces over $\mathbb{F}$, prove that $(V_1 \oplus V_2)'$ is canonically isomorphic to $V_1' \oplus V_2'$.

21. Suppose that $M$ is a vector subspace of a vector space $V$ and that $q : V \to V/M$ is the quotient map. Corresponding to each linear functional $y$ on $V/M$ is a linear functional $z$ on $V$ given by $z = yq$. Why is the correspondence $y \mapsto z$ an isomorphism between $(V/M)'$ and $\operatorname{Ann} M$?

22. Let $M$ be a vector subspace of the vector space $V$, and let $q : V \to V/M$ be the quotient map. Suppose that $N$ is a vector subspace of $V$. Prove that $V = M \oplus N$ if and only if the restriction of $q$ to $N$ is an isomorphism of $N$ onto $V/M$.

23. For a square matrix $A$ of integers, prove that the inverse has integer entries if and only if $\det A = \pm 1$.

24. Let $A$ be in $M_{kn}(\mathbb{C})$, and let $r = \operatorname{rank} A$. Prove that $r$ is the largest integer such that there exist $r$ row indices $i_1, \dots, i_r$ and $r$ column indices $j_1, \dots, j_r$ for which the $r$-by-$r$ matrix formed from these rows and columns of $A$ has nonzero determinant. (Educational note: This problem characterizes the subset of matrices of rank $\le r - 1$ as the set in which all determinants of $r$-by-$r$ submatrices are zero.)

25. Suppose that a linear combination of functions $t \mapsto e^{ct}$ with $c$ real vanishes for every integer $t \ge 0$. Prove that it vanishes for every real $t$.

26. Find all eigenvalues and eigenvectors of $A = \begin{pmatrix} 0 & 1 \\ -6 & 5 \end{pmatrix}$.

27. Let $A$ and $C$ be $n$-by-$n$ matrices with $C$ invertible. By making a direct calculation with the entries, prove that $\operatorname{Tr}(C^{-1}AC) = \operatorname{Tr} A$.


28. Find the characteristic polynomial of the $n$-by-$n$ matrix

$$\begin{pmatrix}
0 & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & & & & \ddots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 \\
a_0 & a_1 & a_2 & a_3 & \cdots & a_{n-2} & a_{n-1}
\end{pmatrix}.$$

29. Let $A$ and $B$ be in $M_{nn}(\mathbb{C})$.
(a) Prove under the assumption that $A$ is invertible that $\det(\lambda I - AB) = \det(\lambda I - BA)$.
(b) By working with $A + \varepsilon I$ and letting $\varepsilon$ tend to 0, show that the assumption in (a) that $A$ is invertible can be dropped.

30. In proving Theorem 2.42a, it is tempting to argue by considering all spanning subsets of the given set, ordering them by inclusion downward, and seeking a minimal element by Zorn’s Lemma. Give an example of a chain in this ordering that has no lower bound, thereby showing that this line of argument cannot work.

Problems 31–34 concern annihilators. Let $V$ be a vector space, let $M$ and $N$ be vector subspaces, and let $\iota : V \to V''$ be the canonical map.

31. If $V$ has a countably infinite basis, how can we conclude that $\iota$ does not carry $V$ onto $V''$?

32. Prove that $\operatorname{Ann}(M + N) = \operatorname{Ann} M \cap \operatorname{Ann} N$.

33. Prove that $\operatorname{Ann}(M \cap N) = \operatorname{Ann} M + \operatorname{Ann} N$.

34. (a) Prove that $\iota(M) \subseteq \operatorname{Ann}(\operatorname{Ann} M)$.
(b) Prove that equality holds in (a) if $V$ is finite-dimensional.
(c) Give an infinite-dimensional example in which equality fails in (a).

Problems 35–39 concern operations by blocks within matrices.

35. Let $A$ be a $k$-by-$m$ matrix of the form $A = (A_1 \;\; A_2)$, where $A_1$ has size $k$-by-$m_1$, $A_2$ has size $k$-by-$m_2$, and $m_1 + m_2 = m$. Let $B$ be an $m'$-by-$n$ matrix of the form $B = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}$, where $B_1$ has size $m_1'$-by-$n$, $B_2$ has size $m_2'$-by-$n$, and $m_1' + m_2' = m'$.
(a) If $m_1 = m_1'$ and $m_2 = m_2'$, prove that $AB = A_1 B_1 + A_2 B_2$.
(b) If $k = n$, prove that $BA = \begin{pmatrix} B_1 A_1 & B_1 A_2 \\ B_2 A_1 & B_2 A_2 \end{pmatrix}$.
(c) Deduce a general rule for block multiplication of matrices that are in 2-by-2 block form.

36. Let $A$ be in $M_{kk}(\mathbb{C})$, $B$ be in $M_{kn}(\mathbb{C})$, and $D$ be in $M_{nn}(\mathbb{C})$. Prove that $\det \begin{pmatrix} A & B \\ 0 & D \end{pmatrix} = \det A \det D$.


37. Let $A$, $B$, $C$, and $D$ be in $M_{nn}(\mathbb{C})$. Suppose that $A$ is invertible and that $AC = CA$. Prove that $\det \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det(AD - CB)$.

38. Let $A$ be in $M_{kn}(\mathbb{C})$ and $B$ be in $M_{nk}(\mathbb{C})$ with $k \le n$. Let $I_k$ be the $k$-by-$k$ identity, and let $I_n$ be the $n$-by-$n$ identity. Using Problem 29, prove that $\det(\lambda I_n - BA) = \lambda^{n-k} \det(\lambda I_k - AB)$.

39. Prove the following block-form generalization of the expansion-in-cofactors formula. For each subset $S$ of $\{1, \dots, n\}$, let $S^c$ be the complementary subset within $\{1, \dots, n\}$, and let $\operatorname{sgn}(S, S^c)$ be the sign of the permutation that carries $(1, \dots, n)$ to the members of $S$ in order, followed by the members of $S^c$ in order. Fix $k$ with $1 \le k \le n - 1$, and let the subset $S$ have $|S| = k$. For an $n$-by-$n$ matrix $A$, define $A(S)$ to be the square matrix of size $k$ obtained by using the rows of $A$ indexed by $1, \dots, k$ and the columns indexed by the members of $S$. Let $\widehat{A}(S)$ be the square matrix of size $n - k$ obtained by using the rows of $A$ indexed by $k + 1, \dots, n$ and the columns indexed by the members of $S^c$. Prove that

$$\det A = \sum_{S \subseteq \{1, \dots, n\},\ |S| = k} \operatorname{sgn}(S, S^c) \det A(S) \det \widehat{A}(S).$$

Problems 40–44 compute the determinants of certain matrices known as Cartan matrices. These have geometric significance in the theory of Lie groups.

40. Let $A_n$ be the $n$-by-$n$ matrix

$$\begin{pmatrix}
2 & -1 & 0 & 0 & \cdots & 0 & 0 \\
-1 & 2 & -1 & 0 & \cdots & 0 & 0 \\
0 & -1 & 2 & -1 & \cdots & 0 & 0 \\
0 & 0 & -1 & 2 & \cdots & 0 & 0 \\
\vdots & & & & \ddots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 2 & -1 \\
0 & 0 & 0 & 0 & \cdots & -1 & 2
\end{pmatrix}.$$

Using expansion in cofactors about the last row, prove that $\det A_n = 2 \det A_{n-1} - \det A_{n-2}$ for $n \ge 3$.

41. Computing $\det A_1$ and $\det A_2$ directly and using the recursion in Problem 40, prove that $\det A_n = n + 1$ for $n \ge 1$.

42. Let $C_n$ for $n \ge 2$ be the matrix $A_n$ except that the $(1, 2)$th entry is changed from $-1$ to $-2$.
(a) Expanding in cofactors about the last row, prove that the argument of Problem 40 is still applicable when $n \ge 4$ and a recursion formula for $\det C_n$ results with the same coefficients.
(b) Computing $\det C_2$ and $\det C_3$ directly and using the recursion equation in (a), prove that $\det C_n = 2$ for $n \ge 2$.


43. Let $D_n$ for $n \ge 3$ be the matrix $A_n$ except that the upper left 3-by-3 piece is changed from $\begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}$ to $\begin{pmatrix} 2 & 0 & -1 \\ 0 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix}$.
(a) Expanding in cofactors about the last row, prove that the argument of Problem 40 is still applicable when $n \ge 5$ and a recursion formula for $\det D_n$ results with the same coefficients.
(b) Show that $D_3$ can be transformed into $A_3$ by suitable interchanges of rows and interchanges of columns, and conclude that $\det D_3 = \det A_3 = 4$.
(c) Computing $\det D_4$ directly and using (b) and the recursion equation in (a), prove that $\det D_n = 4$ for $n \ge 3$.

44. Let $E_n$ for $n \ge 4$ be the matrix $A_n$ except that the upper left 4-by-4 piece is changed from $\begin{pmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{pmatrix}$ to $\begin{pmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & 0 & -1 \\ 0 & 0 & 2 & -1 \\ 0 & -1 & -1 & 2 \end{pmatrix}$.
(a) Expanding in cofactors about the last row, prove that the argument of Problem 40 is still applicable when $n \ge 6$ and a recursion formula for $\det E_n$ results with the same coefficients.
(b) Show that $E_4$ can be transformed into $A_4$ by suitable interchanges of rows and interchanges of columns, and conclude that $\det E_4 = \det A_4 = 5$.
(c) Show that $E_5$ can be transformed into $D_5$ by suitable interchanges of rows and interchanges of columns, and conclude that $\det E_5 = \det D_5 = 4$.
(d) Using (b) and (c) and the recursion equation in (a), prove that $\det E_n = 9 - n$ for $n \ge 4$.
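The determinant values asserted in Problems 40–44 can be spot-checked for small $n$ before attempting the proofs. The sketch below is only a numerical sanity check in exact arithmetic, not a substitute for the inductive arguments the problems ask for; the function names are assumptions made for the illustration, and the matrices are built directly from the descriptions in the problems.

```python
from fractions import Fraction

def det(rows):
    """Determinant over Q by Gaussian elimination with row swaps."""
    M = [[Fraction(x) for x in row] for row in rows]
    n, sign, result = len(M), 1, Fraction(1)
    for col in range(n):
        p = next((i for i in range(col, n) if M[i][col] != 0), None)
        if p is None:
            return Fraction(0)
        if p != col:
            M[col], M[p] = M[p], M[col]
            sign = -sign
        result *= M[col][col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            M[i] = [a - f * b for a, b in zip(M[i], M[col])]
    return sign * result

def cartan_A(n):
    """2 on the diagonal, -1 just above and below it, 0 elsewhere."""
    return [[2 if i == j else -1 if abs(i - j) == 1 else 0
             for j in range(n)] for i in range(n)]

def with_corner(n, corner):
    """A_n with its upper left block overwritten by `corner`."""
    M = cartan_A(n)
    for i, row in enumerate(corner):
        for j, x in enumerate(row):
            M[i][j] = x
    return M

def cartan_C(n):
    M = cartan_A(n)
    M[0][1] = -2          # the (1, 2)th entry in 1-based indexing
    return M

D_CORNER = [[2, 0, -1], [0, 2, -1], [-1, -1, 2]]
E_CORNER = [[2, -1, 0, 0], [-1, 2, 0, -1], [0, 0, 2, -1], [0, -1, -1, 2]]

def cartan_D(n):
    return with_corner(n, D_CORNER)

def cartan_E(n):
    return with_corner(n, E_CORNER)

for n in range(4, 9):
    print(n, det(cartan_A(n)), det(cartan_C(n)),
          det(cartan_D(n)), det(cartan_E(n)))
```

Each column of the printed table should follow the same recursion $x_n = 2x_{n-1} - x_{n-2}$ once $n$ is large enough, which is the pattern the problems ask the reader to establish.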

CHAPTER III Inner-Product Spaces

Abstract. This chapter investigates the effects of adding the additional structure of an inner product to a finite-dimensional real or complex vector space.

Section 1 concerns the effect on the vector space itself, defining inner products and their corresponding norms and giving a number of examples and formulas for the computation of norms. Vector-space bases that are orthonormal play a special role.

Section 2 concerns the effect on linear maps. The inner product makes itself felt partly through the notion of the adjoint of a linear map. The section pays special attention to linear maps that are self-adjoint, i.e., are equal to their own adjoints, and to those that are unitary, i.e., preserve norms of vectors.

Section 3 proves the Spectral Theorem for self-adjoint linear maps on finite-dimensional inner-product spaces. The theorem says in part that any self-adjoint linear map has an orthonormal basis of eigenvectors. The Spectral Theorem has several important consequences, one of which is the existence of a unique positive semidefinite square root for any positive semidefinite linear map. The section concludes with the polar decomposition, showing that any linear map factors as the product of a unitary linear map and a positive semidefinite one.

1. Inner Products and Orthonormal Sets

In this chapter we examine the effect of adding further geometric structure to the structure of a real or complex vector space as defined in Chapter II. To be a little more specific in the cases of $\mathbb{R}^2$ and $\mathbb{R}^3$, the development of Chapter II amounted to working with points, lines, planes, coordinates, and parallelism, but nothing further. In the present chapter, by comparison, we shall take advantage of additional structure that captures the notions of distances and angles.

We take $\mathbb{F}$ to be $\mathbb{R}$ or $\mathbb{C}$, continuing to call its members the scalars. We do not allow $\mathbb{F}$ to be $\mathbb{Q}$ in this chapter; the main results will make essential use of additional facts about $\mathbb{R}$ and $\mathbb{C}$ beyond those of addition, subtraction, multiplication, and division. The relevant additional facts are summarized in Sections A3 and A4 of the appendix.¹

¹ The theory of Chapter II will be observed in Chapter IV to extend to any “field” $\mathbb{F}$ in place of $\mathbb{Q}$ or $\mathbb{R}$ or $\mathbb{C}$, but the theory of the present chapter is limited to $\mathbb{R}$ and $\mathbb{C}$, as well as some other special fields that we shall not try to isolate.


Many of the results that we obtain will be limited to the finite-dimensional case. The theory of inner-product spaces that we develop has an infinite-dimensional generalization, but useful results for the generalization make use of a hypothesis of “completeness” for an inner-product space that we are not in a position to verify in examples.²

Let $V$ be a vector space over $\mathbb{F}$. An inner product on $V$ is a function from $V \times V$ into $\mathbb{F}$, which we here denote by $(\,\cdot\,, \cdot\,)$, with the following properties:
(i) the function $u \mapsto (u, v)$ of $V$ into $\mathbb{F}$ is linear,
(ii) the function $v \mapsto (u, v)$ of $V$ into $\mathbb{F}$ is conjugate linear in the sense that it satisfies $(u, v_1 + v_2) = (u, v_1) + (u, v_2)$ for $v_1$ and $v_2$ in $V$ and $(u, cv) = \bar{c}\,(u, v)$ for $v$ in $V$ and $c$ in $\mathbb{F}$,
(iii) $(u, v) = \overline{(v, u)}$ for $u$ and $v$ in $V$,
(iv) $(v, v) \ge 0$ for all $v$ in $V$,
(v) $(v, v) = 0$ only if $v = 0$ in $V$.

The overbars in (ii) and (iii) indicate complex conjugation. Property (ii) reduces when $\mathbb{F} = \mathbb{R}$ to the fact that $v \mapsto (u, v)$ is linear. Properties (i) and (ii) together are summarized by saying that $(\,\cdot\,, \cdot\,)$ is bilinear if $\mathbb{F} = \mathbb{R}$ or sesquilinear if $\mathbb{F} = \mathbb{C}$. Property (iii) is summarized when $\mathbb{F} = \mathbb{R}$ by saying that $(\,\cdot\,, \cdot\,)$ is symmetric, or when $\mathbb{F} = \mathbb{C}$ by saying that $(\,\cdot\,, \cdot\,)$ is Hermitian symmetric. An inner-product space, for purposes of this book, is a vector space over $\mathbb{R}$ or $\mathbb{C}$ with an inner product in the above sense.³,⁴

² A careful study in the infinite-dimensional case is normally made only after the development of a considerable number of topics in real analysis.
³ When the scalars are complex, many books emphasize the presence of complex scalars by referring to the inner product as a “Hermitian inner product.” This book does not need to distinguish the complex case very often and therefore will not use the modifier “Hermitian” with the term “inner product.”
⁴ Some authors, particularly in connection with mathematical physics, reverse the roles of the two variables, defining inner products to be conjugate linear in the first variable and linear in the second variable.

EXAMPLES.
(1) $V = \mathbb{R}^n$ with $(\,\cdot\,, \cdot\,)$ as the dot product, i.e., with $(x, y) = y^t x = x_1 y_1 + \cdots + x_n y_n$ if $x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and $y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$. The traditional notation for the dot product is $x \cdot y$.
(2) $V = \mathbb{C}^n$ with $(\,\cdot\,, \cdot\,)$ defined by $(x, y) = \bar{y}^t x = x_1 \bar{y}_1 + \cdots + x_n \bar{y}_n$ if $x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and $y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$. Here $\bar{y}$ denotes the entry-by-entry complex conjugate of $y$. The sesquilinear expression $(\,\cdot\,, \cdot\,)$ is different from the complex bilinear dot product $x \cdot y = x_1 y_1 + \cdots + x_n y_n$.
(3) $V$ equal to the vector space of all complex-valued polynomials with $(f, g) = \int_0^1 f(x)\overline{g(x)}\,dx$.

Let $V$ be an inner-product space. If $v$ is in $V$, define $\|v\| = \sqrt{(v, v)}$, calling $\|\cdot\|$ the norm associated with the inner product. The norm of $v$ is understood to be the nonnegative square root of the nonnegative real number $(v, v)$ and is well defined as a consequence of (iv). In the case of $\mathbb{R}^n$, $\|x\|$ is the Euclidean distance $\sqrt{x_1^2 + \cdots + x_n^2}$ from the origin to the column vector $x = (x_1, \dots, x_n)$. In this interpretation the dot product of two nonzero vectors in $\mathbb{R}^n$ is shown in analytic geometry to be given by $x \cdot y = \|x\|\|y\| \cos \theta$, where $\theta$ is the angle between the vectors $x$ and $y$.

Direct expansion of norms squared of sums of vectors using bilinearity or sesquilinearity leads to certain formulas of particular interest. The formula that we shall use most frequently is

$$\|u + v\|^2 = \|u\|^2 + 2 \operatorname{Re}(u, v) + \|v\|^2,$$

which generalizes from $\mathbb{R}^2$ a version of the law of cosines in trigonometry relating the lengths of the three sides of a triangle when one of the angles is known. With the additional hypothesis that $(u, v) = 0$, this formula generalizes from $\mathbb{R}^2$ the Pythagorean Theorem

$$\|u + v\|^2 = \|u\|^2 + \|v\|^2.$$

Another such formula is the parallelogram law

$$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2 \qquad \text{for all } u \text{ and } v \text{ in } V,$$

which is proved by computing $\|u + v\|^2$ and $\|u - v\|^2$ by the law of cosines and adding the results. The name “parallelogram law” is explained by the geometric interpretation in the case of the dot product for $\mathbb{R}^2$ and is illustrated in Figure 3.1. That figure uses the familiar interpretation of vectors in $\mathbb{R}^2$ as arrows, two arrows being identified if they are translates of one another; thus the arrow from $v$ to $u$ represents the vector $u - v$. The parallelogram law is closely related to a formula for recovering the inner product from the norm, namely

$$(u, v) = \frac{1}{4} \sum_k i^k \|u + i^k v\|^2,$$

where the sum extends for $k \in \{0, 2\}$ if the scalars are real and extends for $k \in \{0, 1, 2, 3\}$ if the scalars are complex. This formula goes under the name

1. Inner Products and Orthonormal Sets


polarization. To prove it, we expand
\[ \|u + i^k v\|^2 = \|u\|^2 + 2\operatorname{Re}(u, i^k v) + \|v\|^2 = \|u\|^2 + 2\operatorname{Re}\big((-i)^k (u, v)\big) + \|v\|^2. \]
Multiplying by $i^k$ and summing on k shows that $\sum_k i^k \|u + i^k v\|^2 = 2 \sum_k i^k \operatorname{Re}\big((-i)^k (u, v)\big)$. If k is even, then $i^k \operatorname{Re}((-i)^k z) = \operatorname{Re} z$ for any complex z, while if k is odd, then $i^k \operatorname{Re}((-i)^k z) = i \operatorname{Im} z$. So $2 \sum_k i^k \operatorname{Re}((-i)^k z) = 4z$, and $\sum_k i^k \|u + i^k v\|^2 = 4(u, v)$, as asserted.
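The parallelogram law and the polarization formula can be checked numerically. The following is a minimal sketch for $\mathbb{C}^2$ with the standard inner product; the helper names (`inner`, `norm_sq`, `polarization`) and the sample vectors are illustrative assumptions, not notation from the text.

```python
# Numerical check of the parallelogram law and of polarization in C^2.
# inner() is linear in the first variable and conjugate linear in the second,
# matching the convention used in this chapter.

def inner(u, v):
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def norm_sq(u):
    return inner(u, u).real

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(c, u):
    return [c * a for a in u]

def polarization(u, v, ks=(0, 1, 2, 3)):
    # (1/4) * sum over k of i^k * ||u + i^k v||^2; pass ks=(0, 2) for real scalars
    return sum((1j ** k) * norm_sq(add(u, scale(1j ** k, v))) for k in ks) / 4

# Hypothetical sample vectors chosen only for illustration.
u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]

lhs = norm_sq(add(u, v)) + norm_sq(add(u, scale(-1, v)))
rhs = 2 * norm_sq(u) + 2 * norm_sq(v)
print(abs(lhs - rhs) < 1e-12)                         # parallelogram law
print(abs(polarization(u, v) - inner(u, v)) < 1e-12)  # polarization
```

For real vectors the same function can be called with `ks=(0, 2)`, which corresponds to the two-term sum in the real case.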

FIGURE 3.1. Geometric interpretation of the parallelogram law: the sum of the squared lengths of the four sides of a parallelogram equals the sum of the squared lengths of the diagonals.

Proposition 3.1 (Schwarz inequality). In any inner-product space V, $|(u, v)| \le \|u\|\,\|v\|$ for all u and v in V.

REMARK. The proof is written so as to use properties (i) through (iv) in the definition of inner product but not (v), a situation often encountered with integrals.

PROOF. Possibly replacing u by $e^{i\theta}u$ for some real θ, we may assume that (u, v) is real. In the case that $\|v\| \ne 0$, the law of cosines gives
\[ \big\|u - \|v\|^{-2}(u, v)v\big\|^2 = \|u\|^2 - 2\|v\|^{-2}|(u, v)|^2 + \|v\|^{-4}|(u, v)|^2\|v\|^2. \]
The left side is ≥ 0, and the right side simplifies to $\|u\|^2 - \|v\|^{-2}|(u, v)|^2$. Thus the inequality follows in this case.

In the case that $\|v\| = 0$, it is enough to prove that (u, v) = 0 for all u. If c is a scalar, then we have
\[ \|u + cv\|^2 = \|u\|^2 + 2\operatorname{Re}\big(\bar{c}(u, v)\big) + |c|^2\|v\|^2 = \|u\|^2 + 2\operatorname{Re}\big(\bar{c}(u, v)\big). \]
The left side is ≥ 0 as c varies, but the right side is < 0 for a suitable choice of c unless (u, v) = 0. This completes the proof.

Proposition 3.2. In any inner-product space V, the norm satisfies
(a) $\|v\| \ge 0$ for all v in V, with equality if and only if v = 0,
(b) $\|cv\| = |c|\,\|v\|$ for all v in V and all scalars c,
(c) $\|u + v\| \le \|u\| + \|v\|$ for all u and v in V.


PROOF. Conclusion (a) is immediate from properties (iv) and (v) of an inner product, and (b) follows since $\|cv\|^2 = (cv, cv) = c\bar{c}(v, v) = |c|^2\|v\|^2$. Finally we use the law of cosines and the Schwarz inequality (Proposition 3.1) to write
\[ \|u + v\|^2 = \|u\|^2 + 2\operatorname{Re}(u, v) + \|v\|^2 \le \|u\|^2 + 2\|u\|\,\|v\| + \|v\|^2 = (\|u\| + \|v\|)^2. \]
Taking the square root of both sides yields (c).

Two vectors u and v in V are said to be orthogonal if (u, v) = 0, and one sometimes writes u ⊥ v in this case. The notation is a reminder of the interpretation in the case of dot product: dot product 0 means that the cosine of the angle between the two vectors is 0 and the vectors are therefore perpendicular.

An orthogonal set in V is a set of vectors such that each pair is orthogonal. The nonzero members of an orthogonal set are linearly independent. In fact, if $\{v_1, \dots, v_k\}$ is an orthogonal set of nonzero vectors and some linear combination has $c_1 v_1 + \cdots + c_k v_k = 0$, then the inner product of this relation with $v_j$ gives $0 = (c_1 v_1 + \cdots + c_k v_k, v_j) = c_j \|v_j\|^2$, and we see that $c_j = 0$ for each j.

A unit vector in V is a vector u with $\|u\| = 1$. If v is any nonzero vector, then $v/\|v\|$ is a unit vector. An orthonormal set in V is an orthogonal set of unit vectors. Under the assumption that V is finite-dimensional, an orthonormal basis of V is an orthonormal set that is a vector-space basis.5

EXAMPLES.
(1) In $\mathbb{R}^n$ or $\mathbb{C}^n$, the standard basis $\{e_1, \dots, e_n\}$ is an orthonormal set.
(2) Let V be the complex inner-product space of all complex finite linear combinations, for n from −N to +N, of the functions $x \mapsto e^{inx}$ on the closed interval [−π, π], the inner product being $(f, g) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx$. With respect to this inner product, the functions $e^{inx}$ form an orthonormal set.

A simple but important exercise in an inner-product space is to resolve a vector into the sum of a multiple of a given unit vector and a vector orthogonal to the given unit vector.
This exercise is solved as follows: If v is given and u is a unit vector, then v decomposes as
\[ v = (v, u)u + \big(v - (v, u)u\big). \]
Here (v, u)u is a multiple of u, and the two components are orthogonal since
\[ \big(u, v - (v, u)u\big) = (u, v) - \overline{(v, u)}(u, u) = (u, v) - (u, v) = 0. \]
This decomposition is unique since if $v = v_1 + v_2$ with $v_1 = cu$ and $(v_2, u) = 0$, then the inner product of $v = v_1 + v_2$ with u yields $(v, u) = (cu, u) + (v_2, u) = c$. Hence c must be (v, u), $v_1$ must be (v, u)u, and $v_2$ must be v − (v, u)u. Figure 3.2 illustrates the decomposition, and Proposition 3.3 generalizes it by replacing the multiples of a single unit vector by the span of a finite orthonormal set.

5 In the infinite-dimensional theory the term “orthonormal basis” is used for an orthonormal set that spans V when limits of finite sums are allowed, in addition to finite sums themselves; when V is infinite-dimensional, an orthonormal basis is never large enough to be a vector-space basis.

FIGURE 3.2. Resolution of v into a component (v, u)u parallel to a unit vector u and a component orthogonal to u.

Proposition 3.3. Let V be an inner-product space. If $\{u_1, \dots, u_k\}$ is an orthonormal set in V and if v is given in V, then there exists a unique decomposition
\[ v = c_1 u_1 + \cdots + c_k u_k + v^\perp \]
with $v^\perp$ orthogonal to $u_j$ for 1 ≤ j ≤ k. In this decomposition $c_j = (v, u_j)$.

REMARK. The proof illustrates a technique that arises often in mathematics. We seek to prove an existence–uniqueness theorem, and we begin by making calculations toward uniqueness that narrow down the possibilities. We are led to some formulas or conditions, and we use these to define the object in question and thereby prove existence. Although it may not be so clear except in retrospect, this was the technique that lay behind proving the equivalence of various conditions for the invertibility of a square matrix in Section I.6. The technique occurred again in defining and working with determinants in Section II.7.

PROOF OF UNIQUENESS. Taking the inner product of both sides with $u_j$, we obtain $(v, u_j) = (c_1 u_1 + \cdots + c_k u_k + v^\perp, u_j) = c_j$ for each j. Then $c_j = (v, u_j)$ is forced, and $v^\perp$ must be given by $v - (v, u_1)u_1 - \cdots - (v, u_k)u_k$.

PROOF OF EXISTENCE. Putting $c_j = (v, u_j)$, we need check only that the difference $v - (v, u_1)u_1 - \cdots - (v, u_k)u_k$ is orthogonal to each $u_j$ with 1 ≤ j ≤ k. Direct calculation gives
\[ \Big(v - \sum_i (v, u_i)u_i,\ u_j\Big) = (v, u_j) - \sum_i \big((v, u_i)u_i, u_j\big) = (v, u_j) - (v, u_j) = 0, \]
and the proof is complete.
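The decomposition of Proposition 3.3 is easy to compute once the coefficients $c_j = (v, u_j)$ are known. The sketch below works in $\mathbb{C}^3$ with the standard inner product; the function name `decompose` and the sample orthonormal pair are assumptions made for illustration.

```python
# Decompose v as c_1 u_1 + ... + c_k u_k + v_perp for an orthonormal set.

def inner(u, v):
    # linear in the first variable, conjugate linear in the second
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def decompose(v, orthonormal):
    """Return (coeffs, v_perp) with v = sum_j coeffs[j] * u_j + v_perp."""
    coeffs = [inner(v, u) for u in orthonormal]      # c_j = (v, u_j)
    v_perp = [complex(a) for a in v]
    for c, u in zip(coeffs, orthonormal):
        v_perp = [a - c * b for a, b in zip(v_perp, u)]
    return coeffs, v_perp

# Hypothetical orthonormal pair in C^3 and a sample vector.
s = 2 ** -0.5
u1 = [s, s * 1j, 0]
u2 = [0, 0, 1]
v = [1 + 1j, 2, 3 - 1j]

coeffs, v_perp = decompose(v, [u1, u2])
# v_perp should be orthogonal to both u1 and u2.
print(all(abs(inner(v_perp, u)) < 1e-12 for u in (u1, u2)))
```

The function assumes its second argument really is orthonormal; it does not check this.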

Corollary 3.4 (Bessel’s inequality). Let V be an inner-product space. If $\{u_1, \dots, u_k\}$ is an orthonormal set in V and if v is given in V, then $\sum_{j=1}^k |(v, u_j)|^2 \le \|v\|^2$ with equality if and only if v is in $\operatorname{span}\{u_1, \dots, u_k\}$.

PROOF. Using Proposition 3.3, write $v = \sum_{j=1}^k (v, u_j)u_j + v^\perp$ with $v^\perp$ orthogonal to $u_1, \dots, u_k$. Then
\[
\begin{aligned}
\|v\|^2 &= \Big(\sum_{i=1}^k (v, u_i)u_i + v^\perp,\ \sum_{j=1}^k (v, u_j)u_j + v^\perp\Big) \\
&= \sum_{i,j} (v, u_i)\overline{(v, u_j)}(u_i, u_j) + \Big(\sum_i (v, u_i)u_i,\ v^\perp\Big) + \Big(v^\perp,\ \sum_j (v, u_j)u_j\Big) + \|v^\perp\|^2 \\
&= \sum_{i,j} (v, u_i)\overline{(v, u_j)}\,\delta_{ij} + 0 + 0 + \|v^\perp\|^2 \\
&= \sum_{j=1}^k |(v, u_j)|^2 + \|v^\perp\|^2.
\end{aligned}
\]

From Proposition 3.3 we know that v is in $\operatorname{span}\{u_1, \dots, u_k\}$ if and only if $v^\perp = 0$, and the corollary follows.

We shall now impose the condition of finite dimensionality in order to obtain suitable kinds of orthonormal sets. The argument will enable us to give a basis-free interpretation of Proposition 3.3 and Corollary 3.4, and we shall obtain equivalent conditions for the vector $v^\perp$ in Proposition 3.3 and Corollary 3.4 to be 0 for every v.

If an ordered set of k linearly independent vectors in the inner-product space V is given, the above proposition suggests a way of adjusting the set so that it becomes orthonormal. Let us write the formulas here and carry out the verification via Proposition 3.3 in the proof of Proposition 3.5 below. The method of adjusting the set so as to make it orthonormal is called the Gram–Schmidt orthogonalization process. The given linearly independent set is denoted by $\{v_1, \dots, v_k\}$, and we define
\[
\begin{aligned}
u_1 &= \frac{v_1}{\|v_1\|}, \\
u_2' &= v_2 - (v_2, u_1)u_1, & u_2 &= \frac{u_2'}{\|u_2'\|}, \\
u_3' &= v_3 - (v_3, u_1)u_1 - (v_3, u_2)u_2, & u_3 &= \frac{u_3'}{\|u_3'\|}, \\
&\ \ \vdots \\
u_k' &= v_k - (v_k, u_1)u_1 - \cdots - (v_k, u_{k-1})u_{k-1}, & u_k &= \frac{u_k'}{\|u_k'\|}.
\end{aligned}
\]
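The Gram–Schmidt formulas translate almost line for line into code. The following is a minimal sketch for vectors in $\mathbb{C}^n$ with the standard inner product; the function name `gram_schmidt` and the tolerance `1e-12` used to detect linear dependence are assumptions of this sketch, and no attention is paid to numerical stability.

```python
import math

def inner(u, v):
    # linear in the first variable, conjugate linear in the second
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u).real)

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors in C^n."""
    basis = []
    for v in vectors:
        # u'_j = v_j - sum over i < j of (v_j, u_i) u_i
        w = [complex(a) for a in v]
        for u in basis:
            c = inner(v, u)
            w = [a - c * b for a, b in zip(w, u)]
        n = norm(w)
        if n < 1e-12:
            raise ValueError("vectors are not linearly independent")
        basis.append([a / n for a in w])   # u_j = u'_j / ||u'_j||
    return basis

# Hypothetical input: three independent vectors in R^3.
us = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
for i, ui in enumerate(us):
    for j, uj in enumerate(us):
        expected = 1 if i == j else 0
        assert abs(inner(ui, uj) - expected) < 1e-12
```

The error raised for a vanishing $u_j'$ mirrors the role of linear independence in the proof of Proposition 3.5.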


Proposition 3.5. If $\{v_1, \dots, v_k\}$ is a linearly independent set in an inner-product space V, then the Gram–Schmidt orthogonalization process replaces $\{v_1, \dots, v_k\}$ by an orthonormal set $\{u_1, \dots, u_k\}$ such that $\operatorname{span}\{v_1, \dots, v_j\} = \operatorname{span}\{u_1, \dots, u_j\}$ for all j.

PROOF. We argue by induction on j. The base case is j = 1, and the result is evident in this case. Assume inductively that $u_1, \dots, u_{j-1}$ are well defined and orthonormal and that $\operatorname{span}\{v_1, \dots, v_{j-1}\} = \operatorname{span}\{u_1, \dots, u_{j-1}\}$. Proposition 3.3 shows that $u_j'$ is orthogonal to $u_1, \dots, u_{j-1}$. If $u_j' = 0$, then $v_j$ has to be in $\operatorname{span}\{u_1, \dots, u_{j-1}\} = \operatorname{span}\{v_1, \dots, v_{j-1}\}$, and we have a contradiction to the assumed linear independence of $\{v_1, \dots, v_k\}$. Thus $u_j' \ne 0$, and $\{u_1, \dots, u_j\}$ is a well-defined orthonormal set. This set must be linearly independent, and hence its linear span is a j-dimensional vector subspace of the linear span of $\{v_1, \dots, v_j\}$. By Corollary 2.4, the two linear spans coincide. This completes the induction and the proof.

Corollary 3.6. If V is a finite-dimensional inner-product space, then any orthonormal set in a vector subspace S of V can be extended to an orthonormal basis of S.

PROOF. Extend the given orthonormal set to a basis of S by Corollary 2.3b. Then apply the Gram–Schmidt orthogonalization process. The given vectors do not get changed by the process, as we see from the formulas for the vectors $u_j'$ and $u_j$, and hence the result is an extension of the given orthonormal set to an orthonormal basis.

Corollary 3.7. If S is a vector subspace of a finite-dimensional inner-product space V, then S has an orthonormal basis.

PROOF. This is the special case of Corollary 3.6 in which the given orthonormal set is empty.

The set of all vectors orthogonal to a subset M of the inner-product space V is denoted by $M^\perp$. In symbols,
\[ M^\perp = \{u \in V \mid (u, v) = 0 \text{ for all } v \in M\}. \]
We see by inspection that $M^\perp$ is a vector subspace. Moreover, $M \cap M^\perp = 0$ since any u in $M \cap M^\perp$ must have (u, u) = 0. The interest in the vector subspace $M^\perp$ comes from the following proposition.

Theorem 3.8 (Projection Theorem). If S is a vector subspace of the finite-dimensional inner-product space V, then every v in V decomposes uniquely as $v = v_1 + v_2$ with $v_1$ in S and $v_2$ in $S^\perp$. In other words, $V = S \oplus S^\perp$.


REMARKS. Because of this proposition, $S^\perp$ is often called the orthogonal complement of the vector subspace S.

PROOF. Uniqueness follows from the fact that $S \cap S^\perp = 0$. For existence, use of Corollaries 3.7 and 3.6 produces an orthonormal basis $\{u_1, \dots, u_r\}$ of S and extends it to an orthonormal basis $\{u_1, \dots, u_n\}$ of V. The vectors $u_j$ for j > r are orthogonal to each $u_i$ with i ≤ r and hence are in $S^\perp$. If v is given in V, we can write $v = \sum_{j=1}^n c_j u_j$ as $v = v_1 + v_2$ with $v_1 = \sum_{i=1}^r c_i u_i$ and $v_2 = \sum_{j=r+1}^n c_j u_j$, and this decomposition for all v shows that $V = S + S^\perp$.

Corollary 3.9. If S is a vector subspace of the finite-dimensional inner-product space V, then
(a) $\dim V = \dim S + \dim S^\perp$,
(b) $S^{\perp\perp} = S$.

PROOF. Conclusion (a) is immediate from the direct-sum decomposition $V = S \oplus S^\perp$ of Theorem 3.8. For (b), the definition of orthogonal complement gives $S \subseteq S^{\perp\perp}$. On the other hand, application of (a) twice shows that S and $S^{\perp\perp}$ have the same finite dimension. By Corollary 2.4, $S^{\perp\perp} = S$.

Section II.6 introduced “projection” mappings in the setting of any direct sum of two vector spaces, and we shall use those mappings in connection with the decomposition $V = S \oplus S^\perp$ of Theorem 3.8. We make one adjustment in working with the projections, changing their ranges from the image, namely S or $S^\perp$, to the larger space V. In effect, a linear map $p_1$ or $p_2$ as in Section II.6 will be replaced by $i_1 p_1$ or $i_2 p_2$. Specifically let E : V → V be the linear map that is the identity on S and is 0 on $S^\perp$. Then E is called the orthogonal projection of V on S. The linear map I − E is the identity on $S^\perp$ and is 0 on S. Since $S = S^{\perp\perp}$, I − E is the orthogonal projection of V on $S^\perp$. It is the linear map that picks out the $S^\perp$ component relative to the direct-sum decomposition $V = S^\perp \oplus S^{\perp\perp}$.

Proposition 3.3 and Corollary 3.4 can be restated in terms of orthogonal projections.

Corollary 3.10. Let V be a finite-dimensional inner-product space, let S be a vector subspace of V, let $\{u_1, \dots, u_k\}$ be an orthonormal basis of S, and let E be the orthogonal projection of V on S. If v is in V, then
\[ E(v) = \sum_{j=1}^k (v, u_j)u_j \qquad \text{and} \qquad \|E(v)\|^2 = \sum_{j=1}^k |(v, u_j)|^2. \]


The vector $v^\perp$ in the expansion $v = \sum_{j=1}^k (v, u_j)u_j + v^\perp$ of Proposition 3.3 is equal to (I − E)v, and the equality of norms
\[ \|v\|^2 = \sum_{j=1}^k |(v, u_j)|^2 + \|v^\perp\|^2 \]
has the interpretations that $\|v\|^2 = \|E(v)\|^2 + \|(I - E)v\|^2$ and that equality holds in Bessel’s inequality if and only if E(v) = v.

PROOF. Write $v = \sum_{j=1}^k (v, u_j)u_j + v^\perp$ as in Proposition 3.3. Then each $u_j$ is in S, and the vector $v^\perp$, being orthogonal to each member of a basis of S, is in $S^\perp$. This proves the formula for E(v), and the formula for $\|E(v)\|^2$ follows by applying Corollary 3.4 to $v - v^\perp$. Reassembling v, we now have $v = E(v) + v^\perp$, and hence $v^\perp = v - E(v) = (I - E)v$. Finally the decomposition v = E(v) + (I − E)(v) is into orthogonal terms, and the Pythagorean Theorem shows that $\|v\|^2 = \|E(v)\|^2 + \|(I - E)v\|^2$.

Theorem 3.11 (Parseval’s equality). If V is a finite-dimensional inner-product space, then the following conditions on an orthonormal set $\{u_1, \dots, u_m\}$ are equivalent:
(a) $\{u_1, \dots, u_m\}$ is a vector-space basis of V, hence an orthonormal basis,
(b) the only vector orthogonal to all of $u_1, \dots, u_m$ is 0,
(c) $v = \sum_{j=1}^m (v, u_j)u_j$ for all v in V,
(d) $\|v\|^2 = \sum_{j=1}^m |(v, u_j)|^2$ for all v in V,
(e) $(v, w) = \sum_{j=1}^m (v, u_j)\overline{(w, u_j)}$ for all v and w in V.

PROOF. Let $S = \operatorname{span}\{u_1, \dots, u_m\}$, and let E be the orthogonal projection of V on S. If (a) holds, then S = V and $S^\perp = 0$. Thus (b) holds. If (b) holds, then $S^\perp = 0$ and E is the identity. Thus (c) holds by Corollary 3.10. If (c) holds, then Corollary 3.4 shows that (d) holds.

If (d) holds, we use polarization to prove (e). Let k be in {0, 2} if $F = \mathbb{R}$, or in {0, 1, 2, 3} if $F = \mathbb{C}$. Conclusion (d) gives us
\[ \|v + i^k w\|^2 = \sum_{j=1}^m |(v + i^k w, u_j)|^2 = \|v\|^2 + 2\operatorname{Re} \sum_{j=1}^m (v, u_j)\overline{i^k (w, u_j)} + \|w\|^2. \]
Multiplying by $i^k$ and summing over k, we obtain
\[ 4(v, w) = 2 \sum_k i^k \operatorname{Re}\Big((-i)^k \sum_{j=1}^m (v, u_j)\overline{(w, u_j)}\Big). \]


In the proof of polarization, we saw that $2 \sum_k i^k \operatorname{Re}((-i)^k z) = 4z$. Hence $4(v, w) = 4 \sum_{j=1}^m (v, u_j)\overline{(w, u_j)}$. This proves (e).

If (e) holds, we take w = v in (e) and apply Corollary 3.10 to see that $\|E(v)\|^2 = \|v\|^2$ for all v. Then $\|(I - E)v\|^2 = 0$ for all v, and E(v) = v for all v. Hence S = V, and $\{u_1, \dots, u_m\}$ is a basis. This proves (a).

Theorem 3.12 (Riesz Representation Theorem). If $\ell$ is a linear functional on the finite-dimensional inner-product space V, then there exists a unique v in V with $\ell(u) = (u, v)$ for all u in V.

PROOF. Uniqueness is immediate by subtracting two such expressions, since if (u, v) = 0 for all u, then the special case u = v gives (v, v) = 0 and v = 0.

Let us prove existence. If $\ell = 0$, take v = 0. Otherwise let $S = \ker \ell$. Corollary 2.15 shows that dim S = dim V − 1, and Corollary 3.9a then shows that $\dim S^\perp = 1$. Let w be a nonzero vector in $S^\perp$. This vector w must have $\ell(w) \ne 0$ since $S \cap S^\perp = 0$, and we let v be the member of $S^\perp$ given by
\[ v = \frac{\overline{\ell(w)}}{\|w\|^2}\, w. \]
For any u in V, we have $\ell\Big(u - \frac{\ell(u)}{\ell(w)} w\Big) = 0$, and hence $u - \frac{\ell(u)}{\ell(w)} w$ is in S. Since v is in $S^\perp$, $u - \frac{\ell(u)}{\ell(w)} w$ is orthogonal to v. Thus
\[ (u, v) = \Big(\frac{\ell(u)}{\ell(w)} w,\ v\Big) = \frac{\ell(u)}{\ell(w)} \Big(w,\ \frac{\overline{\ell(w)}}{\|w\|^2} w\Big) = \frac{\ell(u)}{\ell(w)} \cdot \frac{\ell(w)}{\|w\|^2}\, \|w\|^2 = \ell(u). \]
This proves existence.
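For $\mathbb{C}^n$ with the standard inner product, the representing vector can be written down directly. The sketch below does not follow the kernel-based construction of the proof; it uses the alternative formula $v = \sum_j \overline{\ell(e_j)}\, e_j$ for the standard basis, which gives $(u, v) = \sum_j \ell(e_j)(u, e_j) = \ell(u)$. The functional `ell` and the helper `riesz_vector` are hypothetical names chosen for illustration.

```python
def inner(u, v):
    # linear in the first variable, conjugate linear in the second
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def riesz_vector(ell, n):
    """Vector v in C^n with ell(u) = (u, v), built from the standard basis."""
    basis = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
    # v = sum_j conj(ell(e_j)) e_j, so the j-th entry of v is conj(ell(e_j))
    return [complex(ell(e)).conjugate() for e in basis]

def ell(u):
    # a hypothetical linear functional on C^3
    return (2 - 1j) * u[0] + 3j * u[1] - u[2]

v = riesz_vector(ell, 3)
u = [1 + 1j, -2, 4j]
print(abs(ell(u) - inner(u, v)) < 1e-12)
```

The conjugate in the formula reflects the fact that the inner product is conjugate linear in its second variable.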

2. Adjoints

Throughout this section, V will denote a finite-dimensional inner-product space with inner product ( · , · ) and with scalars from F, with F equal to $\mathbb{R}$ or $\mathbb{C}$. We shall study aspects of linear maps L : V → V related to the inner product on V. The starting point is to associate to any such L another linear map $L^*$ : V → V known as the “adjoint” of L, and then to investigate some of its properties.

A tool in this investigation will be the scalar-valued function on V × V given by $(u, v) \mapsto (L(u), v)$, which captures the information in any matrix of L without requiring the choice of an ordered basis. This function determines L uniquely because an equality $(L(u), v) = (L'(u), v)$ for all u and v implies $(L(u) - L'(u), v) = 0$ for all u and v, in particular for $v = L(u) - L'(u)$; thus $\|L(u) - L'(u)\|^2 = 0$ and $L(u) = L'(u)$ for all u.


Proposition 3.13. Let L : V → V be a linear map on the finite-dimensional inner-product space V. For each u in V, there exists a unique vector $L^*(u)$ in V such that
\[ (L(v), u) = (v, L^*(u)) \quad \text{for all } v \text{ in } V. \]
As u varies, this formula defines $L^*$ as a linear map from V to V.

REMARK. The linear map $L^*$ : V → V is called the adjoint of L.

PROOF. The function $v \mapsto (L(v), u)$ is a linear functional on V, and Theorem 3.12 shows that it is given by the inner product with a unique vector of V. Thus we define $L^*(u)$ to be the unique vector of V with $(L(v), u) = (v, L^*(u))$ for all v in V. If c is a scalar, then the uniqueness and the computation
\[ (v, L^*(cu)) = (L(v), cu) = \bar{c}(L(v), u) = \bar{c}(v, L^*(u)) = (v, cL^*(u)) \]
yield $L^*(cu) = cL^*(u)$. Similarly the uniqueness and the computation
\[ (v, L^*(u_1 + u_2)) = (L(v), u_1 + u_2) = (L(v), u_1) + (L(v), u_2) = (v, L^*(u_1)) + (v, L^*(u_2)) = (v, L^*(u_1) + L^*(u_2)) \]
yield $L^*(u_1 + u_2) = L^*(u_1) + L^*(u_2)$. Therefore $L^*$ is linear.
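The defining identity $(L(v), u) = (v, L^*(u))$ can be tested numerically for a map $L(v) = Av$ on $\mathbb{C}^2$. The sketch below assumes that, for the standard inner product, $L^*$ is multiplication by the conjugate transpose of A, which is what Proposition 3.15 later establishes; the matrix and vectors are hypothetical sample data.

```python
def inner(u, v):
    # linear in the first variable, conjugate linear in the second
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def conj_transpose(A):
    rows, cols = len(A), len(A[0])
    return [[complex(A[i][j]).conjugate() for i in range(rows)]
            for j in range(cols)]

# Hypothetical sample matrix and vectors.
A = [[1 + 1j, 2], [0, 3 - 1j]]
A_star = conj_transpose(A)
u = [2 - 1j, 1j]
v = [1, 4 + 2j]

lhs = inner(matvec(A, v), u)        # (L(v), u)
rhs = inner(v, matvec(A_star, u))   # (v, L*(u))
print(abs(lhs - rhs) < 1e-12)
```

Applying `conj_transpose` twice returns the original matrix, in line with the identity $L^{**} = L$.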

The passage $L \mapsto L^*$ to the adjoint is a function from $\operatorname{Hom}_F(V, V)$ to itself that is conjugate linear, and it reverses the order of multiplication: $(L_1 L_2)^* = L_2^* L_1^*$. Since the formula $(L(v), u) = (v, L^*(u))$ in the proposition is equivalent to the formula $(u, L(v)) = (L^*(u), v)$, we see that $L^{**} = L$.

All of the results in Section II.3 concerning the association of matrices to linear maps are applicable here, but our interest now will be in what happens when the bases we use are orthonormal. Recall from Section II.3 that if $\Gamma = (u_1, \dots, u_n)$ and $\Delta = (v_1, \dots, v_n)$ are any ordered bases of V, then the matrix $A = \binom{L}{\Delta\Gamma}$ associated to the linear map L : V → V has $A_{ij} = \binom{L(u_j)}{\Delta}_i$.

Lemma 3.14. If L : V → V is a linear map on the finite-dimensional inner-product space V and if $\Gamma = (u_1, \dots, u_n)$ and $\Delta = (v_1, \dots, v_n)$ are ordered orthonormal bases of V, then the matrix $A = \binom{L}{\Delta\Gamma}$ has $A_{ij} = (L(u_j), v_i)$.

PROOF. Applying Theorem 3.11c, we have
\[ A_{ij} = \binom{L(u_j)}{\Delta}_i = \binom{\sum_{i'} (L(u_j), v_{i'})\, v_{i'}}{\Delta}_i = \sum_{i'} (L(u_j), v_{i'}) \binom{v_{i'}}{\Delta}_i = \sum_{i'} (L(u_j), v_{i'})\, \delta_{ii'} = (L(u_j), v_i). \]


Proposition 3.15. If L : V → V is a linear map on the finite-dimensional inner-product space V and if $\Gamma = (u_1, \dots, u_n)$ and $\Delta = (v_1, \dots, v_n)$ are ordered orthonormal bases of V, then the matrices $A = \binom{L}{\Delta\Gamma}$ and $A^* = \binom{L^*}{\Gamma\Delta}$ of L and its adjoint are related by $A^*_{ij} = \overline{A_{ji}}$.

PROOF. Lemma 3.14 and the definition of $L^*$ give $A^*_{ij} = (L^*(v_j), u_i) = (v_j, L(u_i)) = \overline{(L(u_i), v_j)} = \overline{A_{ji}}$.

Accordingly, we define $A^* = \overline{A}^{\,t}$ for any square matrix A, sometimes calling $A^*$ the adjoint6 of A.

A linear map L : V → V is called self-adjoint if $L^* = L$. Correspondingly a square matrix A is self-adjoint if $A^* = A$. It is more common, however, to say that a matrix with $A^* = A$ is symmetric if $F = \mathbb{R}$ or Hermitian7 if $F = \mathbb{C}$. A real Hermitian matrix is symmetric, and the term “Hermitian” is thus applicable also when $F = \mathbb{R}$.

Any Hermitian matrix A arises from a self-adjoint linear map L. Namely, we take V to be $F^n$ with the usual inner product, and we let Γ and Δ each be the standard ordered basis $\Sigma = (e_1, \dots, e_n)$. This basis is orthonormal, and we define L by the matrix product L(v) = Av for any column vector v. We know that $\binom{L}{\Sigma\Sigma} = A$. Since $A^* = A$, we conclude from Proposition 3.15 that $L^* = L$. Thus we are free to deduce properties of Hermitian matrices from properties of self-adjoint linear maps.

Self-adjoint linear maps will be of special interest to us. Nontrivial examples of self-adjoint linear maps, constructed without simply writing down Hermitian matrices, may be produced by the following proposition.

Proposition 3.16. If V is a finite-dimensional inner-product space and S is a vector subspace of V, then the orthogonal projection E : V → V of V on S is self-adjoint.

PROOF. Let $v = v_1 + v_2$ and $u = u_1 + u_2$ be the decompositions of two members of V according to $V = S \oplus S^\perp$. Then we have
\[ (v, E^*(u)) = (E(v), u) = (v_1, u_1 + u_2) = (v_1, u_1) = (v, u_1) = (v, E(u)), \]
and the proposition follows by the uniqueness in Proposition 3.13.

6 The name “adjoint” happens to coincide with the name for a different notion that arose in connection with Cramer’s rule in Section II.7. The two notions never seem to arise at the same time, and thus no confusion need occur.
7 The term “Hermitian” is used also for a class of linear maps in the infinite-dimensional case, but care is needed because the terms “Hermitian” and “self-adjoint” mean different things in the infinite-dimensional case.


To understand Proposition 3.16 in terms of matrices, take an ordered orthonormal basis $(u_1, \dots, u_r)$ of S, and extend it to an ordered orthonormal basis $\Gamma = (u_1, \dots, u_n)$ of V. Then
\[ E(u_j) = \begin{cases} u_j & \text{for } j \le r, \\ 0 & \text{for } j > r, \end{cases} \]
and hence $\binom{E(u_j)}{\Gamma}$ equals the j th standard basis vector $e_j$ if j ≤ r and equals 0 if j > r. Consequently the matrix $\binom{E}{\Gamma\Gamma}$ is diagonal with 1’s in the first r diagonal entries and 0’s elsewhere. This matrix is equal to its conjugate transpose, as it must be according to Propositions 3.15 and 3.16.

Proposition 3.17. If V is a finite-dimensional inner-product space and L : V → V is a self-adjoint linear map, then (L(v), v) is in $\mathbb{R}$ for every v in V, and consequently every eigenvalue of L is in $\mathbb{R}$. Conversely if $F = \mathbb{C}$ and if L : V → V is a linear map such that (L(v), v) is in $\mathbb{R}$ for every v in V, then L is self-adjoint.

REMARK. The hypothesis $F = \mathbb{C}$ is essential in the converse. In fact, the 90° rotation L of $\mathbb{R}^2$ whose matrix in the standard basis is $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ is not self-adjoint but does have L(v) · v = 0 for every v in $\mathbb{R}^2$.

PROOF. If $L = L^*$, then $(L(v), v) = (v, L^*(v)) = (v, L(v)) = \overline{(L(v), v)}$, and hence (L(v), v) is real-valued. If v is an eigenvector with eigenvalue λ, then substitution of L(v) = λv into $(L(v), v) = \overline{(L(v), v)}$ gives $\lambda\|v\|^2 = \bar{\lambda}\|v\|^2$. Since v ≠ 0, λ must be real.

For the converse we begin with the special case that (L(w), w) = 0 for all w. For 0 ≤ k ≤ 3, we then have
\[ (-i)^k (L(u), v) + i^k (L(v), u) = (L(u + i^k v), u + i^k v) - (L(u), u) - (L(v), v) = 0. \]
Taking k = 0 gives (L(u), v) + (L(v), u) = 0, while taking k = 1 gives (L(u), v) − (L(v), u) = 0. Hence (L(u), v) = 0 for all u and v. Since the function $(u, v) \mapsto (L(u), v)$ determines L, we obtain L = 0.

In the general case, (L(v), v) real-valued implies that $(L(v), v) = (L^*(v), v)$ for all v. Therefore $((L - L^*)(v), v) = 0$ for all v, and the special case shows that $L - L^* = 0$. This completes the proof.

We conclude this section by examining one further class of linear maps having a special relationship with their adjoints.


Proposition 3.18. If V is a finite-dimensional inner-product space, then the following conditions on a linear map L : V → V are equivalent:
(a) $L^* L = I$,
(b) L carries some orthonormal basis of V to an orthonormal basis,
(c) L carries each orthonormal basis of V to an orthonormal basis,
(d) (L(u), L(v)) = (u, v) for all u and v in V,
(e) $\|L(v)\| = \|v\|$ for all v in V.

REMARK. A linear map satisfying these equivalent conditions is said to be orthogonal if $F = \mathbb{R}$ and unitary if $F = \mathbb{C}$.

PROOF. We prove that (a), (d), and (e) are equivalent and that (b), (c), and (d) are equivalent.

If (a) holds and u and v are given in V, then $(L(u), L(v)) = (L^* L(u), v) = (I(u), v) = (u, v)$, and (d) holds. If (d) holds, then setting u = v shows that (e) holds. If (e) holds, we use polarization twice to write
\[ (L(u), L(v)) = \sum_k \tfrac{1}{4} i^k \|L(u) + i^k L(v)\|^2 = \sum_k \tfrac{1}{4} i^k \|L(u + i^k v)\|^2 = \sum_k \tfrac{1}{4} i^k \|u + i^k v\|^2 = (u, v). \]
Then $((L^* L - I)(u), v) = 0$ for all u and v, and we conclude that (a) holds.

Since (b) is a special case of (c) and (c) is a special case of (d), proving that (b) implies (d) will prove that (b), (c), and (d) are equivalent. Thus let $\{u_1, \dots, u_n\}$ be an orthonormal basis of V such that $\{L(u_1), \dots, L(u_n)\}$ is an orthonormal basis, and let u and v be given. Then
\[
\begin{aligned}
(L(u), L(v)) &= \Big(L\Big(\sum_i (u, u_i)u_i\Big),\ L\Big(\sum_j (v, u_j)u_j\Big)\Big) \\
&= \sum_{i,j} (u, u_i)\overline{(v, u_j)}\,(L(u_i), L(u_j)) \\
&= \sum_{i,j} (u, u_i)\overline{(v, u_j)}\,\delta_{ij} = \sum_i (u, u_i)\overline{(v, u_i)} = (u, v),
\end{aligned}
\]
the last equality following from Parseval’s equality (Theorem 3.11).
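Conditions (d) and (e) of Proposition 3.18 can be observed numerically. The sketch below tests two matrices acting on $\mathbb{C}^2$ with the standard inner product: a real rotation matrix and a complex matrix whose columns are orthonormal. Both matrices, the angle `theta`, and the helper `preserves_inner_product` are hypothetical choices made for illustration.

```python
import math

def inner(u, v):
    # linear in the first variable, conjugate linear in the second
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def preserves_inner_product(A, u, v, tol=1e-12):
    # condition (d) for one pair of vectors: (A u, A v) = (u, v)
    return abs(inner(matvec(A, u), matvec(A, v)) - inner(u, v)) < tol

theta = 0.7   # arbitrary angle
R = [[math.cos(theta), math.sin(theta)],
     [-math.sin(theta), math.cos(theta)]]    # orthogonal
U = [[0.6, 0.8j],
     [0.8j, 0.6]]                            # columns are orthonormal in C^2

u = [1 + 2j, -1j]
v = [3, 2 - 1j]
print(preserves_inner_product(R, u, v))
print(preserves_inner_product(U, u, v))
print(preserves_inner_product(U, u, u))   # condition (e): norms are preserved
```

A check on a single pair of vectors is only a spot test; the proposition asserts the identity for all u and v.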

As with self-adjointness, we use the geometrically meaningful definition for linear maps to obtain a definition for matrices: a square matrix A with $A^* A = I$ is said to be orthogonal if $F = \mathbb{R}$ and unitary if $F = \mathbb{C}$. The condition is that A is invertible and its inverse equals its adjoint. In terms of individual entries, the condition is that $\sum_k A^*_{ik} A_{kj} = \delta_{ij}$, hence that $\sum_k \overline{A_{ki}} A_{kj} = \delta_{ij}$. This is the condition that the columns of A form an orthonormal basis relative to the usual inner product on $\mathbb{R}^n$ or $\mathbb{C}^n$. A real unitary matrix is orthogonal.

If A is an orthogonal or unitary matrix, we can construct a corresponding orthogonal or unitary linear map on $\mathbb{R}^n$ or $\mathbb{C}^n$ relative to the standard ordered basis Σ. Namely, we define L(v) = Av, and Proposition 3.15 shows that L is orthogonal or unitary: $L^* L(v) = A^* A v = I v = v$. Proposition 3.19 below gives a converse.

Let us notice that an orthogonal or unitary matrix A necessarily has |det A| = 1. In fact, the formula $A^* = (\overline{A})^t$ implies that $\det A^* = \overline{\det A}$. Then
\[ 1 = \det I = \det A^* A = \det A^* \det A = \overline{\det A}\, \det A = |\det A|^2. \]
An orthogonal matrix thus has determinant ±1, while we conclude for a unitary matrix only that the determinant is a complex number of absolute value 1.

EXAMPLES.
(1) The 2-by-2 orthogonal matrices of determinant +1 are all matrices of the form $\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$. The 2-by-2 orthogonal matrices of determinant −1 are the product of $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ and the 2-by-2 orthogonal matrices of determinant +1.
(2) The 2-by-2 unitary matrices of determinant +1 are all matrices of the form $\begin{pmatrix} \alpha & \beta \\ -\bar{\beta} & \bar{\alpha} \end{pmatrix}$ with $|\alpha|^2 + |\beta|^2 = 1$; these may be regarded as parametrizing the points of the unit sphere $S^3$ of $\mathbb{R}^4$. The 2-by-2 unitary matrices of arbitrary determinant are the products of all matrices $\begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{pmatrix}$ and the 2-by-2 unitary matrices of determinant +1.

Proposition 3.19. If V is a finite-dimensional inner-product space, if $\Gamma = (u_1, \dots, u_n)$ and $\Delta = (v_1, \dots, v_n)$ are ordered orthonormal bases of V, and if L : V → V is a linear map that is orthogonal if $F = \mathbb{R}$ and unitary if $F = \mathbb{C}$, then the matrix $A = \binom{L}{\Delta\Gamma}$ is orthogonal or unitary.

PROOF. Proposition 3.15 and Theorem 2.16 give
\[ A^* A = \binom{L^*}{\Gamma\Delta} \binom{L}{\Delta\Gamma} = \binom{L^* L}{\Gamma\Gamma} = \binom{I}{\Gamma\Gamma}, \]
and the right side is the identity matrix, as required.

One consequence of Proposition 3.19 is that any matrix $\binom{I}{\Delta\Gamma}$ relative to two ordered orthonormal bases is orthogonal or unitary, since the identity function I : V → V is certainly orthogonal or unitary. Thus a change from writing the matrix of a linear map L in one ordered orthonormal basis Γ to writing the matrix of L in another ordered orthonormal basis Δ is implemented by the formula
\[ \binom{L}{\Delta\Delta} = C^{-1} \binom{L}{\Gamma\Gamma} C, \]
where C is the orthogonal or unitary matrix $\binom{I}{\Gamma\Delta}$.


Another consequence of Proposition 3.19 is that the matrix $\binom{L}{\Gamma\Gamma}$ of an orthogonal or unitary linear map L in an ordered orthonormal basis Γ is an orthogonal or unitary matrix. We have defined det L to be the determinant of $\binom{L}{\Gamma\Gamma}$ relative to any Γ, and we conclude that |det L| = 1.

3. Spectral Theorem

In this section we deal with the geometric structure of certain kinds of linear maps from finite-dimensional inner-product spaces into themselves. We shall see that linear maps that are self-adjoint or unitary, among other possible conditions, have bases of eigenvectors in the sense of Section II.8. Moreover, such a basis may be taken to be orthonormal. When an ordered basis of eigenvectors is used for expressing the linear map as a matrix, the result is that the matrix is diagonal. Thus these linear maps have an especially uncomplicated structure. In terms of matrices, the result is that a Hermitian or unitary matrix A is similar to a diagonal matrix D, and the matrix C with $D = C^{-1}AC$ may be taken to be unitary. We begin with a lemma.

Lemma 3.20. If L : V → V is a self-adjoint linear map on an inner-product space V, then $v \mapsto (L(v), v)$ is real-valued, every eigenvalue of L is real, eigenvectors under L for distinct eigenvalues are orthogonal, and every vector subspace S of V with L(S) ⊆ S has $L(S^\perp) \subseteq S^\perp$.

PROOF. The first two conclusions are contained in Proposition 3.17. If $v_1$ and $v_2$ are eigenvectors of L with distinct real eigenvalues $\lambda_1$ and $\lambda_2$, then
\[ (\lambda_1 - \lambda_2)(v_1, v_2) = (\lambda_1 v_1, v_2) - (v_1, \lambda_2 v_2) = (L(v_1), v_2) - (v_1, L(v_2)) = 0. \]
Since $\lambda_1 \ne \lambda_2$, we must have $(v_1, v_2) = 0$. If S is a vector subspace with L(S) ⊆ S, then also $L(S^\perp) \subseteq S^\perp$ because s ∈ S and $s^\perp \in S^\perp$ together imply $0 = (L(s), s^\perp) = (s, L(s^\perp))$.

Theorem 3.21 (Spectral Theorem). Let L : V → V be a self-adjoint linear map on an inner-product space V. Then V has an orthonormal basis of eigenvectors of L. In addition, for each scalar λ, let
\[ V_\lambda = \{v \in V \mid L(v) = \lambda v\}, \]
so that $V_\lambda$ when nonzero is the eigenspace of L for the eigenvalue λ. Then the eigenvalues of L are all real, the vector subspaces $V_\lambda$ are mutually orthogonal, and any orthonormal basis of V of eigenvectors of L is the union of orthonormal bases of the $V_\lambda$’s. Correspondingly if A is any Hermitian n-by-n matrix, then there exists a unitary matrix C such that $C^{-1}AC$ is diagonal with real entries. If the matrix A has real entries, then C may be taken to be an orthogonal matrix.

PROOF. Lemma 3.20 shows that the eigenvalues of L are all real and that the vector subspaces $V_\lambda$ are mutually orthogonal.

To proceed further, we first assume that $F = \mathbb{C}$. Applying the Fundamental Theorem of Algebra (Theorem 1.18) to the characteristic polynomial of L, we see that L has at least one eigenvalue, say $\lambda_1$. Then $L(V_{\lambda_1}) \subseteq V_{\lambda_1}$, and Lemma 3.20 shows that $L((V_{\lambda_1})^\perp) \subseteq (V_{\lambda_1})^\perp$. The vector subspace $(V_{\lambda_1})^\perp$ is an inner-product space, and the claim is that $L\big|_{(V_{\lambda_1})^\perp}$ is self-adjoint. In fact, if $v_1$ and $v_2$ are in $(V_{\lambda_1})^\perp$, then
\[ \Big(\big(L\big|_{(V_{\lambda_1})^\perp}\big)^*(v_1),\ v_2\Big) = \Big(v_1,\ L\big|_{(V_{\lambda_1})^\perp}(v_2)\Big) = (v_1, L(v_2)) = (L(v_1), v_2) = \Big(L\big|_{(V_{\lambda_1})^\perp}(v_1),\ v_2\Big), \]
and the claim is proved. Since $\lambda_1$ is an eigenvalue of L, $\dim (V_{\lambda_1})^\perp < \dim V$. Therefore we can now set up an induction that ultimately exhibits V as an orthogonal direct sum $V = V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_k}$. If v is an eigenvector of L with eigenvalue $\lambda'$, then either $\lambda' = \lambda_j$ for some j in this decomposition, in which case v is in $V_{\lambda_j}$, or $\lambda'$ is not equal to any $\lambda_j$, in which case v, by the lemma, is orthogonal to all vectors in $V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_k}$, hence to all vectors in V; being orthogonal to all vectors in V, v must be 0. Choosing an orthonormal basis for each $V_{\lambda_j}$ and taking their union provides an orthonormal basis of eigenvectors and completes the proof for L when $F = \mathbb{C}$.

Next assume that A is a Hermitian n-by-n matrix. We define a linear map L : $\mathbb{C}^n \to \mathbb{C}^n$ by L(v) = Av, and we know from Proposition 3.15 that L is self-adjoint. The case just proved shows that L has an ordered orthonormal basis Γ of eigenvectors, all the

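eigenvalues being real. If $\Gamma$ denotes this ordered basis and $\Sigma$ denotes the standard ordered basis of $\mathbb{C}^n$, then the matrix $D = \begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix}$ of L relative to $\Gamma$ is diagonal with real entries and is equal to

$$D = \begin{pmatrix} I \\ \Gamma\Sigma \end{pmatrix} \begin{pmatrix} L \\ \Sigma\Sigma \end{pmatrix} \begin{pmatrix} I \\ \Sigma\Gamma \end{pmatrix} = C^{-1}AC,$$

where $C = \begin{pmatrix} I \\ \Sigma\Gamma \end{pmatrix}$. The matrix C is unitary by Proposition 3.19, and the formula $D = C^{-1}AC$ shows that A is as asserted.

Now let us return to L and suppose that $\mathbb{F} = \mathbb{R}$. The idea is to use the same argument as above in the case that $\mathbb{F} = \mathbb{C}$, but we need a substitute for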

the use of the Fundamental Theorem of Algebra. Fixing any orthonormal basis of V, let A be the matrix of L. Then A is Hermitian with real entries. The previous paragraph shows that any Hermitian matrix, whether or not real, has a characteristic polynomial that splits as a product $\prod_{j=1}^{m} (\lambda - r_j)^{m_j}$ with all $r_j$ real. Consequently L has this property as well. Thus any self-adjoint L when $\mathbb{F} = \mathbb{R}$ has an eigenvalue. Returning to the argument for L above when $\mathbb{F} = \mathbb{C}$, we readily see that it now applies when $\mathbb{F} = \mathbb{R}$.

Finally if A is a Hermitian matrix with real entries, then we can define a self-adjoint linear map $L : \mathbb{R}^n \to \mathbb{R}^n$ by $L(v) = Av$, obtain an orthonormal basis of eigenvectors for L, and argue as above to obtain $D = C^{-1}AC$, where D is diagonal and C is unitary. The matrix C has columns that are eigenvectors in $\mathbb{R}^n$ of the associated L, and these have real entries. Thus C is orthogonal.

An important application of the Spectral Theorem is to the formation of a square root for any “positive semidefinite” linear map. We say that a linear map $L : V \to V$ on a finite-dimensional inner-product space is positive semidefinite if $L^* = L$ and $(L(v), v) \ge 0$ for all v in V. If $\mathbb{F} = \mathbb{C}$, then the condition $L^* = L$ is redundant, according to Proposition 3.17, but that fact will not be important for us. Similarly an n-by-n matrix A is positive semidefinite if $A^* = A$ and $\bar{x}^t A x \ge 0$ for all column vectors x.

An example of a positive semidefinite n-by-n matrix is any matrix $A = B^*B$, where B is an arbitrary k-by-n matrix. In fact, if x is in $\mathbb{F}^n$, then $\bar{x}^t B^* B x = \overline{(Bx)}^t (Bx)$, and the right side is $\ge 0$, being a sum of absolute values squared.

Corollary 3.22. Let $L : V \to V$ be a self-adjoint linear map on a finite-dimensional inner-product space, and let A be an n-by-n Hermitian matrix. Then
(a) L or A is positive semidefinite if and only if all of its eigenvalues are $\ge 0$.
(b) whenever L or A is positive semidefinite, L or A is invertible if and only if $(L(v), v) > 0$ for all $v \ne 0$ or $\bar{x}^t A x > 0$ for all $x \ne 0$.
(c) whenever L or A is positive semidefinite, L or A has a unique positive semidefinite square root.

REMARKS. A positive semidefinite linear map or matrix satisfying the condition in (b) is said to be positive definite, and the content of (b) is that a positive semidefinite linear map or matrix is positive definite if and only if it is invertible.

PROOF. We apply the Spectral Theorem (Theorem 3.21). For each conclusion the result for a matrix A is a special case of the result for the linear map L, and it is enough to treat only L.

In (a), let $(u_1, \dots, u_n)$ be an ordered orthonormal basis of eigenvectors with respective eigenvalues $\lambda_1, \dots, \lambda_n$, not necessarily distinct. Then $(L(u_j), u_j) = \lambda_j$ shows the necessity of having $\lambda_j \ge 0$, while the computation

$$(L(v), v) = \Big(L\Big(\sum_i (v, u_i)u_i\Big), \sum_j (v, u_j)u_j\Big) = \Big(\sum_i \lambda_i (v, u_i)u_i, \sum_j (v, u_j)u_j\Big) = \sum_i \lambda_i |(v, u_i)|^2$$

shows the sufficiency.

In (b), if L fails to be invertible, then 0 is an eigenvalue for some eigenvector $v \ne 0$, and v has $(L(v), v) = 0$. Conversely if L is invertible, then 0 is not an eigenvalue, so that all the eigenvalues $\lambda_i$ are $> 0$ by (a), and the computation in (a) yields

$$(L(v), v) = \sum_i \lambda_i |(v, u_i)|^2 \ge \Big(\min_j \lambda_j\Big) \sum_i |(v, u_i)|^2 = \Big(\min_j \lambda_j\Big) \|v\|^2,$$

the last step following from Parseval's equality (Theorem 3.11).

For existence in (c), the Spectral Theorem says that there exists an ordered orthonormal basis $\Gamma = (u_1, \dots, u_n)$ of eigenvectors of L, say with respective eigenvalues $\lambda_1, \dots, \lambda_n$. The eigenvalues are all $\ge 0$ by (a). The linear extension of the function P with $P(u_j) = \lambda_j^{1/2} u_j$ is given by

$$P(v) = \sum_{j=1}^{n} \lambda_j^{1/2} (v, u_j) u_j,$$

and it has

$$P^2(v) = \sum_j \lambda_j (v, u_j) u_j = \sum_j (v, u_j) L(u_j) = L\Big(\sum_j (v, u_j) u_j\Big) = L(v).$$

Thus $P^2 = L$. Relative to $\Gamma$, the $(i, j)$th entry of the matrix of P is the $i$th coordinate of $P(u_j) = (P(u_j), u_1)u_1 + \cdots + (P(u_j), u_n)u_n$, namely $(P(u_j), u_i) = \lambda_j^{1/2} \delta_{ij}$, and this is a Hermitian matrix; Proposition 3.15 therefore shows that $P^* = P$. Finally

$$(P(v), v) = \Big(\sum_i \lambda_i^{1/2} (v, u_i)u_i, \sum_j (v, u_j)u_j\Big) = \sum_i \lambda_i^{1/2} |(v, u_i)|^2 \ge 0,$$

and thus P is positive semidefinite. This proves existence.

For uniqueness in (c), let P satisfy $P^* = P$ and $P^2 = L$, and suppose P is positive semidefinite. Choose an orthonormal basis of eigenvectors $u_1, \dots, u_n$ of P, say with eigenvalues $c_1, \dots, c_n$, all $\ge 0$. Then $L(u_j) = P^2(u_j) = c_j^2 u_j$, and we see that $u_1, \dots, u_n$ form an orthonormal basis of eigenvectors of L with eigenvalues $c_j^2$. On the space where L acts as the scalar $\lambda_i$, P must therefore act as the scalar $\lambda_i^{1/2}$. We conclude that P is unique.
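As a numerical illustration of the existence argument in (c), the following Python sketch builds the positive semidefinite square root of a real symmetric 2-by-2 matrix from an orthonormal basis of eigenvectors, exactly as $P(v) = \sum_j \lambda_j^{1/2}(v, u_j)u_j$. The function names are hypothetical, and the closed-form rotation angle is an assumption that is special to the 2-by-2 real case; it merely stands in for the Spectral Theorem.

```python
import math


def symmetric_2x2_eigen(a, b, d):
    """Eigenpairs of the real symmetric matrix [[a, b], [b, d]].

    Returns [(lam1, u1), (lam2, u2)] with (u1, u2) an orthonormal basis of R^2.
    (Hypothetical helper; the angle formula is specific to the 2-by-2 case.)
    """
    # A rotation by theta with tan(2*theta) = 2b / (a - d) removes the off-diagonal entry.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    u1, u2 = (c, s), (-s, c)
    lam1 = a * c * c + 2.0 * b * c * s + d * s * s  # (A u1, u1)
    lam2 = a * s * s - 2.0 * b * c * s + d * c * c  # (A u2, u2)
    return [(lam1, u1), (lam2, u2)]


def psd_sqrt_2x2(a, b, d, tol=1e-12):
    """Positive semidefinite square root of [[a, b], [b, d]], as a list of rows."""
    p = [[0.0, 0.0], [0.0, 0.0]]
    for lam, u in symmetric_2x2_eigen(a, b, d):
        if lam < -tol:
            raise ValueError("matrix is not positive semidefinite")
        root = math.sqrt(max(lam, 0.0))  # clamp tiny negative rounding errors to 0
        # P = sum_j lam_j^{1/2} u_j u_j^t, the matrix form of P(v) = sum_j lam_j^{1/2} (v, u_j) u_j
        for i in range(2):
            for j in range(2):
                p[i][j] += root * u[i] * u[j]
    return p


def matmul_2x2(x, y):
    """Product of two 2-by-2 matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
```

For instance, `psd_sqrt_2x2(2.0, 1.0, 2.0)` returns a symmetric matrix P for which `matmul_2x2(P, P)` agrees with [[2, 1], [1, 2]] up to rounding.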


The technique of proof of (c) allows one, more generally, to define $f(L)$ for any function $f : \mathbb{R} \to \mathbb{C}$ whenever L is self-adjoint. Actually, the function f needs to be defined only on the set of eigenvalues of L for the definition to make sense.

At the end of this section, we shall use the existence of the square root in (c) to obtain the so-called “polar decomposition” of square matrices. But before doing that, let us mine three additional easy consequences of the Spectral Theorem. The first deals with several self-adjoint linear maps rather than one, and the other two apply that conclusion to deal with single linear maps that are not necessarily self-adjoint.

Corollary 3.23. Let V be a finite-dimensional inner-product space, and let $L_1, \dots, L_m$ be self-adjoint linear maps from V to V that commute in the sense that $L_i L_j = L_j L_i$ for all i and j. Then V has an orthonormal basis of simultaneous eigenvectors of $L_1, \dots, L_m$. In addition, for each m-tuple of scalars $\lambda_1, \dots, \lambda_m$, let

$$V_{\lambda_1, \dots, \lambda_m} = \{v \in V \mid L_j(v) = \lambda_j v \text{ for } 1 \le j \le m\}$$

consist of 0 and the simultaneous eigenvectors of $L_1, \dots, L_m$ corresponding to $\lambda_1, \dots, \lambda_m$. Then all the eigenvalues $\lambda_j$ are real, the vector subspaces $V_{\lambda_1, \dots, \lambda_m}$ are mutually orthogonal, and any orthonormal basis of V of simultaneous eigenvectors of $L_1, \dots, L_m$ is the union of orthonormal bases of the $V_{\lambda_1, \dots, \lambda_m}$'s. Correspondingly if $A_1, \dots, A_m$ are commuting Hermitian n-by-n matrices, then there exists a unitary matrix C such that $C^{-1} A_j C$ is diagonal with real entries for all j. If all the matrices $A_j$ have real entries, then C may be taken to be an orthogonal matrix.

PROOF. This follows by iterating the Spectral Theorem (Theorem 3.21). In fact, let $\{V_{\lambda_1}\}$ be the system of vector subspaces produced by the theorem for $L_1$. For each i, the commutativity of the linear maps $L_i$ forces

$$L_1(L_i(v)) = L_i(L_1(v)) = L_i(\lambda_1 v) = \lambda_1 L_i(v) \qquad \text{for } v \in V_{\lambda_1},$$

and thus $L_i(V_{\lambda_1}) \subseteq V_{\lambda_1}$. The restrictions of $L_1, \dots, L_m$ to $V_{\lambda_1}$ are self-adjoint and commute. Let $\{V_{\lambda_1, \lambda_2}\}$ be the system of vector subspaces produced by the Spectral Theorem for $L_2\big|_{V_{\lambda_1}}$. Each of these, by the commutativity, is carried into itself by $L_3, \dots, L_m$, and the restrictions of $L_3, \dots, L_m$ to $V_{\lambda_1, \lambda_2}$ form a commuting family of self-adjoint linear maps. Continuing in this way, we arrive at the decomposition asserted by the corollary for $L_1, \dots, L_m$. The assertion of the corollary about commuting Hermitian matrices is a special case, in the same way that the assertions in Theorem 3.21 about matrices were special cases of the assertions about linear maps.


A linear map $L : V \to V$, not necessarily self-adjoint, is said to be normal if L commutes with its adjoint: $LL^* = L^*L$.

Corollary 3.24. Suppose that $\mathbb{F} = \mathbb{C}$, and let $L : V \to V$ be a normal linear map on the finite-dimensional inner-product space V. Then V has an orthonormal basis of eigenvectors of L. In addition, for each complex scalar $\lambda$, let

$$V_\lambda = \{v \in V \mid L(v) = \lambda v\},$$

so that $V_\lambda$ when nonzero is the eigenspace of L for the eigenvalue $\lambda$. Then the vector subspaces $V_\lambda$ are mutually orthogonal, and any orthonormal basis of V of eigenvectors of L is the union of orthonormal bases of the $V_\lambda$'s. Correspondingly if A is any n-by-n complex matrix such that $AA^* = A^*A$, then there exists a unitary matrix C such that $C^{-1}AC$ is diagonal.

REMARK. The corollary fails if $\mathbb{F} = \mathbb{R}$: for the linear map $L : \mathbb{R}^2 \to \mathbb{R}^2$ with $L(v) = Av$ and $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, $L^* = L^{-1}$ commutes with L, but L has no eigenvectors in $\mathbb{R}^2$ since the characteristic polynomial $\lambda^2 + 1$ has no first-degree factors with real coefficients.

PROOF. The point is that $L = \frac{1}{2}(L + L^*) + i\,\frac{1}{2i}(L - L^*)$ and that $\frac{1}{2}(L + L^*)$ and $\frac{1}{2i}(L - L^*)$ are self-adjoint. If L commutes with $L^*$, then $T_1 = \frac{1}{2}(L + L^*)$ and $T_2 = \frac{1}{2i}(L - L^*)$ commute with each other. We apply Corollary 3.23 to the commuting self-adjoint linear maps $T_1$ and $T_2$. The vector subspace $V_{\alpha, \beta}$ produced by Corollary 3.23 coincides with the vector subspace $V_{\alpha + i\beta}$ defined in the present corollary, and the result for L follows. The result for matrices is a special case.

Corollary 3.25. Suppose that $\mathbb{F} = \mathbb{C}$, and let $L : V \to V$ be a unitary linear map on the finite-dimensional inner-product space V. Then V has an orthonormal basis of eigenvectors of L. In addition, for each complex scalar $\lambda$, let

$$V_\lambda = \{v \in V \mid L(v) = \lambda v\},$$

so that $V_\lambda$ when nonzero is the eigenspace of L for the eigenvalue $\lambda$. Then the eigenvalues of L all have absolute value 1, the vector subspaces $V_\lambda$ are mutually orthogonal, and any orthonormal basis of V of eigenvectors of L is the union of orthonormal bases of the $V_\lambda$'s. Correspondingly if A is any n-by-n unitary matrix, then there exists a unitary matrix C such that $C^{-1}AC$ is diagonal; the diagonal entries of $C^{-1}AC$ all have absolute value 1.

PROOF. This is a special case of Corollary 3.24 since a unitary linear map L has $LL^* = I = L^*L$. The eigenvalues all have absolute value 1 as a consequence of Proposition 3.18e.
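The matrix in the remark after Corollary 3.24 can be examined concretely. The Python sketch below checks that it is normal, that it has no real eigenvalue, and that over the complex numbers it has an orthonormal basis of eigenvectors with eigenvalues $i$ and $-i$, both of absolute value 1 as Corollary 3.25 predicts for a unitary matrix. The helper names are hypothetical, the exact equality test in `is_normal` is an assumption that is suitable only for small integer matrices such as this one, and the inner product is taken linear in the first variable and conjugate linear in the second.

```python
import math

# The 2-by-2 matrix from the remark: a rotation through a right angle.
A = [[0, 1], [-1, 0]]


def adjoint(m):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Product of two square matrices of the same size."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matvec(m, v):
    """Product of a matrix with a column vector."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


def inner(u, v):
    """Inner product on C^n: linear in u, conjugate linear in v."""
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))


def is_normal(m):
    """Exact test of m m* == m* m (reliable for small integer entries only)."""
    return matmul(m, adjoint(m)) == matmul(adjoint(m), m)


def has_real_eigenvalue_2x2(m):
    """A real 2-by-2 matrix has a real eigenvalue iff (trace)^2 - 4 det >= 0."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return tr * tr - 4 * det >= 0


s = 1 / math.sqrt(2)
u_plus = [s, s * 1j]    # unit eigenvector for the eigenvalue i
u_minus = [s, -s * 1j]  # unit eigenvector for the eigenvalue -i
```

Here `is_normal(A)` holds while `has_real_eigenvalue_2x2(A)` fails, and `matvec(A, u_plus)` agrees with `1j` times `u_plus`, so the failure over the reals disappears once complex scalars are allowed.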


Now we come to the polar decomposition of linear maps and of matrices. When $\mathbb{F} = \mathbb{C}$, this is a generalization of the polar decomposition $z = e^{i\theta} r$ of complex numbers. When $\mathbb{F} = \mathbb{R}$, it generalizes the decomposition $x = (\operatorname{sgn} x)|x|$ of real numbers.

Theorem 3.26 (polar decomposition). If $L : V \to V$ is a linear map on a finite-dimensional inner-product space, then L decomposes as $L = UP$, where P is positive semidefinite and U is orthogonal if $\mathbb{F} = \mathbb{R}$ and unitary if $\mathbb{F} = \mathbb{C}$. The linear map P is unique, and U is unique if L is invertible. Correspondingly any n-by-n matrix A decomposes as $A = UP$, where P is a positive semidefinite matrix and U is an orthogonal matrix if $\mathbb{F} = \mathbb{R}$ and a unitary matrix if $\mathbb{F} = \mathbb{C}$. The matrix P is unique, and U is unique if A is invertible.

REMARKS. As we have already seen in other situations, the motivation for the proof comes from the uniqueness.

PROOF OF UNIQUENESS. Let $L = UP = U'P'$. Then $L^*L = P^2 = P'^2$. The linear map $L^*L$ is positive semidefinite since its adjoint is $(L^*L)^* = L^*L^{**} = L^*L$ and since $(L^*L(v), v) = (L(v), L(v)) \ge 0$. Therefore Corollary 3.22c shows that $L^*L$ has a unique positive semidefinite square root. Hence $P = P'$. If L is invertible, then P is invertible and $L = UP$ implies that $U = LP^{-1}$. The same argument applies in the case of matrices.

PROOF OF EXISTENCE. If L is given, then we have just seen that $L^*L$ is positive semidefinite. Let P be its unique positive semidefinite square root. The proof is clearer when L is invertible, and we consider that case first. Then we can set $U = LP^{-1}$. Since $U^* = (P^{-1})^* L^* = P^{-1}L^*$, we find that $U^*U = P^{-1}L^*LP^{-1} = P^{-1}P^2P^{-1} = I$, and we conclude that U is unitary.

When L is not necessarily invertible, we argue a little differently with the positive semidefinite square root P of $L^*L$. The kernel K of P is the 0 eigenspace of P, and the Spectral Theorem (Theorem 3.21) shows that the image of P is the sum of all the other eigenspaces and is just $K^\perp$.
Since $K \cap K^\perp = 0$, P is one-one from $K^\perp$ onto itself. Thus $P(v) \mapsto L(v)$ is a one-one linear map from $K^\perp$ into V. Call this function U, so that $U(P(v)) = L(v)$. For any $v_1$ and $v_2$ in V, we have

$$(L(v_1), L(v_2)) = (L^*L(v_1), v_2) = (P^2(v_1), v_2) = (P(v_1), P(v_2)), \qquad (*)$$

and hence $U : K^\perp \to V$ preserves inner products. Let $\{u_1, \dots, u_k\}$ be an orthonormal basis of $K^\perp$, and let $\{u_{k+1}, \dots, u_n\}$ be an orthonormal basis of K. Since U preserves inner products and is linear, $\{U(u_1), \dots, U(u_k)\}$ is an orthonormal basis of $U(K^\perp)$. Extend $\{U(u_1), \dots, U(u_k)\}$ to an orthonormal basis of V by adjoining vectors $v_{k+1}, \dots, v_n$, define $U(u_j) = v_j$ for $k + 1 \le j \le n$, and write U also for the linear extension to all of V. Since U carries one orthonormal basis $\{u_1, \dots, u_n\}$ of V to another, U is unitary. We have $UP = L$ on $K^\perp$, and equation $(*)$ with $v_1 = v_2$ shows that $\ker L = \ker P = K$. Therefore $UP = L$ everywhere.
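The invertible case of the existence proof can be carried out numerically. The following Python sketch treats a real invertible 2-by-2 matrix A: it forms the positive semidefinite square root P of $A^tA$ and then sets $U = AP^{-1}$, as in the proof. The function names are hypothetical, the closed-form rotation angle used for the square root is an assumption special to the 2-by-2 real case, and the singular case handled in the last paragraph of the proof is deliberately left out.

```python
import math


def matmul(x, y):
    """Product of two 2-by-2 matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(x):
    """Transpose of a 2-by-2 matrix."""
    return [[x[j][i] for j in range(2)] for i in range(2)]


def psd_sqrt(g):
    """Positive semidefinite square root of a real symmetric positive semidefinite 2-by-2 matrix."""
    a, b, d = g[0][0], g[0][1], g[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - d)  # angle that diagonalizes g
    c, s = math.cos(theta), math.sin(theta)
    eigenpairs = [
        (a * c * c + 2.0 * b * c * s + d * s * s, (c, s)),
        (a * s * s - 2.0 * b * c * s + d * c * c, (-s, c)),
    ]
    p = [[0.0, 0.0], [0.0, 0.0]]
    for lam, u in eigenpairs:
        root = math.sqrt(max(lam, 0.0))  # clamp rounding errors below 0
        for i in range(2):
            for j in range(2):
                p[i][j] += root * u[i] * u[j]
    return p


def polar_decomposition(a, tol=1e-12):
    """Return (U, P) with A = U P, P positive definite, U orthogonal; A must be invertible."""
    p = psd_sqrt(matmul(transpose(a), a))  # P is the square root of A^t A
    det_p = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    if abs(det_p) < tol:
        raise ValueError("this sketch covers only invertible matrices")
    p_inv = [[p[1][1] / det_p, -p[0][1] / det_p],
             [-p[1][0] / det_p, p[0][0] / det_p]]
    u = matmul(a, p_inv)  # U = A P^{-1}
    return u, p
```

For example, `polar_decomposition([[1.0, 2.0], [0.0, 1.0]])` returns a pair (U, P) for which `matmul(U, P)` recovers the input and `matmul(transpose(U), U)` is the identity up to rounding.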

4. Problems

1. Let $V = M_{nn}(\mathbb{C})$, and define an inner product on V by $\langle A, B \rangle = \operatorname{Tr}(B^*A)$. The norm $\|\cdot\|_{\mathrm{HS}}$ obtained from this inner product is called the Hilbert–Schmidt norm of the matrix in question.
(a) Prove that $\|A\|_{\mathrm{HS}}^2 = \sum_{i,j} |A_{ij}|^2$ for A in V.
(b) Let $E_{ij}$ be the matrix that is 1 in the $(i, j)$th entry and is 0 elsewhere. Prove that the set of all $E_{ij}$ is an orthonormal basis of V.
(c) Interpret (a) in the light of (b).
(d) Prove that the Hilbert–Schmidt norm is given on any matrix A in V by

$$\|A\|_{\mathrm{HS}}^2 = \sum_j \|Au_j\|^2 = \sum_{i,j} |v_i^* A u_j|^2,$$

where $\{u_1, \dots, u_n\}$ and $\{v_1, \dots, v_n\}$ are any orthonormal bases of $\mathbb{C}^n$ and $v^*$ refers to the conjugate transpose of any member v of $\mathbb{C}^n$.
(e) Let W be the vector subspace of all diagonal matrices in V. Describe explicitly the orthogonal complement $W^\perp$, and find its dimension.

2. Let $V_n$ be the inner-product space over $\mathbb{R}$ of all polynomials on [0, 1] of degree $\le n$ with real coefficients. (The 0 polynomial is to be included.) The Riesz Representation Theorem says that there is a unique polynomial $p_n$ such that $f\big(\tfrac{1}{2}\big) = \int_0^1 f(x) p_n(x)\,dx$ for all f in $V_n$. Set up a system of linear equations whose solution tells what $p_n$ is.

3. Let V be a finite-dimensional inner-product space, and suppose that L and M are self-adjoint linear maps from V to V. Show that LM is self-adjoint if and only if LM = ML.

4. Let V be a finite-dimensional inner-product space. If $L : V \to V$ is a linear map with adjoint $L^*$, prove that $\ker L = (\operatorname{image} L^*)^\perp$.

5. Find all 2-by-2 Hermitian matrices A with characteristic polynomial $\lambda^2 + 4\lambda + 6$.

6. Let $V_1$ and $V_2$ be finite-dimensional inner-product spaces over the same $\mathbb{F}$, the inner products being $(\,\cdot\,, \,\cdot\,)_1$ and $(\,\cdot\,, \,\cdot\,)_2$.
(a) Using the case when $V_1 = V_2$ as a model, define the adjoint of a linear map $L : V_1 \to V_2$, proving its existence. The adjoint is to be a linear map $L^* : V_2 \to V_1$.
(b) If $\Gamma$ is an orthonormal basis of $V_1$ and $\Delta$ is an orthonormal basis of $V_2$, prove that the matrices of L and $L^*$ in these bases are conjugate transposes of one another.

7. Suppose that a finite-dimensional inner-product space V is a direct sum $V = S \oplus T$ of vector subspaces. Let $E : V \to V$ be the linear map that is the identity on S and is 0 on T.
(a) Prove that $V = S^\perp \oplus T^\perp$.
(b) Prove that $E^* : V \to V$ is the linear map that is the identity on $T^\perp$ and is 0 on $S^\perp$.

8. (Iwasawa decomposition) Let g be an invertible n-by-n complex matrix. Apply the Gram–Schmidt orthogonalization process to the basis $\{ge_1, \dots, ge_n\}$, where $\{e_1, \dots, e_n\}$ is the standard basis, and let the resulting orthonormal basis be $\{v_1, \dots, v_n\}$. Define an invertible n-by-n matrix k such that $k^{-1}v_j = e_j$ for $1 \le j \le n$. Prove that $k^{-1}g$ is upper triangular with positive diagonal entries, and conclude that $g = k(k^{-1}g)$ exhibits g as the product of a unitary matrix and an upper triangular matrix whose diagonal entries are positive.

9. Let A be an n-by-n positive definite matrix.
(a) Prove that $\det A > 0$.
(b) Prove for any subset of integers $1 \le i_1 < i_2 < \cdots < i_k \le n$ that the submatrix of A built from rows and columns indexed by $(i_1, \dots, i_k)$ is positive definite.

10. Prove that if A is a positive definite n-by-n matrix, then there exists an n-by-n upper-triangular matrix B with positive diagonal entries such that $A = B^*B$.

11. The most general 2-by-2 Hermitian matrix is of the form $A = \begin{pmatrix} a & b \\ \bar{b} & d \end{pmatrix}$ with a and d real and with b complex. Find a diagonal matrix D and a unitary matrix U such that $D = U^{-1}AU$.

12. In the previous problem,
(a) what conditions on A make A positive definite?
(b) when A is positive definite, how can its positive definite square root be computed explicitly?

13. Prove that if an n-by-n real symmetric matrix A has $v^t A v = 0$ for all v in $\mathbb{R}^n$, then A = 0.

14. Let $L : \mathbb{C}^n \to \mathbb{C}^n$ be a self-adjoint linear map. Show for each $x \in \mathbb{C}^n$ that there is some $y \in \mathbb{C}^n$ such that $(I - L)^2(y) = (I - L)(x)$.

15. In the polar decomposition $L = UP$, prove that if P and U commute, then L is normal.

16. Let V be an n-dimensional inner-product space over $\mathbb{R}$. What is the largest possible dimension of a commuting family of self-adjoint linear maps $L : V \to V$?

17. Let $v_1, \dots, v_n$ be an ordered list of vectors in an inner-product space. The associated Gram matrix is the Hermitian matrix of inner products given by $G(v_1, \dots, v_n) = [(v_i, v_j)]$, and $\det G(v_1, \dots, v_n)$ is called its Gram determinant.
(a) If $c_1, \dots, c_n$ are in $\mathbb{C}$, let $c = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}$. Prove that $c^t G(v_1, \dots, v_n) \bar{c} = \|c_1 v_1 + \cdots + c_n v_n\|^2$, and conclude that $G(v_1, \dots, v_n)$ is positive semidefinite.
(b) Prove that $\det G(v_1, \dots, v_n) \ge 0$ with equality if and only if $v_1, \dots, v_n$ are linearly dependent. (This generalizes the Schwarz inequality.)
(c) Under what circumstances does equality hold in the Schwarz inequality?

Problems 18–23 introduce the Legendre polynomials and establish some of their elementary properties, including their orthogonality under the inner product $\langle P, Q \rangle = \int_{-1}^{1} P(x)Q(x)\,dx$. They form the simplest family of classical orthogonal polynomials. They are uniquely determined by the conditions that the $n$th one $P_n$, for $n \ge 0$, is of degree n, they are orthogonal under $\langle\,\cdot\,, \,\cdot\,\rangle$, and they are normalized so that $P_n(1) = 1$. But these conditions are a little hard to work with initially, and instead we adopt the recursive definition $P_0(x) = 1$, $P_1(x) = x$, and

$$(n + 1)P_{n+1}(x) = (2n + 1)x P_n(x) - n P_{n-1}(x) \qquad \text{for } n \ge 1.$$
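Before attempting the problems, the reader may find it helpful to generate the first few Legendre polynomials directly from the recursive definition. The Python sketch below does so in exact rational arithmetic and evaluates the inner product $\langle P, Q \rangle$ term by term; it is a sanity check rather than a solution to any of the problems. The function names and the representation of a polynomial as a list of coefficients, lowest degree first, are assumptions made for this illustration.

```python
from fractions import Fraction


def legendre(n_max):
    """Coefficient lists (lowest degree first) of P_0, ..., P_{n_max}.

    Uses (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x) for n >= 1.
    """
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, n_max):
        x_pn = [Fraction(0)] + polys[n]              # coefficients of x * P_n
        prev = polys[n - 1] + [Fraction(0)] * 2      # P_{n-1}, padded to the same length
        polys.append([((2 * n + 1) * s - n * t) / (n + 1) for s, t in zip(x_pn, prev)])
    return polys[: n_max + 1]


def inner(p, q):
    """Exact value of the integral of p(x) q(x) over [-1, 1]."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:                     # odd powers integrate to 0 over [-1, 1]
                total += a * b * Fraction(2, i + j + 1)
    return total


def value_at_one(p):
    """P(1) is the sum of the coefficients."""
    return sum(p, Fraction(0))
```

With `polys = legendre(5)`, the entry `polys[2]` has coefficients -1/2, 0, 3/2, so that $P_2(x) = \frac{1}{2}(3x^2 - 1)$, and `inner(polys[m], polys[n])` can be compared against the orthogonality and normalization statements in Problems 20 and 21.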

18. (a) Prove that $P_n(x)$ has degree n, that $P_n(-x) = (-1)^n P_n(x)$, and that $P_n(1) = 1$. In particular, $P_n$ is an even function if n is even and is an odd function if n is odd.
(b) Let $c(n)$ be the constant term of $P_n$ if n is even and the coefficient of x if n is odd, so that $c(0) = c(1) = 1$. Prove that $c(n) = -\frac{n-1}{n}\,c(n-2)$ for even $n \ge 2$ and that $c(n) = -\frac{n}{n-1}\,c(n-2)$ for odd $n \ge 3$.

19. This problem establishes a useful concrete formula for $P_n(x)$. Let $D = d/dx$ and $X = x^2 - 1$, writing $X' = 2x$, $X'' = 2$, and $X''' = 0$ for the derivatives. Two parts of this problem make use of the Leibniz rule $D^n(fg) = \sum_{k=0}^{n} \binom{n}{k} (D^{n-k}f)(D^k g)$ for higher-order derivatives of a product.
(a) Verify that $D^2(X^{n+1}) = (2n + 1)D(X^n X') - n(2n + 1)X''X^n - 4n^2 X^{n-1}$.
(b) By applying $D^{n-1}$ to the result of (a) and rearranging terms, show that $D^{n+1}(X^{n+1}) = (2n + 1)X' D^n(X^n) - 4n^2 D^{n-1}(X^{n-1})$.
(c) Put $R_n(x) = (2^n n!)^{-1} D^n(X^n)$ for $n \ge 0$. Show that $R_0(x) = 1$, $R_1(x) = x$, and $(n + 1)R_{n+1}(x) = (2n + 1)x R_n(x) - n R_{n-1}(x)$ for $n \ge 1$.
(d) (Rodrigues's formula) Conclude that $2^n n!\,P_n(x) = \big(\frac{d}{dx}\big)^n [(x^2 - 1)^n]$.

20. Using Rodrigues's formula and iterated integration by parts, prove that

$$\int_{-1}^{1} P_m(x)P_n(x)\,dx = 0 \qquad \text{for } m < n.$$

Conclude that $\{P_0, P_1, \dots, P_n\}$ is an orthogonal basis of the inner-product space of polynomials on $[-1, 1]$ of degree $\le n$ with inner product $\langle\,\cdot\,, \,\cdot\,\rangle$.

21. Arguing as in the previous problem and taking for granted that $\int_{-1}^{1} (1 - x^2)^n\,dx = \frac{2(2^n n!)^2}{(2n+1)!}$, prove that $\langle P_n, P_n \rangle = \big(n + \tfrac{1}{2}\big)^{-1}$.

22. This problem shows that $P_n(x)$ satisfies a certain second-order differential equation. Let $D = d/dx$. The first two parts of this problem use the Leibniz rule quoted in Problem 19. Let $X = x^2 - 1$ and $K_n = 2^n n!$, so that Rodrigues's formula says that $K_n P_n = D^n(X^n)$.
(a) Expand $D^{n+1}[(D(X^n))X]$ by the Leibniz rule.
(b) Observe that $(D(X^n))X = nX^nX'$, and expand $D^{n+1}[(nX^n)X']$ by the Leibniz rule.
(c) Equating the results of the previous two parts, conclude that $y = P_n(x)$ satisfies the differential equation $(1 - x^2)y'' - 2xy' + n(n + 1)y = 0$.

23. Let $P_n(x) = \sum_{k=0}^{n} c_k x^k$. Using the differential equation, show that the coefficients $c_k$ satisfy $k(k - 1)c_k = [(k - 2)(k - 1) - n(n + 1)]c_{k-2}$ for $k \ge 2$ and that $c_k = 0$ unless $n - k$ is even.

Problems 24–28 concern the complex conjugate of an inner-product space over $\mathbb{C}$. For any finite-dimensional inner-product space V, the Riesz Representation Theorem identifies the dual $V'$ with V, saying that each member of $V'$ is given by taking the inner product with some member of V. When the scalars are real, this identification is linear; thus the Riesz theorem uses the inner product to construct a canonical isomorphism of V onto $V'$. When the scalars are complex, the identification is conjugate linear, and we do not get an isomorphism of V with $V'$. The complex conjugate of V provides a substitute result.

24. Let V be a finite-dimensional vector space over $\mathbb{C}$. Define a new complex vector space $\overline{V}$ as follows: The elements of $\overline{V}$ are the elements of V, and the definition of addition is unchanged. However, there is a change in the definition of scalar multiplication, in that if v is in V, then the product $cv$ in $\overline{V}$ is to equal the product $\bar{c}v$ in V. Verify that $\overline{V}$ is indeed a complex vector space.

25. If V is a complex vector space and $L : V \to V$ is a linear map, define $\overline{L} : \overline{V} \to \overline{V}$ to be the same function as L. Prove that $\overline{L}$ is linear.

26. Suppose that the complex vector space V is actually a finite-dimensional inner-product space, with inner product $(\,\cdot\,, \,\cdot\,)_V$. Define $(u, v)_{\overline{V}} = (v, u)_V$. Verify that $\overline{V}$ is an inner-product space.

27. With V as in the previous problem, show that the Riesz Representation Theorem uses the inner product to set up a canonical isomorphism of $V'$ with $\overline{V}$.

28. With V and $\overline{V}$ as in the two previous problems, let $L : V \to V$ be linear, so that $(\overline{L})^* : \overline{V} \to \overline{V}$ is linear. Under the identification of the previous problem of $V'$ with $\overline{V}$, show that $(\overline{L})^*$ corresponds to the contragredient $L^t$ as defined in Section II.4.

Problems 29–32 use inner-product spaces to obtain a decomposition of polynomials in several variables. A real-valued polynomial function p in $x_1, \dots, x_n$ is said to be homogeneous of degree N if every monomial in p has total degree N. Let $V_N$ be the space of real-valued polynomials in $x_1, \dots, x_n$ homogeneous of degree N. For any homogeneous polynomial p, we define a differential operator $\partial(p)$ with constant coefficients by requiring that $\partial(\,\cdot\,)$ be linear in $(\,\cdot\,)$ and that

$$\partial(x_1^{k_1} \cdots x_n^{k_n}) = \frac{\partial^{k_1 + \cdots + k_n}}{\partial x_1^{k_1} \cdots \partial x_n^{k_n}}.$$

For example, if $|x|^2$ stands for $x_1^2 + \cdots + x_n^2$, then

$$\partial(|x|^2) = \Delta = \frac{\partial^2}{\partial x_1^2} + \cdots + \frac{\partial^2}{\partial x_n^2}.$$

If p and q are in the same $V_N$, then $\partial(q)p$ is a constant polynomial, and we define $\langle p, q \rangle$ to be that constant. Then $\langle\,\cdot\,, \,\cdot\,\rangle$ is bilinear.

29. (a) Prove that $\langle\,\cdot\,, \,\cdot\,\rangle$ satisfies $\langle p, q \rangle = \langle q, p \rangle$.
(b) Prove that $\langle x_1^{k_1} \cdots x_n^{k_n}, x_1^{l_1} \cdots x_n^{l_n} \rangle$ is positive if $(k_1, \dots, k_n) = (l_1, \dots, l_n)$ and is 0 otherwise.
(c) Deduce that $\langle\,\cdot\,, \,\cdot\,\rangle$ is an inner product on $V_N$.

30. Call $p \in V_N$ harmonic if $\partial(|x|^2)p = 0$, and let $H_N$ be the vector subspace of harmonic polynomials. Prove that the orthogonal complement of $|x|^2 V_{N-2}$ in $V_N$ relative to $\langle\,\cdot\,, \,\cdot\,\rangle$ is $H_N$.

31. Deduce from Problem 30 that each $p \in V_N$ decomposes uniquely as $p = h_N + |x|^2 h_{N-2} + |x|^4 h_{N-4} + \cdots$ with $h_N, h_{N-2}, h_{N-4}, \dots$ homogeneous harmonic of the indicated degrees.

32. For n = 2, describe a computational procedure for decomposing the element $x_1^4 + x_2^4$ of $V_4$ as in Problem 31.

Problems 33–34 concern products of n-by-n positive semidefinite matrices. They make use of Problem 26 in Chapter II, which says that $\det(\lambda I - CD) = \det(\lambda I - DC)$.

33. Let A and B be positive semidefinite. Using the positive semidefinite square root of B, prove that every eigenvalue of AB is $\ge 0$.

34. Let A, B, and C be positive semidefinite, and suppose that ABC is Hermitian. Under the assumption that C is invertible, introduce the positive definite square root P of C. By considering $P^{-1}ABCP^{-1}$, prove that ABC is positive semidefinite.

CHAPTER IV Groups and Group Actions

Abstract. This chapter develops the basics of group theory, with particular attention to the role of group actions of various kinds. The emphasis is on groups in Sections 1–3 and on group actions starting in Section 6. In between is a two-section digression that introduces rings, ﬁelds, vector spaces over general ﬁelds, and polynomial rings over commutative rings with identity. Section 1 introduces groups and a number of examples, and it establishes some easy results. Most of the examples arise either from number-theoretic settings or from geometric situations in which some auxiliary space plays a role. The direct product of two groups is discussed brieﬂy so that it can be used in a table of some groups of low order. Section 2 deﬁnes coset spaces, normal subgroups, homomorphisms, quotient groups, and quotient mappings. Lagrange’s Theorem is a simple but key result. Another simple but key result is the construction of a homomorphism with domain a quotient group G/H when a given homomorphism is trivial on H . The section concludes with two standard isomorphism theorems. Section 3 introduces general direct products of groups and direct sums of abelian groups, together with their concrete “external” versions and their universal mapping properties. Sections 4–5 are a digression to deﬁne rings, ﬁelds, and ring homomorphisms, and to extend the theories concerning polynomials and vector spaces as presented in Chapters I–II. The immediate purpose of the digression is to make prime ﬁelds and the notion of characteristic available for the remainder of the chapter. The deﬁnitions of polynomials are extended to allow coefﬁcients from any commutative ring with identity and to allow more than one indeterminate, and universal mapping properties for polynomial rings are proved. Sections 6–7 introduce group actions. 
Section 6 gives some geometric examples beyond those in Section 1, it establishes a counting formula concerning orbits and isotropy subgroups, and it develops some structure theory of groups by examining specific group actions on the group and its coset spaces. Section 7 uses a group action by automorphisms to define the semidirect product of two groups. This construction, in combination with results from Sections 5–6, allows one to form several new finite groups of interest. Section 8 defines simple groups, proves that alternating groups on five or more letters are simple, and then establishes the Jordan–Hölder Theorem concerning the consecutive quotients that arise from composition series. Section 9 deals with finitely generated abelian groups. It is proved that “rank” is well defined for any finitely generated free abelian group, that a subgroup of a free abelian group of finite rank is always free abelian, and that any finitely generated abelian group is the direct sum of cyclic groups. Section 10 returns to structure theory for finite groups. It begins with the Sylow Theorems, which produce subgroups of prime-power order, and it gives two sample applications. One of these classifies the groups of order pq, where p and q are distinct primes, and the other provides the information necessary to classify the groups of order 12. Section 11 introduces the language of “categories” and “functors.” The notion of category is a precise version of what is sometimes called a “context” at points in the book before this section, and some of the “constructions” in the book are examples of “functors.” The section treats in this language the notions of “product” and “coproduct,” which are abstractions of “direct product” and “direct sum.”

1. Groups and Subgroups

Linear algebra and group theory are two foundational subjects for all of algebra, indeed for much of mathematics. Chapters II and III have introduced the basics of linear algebra, and the present chapter introduces the basics of group theory. In this section we give the definition and notation for groups and provide examples that fit with the historical development of the notion of group. Many readers will already be familiar with some group theory, and therefore we can be brief at the start.

A group is a nonempty set G with an operation $G \times G \to G$ satisfying the three properties (i), (ii), and (iii) below. In the absence of any other information the operation is usually called multiplication and is written $(a, b) \mapsto ab$ with no symbol to indicate the multiplication. The defining properties of a group are
(i) $(ab)c = a(bc)$ for all a, b, c in G (associative law),
(ii) there exists an element 1 in G such that $a1 = 1a = a$ for all a in G (existence of identity),
(iii) for each a in G, there exists an element $a^{-1}$ in G with $aa^{-1} = a^{-1}a = 1$ (existence of inverses).
It is immediate from these properties that
• 1 is unique (since $1' = 1'1 = 1$),
• $a^{-1}$ is unique (since $(a^{-1})' = (a^{-1})'1 = (a^{-1})'(a(a^{-1})) = ((a^{-1})'a)(a^{-1}) = 1(a^{-1}) = (a^{-1})$),
• the existence of a left inverse for each element implies the existence of a right inverse for each element (since $ba = 1$ and $cb = 1$ together imply $c = c(ba) = (cb)a = a$ and hence also $ab = cb = 1$),
• 1 is its own inverse (since $11 = 1$),
• $ax = ay$ implies $x = y$, and $xa = ya$ implies $x = y$ (cancellation laws) (since $x = 1x = (a^{-1}a)x = a^{-1}(ax) = a^{-1}(ay) = (a^{-1}a)y = 1y = y$ and since a similar argument proves the second implication).
Problem 2 at the end of Chapter II shows that the associative law extends to products of any ﬁnite number of elements of G as follows: parentheses can be inserted in any fashion in such a product, and the value of the product is unchanged; hence any expression a1 a2 · · · an in G is well deﬁned without the use of parentheses. The group whose only element is the identity 1 will be denoted by {1}. It is called the trivial group.
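For a finite set with a given operation, the defining properties (i), (ii), and (iii) can be tested by brute force. The Python sketch below does this; the function `is_group` and the sample operations are hypothetical names introduced for illustration, and the test groups (the integers modulo 6 under addition and the units $\{+1, -1\}$ of the integers under multiplication) are examples of the kind discussed later in this section.

```python
from itertools import product


def is_group(elements, op):
    """Brute-force test of the group axioms for a finite set with a binary operation."""
    elems = list(elements)
    if not elems:
        return False  # a group is nonempty by definition
    # The operation must carry G x G into G.
    if any(op(a, b) not in elems for a, b in product(elems, repeat=2)):
        return False
    # (i) associative law
    if any(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(elems, repeat=3)):
        return False
    # (ii) existence of a two-sided identity
    identities = [e for e in elems if all(op(e, a) == a and op(a, e) == a for a in elems)]
    if not identities:
        return False
    e = identities[0]
    # (iii) existence of a two-sided inverse for each element
    return all(any(op(a, b) == e and op(b, a) == e for b in elems) for a in elems)


def add_mod_6(a, b):
    """Addition of integers modulo 6."""
    return (a + b) % 6


def multiply(a, b):
    """Ordinary multiplication of integers."""
    return a * b


def multiply_mod_6(a, b):
    """Multiplication of integers modulo 6."""
    return (a * b) % 6
```

Here `is_group(range(6), add_mod_6)` and `is_group([1, -1], multiply)` both succeed, whereas `is_group([1, 2, 3, 4, 5], multiply_mod_6)` fails because the product of 2 and 3 leaves the set.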


We come to other examples in a moment. First we make three more definitions and offer some comments.

A subgroup H of a group G is a subset containing the identity that is closed under multiplication and inverses. Then H itself is a group because the associativity in G implies associativity in H. The intersection of any nonempty collection of subgroups of G is again a subgroup.

An isomorphism of a group $G_1$ with a group $G_2$ is a function $\varphi : G_1 \to G_2$ that is one-one onto and satisfies $\varphi(ab) = \varphi(a)\varphi(b)$ for all a and b in $G_1$. It is immediate that
• $\varphi(1) = 1$ (by taking $a = b = 1$),
• $\varphi(a^{-1}) = \varphi(a)^{-1}$ (by taking $b = a^{-1}$),
• $\varphi^{-1} : G_2 \to G_1$ satisfies $\varphi^{-1}(cd) = \varphi^{-1}(c)\varphi^{-1}(d)$ (by taking $c = \varphi(a)$ and $d = \varphi(b)$ on the right side and then observing that $\varphi(\varphi^{-1}(c)\varphi^{-1}(d)) = \varphi(ab) = \varphi(a)\varphi(b) = cd = \varphi(\varphi^{-1}(cd))$).
The first and second of these properties show that an isomorphism respects all the structure of a group, not just products. The third property shows that the inverse of an isomorphism is an isomorphism, hence that the relation “is isomorphic to” is symmetric. Since the identity isomorphism exhibits this relation as reflexive and since the use of compositions shows that it is transitive, we see that “is isomorphic to” is an equivalence relation. Common notation for an isomorphism between $G_1$ and $G_2$ is $G_1 \cong G_2$; because of the symmetry, one can say that $G_1$ and $G_2$ are isomorphic.

An abelian group is a group G with the additional property
(iv) $ab = ba$ for all a and b in G (commutative law).
In an abelian group the operation is sometimes, but by no means always, called addition instead of “multiplication.” Addition is typically written $(a, b) \mapsto a + b$, and then the identity is usually denoted by 0 and the inverse of a is denoted by $-a$, the negative of a. Depending on circumstances, the trivial abelian group may be denoted by $\{0\}$ or 0.
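The definition of isomorphism given above can likewise be checked mechanically for finite groups. In the Python sketch below, the function `is_isomorphism` is a hypothetical helper that assumes its two arguments are already known to be groups; it tests only that the map is one-one, onto, and multiplicative. The sample map sends the integers modulo 2 under addition to the units $\{+1, -1\}$ of the integers under multiplication, two groups of the kind that appear among the examples later in this section.

```python
from itertools import product


def is_isomorphism(g1, op1, g2, op2, phi):
    """Test whether phi : g1 -> g2 is one-one, onto, and satisfies phi(ab) = phi(a) phi(b).

    Assumes that (g1, op1) and (g2, op2) are finite groups; this is not verified here.
    """
    g1, g2 = list(g1), list(g2)
    images = [phi(a) for a in g1]
    one_one = len(set(images)) == len(g1)
    onto = set(images) == set(g2)
    multiplicative = all(
        phi(op1(a, b)) == op2(phi(a), phi(b)) for a, b in product(g1, repeat=2)
    )
    return one_one and onto and multiplicative


def add_mod_2(a, b):
    """Addition of integers modulo 2."""
    return (a + b) % 2


def multiply(a, b):
    """Ordinary multiplication of integers."""
    return a * b


def sign_map(k):
    """The map k -> (-1)^k from {0, 1} to {1, -1}."""
    return (-1) ** k


def inverse_sign_map(u):
    """The inverse map, sending 1 to 0 and -1 to 1."""
    return 0 if u == 1 else 1
```

The call `is_isomorphism([0, 1], add_mod_2, [1, -1], multiply, sign_map)` succeeds, and so does the corresponding call with the two groups interchanged and `inverse_sign_map` in place of `sign_map`, in line with the third property listed above.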
Problem 3 at the end of Chapter II shows for an abelian group G with its operation written additively that n-fold sums of elements of G can be written in any order: a1 + a2 + · · · + an = aσ (1) + aσ (2) + · · · + aσ (n) for each permutation σ of {1, . . . , n}. Historically the original examples of groups arose from two distinct sources, and it took a while for the above deﬁnition of group to be distilled out as the essence of the matter. One of the two sources involved number systems and vectors. Here are examples. EXAMPLES. (1) Additive groups of familiar number systems. The systems in question are the integers Z, the rational numbers Q, the real numbers R, and the complex


numbers C. In each case the set with its usual operation of addition forms an abelian group. The group properties of Z under addition are taken as known in advance in this book, as mentioned in Section A3 of the appendix, and the group properties of Q, R, and C under addition are sketched in Sections A3 and A4 of the appendix as part of the development of these number systems. (2) Multiplicative groups connected with familiar number systems. In the cases of Q, R, and C, the nonzero elements form a group under multiplication. These groups are denoted by Q× , R× , and C× . Again the properties of a group for each of them are properties that are sketched during the development of each of these number systems in Sections A3 and A4 of the appendix. With Z, the nonzero integers do not form a group under multiplication, because only the two units, i.e., the divisors +1 and −1 of 1, have inverses. The units do form a group, however, under multiplication, and the group of units is denoted by Z× . (3) Vector spaces under addition. Spaces such as Qn and Rn and Cn provide us with further examples of abelian groups. In fact, the deﬁning properties of addition in a vector space are exactly the deﬁning properties of an abelian group. Thus every vector space provides us with an example of an abelian group if we simply ignore the scalar multiplication. (4) Integers modulo m, under addition. Another example related to number systems is the additive group of integers modulo a positive integer m. Let us say that an integer n 1 is congruent modulo m to an integer n 2 if m divides n 1 − n 2 . One writes n 1 ≡ n 2 or n 1 ≡ n 2 mod m or n 1 = n 2 mod m for this relation.1 It is an equivalence relation, and we can write [n] for the equivalence class of n when it is helpful to do so. The division algorithm (Proposition 1.1) tells us that each equivalence class has one and only one member between 0 and m − 1. Thus there are exactly m equivalence classes, and we know a representative of each. 
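As a small illustration of Example 4 (a sketch, not part of the original text), the following Python fragment tests congruence modulo m and picks out the representative of each class between 0 and m − 1. The function names `congruent` and `representative` are hypothetical, chosen only for this example.

```python
def congruent(n1: int, n2: int, m: int) -> bool:
    """True when m divides n1 - n2, i.e. n1 is congruent to n2 modulo m."""
    return (n1 - n2) % m == 0


def representative(n: int, m: int) -> int:
    """The unique member of the class [n] lying between 0 and m - 1.

    For a positive modulus m, Python's % operator already returns
    a value in the range 0..m-1, even when n is negative.
    """
    return n % m


m = 6
# Collect the representatives met while scanning a range of integers.
classes = {representative(n, m) for n in range(-50, 51)}
print(sorted(classes))  # the m distinct representatives 0, 1, ..., m - 1
```

Every integer in the scanned range is congruent to exactly one of the listed representatives, which is the content of the division algorithm quoted above.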
Footnote 1: This notation was anticipated in a remark explaining the classical form of the Chinese Remainder Theorem (Corollary 1.9).

Footnote 2: The notation Z/(m) is an allowable alternative. Some authors, particularly in topology, write $\mathbb{Z}_m$ for this set, but the notation $\mathbb{Z}_m$ can cause confusion since $\mathbb{Z}_p$ is the standard notation for the “p-adic integers” when p is prime. These are defined in Advanced Algebra.

The set of classes will be denoted by Z/mZ (see footnote 2). The point is that Z/mZ inherits an abelian-group structure from the abelian-group structure of Z. Namely, we attempt to define $[a] + [b] = [a + b]$. To see that this formula actually defines an operation on Z/mZ, we need to check that the result is meaningful if the representatives of the classes [a] and [b] are changed. Thus let $[a] = [a']$ and $[b] = [b']$. Then m divides $a - a'$ and $b - b'$, and m must divide the sum $(a - a') + (b - b') = (a + b) - (a' + b')$; consequently $[a + b] = [a' + b']$, and addition is well defined. The same kind of


argument shows that the associativity and commutativity of addition in Z imply associativity and commutativity in Z/mZ. The identity element is [0], and group inverses (negatives) are given by −[a] = [−a]. Therefore Z/mZ is an abelian group under addition, and it has m elements. If x and y are members of Z/mZ, their sum is often denoted by x + y mod m.

The other source of early examples of groups historically has the members of the group operating as transformations of some auxiliary space. Before abstracting matters, let us consider some concrete examples, ignoring some of the details of verifying the defining properties of a group.

EXAMPLES, CONTINUED.

(5) Permutations. A permutation of a nonempty finite set E of n elements is a one-one function from E onto itself. Permutations were introduced in Section I.4. The product of two permutations is just the composition, defined by $(\sigma\tau)(x) = \sigma(\tau(x))$ for x in E, with the symbol ◦ for composition dropped. The resulting operation makes the set of permutations of E into a group: we already observed in Section I.4 that composition is associative, and it is plain that the identity permutation may be taken as the group identity and that the inverse function to a permutation is the group inverse. The group is called the symmetric group on the n letters of E. It has n! members for n ≥ 1. The notation $S_n$ is often used for this group, especially when E = {1, . . . , n}. Signs ±1 were defined for permutations in Section I.4, and we say that a permutation is even or odd according as its sign is +1 or −1. The sign of a product is the product of the signs, according to Proposition 1.24, and it follows that the even permutations form a subgroup of $S_n$. This subgroup is called the alternating group on n letters and is denoted by $A_n$. It has $\tfrac{1}{2}(n!)$ members if n ≥ 2.

(6) Symmetries of a regular polygon. Imagine a regular polygon in $\mathbb{R}^2$ centered at the origin.
The plane-geometry rotations and reﬂections about the origin that carry the polygon to itself form a group. If the number of sides of the polygon is n, then the group always contains the rotations through all multiples of the angle 2π/n. The rotations themselves form an n-element subgroup of the group of all symmetries. To consider what reﬂections give symmetries, we distinguish the cases n odd and n even. When n is odd, the reﬂection in the line that passes through any vertex and bisects the opposite side carries the polygon to itself, and no other reﬂections have this property. Thus the group of symmetries contains n reﬂections. When n is even, the reﬂection in the line passing through any vertex and the opposite vertex carries the polygon to itself, and so does the reﬂection in the line that bisects a side and also the opposite side. There are n/2 reﬂections of each kind, and hence the group of symmetries again contains n reﬂections. The group of symmetries thus has 2n elements in all cases. It is called the dihedral


group Dn . The group Dn is isomorphic to a certain subgroup of the permutation group Sn . Namely, we number the vertices of the polygon, and we associate to each member of Dn the permutation that moves the vertices the way the member of Dn does. (7) General linear group. With F equal to Q or R or C, consider any ndimensional vector space V over F. One possibility is V = Fn , but we do not insist on this choice. Among all one-one functions carrying V onto itself, let G consist of the linear ones. The composition of two linear maps is linear, and the inverse of an invertible function is linear if the given function is linear. The result is a group known as the general linear group GL(V ). When V = Fn , we know from Chapter II that we can identify linear maps from Fn to itself with matrices in Mnn (F) and that composition corresponds to matrix multiplication. It follows that the set of all invertible matrices in Mnn (F) is a group, which is denoted by GL(n, F), and that this group is isomorphic to GL(Fn ). The set SL(V ) or SL(n, F) of all members of GL(V ) or GL(n, F) of determinant 1 is a group since the determinant of a product is the product of the determinants; it is called the special linear group. The dihedral group Dn is isomorphic to a subgroup of GL(2, R) since each rotation and reﬂection of R2 that ﬁxes the origin is given by the operation of a 2-by-2 matrix. (8) Orthogonal and unitary groups. If V is a ﬁnite-dimensional inner-product space over R or C, Chapter III referred to the linear maps carrying the space to itself and preserving lengths of vectors as orthogonal in the real case and unitary in the complex case. Such linear maps are invertible. The condition of preserving lengths of vectors is maintained under composition and inverses, and it follows that the orthogonal or unitary linear maps form a subgroup O(V ) or U(V ) of the general linear group GL(V ). One writes O(n) for O(Rn ) and U(n) for U(Cn ). 
The subgroup of members of O(V) or O(n) of determinant 1 is called the rotation group SO(V) or SO(n). The subgroup of members of U(V) or U(n) of determinant 1 is called the special unitary group SU(V) or SU(n).

Before coming to Example 9, let us establish a closure property under the arithmetic operations for certain subsets of C. We are going to use the theories of polynomials as in Chapter I and of vector spaces as in Chapter II with the rationals Q as the scalars. Fix a complex number θ, and form the result of evaluating at θ every polynomial in one indeterminate with coefficients in Q. The resulting set of complex numbers comes by substituting θ for X in the members of Q[X], and we denote this subset of C by Q[θ]. Suppose that θ has the property that the set $\{1, \theta, \theta^2, \dots, \theta^n\}$ is linearly dependent over Q for some integer n ≥ 1, i.e., has the property that $F_0(\theta) = 0$ for some nonzero member $F_0$ of Q[X] of degree ≤ n. For example, if $\theta = \sqrt{2}$, then the set $\{1, \sqrt{2}, (\sqrt{2})^2\}$ is linearly dependent since $2 - (\sqrt{2})^2 = 0$; if $\theta = e^{2\pi i/5}$,


then {1, θ, θ 2 , θ 3 , θ 4 , θ 5 } is linearly dependent since 1 − θ 5 = 0, or alternatively since 1 + θ + θ 2 + θ 3 + θ 4 = 0. Returning to the general θ, we lose no generality if we assume that the polynomial F0 has degree exactly n. If we divide the equation F0 (θ ) = 0 by the leading coefﬁcient, we obtain an equality θ n = G 0 (θ ), where G 0 is the zero polynomial or is a nonzero polynomial of degree at most n − 1. Then θ n+m = θ m G 0 (θ ), and we see inductively that every power θ r with r ≥ n is a linear combination of the members of the set {1, θ, θ 2 , . . . , θ n−1 }. This set is therefore a spanning set for the vector space Q[θ], and we ﬁnd that Q[θ] is ﬁnite-dimensional, with dimension at most n. Since every positive integer power of θ lies in Q[θ] and since these powers are closed under multiplication, the vector space Q[θ] is closed under multiplication. More striking is that Q[θ] is closed under division, as is asserted in the following proposition. Proposition 4.1. Let θ be in C, and suppose for some integer n ≥ 1 that the set {1, θ, θ 2 , . . . , θ n } is linearly dependent over Q. Then the ﬁnite-dimensional rational vector space Q[θ] is closed under taking reciprocals (of nonzero elements), as well as multiplication, and hence is closed under division. REMARKS. Under the hypotheses of Proposition 4.1, Q[θ] is called an algebraic number ﬁeld,3 or simply a number ﬁeld, and θ is called an algebraic number. The relevant properties of C that are used in proving the proposition are that C is closed under the usual arithmetic operations, that these satisfy the usual properties, and that Q is a subset of C. The deeper closure properties of C that are developed in Sections A3 and A4 of the appendix play no role. PROOF. We have seen that Q[θ] is closed under multiplication. If x is a nonzero member of Q[θ], then all positive powers of x must be in Q[θ], and the fact that dim Q[θ] ≤ n forces {1, x, x 2 , . . . , x n } to be linearly dependent. 
Therefore there are integers j and k with $0 \le j < k \le n$ such that $c_j x^j + c_{j+1} x^{j+1} + \cdots + c_k x^k = 0$ for some rational numbers $c_j, \dots, c_k$ with $c_k \ne 0$. Since x is assumed nonzero, we can discard unnecessary terms and arrange that $c_j \ne 0$. Then
$$1 = x\bigl(-c_j^{-1}c_{j+1} - c_j^{-1}c_{j+2}\,x - \cdots - c_j^{-1}c_k\,x^{k-j-1}\bigr),$$
and the reciprocal of x has been exhibited as in Q[θ].
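The proof can be traced by hand in the smallest interesting case, $\theta = \sqrt{2}$. The sketch below is an illustration only, not the author's construction: it stores $x = a + b\sqrt{2}$ as a pair of rationals, writes down the dependence relation $c_0 + c_1 x + c_2 x^2 = 0$ (here j = 0 and k = 2), and reads off the reciprocal from the displayed formula. The names `mul` and `reciprocal` are hypothetical.

```python
from fractions import Fraction


def mul(x, y):
    """Multiply a + b*sqrt(2) by c + d*sqrt(2); elements are pairs (a, b)."""
    a, b = x
    c, d = y
    return (a * c + 2 * b * d, a * d + b * c)


def reciprocal(x):
    """Reciprocal of a nonzero x = a + b*sqrt(2), following the proof.

    The powers 1, x, x^2 satisfy c0 + c1*x + c2*x^2 = 0 with
    c0 = a^2 - 2b^2, c1 = -2a, c2 = 1, so 1 = x * (-c1/c0 - (c2/c0)*x).
    """
    a, b = Fraction(x[0]), Fraction(x[1])
    c0, c1, c2 = a * a - 2 * b * b, -2 * a, Fraction(1)
    if c0 == 0:
        # For rational a and b this happens only when a = b = 0.
        raise ZeroDivisionError("x must be nonzero")
    # Coordinates of -c1/c0 - (c2/c0)*x with respect to 1 and sqrt(2).
    return (-c1 / c0 - (c2 / c0) * a, -(c2 / c0) * b)


x = (Fraction(3), Fraction(1))   # x = 3 + sqrt(2)
y = reciprocal(x)
print(y, mul(x, y))              # the product is the pair representing 1
```

Here $c_0$ is nonzero for every nonzero x because $\sqrt{2}$ is irrational, so no terms need to be discarded before dividing.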

Footnote 3: The definition of “algebraic number field” that is given later in the book is ostensibly more general, but the Theorem of the Primitive Element in Chapter IX will show that it amounts to the same thing as this.

EXAMPLES, CONTINUED.

(9) Galois’s notion of automorphisms of number fields. Let θ be a complex number as in Proposition 4.1. The subject of Galois theory, whose details will


be discussed in Chapter IX and whose full utility will be glimpsed only later, works in an important special case with the “automorphisms” of Q[θ] that ﬁx Q. The automorphisms are the one-one functions from Q[θ] onto itself that respect addition and multiplication and carry every element of Q to itself. The identity is such a function, the composition of two such functions is again one, and the inverse of such a function is again one. Therefore the automorphisms of Q[θ] form a group under composition. We call this group Gal(Q[θ]/Q). Let us see that it is ﬁnite. In fact, if σ is in Gal(Q[θ]/Q), then σ is determined by its effect on θ, since we must have σ (F(θ)) = F(σ (θ )) for every F in Q[X ]. We know that there is some nonzero polynomial F0 (X ) such that F0 (θ ) = 0. Applying σ to this equality, we see that F0 (σ (θ )) = 0. Therefore σ (θ) has to be a root of F0 . Viewing F0 as in C[X ], we can apply Corollary 1.14 and see that F0 has only ﬁnitely many complex roots. Therefore there are only ﬁnitely many possibilities for σ , and the group Gal(Q[θ]/Q) has to be ﬁnite. Galois theory shows that this group gives considerable insight into the structure of Q[θ]. For example it allows one to derive the Fundamental Theorem of Algebra (Theorem 1.18) just from algebra and the Intermediate Value Theorem (Section A3 of the appendix); it allows one to show the impossibility of certain constructions in plane geometry by straightedge and compass; and it allows one to show that a quintic polynomial with rational coefﬁcients need not have a root that is expressible in terms of rational numbers, arithmetic operations, and the extraction of square roots, cube roots, and so on. We return to these matters in Chapter IX. Examples 5–9, which all involve auxiliary spaces, ﬁt the pattern that the members of the group are invertible transformations of the auxiliary space and the group operation is composition. 
This notion will be abstracted in Section 6 and will lead to the notion of a “group action.” For now, let us see why we obtained groups in each case. If X is any nonempty set, then the set of invertible functions f : X → X forms a group under composition, composition being deﬁned by ( f g)(x) = f (g(x)) with the usual symbol ◦ dropped. The associative law is just a matter of unwinding this deﬁnition: (( f g)h)(x) = ( f g)(h(x)) = f (g(h(x))) = f ((gh)(x)) = ( f (gh))(x). The identity function is the identity of the group, and inverse functions provide the inverse elements in the group. For our examples, the set X was E in Example 5, R2 in Example 6, V or Fn in Example 7, V or Qn or Rn or Cn in Example 8, and Q[θ] in Example 9. All that was needed in each case was to know that our set G of invertible functions from X to itself formed a subgroup of the set of all invertible functions from X to itself. In other words, we had only to check that G contained the identity and was closed under composition and inversion. Associativity was automatic for G because it was valid for the group of all invertible functions from X to itself.
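The subgroup test just described (contain the identity, be closed under composition and inversion) is easy to run mechanically on a small example. The following sketch is an illustration under stated assumptions: a permutation of {0, . . . , n−1} is stored as a tuple, and the sign is computed from the parity of the number of inversions, which agrees with the sign of Section I.4 but is not necessarily how the book defines it. All function names are hypothetical.

```python
from itertools import permutations


def compose(f, g):
    """(fg)(x) = f(g(x)); f[x] is the image of x under f."""
    return tuple(f[g[x]] for x in range(len(g)))


def inverse(f):
    """Inverse function of the permutation f."""
    inv = [0] * len(f)
    for x, y in enumerate(f):
        inv[y] = x
    return tuple(inv)


def is_subgroup(G, n):
    """Check identity, closure under composition, closure under inverses."""
    identity = tuple(range(n))
    return (identity in G
            and all(compose(f, g) in G for f in G for g in G)
            and all(inverse(f) in G for f in G))


def sign(f):
    """+1 or -1 according to the parity of the number of inversions of f."""
    inversions = sum(1 for i in range(len(f))
                     for j in range(i + 1, len(f)) if f[i] > f[j])
    return -1 if inversions % 2 else 1


S4 = set(permutations(range(4)))
A4 = {f for f in S4 if sign(f) == 1}
odd = S4 - A4
print(len(S4), len(A4), is_subgroup(S4, 4), is_subgroup(A4, 4), is_subgroup(odd, 4))
```

The odd permutations fail the test at the first step, since the identity is even.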


Actually, any group can be realized in the fashion of Examples 5–9. This is the content of the next proposition.

Proposition 4.2 (Cayley’s Theorem). Any group G is isomorphic to a subgroup of invertible functions on a set X. The set X can be taken to be G itself. In particular any finite group with n elements is isomorphic to a subgroup of the symmetric group $S_n$.

PROOF. Define X = G, put $f_a(x) = ax$ for a in G, and let $G' = \{f_a \mid a \in G\}$. To see that $G'$ is a group, we need $G'$ to contain the identity and to be closed under composition and inverses. Since $f_1$ is the identity, the identity is indeed in $G'$. Since $f_{ab}(x) = (ab)x = a(bx) = f_a(bx) = f_a(f_b(x)) = (f_a f_b)(x)$, $G'$ is closed under composition. The formula $f_a f_{a^{-1}} = f_1 = f_{a^{-1}} f_a$ then shows that $f_{a^{-1}} = (f_a)^{-1}$ and that $G'$ is closed under inverses. Thus $G'$ is a group. Define $\varphi : G \to G'$ by $\varphi(a) = f_a$. Certainly ϕ is onto $G'$, and it is one-one because $\varphi(a) = \varphi(b)$ implies $f_a = f_b$, $f_a(1) = f_b(1)$, and a = b. Also, $\varphi(ab) = f_{ab} = f_a f_b = \varphi(a)\varphi(b)$, and hence ϕ is an isomorphism. In the case that G is finite with n elements, G is exhibited as isomorphic to a subgroup of the group of permutations of the members of G. Hence it is isomorphic to a subgroup of $S_n$.

It took the better part of a century for mathematicians to sort out that two distinct notions are involved here: that of a group, as defined above, and that of a group action, as will be defined in Section 6. In sorting out these matters, mathematicians realized that it is wise to study the abstract group first and then to study the group in the context of its possible group actions. This does not at all mean ignoring group actions until after the study of groups is complete; indeed, we shall see in Sections 6, 7, and 10 that group actions provide useful tools for the study of abstract groups.

We turn to a discussion of two general group-theoretic notions: cyclic group and the direct product of two or more groups.
The second of these notions will be discussed only brieﬂy now; more detail will come in Section 3. If a is an element of a group, we deﬁne a n for integers n > 0 inductively by a 1 = a and a n = a n−1 a. Then we can put a 0 = 1 and a −n = (a −1 )n for n > 0. A little checking, which we omit, shows that the ordinary rules of exponents apply: a m+n = a m a n and a mn = (a m )n for all integers m and n. If the underlying group is abelian and additive notation is being used, these formulas read (m + n)a = ma + na and (mn)a = n(ma). A cyclic group is a group with an element a such that every element is a power of a. The element a is called a generator of the group, and the group is said to be generated by a.


Proposition 4.3. Each cyclic group G is isomorphic either to the additive group Z of integers or to the additive group Z/mZ of integers modulo m for some positive integer m.

PROOF. If all $a^n$ are distinct, then the rule $a^{m+n} = a^m a^n$ implies that the function $n \mapsto a^n$ is an isomorphism of Z with G. On the other hand, if $a^k = a^l$ with k > l, then $a^{k-l} = 1$ and there exists a positive integer n such that $a^n = 1$. Let m be the least positive integer with $a^m = 1$. For any integers q and r, we have $a^{qm+r} = (a^m)^q a^r = a^r$. Thus the function $\varphi : \mathbb{Z}/m\mathbb{Z} \to G$ given by $\varphi([n]) = a^n$ is well defined, is onto G, and carries sums in Z/mZ to products in G. If $0 \le l < k < m$, then $a^k \ne a^l$ since otherwise $a^{k-l}$ would be 1. Hence ϕ is one-one, and we conclude that $\varphi : \mathbb{Z}/m\mathbb{Z} \to G$ is an isomorphism.

Let us denote abstract cyclic groups by $C_\infty$ and $C_m$, the subscript indicating the number of elements. Finite cyclic groups arise in guises other than as Z/mZ. For example the set of all elements $e^{2\pi i k/m}$ in C, with multiplication as operation, forms a group isomorphic to $C_m$. So does the set of all rotation matrices
$$\begin{pmatrix} \cos 2\pi k/m & -\sin 2\pi k/m \\ \sin 2\pi k/m & \cos 2\pi k/m \end{pmatrix}$$
with matrix multiplication as operation.

Proposition 4.4. Any subgroup of a cyclic group is cyclic.

PROOF. Let G be a cyclic group with generator a, and let H be a subgroup. We may assume that $H \ne \{1\}$. Then there exists a positive integer n such that $a^n$ is in H, and we let k be the smallest such positive integer. If n is any integer such that $a^n$ is in H, then Proposition 1.2 produces integers x and y such that xk + yn = d, where d = GCD(k, n). The equation $a^d = (a^k)^x (a^n)^y$ exhibits $a^d$ as in H, and the minimality of k forces d ≥ k. Since GCD(k, n) ≤ k, we conclude that d = k. Hence k divides n. Consequently H consists of the powers of $a^k$ and is cyclic.
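Proposition 4.4 can be checked by brute force in a small case. The sketch below is an illustration only: it works in Z/12Z in additive notation, finds every subgroup by testing all subsets that contain 0, and confirms that each one consists of the multiples of its smallest positive member, as in the proof. The function names are hypothetical.

```python
from itertools import combinations


def generated(a, m):
    """Cyclic subgroup of Z/mZ generated by a (additive notation)."""
    return frozenset((k * a) % m for k in range(m))


def is_subgroup(H, m):
    """Contains 0 and is closed under addition and negatives modulo m."""
    return 0 in H and all((x + y) % m in H and (-x) % m in H
                          for x in H for y in H)


def subgroups(m):
    """All subgroups of Z/mZ, found by testing every subset containing 0."""
    found = []
    for r in range(m):
        for rest in combinations(range(1, m), r):
            H = frozenset((0,) + rest)
            if is_subgroup(H, m):
                found.append(H)
    return found


m = 12
# Map each subgroup to its smallest positive member (0 for the trivial subgroup).
smallest = {H: min((x for x in H if x > 0), default=0) for H in subgroups(m)}
print(sorted(smallest.values()))
```

In each case the smallest positive member k divides 12, matching the conclusion "k divides n" of the proof with n = 12.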
A notion of the direct product of two groups is definable in the same way as was done with vector spaces in Section II.6, except that a little care is needed in saying how this construction interacts with mappings. As with the corresponding construction for vector spaces, one can define an explicit “external” direct product, and one can recognize a given group as an “internal” direct product, i.e., as isomorphic to an external direct product. We postpone a fuller discussion of direct product, as well as all comments about direct sums and mappings associated with direct sums and direct products, to Section 3.

The external direct product $G_1 \times G_2$ of two groups $G_1$ and $G_2$ is a group whose underlying set is the set-theoretic product of $G_1$ and $G_2$ and whose group law is $(g_1, g_2)(g_1', g_2') = (g_1 g_1', g_2 g_2')$. The identity is (1, 1), and the formula for inverses is $(g_1, g_2)^{-1} = (g_1^{-1}, g_2^{-1})$. The two subgroups $G_1 \times \{1\}$ and $\{1\} \times G_2$ of $G_1 \times G_2$ commute with each other.


A group G is the internal direct product of two subgroups $G_1$ and $G_2$ if the function from the external direct product $G_1 \times G_2$ to G given by $(g_1, g_2) \mapsto g_1 g_2$ is an isomorphism of groups. The literal analog of Proposition 2.30, which gave three equivalent definitions of internal direct product (see footnote 4) of vector spaces, fails here. It is not sufficient that $G_1$ and $G_2$ be two subgroups such that $G_1 \cap G_2 = \{1\}$ and every element in G decomposes as a product $g_1 g_2$ with $g_1 \in G_1$ and $g_2 \in G_2$. For example, with $G = S_3$, the two subgroups
$$G_1 = \{1, (1\ 2)\} \quad\text{and}\quad G_2 = \{1, (1\ 2\ 3), (1\ 3\ 2)\}$$
have these properties, but G is not isomorphic to $G_1 \times G_2$ because the elements of $G_1$ do not commute with the elements of $G_2$.

Proposition 4.5. If G is a group and $G_1$ and $G_2$ are subgroups, then the following conditions are equivalent:
(a) G is the internal direct product of $G_1$ and $G_2$,
(b) every element in G decomposes uniquely as a product $g_1 g_2$ with $g_1 \in G_1$ and $g_2 \in G_2$, and every member of $G_1$ commutes with every member of $G_2$,
(c) $G_1 \cap G_2 = \{1\}$, every element in G decomposes as a product $g_1 g_2$ with $g_1 \in G_1$ and $g_2 \in G_2$, and every member of $G_1$ commutes with every member of $G_2$.

PROOF. We have seen that (a) implies (b). If (b) holds and g is in $G_1 \cap G_2$, then the formula $1 = gg^{-1}$ and the uniqueness of the decomposition of 1 as a product together imply that g = 1. Hence (c) holds. If (c) holds, define $\varphi : G_1 \times G_2 \to G$ by $\varphi(g_1, g_2) = g_1 g_2$. This map is certainly onto G. To see that it is one-one, suppose that $\varphi(g_1, g_2) = \varphi(g_1', g_2')$. Then $g_1 g_2 = g_1' g_2'$ and hence $(g_1')^{-1} g_1 = g_2' g_2^{-1}$. Since $G_1 \cap G_2 = \{1\}$, $(g_1')^{-1} g_1 = g_2' g_2^{-1} = 1$. Thus $(g_1, g_2) = (g_1', g_2')$, and ϕ is one-one. Finally the fact that elements of $G_1$ commute with elements of $G_2$ implies that $\varphi((g_1, g_2)(g_1', g_2')) = \varphi(g_1 g_1', g_2 g_2') = g_1 g_1' g_2 g_2' = g_1 g_2 g_1' g_2' = \varphi(g_1, g_2)\varphi(g_1', g_2')$. Therefore ϕ is an isomorphism, and (a) holds.

Here are two examples of internal direct products of groups. In each let $\mathbb{R}^+$ be the multiplicative group of positive real numbers. The first example is $\mathbb{R}^\times \cong C_2 \times \mathbb{R}^+$ with $C_2$ providing the sign. The second example is $\mathbb{C}^\times \cong S^1 \times \mathbb{R}^+$, where $S^1$ is the multiplicative group of complex numbers of absolute value 1; the isomorphism here is given by the polar-coordinate mapping $(e^{i\theta}, r) \mapsto e^{i\theta} r$.

Footnote 4: The direct sum and direct product of two vector spaces were defined to be the same thing in Chapter II.
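The three requirements in condition (c) of Proposition 4.5 can be tested separately on finite examples. The sketch below is an illustration only: it does so for the $S_3$ counterexample above and for Z/6Z with the subgroups {0, 3} and {0, 2, 4}. Permutations are stored as tuples on {0, 1, 2} rather than {1, 2, 3}, and the function name `condition_c` is hypothetical.

```python
from itertools import permutations


def compose(f, g):
    """(fg)(x) = f(g(x)) for permutations stored as tuples."""
    return tuple(f[g[x]] for x in range(len(g)))


def condition_c(G, G1, G2, op, e):
    """Return the three parts of condition (c) as a tuple of booleans."""
    trivial_intersection = (set(G1) & set(G2)) == {e}
    products_exhaust = {op(a, b) for a in G1 for b in G2} == set(G)
    commute = all(op(a, b) == op(b, a) for a in G1 for b in G2)
    return trivial_intersection, products_exhaust, commute


# The counterexample in S3: a transposition subgroup and the 3-cycles.
S3 = set(permutations(range(3)))
e = (0, 1, 2)
G1 = {e, (1, 0, 2)}
G2 = {e, (1, 2, 0), (2, 0, 1)}
s3_result = condition_c(S3, G1, G2, compose, e)

# An abelian example: Z/6Z with subgroups {0, 3} and {0, 2, 4}.
def add6(a, b):
    return (a + b) % 6

z6_result = condition_c(set(range(6)), {0, 3}, {0, 2, 4}, add6, 0)
print(s3_result, z6_result)
```

For $S_3$ only the commutativity requirement fails, which is exactly the gap between the vector-space statement and the group statement.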


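We conclude this section by giving an example of a group that falls outside the pattern of the examples above and by summarizing what groups we have identified with ≤ 15 elements.

EXAMPLES, CONTINUED.

(10) Groups associated with the quaternions. The set $\mathbb{H}$ of quaternions is an object like R or C in that it has both an addition/subtraction and a multiplication/division, but $\mathbb{H}$ is unlike R and C in that multiplication is not commutative. We give two constructions. In one we start from $\mathbb{R}^4$ with the standard basis vectors written as 1, i, j, k. The multiplication table for these basis vectors is
$$\begin{array}{llll}
\mathbf{1}\mathbf{1} = \mathbf{1}, & \mathbf{1}\mathbf{i} = \mathbf{i}, & \mathbf{1}\mathbf{j} = \mathbf{j}, & \mathbf{1}\mathbf{k} = \mathbf{k}, \\
\mathbf{i}\mathbf{1} = \mathbf{i}, & \mathbf{i}\mathbf{i} = -\mathbf{1}, & \mathbf{i}\mathbf{j} = \mathbf{k}, & \mathbf{i}\mathbf{k} = -\mathbf{j}, \\
\mathbf{j}\mathbf{1} = \mathbf{j}, & \mathbf{j}\mathbf{i} = -\mathbf{k}, & \mathbf{j}\mathbf{j} = -\mathbf{1}, & \mathbf{j}\mathbf{k} = \mathbf{i}, \\
\mathbf{k}\mathbf{1} = \mathbf{k}, & \mathbf{k}\mathbf{i} = \mathbf{j}, & \mathbf{k}\mathbf{j} = -\mathbf{i}, & \mathbf{k}\mathbf{k} = -\mathbf{1},
\end{array}$$
and the multiplication is extended to general elements by the usual distributive laws. The multiplicative identity is 1, and multiplicative inverses of nonzero elements are given by
$$(a\mathbf{1} + b\mathbf{i} + c\mathbf{j} + d\mathbf{k})^{-1} = s^{-1}a\mathbf{1} - s^{-1}b\mathbf{i} - s^{-1}c\mathbf{j} - s^{-1}d\mathbf{k}$$
with $s = a^2 + b^2 + c^2 + d^2$. Since ij = k while ji = −k, multiplication is not commutative. What takes work to see is that multiplication is associative. To see this, we give another construction, using $M_{22}(\mathbb{C})$. Within $M_{22}(\mathbb{C})$, take
$$\mathbf{1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
\mathbf{i} = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \quad
\mathbf{j} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \quad
\mathbf{k} = \begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix},$$
and define $\mathbb{H}$ to be the linear span, with real coefficients, of these matrices. The operations are the usual matrix addition and multiplication. Then multiplication is associative, and we readily verify the multiplication table for 1, i, j, k. A little computation verifies also the formula for multiplicative inverses.

The set $\mathbb{H}^\times$ of nonzero elements forms a group under multiplication, and it is isomorphic to $\mathbb{R}^+ \times SU(2)$, where
$$SU(2) = \left\{ \begin{pmatrix} \alpha & \beta \\ -\bar\beta & \bar\alpha \end{pmatrix} \;\middle|\; |\alpha|^2 + |\beta|^2 = 1 \right\}$$
is the 2-by-2 special unitary group defined in Example 8. Of interest for our current purposes is the 8-element subgroup $\{\pm\mathbf{1}, \pm\mathbf{i}, \pm\mathbf{j}, \pm\mathbf{k}\}$, which is called the quaternion group and will be denoted by $H_8$.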
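The verification of the multiplication table from the matrix construction is routine and can be delegated to a few lines of code. The sketch below is an illustration only: it stores 2-by-2 complex matrices as nested tuples, uses the four matrices named above, and checks a few table entries together with the closure of the 8-element set under multiplication. The helper names `matmul` and `neg` are hypothetical.

```python
def matmul(A, B):
    """Product of two 2-by-2 matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(A[r][t] * B[t][c] for t in range(2)) for c in range(2))
        for r in range(2)
    )


def neg(A):
    """Entrywise negative of a matrix."""
    return tuple(tuple(-z for z in row) for row in A)


# The matrices 1, i, j, k of the second construction.
ONE = ((1, 0), (0, 1))
I = ((1j, 0), (0, -1j))
J = ((0, -1), (1, 0))
K = ((0, -1j), (-1j, 0))

H8 = {ONE, I, J, K, neg(ONE), neg(I), neg(J), neg(K)}

# Closure under multiplication, and one instance of noncommutativity.
closed = all(matmul(A, B) in H8 for A in H8 for B in H8)
print(closed, matmul(I, J) == K, matmul(J, I) == neg(K))
```

Associativity needs no separate check here, since it is inherited from matrix multiplication, which is the point of the second construction.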


The order of a finite group is the number of elements in the group. Let us list some of the groups we have discussed that have order at most 15:

Order   Groups
1       $C_1$
2       $C_2$
3       $C_3$
4       $C_4$, $C_2 \times C_2$
5       $C_5$
6       $C_6$, $D_3$
7       $C_7$
8       $C_8$, $C_4 \times C_2$, $C_2 \times C_2 \times C_2$, $D_4$, $H_8$
9       $C_9$, $C_3 \times C_3$
10      $C_{10}$, $D_5$
11      $C_{11}$
12      $C_{12}$, $C_6 \times C_2$, $D_6$, $A_4$
13      $C_{13}$
14      $C_{14}$, $D_7$
15      $C_{15}$

No two groups in the above table are isomorphic, as one readily checks by counting elements of each “order” in the sense of the next section. We shall see in Section 10 and in the problems at the end of the chapter that the above table is complete through order 15 except for one group of order 12. Some groups that we have discussed have been omitted from the above table because of isomorphisms with the groups above. For example, $S_2 \cong C_2$, $A_3 \cong C_3$, $C_3 \times C_2 \cong C_6$, $S_3 \cong D_3$, $C_5 \times C_2 \cong C_{10}$, $C_4 \times C_3 \cong C_{12}$, $D_3 \times C_2 \cong D_6$, $C_7 \times C_2 \cong C_{14}$, and $C_5 \times C_3 \cong C_{15}$.
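The remark about counting elements of each order can be made concrete for the three abelian groups of order 8 in the table. The sketch below is an illustration only: it models $C_8$, $C_4 \times C_2$, and $C_2 \times C_2 \times C_2$ additively as products of groups Z/mZ and tallies how many elements have each order. The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def element_order(elem, moduli):
    """Least positive n with n*elem equal to the zero tuple."""
    n, current = 1, elem
    while any(current):
        current = tuple((c + e) % m for c, e, m in zip(current, elem, moduli))
        n += 1
    return n


def order_profile(moduli):
    """Number of elements of each order in Z/m1 x ... x Z/mr."""
    elements = product(*(range(m) for m in moduli))
    return dict(Counter(element_order(x, moduli) for x in elements))


profiles = {
    "C8": order_profile((8,)),
    "C4 x C2": order_profile((4, 2)),
    "C2 x C2 x C2": order_profile((2, 2, 2)),
}
print(profiles)
```

Since an isomorphism preserves the order of each element, groups with different profiles cannot be isomorphic.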

2. Quotient Spaces and Homomorphisms

Let G be a group, and let H be a subgroup. For purposes of this paragraph, say that $g_1$ in G is equivalent to $g_2$ in G if $g_1 = g_2 h$ for some h in H. The relation “equivalent” is an equivalence relation: it is reflexive because 1 is in H, it is symmetric since H is closed under inverses, and it is transitive since H is closed under products. The equivalence classes are called left cosets of H in G. The left coset containing an element g of G is the set $gH = \{gh \mid h \in H\}$.

EXAMPLES.
(1) When G = Z and H = mZ, the left cosets are the sets r + mZ, i.e., the sets $\{x \in \mathbb{Z} \mid x \equiv r \bmod m\}$ for the various values of r.
(2) When $G = S_3$ and H = {(1), (1 3)}, there are three left cosets: H, (1 2)H = {(1 2), (1 3 2)}, and (2 3)H = {(2 3), (1 2 3)}.

Similarly one can define the right cosets Hg of H in G. When G is nonabelian, these need not coincide with the left cosets; in Example 2 above with $G = S_3$ and H = {(1), (1 3)}, the right coset H(1 2) = {(1 2), (1 2 3)} is not a left coset.
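Example 2 can be reproduced mechanically. The sketch below is an illustration only: a permutation of {1, 2, 3} is stored as the tuple of images of 1, 2, 3, so that (1 3) is (3, 2, 1), and products are composed right to left as in the text. The names `compose`, `left_cosets`, and `right_cosets` are hypothetical.

```python
from itertools import permutations


def compose(f, g):
    """(fg)(x) = f(g(x)) for permutations of {1, ..., n} stored as image tuples."""
    return tuple(f[g[x] - 1] for x in range(len(g)))


S3 = list(permutations((1, 2, 3)))
H = [(1, 2, 3), (3, 2, 1)]          # the identity and the transposition (1 3)

left_cosets = {frozenset(compose(g, h) for h in H) for g in S3}
right_cosets = {frozenset(compose(h, g) for h in H) for g in S3}

# The right coset H(1 2) = {(1 2), (1 2 3)} from the text.
H_12 = frozenset({(2, 1, 3), (2, 3, 1)})
print(len(left_cosets), H_12 in right_cosets, H_12 in left_cosets)
```

The three left cosets found this way are exactly the ones listed in Example 2, and the right coset H(1 2) is missing from them.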


Lemma 4.6. If H is a subgroup of the group G, then any two left cosets of H in G have the same cardinality, namely card H.

REMARKS. We shall be especially interested in the case that card H is finite, and then we write |H| = card H for the number of elements in H.

PROOF. If $g_1 H$ and $g_2 H$ are given, then the map $g \mapsto g_2 g_1^{-1} g$ is one-one on G and carries $g_1 H$ onto $g_2 H$. Hence $g_1 H$ and $g_2 H$ have the same cardinality. Taking $g_1 = 1$, we see that this common cardinality is card H.

We write G/H for the set {gH} of all left cosets of H in G, calling it the quotient space or left-coset space of G by H. The set {Hg} of right cosets is denoted by $H \backslash G$.

Theorem 4.7 (Lagrange’s Theorem). If G is a finite group, then |G| = |G/H| |H|. Consequently the order of any subgroup of G divides the order of G.

PROOF. Lemma 4.6 shows that each left coset has |H| elements. The left cosets are disjoint and exhaust G, and there are |G/H| left cosets. Thus G has |G/H| |H| elements.

If a is an element of a group G, then we have seen that the powers $a^n$ of a form a cyclic subgroup of G that is isomorphic either to Z or to some group Z/mZ for a positive integer m. We say that a has finite order m when the cyclic group is isomorphic to Z/mZ. Otherwise a has infinite order. In the finite-order case the order of a is thus the least positive integer n such that $a^n = 1$.

Corollary 4.8. If G is a finite group, then each element a of G has finite order, and the order of a divides the order of G.

PROOF. The order of a equals |H| if $H = \{a^n \mid n \in \mathbb{Z}\}$, and Corollary 4.8 is thus a special case of Theorem 4.7.

Corollary 4.9. If p is a prime, then the only group of order p, up to isomorphism, is the cyclic group $C_p$, and it has no subgroups other than {1} and $C_p$ itself.

PROOF. Suppose that G is a finite group of order p and that $H \ne \{1\}$ is a subgroup of G. Let $a \ne 1$ be in H, and let $P = \{a^n \mid n \in \mathbb{Z}\}$. Since $a \ne 1$, Corollary 4.8 shows that the order of a is an integer > 1 that divides p. Since p is prime, the order of a must equal p. Then |P| = p. Since P ⊆ H ⊆ G and |G| = p, we must have P = H = G.
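Lagrange's Theorem and Corollary 4.8 can be observed directly in $S_4$. The sketch below is an illustration only: permutations of {0, 1, 2, 3} are stored as tuples, the order of each element is found by taking powers, and the cosets of the cyclic subgroup generated by a 4-cycle are counted. The function names are hypothetical.

```python
from itertools import permutations


def compose(f, g):
    """(fg)(x) = f(g(x)) for permutations stored as tuples."""
    return tuple(f[g[x]] for x in range(len(g)))


def order(a):
    """Least positive n with a^n equal to the identity."""
    identity = tuple(range(len(a)))
    n, power = 1, a
    while power != identity:
        power = compose(power, a)
        n += 1
    return n


S4 = list(permutations(range(4)))
orders = sorted({order(a) for a in S4})

# Cyclic subgroup H generated by the 4-cycle a, and its left cosets.
a = (1, 2, 3, 0)
H, power = set(), tuple(range(4))
for _ in range(order(a)):
    H.add(power)
    power = compose(power, a)
left_cosets = {frozenset(compose(g, h) for h in H) for g in S4}

print(orders, len(H), len(left_cosets))
```

Each order that occurs divides 24, and the number of cosets times |H| recovers |G|.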


Let $G_1$ and $G_2$ be groups. We say that $\varphi : G_1 \to G_2$ is a homomorphism if $\varphi(ab) = \varphi(a)\varphi(b)$ for all a and b in $G_1$. In other words, ϕ is to respect products, but it is not assumed that ϕ is one-one or onto. Any homomorphism ϕ automatically respects the identity and inverses, in the sense that
• $\varphi(1) = 1$ (since $\varphi(1) = \varphi(11) = \varphi(1)\varphi(1)$),
• $\varphi(a^{-1}) = \varphi(a)^{-1}$ (since $1 = \varphi(1) = \varphi(aa^{-1}) = \varphi(a)\varphi(a^{-1})$ and similarly $1 = \varphi(a^{-1})\varphi(a)$).

EXAMPLES. The following functions are homomorphisms: any isomorphism, the function $\varphi : \mathbb{Z} \to \mathbb{Z}/m\mathbb{Z}$ given by ϕ(k) = k mod m, the function $\varphi : S_n \to \{\pm 1\}$ given by ϕ(σ) = sgn σ, the function $\varphi : \mathbb{Z} \to G$ given for fixed a in G by $\varphi(n) = a^n$, and the function $\varphi : GL(n, \mathbb{F}) \to \mathbb{F}^\times$ given by ϕ(A) = det A.

The image of a homomorphism $\varphi : G_1 \to G_2$ is just the image of ϕ considered as a function. It is denoted by image ϕ = $\varphi(G_1)$ and is necessarily a subgroup of $G_2$ since if $\varphi(g_1) = g_2$ and $\varphi(g_1') = g_2'$, then $\varphi(g_1 g_1') = g_2 g_2'$ and $\varphi(g_1^{-1}) = g_2^{-1}$.

The kernel of a homomorphism $\varphi : G_1 \to G_2$ is the set $\ker\varphi = \varphi^{-1}(\{1\}) = \{x \in G_1 \mid \varphi(x) = 1\}$. This is a subgroup since if ϕ(x) = 1 and ϕ(y) = 1, then ϕ(xy) = ϕ(x)ϕ(y) = 1 and $\varphi(x^{-1}) = \varphi(x)^{-1} = 1$.

The homomorphism $\varphi : G_1 \to G_2$ is one-one if and only if ker ϕ is the trivial group {1}. The necessity follows since 1 is already in ker ϕ, and the sufficiency follows since ϕ(x) = ϕ(y) implies that $\varphi(xy^{-1}) = 1$ and therefore that $xy^{-1}$ is in ker ϕ.

The kernel H of a homomorphism $\varphi : G_1 \to G_2$ has the additional property of being a normal subgroup of $G_1$ in the sense that $ghg^{-1}$ is in H whenever g is in $G_1$ and h is in H, i.e., $gHg^{-1} = H$. In fact, if h is in ker ϕ and g is in $G_1$, then $\varphi(ghg^{-1}) = \varphi(g)\varphi(h)\varphi(g)^{-1} = \varphi(g)\varphi(g)^{-1} = 1$ shows that $ghg^{-1}$ is in ker ϕ.

EXAMPLES.
(1) Any subgroup H of an abelian group G is normal since $ghg^{-1} = gg^{-1}h = h$. The alternating subgroup $A_n$ of the symmetric group $S_n$ is normal since $A_n$ is the kernel of the homomorphism $\sigma \mapsto \operatorname{sgn}\sigma$.
(2) The subgroup H = {1, (1 3)} of S3 is not normal since (1 2)H (1 2)−1 = {1, (2 3)}. (3) If a subgroup H of a group G has just two left cosets, then H is normal even if G is an inﬁnite group. In fact, suppose G = H ∪ g0 H whenever g0 is not in H . Taking inverses of all elements of G, we see that G = H ∪ H g1 whenever g1 is not in H . If g in G is given, then either g is in H and g H g −1 = H , or g is not in H and g H = H g, so that g H g −1 = H in this case as well.


Let H be a subgroup of G. Let us look for the circumstances under which G/H inherits a multiplication from G. The natural definition is
$$(g_1 H)(g_2 H) \overset{?}{=} g_1 g_2 H,$$
but we have to check that this definition makes sense. The question is whether we get the same left coset as product if we change the representatives of $g_1 H$ and $g_2 H$ from $g_1$ and $g_2$ to $g_1 h_1$ and $g_2 h_2$. Since our prospective definition makes $(g_1 h_1 H)(g_2 h_2 H) = g_1 h_1 g_2 h_2 H$, the question is whether $g_1 h_1 g_2 h_2 H$ equals $g_1 g_2 H$. That is, we ask whether $g_1 h_1 g_2 h_2 = g_1 g_2 h$ for some h in H. If this equality holds, then $h_1 g_2 h_2 = g_2 h$, and hence $g_2^{-1} h_1 g_2$ equals $h h_2^{-1}$, which is an element of H. Conversely if every expression $g_2^{-1} h_1 g_2$ is in H, then we can go backwards and see that $g_1 h_1 g_2 h_2 = g_1 g_2 h$ for some h in H, hence see that G/H indeed inherits a multiplication from G. Thus a necessary and sufficient condition for G/H to inherit a multiplication from G is that the subgroup H is normal. According to the next proposition, the multiplication inherited by G/H when this condition is satisfied makes G/H into a group.

Proposition 4.10. If H is a normal subgroup of a group G, then G/H becomes a group under the inherited multiplication $(g_1 H)(g_2 H) = (g_1 g_2)H$, and the function $q : G \to G/H$ given by q(g) = gH is a homomorphism of G onto G/H with kernel H. Consequently every normal subgroup of G is the kernel of some homomorphism.

REMARKS. When H is normal, the group G/H is called a quotient group of G, and the homomorphism $q : G \to G/H$ is called the quotient homomorphism (see footnote 5). In the special case that G = Z and H = mZ, the construction reduces to the construction of the additive group of integers modulo m and accounts for using the notation Z/mZ for that group.

PROOF. The coset 1H is the identity, and $(gH)^{-1} = g^{-1}H$. Also, the computation $(g_1 H g_2 H) g_3 H = g_1 g_2 g_3 H = g_1 H (g_2 H g_3 H)$ proves associativity. Certainly q is onto G/H. It is a homomorphism since $q(g_1 g_2) = g_1 g_2 H = g_1 H g_2 H = q(g_1) q(g_2)$.
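The equivalence between normality and well-definedness of the coset product can be tested exhaustively in $S_3$. The sketch below is an illustration only: it compares the normal subgroup $A_3$ with the two-element subgroup generated by a transposition, using permutations of {0, 1, 2} stored as tuples. The function names are hypothetical.

```python
from itertools import permutations


def compose(f, g):
    """(fg)(x) = f(g(x)) for permutations stored as tuples."""
    return tuple(f[g[x]] for x in range(len(g)))


def inverse(f):
    """Inverse function of the permutation f."""
    inv = [0] * len(f)
    for x, y in enumerate(f):
        inv[y] = x
    return tuple(inv)


def coset(g, H):
    """The left coset gH."""
    return frozenset(compose(g, h) for h in H)


def is_normal(G, H):
    """True when g h g^{-1} lies in H for all g in G and h in H."""
    return all(compose(compose(g, h), inverse(g)) in H for g in G for h in H)


def product_well_defined(G, H):
    """True when (g1 h1)(g2 h2)H = g1 g2 H for every choice of representatives."""
    return all(
        coset(compose(compose(g1, h1), compose(g2, h2)), H)
        == coset(compose(g1, g2), H)
        for g1 in G for g2 in G for h1 in H for h2 in H
    )


S3 = list(permutations(range(3)))
A3 = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}
H = {(0, 1, 2), (2, 1, 0)}           # identity and the transposition swapping 0 and 2

print(is_normal(S3, A3), product_well_defined(S3, A3))
print(is_normal(S3, H), product_well_defined(S3, H))
```

The two tests agree on both subgroups, as the necessary and sufficient condition derived above predicts.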
In analogy with what was shown for vector spaces in Proposition 2.25, quotients in the context of groups allow for the factorization of certain homomorphisms of groups. The appropriate result is stated as Proposition 4.11 and is pictured in Figure 4.1. We can continue from there along the lines of Section II.5.

5 Some authors call $G/H$ a “factor group.” A “factor set,” however, is something different.

IV. Groups and Group Actions

Proposition 4.11. Let $\varphi : G_1 \to G_2$ be a homomorphism between groups, let $H_0 = \ker\varphi$, let $H$ be a normal subgroup of $G_1$ contained in $H_0$, and define $q : G_1 \to G_1/H$ to be the quotient homomorphism. Then there exists a homomorphism $\bar\varphi : G_1/H \to G_2$ such that $\varphi = \bar\varphi \circ q$, i.e., $\bar\varphi(g_1H) = \varphi(g_1)$. It has the same image as $\varphi$, and $\ker\bar\varphi = \{h_0H \mid h_0 \in H_0\}$.

[Diagram: a commutative triangle with $\varphi : G_1 \to G_2$ along the top, $q : G_1 \to G_1/H$ down the left side, and $\bar\varphi : G_1/H \to G_2$ along the diagonal.]

FIGURE 4.1. Factorization of homomorphisms of groups via the quotient of a group by a normal subgroup.

REMARK. One says that $\varphi$ factors through $G_1/H$ or descends to $G_1/H$. See Figure 4.1.

PROOF. We will have $\bar\varphi \circ q = \varphi$ if and only if $\bar\varphi$ satisfies $\bar\varphi(g_1H) = \varphi(g_1)$. What needs proof is that $\bar\varphi$ is well defined. Thus suppose that $g_1$ and $g_1'$ are in the same left coset, so that $g_1' = g_1h$ with $h$ in $H$. Then $\varphi(g_1') = \varphi(g_1)\varphi(h) = \varphi(g_1)$ since $H \subseteq \ker\varphi$, and $\bar\varphi$ is therefore well defined. The computation $\bar\varphi(g_1H\,g_2H) = \bar\varphi(g_1g_2H) = \varphi(g_1g_2) = \varphi(g_1)\varphi(g_2) = \bar\varphi(g_1H)\bar\varphi(g_2H)$ shows that $\bar\varphi$ is a homomorphism. Since $\operatorname{image}\bar\varphi = \operatorname{image}\varphi$, $\bar\varphi$ is onto $\operatorname{image}\varphi$. Finally $\ker\bar\varphi$ consists of all $g_1H$ such that $\bar\varphi(g_1H) = 1$. Since $\bar\varphi(g_1H) = \varphi(g_1)$, the condition that $g_1$ is to satisfy is that $g_1$ be in $\ker\varphi = H_0$. Hence $\ker\bar\varphi = \{h_0H \mid h_0 \in H_0\}$, as asserted.

Corollary 4.12. Let $\varphi : G_1 \to G_2$ be a homomorphism between groups, and suppose that $\varphi$ is onto $G_2$ and has kernel $H$. Then $\varphi$ exhibits the group $G_1/H$ as canonically isomorphic to $G_2$.

PROOF. Take $H = H_0$ in Proposition 4.11, and form $\bar\varphi : G_1/H \to G_2$ with $\varphi = \bar\varphi \circ q$. The proposition shows that $\bar\varphi$ is onto $G_2$ and has trivial kernel, i.e., the identity element of $G_1/H$. Having trivial kernel, $\bar\varphi$ is one-one.

Theorem 4.13 (First Isomorphism Theorem). Let $\varphi : G_1 \to G_2$ be a homomorphism between groups, and suppose that $\varphi$ is onto $G_2$ and has kernel $K$. Then the map $H_1 \mapsto \varphi(H_1)$ gives a one-one correspondence between
(a) the subgroups $H_1$ of $G_1$ containing $K$ and
(b) the subgroups of $G_2$.
Under this correspondence normal subgroups correspond to normal subgroups. If $H_1$ is normal in $G_1$, then $gH_1 \mapsto \varphi(g)\varphi(H_1)$ is an isomorphism of $G_1/H_1$ onto $G_2/\varphi(H_1)$.

2. Quotient Spaces and Homomorphisms

REMARK. In the special case of the last statement that $\varphi : G_1 \to G_2$ is a quotient map $q : G \to G/K$ and $H$ is a normal subgroup of $G$ containing $K$, the last statement of the theorem asserts the isomorphism

$$G/H \cong (G/K)\big/(H/K).$$

PROOF. The passage from (a) to (b) is by direct image under $\varphi$, and the passage from (b) to (a) will be by inverse image under $\varphi^{-1}$. Certainly the direct image of a subgroup as in (a) is a subgroup as in (b). To prove the one-one correspondence, we are to show that the inverse image of a subgroup as in (b) is a subgroup as in (a) and that these two constructions invert one another.

For any subgroup $H_2$ of $G_2$, $\varphi^{-1}(H_2)$ is a subgroup of $G_1$. In fact, if $g_1$ and $g_1'$ are in $\varphi^{-1}(H_2)$, we can write $\varphi(g_1) = h_2$ and $\varphi(g_1') = h_2'$ with $h_2$ and $h_2'$ in $H_2$. Then the equations $\varphi(g_1g_1') = h_2h_2'$ and $\varphi(g_1^{-1}) = \varphi(g_1)^{-1} = h_2^{-1}$ show that $g_1g_1'$ and $g_1^{-1}$ are in $\varphi^{-1}(H_2)$. Moreover, the subgroup $\varphi^{-1}(H_2)$ contains $\varphi^{-1}(\{1\}) = K$. Therefore the inverse image under $\varphi$ of a subgroup as in (b) is a subgroup as in (a). Since $\varphi$ is onto $G_2$, we have $\varphi(\varphi^{-1}(H_2)) = H_2$. Thus passing from (b) to (a) and back recovers the subgroup of $G_2$.

If $H_1$ is a subgroup of $G_1$ containing $K$, we still need to see that $H_1 = \varphi^{-1}(\varphi(H_1))$. Certainly $H_1 \subseteq \varphi^{-1}(\varphi(H_1))$. For the reverse inclusion let $g_1$ be in $\varphi^{-1}(\varphi(H_1))$. Then $\varphi(g_1)$ is in $\varphi(H_1)$, i.e., $\varphi(g_1) = \varphi(h_1)$ for some $h_1$ in $H_1$. Since $\varphi$ is a homomorphism, $\varphi(g_1h_1^{-1}) = 1$. Thus $g_1h_1^{-1}$ is in $\ker\varphi = K$, which is contained in $H_1$ by assumption. Then $h_1$ and $g_1h_1^{-1}$ are in $H_1$, and hence their product $(g_1h_1^{-1})h_1 = g_1$ is in $H_1$. We conclude that $\varphi^{-1}(\varphi(H_1)) \subseteq H_1$, and thus passing from (a) to (b) and then back recovers the subgroup of $G_1$ containing $K$.

Next let us show that normal subgroups correspond to normal subgroups. If $H_2$ is normal in $G_2$, let $H_1$ be the subgroup $\varphi^{-1}(H_2)$ of $G_1$. For $h_1$ in $H_1$ and $g_1$ in $G_1$, we can write $\varphi(h_1) = h_2$ with $h_2$ in $H_2$, and then $\varphi(g_1h_1g_1^{-1}) = \varphi(g_1)h_2\varphi(g_1)^{-1}$ is in $\varphi(g_1)H_2\varphi(g_1)^{-1} = H_2$. Hence $g_1h_1g_1^{-1}$ is in $\varphi^{-1}(H_2) = H_1$. In the reverse direction let $H_1$ be normal in $G_1$, and let $g_2$ be in $G_2$. Since $\varphi$ is onto $G_2$, we can write $g_2 = \varphi(g_1)$ for some $g_1$ in $G_1$. Then $g_2\varphi(H_1)g_2^{-1} = \varphi(g_1)\varphi(H_1)\varphi(g_1)^{-1} = \varphi(g_1H_1g_1^{-1}) = \varphi(H_1)$. Thus $\varphi(H_1)$ is normal.

For the final statement let $H_2 = \varphi(H_1)$. We have just proved that this image is normal, and hence $G_2/H_2$ is a group. The mapping $\Phi : G_1 \to G_2/H_2$ given by $\Phi(g_1) = \varphi(g_1)H_2$ is the composition of two homomorphisms and hence is a homomorphism. Its kernel is $\{g_1 \in G_1 \mid \varphi(g_1) \in H_2\} = \{g_1 \in G_1 \mid \varphi(g_1) \in \varphi(H_1)\} = \varphi^{-1}(\varphi(H_1))$, and this equals $H_1$ by the first conclusion of the theorem. Applying Corollary 4.12 to $\Phi$, we obtain the required isomorphism $\bar\Phi : G_1/H_1 \to G_2/\varphi(H_1)$.
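The correspondence in Theorem 4.13 can be made concrete for the quotient map from the integers onto the integers modulo 12. The Python sketch below is a small illustration under stated assumptions: the modulus 12 and all names are chosen for the example, and the standard fact that the subgroups of the integers containing the multiples of $m$ are the sets of multiples of $d$ with $d$ dividing $m$ is taken as known rather than computed. The code finds every subgroup of the quotient by exhaustive search and compares the result with the direct images.

```python
from itertools import combinations

m = 12

def is_subgroup(S):
    """True if the subset S of Z/mZ contains 0 and is closed under subtraction."""
    return 0 in S and all((a - b) % m in S for a in S for b in S)

# (b) every subgroup of Z/mZ, found by testing all nonempty subsets
subgroups_of_quotient = {
    frozenset(S)
    for r in range(1, m + 1)
    for S in combinations(range(m), r)
    if is_subgroup(set(S))
}

# (a) the subgroups of Z containing mZ are the dZ with d dividing m (assumed)
divisors = [d for d in range(1, m + 1) if m % d == 0]

# direct image of dZ under the quotient map k -> k mod m
image = {d: frozenset((d * k) % m for k in range(m)) for d in divisors}

print(len(subgroups_of_quotient), len(divisors))
```

In this instance the image of $d\mathbb{Z}$ has $m/d$ elements, so the quotient of $\mathbb{Z}/m\mathbb{Z}$ by it has $d$ elements, which matches the order of $\mathbb{Z}/d\mathbb{Z}$ as the last statement of the theorem requires.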

Theorem 4.14 (Second Isomorphism Theorem). Let $H_1$ and $H_2$ be subgroups of a group $G$ with $H_2$ normal in $G$. Then $H_1 \cap H_2$ is a normal subgroup of $H_1$, the set $H_1H_2$ of products is a subgroup of $G$ with $H_2$ as a normal subgroup, and the map $h_1(H_1 \cap H_2) \mapsto h_1H_2$ is a well-defined canonical isomorphism of groups

$$H_1/(H_1 \cap H_2) \cong (H_1H_2)/H_2.$$

PROOF. The set $H_1 \cap H_2$ is a subgroup, being the intersection of two subgroups. For $h_1$ in $H_1$, we have $h_1(H_1 \cap H_2)h_1^{-1} \subseteq h_1H_1h_1^{-1} \subseteq H_1$ since $H_1$ is a subgroup and $h_1(H_1 \cap H_2)h_1^{-1} \subseteq h_1H_2h_1^{-1} \subseteq H_2$ since $H_2$ is normal in $G$. Therefore $h_1(H_1 \cap H_2)h_1^{-1} \subseteq H_1 \cap H_2$, and $H_1 \cap H_2$ is normal in $H_1$.

The set $H_1H_2$ of products is a subgroup since $h_1h_2h_1'h_2' = h_1h_1'(h_1'^{-1}h_2h_1')h_2'$ and since $(h_1h_2)^{-1} = h_2^{-1}h_1^{-1} = h_1^{-1}(h_1h_2^{-1}h_1^{-1})$, the factor in parentheses lying in $H_2$ in each case because $H_2$ is normal; and $H_2$ is normal in $H_1H_2$ since $H_2$ is normal in $G$.

The function $\varphi(h_1(H_1 \cap H_2)) = h_1H_2$ is well defined since $H_1 \cap H_2 \subseteq H_2$, and $\varphi$ respects products. The domain of $\varphi$ is $\{h_1(H_1 \cap H_2) \mid h_1 \in H_1\}$, and the kernel is the subset of this such that $h_1$ lies in $H_2$ as well as $H_1$. For this to happen, $h_1$ must be in $H_1 \cap H_2$, and thus the kernel is the identity coset of $H_1/(H_1 \cap H_2)$. Hence $\varphi$ is one-one. To see that $\varphi$ is onto $(H_1H_2)/H_2$, let $h_1h_2H_2$ be given. Then $h_1(H_1 \cap H_2)$ maps to $h_1H_2$, which equals $h_1h_2H_2$. Hence $\varphi$ is onto.
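As a hedged illustration of Theorem 4.14, the following Python sketch works inside the symmetric group on three letters, with $H_1$ generated by a transposition and $H_2$ the subgroup of rotations. The group, the tuple encoding of permutations, and all names are assumptions chosen for the example. The code builds the map $h_1(H_1 \cap H_2) \mapsto h_1H_2$ on every representative and checks that it is well defined, one-one, and onto.

```python
def compose(p, q):
    """Product pq of permutations of {0, 1, 2}: apply q first, then p."""
    return tuple(p[i] for i in q)

def left_coset(g, H):
    """The left coset gH, stored as a frozenset."""
    return frozenset(compose(g, h) for h in H)

H1 = [(0, 1, 2), (1, 0, 2)]              # identity and one transposition
H2 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]   # rotations, normal in the whole group

meet = [h for h in H1 if h in H2]                        # H1 ∩ H2
H1H2 = sorted({compose(a, b) for a in H1 for b in H2})   # set of products

# the map h1 (H1 ∩ H2) -> h1 H2, recorded for every representative h1
phi = {}
well_defined = True
for h1 in H1:
    source, target = left_coset(h1, meet), left_coset(h1, H2)
    if phi.setdefault(source, target) != target:
        well_defined = False

domain = {left_coset(h1, meet) for h1 in H1}   # H1/(H1 ∩ H2)
codomain = {left_coset(x, H2) for x in H1H2}   # (H1 H2)/H2
one_one = len(set(phi.values())) == len(phi)
onto = set(phi.values()) == codomain

print(len(H1H2), len(domain), len(codomain), well_defined, one_one, onto)
```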

3. Direct Products and Direct Sums

We return to the matter of direct products and direct sums of groups, direct products having been discussed briefly in Section 1. In a footnote in Section II.4 we mentioned a general principle in algebra that “whenever a new systematic construction appears for the objects under study, it is well to look for a corresponding construction with the functions relating these new objects.” This principle will be made more precise in Section 11 of the present chapter with the aid of the language of “categories” and “functors.”

Another principle that will be relevant for us is that constructions in one context in algebra often recur, sometimes in slightly different guise, in other contexts. One example of the operation of this principle occurs with quotients. The construction and properties of the quotient of a vector space by a vector subspace, as in Section II.5, is analogous in this sense to the construction and properties of the quotient of a group by a normal subgroup, as in Section 2 in the present chapter. The need for the subgroup to be normal is an example of what is meant by “slightly different guise.” Anyway, this principle too will be made more precise in Section 11 of the present chapter using the language of categories and functors.

Let us proceed with an awareness of both these principles in connection with direct products and direct sums of groups, looking for analogies with what happened for vector spaces and expecting our work to involve constructions with homomorphisms as well as with groups.

The external direct product $G_1 \times G_2$ was defined as a group in Section 1 to be the set-theoretic product with coordinate-by-coordinate multiplication. There are four homomorphisms of interest connected with $G_1 \times G_2$, namely

$i_1 : G_1 \to G_1 \times G_2$ given by $i_1(g_1) = (g_1, 1)$,
$i_2 : G_2 \to G_1 \times G_2$ given by $i_2(g_2) = (1, g_2)$,
$p_1 : G_1 \times G_2 \to G_1$ given by $p_1(g_1, g_2) = g_1$,
$p_2 : G_1 \times G_2 \to G_2$ given by $p_2(g_1, g_2) = g_2$.

Recall from the discussion before Proposition 4.5 that Proposition 2.30 for the direct product of two vector spaces does not translate directly into an analog for the direct product of groups; instead that proposition is replaced by Proposition 4.5, which involves some condition of commutativity. Warned by this anomaly, let us work with mappings rather than with groups and subgroups, and let us use mappings in formulating a definition of the direct product of groups.

As with the direct product of two vector spaces, the mappings to use are $p_1$ and $p_2$ but not $i_1$ and $i_2$. The way in which $p_1$ and $p_2$ enter is through the effect of the direct product on homomorphisms. If $\varphi_1 : H \to G_1$ and $\varphi_2 : H \to G_2$ are two homomorphisms, then $h \mapsto (\varphi_1(h), \varphi_2(h))$ is the corresponding homomorphism of $H$ into $G_1 \times G_2$. In order to state matters fully, let us give the definition with an arbitrary number of factors.

Let $S$ be an arbitrary nonempty set of groups, and let $G_s$ be the group corresponding to the member $s$ of $S$. The external direct product of the $G_s$’s consists of a group $\prod_{s\in S} G_s$ and a system of group homomorphisms. The group as a set is $\times_{s\in S}\, G_s$, whose elements are arbitrary functions from $S$ to $\bigcup_{s\in S} G_s$ such that the value of the function at $s$ is in $G_s$, and the group law is $\{g_s\}_{s\in S}\{g_s'\}_{s\in S} = \{g_sg_s'\}_{s\in S}$. The group homomorphisms are the coordinate mappings $p_{s_0} : \prod_{s\in S} G_s \to G_{s_0}$ with $p_{s_0}\big(\{g_s\}_{s\in S}\big) = g_{s_0}$. The individual groups $G_s$ are called the factors, and a direct product of $n$ groups may be written as $G_1 \times \cdots \times G_n$ instead of with the symbol $\prod$. The group $\prod_{s\in S} G_s$ has the universal mapping property described in Proposition 4.15 and pictured in Figure 4.2.

Proposition 4.15 (universal mapping property of external direct product). Let $\{G_s \mid s \in S\}$ be a nonempty set of groups, and let $\prod_{s\in S} G_s$ be the external direct product, the associated group homomorphisms being the coordinate mappings $p_{s_0} : \prod_{s\in S} G_s \to G_{s_0}$. If $H$ is any group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : H \to G_s$, then there exists a unique group homomorphism $\varphi : H \to \prod_{s\in S} G_s$ such that $p_{s_0} \circ \varphi = \varphi_{s_0}$ for all $s_0 \in S$.

[Diagram: a commutative triangle with $\varphi_{s_0} : H \to G_{s_0}$ along the top, $p_{s_0} : \prod_{s\in S} G_s \to G_{s_0}$ up the left side, and $\varphi : H \to \prod_{s\in S} G_s$ along the diagonal.]

FIGURE 4.2. Universal mapping property of an external direct product of groups.

PROOF. Existence of $\varphi$ is proved by taking $\varphi(h) = \{\varphi_s(h)\}_{s\in S}$. Then $p_{s_0}(\varphi(h)) = p_{s_0}\big(\{\varphi_s(h)\}_{s\in S}\big) = \varphi_{s_0}(h)$ as required. For uniqueness let $\varphi' : H \to \prod_{s\in S} G_s$ be a homomorphism with $p_{s_0} \circ \varphi' = \varphi_{s_0}$ for all $s_0 \in S$. For each $h$ in $H$, we can write $\varphi'(h) = \{\varphi'(h)_s\}_{s\in S}$. For $s_0$ in $S$, we then have $\varphi_{s_0}(h) = (p_{s_0} \circ \varphi')(h) = p_{s_0}(\varphi'(h)) = \varphi'(h)_{s_0}$, and we conclude that $\varphi' = \varphi$.

Now we give an abstract definition of direct product that allows for the possibility that the direct product is “internal” in the sense that the various factors are identified as subgroups of a given group. The definition is by means of the above universal mapping property and will be seen to characterize the direct product up to canonical isomorphism. Let $S$ be an arbitrary nonempty set of groups, and let $G_s$ be the group corresponding to the member $s$ of $S$. A direct product of the $G_s$’s consists of a group $G$ and a system of group homomorphisms $p_s : G \to G_s$ for $s \in S$ with the following universal mapping property: whenever $H$ is a group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : H \to G_s$, then there exists a unique group homomorphism $\varphi : H \to G$ such that $p_s \circ \varphi = \varphi_s$ for all $s \in S$. Proposition 4.15 proves existence of a direct product, and the next proposition addresses uniqueness. A direct product is internal if each $G_s$ is a subgroup of $G$ and each restriction $p_s\big|_{G_s}$ is the identity map.

[Diagram: a commutative triangle with $\varphi_s : H \to G_s$ along the top, $p_s : G \to G_s$ up the left side, and $\varphi : H \to G$ along the diagonal.]

FIGURE 4.3. Universal mapping property of a direct product of groups.

Proposition 4.16. Let $S$ be a nonempty set of groups, and let $G_s$ be the group corresponding to the member $s$ of $S$. If $(G, \{p_s\})$ and $(G', \{p_s'\})$ are two direct products, then the homomorphisms $p_s : G \to G_s$ and $p_s' : G' \to G_s$ are onto $G_s$, there exists a unique homomorphism $\Phi : G' \to G$ such that $p_s' = p_s \circ \Phi$ for all $s \in S$, and $\Phi$ is an isomorphism.

PROOF. In Figure 4.3 let $H = G'$ and $\varphi_s = p_s'$. If $\Phi : G' \to G$ is the homomorphism produced by the fact that $G$ is a direct product, then we have $p_s \circ \Phi = p_s'$ for all $s$. Reversing the roles of $G$ and $G'$, we obtain a homomorphism $\Phi' : G \to G'$ with $p_s' \circ \Phi' = p_s$ for all $s$. Therefore $p_s \circ (\Phi \circ \Phi') = p_s' \circ \Phi' = p_s$.

In Figure 4.3 we next let $H = G$ and $\varphi_s = p_s$ for all $s$. Then the identity $1_G$ on $G$ has the same property $p_s \circ 1_G = p_s$ relative to all $p_s$ that $\Phi \circ \Phi'$ has, and the uniqueness says that $\Phi \circ \Phi' = 1_G$. Reversing the roles of $G$ and $G'$, we obtain $\Phi' \circ \Phi = 1_{G'}$. Therefore $\Phi$ is an isomorphism.

For uniqueness suppose that $\Psi : G' \to G$ is another homomorphism with $p_s' = p_s \circ \Psi$ for all $s \in S$. Then the argument of the previous paragraph shows that $\Phi' \circ \Psi = 1_{G'}$. Applying $\Phi$ on the left gives $\Psi = (\Phi \circ \Phi') \circ \Psi = \Phi \circ (\Phi' \circ \Psi) = \Phi \circ 1_{G'} = \Phi$. Thus $\Psi = \Phi$.

Finally we have to show that the $s$th mapping of a direct product is onto $G_s$. It is enough to show that $p_s'$ is onto $G_s$. Taking $G$ as the external direct product $\prod_{s\in S} G_s$ with $p_s$ equal to the coordinate mapping, form the isomorphism $\Phi' : G \to G'$ that has just been proved to exist. This satisfies $p_s = p_s' \circ \Phi'$ for all $s \in S$. Since $p_s$ is onto $G_s$, $p_s'$ must be onto $G_s$.

Let us turn to direct sums. Part of what we seek is a definition that allows for an abstract characterization of direct sums in the spirit of Proposition 4.16. In particular, the interaction with homomorphisms is to be central to the discussion. In the case of two factors, we use $i_1$ and $i_2$ rather than $p_1$ and $p_2$. If $\varphi_1 : G_1 \to H$ and $\varphi_2 : G_2 \to H$ are two homomorphisms, then the corresponding homomorphism $\varphi$ of $G_1 \oplus G_2$ to $H$ is to satisfy $\varphi_1 = \varphi \circ i_1$ and $\varphi_2 = \varphi \circ i_2$. With $G_1 \oplus G_2$ defined, as expected, to be the same group as $G_1 \times G_2$, we are led to the formula $\varphi(g_1, g_2) = \varphi(g_1, 1)\varphi(1, g_2) = \varphi_1(g_1)\varphi_2(g_2)$. The images of commuting elements under a homomorphism have to commute, and hence $H$ had better be abelian. Then in order to have an analog of Proposition 4.16, we will want to specialize $H$ at some point to $G_1 \oplus G_2$, and therefore $G_1$ and $G_2$ had better be abelian. With these observations in place, we are ready for the general definition.
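The need for commutativity can be seen in a small experiment. The following Python sketch is an illustration under assumptions of its own: it takes both factors and the target to be the same group, takes both given homomorphisms to be the identity map, and tests whether the forced formula $\varphi(g_1, g_2) = g_1g_2$ respects the coordinate-by-coordinate product. The groups tested and all names are chosen only for the example.

```python
from itertools import permutations, product

def compose(p, q):
    """Product pq of permutations of {0, 1, 2}: apply q first, then p."""
    return tuple(p[i] for i in q)

def candidate_is_homomorphism(G, op):
    """Test phi(g1, g2) = g1 g2 against the coordinatewise product on G x G.

    phi((a1, a2)(b1, b2)) = (a1 b1)(a2 b2) must equal
    phi(a1, a2) phi(b1, b2) = (a1 a2)(b1 b2) for all choices.
    """
    return all(
        op(op(a1, b1), op(a2, b2)) == op(op(a1, a2), op(b1, b2))
        for a1, a2, b1, b2 in product(G, repeat=4)
    )

S3 = list(permutations(range(3)))   # a nonabelian group
Z6 = list(range(6))                 # integers modulo 6 under addition

nonabelian_case = candidate_is_homomorphism(S3, compose)
abelian_case = candidate_is_homomorphism(Z6, lambda a, b: (a + b) % 6)
print(nonabelian_case, abelian_case)
```

The formula fails for the nonabelian example and succeeds for the abelian one, which is consistent with restricting the definition of direct sum to abelian groups.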
Let $S$ be an arbitrary nonempty set of abelian groups, and let $G_s$ be the group corresponding to the member $s$ of $S$. We shall use additive notation for the group operation in each $G_s$. The external direct sum of the $G_s$’s consists of an abelian group $\bigoplus_{s\in S} G_s$ and a system of group homomorphisms $i_s$ for $s \in S$. The group is the subgroup of $\prod_{s\in S} G_s$ of all elements that are equal to 0 in all but finitely many coordinates. The group homomorphisms are the mappings $i_{s_0} : G_{s_0} \to \bigoplus_{s\in S} G_s$ carrying a member $g_{s_0}$ of $G_{s_0}$ to the element that is $g_{s_0}$ in coordinate $s_0$ and is 0 at all other coordinates. The individual groups are called the summands, and a direct sum of $n$ abelian groups may be written as $G_1 \oplus \cdots \oplus G_n$. The group $\bigoplus_{s\in S} G_s$ has the universal mapping property described in Proposition 4.17 and pictured in Figure 4.4.

Proposition 4.17 (universal mapping property of external direct sum). Let $\{G_s \mid s \in S\}$ be a nonempty set of abelian groups, and let $\bigoplus_{s\in S} G_s$ be the external direct sum, the associated group homomorphisms being the embedding mappings $i_{s_0} : G_{s_0} \to \bigoplus_{s\in S} G_s$. If $H$ is any abelian group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : G_s \to H$, then there exists a unique group homomorphism $\varphi : \bigoplus_{s\in S} G_s \to H$ such that $\varphi \circ i_{s_0} = \varphi_{s_0}$ for all $s_0 \in S$.

[Diagram: a commutative triangle with $\varphi_{s_0} : G_{s_0} \to H$ along the top, $i_{s_0} : G_{s_0} \to \bigoplus_{s\in S} G_s$ down the left side, and $\varphi : \bigoplus_{s\in S} G_s \to H$ along the diagonal.]

FIGURE 4.4. Universal mapping property of an external direct sum of abelian groups.

PROOF. Existence of $\varphi$ is proved by taking $\varphi\big(\{g_s\}_{s\in S}\big) = \sum_s \varphi_s(g_s)$. The sum on the right side is meaningful since the element $\{g_s\}_{s\in S}$ of the direct sum has only finitely many nonzero coordinates. Since $H$ is abelian, the computation

$$\varphi\big(\{g_s\}_{s\in S}\big) + \varphi\big(\{g_s'\}_{s\in S}\big) = \sum_s \varphi_s(g_s) + \sum_s \varphi_s(g_s') = \sum_s \big(\varphi_s(g_s) + \varphi_s(g_s')\big) = \sum_s \varphi_s(g_s + g_s') = \varphi\big(\{g_s + g_s'\}_{s\in S}\big) = \varphi\big(\{g_s\}_{s\in S} + \{g_s'\}_{s\in S}\big)$$

shows that $\varphi$ is a homomorphism. If $g_{s_0}$ is given and $\{g_s\}_{s\in S}$ denotes the element that is $g_{s_0}$ in the $s_0$th coordinate and is 0 elsewhere, then $\varphi(i_{s_0}(g_{s_0})) = \varphi\big(\{g_s\}_{s\in S}\big) = \sum_s \varphi_s(g_s)$, and the right side equals $\varphi_{s_0}(g_{s_0})$ since $g_s = 0$ for all other $s$’s. Thus $\varphi \circ i_{s_0} = \varphi_{s_0}$.

For uniqueness let $\varphi' : \bigoplus_{s\in S} G_s \to H$ be a homomorphism with $\varphi' \circ i_{s_0} = \varphi_{s_0}$ for all $s_0 \in S$. Then the value of $\varphi'$ is determined at all elements of $\bigoplus_{s\in S} G_s$ that are 0 in all but one coordinate. Since the most general member of $\bigoplus_{s\in S} G_s$ is a finite sum of such elements, $\varphi'$ is determined on all of $\bigoplus_{s\in S} G_s$.

Now we give an abstract definition of direct sum that allows for the possibility that the direct sum is “internal” in the sense that the various constituents are identified as subgroups of a given group. Again the definition is by means of a universal mapping property and will be seen to characterize the direct sum up to canonical isomorphism. Let $S$ be an arbitrary nonempty set of abelian groups, and let $G_s$ be the group corresponding to the member $s$ of $S$. A direct sum of the $G_s$’s consists of an abelian group $G$ and a system of group homomorphisms $i_s : G_s \to G$ for $s \in S$ with the following universal mapping property: whenever $H$ is an abelian group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : G_s \to H$, then there exists a unique group homomorphism $\varphi : G \to H$ such that $\varphi \circ i_s = \varphi_s$ for all $s \in S$. Proposition 4.17 proves existence of a direct sum, and the next proposition addresses uniqueness. A direct sum is internal if each $G_s$ is a subgroup of $G$ and each mapping $i_s$ is the inclusion mapping.

[Diagram: a commutative triangle with $\varphi_s : G_s \to H$ along the top, $i_s : G_s \to G$ down the left side, and $\varphi : G \to H$ along the diagonal.]

FIGURE 4.5. Universal mapping property of a direct sum of abelian groups.

Proposition 4.18. Let $S$ be a nonempty set of abelian groups, and let $G_s$ be the group corresponding to the member $s$ of $S$. If $(G, \{i_s\})$ and $(G', \{i_s'\})$ are two direct sums, then the homomorphisms $i_s : G_s \to G$ and $i_s' : G_s \to G'$ are one-one, there exists a unique homomorphism $\Phi : G \to G'$ such that $i_s' = \Phi \circ i_s$ for all $s \in S$, and $\Phi$ is an isomorphism.

PROOF. In Figure 4.5 let $H = G'$ and $\varphi_s = i_s'$. If $\Phi : G \to G'$ is the homomorphism produced by the fact that $G$ is a direct sum, then we have $\Phi \circ i_s = i_s'$ for all $s$. Reversing the roles of $G$ and $G'$, we obtain a homomorphism $\Phi' : G' \to G$ with $\Phi' \circ i_s' = i_s$ for all $s$. Therefore $(\Phi' \circ \Phi) \circ i_s = \Phi' \circ i_s' = i_s$.

In Figure 4.5 we next let $H = G$ and $\varphi_s = i_s$ for all $s$. Then the identity $1_G$ on $G$ has the same property $1_G \circ i_s = i_s$ relative to all $i_s$ that $\Phi' \circ \Phi$ has, and the uniqueness says that $\Phi' \circ \Phi = 1_G$. Reversing the roles of $G$ and $G'$, we obtain $\Phi \circ \Phi' = 1_{G'}$. Therefore $\Phi$ is an isomorphism.

For uniqueness suppose that $\Psi : G \to G'$ is another homomorphism with $i_s' = \Psi \circ i_s$ for all $s \in S$. Then the argument of the previous paragraph shows that $\Phi' \circ \Psi = 1_G$. Applying $\Phi$ on the left gives $\Psi = (\Phi \circ \Phi') \circ \Psi = \Phi \circ (\Phi' \circ \Psi) = \Phi \circ 1_G = \Phi$. Thus $\Psi = \Phi$.

Finally we have to show that the $s$th mapping of a direct sum is one-one on $G_s$. It is enough to show that $i_s'$ is one-one on $G_s$. Taking $G$ as the external direct sum $\bigoplus_{s\in S} G_s$ with $i_s$ equal to the embedding mapping, form the isomorphism $\Phi' : G' \to G$ that has just been proved to exist. This satisfies $i_s = \Phi' \circ i_s'$ for all $s \in S$. Since $i_s$ is one-one, $i_s'$ must be one-one.

EXAMPLE. The group $\mathbb{Q}^\times$ is the direct sum of copies of $\mathbb{Z}$, one for each prime, plus one copy of $\mathbb{Z}/2\mathbb{Z}$. If $p$ is a prime, the mapping $i_p : \mathbb{Z} \to \mathbb{Q}^\times$ is given by $i_p(n) = p^n$. The remaining coordinate gives the sign. The isomorphism results from unique factorization, only finitely many primes being involved for any particular nonzero rational number.
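The coordinates in the example above can be computed directly. The Python sketch below is a minimal illustration; the function names, the use of trial division, and the convention that the sign coordinate is 0 for positive numbers and 1 for negative numbers are assumptions made for the example. It sends a nonzero rational number to its sign coordinate together with the finitely many nonzero prime exponents, and it checks on one product that multiplication of rationals corresponds to coordinate-by-coordinate addition.

```python
from fractions import Fraction

def prime_exponents(n):
    """Factor a positive integer by trial division: {prime: exponent}."""
    exps = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            exps[p] = exps.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        exps[n] = exps.get(n, 0) + 1
    return exps

def coordinates(q):
    """Sign coordinate in Z/2Z and the nonzero prime exponents of q."""
    q = Fraction(q)
    if q == 0:
        raise ValueError("0 is not in the multiplicative group")
    sign = 0 if q > 0 else 1
    exps = prime_exponents(abs(q.numerator))
    for p, e in prime_exponents(q.denominator).items():
        exps[p] = exps.get(p, 0) - e
    return sign, exps

def add_coordinates(x, y):
    """Add two coordinate tuples, dropping primes whose exponent becomes 0."""
    sign = (x[0] + y[0]) % 2
    total = {p: x[1].get(p, 0) + y[1].get(p, 0) for p in set(x[1]) | set(y[1])}
    return sign, {p: e for p, e in total.items() if e != 0}

a, b = Fraction(-12, 35), Fraction(35, 18)
print(coordinates(a))
print(coordinates(a * b) == add_coordinates(coordinates(a), coordinates(b)))
```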

4. Rings and Fields

In this section we begin a two-section digression in order to develop some more number theory beyond what is in Chapter I and to make some definitions as new notions arise. In later sections of the present chapter, some of this material will yield further examples of concrete groups and tools for working with them.

We begin with the additive group $\mathbb{Z}/m\mathbb{Z}$ of integers modulo a positive integer $m$. We continue to write $[a]$ for the equivalence class of the integer $a$ when it is helpful to do so. Our interest will be in the multiplication structure that $\mathbb{Z}/m\mathbb{Z}$ inherits from multiplication in $\mathbb{Z}$. Namely, we attempt to define

$$[a][b] = [ab].$$

To see that this formula is meaningful in $\mathbb{Z}/m\mathbb{Z}$, we need to check that the same equivalence class results on the right side if the representatives of $[a]$ and $[b]$ are changed. Thus let $[a] = [a']$ and $[b] = [b']$. Then $m$ divides $a - a'$ and $b - b'$ and must divide the sum of products $(a - a')b + a'(b - b') = ab - a'b'$. Consequently $[ab] = [a'b']$, and multiplication is well defined. If $x$ and $y$ are in $\mathbb{Z}/m\mathbb{Z}$, their product is often denoted by $xy \bmod m$.

The same kind of argument as just given shows that the associativity of multiplication in $\mathbb{Z}$ and the distributive laws imply corresponding facts about $\mathbb{Z}/m\mathbb{Z}$. The result is that $\mathbb{Z}/m\mathbb{Z}$ is a “commutative ring with identity” in the sense of the following definitions.

A ring is a set $R$ with two operations $R \times R \to R$, usually called addition and multiplication and often denoted by $(a, b) \mapsto a + b$ and $(a, b) \mapsto ab$, such that
(i) $R$ is an abelian group under addition,
(ii) multiplication is associative in the sense that $a(bc) = (ab)c$ for all $a, b, c$ in $R$,
(iii) the two distributive laws
$$a(b + c) = (ab) + (ac) \quad\text{and}\quad (b + c)a = (ba) + (ca)$$
hold for all $a, b, c$ in $R$.

The additive identity is denoted by 0, and the additive inverse of $a$ is denoted by $-a$. A sum $a + (-b)$ is often abbreviated $a - b$. By convention when parentheses are absent, multiplications are to be carried out before additions and subtractions. Thus the distributive laws may be rewritten as

$$a(b + c) = ab + ac \quad\text{and}\quad (b + c)a = ba + ca.$$

A ring $R$ is called a commutative ring if multiplication satisfies the commutative law
(iv) $ab = ba$ for all $a$ and $b$ in $R$.
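The claims about the integers modulo $m$ can be spot-checked by machine. The Python sketch below is only an illustration: the modulus 6, the range of shifted representatives, and the names are assumptions chosen for the example, and the first check samples finitely many representatives rather than proving independence in general. It tests that the product of classes does not change when representatives are shifted by multiples of $m$, and then checks laws (ii), (iii), and (iv) exhaustively.

```python
from itertools import product

m = 6
R = range(m)

def add(x, y):
    """Addition of classes, using representatives in 0..m-1."""
    return (x + y) % m

def mul(x, y):
    """Multiplication of classes, using representatives in 0..m-1."""
    return (x * y) % m

# [a][b] = [ab] is unchanged when a, b are replaced by a + m*s, b + m*t
well_defined = all(
    ((a + m * s) * (b + m * t)) % m == mul(a, b)
    for a, b in product(R, repeat=2)
    for s, t in product(range(-3, 4), repeat=2)
)
associative = all(
    mul(a, mul(b, c)) == mul(mul(a, b), c) for a, b, c in product(R, repeat=3)
)
distributive = all(
    mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
    and mul(add(b, c), a) == add(mul(b, a), mul(c, a))
    for a, b, c in product(R, repeat=3)
)
commutative = all(mul(a, b) == mul(b, a) for a, b in product(R, repeat=2))

print(well_defined, associative, distributive, commutative)
```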

A ring $R$ is called a ring with identity6 if there exists an element 1 such that $1a = a1 = a$ for all $a$ in $R$. It is immediate from the definitions that
• $0a = 0$ and $a0 = 0$ in any ring (since, in the case of the first formula, $0 = 0a - 0a = (0 + 0)a - 0a = 0a + 0a - 0a = 0a$),
• the multiplicative identity is unique in a ring with identity (since $1' = 1'1 = 1$),
• $(-1)a = -a = a(-1)$ in any ring with identity (partly since $0 = 0a = (1 + (-1))a = 1a + (-1)a = a + (-1)a$).

In a ring with identity, it will be convenient not to insist that the identity be different from the zero element 0. If 1 and 0 do happen to coincide in $R$, then it readily follows that 0 is the only element of $R$, and $R$ is said to be the zero ring.

The set $\mathbb{Z}$ of integers is a basic example of a commutative ring with identity. Returning to $\mathbb{Z}/m\mathbb{Z}$, suppose now that $m$ is a prime $p$. If $[a]$ is in $\mathbb{Z}/p\mathbb{Z}$ with $a$ in $\{1, 2, \ldots, p - 1\}$, then $\mathrm{GCD}(a, p) = 1$ and Proposition 1.2 produces integers $r$ and $s$ with $ar + ps = 1$. Modulo $p$, this equation reads $[a][r] = [1]$. In other words, $[r]$ is a multiplicative inverse of $[a]$. The result is that $\mathbb{Z}/p\mathbb{Z}$, when $p$ is a prime, is a “field” in the sense of the following definition.

A field $\mathbb{F}$ is a commutative ring with identity such that $\mathbb{F} \neq 0$ and such that
(v) to each $a \neq 0$ in $\mathbb{F}$ corresponds an element $a^{-1}$ in $\mathbb{F}$ such that $aa^{-1} = 1$.
In other words, $\mathbb{F}^\times = \mathbb{F} - \{0\}$ is an abelian group under multiplication. Inverses are necessarily unique as a consequence of one of the properties of groups.

When $p$ is prime, we shall write $\mathbb{F}_p$ for the field $\mathbb{Z}/p\mathbb{Z}$. Its multiplicative group $\mathbb{F}_p^\times$ has order $p - 1$, and Lagrange’s Theorem (Corollary 4.8) immediately implies that $a^{p-1} \equiv 1 \bmod p$ whenever $a$ and $p$ are relatively prime. This result is known as Fermat’s Little Theorem.7

For general $m$, certain members of $\mathbb{Z}/m\mathbb{Z}$ have multiplicative inverses. The product of two such elements is again one, and the inverse of one is again one. Thus, even though $\mathbb{Z}/m\mathbb{Z}$ need not be a field, the subset $(\mathbb{Z}/m\mathbb{Z})^\times$ of members of $\mathbb{Z}/m\mathbb{Z}$ with multiplicative inverses is a group. The same argument as when $m$ is prime shows that the class of $a$ has an inverse if and only if $\mathrm{GCD}(a, m) = 1$. The number of such classes was defined in Chapter I in terms of the Euler $\varphi$ function as $\varphi(m)$, and a formula for $\varphi(m)$ was obtained in Corollary 1.10. The conclusion is that $(\mathbb{Z}/m\mathbb{Z})^\times$ is an abelian group of order $\varphi(m)$. Application of Lagrange’s Theorem yields Euler’s generalization of Fermat’s Little Theorem, namely that $a^{\varphi(m)} \equiv 1 \bmod m$ for every positive integer $m$ and every integer $a$ relatively prime to $m$.

6 Some authors, particularly when discussing only algebra, find it convenient to incorporate the existence of an identity into the definition of a ring. However, in real analysis some important natural rings do not have an identity, and the theory is made more complicated by forcing an identity into the picture. For example the space of integrable functions on $\mathbb{R}$ forms a very natural ring, with convolution as multiplication, and there is no identity; forcing an identity into the picture in such a way that the space remains stable under translations makes the space large and unwieldy. The distinction between working with rings and working with rings with identity will be discussed further in Section 11.

7 As opposed to Fermat’s Last Theorem, which lies deeper.

More generally, in any ring $R$ with identity, a unit is defined to be any element $a$ such that there exists an element $a^{-1}$ with $aa^{-1} = a^{-1}a = 1$. The element $a^{-1}$ is unique if it exists8 and is called the multiplicative inverse of $a$. The units of $R$ form a group denoted by $R^\times$. For example the group $\mathbb{Z}^\times$ consists of $+1$ and $-1$, and the zero ring $R$ has $R^\times = \{0\}$. If $R$ is a nonzero ring, then 0 is not in $R^\times$.

Here are some further examples of fields.

EXAMPLES OF FIELDS.
(1) $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$. These are all fields.
(2) $\mathbb{Q}[\theta]$. This was introduced between Examples 8 and 9 of Section 1. It is assumed that $\theta$ is a complex number and that there exists an integer $n > 0$ such that the complex numbers $1, \theta, \theta^2, \ldots, \theta^n$ are linearly dependent over $\mathbb{Q}$. The set $\mathbb{Q}[\theta]$ is defined to be the linear span over $\mathbb{Q}$ of all powers $1, \theta, \theta^2, \ldots$ of $\theta$, which is the same as the linear span of the finite set $1, \theta, \theta^2, \ldots, \theta^{n-1}$. The set $\mathbb{Q}[\theta]$ was shown in Proposition 4.1 to be a subset of $\mathbb{C}$ that is closed under the arithmetic operations, including the passage to reciprocals in the case of the nonzero elements. It is therefore a field.
(3) A field of 4 elements. Let $\mathbb{F}_4 = \{0, 1, \theta, \theta + 1\}$, where $\theta$ is some symbol not standing for 0 or 1. Define addition in $\mathbb{F}_4$ and multiplication in $\mathbb{F}_4^\times$ by requiring that $a + 0 = 0 + a = a$ for all $a$, that $1 + 1 = 0$,

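$1 + \theta = (\theta + 1)$, $1 + (\theta + 1) = \theta$,
$\theta + 1 = (\theta + 1)$, $\theta + \theta = 0$, $\theta + (\theta + 1) = 1$,
$(\theta + 1) + 1 = \theta$, $(\theta + 1) + \theta = 1$, $(\theta + 1) + (\theta + 1) = 0$,

and that

$1 \cdot 1 = 1$, $1\theta = \theta$, $1(\theta + 1) = (\theta + 1)$,
$\theta 1 = \theta$, $\theta\theta = (\theta + 1)$, $\theta(\theta + 1) = 1$,
$(\theta + 1)1 = (\theta + 1)$, $(\theta + 1)\theta = 1$, $(\theta + 1)(\theta + 1) = \theta$.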

The result is a field. With this direct approach a certain amount of checking is necessary to verify all the properties of a field. We shall return to this matter in Chapter IX when we consider finite fields more generally, and we shall then have a way of constructing $\mathbb{F}_4$ that avoids tedious checking.

8 In fact, if $b$ and $c$ exist with $ab = ca = 1$, then $a$ is a unit with $a^{-1} = b = c$ because $b = 1b = (ca)b = c(ab) = c1 = c$.
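The checking mentioned for the field of 4 elements can be delegated to a short program. The Python sketch below is an illustration only: storing $c_0 + c_1\theta$ as the pair $(c_0, c_1)$ with entries modulo 2, and reducing products by the rule $\theta\theta = \theta + 1$ read off from the multiplication table, are encoding choices assumed for the example, as are all the names. It verifies the associative, distributive, and commutative laws and the existence of inverses by exhausting all cases.

```python
from itertools import product

# The element c0 + c1*theta is stored as the pair (c0, c1), entries mod 2.
ZERO, ONE, THETA, THETA1 = (0, 0), (1, 0), (0, 1), (1, 1)
F4 = [ZERO, ONE, THETA, THETA1]

def add(x, y):
    """Coordinatewise addition modulo 2."""
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)

def mul(x, y):
    """Expand the product and reduce with theta*theta = theta + 1."""
    a0, a1 = x
    b0, b1 = y
    return ((a0 * b0 + a1 * b1) % 2, (a0 * b1 + a1 * b0 + a1 * b1) % 2)

triples = list(product(F4, repeat=3))
pairs = list(product(F4, repeat=2))

add_associative = all(add(a, add(b, c)) == add(add(a, b), c) for a, b, c in triples)
mul_associative = all(mul(a, mul(b, c)) == mul(mul(a, b), c) for a, b, c in triples)
distributive = all(
    mul(a, add(b, c)) == add(mul(a, b), mul(a, c)) for a, b, c in triples
)
commutative = all(add(a, b) == add(b, a) and mul(a, b) == mul(b, a) for a, b in pairs)
has_inverses = all(any(mul(a, b) == ONE for b in F4) for a in F4 if a != ZERO)

print(add_associative, mul_associative, distributive, commutative, has_inverses)
```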

In analogy with the theory of groups, we define a subring of a ring to be a nonempty subset that is closed under addition, negation, and multiplication. The set $2\mathbb{Z}$ of even integers is a subring of the ring $\mathbb{Z}$ of integers. A subfield of a field is a subset containing 0 and 1 that is closed under addition, negation, multiplication, and multiplicative inverses for its nonzero elements. The set $\mathbb{Q}$ of rationals is a subfield of the field $\mathbb{R}$ of reals.

Intermediate between rings and fields are two kinds of objects, integral domains and division rings, that arise frequently enough to merit their own names. The setting for the first is a commutative ring $R$. A nonzero element $a$ of $R$ is called a zero divisor if there is some nonzero $b$ in $R$ with $ab = 0$. For example the element 2 in the ring $\mathbb{Z}/6\mathbb{Z}$ is a zero divisor because $2 \cdot 3 = 0$. An integral domain is a nonzero commutative ring with identity having no zero divisors. Fields have no zero divisors since if $a$ and $b$ are nonzero, then $ab = 0$ would force $b = 1b = (a^{-1}a)b = a^{-1}(ab) = a^{-1}0 = 0$ and would give a contradiction; therefore every field is an integral domain. The ring of integers $\mathbb{Z}$ is another example of an integral domain, and the polynomial rings $\mathbb{Q}[X]$ and $\mathbb{R}[X]$ and $\mathbb{C}[X]$ introduced in Section I.3 are further examples. A cancellation law for multiplication holds in any integral domain:

$$ab = ac \ \text{ with } a \neq 0 \quad\text{implies}\quad b = c.$$

In fact, $ab = ac$ implies $a(b - c) = 0$; since $a \neq 0$, $b - c$ must be 0.

The other object with its own name is a division ring, which is a nonzero ring with identity such that every nonzero element is a unit. The commutative division rings are the fields, and we have encountered only one noncommutative division ring so far. That is the set $\mathbb{H}$ of quaternions, which was introduced in Section 1. Division rings that are not fields will play only a minor role in this book but are of importance in Advanced Algebra.

Let us turn to mappings. A function $\varphi : R \to R'$ between two rings is an isomorphism of rings if $\varphi$ is one-one onto and satisfies $\varphi(a + b) = \varphi(a) + \varphi(b)$ and $\varphi(ab) = \varphi(a)\varphi(b)$ for all $a$ and $b$ in $R$. In other words, $\varphi$ is to be an isomorphism of the additive groups and to satisfy $\varphi(ab) = \varphi(a)\varphi(b)$. Such a mapping carries the identity, if any, in $R$ to the identity of $R'$. The relation “is isomorphic to” is an equivalence relation. Common notation for an isomorphism of rings is $R \cong R'$; because of the symmetry, one can say that $R$ and $R'$ are isomorphic.

A function $\varphi : R \to R'$ between two rings is a homomorphism of rings if $\varphi$ satisfies $\varphi(a + b) = \varphi(a) + \varphi(b)$ and $\varphi(ab) = \varphi(a)\varphi(b)$ for all $a$ and $b$ in $R$. In other words, $\varphi$ is to be a homomorphism of the additive groups and to satisfy $\varphi(ab) = \varphi(a)\varphi(b)$.

EXAMPLES OF HOMOMORPHISMS OF RINGS.
(1) The mapping $\varphi : \mathbb{Z} \to \mathbb{Z}/m\mathbb{Z}$ given by $\varphi(k) = k \bmod m$.

(2) The evaluation mapping $\varphi : \mathbb{R}[X] \to \mathbb{R}$ given by $P(X) \mapsto P(r)$ for some fixed $r$ in $\mathbb{R}$.
(3) Mappings with the direct product $\mathbb{Z} \times \mathbb{Z}$. The additive group $\mathbb{Z} \times \mathbb{Z}$ becomes a commutative ring with identity under coordinate-by-coordinate multiplication, namely $(a, a')(b, b') = (ab, a'b')$. The identity is $(1, 1)$. Projection $(a, a') \mapsto a$ to the first coordinate is a homomorphism of rings $\mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$ that carries identity to identity. Inclusion $a \mapsto (a, 0)$ of $\mathbb{Z}$ into the first coordinate is a homomorphism of rings $\mathbb{Z} \to \mathbb{Z} \times \mathbb{Z}$ that does not carry identity to identity.9

Proposition 4.19. If $R$ is a ring with identity $1_R$, then there exists a unique homomorphism of rings $\varphi_1 : \mathbb{Z} \to R$ such that $\varphi_1(1) = 1_R$.

PROOF. The formulas for manipulating exponents of an element in a group, when translated into the additive notation for addition in $R$, say that $n \mapsto nr$ satisfies $(m + n)r = mr + nr$ and $(mn)r = m(nr)$ for all $r$ in $R$ and all integers $m$ and $n$. The first of these formulas implies, for any $r$ in $R$, that $\varphi_r(n) = nr$ is a homomorphism between the additive groups of $\mathbb{Z}$ and $R$, and it is certainly uniquely determined by its value for $n = 1$. The distributive laws imply that $\psi_{r'}(r) = rr'$ is another homomorphism of additive groups. Hence $\psi_{r'} \circ \varphi_r$ and $\varphi_{rr'}$ are homomorphisms between the additive groups of $\mathbb{Z}$ and $R$. Since $(\psi_{r'} \circ \varphi_r)(1) = \psi_{r'}(r) = rr' = \varphi_{rr'}(1)$, we must have $(\psi_{r'} \circ \varphi_r)(m) = \varphi_{rr'}(m)$ for all integers $m$. Thus $(mr)r' = m(rr')$ for all $m$. Putting $r' = n1_R$ and $r = 1_R$ proves the fourth equality of the computation

$$\varphi_1(mn) = (mn)1_R = m(n1_R) = m(1_R(n1_R)) = (m1_R)(n1_R) = \varphi_1(m)\varphi_1(n),$$

and shows that $\varphi_1$ is in fact a homomorphism of rings.
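Returning to Example (3) above, the behavior of the projection and the inclusion can be spot-checked numerically. The Python sketch below is an illustration under assumptions: pairs of integers stand for elements of the product ring, the function names are invented for the example, and the checks run over a small sample of integers rather than over all of them, so they support the claims without proving them.

```python
def add(x, y):
    """Coordinatewise addition in Z x Z."""
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    """Coordinatewise multiplication in Z x Z."""
    return (x[0] * y[0], x[1] * y[1])

IDENTITY = (1, 1)

def projection(x):
    """Projection of Z x Z to the first coordinate."""
    return x[0]

def inclusion(a):
    """Inclusion of Z into the first coordinate."""
    return (a, 0)

sample = range(-4, 5)
pairs = [(a, b) for a in sample for b in sample]

projection_is_hom = all(
    projection(add(x, y)) == projection(x) + projection(y)
    and projection(mul(x, y)) == projection(x) * projection(y)
    for x in pairs
    for y in pairs
)
inclusion_is_hom = all(
    inclusion(a + b) == add(inclusion(a), inclusion(b))
    and inclusion(a * b) == mul(inclusion(a), inclusion(b))
    for a in sample
    for b in sample
)

print(projection_is_hom, projection(IDENTITY) == 1)
print(inclusion_is_hom, inclusion(1) == IDENTITY)
```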

The image of a homomorphism ϕ : R → R′ of rings is a subring of R′, as is easily checked. The kernel turns out to be more than just a subring of R. If a is in the kernel and b is any element of R, then ϕ(ab) = ϕ(a)ϕ(b) = 0ϕ(b) = 0 and similarly ϕ(ba) = 0. Thus the kernel of a ring homomorphism is closed under products of members of the kernel with arbitrary members of R. Adapting a definition to this circumstance, one says that an ideal I of R (or two-sided ideal in case of ambiguity) is an additive subgroup such that ab and ba are in I whenever a is in I and b is in R. Briefly then, the kernel of a homomorphism of rings is an ideal.

⁹ Sometimes authors who build the existence of an identity into the definition of “ring” insist as a matter of definition that homomorphisms of rings carry identity to identity. Such authors would then exclude this particular mapping from consideration as a homomorphism.

Conversely suppose that I is an ideal in a ring R. Since I is certainly an additive subgroup of an abelian group, we can form the additive quotient group

4. Rings and Fields


R/I. It is customary to write the individual cosets in additive notation, thus as r + I. In analogy with Proposition 4.10, we have the following result for the present context.

Proposition 4.20. If I is an ideal in a ring R, then a well-defined operation of multiplication is obtained within the additive group R/I by the definition (r_1 + I)(r_2 + I) = r_1 r_2 + I, and R/I becomes a ring. If R has an identity 1, then 1 + I is an identity in R/I. With these definitions the function q : R → R/I given by q(r) = r + I is a ring homomorphism of R onto R/I with kernel I. Consequently every ideal of R is the kernel of some homomorphism of rings.

REMARKS. When I is an ideal, the ring R/I is called a quotient ring¹⁰ of R, and the homomorphism q : R → R/I is called the quotient homomorphism. In the special case that R = Z and I = mZ, the construction of R/I reduces to the construction of Z/mZ as a ring at the beginning of this section.

PROOF. If we change the representatives of the cosets from r_1 and r_2 to r_1 + i_1 and r_2 + i_2 with i_1 and i_2 in I, then (r_1 + i_1)(r_2 + i_2) = r_1 r_2 + (i_1 r_2 + r_1 i_2 + i_1 i_2) is in r_1 r_2 + I by the closure properties of I. Hence multiplication is well defined. The associativity of this multiplication follows from the associativity of multiplication in R because

((r_1 + I)(r_2 + I))(r_3 + I) = (r_1 r_2 + I)(r_3 + I) = (r_1 r_2)r_3 + I = r_1(r_2 r_3) + I = (r_1 + I)(r_2 r_3 + I) = (r_1 + I)((r_2 + I)(r_3 + I)).

Similarly the computation

(r_1 + I)((r_2 + I) + (r_3 + I)) = r_1(r_2 + r_3) + I = (r_1 r_2 + r_1 r_3) + I = (r_1 + I)(r_2 + I) + (r_1 + I)(r_3 + I)

yields one distributive law, and the other distributive law is proved in the same way. If R has an identity 1, then (1 + I)(r + I) = 1r + I = r + I and (r + I)(1 + I) = r1 + I = r + I show that 1 + I is an identity in R/I.
Finally we know that the quotient map q : R → R/I is a homomorphism of additive groups, and the computation q(r_1 r_2) = r_1 r_2 + I = (r_1 + I)(r_2 + I) = q(r_1)q(r_2) shows that q is a homomorphism of rings.

¹⁰ Quotient rings are known also as “factor rings.” A “ring of quotients,” however, is something different.

EXAMPLES OF IDEALS.
(1) The ideals in the ring Z coincide with the additive subgroups and are the sets mZ; the reason each mZ is an ideal is that if a and b are integers and m divides a, then m divides ab.


(2) The ideals in a field F are 0 and F itself, no others; in fact, if a ≠ 0 is in an ideal and b is in F, then the equality b = (ba^{−1})a shows that b is in the ideal and that the ideal therefore contains all elements of F.
(3) If R is Q[X] or R[X] or C[X], then every ideal I is of the form I = R f(X) for some polynomial f(X). In fact, we can take f(X) = 0 if I = 0. If I ≠ 0, let f(X) be a nonzero member of I of lowest possible degree. If A(X) is in I, then Proposition 1.12 shows that A(X) = f(X)B(X) + C(X) with C(X) = 0 or deg C < deg f. The equality C(X) = A(X) − f(X)B(X) shows that C(X) is in I, and the minimality of deg f implies that C(X) = 0. Thus A(X) = f(X)B(X).
(4) In a ring R with identity 1, an ideal I is a proper subset of R if and only if 1 is not in I. In fact, I is certainly a proper subset if 1 is not in I. In the converse direction if 1 is in I, then every element r = r1, for r in R, lies in I. Hence I = R, and I is not a proper subset.

In analogy with what was shown for vector spaces in Proposition 2.25 and for groups in Proposition 4.11, quotients in the context of rings allow for the factorization of certain homomorphisms of rings. The appropriate result is stated as Proposition 4.21 and is pictured in Figure 4.6.

Proposition 4.21. Let ϕ : R_1 → R_2 be a homomorphism of rings, let I_0 = ker ϕ, let I be an ideal of R_1 contained in I_0, and let q : R_1 → R_1/I be the quotient homomorphism. Then there exists a homomorphism of rings ϕ̄ : R_1/I → R_2 such that ϕ = ϕ̄ ∘ q, i.e., ϕ̄(r_1 + I) = ϕ(r_1). It has the same image as ϕ, and ker ϕ̄ = {r + I | r ∈ I_0}.

[Diagram: ϕ maps R_1 to R_2 across the top, q maps R_1 down to R_1/I, and ϕ̄ maps R_1/I to R_2 along the diagonal.]

FIGURE 4.6. Factorization of homomorphisms of rings via the quotient of a ring by an ideal.

REMARK. One says that ϕ factors through R_1/I or descends to R_1/I.

PROOF. Proposition 4.11 shows that ϕ descends to a homomorphism ϕ̄ of the additive group of R_1/I into the additive group of R_2 and that all the other conclusions hold except possibly for the fact that ϕ̄ respects multiplication. To see that ϕ̄ respects multiplication, we just compute that ϕ̄((r + I)(r′ + I)) = ϕ̄(rr′ + I) = ϕ(rr′) = ϕ(r)ϕ(r′) = ϕ̄(r + I)ϕ̄(r′ + I).
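As a concrete instance of Proposition 4.21, one can take ϕ : Z → Z/4Z to be reduction modulo 4 and I = 12Z, which lies in ker ϕ = 4Z. The Python sketch below checks on a finite sample that ϕ factors through Z/12Z; the moduli 12 and 4 and the names `coset` and `phi_bar` are assumptions chosen for illustration.

```python
N, M = 12, 4  # I = 12Z is contained in ker(phi) = 4Z because 4 divides 12

def phi(k):
    """phi : Z -> Z/4Z, reduction modulo 4."""
    return k % M

def coset(k):
    """q : Z -> Z/12Z, returning the representative of k + 12Z in 0..11."""
    return k % N

def phi_bar(c):
    """Descended map Z/12Z -> Z/4Z, computed on a representative of the coset."""
    return phi(c)

# The value of phi does not depend on the representative of the coset.
well_defined = all(phi(c) == phi(c + N * j) for c in range(N) for j in range(-3, 4))

# phi = phi_bar o q on a sample of integers.
factors = all(phi(k) == phi_bar(coset(k)) for k in range(-50, 51))

# phi_bar respects multiplication of cosets.
multiplicative = all(
    phi_bar(coset(a * b)) == (phi_bar(a) * phi_bar(b)) % M
    for a in range(N) for b in range(N)
)

# ker(phi_bar) consists of the cosets r + 12Z with r in 4Z.
kernel = [c for c in range(N) if phi_bar(c) == 0]
print(well_defined, factors, multiplicative, kernel)
```

The kernel found this way is the set of cosets of 0, 4, and 8, in agreement with the description ker ϕ̄ = {r + I | r ∈ I_0}.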


An example of special interest occurs when ϕ is a homomorphism of rings ϕ : Z → R and the ideal mZ of Z is contained in the kernel of ϕ. Then the proposition says that ϕ descends to a homomorphism of rings ϕ̄ : Z/mZ → R. We shall make use of this result shortly. But first let us state a different special case as a corollary.

Corollary 4.22. Let ϕ : R_1 → R_2 be a homomorphism of rings, and suppose that ϕ is onto R_2 and has kernel I. Then ϕ exhibits the ring R_1/I as canonically isomorphic to R_2.

PROOF. Take I = I_0 in Proposition 4.21, and form ϕ̄ : R_1/I → R_2 with ϕ = ϕ̄ ∘ q. The proposition shows that ϕ̄ is onto R_2 and has trivial kernel, i.e., its kernel consists only of the zero element of R_1/I. Having trivial kernel, ϕ̄ is one-one.

Proposition 4.23. Any field F contains a subfield isomorphic to the rationals Q or to some field F_p with p prime.

REMARKS. The subfield in the proposition is called the prime field of F. The characteristic of F is defined to be 0 if the prime field is isomorphic to Q and to be p if the prime field is isomorphic to F_p.

PROOF. Proposition 4.19 produces a homomorphism of rings ϕ_1 : Z → F with ϕ_1(1) = 1. The kernel of ϕ_1 is an ideal, necessarily of the form mZ with m an integer ≥ 0, and the image of ϕ_1 is a commutative subring with identity in F. Let ϕ̄_1 : Z/mZ → F be the descended homomorphism given by Proposition 4.21. The integer m cannot be 1 because ϕ_1(1) = 1 is nonzero in F. Nor can m factor nontrivially, say as m = rs with r > 1 and s > 1, because otherwise ϕ̄_1(r) and ϕ̄_1(s) would be nonzero members of F with ϕ̄_1(r)ϕ̄_1(s) = ϕ̄_1(rs) = ϕ̄_1(0) = 0, in contradiction to the fact that a field has no zero divisors. Thus m is prime or m is 0. If m is a prime p, then Z/pZ is a field, and the image of ϕ̄_1 is the required subfield of F.

Thus suppose that m = 0. Then ϕ_1 is one-one, and F contains a subring with identity isomorphic to Z. Define a function Φ_1 : Q → F by saying that if k and l are integers with l ≠ 0, then Φ_1(kl^{−1}) = ϕ_1(k)ϕ_1(l)^{−1}. This is well defined because ϕ_1(l) ≠ 0 and because k_1 l_1^{−1} = k_2 l_2^{−1} implies k_1 l_2 = k_2 l_1 and hence ϕ_1(k_1)ϕ_1(l_2) = ϕ_1(k_2)ϕ_1(l_1) and ϕ_1(k_1)ϕ_1(l_1)^{−1} = ϕ_1(k_2)ϕ_1(l_2)^{−1}. We readily check that Φ_1 is a homomorphism with kernel 0. Then F contains the subfield Φ_1(Q) isomorphic to Q.
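The step in the proof that rules out a composite m can be seen by brute force in the rings Z/mZ. The Python sketch below is an illustration only; the function names and the moduli 6 and 7 are assumptions made for the example.

```python
def kernel_generator(m):
    """Smallest n >= 1 with n * 1 = 0 in Z/mZ, found by repeated addition of 1."""
    total, n = 1 % m, 1
    while total != 0:
        total = (total + 1) % m
        n += 1
    return n

def zero_divisor_pairs(m):
    """Pairs (r, s) of nonzero residues with r <= s and r * s = 0 in Z/mZ."""
    return [(r, s) for r in range(1, m) for s in range(r, m) if (r * s) % m == 0]

def is_field(m):
    """True if every nonzero residue modulo m has a multiplicative inverse."""
    return m > 1 and all(
        any((a * b) % m == 1 for b in range(1, m)) for a in range(1, m)
    )

print(kernel_generator(7), zero_divisor_pairs(7), is_field(7))
print(kernel_generator(6), zero_divisor_pairs(6), is_field(6))
```

For m = 6 the pairs (2, 3) and (3, 4) multiply to 0, so Z/6Z cannot sit inside a field, whereas Z/7Z has no zero divisors and every nonzero element is invertible.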

5. Polynomials and Vector Spaces

In this section we complete the digression begun in Section 4. We shall be using the elementary notions of rings and fields established in Section 4 in order to

work with (i) polynomials over any commutative ring with identity and (ii) vector spaces over arbitrary fields.

It is an important observation that a good deal of what has been proved so far in this book concerning polynomials when F is Q or R or C remains valid when F is any field. Specifically all the results in Section I.3 through Theorem 1.17 on the topic of polynomials in one indeterminate remain valid as long as the coefficients are from a field. The theory breaks down somewhat when one tries to extend it by allowing coefficients that are not in a field or by allowing more than one indeterminate. Because of this circumstance and because we have not yet announced a universal mapping property for polynomial rings and because we have not yet addressed the several-variable case, we shall briefly review matters now while extending the reach of the theory that we have.

Let R be a nonzero commutative ring with identity, so that 1 ≠ 0. A polynomial in one indeterminate is to be an expression P(X) = a_n X^n + · · · + a_2 X^2 + a_1 X + a_0 in which X is a symbol, not a variable. Nevertheless, the usual kinds of manipulations with polynomials are to be valid. This description lacks precision because X has not really been defined adequately. To make a precise definition, we remove X from the formalism and simply define the polynomial to be the tuple (a_0, a_1, . . . , a_n, 0, 0, . . . ) of its coefficients. Thus a polynomial in one indeterminate with coefficients in R is an infinite sequence of members of R such that all terms of the sequence are 0 from some point on. The indexing of the sequence is to begin with 0, and X is to refer to the polynomial (0, 1, 0, 0, . . . ). We may refer to a polynomial P as P(X) if we want to emphasize that the indeterminate is called X.

Addition and negation of polynomials are defined in coordinate-by-coordinate fashion by

(a_0, a_1, . . . , a_n, 0, 0, . . . ) + (b_0, b_1, . . . , b_n, 0, 0, . . . ) = (a_0 + b_0, a_1 + b_1, . . . , a_n + b_n, 0, 0, . . . ),
−(a_0, a_1, . . . , a_n, 0, 0, . . . ) = (−a_0, −a_1, . . . , −a_n, 0, 0, . . . ),

and the set R[X] of polynomials is then an abelian group isomorphic to the direct sum of infinitely many copies of the additive group of R. As in Section I.3, X^n is to be the polynomial whose coefficients are 1 in the nth position, with n ≥ 0, and 0 in all other positions. Polynomial multiplication is then defined so as to match multiplication of expressions a_n X^n + · · · + a_1 X + a_0 if the product is expanded out, powers of X are added, and the terms containing like powers of X are collected. Thus the precise definition is that

(a_0, a_1, . . . , 0, 0, . . . )(b_0, b_1, . . . , 0, 0, . . . ) = (c_0, c_1, . . . , 0, 0, . . . ),

where c_N = Σ_{k=0}^{N} a_k b_{N−k}. It is a simple matter to check that this multiplication makes R[X] into a commutative ring.
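The definition of the product by c_N = Σ_{k=0}^{N} a_k b_{N−k} translates directly into code when a polynomial is stored as the finite list of its coefficients, index k holding the coefficient of X^k. The following Python sketch assumes integer coefficients and a hypothetical function name `poly_mul`; it is meant only to make the formula concrete.

```python
def poly_mul(a, b):
    """Product of two coefficient lists; entry N is the sum of a[k] * b[N - k] for k = 0..N."""
    if not a or not b:
        return []  # the empty list stands for the zero polynomial
    c = [0] * (len(a) + len(b) - 1)
    for n in range(len(c)):
        for k in range(n + 1):
            # Coefficients beyond the end of a list are 0 and contribute nothing.
            if k < len(a) and n - k < len(b):
                c[n] += a[k] * b[n - k]
    return c

X = [0, 1]                          # the polynomial (0, 1, 0, 0, ...)
print(poly_mul(X, X))               # coefficients of X^2
print(poly_mul([1, 2, 3], [4, 5]))  # (1 + 2X + 3X^2)(4 + 5X)
```

The second product is 4 + 13X + 22X^2 + 15X^3, which matches what one gets by expanding and collecting like powers of X.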


The polynomial with all entries 0 is denoted by 0 and is called the zero polynomial. For all polynomials P = (a_0, . . . , a_n, 0, . . . ) other than 0, the degree of P, denoted by deg P, is defined to be the largest index n such that a_n ≠ 0. In this case, a_n is called the leading coefficient, and a_n X^n is called the leading term; if a_n = 1, the polynomial is called monic. The usual convention with the 0 polynomial is either to leave its degree undefined or to say that the degree is −∞; let us follow the latter approach in this section in order not to have to separate certain formulas into cases. There is a natural one-one homomorphism of rings ι : R → R[X] given by ι(c) = (c, 0, 0, . . . ) for c in R. This sends the identity of R to the identity of R[X]. Thus we can identify R with the constant polynomials, i.e., those of degree ≤ 0.

If P and Q are nonzero polynomials, then

deg(P + Q) ≤ max(deg P, deg Q).

In this formula equality holds if deg P ≠ deg Q. In the case of multiplication, let P and Q have respective leading terms a_m X^m and b_n X^n. All the coefficients of PQ are 0 beyond the (m + n)th, and the (m + n)th is a_m b_n. This in principle could be 0 but is nonzero if R is an integral domain. Thus P and Q nonzero implies

deg(PQ) ≤ deg P + deg Q for general R,
deg(PQ) = deg P + deg Q if R is an integral domain.

It follows in particular that R[X] is an integral domain if R is.

Normally we shall write out specific polynomials using the informal notation with powers of X, using the more precise notation with tuples only when some ambiguity might otherwise result.

In the special case that R is a field, Section I.3 introduced the notion of evaluation of a polynomial P(X) at a point r in the field, thus providing a mapping P(X) ↦ P(r) from R[X] to R for each r in R. We listed a number of properties of this mapping, and they can be summarized in our present language by the statement that the mapping is a homomorphism of rings.
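The strict inequality deg(PQ) < deg P + deg Q really does occur when R has zero divisors. A minimal Python sketch, assuming coefficients in Z/mZ stored as lists and using hypothetical helper names, compares the ring Z/4Z with the integral domain Z/5Z.

```python
def trim(coeffs):
    """Drop trailing zero coefficients so that the last entry is the leading coefficient."""
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

def degree(coeffs):
    """Degree of a coefficient list, with the convention that deg 0 is minus infinity."""
    t = trim(coeffs)
    return len(t) - 1 if t else float("-inf")

def poly_mul_mod(a, b, m):
    """Product in (Z/mZ)[X] of two nonempty coefficient lists."""
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % m
    return trim(c)

P = [1, 2]  # the polynomial 2X + 1
print(poly_mul_mod(P, P, 4), degree(poly_mul_mod(P, P, 4)))  # 4X^2 + 4X + 1 collapses to 1
print(poly_mul_mod(P, P, 5), degree(poly_mul_mod(P, P, 5)))  # degree 2 over Z/5Z
print(degree(poly_mul_mod([0, 2], [0, 2], 4)))               # (2X)(2X) = 0 in (Z/4Z)[X]
```

Over Z/4Z the square of 2X + 1 is the constant 1, and the product of 2X with itself is 0, so (Z/4Z)[X] is not an integral domain.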
Evaluation is a special case of a more sweeping property of polynomials given in the next proposition as a universal mapping property of R[X].

Proposition 4.24. Let R be a nonzero commutative ring with identity, and let ι : R → R[X] be the identification of R with constant polynomials. If T is any commutative ring with identity, if ϕ : R → T is a homomorphism of rings sending 1 into 1, and if t is in T, then there exists a unique homomorphism of rings Φ : R[X] → T carrying identity to identity such that Φ(ι(r)) = ϕ(r) for all r ∈ R and Φ(X) = t.
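A small numerical illustration of Proposition 4.24 takes R = Z, T = Z/7Z, and ϕ equal to reduction modulo 7, with Φ computed by applying ϕ to each coefficient and then substituting t for X. The Python sketch below is an assumption-laden example (the modulus 7, the point t = 3, and all function names are chosen for illustration) and checks multiplicativity on one pair of polynomials rather than proving it.

```python
M = 7  # T = Z/7Z; the modulus is an arbitrary choice for the example

def phi(a):
    """phi : Z -> Z/7Z, reduction of a coefficient modulo 7."""
    return a % M

def substitute(coeffs, t):
    """phi(a_0) + phi(a_1) t + ... + phi(a_n) t^n in Z/7Z, computed by Horner's rule."""
    value = 0
    for a in reversed(coeffs):
        value = (value * t + phi(a)) % M
    return value

def poly_mul(a, b):
    """Product in Z[X] of two nonempty coefficient lists."""
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return c

P = [1, 0, 1]  # X^2 + 1
Q = [2, 5]     # 5X + 2
t = 3

lhs = substitute(poly_mul(P, Q), t)                  # image of the product PQ
rhs = (substitute(P, t) * substitute(Q, t)) % M      # product of the images
print(lhs, rhs)
print(substitute([0, 1], t), substitute([10], t))    # images of X and of the constant 10
```

The two sides agree, the polynomial X is sent to t, and a constant r is sent to ϕ(r), as the proposition requires.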


REMARKS. The mapping Φ is called the substitution homomorphism extending ϕ and substituting t for X, and the mapping is written P(X) ↦ P^ϕ(t). The notation means that ϕ is to be applied to the coefficients of P and then X is to be replaced by t. A diagram of this homomorphism as a universal mapping property appears in Figure 4.7. In the special case that T = R and ϕ is the identity, Φ reduces to evaluation at t, and the mapping is written P(X) ↦ P(t), just as in Section I.3.

[Diagram: ϕ maps R to T across the top, ι maps R down to R[X], and Φ maps R[X] to T along the diagonal.]

FIGURE 4.7. Substitution homomorphism for polynomials in one indeterminate.

PROOF. Define Φ(a_0, a_1, . . . , a_n, 0, . . . ) = ϕ(a_0) + ϕ(a_1)t + · · · + ϕ(a_n)t^n. It is immediate that Φ is a homomorphism of rings sending the identity ι(1) = (1, 0, 0, . . . ) of R[X] to the identity ϕ(1) of T. If r is in R, then Φ(ι(r)) = Φ(r, 0, 0, . . . ) = ϕ(r). Also, Φ(X) = Φ(0, 1, 0, 0, . . . ) = t. This proves existence. Uniqueness follows since ι(R) and X generate R[X] and since a homomorphism defined on R[X] is therefore determined by its values on ι(R) and X.

The formulation of the proposition with the general ϕ : R → T, rather than just the identity mapping on R, allows several kinds of applications besides the routine evaluation mapping. An example of one kind occurs when R = C, T = C[X], t = X, and ϕ : C → C[X] is the composition of complex conjugation on C followed by the identification of complex numbers with constant polynomials in C[X]; the proposition then says that complex conjugation of the coefficients of a member of C[X] is a ring homomorphism. This observation simplifies the solution of Problem 7 in Chapter I. Similarly one can set up matters so that the proposition shows the passage from Z[X] to (Z/mZ)[X] by reduction of coefficients modulo m to be a ring homomorphism.

Still a third kind of application is to take T in the proposition to be a ring with the same kind of universal mapping property that R[X] has, and the consequence is an abstract characterization of R[X]. We carry out the details below as Proposition 4.25. This result will be applied later in this section to the several-indeterminate case to show that introducing several indeterminates at once yields the same ring, up to canonical isomorphism, as introducing them one at a time.

Proposition 4.25. Let R and S be nonzero commutative rings with identity, let X′ be an element of S, and suppose that ι′ : R → S is a one-one ring


homomorphism of R into S carrying 1 to 1. Suppose further that (S, ι′, X′) has the following property: whenever T is a commutative ring with identity, ϕ : R → T is a homomorphism of rings sending 1 into 1, and t is in T, then there exists a unique homomorphism Φ′ : S → T carrying identity to identity such that Φ′(ι′(r)) = ϕ(r) for all r ∈ R and Φ′(X′) = t. Then there exists a unique homomorphism of rings Ψ : R[X] → S such that Ψ ∘ ι = ι′ and Ψ(X) = X′, and Ψ is an isomorphism.

REMARK. A somewhat weaker conclusion than in the proposition is that any triple (S, ι′, X′) having the same universal mapping property as (R[X], ι, X) is isomorphic to (R[X], ι, X), the isomorphism being unique.

PROOF. In the universal mapping property for S, take T = R[X], ϕ = ι, and t = X. The hypothesis gives us a ring homomorphism Φ′ : S → R[X] with Φ′(1) = 1, Φ′ ∘ ι′ = ι, and Φ′(X′) = X. Next apply Proposition 4.24 with T = S, ϕ = ι′, and t = X′. We obtain a ring homomorphism Φ : R[X] → S with Φ(1) = 1, Φ ∘ ι = ι′, and Φ(X) = X′. Then Φ′ ∘ Φ is a ring homomorphism from R[X] to itself carrying 1 to 1, fixing X, and having Φ′ ∘ Φ ∘ ι = ι. From the uniqueness in Proposition 4.24 when T = R[X], ϕ = ι, and t = X, we see that Φ′ ∘ Φ is the identity on R[X]. Reversing the roles of Φ and Φ′ and applying the uniqueness in the universal mapping property for S, we see that Φ ∘ Φ′ is the identity on S. Therefore Φ may be taken as the isomorphism Ψ in the statement of the proposition. This proves existence for Ψ, and uniqueness follows since ι(R) and X together generate R[X] and since Ψ is a homomorphism.

If P is a polynomial over R in one indeterminate and r is in R, then r is a root of P if P(r) = 0. We know as a consequence of Corollary 1.14 that for any prime p, any polynomial in F_p[X] of degree n ≥ 1 has at most n roots. This result does not extend to Z/mZ for all positive integers m: when m = 8, the polynomial X^2 − 1 has 4 roots, namely 1, 3, 5, 7. This result about F_p[X] has the following consequence.

Proposition 4.26.
If F is a field, then any finite subgroup of the multiplicative group F^× is cyclic.

PROOF. Let C be a subgroup of F^× of finite order n. Lagrange’s Theorem (Corollary 4.8) shows that the order of each element of C divides n. With h defined as the maximum order of an element of C, it is enough to show that h = n. Let a be an element of order h. The polynomial X^h − 1 has at most h roots by Corollary 1.14, and a is one of them, by definition of “order.” If h < n, then it follows that some member b of C is not a root of X^h − 1. The order h′ of b is then a divisor of n but cannot be a divisor of h since otherwise we would have b^h = (b^{h′})^{h/h′} = 1^{h/h′} = 1. Consequently there exists a prime p such that some power p^r of p divides h′ but not h. Let s < r be the exact power of p dividing h, and write h = mp^s, so that GCD(m, p^r) = 1 and a′ = a^{p^s} has order m. Put q = h′/p^r, so that b′ = b^q has order p^r. The proof will be completed by showing that c = a′b′ has order mp^r = hp^{r−s} > h, in contradiction to the maximality of h.

Let t be the order of c. On the one hand, from

c^{mp^r} = (a′)^{mp^r}(b′)^{mp^r} = a^{hp^r} b^{mp^r q} = a^{hp^r} b^{mh′} = (a^h)^{p^r}(b^{h′})^m = 1,

we see that t divides mp^r. On the other hand, 1 = c^t says that (a′)^t = (b′)^{−t}. Raising both sides to the p^r power gives 1 = ((b′)^{p^r})^{−t} = (a′)^{tp^r}, and hence m divides tp^r; by Corollary 1.3, m divides t. Raising both sides of (a′)^t = (b′)^{−t} to the mth power gives 1 = ((a′)^m)^t = (b′)^{−tm}, and hence p^r divides tm; by Corollary 1.3, p^r divides t. Applying Corollary 1.4, we conclude that mp^r divides t. Therefore t = mp^r, and the proof is complete.

Corollary 4.27. The multiplicative group of a finite field is cyclic.

PROOF. This is a special case of Proposition 4.26.
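Corollary 4.27 can be observed by brute force for the fields F_p = Z/pZ with p small, and the same search shows what goes wrong for Z/8Z, where X^2 − 1 has four roots. The Python sketch below uses hypothetical helper names and exhaustive search; it illustrates the statement and is not an efficient algorithm.

```python
from math import gcd

def units(m):
    """Residues in 1..m-1 that are invertible modulo m."""
    return [a for a in range(1, m) if gcd(a, m) == 1]

def order(a, m):
    """Multiplicative order of the unit a modulo m."""
    k, x = 1, a % m
    while x != 1:
        x = (x * a) % m
        k += 1
    return k

def generators(m):
    """Units modulo m whose order equals the size of the whole unit group."""
    u = units(m)
    return [a for a in u if order(a, m) == len(u)]

print(generators(7))                                   # generators of the cyclic group of F_7
print(generators(8))                                   # empty: the unit group mod 8 is not cyclic
print([x for x in range(8) if (x * x - 1) % 8 == 0])   # roots of X^2 - 1 in Z/8Z
```

For p = 7 the elements 3 and 5 each generate all six nonzero residues, while modulo 8 every unit has order at most 2, consistent with the four roots 1, 3, 5, 7 of X^2 − 1.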

A finite field F can have a nonzero polynomial that is 0 at every element of F. Indeed, every element of F_p is a root of X^p − X, as a consequence of Fermat’s Little Theorem. It is for this reason that it is unwise to confuse a polynomial in an indeterminate with a “polynomial function.”

Let us make the notion of a polynomial function of one variable rigorous. If P(X) is a polynomial with coefficients in the commutative ring R with identity, then Proposition 4.24 gives us an evaluation homomorphism P ↦ P(r) for each r in R. The function r ↦ P(r) from R into R is the polynomial function associated to the polynomial P. This function is a member of the commutative ring of all R-valued functions on R, and the mapping P ↦ (r ↦ P(r)) is a homomorphism of rings. What we know from Corollary 1.14 is that this homomorphism is one-one if R is an infinite field. A negative result is that if R is a finite commutative ring with identity, then ∏_{r∈R} (X − r) is a polynomial that maps to the 0 function, and hence the homomorphism is not one-one. A more general positive result than the one above for infinite fields is the following.

Proposition 4.28.
(a) If R is a nonzero commutative ring with identity and P(X) is a member of R[X] with a root r, then P(X) = (X − r)Q(X) for some Q(X) in R[X].
(b) If R is an integral domain, then a nonzero member of R[X] of degree n has at most n roots.
(c) If R is an infinite integral domain, then the ring homomorphism of R[X] to the ring of polynomial functions from R to R, given by evaluation, is one-one.
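The distinction between a polynomial and its polynomial function can be made tangible over the finite rings Z/mZ. The following Python sketch, with hypothetical function names and the sample rings F_5 and Z/4Z chosen as assumptions, tabulates the polynomial function of X^5 − X on F_5 and of the product of all X − r on Z/4Z.

```python
def evaluate_mod(coeffs, r, m):
    """Value P(r) in Z/mZ, computed by Horner's rule."""
    value = 0
    for a in reversed(coeffs):
        value = (value * r + a) % m
    return value

def polynomial_function(coeffs, m):
    """Table of values r -> P(r) for r = 0, ..., m-1."""
    return [evaluate_mod(coeffs, r, m) for r in range(m)]

def product_of_linear_factors(m):
    """Coefficient list of the product over r in Z/mZ of (X - r), reduced modulo m."""
    result = [1]
    for r in range(m):
        shifted = [0] + result                          # X times the current product
        scaled = [(-r * c) % m for c in result] + [0]   # -r times the current product
        result = [(s + t) % m for s, t in zip(shifted, scaled)]
    return result

x5_minus_x = [0, 4, 0, 0, 0, 1]          # X^5 - X with -1 written as 4 modulo 5
print(polynomial_function(x5_minus_x, 5))

q = product_of_linear_factors(4)         # X(X - 1)(X - 2)(X - 3) in (Z/4Z)[X]
print(q, polynomial_function(q, 4))
```

Both polynomials are monic, hence nonzero, yet each has the zero function as its polynomial function.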


PROOF. For (a), we proceed by induction on the degree of P, the base case of the induction being degree ≤ 0. If the conclusion has been proved for degree < n with n ≥ 1, let the leading term of P be a_n X^n. Then P(X) = a_n(X − r)^n + A(X) with deg A < n. Evaluation at r gives, by virtue of Proposition 4.24, 0 = 0 + A(r). By the inductive hypothesis, A(X) = (X − r)B(X). Then P(X) = (X − r)Q(X) with Q(X) = a_n(X − r)^{n−1} + B(X), and the induction is complete.

For (b), let P(X) have degree n with at least n + 1 distinct roots r_1, . . . , r_{n+1}. Part (a) shows that P(X) = (X − r_1)P_1(X) with deg P_1 = n − 1. Also, 0 = P(r_2) = (r_2 − r_1)P_1(r_2). Since r_2 − r_1 ≠ 0 and since R has no zero divisors, P_1(r_2) = 0. Part (a) then shows that P_1(X) = (X − r_2)P_2(X), and substitution gives P(X) = (X − r_1)(X − r_2)P_2(X). Continuing in this way, we obtain P(X) = (X − r_1) · · · (X − r_n)P_n(X) with deg P_n = 0. Since P ≠ 0, P_n ≠ 0. So P_n is a nonzero constant polynomial P_n(X) = c ≠ 0. Evaluating at r_{n+1}, we obtain 0 = (r_{n+1} − r_1) · · · (r_{n+1} − r_n)c with each factor nonzero, in contradiction to the fact that R is an integral domain.

For (c), a polynomial in the kernel of the ring homomorphism has every member of R as a root. If R is infinite, (b) shows that such a polynomial is necessarily the zero polynomial. Thus the kernel is 0, and the ring homomorphism has to be one-one.

Let us turn our attention to polynomials in several indeterminates. Fix the nonzero commutative ring R with identity, and let n be a positive integer. Informally a polynomial over R in n indeterminates is to be a finite sum

Σ_{j_1≥0, ..., j_n≥0} r_{j_1,...,j_n} X_1^{j_1} · · · X_n^{j_n}

with each r_{j_1,...,j_n} in R. To make matters precise, we work just with the system of coefficients, just as in the case of one indeterminate. Let J be the set of integers ≥ 0, and let J^n be the set of n-tuples of elements of J. A member of J^n may be written as j = (j_1, . . . , j_n). Addition of members of J^n is defined coordinate by coordinate. Thus j + j′ = (j_1 + j′_1, . . . , j_n + j′_n) if j = (j_1, . . . , j_n) and j′ = (j′_1, . . . , j′_n). A polynomial in n indeterminates with coefficients in R is a function f : J^n → R such that f(j) ≠ 0 for only finitely many j ∈ J^n. Temporarily let us write S for the set of all such polynomials for a particular n. If f and g are two such polynomials, their sum h and product k are the polynomials defined by

h(j) = f(j) + g(j),
k(i) = Σ_{j+j′=i} f(j)g(j′).

Under these definitions, S is a commutative ring.
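The definition of a polynomial in n indeterminates as a finitely supported function on J^n corresponds closely to a dictionary keyed by exponent tuples. The Python sketch below assumes integer coefficients and stores only the nonzero values of f; the names `poly_add` and `poly_mul` are hypothetical.

```python
from collections import defaultdict

def poly_add(f, g):
    """Sum h(j) = f(j) + g(j), keeping only the nonzero coefficients."""
    h = defaultdict(int)
    for j, c in list(f.items()) + list(g.items()):
        h[j] += c
    return {j: c for j, c in h.items() if c != 0}

def poly_mul(f, g):
    """Product k(i) = sum of f(j) g(j') over all pairs with j + j' = i."""
    k = defaultdict(int)
    for j, a in f.items():
        for jp, b in g.items():
            i = tuple(x + y for x, y in zip(j, jp))  # coordinate-by-coordinate sum in J^n
            k[i] += a * b
    return {i: c for i, c in k.items() if c != 0}

# Two indeterminates: the keys are pairs (j_1, j_2) of exponents.
p = {(1, 0): 1, (0, 1): 1}    # X_1 + X_2
q = {(1, 0): 1, (0, 1): -1}   # X_1 - X_2

print(poly_add(p, q))  # 2 X_1
print(poly_mul(p, q))  # X_1^2 - X_2^2
```

In the product the two contributions to the exponent (1, 1) cancel, so that key does not appear in the result.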


Define a mapping ι : R → S by

ι(r)(j) = r if j = (0, . . . , 0), and ι(r)(j) = 0 otherwise.

Then ι is a one-one homomorphism of rings, ι(0) is the zero element of S and is called simply 0, and ι(1) is a multiplicative identity for S. The polynomials in the image of ι are called the constant polynomials.

For 1 ≤ k ≤ n, let e_k be the member of J^n that is 1 in the kth place and is 0 elsewhere. Define X_k to be the polynomial that assigns 1 to e_k and assigns 0 to all other members of J^n. We say that X_k is an indeterminate. If j = (j_1, . . . , j_n) is in J^n, define X^j to be the product

X^j = X_1^{j_1} · · · X_n^{j_n}.

If r is in R, we allow ourselves to abbreviate ι(r)X^j as rX^j, and any such polynomial is called a monomial. The monomial rX^j is the polynomial that assigns r to j and assigns 0 to all other members of J^n. Then it follows immediately from the definitions that each polynomial has a unique expansion as a finite sum of nonzero monomials. Thus the most general member of S is of the form Σ_{j∈J^n} r_j X^j with only finitely many nonzero terms. This is called the monomial expansion of the given polynomial.

We may now write R[X_1, . . . , X_n] for S. A polynomial Σ_{j∈J^n} r_j X^j may be conveniently abbreviated as P or as P(X) or as P(X_1, . . . , X_n) when its monomial expansion is either understood or irrelevant. The degree of the 0 polynomial is defined for this section to be −∞, and the degree of any monomial rX^j with r ≠ 0 is defined to be the integer

|j| = j_1 + · · · + j_n if j = (j_1, . . . , j_n).

Finally the degree of any nonzero polynomial P, denoted by deg P, is defined to be the maximum of the degrees of the terms in its monomial expansion.

If all the nonzero monomials in the monomial expansion of a polynomial P have the same degree d, then P is said to be homogeneous of degree d. Under these definitions the 0 polynomial has degree −∞ but is homogeneous of every degree. If P and Q are homogeneous polynomials of degrees d and d′, then PQ is homogeneous of degree d + d′ (and possibly equal to the 0 polynomial). In any event, by grouping terms in the monomial expansion of a polynomial according to their degree, we see that every polynomial is uniquely the sum of nonzero homogeneous polynomials of distinct degrees. Let us call this the homogeneous-polynomial expansion of the given polynomial. Let us expand two such nonzero polynomials P and Q in this fashion, writing P = P_{d_1} + · · · + P_{d_k}


and Q = Q_{d′_1} + · · · + Q_{d′_l} with d_1 < · · · < d_k and d′_1 < · · · < d′_l. Then we see directly that

deg(P + Q) ≤ max(deg P, deg Q),
deg(PQ) ≤ deg P + deg Q.

In the formula for deg(P + Q), the term that is potentially of largest degree is P_{d_k} + Q_{d′_l}, and it is of degree max(deg P, deg Q) if deg P ≠ deg Q. In the formula for deg(PQ), the term that is potentially of largest degree is P_{d_k} Q_{d′_l}. It is homogeneous of degree d_k + d′_l, but it could be 0. Some proof is required that it is not 0 if R is an integral domain, as follows.

Proposition 4.29. If R is an integral domain, then R[X_1, . . . , X_n] is an integral domain.

PROOF. Let P and Q be nonzero homogeneous polynomials with deg P = d and deg Q = d′. We are to prove that PQ ≠ 0. We introduce an ordering on the set of all members j of J^n, saying j = (j_1, . . . , j_n) > j′ = (j′_1, . . . , j′_n) if there is some k such that j_i = j′_i for i < k and j_k > j′_k. In the monomial expansion of P as P(X) = Σ_{|j|=d} a_j X^j, let i be the largest n-tuple j in the ordering such that a_j ≠ 0. Similarly with Q(X) = Σ_{|j′|=d′} b_{j′} X^{j′}, let i′ be the largest n-tuple j′ in the ordering such that b_{j′} ≠ 0. Then

P(X)Q(X) = a_i b_{i′} X^{i+i′} + Σ_{j, j′ with (j, j′) ≠ (i, i′)} a_j b_{j′} X^{j+j′},

and all terms in the sum Σ_{j, j′} on the right side have j + j′ < i + i′. Thus a_i b_{i′} X^{i+i′} is the only term in the monomial expansion of P(X)Q(X) involving the monomial X^{i+i′}. Since R is an integral domain and a_i and b_{i′} are nonzero, a_i b_{i′} is nonzero. Thus P(X)Q(X) is nonzero.

Proposition 4.30. Let R be a nonzero commutative ring with identity, let R[X_1, . . . , X_n] be the ring of polynomials in n indeterminates, and define ι : R → R[X_1, . . . , X_n] to be the identification of R with constant polynomials. If T is any commutative ring with identity, if ϕ : R → T is a homomorphism of rings sending 1 into 1, and if t_1, . . . , t_n are in T, then there exists a unique homomorphism Φ : R[X_1, . . . , X_n] → T carrying identity to identity such that Φ(ι(r)) = ϕ(r) for all r ∈ R and Φ(X_j) = t_j for 1 ≤ j ≤ n.

REMARKS. The mapping Φ is called the substitution homomorphism extending ϕ and substituting t_j for X_j for 1 ≤ j ≤ n, and the mapping is written P(X_1, . . . , X_n) ↦ P^ϕ(t_1, . . . , t_n). The notation means that ϕ is to be applied to each coefficient of P and then X_1, . . . , X_n are to be replaced by t_1, . . . , t_n.


A diagram of this homomorphism as a universal mapping property appears in Figure 4.8. In the special case that T = R and ϕ is the identity, Φ reduces to evaluation at the point (t_1, . . . , t_n) of R × · · · × R (cf. Example 3 of homomorphisms in Section 4), and the mapping is written P(X_1, . . . , X_n) ↦ P(t_1, . . . , t_n).

[Diagram: ϕ maps R to T across the top, ι maps R down to R[X_1, . . . , X_n], and Φ maps R[X_1, . . . , X_n] to T along the diagonal.]

FIGURE 4.8. Substitution homomorphism for polynomials in n indeterminates.

PROOF. If P(X_1, . . . , X_n) = Σ_{j_1≥0, ..., j_n≥0} a_{j_1,...,j_n} X_1^{j_1} · · · X_n^{j_n} is the monomial expansion of a member P of R[X_1, . . . , X_n], then Φ(P) is defined to be the corresponding finite sum Σ_{j_1≥0, ..., j_n≥0} ϕ(a_{j_1,...,j_n}) t_1^{j_1} · · · t_n^{j_n}. Existence readily follows, and uniqueness follows since ι(R) and X_1, . . . , X_n generate R[X_1, . . . , X_n] and since Φ is a homomorphism.

Corollary 4.31. If R is a nonzero commutative ring with identity, then R[X_1, . . . , X_{n−1}][X_n] is isomorphic as a ring to R[X_1, . . . , X_n].

REMARK. The proof will show that the isomorphism is the expected one.

PROOF. In the notation with n-tuples and J^n, any (n − 1)-tuple may be identified with an n-tuple by adjoining 0 as its nth coordinate, and in this way, every monomial in R[X_1, . . . , X_{n−1}] can be regarded as a monomial in R[X_1, . . . , X_n]. The extension of this mapping to sums gives us a one-one homomorphism of rings ι′ : R[X_1, . . . , X_{n−1}] → R[X_1, . . . , X_n].

We are going to use Proposition 4.25 to prove the isomorphism of rings R[X_1, . . . , X_{n−1}][X_n] ≅ R[X_1, . . . , X_n]. In the notation of that proposition, the role of R is played by R[X_1, . . . , X_{n−1}], we take S = R[X_1, . . . , X_n], and we have constructed ι′. We are to show that (S, ι′, X_n) satisfies a certain universal mapping property. Thus suppose that T is a commutative ring with identity, that t is in T, and that ϕ′ : R[X_1, . . . , X_{n−1}] → T is a homomorphism of rings carrying identity to identity. We shall apply Proposition 4.30 in order to obtain the desired homomorphism Φ : S → T.

Let ι_{n−1} : R → R[X_1, . . . , X_{n−1}] be the identification of R with constant polynomials in R[X_1, . . . , X_{n−1}], and let ι_n = ι′ ∘ ι_{n−1} be the identification of R with constant polynomials in S. Define ϕ : R → T by ϕ = ϕ′ ∘ ι_{n−1}, and take t_n = t and t_j = ϕ′(X_j) for 1 ≤ j ≤ n − 1. Then Proposition 4.30 produces a homomorphism of rings Φ : S → T with Φ(ι_n(r)) = ϕ(r) for r ∈ R, Φ(ι′(X_j)) = ϕ′(X_j) for 1 ≤ j ≤ n − 1, and Φ(X_n) = t_n. The equations

Φ(ι′(ι_{n−1}(r))) = Φ(ι_n(r)) = ϕ(r) = ϕ′(ι_{n−1}(r)) and Φ(ι′(X_j)) = ϕ′(X_j)


show that Φ ∘ ι′ = ϕ′ on R[X_1, . . . , X_{n−1}]. Also, Φ(X_n) = t_n = t. Thus the mapping Φ sought by Proposition 4.25 exists. It is unique since R[X_1, . . . , X_{n−1}] and X_n together generate S. The conclusion from Proposition 4.25 is that S is isomorphic to R[X_1, . . . , X_{n−1}][X_n] via the expected isomorphism of rings.

We conclude the discussion of polynomials in several variables by making the notion of a polynomial function of several variables rigorous. If P(X_1, . . . , X_n) is a polynomial in n indeterminates with coefficients in the commutative ring R with identity, then Proposition 4.30 gives us an evaluation homomorphism P ↦ P(r_1, . . . , r_n) for each n-tuple (r_1, . . . , r_n) of members of R. The function (r_1, . . . , r_n) ↦ P(r_1, . . . , r_n) from R × · · · × R into R is the polynomial function associated to the polynomial P. This function is a member of the commutative ring of all R-valued functions on R × · · · × R, and the mapping P ↦ ((r_1, . . . , r_n) ↦ P(r_1, . . . , r_n)) is a homomorphism of rings.

Corollary 4.32. If R is an infinite integral domain, then the ring homomorphism of R[X_1, . . . , X_n] to polynomial functions from R × · · · × R to R, given by evaluation, is one-one.

REMARK. This result extends Proposition 4.28 to several indeterminates.

PROOF. We proceed by induction on n, the case n = 1 being handled by Proposition 4.28. Assume the result for n − 1 indeterminates. If P ≠ 0 is in R[X_1, . . . , X_n], Corollary 4.31 allows us to write

P(X_1, . . . , X_n) = Σ_{i=0}^{k} P_i(X_1, . . . , X_{n−1}) X_n^i

for some k, with each P_i in R[X_1, . . . , X_{n−1}] and with P_k(X_1, . . . , X_{n−1}) ≠ 0. By the inductive hypothesis, P_k(r_1, . . . , r_{n−1}) is nonzero for some elements r_1, . . . , r_{n−1} of R. So the polynomial Σ_{i=0}^{k} P_i(r_1, . . . , r_{n−1}) X_n^i in R[X_n] is not the 0 polynomial, and Proposition 4.28 shows that it is not 0 when evaluated at some r_n. Then P(r_1, . . . , r_n) ≠ 0.

It is possible also to introduce polynomial rings in infinitely many variables. These will play roles only as counterexamples in this book, and thus we shall not stop to treat them in detail.

We complete this section with some remarks about vector spaces. The definition of a vector space over a general field F remains the same as in Section II.1, where F is assumed to be Q or R or C. We shall make great use of the fact that all the results in Chapter II concerning vector spaces remain valid when Q or R or


$\mathbb{C}$ is replaced by a general field $F$. The proofs need no adjustments, and it is not necessary to write out the details. For the moment we make only the following application of vector spaces over general fields, but the extended theory of vector spaces will play an important role in most of the remaining chapters of this book.

Proposition 4.33. If $F$ is a finite field, then the number of elements in $F$ is a power of a prime.

REMARK. We return to this matter in Chapter IX, showing at that time that for each prime power $p^n > 1$, there is one and only one field with $p^n$ elements, up to isomorphism.

PROOF. The characteristic of $F$ cannot be 0 since $F$ is finite, and hence it is some prime $p$. Denote the prime field of $F$ by $\mathbb{F}_p$. By restricting the multiplication so that it is defined only on $\mathbb{F}_p \times F$, we make $F$ into a vector space over $\mathbb{F}_p$, necessarily finite-dimensional. Proposition 2.18 shows that $F$ is isomorphic as a vector space to the space $(\mathbb{F}_p)^n$ of $n$-dimensional column vectors for some $n$, and hence $F$ must have $p^n$ elements.
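As a side illustration of why Corollary 4.32 assumes that $R$ is infinite, the following minimal sketch checks that over the finite field $\mathbb{F}_p$ the nonzero polynomial $X^p - X$ gives the zero polynomial function (this is Fermat's little theorem). The helper names `poly_eval_mod` and `x_pow_p_minus_x` are hypothetical and chosen only for this example.

```python
def poly_eval_mod(coeffs, x, p):
    """Evaluate sum(coeffs[i] * x**i) modulo p using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def x_pow_p_minus_x(p):
    """Coefficients (lowest degree first) of X^p - X, reduced modulo p."""
    coeffs = [0] * (p + 1)
    coeffs[1] = (-1) % p
    coeffs[p] = 1
    return coeffs


for p in (2, 3, 5, 7):
    coeffs = x_pow_p_minus_x(p)
    values = [poly_eval_mod(coeffs, r, p) for r in range(p)]
    # The polynomial has nonzero coefficients, yet every value is 0.
    print(p, any(coeffs), values)
```

So evaluation is not one-one on $\mathbb{F}_p[X]$, and the hypothesis in Corollary 4.32 cannot simply be dropped.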

6. Group Actions and Examples

Let $X$ be a nonempty set, let $\mathcal{F}(X)$ be the group of invertible functions from $X$ onto itself, the group operation being composition, and let $G$ be a group. A group action of $G$ on $X$ is a homomorphism of $G$ into $\mathcal{F}(X)$. Examples 5–9 of groups in Section 1 were in fact subgroups of various groups $\mathcal{F}(X)$ and are therefore examples of group actions. Thus every group of permutations of $\{1,\dots,n\}$, every dihedral group acting on $\mathbb{R}^2$, and every general linear group or subgroup acting on a finite-dimensional vector space over $\mathbb{Q}$ or $\mathbb{R}$ or $\mathbb{C}$ or an arbitrary field $F$ provides an example. So do the orthogonal and unitary groups acting on $\mathbb{R}^n$ and $\mathbb{C}^n$, as well as the automorphism group of any number field. We saw an indication in Section 1 that many early examples of groups arose in this way.

One source of examples that is of some importance and was not listed in Section 1 occurs in the geometry of $\mathbb{R}^2$. The translations in $\mathbb{R}^2$, together with the rotations about arbitrary points of $\mathbb{R}^2$ and the reflections about arbitrary lines in $\mathbb{R}^2$, form a group $G$ of rigid motions of the plane.¹¹ This group $G$ is a subgroup of $\mathcal{F}(\mathbb{R}^2)$, and thus $G$ acts on $\mathbb{R}^2$. More generally, whenever a nonempty set $X$ has a notion of distance, the set of isometries of $X$, i.e., the distance-preserving members of $\mathcal{F}(X)$, forms a subgroup of $\mathcal{F}(X)$, and thus the group of isometries of $X$ acts on $X$.

¹¹ One can show that $G$ is the full group of rigid motions of $\mathbb{R}^2$, but this fact will not concern us.


At any rate a group action $\tau$ of $G$ on $X$, being a homomorphism of $G$ into $\mathcal{F}(X)$, is of the form $g \mapsto \tau_g$, where $\tau_g$ is in $\mathcal{F}(X)$ and $\tau_{g_1g_2} = \tau_{g_1}\tau_{g_2}$. There is an equivalent way of formulating matters that does not so obviously involve the notion of a homomorphism. Namely, we write $\tau_g(x) = gx$. In this notation the group action becomes a function $G \times X \to X$ with $(g, x) \mapsto gx$ such that
(i) $(g_1g_2)x = g_1(g_2x)$ for all $g_1$ and $g_2$ in $G$ and for all $x$ in $X$ (from the fact that $\tau_{g_1g_2} = \tau_{g_1}\tau_{g_2}$),
(ii) $1x = x$ for all $x$ in $X$ (from the fact that $\tau_1 = 1$).
Conversely if $G \times X \to X$ satisfies (i) and (ii), then the formulas $x = 1x = (gg^{-1})x = g(g^{-1}x)$ and $x = 1x = (g^{-1}g)x = g^{-1}(gx)$ show that the function $x \mapsto gx$ from $X$ to itself is invertible with inverse $x \mapsto g^{-1}x$. Consequently the definition $\tau_g(x) = gx$ makes $g \mapsto \tau_g$ a function from $G$ into $\mathcal{F}(X)$, and (i) shows that $\tau$ is a homomorphism. Thus (i) and (ii) indeed give us an equivalent formulation of the notion of a group action. Both formulations are useful.

Quite often the homomorphism $G \to \mathcal{F}(X)$ of a group action is one-one, and then $G$ can be regarded as a subgroup of $\mathcal{F}(X)$. Here is an important geometric example in which the homomorphism is not one-one.

EXAMPLE. Linear fractional transformations. Let $X = \mathbb{C} \cup \{\infty\}$, a set that becomes the Riemann sphere in complex analysis. The group $G = GL(2, \mathbb{C})$ acts on $X$ by the linear fractional transformations
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}(z) = \frac{az+b}{cz+d},$$
the understanding being that the image of $\infty$ is $ac^{-1}$ and the image of $-dc^{-1}$ is $\infty$, just as if we were to pass to a limit in each case. Property (ii) of a group action is clear. To verify (i), we simply calculate that
$$\begin{aligned}
\begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix}\left(\begin{pmatrix} a & b \\ c & d \end{pmatrix}(z)\right)
&= \frac{a'\,\frac{az+b}{cz+d} + b'}{c'\,\frac{az+b}{cz+d} + d'}
= \frac{(a'a + b'c)z + (a'b + b'd)}{(c'a + d'c)z + (c'b + d'd)} \\
&= \left(\begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix}\right)(z),
\end{aligned}$$

and indeed we have a group action.

Let $SL(2, \mathbb{R})$ be the subgroup of real matrices in $GL(2, \mathbb{C})$ of determinant 1, and let $Y$ be the subset of $X$ where $\operatorname{Im} z > 0$, not including $\infty$. The members of $SL(2, \mathbb{R})$ carry the subset $Y$ into itself, as we see from the computation
$$\operatorname{Im}\frac{az+b}{cz+d} = \operatorname{Im}\frac{(az+b)(c\bar z+d)}{|cz+d|^2} = \operatorname{Im}\frac{adz + bc\bar z}{|cz+d|^2} = \frac{(ad-bc)\operatorname{Im} z}{|cz+d|^2} = \frac{\operatorname{Im} z}{|cz+d|^2}.$$
Since the effect of a matrix $g^{-1}$ is to invert the effect of $g$, and since both $g$ and $g^{-1}$ carry $Y$ to itself, we conclude that $SL(2, \mathbb{R})$ acts on $Y = \{z \in \mathbb{C} \mid \operatorname{Im} z > 0\}$ by linear fractional transformations. In similar fashion one can verify that the subgroup
$$SU(1,1) = \left\{ \begin{pmatrix} \alpha & \beta \\ \bar\beta & \bar\alpha \end{pmatrix} \;\middle|\; \alpha \in \mathbb{C},\ \beta \in \mathbb{C},\ |\alpha|^2 - |\beta|^2 = 1 \right\}$$
of $GL(2, \mathbb{C})$ acts on $\{z \in \mathbb{C} \mid |z| < 1\}$ by linear fractional transformations.

One group action can yield many others. For example, from an action of $G$ on $X$, we can construct an action on the space of all complex-valued functions on $X$. The definition is $(gf)(x) = f(g^{-1}x)$, the use of the inverse being necessary in order to verify property (i) of a group action:
$$((g_1g_2)f)(x) = f((g_1g_2)^{-1}x) = f((g_2^{-1}g_1^{-1})x) = f(g_2^{-1}(g_1^{-1}x)) = (g_2f)(g_1^{-1}x) = (g_1(g_2f))(x).$$
There is nothing special about the complex numbers as range for the functions here. We can allow any set as range, and we can even allow $G$ to act on the range, as well as on the domain.¹² If $G$ acts on $X$ and $Y$, then the set of functions from $X$ to $Y$ inherits a group action under the definition $(gf)(x) = g(f(g^{-1}x))$, as is easily checked. In other words, we are to use $g^{-1}$ where the domain enters the formula and we are to use $g$ where the range enters the formula.

If $V$ is a vector space over a field $F$, a representation of $G$ on $V$ is a group action of $G$ on $V$ by linear functions. Specifically for each $g \in G$, $\tau_g$ is to be a member of the group of linear maps from $V$ into itself. Usually one writes $\tau(g)$ instead of $\tau_g$ in representation theory, and thus the condition is that $\tau(g)$ is to be linear for each $g \in G$ and we are to have $\tau(1) = 1$ and $\tau(g_1g_2) = \tau(g_1)\tau(g_2)$ for all $g_1$ and $g_2$. There are interesting examples both when $V$ is finite-dimensional and when $V$ is infinite-dimensional.¹³

¹² When $\mathbb{C}$ was used as range in the previous display, the group action of $G$ on $\mathbb{C}$ was understood to be trivial in the sense that $gz = z$ for every $g$ in $G$ and $z$ in $\mathbb{C}$.

¹³ In some settings a continuity assumption may be added to the definition of a representation, or the field $F$ may be restricted in some way. We impose no such assumption here at this time.
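The linear fractional action described above can be spot-checked numerically. The following is a minimal sketch, not part of the text's development: the sentinel `INF` and the helpers `act` and `matmul` are hypothetical names introduced for this illustration, and the check is done at sample points only, so it illustrates property (i) and the imaginary-part formula rather than proving them.

```python
INF = "inf"  # stand-in for the point at infinity of the Riemann sphere


def act(m, z):
    """Apply the linear fractional transformation of the 2x2 matrix m to z."""
    (a, b), (c, d) = m
    if z == INF:
        # Image of infinity is a/c, read as infinity when c == 0.
        return INF if c == 0 else a / c
    den = c * z + d
    # The point -d/c is sent to infinity.
    return INF if den == 0 else (a * z + b) / den


def matmul(m1, m2):
    """Product of two 2x2 matrices given as nested tuples."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))


g1 = ((1, 1), (1, -2))  # determinant -3, a member of GL(2, C)
g2 = ((2, 1), (1, 1))   # real with determinant 1, a member of SL(2, R)
z = 0.5 + 1.5j          # a point of the upper half plane

# Property (i): (g1 g2) z agrees with g1 (g2 z) up to rounding.
print(abs(act(matmul(g1, g2), z) - act(g1, act(g2, z))) < 1e-12)

# Conventions at infinity: infinity goes to a/c, and -d/c goes to infinity.
print(act(g1, INF), act(g1, 2))

# Im(g2 z) = Im z / |cz + d|^2 for g2 in SL(2, R).
(a, b), (c, d) = g2
print(abs(act(g2, z).imag - z.imag / abs(c * z + d) ** 2) < 1e-12)
```

Representing $\infty$ by a sentinel keeps the two special cases in the definition explicit instead of relying on floating-point infinities.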


EXAMPLES OF REPRESENTATIONS.

(1) If $m \ge 1$, then the additive group $\mathbb{Z}/m\mathbb{Z}$ acts linearly on $\mathbb{R}^2$ by
$$\tau(k) = \begin{pmatrix} \cos\frac{2\pi k}{m} & -\sin\frac{2\pi k}{m} \\[2pt] \sin\frac{2\pi k}{m} & \cos\frac{2\pi k}{m} \end{pmatrix}, \qquad k \in \{0, 1, 2, \dots, m-1\}.$$
Each $\tau(k)$ is a rotation matrix about the origin through an angle that is a multiple of $2\pi/m$. These transformations of $\mathbb{R}^2$ form a subgroup of the group of symmetries of a regular $m$-gon centered at the origin in $\mathbb{R}^2$.

(2) The dihedral group $D_3$ acts linearly on $\mathbb{R}^2$ with
$$\tau(1) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
\tau(1\;2\;3) = \begin{pmatrix} -\frac12 & -\frac{\sqrt3}{2} \\[2pt] \frac{\sqrt3}{2} & -\frac12 \end{pmatrix}, \quad
\tau(1\;3\;2) = \begin{pmatrix} -\frac12 & \frac{\sqrt3}{2} \\[2pt] -\frac{\sqrt3}{2} & -\frac12 \end{pmatrix},$$
$$\tau(2\;3) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad
\tau(1\;2) = \begin{pmatrix} -\frac12 & \frac{\sqrt3}{2} \\[2pt] \frac{\sqrt3}{2} & \frac12 \end{pmatrix}, \quad
\tau(1\;3) = \begin{pmatrix} -\frac12 & -\frac{\sqrt3}{2} \\[2pt] -\frac{\sqrt3}{2} & \frac12 \end{pmatrix}.$$
Each of these matrices carries into itself the equilateral triangle with center at the origin and one vertex at $(1, 0)$. To obtain these matrices, we number the vertices #1, #2, #3 counterclockwise with the vertex at $(1, 0)$ as #1.

(3) The symmetric group $S_n$ acts linearly on $\mathbb{R}^n$ by permuting the indices of standard basis vectors. For example, with $n = 3$, we have $(1\;3)e_1 = e_3$, $(1\;3)e_2 = e_2$, etc. The matrices may be computed by the techniques of Section II.3. With $n = 3$, we obtain, for example,
$$(1\;3) \mapsto \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \qquad\text{and}\qquad (1\;2\;3) \mapsto \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$

(4) If $G$ acts on a set $X$, then the corresponding action $(gf)(x) = f(g^{-1}x)$ on complex-valued functions is a representation on the vector space of all complex-valued functions on $X$. This vector space is infinite-dimensional if $X$ is an infinite set. The linearity of the action on functions follows from the definitions of addition and scalar multiplication of functions. In fact, let functions $f_1$ and $f_2$ be given, and let $c$ be a scalar. Then
$$(g(f_1 + f_2))(x) = (f_1 + f_2)(g^{-1}x) = f_1(g^{-1}x) + f_2(g^{-1}x) = (gf_1)(x) + (gf_2)(x) = (gf_1 + gf_2)(x)$$
and
$$(g(cf_1))(x) = (cf_1)(g^{-1}x) = c(f_1(g^{-1}x)) = c((gf_1)(x)) = (c(gf_1))(x).$$
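The role of the inverse in the formula $(gf)(x) = f(g^{-1}x)$ can be seen concretely for $G = S_3$ acting on a 3-element set. The sketch below is only an illustration under stated assumptions: permutations are stored as tuples on $\{0, 1, 2\}$, a function $f$ is stored as its tuple of values, and the helper names (`compose`, `inverse`, `act_with_inverse`, `act_without_inverse`, `satisfies_property_i`) are hypothetical.

```python
from itertools import permutations


def compose(s, t):
    """Composition s∘t of permutations stored as tuples: x -> s[t[x]]."""
    return tuple(s[t[x]] for x in range(len(t)))


def inverse(s):
    """Inverse permutation of s."""
    inv = [0] * len(s)
    for x, y in enumerate(s):
        inv[y] = x
    return tuple(inv)


def act_with_inverse(g, f):
    """(g f)(x) = f(g^{-1} x), with f stored as its tuple of values."""
    g_inv = inverse(g)
    return tuple(f[g_inv[x]] for x in range(len(f)))


def act_without_inverse(g, f):
    """The tempting but incorrect rule (g f)(x) = f(g x)."""
    return tuple(f[g[x]] for x in range(len(f)))


def satisfies_property_i(action, group, f):
    """Check (g1 g2) f == g1 (g2 f) for all pairs in the group."""
    return all(action(compose(g1, g2), f) == action(g1, action(g2, f))
               for g1 in group for g2 in group)


S3 = list(permutations(range(3)))
f = (10, 20, 30)  # an injective test function on {0, 1, 2}

print(satisfies_property_i(act_with_inverse, S3, f))     # True
print(satisfies_property_i(act_without_inverse, S3, f))  # False
```

Without the inverse, the rule reverses the order of composition, so property (i) fails as soon as two group elements do not commute and $f$ separates points.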


One more important class of group actions consists of those that are closely related to the structure of the group itself. Two simple ones are the action of $G$ on itself by left translations $(g_1, g_2) \mapsto g_1g_2$ and the action of $G$ on itself by right translations $(g_1, g_2) \mapsto g_2g_1^{-1}$. More useful is the action of $G$ on a quotient space $G/H$, where $H$ is a subgroup. This action is given by $(g_1, g_2H) \mapsto g_1g_2H$. There are still others, and some of them are particularly handy in analyzing finite groups. We give some applications in the present section and the next, and we postpone others to Section 10.

Before describing some of these actions in detail, let us make some general definitions and establish two easy results. Let $G \times X \to X$ be a group action. If $p$ is in $X$, then $G_p = \{g \in G \mid gp = p\}$ is a subgroup of $G$ called the isotropy subgroup at $p$. This is not always a normal subgroup; however, the subgroup $\bigcap_{p \in X} G_p$ that fixes all points of $X$ is the kernel of the homomorphism $G \to \mathcal{F}(X)$ defining the group action, and such a kernel has to be normal.

Let $p$ and $q$ be in $X$. We say that $p$ is equivalent to $q$ for the purposes of this paragraph if $p = gq$ for some $g \in G$. The result is an equivalence relation: it is reflexive since $p = 1p$, it is symmetric since $p = gq$ implies $g^{-1}p = q$, and it is transitive since $p = gq$ and $q = g'r$ together imply $p = (gg')r$. The equivalence classes are called orbits of the group action. The orbit of a point $p$ in $X$ is $Gp = \{gp \mid g \in G\}$. If $Y = Gp$ is an orbit, or more generally if $Y$ is any subset of $X$ carried to itself by every element of $G$, then $G \times Y \to Y$ is a group action. In fact, each function $y \mapsto gy$ is invertible on $Y$ with $y \mapsto g^{-1}y$ as the inverse function, and properties (i) and (ii) of a group action follow from the same properties for $X$.

A group action $G \times X \to X$ is said to be transitive if there is just one orbit, hence if $X = Gp$ for each $p$ in $X$. It is simply transitive if it is transitive and if for each $p$ and $q$ in $X$, there is just one element $g$ of $G$ with $gp = q$.

Proposition 4.34. Let $G \times X \to X$ be a group action, let $p$ be in $X$, and let $H$ be the isotropy subgroup at $p$. Then the map $G \to Gp$ given by $g \mapsto gp$ descends to a well-defined map $G/H \to Gp$ that is one-one from $G/H$ onto the orbit $Gp$ and respects the group actions.

REMARK. In other words, a group action of $G$ on a single orbit is always isomorphic as a group action to the action of $G$ on some quotient space $G/H$.

PROOF. Let $\varphi : G \to Gp$ be defined by $\varphi(g) = gp$. For $h$ in $H = G_p$, $\varphi(gh) = (gh)p = g(hp) = gp = \varphi(g)$ shows that $\varphi$ descends to a well-defined function $\bar\varphi : G/H \to Gp$, and $\bar\varphi$ is certainly onto $Gp$. If $\bar\varphi(g_1H) = \bar\varphi(g_2H)$, then $g_1p = \varphi(g_1) = \varphi(g_2) = g_2p$, and hence $g_2^{-1}g_1p = p$, $g_2^{-1}g_1$ is in $H$, $g_1$ is in $g_2H$, and $g_1H = g_2H$. Thus $\bar\varphi$ is one-one. Respecting the group action means that $\bar\varphi(gg'H) = g\bar\varphi(g'H)$, and this identity holds since $g\bar\varphi(g'H) = g\varphi(g') = g(g'p) = (gg')p = \varphi(gg') = \bar\varphi(gg'H)$.


A simple consequence is the following important counting formula in the case of a group action by a finite group.

Corollary 4.35. Let $G$ be a finite group, let $G \times X \to X$ be a group action, let $p$ be in $X$, let $G_p$ be the isotropy group at $p$, and let $Gp$ be the orbit of $p$. Then $|G| = |Gp|\,|G_p|$.

PROOF. Proposition 4.34 shows that the action of $G$ on some $G/G_p$ is the most general group action on a single orbit, $G_p$ being the isotropy subgroup. Thus the corollary follows from Lagrange's Theorem (Theorem 4.7) with $H = G_p$ and $G/H = Gp$.

We turn to applications of group actions to the structure of groups. If $H$ is a subgroup of a group $G$, the index of $H$ in $G$ is the number of elements in $G/H$, finite or infinite. The first application notes a situation in which a subgroup of a finite group is automatically normal.

Proposition 4.36. Let $G$ be a finite group, and let $p$ be the smallest prime dividing the order of $G$. If $H$ is a subgroup of $G$ of index $p$, then $H$ is normal.

REMARKS. The most important case is $p = 2$: any subgroup of index 2 is automatically normal, and this conclusion is valid even if $G$ is infinite, as was already pointed out in Example 3 of Section 2. If $G$ is finite and if 2 divides the order of $G$, there need not, however, be any subgroup of index 2; for example, the alternating group $A_4$ has order 12, and Problem 11 at the end of the chapter shows that $A_4$ has no subgroup of order 6.

PROOF. Let $X = G/H$, and restrict the group action $G \times X \to X$ to an action $H \times X \to X$. The subset $\{1H\}$ is a single orbit under $H$, and the remaining $p - 1$ members of $G/H$ form a union of orbits. Corollary 4.35 shows that the number of elements in an orbit has to be a divisor of $|H|$, and the smallest divisor of $|H|$ other than 1 is $\ge p$ since the smallest divisor of $|G|$ other than 1 equals $p$ and since $|H|$ divides $|G|$. Hence any orbit of $H$ containing more than one element has at least $p$ elements. Since only $p - 1$ elements are left under consideration, each orbit under $H$ contains only one element. Therefore $hgH = gH$ for all $h$ in $H$ and $g$ in $G$. Then $g^{-1}hg$ is in $H$, and we conclude that $H$ is normal.

If $G$ is a group, the center $Z_G$ of $G$ is the set of all elements $x$ such that $gx = xg$ for all $g$ in $G$. The center of $G$ is a subgroup (since $gx = xg$ and $gy = yg$ together imply $g(xy) = xgy = (xy)g$ and $xg^{-1} = g^{-1}(gx)g^{-1} = g^{-1}(xg)g^{-1} = g^{-1}x$), and every subgroup of the center is normal since $x \in Z_G$ and $g \in G$ together imply $gxg^{-1} = x$. Here are examples: the center of a group $G$ is $G$ itself if and only if $G$ is abelian, the center of the quaternion group $H_8$ is $\{\pm 1\}$, and the center of any symmetric group $S_n$ with $n \ge 3$ is $\{1\}$.
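The counting formula of Corollary 4.35 is easy to confirm by brute force in a small case. The following sketch, offered only as an illustration, lets $S_4$ act on the points $\{0, 1, 2, 3\}$ and on the 2-element subsets of that set. The helper names `orbit`, `isotropy`, `act_point`, and `act_pair` are hypothetical.

```python
from itertools import permutations


def orbit(group, act, p):
    """The orbit Gp = {gp : g in G}."""
    return {act(g, p) for g in group}


def isotropy(group, act, p):
    """The isotropy subgroup G_p = {g in G : gp = p}, as a list."""
    return [g for g in group if act(g, p) == p]


def act_point(g, x):
    """S_4 acting on points: the permutation g, stored as a tuple, sends x to g[x]."""
    return g[x]


def act_pair(g, s):
    """S_4 acting on subsets of {0, 1, 2, 3} by applying g to each member."""
    return frozenset(g[x] for x in s)


S4 = list(permutations(range(4)))

for act, p in ((act_point, 0), (act_pair, frozenset({0, 1}))):
    size_orbit = len(orbit(S4, act, p))
    size_isotropy = len(isotropy(S4, act, p))
    # |G| = |Gp| |G_p|
    print(size_orbit, size_isotropy, size_orbit * size_isotropy == len(S4))
```

For points the orbit has 4 elements and the isotropy subgroup has 6; for 2-element subsets the orbit has 6 elements and the isotropy subgroup has 4. Both products equal $|S_4| = 24$.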


If $x$ is in $G$, the centralizer of $x$ in $G$, denoted by $Z_G(x)$, is the set of all $g$ such that $gx = xg$. This is a subgroup of $G$, and it equals $G$ itself if and only if $x$ is in the center of $G$. For example the centralizer of $i$ in $H_8$ is the 4-element subgroup $\{\pm 1, \pm i\}$.

Having made these definitions, we introduce a new group action of $G$ on $G$, namely $(g, x) \mapsto gxg^{-1}$. The orbits are called the conjugacy classes of $G$. If $x$ and $y$ are two elements of $G$, we say that $x$ is conjugate to $y$ if $x$ and $y$ are in the same conjugacy class. In other words, $x$ is conjugate to $y$ if there is some $g$ in $G$ with $gxg^{-1} = y$. The result is an equivalence relation. Let us write $C(x)$ for the conjugacy class of $x$. We can easily compute the isotropy subgroup $G_x$ at $x$ under this action; it consists of all $g \in G$ such that $gxg^{-1} = x$ and hence is exactly the centralizer $Z_G(x)$ of $x$ in $G$. In particular, $C(x) = \{x\}$ if and only if $x$ is in the center $Z_G$. Applying Corollary 4.35, we immediately obtain the following result.

Proposition 4.37. If $G$ is a finite group, then $|G| = |C(x)|\,|Z_G(x)|$ for all $x$ in $G$. Thus $|C(x)|$ is always a divisor of $|G|$, and it equals 1 if and only if $x$ is in the center $Z_G$.

Let us apply these considerations to groups whose order is a power of a prime.

Corollary 4.38. If $G$ is a finite group whose order is a positive power of a prime, then the center $Z_G$ is not $\{1\}$.

PROOF. Let $|G| = p^n$ with $p$ prime and with $n > 0$. The conjugacy classes of $G$ exhaust $G$, and thus the sum of all $|C(x)|$'s equals $|G|$. Since $|C(x)| = 1$ if and only if $x$ is in $Z_G$, the sum of $|Z_G|$ and all the $|C(x)|$'s that are not 1 is equal to $|G|$. All the terms $|C(x)|$ that are not 1 are positive powers of $p$, by Proposition 4.37, and so is $|G|$. Therefore $p$ divides $|Z_G|$.

Corollary 4.39. If $G$ is a finite group of order $p^2$ with $p$ prime, then $G$ is abelian.

PROOF. From Corollary 4.38 we see that either $|Z_G| = p^2$, in which case $G$ is abelian, or $|Z_G| = p$. We show that the latter is impossible. In fact, if $x$ is not in $Z_G$, then $Z_G(x)$ is a subgroup of $G$ that contains $Z_G$ and the element $x$. It must then have order $p^2$ and be all of $G$. Hence every element of $G$ commutes with $x$, and $x$ is in $Z_G$, contradiction.

Corollary 4.40. If $G$ is a finite group whose order is a positive power $p^n$ of a prime $p$, then there exist normal subgroups $G_k$ of $G$ for $0 \le k \le n$ such that $|G_k| = p^k$ for all $k \le n$ and such that $G_k \subseteq G_{k+1}$ for all $k < n$.


PROOF. We proceed by induction on $n$. The base case of the induction is $n = 1$ and is handled by Corollary 4.9. Assume inductively that the result holds for $n$, and let $G$ have order $p^{n+1}$. Corollary 4.38 shows that $Z_G \ne \{1\}$. Any element $\ne 1$ in $Z_G$ must have order a power of $p$, and some power of it must therefore have order $p$. Thus let $a$ be an element of $Z_G$ of order $p$, and let $H$ be the subgroup consisting of the powers of $a$. Then $H$ is normal and has order $p$. Let $\bar G = G/H$ be the quotient group, and let $\varphi : G \to \bar G$ be the quotient homomorphism. The group $\bar G$ has order $p^n$, and the inductive hypothesis shows that $\bar G$ has normal subgroups $\bar G_k$ for $0 \le k \le n$ such that $|\bar G_k| = p^k$ for $k \le n$ and $\bar G_k \subseteq \bar G_{k+1}$ for $k \le n - 1$. For $1 \le k \le n + 1$, define $G_k = \varphi^{-1}(\bar G_{k-1})$, and let $G_0 = \{1\}$. The First Isomorphism Theorem (Theorem 4.13) shows that each $G_k$ for $k \ge 1$ is a normal subgroup of $G$ containing $H$ and that $\varphi(G_k) = \bar G_{k-1}$. Then $\varphi|_{G_k}$ is a homomorphism of $G_k$ onto $\bar G_{k-1}$ with kernel $H$, and hence $|G_k| = |\bar G_{k-1}|\,|H| = p^{k-1}p = p^k$. Therefore the $G_k$'s will serve as the required subgroups of $G$.

It is not always so easy to determine the conjugacy classes in a particular group. For example, in $GL(n, \mathbb{C})$ the question of conjugacy is the question whether two matrices are similar in the sense of Section II.3; this will be one of the main problems addressed in Chapter V. By contrast, the problem of conjugacy in symmetric groups has a simple answer. Recall that every permutation is uniquely the product of disjoint cycles. The cycle structure of a permutation consists of the number of cycles of each length in this decomposition.

Lemma 4.41. Let $\sigma$ and $\tau$ be members of the symmetric group $S_n$. If $\sigma$ is expressed as the product of disjoint cycles, then $\tau\sigma\tau^{-1}$ has the same cycle structure as $\sigma$, and the expression for $\tau\sigma\tau^{-1}$ as the product of disjoint cycles is obtained from that for $\sigma$ by substituting $\tau(k)$ for $k$ throughout.

REMARK. For example, if $\sigma = (a\;b)(c\;d\;e)$, then $\tau\sigma\tau^{-1}$ decomposes as $\big(\tau(a)\;\tau(b)\big)\big(\tau(c)\;\tau(d)\;\tau(e)\big)$.

PROOF. Because the conjugate of a product equals the product of the conjugates, it is enough to handle a cycle $\gamma = (a_1\;a_2\;\cdots\;a_n)$ appearing in $\sigma$. The corresponding cycle $\gamma' = \tau\gamma\tau^{-1}$ is asserted to be $\gamma' = (\tau(a_1)\;\tau(a_2)\;\cdots\;\tau(a_n))$. Application of $\tau^{-1}$ to $\tau(a_j)$ yields $a_j$, application of $\sigma$ to this yields $a_{j+1}$ if $j < n$ and $a_1$ if $j = n$, and application of $\tau$ to the result yields $\tau(a_{j+1})$ or $\tau(a_1)$. For each of the symbols $b$ not in the list $\{a_1, \dots, a_n\}$, $\tau\gamma\tau^{-1}(\tau(b)) = \tau(b)$ since $\gamma(b) = b$. Thus $\tau\gamma\tau^{-1} = \gamma'$, as asserted.

Proposition 4.42. Let $H$ be a subgroup of a symmetric group $S_n$. If $C(x)$ denotes a conjugacy class in $H$, then all members of $C(x)$ have the same cycle


structure. Conversely if $H = S_n$, then the conjugacy class of a permutation $\sigma$ consists of all members of $S_n$ having the same cycle structure as $\sigma$.

PROOF. The first conclusion is immediate from Lemma 4.41. For the second conclusion, let $\sigma$ and $\sigma'$ have the same cycle structure, and let $\tau$ be the permutation that moves, for each $k$, the $k$th symbol appearing in the disjoint-cycle expansion of $\sigma$ into the $k$th symbol in the corresponding expansion of $\sigma'$. Define $\tau$ on the remaining symbols in any fashion at all. Application of the lemma shows that $\tau\sigma\tau^{-1} = \sigma'$. Thus any two permutations with the same cycle structure are conjugate.

7. Semidirect Products

One more application of group actions to the structure theory of groups will be to the construction of "semidirect products" of groups. If $H$ is a group, then an isomorphism of $H$ with itself is called an automorphism. The set of automorphisms of $H$ is a group under composition, and we denote it by $\operatorname{Aut} H$. We are going to be interested in "group actions by automorphisms," i.e., group actions of a group $G$ on a space $X$ when $X$ is itself a group and the action by each member of $G$ is an automorphism of the group structure of $X$; the group action is therefore a homomorphism of the form $\tau : G \to \operatorname{Aut} X$.

EXAMPLE 1. In $\mathbb{R}^2$, we can identify the additive group of the underlying vector space with the group of translations $t_v(w) = v + w$; the identification associates a translation $t$ with the member $t(0)$ of $\mathbb{R}^2$. Let $H$ be the group of translations. The rotations about the origin in $\mathbb{R}^2$, namely the linear maps with matrices $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$, form a group $G = SO(2)$ that acts on $\mathbb{R}^2$, hence acts on the set $H$ of translations. The linearity of the rotations says that the action of $G = SO(2)$ on the translations is by automorphisms of $H$, i.e., that each rotation, in its effect on $H$, is in $\operatorname{Aut} H$. Out of these data (the two groups $G$ and $H$ and a homomorphism of $G$ into $\operatorname{Aut} H$) we will construct below what amounts to the group of all rotations (about any point) and translations of $\mathbb{R}^2$. The construction is that of a "semidirect product."

EXAMPLE 2. Take any group $G$, and let $G$ act on $X = G$ by conjugation. Each conjugation $x \mapsto gxg^{-1}$ is an automorphism of $G$, and thus the action of $G$ on itself by conjugation is an action by automorphisms.

Let $G$ and $H$ be groups. Suppose that a group action $\tau : G \to \mathcal{F}(H)$ is given with $G$ acting on $H$ by automorphisms. That is, suppose that each map $h \mapsto \tau_g(h)$ is an automorphism of $H$. We define a group $G \times_\tau H$ whose underlying set will be the Cartesian product $G \times H$. The motivation for the definition of multiplication


comes from Example 2, in which $\tau_g(h) = ghg^{-1}$. We want to write a product $g_1h_1g_2h_2$ in the form $g'h'$, and we can do so using the formula
$$g_1h_1g_2h_2 = g_1g_2(g_2^{-1}h_1g_2)h_2 = g_1g_2\big(\tau_{g_2^{-1}}(h_1)\big)h_2.$$
Similarly the formula for inverses is motivated by the formula
$$(gh)^{-1} = h^{-1}g^{-1} = g^{-1}(gh^{-1}g^{-1}) = g^{-1}\tau_g(h^{-1}).$$

Proposition 4.43. Let $G$ and $H$ be groups, and let $\tau$ be a group action of $G$ on $H$ by automorphisms. Then the set-theoretic product $G \times H$ becomes a group $G \times_\tau H$ under the definitions
$$(g_1, h_1)(g_2, h_2) = \big(g_1g_2,\ (\tau_{g_2^{-1}}(h_1))h_2\big) \qquad\text{and}\qquad (g, h)^{-1} = \big(g^{-1},\ \tau_g(h^{-1})\big).$$
The mappings $i_1 : G \to G \times_\tau H$ and $i_2 : H \to G \times_\tau H$ given by $i_1(g) = (g, 1)$ and $i_2(h) = (1, h)$ are one-one homomorphisms, and $p_2 : G \times_\tau H \to G$ given by $p_2(g, h) = g$ is a homomorphism onto $G$. The images $G' = i_1(G)$ and $H' = i_2(H)$ are subgroups of $G \times_\tau H$ with $H'$ normal such that $G' \cap H' = \{1\}$, such that every element of $G \times_\tau H$ is the product of an element of $G'$ and an element of $H'$, and such that conjugation of $G'$ on $H'$ is given by $i_1(g)i_2(h)i_1(g)^{-1} = i_2(\tau_g(h))$.

REMARK. The group $G \times_\tau H$ is called the external semidirect product¹⁴ of $G$ and $H$ with respect to $\tau$.

PROOF. For associativity we compute directly that
$$\big((g_1, h_1)(g_2, h_2)\big)(g_3, h_3) = \big(g_1g_2g_3,\ \tau_{g_3^{-1}}(\tau_{g_2^{-1}}(h_1)h_2)h_3\big)$$
and
$$(g_1, h_1)\big((g_2, h_2)(g_3, h_3)\big) = \big(g_1g_2g_3,\ \tau_{g_3^{-1}g_2^{-1}}(h_1)\,\tau_{g_3^{-1}}(h_2)\,h_3\big).$$
Since
$$\tau_{g_3^{-1}}(\tau_{g_2^{-1}}(h_1)h_2) = \big(\tau_{g_3^{-1}}\tau_{g_2^{-1}}(h_1)\big)\tau_{g_3^{-1}}(h_2) = \tau_{g_3^{-1}g_2^{-1}}(h_1)\,\tau_{g_3^{-1}}(h_2),$$
we have a match. It is immediate that $(1, 1)$ is a two-sided identity. Since
$$(g, h)(g^{-1}, \tau_g(h^{-1})) = (1, \tau_g(h)\tau_g(h^{-1})) = (1, \tau_g(hh^{-1})) = (1, \tau_g(1)) = (1, 1)$$
and
$$(g^{-1}, \tau_g(h^{-1}))(g, h) = (1, \tau_{g^{-1}}(\tau_g(h^{-1}))h) = (1, \tau_1(h^{-1})h) = (1, 1),$$
$(g^{-1}, \tau_g(h^{-1}))$ is indeed a two-sided inverse of $(g, h)$. It is immediate from the definition of multiplication that $i_1$, $i_2$, and $p_2$ are homomorphisms, that $i_1$ and $i_2$ are one-one, that $p_2$ is onto, that $G' \cap H' = \{1\}$, and that $G \times_\tau H = G'H'$. Since $i_1$ and $i_2$ are homomorphisms, $G'$ and $H'$ are subgroups. Since $H'$ is the kernel of $p_2$, $H'$ is normal. Finally the definition of multiplication gives
$$i_1(g)i_2(h)i_1(g)^{-1} = (g, h)(g^{-1}, 1) = (1, (\tau_g(h))1) = i_2(\tau_g(h)),$$
and the proof is complete.

¹⁴ The notation $\ltimes$ is used by some authors in place of $\times_\tau$.
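The multiplication law of Proposition 4.43 can be tried out on a small case by brute force. The sketch below is only an illustration under stated assumptions: $G = \mathbb{Z}/2\mathbb{Z}$ and $H = \mathbb{Z}/5\mathbb{Z}$ are written additively, $G$ acts on $H$ by inversion, and the factory `make_semidirect` and the other names are hypothetical helpers, not notation from the text.

```python
from itertools import product


def make_semidirect(g_mul, g_inv, h_mul, h_inv, tau):
    """Return (mul, inv) implementing the group law of Proposition 4.43."""
    def mul(x, y):
        (g1, h1), (g2, h2) = x, y
        # (g1, h1)(g2, h2) = (g1 g2, tau_{g2^{-1}}(h1) h2)
        return (g_mul(g1, g2), h_mul(tau(g_inv(g2), h1), h2))

    def inv(x):
        g, h = x
        # (g, h)^{-1} = (g^{-1}, tau_g(h^{-1}))
        return (g_inv(g), tau(g, h_inv(h)))

    return mul, inv


n = 5


def tau(g, h):
    """Action of Z/2 on Z/n: the nontrivial element inverts h."""
    return h % n if g % 2 == 0 else (-h) % n


mul, inv = make_semidirect(
    g_mul=lambda a, b: (a + b) % 2, g_inv=lambda a: (-a) % 2,
    h_mul=lambda a, b: (a + b) % n, h_inv=lambda a: (-a) % n,
    tau=tau,
)
elements = list(product(range(2), range(n)))
identity = (0, 0)

associative = all(mul(mul(x, y), z) == mul(x, mul(y, z))
                  for x in elements for y in elements for z in elements)
inverses_ok = all(mul(x, inv(x)) == identity == mul(inv(x), x)
                  for x in elements)
# Conjugation formula: i1(g) i2(h) i1(g)^{-1} = i2(tau_g(h)).
conjugation_ok = all(mul(mul((g, 0), (0, h)), inv((g, 0))) == (0, tau(g, h))
                     for g in range(2) for h in range(n))
abelian = all(mul(x, y) == mul(y, x) for x in elements for y in elements)

print(associative, inverses_ok, conjugation_ok, abelian)
```

With these choices the construction yields a nonabelian group with 10 elements, matching the description of the dihedral group as a semidirect product of a 2-element group and a cyclic rotation subgroup.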


Proposition 4.44. Let $S$ be a group, let $G$ and $H$ be subgroups with $H$ normal, and suppose that $G \cap H = \{1\}$ and that every element of $S$ is the product of an element of $G$ and an element of $H$. For each $g \in G$, define an automorphism $\tau_g$ of $H$ by $\tau_g(h) = ghg^{-1}$. Then $\tau$ is a group action of $G$ on $H$ by automorphisms, and the mapping $G \times_\tau H \to S$ given by $(g, h) \mapsto gh$ is an isomorphism of groups.

REMARKS. In this case we call $S$ an internal semidirect product of $G$ and $H$ with respect to $\tau$. We shall not attempt to write down a universal mapping property that characterizes internal semidirect products.

PROOF. Since $\tau_{g_1g_2}(h) = g_1g_2hg_2^{-1}g_1^{-1} = g_1\tau_{g_2}(h)g_1^{-1} = \tau_{g_1}\tau_{g_2}(h)$ and since each $\tau_g$ is an automorphism of $H$, $\tau$ is an action by automorphisms. Proposition 4.43 therefore shows that $G \times_\tau H$ is a well-defined group. The function $\varphi$ from $G \times_\tau H$ to $S$ given by $\varphi(g, h) = gh$ is a homomorphism by the same computation that motivated the definition of multiplication in a semidirect product, and $\varphi$ is onto $S$ since every element of $S$ lies in the set $GH$ of products. If $gh = 1$, then $g = h^{-1}$ exhibits $g$ as in $G \cap H = \{1\}$. Hence $g = 1$ and $h = 1$. Therefore $\varphi$ is one-one and must be an isomorphism.

EXAMPLE 1. Dihedral groups $D_n$. We show that $D_n$ is the internal semidirect product of a 2-element group and the rotation subgroup. Let $H$ be the group of rotations about the origin through multiples of the angle $2\pi/n$. This group is cyclic of order $n$, and it is normal in $D_n$ because it is of index 2. If $s$ is any of the reflections in $D_n$, then $G = \{1, s\}$ is a subgroup of $D_n$ of order 2 with $G \cap H = \{1\}$. Counting the elements, we see that every element of $D_n$ is of the form $r^k$ or $sr^k$, in other words that the set of products $GH$ is all of $D_n$. Thus Proposition 4.44 shows that $D_n$ is an (internal) semidirect product of $G$ and $H$ with respect to some $\tau : G \to \operatorname{Aut} H$. To understand the homomorphism $\tau$, let us write the members of $H$ as the powers of $r$, where $r$ is rotation counterclockwise about the origin through the angle $2\pi/n$. For the reflection $s$ (or indeed for any reflection in $D_n$), a look at the geometry shows that $sr^ks^{-1} = r^{-k}$ for all $k$. In other words, the automorphism $\tau(1)$ leaves each element of $H$ fixed while $\tau(s)$ sends each $k \bmod n$ to $-k \bmod n$. The map that sends each element of a cyclic group to its group inverse is indeed an automorphism of the cyclic group, and thus $\tau$ is indeed a homomorphism of $G$ into $\operatorname{Aut} H$.

EXAMPLE 2. Construction of a nonabelian group of order 21. Let $H = C_7$, written multiplicatively with generator $a$, and let $G = C_3$, written multiplicatively with generator $b$. To arrange for $G$ to act on $H$ by automorphisms, we make use of a nontrivial automorphism of $H$ of order 3. Such a mapping is $a^k \mapsto a^{2k}$. In fact, there is no doubt that this mapping is an automorphism, and we have to see


that it has order 3. The effect of applying it twice is $a^k \mapsto a^{4k}$, and the effect of applying it three times is $a^k \mapsto a^{8k}$. But $a^{8k} = a^k$ since $a^7 = 1$, and thus the mapping $a^k \mapsto a^{2k}$ indeed has order 3. We send $b^n$ into the $n$th power of this automorphism, and the result is a homomorphism $\tau : G \to \operatorname{Aut} H$. The semidirect product $G \times_\tau H$ is certainly a group of order $3 \times 7 = 21$. To see that it is nonabelian, we observe from the group law in Proposition 4.43 that $ab = b\,\tau_{b^{-1}}(a) = ba^4$. Thus $ab \ne ba$, and $G \times_\tau H$ is nonabelian.

It is instructive to generalize the construction in Example 2 a little bit. To do so, we need a lemma.

Lemma 4.45. If $p$ is a prime, then the automorphisms of the additive group of the field $\mathbb{F}_p$ are the multiplications by the members of the multiplicative group $\mathbb{F}_p^\times$, and consequently $\operatorname{Aut} C_p$ is isomorphic to a cyclic group $C_{p-1}$.

PROOF. Let us write $\operatorname{Aut} \mathbb{F}_p$ for the automorphism group of the additive group of $\mathbb{F}_p$. Each function $\varphi_a : \mathbb{F}_p \to \mathbb{F}_p$ given by $\varphi_a(n) = na$, taken modulo $p$, is in $\operatorname{Aut} \mathbb{F}_p$ as a consequence of the distributive law. We define a function $\Phi : \operatorname{Aut} \mathbb{F}_p \to \mathbb{F}_p^\times$ by $\Phi(\varphi) = \varphi(1)$ for $\varphi \in \operatorname{Aut} \mathbb{F}_p$. Again by the distributive law $\varphi(n) = n\varphi(1)$ for every integer $n$. Thus if $\varphi_1$ and $\varphi_2$ are in $\operatorname{Aut} \mathbb{F}_p$, then $\Phi(\varphi_1 \circ \varphi_2) = (\varphi_1 \circ \varphi_2)(1) = \varphi_1(\varphi_2(1)) = \varphi_2(1)\varphi_1(1)$, and consequently $\Phi$ is a homomorphism. If a member $\varphi$ of $\operatorname{Aut} \mathbb{F}_p$ has $\Phi(\varphi) = 1$ in $\mathbb{F}_p^\times$, then $\varphi(1) = 1$ and therefore $\varphi(n) = n\varphi(1) = n$ for all $n$. Therefore $\varphi$ is the identity in $\operatorname{Aut} \mathbb{F}_p$. We conclude that $\Phi$ is one-one. If $a$ is given in $\mathbb{F}_p^\times$, then $\Phi(\varphi_a) = \varphi_a(1) = a$, and hence $\Phi$ is onto $\mathbb{F}_p^\times$. Therefore $\Phi$ is an isomorphism of $\operatorname{Aut} \mathbb{F}_p$ and $\mathbb{F}_p^\times$. By Corollary 4.27, $\Phi$ exhibits $\operatorname{Aut} \mathbb{F}_p$ as isomorphic to the cyclic group $C_{p-1}$.

Proposition 4.46. If $p$ and $q$ are primes with $p < q$ such that $p$ divides $q - 1$, then there exists a nonabelian group of order $pq$.

REMARKS. For $p = 2$, the divisibility condition is automatic, and the proof will yield the dihedral group $D_q$. For $p = 3$ and $q = 7$, the condition is that 3 divides $7 - 1$, and the constructed group will be the group in Example 2 above.

PROOF. Let $G = C_p$ with generator $a$, and let $H = C_q$. Lemma 4.45 shows that $\operatorname{Aut} C_q \cong C_{q-1}$. Let $b$ be a generator of $\operatorname{Aut} C_q$. Since $p$ divides $q - 1$, $b^{(q-1)/p}$ has order $p$. Then the map $a^k \mapsto b^{k(q-1)/p}$ is a well-defined homomorphism $\tau$ of $G$ into $\operatorname{Aut} H$, and it determines a semidirect product $S = G \times_\tau H$, by Proposition 4.43. The order of $S$ is $pq$, and the multiplication is nonabelian since for $h \in H$, we have $(a, 1)(1, h) = (a, h)$ and $(1, h)(a, 1) = (a, \tau_{a^{-1}}(h)) = (a, b^{-(q-1)/p}(h))$, but $b^{-(q-1)/p}$ is not the identity automorphism of $H$ because it has order $p$.
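The construction in Proposition 4.46 can be carried out by machine for small primes. The following is a minimal sketch under stated assumptions: $C_p$ and $C_q$ are written additively as $\mathbb{Z}/p\mathbb{Z}$ and $\mathbb{Z}/q\mathbb{Z}$, an automorphism of order $p$ is realized as multiplication by an element $u$ of order $p$ in $\mathbb{F}_q^\times$ found by search (rather than from a generator of $\operatorname{Aut} C_q$ as in the proof), and the function name `semidirect_of_order_pq` is hypothetical.

```python
def semidirect_of_order_pq(p, q):
    """Build Z/p acting on Z/q; assumes p, q prime with p dividing q - 1."""
    # An element u != 1 with u^p = 1 mod q has order exactly p since p is prime.
    # If p does not divide q - 1, no such u exists and next() raises StopIteration.
    u = next(x for x in range(2, q) if pow(x, p, q) == 1)

    def tau(g, h):
        """tau_g is multiplication by u^g on Z/q."""
        return (pow(u, g % p, q) * h) % q

    def mul(x, y):
        (g1, h1), (g2, h2) = x, y
        # Group law of Proposition 4.43, written additively in each factor.
        return ((g1 + g2) % p, (tau(-g2, h1) + h2) % q)

    elements = [(g, h) for g in range(p) for h in range(q)]
    return elements, mul, u


elements, mul, u = semidirect_of_order_pq(3, 7)
a = (0, 1)   # generator of H = C_7
b = (1, 0)   # generator of G = C_3
a4 = (0, 4)  # the element a^4

print(len(elements), u)            # 21 2
print(mul(a, b) == mul(b, a4))     # True: ab = b a^4
print(mul(a, b) == mul(b, a))      # False: the group is nonabelian
```

For $p = 3$ and $q = 7$ the search finds $u = 2$, which is the automorphism $a^k \mapsto a^{2k}$ used in Example 2.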


8. Simple Groups and Composition Series

A group $G \ne \{1\}$ is said to be simple if its only normal subgroups are $\{1\}$ and $G$. Among abelian groups the simple ones are the cyclic groups of prime order. Indeed, a cyclic group $C_p$ of prime order has no nontrivial subgroups at all, by Corollary 4.9. Conversely if $G$ is abelian and simple, let $a \ne 1$ be in $G$. Then $\{a^n\}$ is a cyclic subgroup and is normal since $G$ is abelian. Thus $\{a^n\}$ is all of $G$, and $G$ is cyclic. The group $\mathbb{Z}$ is not simple, having the nontrivial subgroup $2\mathbb{Z}$, and the group $\mathbb{Z}/(rs)\mathbb{Z}$ with $r > 1$ and $s > 1$ is not simple, having the multiples of $r$ as a nontrivial subgroup. Thus $G$ has to be cyclic of prime order.

The interest is in nonabelian simple groups. We shall establish that the alternating groups $A_n$ are simple for $n \ge 5$, and some other simple groups will be considered in Problems 55–62 at the end of the chapter.

Theorem 4.47. The alternating group $A_n$ is simple if $n \ge 5$.

PROOF. Let $K \ne \{1\}$ be a normal subgroup of $A_n$. Choose $\sigma$ in $K$ with $\sigma \ne 1$ such that $\sigma(i) = i$ for the maximum possible number of integers $i$ with $1 \le i \le n$. The main step is to show that $\sigma$ is a 3-cycle. Arguing by contradiction, suppose that $\sigma$ is not a 3-cycle. Then there are two cases.

The first case is that the decomposition of $\sigma$ as the product of disjoint cycles contains a $k$-cycle for some $k \ge 3$. Without loss of generality, we may take the cycle in question to be $\gamma = (1\;2\;3\;\cdots)$, and then $\sigma = \gamma\rho = (1\;2\;3\;\cdots)\rho$ with $\rho$ equal to a product of disjoint cycles not containing the symbols appearing in $\gamma$. Being even and not being a 3-cycle, $\sigma$ moves at least two other symbols besides the three listed ones, say 4 and 5. Put $\tau = (3\;4\;5)$. Lemma 4.41 shows that $\sigma' = \tau\sigma\tau^{-1} = \gamma'\rho' = (1\;2\;4\;\cdots)\rho'$ with $\rho'$ not containing any of the symbols appearing in $\gamma'$. Thus $\sigma'\sigma^{-1}$ moves 3 into 4 and cannot be the identity. But $\sigma'\sigma^{-1}$ is in $K$ and fixes all symbols other than 1, 2, 3, 4, 5 that are fixed by $\sigma$. In addition, $\sigma'\sigma^{-1}$ fixes 2, and none of 1, 2, 3, 4, 5 is fixed by $\sigma$. Thus $\sigma'\sigma^{-1}$ is a member of $K$ other than the identity that fixes more symbols than $\sigma$, and we have arrived at a contradiction.

The second case is that $\sigma$ is a product $\sigma = (1\;2)(3\;4)\cdots$ of disjoint transpositions. There must be at least two factors since $\sigma$ is even. Put $\tau = (1\;2)(4\;5)$, the symbol 5 existing since the group $A_n$ in question has $n \ge 5$. Then $\sigma' = \tau\sigma\tau^{-1} = (1\;2)(3\;5)\cdots$. Since $\sigma'\sigma^{-1}$ carries 4 into 5, $\sigma'\sigma^{-1}$ is a member of $K$ other than the identity. It fixes all symbols other than 1, 2, 3, 4, 5 that are fixed by $\sigma$, and in addition it fixes 1 and 2. Thus $\sigma'\sigma^{-1}$ fixes more symbols than $\sigma$ does, and again we have arrived at a contradiction.

We conclude that $K$ contains a 3-cycle, say $(1\;2\;3)$. If $i, j, k, l, m$ are five arbitrary symbols, then we can construct a permutation $\tau$ with $\tau(1) = i$, $\tau(2) = j$, $\tau(3) = k$, $\tau(4) = l$, and $\tau(5) = m$. If $\tau$ is odd, we replace $\tau$ by $\tau(l\;m)$, and the


result is even. Thus we may assume that $\tau$ is in $A_n$ and has $\tau(1) = i$, $\tau(2) = j$, and $\tau(3) = k$. Lemma 4.41 shows that $\tau\sigma\tau^{-1} = (i\ j\ k)$ for the 3-cycle $\sigma = (1\ 2\ 3)$. Since $K$ is normal, we conclude that $K$ contains all 3-cycles.

To complete the proof, we show for $n \geq 3$ that every element of $A_n$ is a product of 3-cycles. If $\sigma$ is in $A_n$, we use Corollary 1.22 to decompose $\sigma$ as a product of transpositions. Since $\sigma$ is even, we can group these in pairs. If the members of a pair of transpositions are distinct but not disjoint, then their product is a 3-cycle; if they coincide, their product is the identity. If they are disjoint, then the identity $(1\ 2)(3\ 4) = (1\ 2\ 3)(2\ 3\ 4)$ shows that their product is a product of 3-cycles. This completes the proof.

Let $G$ be a group. A descending sequence
$$G_n \supseteq G_{n-1} \supseteq \cdots \supseteq G_1 \supseteq G_0$$
of subgroups of $G$ with $G_n = G$, $G_0 = \{1\}$, and each $G_{k-1}$ normal in $G_k$ is called a normal series for $G$. The normal series is called a composition series if each inclusion $G_k \supseteq G_{k-1}$ is proper and if each consecutive quotient $G_k/G_{k-1}$ is simple.

EXAMPLES.

(1) Let $G$ be a cyclic group of order $N$. A normal series for $G$ consists of certain subgroups of $G$, all necessarily cyclic by Proposition 4.4. Their respective orders $N_n, N_{n-1}, \dots, N_1, N_0$ have $N_n = N$, $N_0 = 1$, and $N_{k-1} \mid N_k$ for all $k$. The series is a composition series if and only if each quotient $N_k/N_{k-1}$ is prime. In this case the primes that occur are exactly the prime divisors of $N$, and a prime $p$ occurs $r$ times if $p^r$ is the exact power of $p$ that divides $N$. Thus the consecutive quotients from a composition series of this $G$, up to isomorphisms, are independent of the particular composition series, though they may arise in a different order.

(2) For $G = \mathbb{Z}$, a normal series is of the form
$$\mathbb{Z} \supseteq m_1\mathbb{Z} \supseteq m_1m_2\mathbb{Z} \supseteq m_1m_2m_3\mathbb{Z} \supseteq \cdots \supseteq 0.$$
The group $G = \mathbb{Z}$ has no composition series.

(3) For the symmetric group $G = S_4$, let $C_2 \times C_2$ refer to the 4-element subgroup $\{1, (1\ 2)(3\ 4), (1\ 3)(2\ 4), (1\ 4)(2\ 3)\}$. The series
$$S_4 \supseteq A_4 \supseteq C_2 \times C_2 \supseteq \{1, (1\ 2)(3\ 4)\} \supseteq \{1\}$$
is a composition series, the consecutive quotients being $C_2$, $C_3$, $C_2$, $C_2$. Each term in the composition series except for $\{1, (1\ 2)(3\ 4)\}$ is actually normal in the whole group $G$, but there is no way to choose the 2-element subgroup to make it normal in $G$. The other two possible choices of 2-element subgroup, which lead to different composition series but with isomorphic consecutive quotients, are obtained by replacing $\{1, (1\ 2)(3\ 4)\}$ by $\{1, (1\ 3)(2\ 4)\}$ and again by $\{1, (1\ 4)(2\ 3)\}$.


(4) For the symmetric group $G = S_5$, the series $S_5 \supseteq A_5 \supseteq \{1\}$ is a composition series, the consecutive quotients being $C_2$ and $A_5$.

(5) Let $G$ be a finite group of order $p^n$ with $p$ prime. Corollary 4.40 produces a composition series, and this time all the subgroups are normal in $G$. The successive normal subgroups have orders $p^k$ for $k = n, n-1, \dots, 0$, and each consecutive quotient is isomorphic to $C_p$.

Historically the Jordan–Hölder Theorem addressed composition series for groups, showing that the consecutive quotients, up to isomorphisms, are independent of the particular composition series. They can then consistently be called the composition factors of the group. Finding the composition factors of a particular group may be regarded as a step toward understanding the structure of the group.

A generalization of the Jordan–Hölder Theorem due to Zassenhaus and Schreier applies to normal series in situations in which composition series might not exist, such as Example 2 above. We prove the Zassenhaus–Schreier Theorem, and the Jordan–Hölder Theorem is then a special case.

Two normal series
$$G_m \supseteq G_{m-1} \supseteq \cdots \supseteq G_1 \supseteq G_0 \quad\text{and}\quad H_n \supseteq H_{n-1} \supseteq \cdots \supseteq H_1 \supseteq H_0$$
for the same group $G$ are said to be equivalent normal series if $m = n$ and the order of the consecutive quotients $G_m/G_{m-1}, G_{m-1}/G_{m-2}, \dots, G_1/G_0$ may be rearranged so that they are respectively isomorphic to $H_m/H_{m-1}, H_{m-1}/H_{m-2}, \dots, H_1/H_0$. One normal series is said to be a refinement of another if the subgroups appearing in the second normal series all appear as subgroups in the first normal series.

Lemma 4.48 (Zassenhaus). Let $G_1$, $G_2$, $G_1'$, and $G_2'$ be subgroups of a group $G$ with $G_1' \subseteq G_1$ and $G_2' \subseteq G_2$, $G_1'$ normal in $G_1$, and $G_2'$ normal in $G_2$. Then $(G_1 \cap G_2')G_1'$ is normal in $(G_1 \cap G_2)G_1'$, $(G_1' \cap G_2)G_2'$ is normal in $(G_1 \cap G_2)G_2'$, and
$$((G_1 \cap G_2)G_1')/((G_1 \cap G_2')G_1') \cong ((G_1 \cap G_2)G_2')/((G_1' \cap G_2)G_2').$$

PROOF. Let us check that $(G_1 \cap G_2')G_1'$ is normal in $(G_1 \cap G_2)G_1'$. Handling conjugation by members of $G_1 \cap G_2$ is straightforward: If $g$ is in $G_1 \cap G_2$,


then $g(G_1 \cap G_2')g^{-1} = G_1 \cap G_2'$ since $g$ is in $G_1$ and $gG_2'g^{-1} = G_2'$. Also, $gG_1'g^{-1} = G_1'$ since $g$ is in $G_1$. Hence $g(G_1 \cap G_2')G_1'g^{-1} = (G_1 \cap G_2')G_1'$.

Handling conjugation by members of $G_1'$ requires a little trick: Let $g$ be in $G_1'$ and let $hg'$ be in $(G_1 \cap G_2')G_1'$. Then $g(hg')g^{-1} = h(h^{-1}gh)g'g^{-1}$. The left factor $h$ is in $G_1 \cap G_2'$. The remaining factors are in $G_1'$; for $g'$ and $g^{-1}$, this is a matter of definition, and for $h^{-1}gh$, it follows because $h$ is in $G_1$ and $g$ is in $G_1'$. Thus $g(G_1 \cap G_2')G_1'g^{-1} = (G_1 \cap G_2')G_1'$, and $(G_1 \cap G_2')G_1'$ is normal in $(G_1 \cap G_2)G_1'$. The other assertion about normal subgroups holds by symmetry in the indices 1 and 2.

By the Second Isomorphism Theorem (Theorem 4.14),
$$(G_1 \cap G_2)/\big(((G_1 \cap G_2')G_1') \cap (G_1 \cap G_2)\big) \cong \big((G_1 \cap G_2)(G_1 \cap G_2')G_1'\big)/\big((G_1 \cap G_2')G_1'\big) = \big((G_1 \cap G_2)G_1'\big)/\big((G_1 \cap G_2')G_1'\big). \tag{$\ast$}$$
Since we have
$$((G_1 \cap G_2')G_1') \cap (G_1 \cap G_2) = ((G_1 \cap G_2')G_1') \cap G_2 = (G_1 \cap G_2')(G_1' \cap G_2),$$
we can rewrite the conclusion of $(\ast)$ as
$$(G_1 \cap G_2)/\big((G_1 \cap G_2')(G_1' \cap G_2)\big) \cong \big((G_1 \cap G_2)G_1'\big)/\big((G_1 \cap G_2')G_1'\big). \tag{$\ast\ast$}$$
The left side of $(\ast\ast)$ is symmetric under interchange of the indices 1 and 2. Hence so is the right side, and the lemma follows.

Theorem 4.49 (Schreier). Any two normal series of a group $G$ have equivalent refinements.

PROOF. Let the two normal series be written, for the purposes of this proof, with the indices increasing as the subgroups decrease:
$$G = G_0 \supseteq G_1 \supseteq \cdots \supseteq G_{m-1} \supseteq G_m = \{1\}, \qquad G = H_0 \supseteq H_1 \supseteq \cdots \supseteq H_{n-1} \supseteq H_n = \{1\}, \tag{$\ast$}$$
and define
$$G_{ij} = (G_i \cap H_j)G_{i+1} \quad\text{for } 0 \leq j \leq n, \qquad H_{ji} = (G_i \cap H_j)H_{j+1} \quad\text{for } 0 \leq i \leq m. \tag{$\ast\ast$}$$
Then we obtain respective refinements of the two normal series $(\ast)$ given by
$$\begin{aligned} G &= G_{00} \supseteq G_{01} \supseteq \cdots \supseteq G_{0n} \supseteq G_{10} \supseteq G_{11} \supseteq \cdots \supseteq G_{1n} \supseteq \cdots \supseteq G_{m-1,n} = \{1\}, \\ G &= H_{00} \supseteq H_{01} \supseteq \cdots \supseteq H_{0m} \supseteq H_{10} \supseteq H_{11} \supseteq \cdots \supseteq H_{1m} \supseteq \cdots \supseteq H_{n-1,m} = \{1\}. \end{aligned} \tag{$\dagger$}$$


The containments $G_{in} \supseteq G_{i+1,0}$ and $H_{jm} \supseteq H_{j+1,0}$ are equalities in $(\dagger)$, and the only nontrivial consecutive quotients are therefore of the form $G_{ij}/G_{i,j+1}$ and $H_{ji}/H_{j,i+1}$. For these we have
$$\begin{aligned} G_{ij}/G_{i,j+1} &= ((G_i \cap H_j)G_{i+1})/((G_i \cap H_{j+1})G_{i+1}) && \text{by } (\ast\ast) \\ &\cong ((G_i \cap H_j)H_{j+1})/((G_{i+1} \cap H_j)H_{j+1}) && \text{by Lemma 4.48} \\ &= H_{ji}/H_{j,i+1} && \text{by } (\ast\ast), \end{aligned}$$
and thus the refinements $(\dagger)$ are equivalent.
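The construction in this proof can be tried on a small case. The following Python sketch is an illustration only. The cyclic group $\mathbb{Z}/24\mathbb{Z}$, the two series, and the helper names `subgroup`, `add`, `refine`, and `quotient_orders` are assumptions made for the example, and the group is written additively, so that products of subgroups become sums. Because every quotient of subgroups of a cyclic group is cyclic, comparing the orders of the nontrivial quotients is enough to compare them up to isomorphism in this particular case; for a general group, orders alone would not suffice.

```python
def subgroup(d, N):
    """Subgroup of Z/N generated by d, as a frozenset of residues."""
    return frozenset((d * k) % N for k in range(N))

def add(A, B, N):
    """Sum A + B of two subgroups of Z/N (the product of subgroups, written additively)."""
    return frozenset((a + b) % N for a in A for b in B)

def refine(G, H, N):
    """Refine the series G by the series H: list (G_i ∩ H_j) + G_{i+1} for all i, j.

    Both series are lists starting at the whole group and ending at {0}.
    """
    out = []
    for i in range(len(G) - 1):
        for Hj in H:
            out.append(add(G[i] & Hj, G[i + 1], N))
    return out

def quotient_orders(series):
    """Sorted orders of the nontrivial consecutive quotients of a descending series."""
    return sorted(len(a) // len(b) for a, b in zip(series, series[1:]) if a != b)

N = 24
G = [subgroup(1, N), subgroup(4, N), subgroup(0, N)]   # Z/24 ⊇ <4> ⊇ 0
H = [subgroup(1, N), subgroup(3, N), subgroup(0, N)]   # Z/24 ⊇ <3> ⊇ 0

orders_G = quotient_orders(refine(G, H, N))
orders_H = quotient_orders(refine(H, G, N))
```

In this example both refinements pass through the subgroup generated by 12, and both have nontrivial quotients of orders 2, 3, and 4, arising in different orders along the two series.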

Corollary 4.50 (Jordan–Hölder Theorem). Any two composition series of a group $G$ are equivalent as normal series.

PROOF. Let two composition series be given. Theorem 4.49 says that we can insert terms in each so that the refined series have the same length and are equivalent. Since the given series are composition series, the only way to insert a new term is by repeating some term, and the repetition results in a consecutive quotient of $\{1\}$. Because of Theorem 4.49 we know that the quotients $\{1\}$ from the two refined series must match. Thus the number of terms added to each series is the same. Also, the quotients that are not $\{1\}$ must match in pairs. Thus the given composition series are equivalent.

9. Structure of Finitely Generated Abelian Groups

A set of generators for a group $G$ is a set such that each element of $G$ is a finite product of generators and their inverses. (A generator and its inverse are allowed to occur multiple times in a product.) In this section we shall study abelian groups having a finite set of generators. Such groups are said to be finitely generated abelian groups, and our goal is to classify them up to isomorphism. We use additive notation for all our abelian groups in this section.

We begin by introducing an analog $\mathbb{Z}^n$ for the integers $\mathbb{Z}$ of the vector space $\mathbb{R}^n$ for the reals $\mathbb{R}$, and along with it a generalization. A free abelian group is any abelian group isomorphic to a direct sum, finite or infinite, of copies of the additive group $\mathbb{Z}$ of integers. The external direct sum of $n$ copies of $\mathbb{Z}$ will be denoted by $\mathbb{Z}^n$.

Let us use Proposition 4.17 to see that we can recognize groups isomorphic to free abelian groups by means of the following condition: an abelian group $G$ is isomorphic to a free abelian group if and only if it has a $\mathbb{Z}$ basis, i.e., a subset that generates $G$ and is such that no nontrivial linear combination, with integer coefficients, of the members of the subset is equal to the 0 element of the group. It will be helpful to use terminology adapted from the theory of vector spaces for this latter condition, namely that the subset is to be linearly independent over $\mathbb{Z}$.


Let us give the proof that the condition is necessary and sufficient for $G$ to be free abelian. In one direction if $G$ is an external direct sum of copies of $\mathbb{Z}$, then the members of $G$ that are 1 in one coordinate and are 0 elsewhere form a $\mathbb{Z}$ basis. Conversely if $\{g_s\}_{s \in S}$ is a $\mathbb{Z}$ basis, let $G_{s_0}$ be the subgroup of multiples of $g_{s_0}$, and let $\varphi_{s_0}$ be the inclusion homomorphism of $G_{s_0}$ into $G$. Proposition 4.17 produces a unique group homomorphism $\varphi : \bigoplus_{s \in S} G_s \to G$ such that $\varphi \circ i_{s_0} = \varphi_{s_0}$ for all $s_0 \in S$. The spanning condition for the $\mathbb{Z}$ basis says that $\varphi$ is onto $G$, and the linear independence condition for the $\mathbb{Z}$ basis says that $\varphi$ has 0 kernel.

The similarity between vector-space bases and $\mathbb{Z}$ bases suggests further comparison of vector spaces and abelian groups. With vector spaces over a field, every vector space has a basis over the field. However, it is exceptional for an abelian group to have a $\mathbb{Z}$ basis. Two examples that hint at the difficulty are the additive group $\mathbb{Z}/m\mathbb{Z}$ with $m > 1$ and the additive group $\mathbb{Q}$. The group $\mathbb{Z}/m\mathbb{Z}$ has no nonempty linearly independent set, while the group $\mathbb{Q}$ has a linearly independent set of one element, no spanning set of one element, and no linearly independent set of more than one element. Here are two positive examples.

EXAMPLES.

(1) The additive group of all points in $\mathbb{R}^n$ whose coordinates are integers. The standard basis of $\mathbb{R}^n$ is a $\mathbb{Z}$ basis.

(2) The additive group of all points $(x, y)$ in $\mathbb{R}^2$ with $x$ and $y$ both in $\mathbb{Z}$ or both in $\mathbb{Z} + \tfrac12$. The set $\{(1, 0), (\tfrac12, \tfrac12)\}$ is a $\mathbb{Z}$ basis.

Next we take a small step that eliminates technical complications from the discussion, proving that any subgroup of a finitely generated abelian group is finitely generated.

Lemma 4.51. Let $\varphi : G \to H$ be a homomorphism of abelian groups. If $\ker \varphi$ and $\operatorname{image} \varphi$ are finitely generated, then $G$ is finitely generated.

PROOF. Let $\{x_1, \dots, x_m\}$ and $\{y_1, \dots, y_n\}$ be respective finite sets of generators for $\ker \varphi$ and $\operatorname{image} \varphi$. For $1 \leq j \leq n$, choose $x_j'$ in $G$ with $\varphi(x_j') = y_j$. We shall prove that $\{x_1, \dots, x_m, x_1', \dots, x_n'\}$ is a set of generators for $G$. Thus let $x$ be in $G$. Since $\varphi(x)$ is in $\operatorname{image} \varphi$, there exist integers $a_1, \dots, a_n$ with $\varphi(x) = a_1y_1 + \cdots + a_ny_n$. The element $x' = a_1x_1' + \cdots + a_nx_n'$ of $G$ has $\varphi(x') = a_1y_1 + \cdots + a_ny_n = \varphi(x)$. Therefore $\varphi(x - x') = 0$, and there exist integers $b_1, \dots, b_m$ with $x - x' = b_1x_1 + \cdots + b_mx_m$. Hence
$$x = b_1x_1 + \cdots + b_mx_m + x' = b_1x_1 + \cdots + b_mx_m + a_1x_1' + \cdots + a_nx_n'.$$

Proposition 4.52. Any subgroup of a finitely generated abelian group is finitely generated.


PROOF. Let $G$ be finitely generated with a set $\{g_1, \dots, g_n\}$ of $n$ generators, and define $G_k = \mathbb{Z}g_1 + \cdots + \mathbb{Z}g_k$ for $1 \leq k \leq n$. If $H$ is any subgroup of $G$, define $H_k = H \cap G_k$ for $1 \leq k \leq n$. We shall prove by induction on $k$ that every $H_k$ is finitely generated, and then the case $k = n$ gives the proposition. For $k = 1$, $G_1 = \mathbb{Z}g_1$ is a cyclic group, and any subgroup of it is cyclic by Proposition 4.4 and hence is finitely generated.

Assume inductively that every subgroup of $G_k$ is known to be finitely generated. Let $q : G_{k+1} \to G_{k+1}/G_k$ be the quotient homomorphism, and let $\varphi = q|_{H_{k+1}}$, mapping $H_{k+1}$ into $G_{k+1}/G_k$. Then $\ker \varphi = H_{k+1} \cap G_k$ is a subgroup of $G_k$ and is finitely generated by the inductive hypothesis. Also, $\operatorname{image} \varphi$ is a subgroup of $G_{k+1}/G_k$, which is a cyclic group with generator equal to the coset of $g_{k+1}$. Since a subgroup of a cyclic group is cyclic, $\operatorname{image} \varphi$ is finitely generated. Applying Lemma 4.51 to $\varphi$, we see that $H_{k+1}$ is finitely generated. This completes the induction and the proof.

A free abelian group has finite rank if it has a finite $\mathbb{Z}$ basis, hence if it is isomorphic to $\mathbb{Z}^n$ for some $n$. The first theorem is that the integer $n$ is determined by the group.

Theorem 4.53. The number of $\mathbb{Z}$ summands in a free abelian group of finite rank is independent of the direct-sum decomposition of the group.

We define this number to be the rank of the free abelian group. Actually, "rank" is a well-defined cardinal in the infinite-rank case as well, because the rank coincides in that case with the cardinality of the group. In any event, Theorem 4.53 follows immediately by two applications of the following lemma.

Lemma 4.54. If $G$ is a free abelian group with a finite $\mathbb{Z}$ basis $x_1, \dots, x_n$, then any linearly independent subset of $G$ has $\leq n$ elements.

PROOF. Let $\{y_1, \dots, y_m\}$ be a linearly independent set in $G$. Since $\{x_1, \dots, x_n\}$ is a $\mathbb{Z}$ basis, we can define an $m$-by-$n$ matrix $C$ of integers by $y_i = \sum_{j=1}^n C_{ij}x_j$. As a matrix in $M_{mn}(\mathbb{Q})$, $C$ has rank $\leq n$. Consequently if $m > n$, then the rows are linearly dependent over $\mathbb{Q}$, and we can find rational numbers $q_1, \dots, q_m$ not all 0 such that $\sum_{i=1}^m q_iC_{ij} = 0$ for all $j$. Multiplying by a suitable integer to clear fractions, we obtain integers $k_1, \dots, k_m$ not all 0 such that $\sum_{i=1}^m k_iC_{ij} = 0$ for all $j$. Then we have
$$\sum_{i=1}^m k_iy_i = \sum_{i=1}^m k_i \sum_{j=1}^n C_{ij}x_j = \sum_{j=1}^n \Big(\sum_{i=1}^m k_iC_{ij}\Big)x_j = \sum_{j=1}^n 0\,x_j = 0,$$
in contradiction to the linear independence of $\{y_1, \dots, y_m\}$ over $\mathbb{Z}$. Therefore $m \leq n$.
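As a small worked instance of the argument, not taken from the text, consider three elements of $\mathbb{Z}^2$; the particular vectors below are chosen only for illustration. The rational dependence among the rows of $C$ is cleared of fractions to give an integer dependence, exactly as in the proof.

```latex
% Illustrative choice: y_1 = 2x_1, \; y_2 = 3x_2, \; y_3 = x_1 + x_2 in \mathbb{Z}^2, so m = 3 > n = 2.
C = \begin{pmatrix} 2 & 0 \\ 0 & 3 \\ 1 & 1 \end{pmatrix},
\qquad
\tfrac12\,(2,0) + \tfrac13\,(0,3) - (1,1) = (0,0).
% Multiplying (q_1, q_2, q_3) = (1/2, 1/3, -1) by 6 gives (k_1, k_2, k_3) = (3, 2, -6):
3y_1 + 2y_2 - 6y_3 = 6x_1 + 6x_2 - 6(x_1 + x_2) = 0 .
```

Thus $\{y_1, y_2, y_3\}$ is linearly dependent over $\mathbb{Z}$, as the lemma requires of any three elements of $\mathbb{Z}^2$.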


Now we come to the two main results of this section. The first is a special case of the second by Proposition 4.52 and Lemma 4.54. The two will be proved together, and it may help to regard the proof of the first as a part of the proof of the second.

Theorem 4.55. A subgroup $H$ of a free abelian group $G$ of finite rank $n$ is free abelian of rank $\leq n$.

REMARK. This result persists in the case of infinite rank, but we do not need the more general result and will not give a proof.

Theorem 4.56 (Fundamental Theorem of Finitely Generated Abelian Groups). Every finitely generated abelian group is a finite direct sum of cyclic groups. The cyclic groups may be taken to be copies of $\mathbb{Z}$ and various $C_{p^k}$ with $p$ prime, and in this case the cyclic groups are unique up to order and to isomorphism.

REMARKS. The main conclusion of the theorem is the decomposition of each finitely generated abelian group into the direct sum of cyclic groups. An alternative decomposition of the given group that forces uniqueness is as the direct sum of copies of $\mathbb{Z}$ and finite cyclic groups $C_{d_1}, \dots, C_{d_r}$ such that $d_1 \mid d_2$, $d_2 \mid d_3$, $\dots$, $d_{r-1} \mid d_r$. A proof of the additional statement appears in the problems at the end of Chapter VIII. The integers $d_1, \dots, d_r$ are sometimes called the elementary divisors of the group.

Let us establish the setting for the proof of Theorem 4.56. Let $G$ be the given group, and say that it has a set of $n$ generators. Proposition 4.17 produces a homomorphism $\varphi : \mathbb{Z}^n \to G$ that carries the standard generators $x_1, \dots, x_n$ of $\mathbb{Z}^n$ to the generators of $G$, and $\varphi$ is onto $G$. Let $H$ be the kernel of $\varphi$. As a subgroup of $\mathbb{Z}^n$, $H$ is finitely generated, by Proposition 4.52. Let $y_1, \dots, y_m$ be generators. Theorem 4.55 predicts that $H$ is in fact free abelian, hence that $\{y_1, \dots, y_m\}$ could be taken to be linearly independent over $\mathbb{Z}$ with $m \leq n$, but we do not assume that knowledge in the proof of Theorem 4.56.

The motivation for the main part of the proof of Theorem 4.56 comes from the elementary theory of vector spaces, particularly from the method of using a basis for a finite-dimensional vector space to find a basis of a vector subspace when we know a finite spanning set for the vector subspace. Thus let $V$ be a finite-dimensional vector space over $\mathbb{R}$, with basis $\{x_j\}_{j=1}^n$, and let $U$ be a vector subspace with spanning set $\{y_i\}_{i=1}^m$. To produce a vector-space basis for $U$, we imagine expanding the $y_i$'s as linear combinations of $x_1, \dots, x_n$. We can think symbolically of this expansion as expressing each $y_i$ as the product of a row vector of real numbers times the formal "column vector" $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$. The entries of this column vector are vectors, but there is no problem in working with it since this is all just a matter of notation anyway. Then the formal column vector $\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}$ of $m$ members of $U$ equals the product of an $m$-by-$n$ matrix of real numbers times the formal column vector $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$. We know from Chapter II that the procedure for finding a basis of $U$ is to row reduce this matrix of real numbers. The nonzero rows of the result determine a basis of the span of the $m$ vectors we have used, and this basis is related tidily to the given basis for $V$. We can compare the two bases to understand the relationship between $U$ and $V$.

To prove Theorem 4.56, we would like to use the same procedure, but we have to work with an integer matrix and avoid division. This means that only two of the three usual row operations are fully available for the row reduction; division of a row by an integer is allowable only when the integer is $\pm 1$. A partial substitute for division comes by using the steps of the Euclidean algorithm via the division algorithm (Proposition 1.1),

but even that is not enough. For example, if the $m$-by-$n$ matrix is $\begin{pmatrix} 2 & 1 & 1 \\ 0 & 0 & 3 \end{pmatrix}$, no further row reduction is possible with integer operations. However, the equations tell us that $H$ is the subgroup of $\mathbb{Z}^3$ generated by $(2, 1, 1)$ and $(0, 0, 3)$, and it is not at all clear how to write $\mathbb{Z}^3/H$ as a direct sum of cyclic groups. The row operations have the effect of changing the set of generators of $H$ while maintaining the fact that they generate $H$. What is needed is to allow also column reduction with integer operations. Steps of this kind have the effect of changing the $\mathbb{Z}$ basis of $\mathbb{Z}^n$. When steps of this kind are allowed, we can produce new generators of $H$ and a new basis of $\mathbb{Z}^n$ so that the two can be compared. With the example above, suitable column operations are
$$\begin{pmatrix} 2 & 1 & 1 \\ 0 & 0 & 3 \end{pmatrix} \to \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 3 \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 3 \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \end{pmatrix}.$$

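The equations with the new generators say that $y_1 = x_1$ and $y_2 = 3x_2$. Thus $H$ is the subgroup $\mathbb{Z} \oplus 3\mathbb{Z} \oplus 0\mathbb{Z}$, nicely aligned with $\mathbb{Z}^3 = \mathbb{Z} \oplus \mathbb{Z} \oplus \mathbb{Z}$. The quotient is $(\mathbb{Z}/\mathbb{Z}) \oplus (\mathbb{Z}/3\mathbb{Z}) \oplus (\mathbb{Z}/0\mathbb{Z}) \cong C_3 \oplus \mathbb{Z}$.

The proof of Theorem 4.56 will make use of an algorithm that uses row and column operations involving only allowable divisions and that converts the matrix $C$ of coefficients so that its nonzero entries are the diagonal entries $C_{ii}$ for $1 \leq i \leq r$ and no other entries. The algorithm in principle can be very slow, and it may be helpful to see what it does in an ordinary example.

EXAMPLE. Suppose that the relationship between generators $y_1, y_2, y_3$ of $H$ and the standard $\mathbb{Z}$ basis $\{x_1, x_2\}$ of $\mathbb{Z}^2$ is
$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = C \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad\text{where } C = \begin{pmatrix} 3 & 5 \\ 7 & 13 \\ 5 & 9 \end{pmatrix}.$$
In row reduction in vector-space theory, we would start by dividing the first row of $C$ by 3, but division by 3 is not available in the present context. Our target for the upper-left entry is $\operatorname{GCD}(3, 7, 5) = 1$, and we use the division algorithm one step at a time to get there. To begin with, it says that $7 = 2 \cdot 3 + 1$ and hence $7 - 2 \cdot 3 = 1$. The first step of row reduction is then to replace the second row by the difference of it and 2 times the first row. The result can be achieved by left multiplication by $\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ and is $\begin{pmatrix} 3 & 5 \\ 1 & 3 \\ 5 & 9 \end{pmatrix}$. We write this step as
$$\begin{pmatrix} 3 & 5 \\ 7 & 13 \\ 5 & 9 \end{pmatrix} \xrightarrow{\ \text{left by } \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\ } \begin{pmatrix} 3 & 5 \\ 1 & 3 \\ 5 & 9 \end{pmatrix}.$$
The entry 1 in the first column is our target for this stage since $\operatorname{GCD}(3, 7, 5) = 1$. The next step interchanges two rows to move the 1 to the upper left entry, and the subsequent step uses the 1 to eliminate the other entries of the first column:
$$\begin{pmatrix} 3 & 5 \\ 1 & 3 \\ 5 & 9 \end{pmatrix} \xrightarrow{\ \text{left by } \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\ } \begin{pmatrix} 1 & 3 \\ 3 & 5 \\ 5 & 9 \end{pmatrix} \xrightarrow{\ \text{left by } \begin{pmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ -5 & 0 & 1 \end{pmatrix}\ } \begin{pmatrix} 1 & 3 \\ 0 & -4 \\ 0 & -6 \end{pmatrix}.$$
The algorithm next seeks to eliminate the off-diagonal entry 3 in the first row. This is done by a column operation:
$$\begin{pmatrix} 1 & 3 \\ 0 & -4 \\ 0 & -6 \end{pmatrix} \xrightarrow{\ \text{right by } \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix}\ } \begin{pmatrix} 1 & 0 \\ 0 & -4 \\ 0 & -6 \end{pmatrix}.$$
With two further row operations we are done:
$$\begin{pmatrix} 1 & 0 \\ 0 & -4 \\ 0 & -6 \end{pmatrix} \xrightarrow{\ \text{left by } \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix}\ } \begin{pmatrix} 1 & 0 \\ 0 & 2 \\ 0 & -6 \end{pmatrix} \xrightarrow{\ \text{left by } \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{pmatrix}\ } \begin{pmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 0 \end{pmatrix}.$$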


Our steps are summarized by the fact that the matrix $A$ with
$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ -5 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
has
$$AC \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 0 \end{pmatrix}$$
and by the fact that the integer matrices to the left and right of $C$ have determinant $\pm 1$. The determinant condition ensures that $A^{-1}$ and $\begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix}^{-1}$ have integer entries, according to Cramer's rule (Proposition 2.38).

Lemma 4.57. If $C$ is an $m$-by-$n$ matrix of integers, then there exist an $m$-by-$m$ matrix $A$ of integers with determinant $\pm 1$ and an $n$-by-$n$ matrix $B$ of integers with determinant $\pm 1$ such that for some $r \geq 0$, the nonzero entries of $D = ACB$ are exactly the diagonal entries $D_{11}, D_{22}, \dots, D_{rr}$.

PROOF. Given $C$, choose $(i, j)$ with $|C_{ij}| \neq 0$ but $|C_{ij}|$ as small as possible. (If $C = 0$, the algorithm terminates.) Possibly by interchanging two rows and/or then two columns (a left multiplication with determinant $-1$ and then a right multiplication with determinant $-1$), we may assume that $(i, j) = (1, 1)$. By the division algorithm write, for each $i$,
$$C_{i1} = q_iC_{11} + r_i \quad\text{with } 0 \leq r_i < |C_{11}|,$$
and replace the $i$th row by the difference of the $i$th row and $q_i$ times the first row (a left multiplication). If some $r_i$ is not 0, the result will leave a nonzero entry in the first column that is $< |C_{11}|$ in absolute value. Permute the least such $r_i \neq 0$ to the upper left and repeat the process. Since the least absolute value is going down, this process at some point terminates with all $r_i$ equal to 0. The first column then has a nonzero diagonal entry and is otherwise 0.

Now consider $C_{1j}$ and apply the division algorithm and column operations in similar fashion in order to process the first row. If we get a smaller nonzero remainder, permute the smallest one to the first column. Repeat this process until the first row is 0 except for entry $C_{11}$. Continue alternately with row and column operations in this fashion until both $C_{1j} = 0$ for $j > 1$ and $C_{i1} = 0$ for $i > 1$.

Repeat the algorithm for the $(m-1)$-by-$(n-1)$ matrix consisting of rows 2 through $m$ and columns 2 through $n$, and continue inductively. The algorithm terminates when either the reduced-in-size matrix is empty or is all 0. At this point the original matrix has been converted into the desired "diagonal form."
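The reduction in the proof can be sketched in code. The following Python version is a minimal illustration under stated assumptions, not a transcription of the proof: the names `diagonalize` and `matmul` are hypothetical, the pivot at each stage is the entry of least nonzero absolute value in the remaining block, and Python's floor division is used, so a remainder takes the sign of the pivot instead of being nonnegative. Each step is a row or column interchange or the addition of an integer multiple of one row or column to another, so the accumulated matrices `A` and `B` have determinant $\pm 1$.

```python
def matmul(X, Y):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def diagonalize(C):
    """Return (A, D, B) with D = A*C*B diagonal and A, B integer of determinant +-1."""
    m, n = len(C), len(C[0])
    D = [row[:] for row in C]
    A = [[int(i == j) for j in range(m)] for i in range(m)]
    B = [[int(i == j) for j in range(n)] for i in range(n)]

    def swap_rows(i, k):
        D[i], D[k] = D[k], D[i]
        A[i], A[k] = A[k], A[i]

    def sub_row(i, k, q):          # row_i <- row_i - q * row_k, applied to D and A
        D[i] = [a - q * b for a, b in zip(D[i], D[k])]
        A[i] = [a - q * b for a, b in zip(A[i], A[k])]

    def swap_cols(j, k):
        for M in (D, B):
            for row in M:
                row[j], row[k] = row[k], row[j]

    def sub_col(j, k, q):          # col_j <- col_j - q * col_k, applied to D and B
        for M in (D, B):
            for row in M:
                row[j] -= q * row[k]

    for t in range(min(m, n)):
        while True:
            nonzero = [(abs(D[i][j]), i, j)
                       for i in range(t, m) for j in range(t, n) if D[i][j] != 0]
            if not nonzero:
                return A, D, B     # remaining block is all 0
            _, i, j = min(nonzero)
            swap_rows(t, i)
            swap_cols(t, j)
            for i in range(t + 1, m):
                sub_row(i, t, D[i][t] // D[t][t])
            for j in range(t + 1, n):
                sub_col(j, t, D[t][j] // D[t][t])
            if all(D[i][t] == 0 for i in range(t + 1, m)) and \
               all(D[t][j] == 0 for j in range(t + 1, n)):
                break              # pivot row and column are clear; go to next block
    return A, D, B

C = [[3, 5], [7, 13], [5, 9]]
A, D, B = diagonalize(C)
```

Applied to the matrix $C$ of the example above, the sketch produces a diagonal $D$ whose nonzero entries are 1 and 2 up to sign. The intermediate steps, and hence $A$ and $B$, need not coincide with those in the example, since the pivoting rule differs.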


Lemma 4.58. Let $G_1, \dots, G_n$ be abelian groups, and for $1 \leq j \leq n$, let $H_j$ be a subgroup of $G_j$. Then
$$(G_1 \oplus \cdots \oplus G_n)/(H_1 \oplus \cdots \oplus H_n) \cong (G_1/H_1) \oplus \cdots \oplus (G_n/H_n).$$

PROOF. Let $\varphi : G_1 \oplus \cdots \oplus G_n \to (G_1/H_1) \oplus \cdots \oplus (G_n/H_n)$ be the homomorphism defined by $\varphi(g_1, \dots, g_n) = (g_1H_1, \dots, g_nH_n)$. The mapping $\varphi$ is onto $(G_1/H_1) \oplus \cdots \oplus (G_n/H_n)$, and the kernel is $H_1 \oplus \cdots \oplus H_n$. Then Corollary 4.12 shows that $\varphi$ descends to the required isomorphism.

PROOF OF THEOREM 4.55 AND MAIN CONCLUSION OF THEOREM 4.56. Given $G$ with $n$ generators, we set up matters as indicated immediately after the statement of Theorem 4.56, writing
$$\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = C \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},$$
where $x_1, \dots, x_n$ are the standard generators of $\mathbb{Z}^n$, $y_1, \dots, y_m$ are the generators of the kernel of the homomorphism from $\mathbb{Z}^n$ onto $G$, and $C$ is a matrix of integers. Applying Lemma 4.57, let $A$ and $B$ be square integer matrices of determinant $\pm 1$ such that $D = ACB$ is diagonal as in the statement of the lemma. Define
$$\begin{pmatrix} z_1 \\ \vdots \\ z_m \end{pmatrix} = A \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix} = B^{-1} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$
Substitution gives
$$\begin{pmatrix} z_1 \\ \vdots \\ z_m \end{pmatrix} = A \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = AC \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (ACB)B^{-1} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = D \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}.$$
If $(c_1 \cdots c_n)$ and $(d_1 \cdots d_n) = (c_1 \cdots c_n)B^{-1}$ are row vectors, then the formula
$$c_1u_1 + \cdots + c_nu_n = (c_1 \cdots c_n) \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix} = (d_1 \cdots d_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = d_1x_1 + \cdots + d_nx_n \tag{$\ast$}$$
shows that $\{u_1, \dots, u_n\}$ generates the same subgroup of $\mathbb{Z}^n$ as $\{x_1, \dots, x_n\}$. Since $(c_1 \cdots c_n)$ is nonzero if and only if $(d_1 \cdots d_n)$ is nonzero, the formula $(\ast)$ shows also that the linear independence of $\{x_1, \dots, x_n\}$ implies that of $\{u_1, \dots, u_n\}$. Hence $\{u_1, \dots, u_n\}$ is a $\mathbb{Z}$ basis of $\mathbb{Z}^n$. Similarly $\{y_1, \dots, y_m\}$ and $\{z_1, \dots, z_m\}$


generate the same subgroup $H$ of $\mathbb{Z}^n$. Therefore we can compare $H$ and $\mathbb{Z}^n$ using $\{z_1, \dots, z_m\}$ and $\{u_1, \dots, u_n\}$. Since $D$ is diagonal, the equations relating $\{z_1, \dots, z_m\}$ and $\{u_1, \dots, u_n\}$ are $z_j = D_{jj}u_j$ for $j \leq \min(m, n)$ and $z_j = 0$ for $\min(m, n) < j \leq m$. If $q = \min(m, n)$, then we see that
$$H = \sum_{i=1}^m \mathbb{Z}z_i = \sum_{i=1}^q D_{ii}\mathbb{Z}u_i + \sum_{i=q+1}^m \mathbb{Z}z_i = \sum_{i=1}^q D_{ii}\mathbb{Z}u_i.$$
Since the set $\{u_1, \dots, u_q\}$ is linearly independent over $\mathbb{Z}$, this sum exhibits $H$ as given by
$$H = D_{11}\mathbb{Z} \oplus \cdots \oplus D_{qq}\mathbb{Z}$$
with the nonzero elements among $D_{11}u_1, \dots, D_{qq}u_q$ as a $\mathbb{Z}$ basis. Consequently $H$ has been exhibited as free abelian of rank $\leq q \leq n$. This proves Theorem 4.55.

Applying Lemma 4.58 to the quotient $\mathbb{Z}^n/H$ and letting $D_{11}, \dots, D_{rr}$ be the nonzero diagonal entries of $D$, we see that $H$ has rank $r$, and we obtain an expansion of $G$ in terms of cyclic groups as
$$G \cong C_{|D_{11}|} \oplus \cdots \oplus C_{|D_{rr}|} \oplus \mathbb{Z}^{n-r}.$$
This proves the main conclusion of Theorem 4.56.
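To see the final formula in action, the two diagonal forms computed earlier in this section can be substituted into it. The block below only restates those two computations in the notation of the proof; $C_1$ denotes the trivial group.

```latex
% Worked example with C = (3 5; 7 13; 5 9): m = 3, n = 2, D = diag(1, 2), r = 2.
H = \mathbb{Z}u_1 \oplus 2\mathbb{Z}u_2, \qquad
G \cong \mathbb{Z}^2/H \cong (\mathbb{Z}/\mathbb{Z}) \oplus (\mathbb{Z}/2\mathbb{Z})
  \cong C_1 \oplus C_2 \oplus \mathbb{Z}^{0} \cong C_2 .

% Earlier example with matrix (2 1 1; 0 0 3): m = 2, n = 3, D = diag(1, 3), r = 2.
G \cong \mathbb{Z}^3/H \cong C_1 \oplus C_3 \oplus \mathbb{Z}^{3-2} \cong C_3 \oplus \mathbb{Z} .
```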

PROOF OF THE DECOMPOSITION WITH CYCLIC GROUPS OF PRIME-POWER ORDER. It is enough to prove that if $m = \prod_{j=1}^N p_j^{k_j}$ with the $p_j$ equal to distinct primes, then
$$\mathbb{Z}/m\mathbb{Z} \cong (\mathbb{Z}/p_1^{k_1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p_N^{k_N}\mathbb{Z}).$$
This is a variant of the Chinese Remainder Theorem (Corollary 1.9). For the proof let
$$\varphi : \mathbb{Z} \to (\mathbb{Z}/p_1^{k_1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p_N^{k_N}\mathbb{Z})$$
be the homomorphism given by $\varphi(s) = (s \bmod p_1^{k_1}, \dots, s \bmod p_N^{k_N})$ for $s \in \mathbb{Z}$. Since $\varphi(m) = (0, \dots, 0)$, $\varphi$ descends to a homomorphism
$$\bar\varphi : \mathbb{Z}/m\mathbb{Z} \to (\mathbb{Z}/p_1^{k_1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p_N^{k_N}\mathbb{Z}).$$
The map $\bar\varphi$ is one-one because if $\varphi(s) = 0$, then $p_j^{k_j}$ divides $s$ for all $j$. Since the $p_j^{k_j}$ are relatively prime in pairs, their product $m$ divides $s$. Since $m$ divides $s$, $s \equiv 0 \bmod m$. The map $\bar\varphi$ is onto since it is one-one and since the finite sets $\mathbb{Z}/m\mathbb{Z}$ and $(\mathbb{Z}/p_1^{k_1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p_N^{k_N}\mathbb{Z})$ both have $m$ elements.

PROOF OF UNIQUENESS OF THE DECOMPOSITION. Write $G = \mathbb{Z}^s \oplus T$, where
$$T = (\mathbb{Z}/p_1^{l_1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p_M^{l_M}\mathbb{Z})$$
and the $p_j$'s are not necessarily distinct. The subgroup $T$ is the subgroup of elements of finite order in $G$, and it is well defined independently of the decomposition of $G$ as the direct sum of cyclic groups. The quotient $G/T \cong \mathbb{Z}^s$ is


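free abelian of finite rank, and its rank $s$ is well defined by Theorem 4.53. Thus the number $s$ of factors of $\mathbb{Z}$ in the decomposition of $G$ is uniquely determined, and we need only consider uniqueness of the decomposition of the finite abelian group $T$.

For $p$ prime the elements of $T$ of order $p^a$ for some $a$ are those in the sum of the groups $\mathbb{Z}/p_j^{l_j}\mathbb{Z}$ for which $p_j = p$, and we are reduced to considering a group
$$H = \mathbb{Z}/p^{l_1}\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/p^{l_M}\mathbb{Z}$$
with $p$ fixed and $l_1 \leq \cdots \leq l_M$. The set $p^jH$ of multiples by $p^j$ of elements of $H$ is a subgroup of $H$ and is given, up to isomorphism, by
$$\mathbb{Z}/p^{l_t - j}\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/p^{l_M - j}\mathbb{Z}$$
if $l_t$ is the first index $\geq j$, while the set $p^{j+1}H$ of multiples by $p^{j+1}$ of elements of $H$ is given by
$$\mathbb{Z}/p^{l_t - j - 1}\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/p^{l_M - j - 1}\mathbb{Z}$$
if $l_t$ is the first index $\geq j + 1$. Therefore Lemma 4.58 gives
$$p^jH/p^{j+1}H \cong (\mathbb{Z}/p^{l_t - j}\mathbb{Z})/(\mathbb{Z}/p^{l_t - j - 1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p^{l_M - j}\mathbb{Z})/(\mathbb{Z}/p^{l_M - j - 1}\mathbb{Z}),$$
where $l_t$ is the first index $\geq j + 1$ and each $\mathbb{Z}/p^{l_i - j - 1}\mathbb{Z}$ is identified with the subgroup of multiples of $p$ in $\mathbb{Z}/p^{l_i - j}\mathbb{Z}$. Each summand of $p^jH/p^{j+1}H$ has order $p$, and thus
$$|p^jH/p^{j+1}H| = p^{|\{i \,\mid\, l_i > j\}|}.$$
Hence $H$ determines the integers $l_1, \dots, l_M$, and uniqueness is proved.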
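The counting argument at the end of this proof can be checked by brute force on small groups. The following Python sketch is an illustration only; the function names `multiples`, `layer_counts`, and `recover_exponents` are hypothetical, and all exponents $l_i$ are assumed to be at least 1. It computes the numbers $|\{i \mid l_i > j\}|$ directly from the sizes of the subgroups $p^jH$ and then rebuilds the exponents from those numbers alone.

```python
from itertools import product

def multiples(exponents, p, j):
    """All elements p^j * h with h in H = Z/p^{l_1} + ... + Z/p^{l_M}, by brute force."""
    moduli = [p ** l for l in exponents]
    return {
        tuple((p ** j * x) % m for x, m in zip(h, moduli))
        for h in product(*(range(m) for m in moduli))
    }

def layer_counts(exponents, p):
    """counts[j] = log_p |p^j H / p^(j+1) H| for j = 0, 1, ... while p^j H is nontrivial."""
    counts, j = [], 0
    while len(multiples(exponents, p, j)) > 1:
        ratio = len(multiples(exponents, p, j)) // len(multiples(exponents, p, j + 1))
        c = 0
        while ratio > 1:           # ratio is a power of p; extract the exponent
            ratio //= p
            c += 1
        counts.append(c)
        j += 1
    return counts

def recover_exponents(counts):
    """Rebuild l_1 <= ... <= l_M from counts[j] = #{i : l_i > j}."""
    exponents = []
    for j, c in enumerate(counts):
        following = counts[j + 1] if j + 1 < len(counts) else 0
        exponents += [j + 1] * (c - following)   # number of i with l_i = j + 1
    return exponents

counts = layer_counts([1, 1, 3], 2)      # H = Z/2 + Z/2 + Z/8
recovered = recover_exponents(counts)
```

For $H = \mathbb{Z}/2\mathbb{Z} \oplus \mathbb{Z}/2\mathbb{Z} \oplus \mathbb{Z}/8\mathbb{Z}$ the counts are 3, 1, 1, and the exponents 1, 1, 3 are recovered from them, which is the sense in which $H$ determines $l_1, \dots, l_M$.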

10. Sylow Theorems

This section continues the use of group actions to obtain results concerning structure theory for abstract groups. We shall prove the three Sylow Theorems, which are a starting point for investigations of the structure of finite groups that are deeper than those in Sections 6 and 7. We state the three theorems as the parts of Theorem 4.59.

Theorem 4.59 (Sylow Theorems). Let $G$ be a finite group of order $p^mr$, where $p$ is prime and $p$ does not divide $r$. Then
(a) $G$ contains a subgroup of order $p^m$, and any subgroup of $G$ of order $p^l$ with $0 \leq l < m$ is contained in a subgroup of order $p^m$,
(b) any two subgroups of order $p^m$ in $G$ are conjugate in $G$, i.e., any two such subgroups $P_1$ and $P_2$ have $P_2 = aP_1a^{-1}$ for some $a \in G$,
(c) the number of subgroups of order $p^m$ is of the form $pk + 1$ and divides $r$.

REMARK. A subgroup of order $p^m$ as in the theorem is called a Sylow $p$-subgroup of $G$.
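Part (c) can be checked directly on a small group. The following Python sketch is an illustration only: the choice of the alternating group $A_4$, of order $12 = 2^2 \cdot 3$, and the helper names `compose`, `is_even`, and `generated` are assumptions made for the example. Permutations of $\{0, 1, 2, 3\}$ are stored as tuples, and subgroups of order 3 and 4 are found by closing one or two generators under multiplication.

```python
from itertools import permutations

def compose(s, t):
    """Composition s∘t of permutations of {0, ..., n-1}, stored as tuples."""
    return tuple(s[t[i]] for i in range(len(t)))

def is_even(s):
    """A permutation is even exactly when it has an even number of inversions."""
    inversions = sum(1 for i in range(len(s)) for j in range(i + 1, len(s)) if s[i] > s[j])
    return inversions % 2 == 0

def generated(gens, n):
    """Subgroup generated by gens; in a finite group, closing under products suffices."""
    identity = tuple(range(n))
    elems, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in elems:
                elems.add(y)
                frontier.append(y)
    return frozenset(elems)

A4 = [s for s in permutations(range(4)) if is_even(s)]

# Sylow 3-subgroups have order 3 and are cyclic; Sylow 2-subgroups have order 4
# and are generated by at most two elements.
sylow3 = {generated([g], 4) for g in A4 if len(generated([g], 4)) == 3}
sylow2 = {generated([g, h], 4) for g in A4 for h in A4 if len(generated([g, h], 4)) == 4}
```

In $A_4$ this finds four subgroups of order 3 and one subgroup of order 4. Both counts agree with part (c): $4 = 3 \cdot 1 + 1$ divides 4, and $1 = 2 \cdot 0 + 1$ divides 3.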


Before coming to the proof, let us give two simple applications to structure theory. The applications combine Theorem 4.59, some results of Sections 6 and 7, and Problems 35–38 and 45–48 at the end of the chapter.

Proposition 4.60. If $p$ and $q$ are primes with $p < q$, then there exists a nonabelian group of order $pq$ if and only if $p$ divides $q - 1$, and in this case the nonabelian group is unique up to isomorphism. It may be taken to be a semidirect product of the cyclic groups $C_p$ and $C_q$ with $C_q$ normal.

REMARK. It follows from Theorem 4.56 that the only abelian group of order $pq$, up to isomorphism, is $C_p \times C_q \cong C_{pq}$. If $p = 2$ in the proposition, then $q$ is odd and $p$ divides $q - 1$; the proposition yields the dihedral group $D_q$. For $p > 2$, the divisibility condition may or may not hold: For $pq = 15$, the condition does not hold, and hence every group of order 15 is cyclic. For $pq = 21$, the condition does hold, and there exists a nonabelian group of order 21; this group was constructed explicitly in Example 2 in Section 7.

PROOF. Existence of a nonabelian group of order $pq$, together with the semidirect-product structure, is established by Proposition 4.46 if $p$ divides $q - 1$.

Let us see uniqueness and the necessity of the condition that $p$ divide $q - 1$. If $G$ has order $pq$, Theorem 4.59a shows that $G$ has a Sylow $p$-subgroup $H_p$ and a Sylow $q$-subgroup $H_q$. Corollary 4.9 shows that these two groups are cyclic. The conjugates of $H_q$ are Sylow $q$-subgroups, and Theorem 4.59c shows that the number of such conjugates is of the form $kq + 1$ and divides $p$. Since $p < q$, $k = 0$. Therefore $H_q$ is normal. (Alternatively, one can apply Proposition 4.36 to see that $H_q$ is normal.)

Each element of $G$ is uniquely a product $ab$ with $a$ in $H_p$ and $b$ in $H_q$. For the uniqueness, if $a_1b_1 = a_2b_2$, then $a_2^{-1}a_1 = b_2b_1^{-1}$ is an element of $H_p \cap H_q$. Its order must divide both $p$ and $q$ and hence must be 1. Thus the $pq$ products $ab$ with $a$ in $H_p$ and $b$ in $H_q$ are all different. Since the number of them equals the order of $G$, every member of $G$ is such a product. By Proposition 4.44, $G$ is a semidirect product of $H_p$ and $H_q$. If the action of $H_p$ on $H_q$ is nontrivial, then Problem 37 at the end of the chapter shows that $p$ divides $q - 1$, and Problem 38 shows that the group is unique up to isomorphism. On the other hand, if the action is trivial, then $G$ is certainly abelian.

Proposition 4.61. If $G$ is a group of order 12, then $G$ contains a subgroup $H$ of order 3 and a subgroup $K$ of order 4, and at least one of them is normal. Consequently there are exactly five groups of order 12, up to isomorphism: two abelian and three nonabelian.


REMARK. The second statement follows from the first, as a consequence of Problems 45–48 at the end of the chapter. Those problems show how to construct the groups.

PROOF. Theorem 4.59a shows that H may be taken to be a Sylow 3-subgroup and K may be taken to be a Sylow 2-subgroup. We have to prove that either H or K is normal. Suppose that H is not normal. Theorem 4.59c shows that the number of Sylow 3-subgroups is of the form 3k + 1 and divides 4. The subgroup H, not being normal, fails to equal one of its conjugates, which will be another Sylow 3-subgroup; hence k > 0. Therefore k = 1, and there are four Sylow 3-subgroups. The intersection of any two such subgroups is a subgroup of both and must be trivial since 3 is prime. Thus the set-theoretic union of the Sylow 3-subgroups accounts for 4 · 2 + 1 = 9 elements. None of these elements apart from the identity lies in K, and thus K contributes 3 further elements, for a total of 12. Thus every element of G lies in K or a conjugate of H. Consequently K equals every conjugate of K, and K is normal.

Let us see where we are with classifying finite groups of certain orders, up to isomorphism. A group of order p is cyclic by Corollary 4.9, and a group of order p^2 is abelian by Corollary 4.39. Groups of order pq are settled by Proposition 4.60. Thus for p and q prime, we know the structure of all groups of order p, p^2, and pq. Problems 39–44 at the end of the chapter tell us the structure of the groups of order 8, and Proposition 4.61 and Problems 45–48 tell us the structure of the groups of order 12. In particular, the table at the end of Section 1, which gives examples of nonisomorphic groups of order at most 15, is complete except for the one group of order 12 that is discussed in Problem 48. Problems 30–34 and 49–54 at the end of the chapter go in the direction of classifying finite groups of certain other orders.

Now we return to Theorem 4.59.
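The counting constraints used in the proofs of Propositions 4.60 and 4.61 can be tabulated mechanically. The following is a minimal Python sketch, not part of the text: the function names are hypothetical, and the code only lists the values that the conditions "of the form kp + 1" and "divides r" allow for the number of Sylow p-subgroups. It does not decide which of those values actually occurs in a given group.

```python
def sylow_count_candidates(order, p):
    """Values n with n = kp + 1 and n dividing r, where order = p^m * r and p does not divide r."""
    r = order
    while r % p == 0:        # strip the full power of p from the group order
        r //= p
    return [n for n in range(1, r + 1) if r % n == 0 and n % p == 1]


def nonabelian_pq_exists(p, q):
    """Criterion of Proposition 4.60 for primes p < q: p divides q - 1."""
    return (q - 1) % p == 0


# Order 15: both Sylow subgroups are forced to be normal.
print(sylow_count_candidates(15, 3), sylow_count_candidates(15, 5))
# Order 21: the Sylow 7-subgroup is normal; the Sylow 3-subgroups may number 1 or 7.
print(sylow_count_candidates(21, 3), sylow_count_candidates(21, 7))
# Order 12: the candidates that the proof of Proposition 4.61 starts from.
print(sylow_count_candidates(12, 3), sylow_count_candidates(12, 2))
print(nonabelian_pq_exists(3, 5), nonabelian_pq_exists(3, 7))
```

For order 12 the candidates for the number of Sylow 3-subgroups are 1 and 4, which is exactly the dichotomy used in the proof of Proposition 4.61.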
The proof of the theorem makes use of the theory of group actions as in Section 6. In fact, the proof of existence of Sylow p-subgroups is just an elaboration of the argument used to prove Corollary 4.38, saying that a group of prime-power order has a nontrivial center. The relevant action for the existence part of the proof is the one (g, x) ↦ gxg^{-1} given by conjugation of the elements of the group, the orbit of x being the conjugacy class C(x). Proposition 4.37 shows that |G| = |C(x)| |Z_G(x)|, where Z_G(x) is the centralizer of x. Since the disjoint union of the conjugacy classes is all of G, we have

    |G| = |Z_G| + ∑ |G| / |Z_G(x_j)|,

the sum being taken over representatives x_j of each conjugacy class with |C(x_j)| ≠ 1,


a formula sometimes called the class equation of G.

PROOF OF EXISTENCE OF SYLOW p-SUBGROUPS IN THEOREM 4.59a. We induct on |G|, the base case being |G| = 1. Suppose that existence holds for groups of order < |G|. Without loss of generality suppose that m > 0, so that p divides |G|. First suppose that p does not divide |Z_G|. Referring to the class equation of G, we see that p must fail to divide some integer |G|/|Z_G(x_j)| for which |Z_G(x_j)| < |G|. Since p^m is the exact power of p dividing |G|, we conclude that p^m divides this |Z_G(x_j)| and p^{m+1} does not. Since |Z_G(x_j)| < |G|, the inductive hypothesis shows that Z_G(x_j) has a subgroup of order p^m, and this is a Sylow p-subgroup of G.

Now suppose that p divides |Z_G|. The group Z_G is finitely generated abelian, hence is a direct sum of cyclic groups by Theorem 4.56. Thus Z_G contains an element c of order p. The cyclic group C generated by c then has order p. Being a subgroup of Z_G, C is normal in G. The group G/C has order p^{m−1} r, and the inductive hypothesis implies that G/C has a subgroup H of order p^{m−1}. If ϕ : G → G/C denotes the quotient map, then ϕ^{-1}(H) is a subgroup of G of order |H| |ker ϕ| = p^{m−1} p = p^m.

For the remaining parts of Theorem 4.59, we make use of a different group action. If Ω denotes the set of all subgroups of G, then G acts on Ω by conjugation: (g, H) ↦ gHg^{-1}. The orbit of a subgroup H consists of all subgroups conjugate to H in G, and the isotropy subgroup at the point H in Ω is {g ∈ G | gHg^{-1} = H}. This is a subgroup N(H) of G known as the normalizer of H in G. It has the properties that N(H) ⊇ H and that H is a normal subgroup of N(H). The counting formula of Corollary 4.35 gives #{gHg^{-1} | g ∈ G} = |G/N(H)|. Meanwhile, application of Lagrange's Theorem (Theorem 4.7) to the three quotients G/H, G/N(H), and N(H)/H shows that |G/H| = |G/N(H)| |N(H)/H|, with all three factors being integers.
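The class equation and the identity |G| = |C(x)| |Z_G(x)| can be checked by brute force in a small case. The sketch below is an illustration under stated assumptions, not the book's method: it takes G to be the symmetric group on 4 letters, stores a permutation as a tuple of images, and uses hypothetical helper names.

```python
from itertools import permutations


def compose(a, b):
    """Return the permutation a∘b (apply b first, then a)."""
    return tuple(a[b[i]] for i in range(len(b)))


def inverse(a):
    """Return the inverse permutation of a."""
    inv = [0] * len(a)
    for i, image in enumerate(a):
        inv[image] = i
    return tuple(inv)


G = list(permutations(range(4)))          # the symmetric group on {0, 1, 2, 3}


def conj_class(x):
    """Orbit of x under conjugation, i.e. the conjugacy class C(x)."""
    return frozenset(compose(compose(g, x), inverse(g)) for g in G)


def centralizer(x):
    """Isotropy subgroup of x under conjugation, i.e. Z_G(x)."""
    return [g for g in G if compose(g, x) == compose(x, g)]


classes = {conj_class(x) for x in G}
center = [x for x in G if len(conj_class(x)) == 1]       # classes of size 1
big = sorted(len(c) for c in classes if len(c) > 1)      # |G| / |Z_G(x_j)| terms

# |G| = |C(x)| |Z_G(x)| for every x, as in Proposition 4.37.
orbit_stabilizer_ok = all(len(conj_class(x)) * len(centralizer(x)) == len(G) for x in G)

print(len(G), len(center), big, orbit_stabilizer_ok)
```

Here the center is trivial and the nontrivial classes have sizes 3, 6, 6, and 8, so the class equation reads 24 = 1 + 3 + 6 + 6 + 8.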
Now assume as in the statement of Theorem 4.59 that |G| = p^m r with p prime and p not dividing r. In this setting we have the following lemma.


Lemma 4.62. If P is a Sylow p-subgroup of G and if H is a subgroup of the normalizer N(P) whose order is a power of p, then H ⊆ P.

PROOF. Since H ⊆ N(P) and P is normal in N(P), the set HP of products is a group, by the same argument as used for H_p H_q in the proof of Proposition 4.60. Then HP/P ≅ H/(H ∩ P) by the Second Isomorphism Theorem (Theorem 4.14), and hence |HP/P| is some power p^k of p. By Lagrange's Theorem (Theorem 4.7), |HP| = p^{m+k} with k ≥ 0. Since no subgroup of G can have order p^l with l > m, we must have k = 0. Thus HP = P and H ⊆ P.

PROOF OF THE REMAINDER OF THEOREM 4.59. Within the set Ω of all subgroups of G, let Ω′ be the set of all subgroups of G of order p^m. We have seen that Ω′ is not empty. Since the conjugate of a subgroup has the same order as the subgroup, Ω′ is a union of orbits in Ω under conjugation by G. Thus we can restrict the group action by conjugation from G × Ω → Ω to G × Ω′ → Ω′.

Let P and P′ be members of Ω′, and let Σ and Σ′ be the G orbits of P and P′ under conjugation. Suppose that Σ and Σ′ are distinct orbits of G. Let us restrict the group action by conjugation from G × Ω′ → Ω′ to P × Ω′ → Ω′. The G orbits Σ and Σ′ then break into P orbits, and the counting formula Corollary 4.35 says for each orbit that

    p^m = |P| = #{subgroups in a P orbit} × |isotropy subgroup within P|.

Hence the number of subgroups in a P orbit is of the form p^l for some l ≥ 0. Suppose that l = 0. Then the P orbit is some singleton set {P″}, and the corresponding isotropy subgroup within P is all of P: P = {p ∈ P | pP″p^{-1} = P″} ⊆ N(P″). Lemma 4.62 shows that P ⊆ P″, and therefore P = P″. Thus l = 0 only for the P orbit {P}. In other words, the number of elements in any P orbit other than {P} is divisible by p. Consequently |Σ| ≡ 1 mod p while |Σ′| ≡ 0 mod p, the latter because Σ and Σ′ are assumed distinct. But this conclusion is asymmetric in the G orbits Σ and Σ′, and we conclude that Σ and Σ′ must coincide. Hence there is only one G orbit in Ω′, and it has kp + 1 members for some k.
This proves parts (b) and (c) except for the fact that kp + 1 divides r. For this divisibility let us apply the counting formula Corollary 4.35 to the orbit Σ of G. The formula gives |G| = |Σ| |isotropy subgroup|, and hence |Σ| divides |G| = p^m r. Since |Σ| = kp + 1, we have GCD(|Σ|, p) = 1 and also GCD(|Σ|, p^m) = 1. By Corollary 1.3, kp + 1 divides r.

Finally we prove that any subgroup H of G of order p^l lies in some Sylow p-subgroup. Let Σ = Ω′ again be the G orbit in Ω of subgroups of order p^m,


and restrict the action by conjugation from G × Σ → Σ to H × Σ → Σ. Each H orbit in Σ must have p^a elements for some a, by one more application of the counting formula Corollary 4.35. Since |Σ| ≡ 1 mod p, some H orbit has one element, say the H orbit of P. Then the isotropy subgroup of H at the point P is all of H, and H ⊆ N(P). By Lemma 4.62, H ⊆ P. This completes the proof of Theorem 4.59.
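As a concrete check of the conclusions of Theorem 4.59 and of Proposition 4.61, one can enumerate the Sylow subgroups of a specific group of order 12. The sketch below is an illustration only; it assumes the group is the alternating group on 4 letters, stored as even permutations of {0, 1, 2, 3}, and all helper names are hypothetical.

```python
from itertools import permutations


def compose(a, b):
    """Return the permutation a∘b (apply b first, then a)."""
    return tuple(a[b[i]] for i in range(len(b)))


def is_even(p):
    """A permutation is even when it has an even number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2 == 0


def order(p):
    """Order of the permutation p as a group element."""
    e = tuple(range(len(p)))
    n, q = 1, p
    while q != e:
        q = compose(q, p)
        n += 1
    return n


identity = tuple(range(4))
A4 = [p for p in permutations(range(4)) if is_even(p)]     # a group of order 12

# Every subgroup of order 3 is cyclic, so it has the form {e, x, x^2}.
sylow3 = {frozenset({identity, x, compose(x, x)}) for x in A4 if order(x) == 3}

# A subgroup of order 4 can contain only elements whose order divides 4.
two_part = [x for x in A4 if 4 % order(x) == 0]
closed = all(compose(a, b) in two_part for a in two_part for b in two_part)

print(len(A4), len(sylow3), len(two_part), closed)
```

In this group there are four Sylow 3-subgroups, a number of the form 3k + 1 that divides 4, while the four elements of order dividing 4 form a single subgroup. That subgroup is therefore the unique Sylow 2-subgroup, and it is normal, in agreement with Proposition 4.61.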

11. Categories and Functors

The mathematics thus far in the book has taken place in several different contexts, and we have seen that the same notions sometimes recur in more than one context, possibly with variations. For example we have worked with vector spaces, inner-product spaces, groups, rings, and fields, and we have seen that each of these areas has its own definition of isomorphism. In addition, the notion of direct product or direct sum has arisen in more than one of these contexts, and there are other similarities. In this section we introduce some terminology to make the notion of "context" precise and to provide a setting for discussing similarities between different contexts.

A category C consists of three things:
• a class of objects, denoted by Obj(C),
• for any two objects A and B in the category, a set Morph(A, B) of morphisms,
• for any three objects A, B, and C in the category, a law of composition for morphisms, i.e., a function carrying Morph(A, B) × Morph(B, C) into Morph(A, C), with the image of (f, g) under composition written as gf,
and these are to satisfy certain properties that we list in a moment. When more than one category is under discussion, we may use notation like Morph_C(A, B) to distinguish between the categories.

We are to think initially of the objects as the sets we are studying with a particular kind of structure on them; the morphisms are then the functions from one object to another that respect this additional structure, and the law of composition is just composition of functions. Indeed, the defining conditions that are imposed on general categories are arranged to be obvious for this special kind of category, and this setting accounts for the order in which we write the composition of two morphisms. But the definition of a general category is not so restrictive, and it is important not to restrict the definition in this way.
The properties that are to be satisfied to have a category are as follows:
(i) the sets Morph(A_1, B_1) and Morph(A_2, B_2) are disjoint unless A_1 = A_2 and B_1 = B_2 (because two functions are declared to be different


unless their domains match and their ranges match, as is underscored in Section A1 of the appendix),
(ii) the law of composition satisfies the associativity property h(gf) = (hg)f for f ∈ Morph(A, B), g ∈ Morph(B, C), and h ∈ Morph(C, D),
(iii) for each object A, there is an identity morphism 1_A in Morph(A, A) such that f 1_A = f and 1_A g = g for f ∈ Morph(A, B) and g ∈ Morph(C, A).

A subcategory S of a category C by definition is a category with Obj(S) ⊆ Obj(C) and Morph_S(A, B) ⊆ Morph_C(A, B) whenever A and B are in Obj(S), and it is assumed that the laws of composition in S and C are consistent when both are defined.

Here are several examples in which the morphisms are functions and the law of composition is ordinary composition of functions. They are usually identified in practice just by naming their objects, since the morphisms are understood to be all functions from one object to another respecting the additional structure on the objects.

EXAMPLES OF CATEGORIES.
(1) The category of all sets. An object A is a set, and a morphism in the set Morph(A, B) is a function from A into B.
(2) The category of all vector spaces over a field 𝔽. The morphisms are linear maps.
(3) The category of all groups. The morphisms are group homomorphisms.
(4) The category of all abelian groups. The morphisms again are group homomorphisms. This is a subcategory of the previous example.
(5) The category of all rings. The morphisms are all ring homomorphisms. The kernel and the image of a morphism are necessarily objects of the category.
(6) The category of all rings with identity. The morphisms are all ring homomorphisms carrying identity to identity. This is a subcategory of the previous example. The image of a morphism is necessarily an object of the category, but the kernel of a morphism is usually not in the category.
(7) The category of all fields. The morphisms are as in Example 6, and the result is a subcategory of Example 6.
In this case any morphism is necessarily one-one and carries inverses to inverses.
(8) The category of all group actions by a particular group G. If G acts on X and on Y, then a morphism from the one space to the other is a G equivariant mapping from X to Y, i.e., a function ϕ : X → Y such that ϕ(gx) = gϕ(x) for all g in G and all x in X.
(9) The category of all representations by a particular group G on a vector space over a particular field 𝔽. The morphisms are the linear G equivariant functions. This is a subcategory of the previous example.


Readers who are familiar with point-set topology will recognize that one can impose topologies on everything in the above examples, insisting that the functions be continuous, and again we obtain examples of categories. For example the category of all topological spaces consists of objects that are topological spaces and morphisms that are continuous functions. The category of all continuous group actions by a particular topological group has objects that are group actions G × X → X that are continuous functions, and the morphisms are the equivariant functions that are continuous. Readers who are familiar with manifolds will recognize that another example is the category of all smooth manifolds, which consists of objects that are smooth manifolds and morphisms that are smooth functions.

The morphisms in a category need not be functions in the usual sense. An important example is the "opposite category" C^opp to a category C, which is a handy technical device and is discussed in Problems 78–80 at the end of the chapter.

In all of the above examples of categories, the class of objects fails to be a set. This behavior is typical. However, it does not cause problems in practice because in any particular argument involving categories, we can restrict to a subcategory for which the objects do form a set.¹⁵

If C is a category, a morphism ϕ ∈ Morph(A, B) is said to be an isomorphism if there exists a morphism ψ ∈ Morph(B, A) such that ψϕ = 1_A and ϕψ = 1_B. In this case we say that A is isomorphic to B in the category C. Let us check that the morphism ψ is unique if it exists. In fact, if ψ′ is a member of Morph(B, A) with ψ′ϕ = 1_A and ϕψ′ = 1_B, then ψ′ = 1_A ψ′ = (ψϕ)ψ′ = ψ(ϕψ′) = ψ 1_B = ψ. We can therefore call ψ the inverse to ϕ.

The relation "is isomorphic to" is an equivalence relation.¹⁶ In fact, the relation is symmetric by definition, and it is reflexive because 1_A ∈ Morph(A, A) has 1_A as inverse.
For transitivity let ϕ_1 ∈ Morph(A, B) and ϕ_2 ∈ Morph(B, C) be isomorphisms, with respective inverses ψ_1 ∈ Morph(B, A) and ψ_2 ∈ Morph(C, B). Then ϕ_2 ϕ_1 is in Morph(A, C), and ψ_1 ψ_2 is in Morph(C, A). Calculation gives (ψ_1 ψ_2)(ϕ_2 ϕ_1) = ψ_1(ψ_2(ϕ_2 ϕ_1)) = ψ_1((ψ_2 ϕ_2)ϕ_1) = ψ_1(1_B ϕ_1) = ψ_1 ϕ_1 = 1_A, and similarly (ϕ_2 ϕ_1)(ψ_1 ψ_2) = 1_C. Therefore ϕ_2 ϕ_1 ∈ Morph(A, C) is an isomorphism, and "is isomorphic to" is an equivalence relation. When A is isomorphic to B, it is permissible to say that A and B are isomorphic.

¹⁵ For the interested reader, a book that pays closer attention to the inherent set-theoretic difficulties in the theory is Mac Lane's Categories for the Working Mathematician.
¹⁶ Technically one considers relations only when they are defined on sets, and the class of objects in a category is typically not a set. However, just as with vector spaces, groups, and so on, we can restrict attention in any particular situation to a subcategory for which the objects do form a set, and then there is no difficulty.

The next step is to abstract a frequent kind of construction that we have


used with our categories. If C and D are two categories, a covariant functor F : C → D associates to each object A in Obj(C) an object F(A) in Obj(D) and to each pair of objects A and B and morphism f in Morph_C(A, B) a morphism F(f) in Morph_D(F(A), F(B)) such that
(i) F(gf) = F(g)F(f) for f ∈ Morph_C(A, B) and g ∈ Morph_C(B, C),
(ii) F(1_A) = 1_{F(A)} for A in Obj(C).

EXAMPLES OF COVARIANT FUNCTORS.
(1) Inclusion of a subcategory into a category is a covariant functor.
(2) Let C be the category of all sets. If F carries each set X to the set 2^X of all subsets of X, then F is a covariant functor as soon as its effect on functions between sets, i.e., its effect on morphisms, is defined in an appropriate way. Namely, if f : X → Y is a function, then F(f) is to be a function from F(X) = 2^X to F(Y) = 2^Y. That is, we need a definition of F(f)(A) as a subset of Y whenever A is a subset of X. A natural way of making such a definition is to put F(f)(A) = f(A), and then F is indeed a covariant functor.
(3) Let C be any of Examples 2 through 6 of categories above, and let D be the category of all sets, as in Example 1 of categories. If F carries an object A in C (i.e., a vector space, group, ring, etc.) into its underlying set and carries each morphism into its underlying function between two sets, then F is a covariant functor and furnishes an example of what is called a forgetful functor.
(4) Let C be the category of all vector spaces over a field 𝔽, let U be a vector space over 𝔽, and let F : C → C be defined on a vector space to be the vector space of linear maps F(V) = Hom_𝔽(U, V). The set of morphisms Morph_C(V_1, V_2) is Hom_𝔽(V_1, V_2). If f is in Morph_C(V_1, V_2), then F(f) is to be in Morph_C(Hom_𝔽(U, V_1), Hom_𝔽(U, V_2)), and the definition is that F(f)(L) = f ◦ L for L ∈ Hom_𝔽(U, V_1).
Then F is a covariant functor: to check that F(gf) = F(g)F(f) when g is in Morph_C(V_2, V_3), we write F(gf)(L) = gf ◦ L = g ◦ fL = g ◦ F(f)(L) = F(g)F(f)(L).
(5) Let C be the category of all groups, let D be the category of all sets, let G be a group, and let F : C → D be the functor defined as follows. For a group H, F(H) is the set of all group homomorphisms from G into H. The set of morphisms Morph_C(H_1, H_2) is the set of group homomorphisms from H_1 into H_2. If f is in Morph_C(H_1, H_2), then F(f) is to be a function with domain the set of homomorphisms from G into H_1 and with range the set of homomorphisms from G into H_2. Let F(f)(ϕ) = f ◦ ϕ. Then F is a covariant functor.
(6) Let C be the category of all sets, and let D be the category of all abelian groups. To a set S, associate the free abelian group F(S) with S as Z basis. If f : S → S′ is a function, then the universal mapping property of external


direct sums of abelian groups (Proposition 4.17) yields a corresponding group homomorphism from F(S) to F(S′), and we define this group homomorphism to be F(f). Then F is a covariant functor.
(7) Let C be the category of all finite sets, fix a commutative ring R with identity, and let D be the category of all commutative rings with identity. To a finite set S, associate the commutative ring F(S) = R[{X_s | s ∈ S}]. If f : S → S′ is a function, then the properties of substitution homomorphisms give us a corresponding homomorphism of rings with identity carrying F(S) to F(S′), and the result is a covariant functor.

There is a second kind of functor of interest to us. If C and D are two categories, a contravariant functor F : C → D associates to each object A in Obj(C) an object F(A) in Obj(D) and to each pair of objects A and B and morphism f in Morph_C(A, B) a morphism F(f) in Morph_D(F(B), F(A)) such that
(i) F(gf) = F(f)F(g) for f ∈ Morph_C(A, B) and g ∈ Morph_C(B, C),
(ii) F(1_A) = 1_{F(A)} for A in Obj(C).

EXAMPLES OF CONTRAVARIANT FUNCTORS.
(1) Let C be the category of all vector spaces over a field 𝔽, let W be a vector space over 𝔽, and let F : C → C be defined on a vector space to be the vector space of linear maps F(V) = Hom_𝔽(V, W). The set of morphisms Morph_C(V_1, V_2) is Hom_𝔽(V_1, V_2). If f is in Morph_C(V_1, V_2), then F(f) is to be in Morph_C(Hom_𝔽(V_2, W), Hom_𝔽(V_1, W)), and the definition is that F(f)(L) = L ◦ f for L ∈ Hom_𝔽(V_2, W). Then F is a contravariant functor: to check that F(gf) = F(f)F(g) when g is in Morph_C(V_2, V_3), we write F(gf)(L) = L ◦ gf = Lg ◦ f = F(f)(Lg) = F(f)F(g)(L) for L ∈ Hom_𝔽(V_3, W).
(2) Let C be the category of all vector spaces over a field 𝔽, define F of a vector space V to be the dual vector space V′, and define F of a linear mapping f between two vector spaces V and W to be the contragredient f^t carrying W′ into V′, defined by f^t(w′)(v) = w′(f(v)).
This is the special case of Example 1 of contravariant functors in which W = 𝔽. Hence F is a contravariant functor.
(3) Let C be the category of all groups, let D be the category of all sets, let G be a group, and let F : C → D be the functor defined as follows. For a group H, F(H) is the set of all group homomorphisms from H into G. The set of morphisms Morph_C(H_1, H_2) is the set of group homomorphisms from H_1 into H_2. If f is in Morph_C(H_1, H_2), then F(f) is to be a function with domain the set of homomorphisms from H_2 into G and with range the set of homomorphisms from H_1 into G. The definition is F(f)(ϕ) = ϕ ◦ f. Then F is a contravariant functor.
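The two functor laws, F(gf) = F(g)F(f) in the covariant case and F(gf) = F(f)F(g) in the contravariant case, can be tested on small finite sets. The sketch below is only an illustration under stated assumptions: functions between finite sets are stored as Python dictionaries, `direct_image` plays the role of the power-set functor of Example 2 of covariant functors, and `precompose` is a set-level analog of L ↦ L ◦ f from Example 1 of contravariant functors, with linearity ignored. All names are hypothetical.

```python
from itertools import combinations


def compose(g, f):
    """Return g∘f for finite functions stored as dicts (apply f first)."""
    return {x: g[f[x]] for x in f}


def direct_image(f):
    """Covariant power-set functor on a morphism f: a subset A goes to f(A)."""
    return lambda A: frozenset(f[a] for a in A)


def precompose(f):
    """Contravariant Hom(-, W) on a morphism f: a map L goes to L∘f."""
    return lambda L: compose(L, f)


X = [0, 1, 2]
f = {0: 'a', 1: 'a', 2: 'b'}          # f : X -> Y with Y = {'a', 'b'}
g = {'a': 'u', 'b': 'v'}              # g : Y -> Z with Z = {'u', 'v'}

# Covariant law, checked on every subset A of X.
subsets_of_X = [frozenset(c) for k in range(len(X) + 1) for c in combinations(X, k)]
covariant_law = all(
    direct_image(compose(g, f))(A) == direct_image(g)(direct_image(f)(A))
    for A in subsets_of_X
)

# Contravariant law, checked on one map L : Z -> W; note the reversed order.
L = {'u': 10, 'v': 20}
contravariant_law = precompose(compose(g, f))(L) == precompose(f)(precompose(g)(L))

print(covariant_law, contravariant_law)
```

In the contravariant case the expression `precompose(g)(precompose(f)(L))` would not even be defined here, since L ◦ f requires the values of f to lie in the domain of L. This is the bookkeeping that forces the order reversal.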


It is an important observation about functors that the composition of two functors is a functor. This is immediate from the definition. If the two functors are both covariant or both contravariant, then the composition is covariant. If one of them is covariant and the other is contravariant, then the composition is contravariant.

                α
           A ------> B
           |         |
         β |         | γ
           v         v
           C ------> D
                δ

FIGURE 4.9. A square diagram. The square commutes if γα = δβ.

In the subject of category theory, a great deal of information is conveyed by "commutative diagrams" of objects and morphisms. By a diagram is meant a directed graph, usually but not necessarily planar, in which the vertices represent some relevant objects in a category and the arrows from one vertex to another represent morphisms of interest between pairs of these objects. Often the vertices and arrows are labeled, but in fact labels on the vertices can be deduced from the labels on the arrows since any morphism determines its "domain" and "range" as a consequence of defining property (i) of categories. A diagram is said to be commutative if for each pair of vertices A and B, the compositions of the morphisms along all directed paths from A to B are the same. For example a square as in Figure 4.9 is commutative if γα = δβ. The triangular diagrams in Figures 4.1 through 4.8 are all commutative.

                 F(α)                                  G(α)
         F(A) -------> F(B)                   G(A) <------- G(B)
           |             |                      ^             ^
      F(β) |             | F(γ)     and    G(β) |             | G(γ)
           v             v                      |             |
         F(C) -------> F(D)                   G(C) <------- G(D)
                 F(δ)                                  G(δ)

FIGURE 4.10. Diagrams obtained by applying a covariant functor F and a contravariant functor G to the diagram in Figure 4.9.

Functors can be applied to diagrams, yielding new diagrams. For example, suppose that Figure 4.9 is a diagram in the category C, that F : C → D is a covariant functor, and that G : C → D is a contravariant functor. Then we can apply F and G to the diagram in Figure 4.9, obtaining the two diagrams in the category D that are pictured in Figure 4.10. If the diagram in Figure 4.9 is commutative, then so are the diagrams in Figure 4.10, as a consequence of the effect of functors on compositions of morphisms.
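The passage from a commutative square to its images under functors can be tried out in the category of finite sets. The following sketch is an illustration, not part of the text: the particular maps α, β, γ, δ (reductions modulo 6, 4, and 2) are chosen here only for the example, the covariant functor is assumed to be the power-set functor, and the contravariant functor is assumed to be "maps into a fixed two-element set W".

```python
from itertools import combinations, product


def compose(g, f):
    """Return g∘f for finite functions stored as dicts (apply f first)."""
    return {x: g[f[x]] for x in f}


A, B, C, D = range(12), range(6), range(4), range(2)
alpha = {n: n % 6 for n in A}     # alpha : A -> B
beta = {n: n % 4 for n in A}      # beta  : A -> C
gamma = {m: m % 2 for m in B}     # gamma : B -> D
delta = {m: m % 2 for m in C}     # delta : C -> D

# The square commutes when gamma∘alpha = delta∘beta.
square_commutes = compose(gamma, alpha) == compose(delta, beta)


def direct_image(f):
    """Covariant functor on morphisms: a subset goes to its image under f."""
    return lambda subset: frozenset(f[x] for x in subset)


def precompose(f):
    """Contravariant functor on morphisms: a map L goes to L∘f."""
    return lambda L: compose(L, f)


# F(gamma)F(alpha) = F(delta)F(beta), tested on all subsets of A with at most 3 elements.
subsets = [frozenset(c) for k in range(4) for c in combinations(A, k)]
covariant_ok = all(
    direct_image(gamma)(direct_image(alpha)(s)) == direct_image(delta)(direct_image(beta)(s))
    for s in subsets
)

# G(alpha)G(gamma) = G(beta)G(delta), tested on all maps L : D -> W = {0, 1}.
maps_D_to_W = [dict(zip(D, values)) for values in product([0, 1], repeat=len(D))]
contravariant_ok = all(
    precompose(alpha)(precompose(gamma)(L)) == precompose(beta)(precompose(delta)(L))
    for L in maps_D_to_W
)

print(square_commutes, covariant_ok, contravariant_ok)
```

The contravariant check composes in the reversed order, matching the reversed arrows in the second diagram of Figure 4.10.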


The subject of category theory seeks to analyze functors that make sense for all categories, or at least all categories satisfying some additional properties. The most important investigation of this kind is concerned with homology and cohomology, as well as their ramifications, for "abelian categories," which include several important examples affecting algebra, topology, and several complex variables. The topic in question is called "homological algebra" and is discussed further in Advanced Algebra. There are a number of other functors that are investigated in category theory, and we mention four:
• products, including direct products,
• coproducts, including direct sums,
• direct limits, also called inductive limits,
• inverse limits, also called projective limits.

We discuss general products and coproducts in the present section, omitting a general discussion of direct limits and inverse limits. Inverse limits will arise in Advanced Algebra for one category in connection with Galois groups, but we shall handle that one situation on its own without attempting a generalization. An attempt in the 1960s to recast as much mathematics as possible in terms of category theory is now regarded by many mathematicians as having been overdone, and it seems wiser to cast bodies of mathematics in the framework of category theory only when doing so can be justified by the amount of time saved by eliminating redundant arguments.

When a category C and a nonempty set S are given, we can define a category C^S. The objects of C^S are functions on S with the property that the value of the function at each s in S is in Obj(C), two such functions being regarded as the same if they consist of the same ordered pairs.¹⁷ Let us refer to such a function as an S-tuple of members of Obj(C), denoting it by an expression like {X_s}_{s∈S}. A morphism in Morph_{C^S}({X_s}_{s∈S}, {Y_s}_{s∈S}) is an S-tuple {f_s}_{s∈S} of morphisms of C such that f_s lies in Morph_C(X_s, Y_s) for all s, and the law of composition of such morphisms takes place coordinate by coordinate.

Let {X_s}_{s∈S} be an object in C^S. A product of {X_s}_{s∈S} is a pair (X, {p_s}_{s∈S}) such that X is in Obj(C) and each p_s is in Morph_C(X, X_s) with the following universal mapping property: whenever A in Obj(C) is given and a morphism ϕ_s ∈ Morph_C(A, X_s) is given for each s, then there exists a unique morphism ϕ ∈ Morph_C(A, X) such that p_s ϕ = ϕ_s for all s. The relevant diagram is pictured in Figure 4.11.

¹⁷ In other words, the range of such a function is considered as irrelevant. We might think of the range as Obj(C) except for the fact that a function is supposed to have a set as range and Obj(C) need not be a set.


                 ϕ_s
          X_s <------- A
           ^         /
           |       /
       p_s |     /  ϕ
           |   v
           X

FIGURE 4.11. Universal mapping property of a product in a category. The diagonal arrow ϕ runs from A to X.

EXAMPLES OF PRODUCTS.
(1) Products exist in the category of vector spaces over a field 𝔽. If vector spaces V_s indexed by a nonempty set S are given, then their product exists in the category, and an example is their external direct product ∏_{s∈S} V_s, according to Figure 2.4 and the discussion around it.
(2) Products exist in the category of all groups. If groups G_s indexed by a nonempty set S are given, then their product exists in the category, and an example is their external direct product ∏_{s∈S} G_s, according to Figure 4.2 and Proposition 4.15. If the groups G_s are abelian, then ∏_{s∈S} G_s is abelian, and it follows that products exist in the category of all abelian groups.
(3) Products exist in the category of all sets. If sets X_s indexed by a nonempty set S are given, then their product exists in the category, and an example is their Cartesian product ×_{s∈S} X_s, as one easily checks.
(4) Products exist in the category of all rings and in the category of all rings with identity. If objects R_s in the category indexed by a nonempty set S are given, then their product may be taken as an abelian group to be the external direct product ∏_{s∈S} R_s, with multiplication defined coordinate by coordinate, and the group homomorphisms p_s are easily checked to be morphisms in the category.

A product of objects in a category need not exist in the category. An artificial example may be formed as follows: Let C be a category with one object G, namely a group of order 2, and let Morph(G, G) = {0, 1_G}, the law of composition being the usual composition. Let S be a 2-element set, and let the corresponding objects be X_1 = G and X_2 = G. The claim is that the product X_1 × X_2 does not exist in C. In fact, take A = G. There are four S-tuples of morphisms (ϕ_1, ϕ_2) meeting the conditions of the definition.
Yet the only possibility for the product is X = G, and then there are only two possible ϕ’s in Morph(A, X ). Hence we cannot account for all possible S-tuples of morphisms, and the product cannot exist. The thing that category theory addresses is the uniqueness. A product is always unique up to canonical isomorphism, according to Proposition 4.63. We proved uniqueness for products in the special cases of Examples 1 and 2 above in Propositions 2.32 and 4.16.
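Both halves of the discussion above can be checked by exhaustive search on small examples. The sketch below is an illustration under stated assumptions: the first part takes the Cartesian product of two small sets with its coordinate projections and confirms that exactly one map ϕ satisfies p_s ϕ = ϕ_s; the second part encodes the two morphisms 0 and 1_G of the one-object category as value tables on {0, 1} and counts how many pairs (p_1 ϕ, p_2 ϕ) can arise. The sets, maps, and helper names are chosen here only for the example.

```python
from itertools import product

# Part 1: the Cartesian product in the category of sets.
X1, X2 = ['a', 'b'], [0, 1, 2]
X = list(product(X1, X2))
p1 = {x: x[0] for x in X}             # projection to X1
p2 = {x: x[1] for x in X}             # projection to X2

A = ['s', 't']
phi1 = {'s': 'a', 't': 'b'}           # phi1 : A -> X1
phi2 = {'s': 2, 't': 0}               # phi2 : A -> X2


def all_functions(dom, cod):
    """Yield every function dom -> cod as a dict."""
    for values in product(cod, repeat=len(dom)):
        yield dict(zip(dom, values))


solutions = [
    phi for phi in all_functions(A, X)
    if all(p1[phi[a]] == phi1[a] and p2[phi[a]] == phi2[a] for a in A)
]
print(solutions)

# Part 2: the one-object category with Morph(G, G) = {0, 1_G}.
zero, ident = (0, 0), (0, 1)          # value tables on G = {0, 1}
morphs = [zero, ident]


def compose_t(g, f):
    """Return g∘f for maps {0, 1} -> {0, 1} stored as value tables."""
    return tuple(g[f[x]] for x in range(2))


# For any candidate projections (q1, q2), count the distinct pairs (q1∘phi, q2∘phi).
max_pairs = max(
    len({(compose_t(q1, phi), compose_t(q2, phi)) for phi in morphs})
    for q1 in morphs for q2 in morphs
)
print(max_pairs, len(morphs) ** 2)
```

In the second part at most 2 of the 4 required pairs (ϕ_1, ϕ_2) are ever reached, whichever projections are tried, which is the counting obstruction described in the text.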


Proposition 4.63. Let C be a category, and let S be a nonempty set. If {X_s}_{s∈S} is an object in C^S and if (X, {p_s}) and (X′, {p′_s}) are two products, then there exists a unique morphism Φ : X′ → X such that p′_s = p_s ◦ Φ for all s ∈ S, and Φ is an isomorphism.

REMARK. There is no assertion that p_s is onto X_s. In fact, "onto" has no meaning for a general category.

PROOF. In Figure 4.11 let A = X′ and ϕ_s = p′_s. If Φ ∈ Morph(X′, X) is the morphism produced by the fact that X is a direct product, then we have p_s Φ = p′_s for all s. Reversing the roles of X and X′, we obtain a morphism Φ′ ∈ Morph(X, X′) with p′_s Φ′ = p_s for all s. Therefore p_s(ΦΦ′) = (p_s Φ)Φ′ = p′_s Φ′ = p_s.

In Figure 4.11 we next let A = X and ϕ_s = p_s for all s. Then the identity 1_X in Morph(X, X) has the same property p_s 1_X = p_s relative to all p_s that ΦΦ′ has, and the uniqueness in the statement of the universal mapping property implies that ΦΦ′ = 1_X. Reversing the roles of X and X′, we obtain Φ′Φ = 1_{X′}. Therefore Φ is an isomorphism.

For uniqueness suppose that Ψ ∈ Morph(X′, X) is another morphism with p_s Ψ = p′_s for all s ∈ S. Then the argument of the previous paragraph shows that Φ′Ψ = 1_{X′}. Consequently Ψ = 1_X Ψ = (ΦΦ′)Ψ = Φ(Φ′Ψ) = Φ 1_{X′} = Φ, and Ψ = Φ.

If products always exist in a particular category, they are not unique, only unique up to canonical isomorphism. Such a product is commonly denoted by ∏_{s∈S} X_s, even though it is not uniquely defined. It is customary to treat the product over S as a covariant functor F : C^S → C, the effect of the functor on objects being given by F({X_s}_{s∈S}) = ∏_{s∈S} X_s. For a well-defined functor we have to fix a choice of product for each object under consideration¹⁸ in Obj(C^S). For the effect of F on morphisms, we argue with the universal mapping property. Thus let {X_s}_{s∈S} and {Y_s}_{s∈S} be objects in C^S, let f_s be in Morph_C(X_s, Y_s) for all s, and let the products in question be (∏_{s∈S} X_s, {p_s}_{s∈S}) and (∏_{s∈S} Y_s, {q_s}_{s∈S}).
Then f_{s_0} p_{s_0} is in Morph_C(∏_{s∈S} X_s, Y_{s_0}) for each s_0, and the universal mapping property gives us f in Morph_C(∏_{s∈S} X_s, ∏_{s∈S} Y_s) such that q_s f = f_s p_s for all s. We define this f to be F({f_s}_{s∈S}), and we readily check that F is a functor.

¹⁸ Since Obj(C^S) need not be a set, it is best to be wary of applying the Axiom of Choice when the indexing of sets is given by Obj(C^S). Instead, one makes the choice only for all objects in some set of objects large enough for a particular application.

We turn to coproducts, which include direct sums. Let {X_s}_{s∈S} be an object in C^S. A coproduct of {X_s}_{s∈S} is a pair (X, {i_s}_{s∈S}) such that X is in Obj(C) and each i_s is in Morph_C(X_s, X) with the following universal mapping property: whenever A in Obj(C) is given and a morphism ϕ_s ∈ Morph_C(X_s, A) is given


for each s, then there exists a unique morphism ϕ ∈ Morph_C(X, A) such that ϕ i_s = ϕ_s for all s. The relevant diagram is pictured in Figure 4.12.

                 ϕ_s
          X_s -------> A
           |         ^
           |       /
       i_s |     /  ϕ
           v   /
           X

FIGURE 4.12. Universal mapping property of a coproduct in a category. The diagonal arrow ϕ runs from X to A.

EXAMPLES OF COPRODUCTS.
(1) Coproducts exist in the category of vector spaces over a field 𝔽. If vector spaces V_s indexed by a nonempty set S are given, then their coproduct exists in the category, and an example is their external direct sum ⊕_{s∈S} V_s, according to Figure 2.5 and the discussion around it.
(2) Coproducts exist in the category of all abelian groups. If abelian groups G_s indexed by a nonempty set S are given, then their coproduct exists in the category, and an example is their external direct sum ⊕_{s∈S} G_s, according to Figure 4.4 and Proposition 4.17.
(3) Coproducts exist in the category of all sets. If sets X_s indexed by a nonempty set S are given, then their coproduct exists in the category, and an example is their disjoint union ⋃_{s∈S} {(x_s, s) | x_s ∈ X_s}. The verification appears as Problem 74 at the end of the chapter.
(4) Coproducts exist in the category of all groups. Suppose that groups G_s indexed by a nonempty set S are given. It will be shown in Chapter VII that the coproduct is the "free product" ∗_{s∈S} G_s that is defined in that chapter. In the special case that each G_s is the group Z of integers, the free product coincides with the free group on S. Therefore, even if all the groups G_s are abelian, their coproduct need not be a subgroup of the direct product and need not even be abelian. In particular it need not coincide with the direct sum.
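The universal mapping property of the disjoint union in the category of sets can be verified by exhaustive search in a small case. The sketch below is an illustration only and is not a substitute for the verification requested in Problem 74: the two sets are chosen to overlap on purpose, elements are tagged with the index s as in the formula for the disjoint union, and all names are hypothetical.

```python
from itertools import product

X1, X2 = ['a', 'b'], ['b', 'c']                       # overlapping on purpose
X = [(x, 1) for x in X1] + [(x, 2) for x in X2]       # tagged disjoint union
i1 = {x: (x, 1) for x in X1}                          # i_1 : X1 -> X
i2 = {x: (x, 2) for x in X2}                          # i_2 : X2 -> X

A = [0, 1, 5, 6]
phi1 = {'a': 0, 'b': 1}                               # phi_1 : X1 -> A
phi2 = {'b': 5, 'c': 6}                               # phi_2 : X2 -> A

# The map forced by the conditions phi∘i_s = phi_s.
phi = {(x, s): (phi1 if s == 1 else phi2)[x] for (x, s) in X}


def all_functions(dom, cod):
    """Yield every function dom -> cod as a dict."""
    for values in product(cod, repeat=len(dom)):
        yield dict(zip(dom, values))


solutions = [
    psi for psi in all_functions(X, A)
    if all(psi[i1[x]] == phi1[x] for x in X1) and all(psi[i2[x]] == phi2[x] for x in X2)
]
print(len(solutions), solutions[0] == phi)
```

The tags keep the two copies of 'b' apart, so ϕ can send them to different values; with the ordinary union X1 ∪ X2 in place of X, no such ϕ would exist for this choice of ϕ_1 and ϕ_2.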

*
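As a small illustration of Example (3), the following Python sketch models the tagged disjoint union of a finite family of sets and the map produced by the universal mapping property. The names `disjoint_union`, `injection`, and `induced_map` are hypothetical and are introduced only for this sketch; it checks the defining identity $\varphi i_s = \varphi_s$ on one example and is not a proof.

```python
def disjoint_union(family):
    """Tagged disjoint union {(x, s) : x in X_s} of a family given as {s: X_s}."""
    return {(x, s) for s, xs in family.items() for x in xs}


def injection(s):
    """The map i_s : X_s -> disjoint union, sending x to (x, s)."""
    return lambda x: (x, s)


def induced_map(phis):
    """Given maps phi_s : X_s -> A, return phi with phi(i_s(x)) = phi_s(x).

    The tag s carried by each element forces the value of phi, which is
    why phi is the only map with this property.
    """
    return lambda tagged: phis[tagged[1]](tagged[0])


# Two overlapping sets; the tags keep the two copies of 1 apart.
family = {"a": {0, 1}, "b": {1, 2}}
phis = {"a": lambda x: x + 10, "b": lambda x: -x}

union = disjoint_union(family)
phi = induced_map(phis)
```

The element 1 occurs in both sets, yet the union has four members because each copy carries its own tag.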

A coproduct of objects in a category need not exist in the category. Problem 76 at the end of the chapter offers an example that the reader is invited to check.

Proposition 4.64. Let $\mathcal{C}$ be a category, and let $S$ be a nonempty set. If $\{X_s\}_{s\in S}$ is an object in $\mathcal{C}^S$ and if $(X, \{i_s\})$ and $(X', \{i'_s\})$ are two coproducts, then there exists a unique morphism $\Phi : X \to X'$ such that $i'_s = \Phi \circ i_s$ for all $s \in S$, and $\Phi$ is an isomorphism.

REMARKS. There is no assertion that $i_s$ is one-one. In fact, "one-one" has no meaning for a general category. This proposition may be derived quickly from Proposition 4.63 by a certain duality argument that is discussed in Problems


IV. Groups and Group Actions

78–80 at the end of the chapter. Here we give a direct argument without taking advantage of duality.

PROOF. In Figure 4.12 let $A = X'$ and $\varphi_s = i'_s$. If $\Phi \in \mathrm{Morph}(X, X')$ is the morphism produced by the fact that $X$ is a coproduct, then we have $\Phi i_s = i'_s$ for all $s$. Reversing the roles of $X$ and $X'$, we obtain a morphism $\Phi' \in \mathrm{Morph}(X', X)$ with $\Phi' i'_s = i_s$ for all $s$. Therefore $(\Phi'\Phi) i_s = \Phi' i'_s = i_s$.

In Figure 4.12 we next let $A = X$ and $\varphi_s = i_s$ for all $s$. Then the identity $1_X$ in $\mathrm{Morph}(X, X)$ has the same property $1_X i_s = i_s$ relative to all $i_s$ that $\Phi'\Phi$ has, and the uniqueness says that $\Phi'\Phi = 1_X$. Reversing the roles of $X$ and $X'$, we obtain $\Phi\Phi' = 1_{X'}$. Therefore $\Phi$ is an isomorphism.

For uniqueness suppose that $\Psi \in \mathrm{Morph}(X, X')$ is another morphism with $\Psi i_s = i'_s$ for all $s \in S$. Then the argument of the previous paragraph shows that $\Phi'\Psi = 1_X$. Consequently $\Psi = 1_{X'}\Psi = (\Phi\Phi')\Psi = \Phi(\Phi'\Psi) = \Phi 1_X = \Phi$, and $\Psi = \Phi$.

If coproducts always exist in a particular category, they are not unique, only unique up to canonical isomorphism. Such a coproduct is commonly denoted by $\coprod_{s\in S} X_s$, even though it is not uniquely defined. As with product, it is customary to treat the coproduct over $S$ as a covariant functor $F : \mathcal{C}^S \to \mathcal{C}$, the effect of the functor on objects being given by $F(\{X_s\}_{s\in S}) = \coprod_{s\in S} X_s$. For a well-defined functor we have to fix a choice of coproduct for each object under consideration in $\mathrm{Obj}(\mathcal{C}^S)$. For the effect of $F$ on morphisms, we argue with the universal mapping property. Thus let $\{X_s\}_{s\in S}$ and $\{Y_s\}_{s\in S}$ be objects in $\mathcal{C}^S$, let $f_s$ be in $\mathrm{Morph}_{\mathcal{C}}(X_s, Y_s)$ for all $s$, and let the coproducts in question be $\big(\coprod_{s\in S} X_s, \{i_s\}_{s\in S}\big)$ and $\big(\coprod_{s\in S} Y_s, \{j_s\}_{s\in S}\big)$. Then $j_{s_0} f_{s_0}$ is in $\mathrm{Morph}_{\mathcal{C}}\big(X_{s_0}, \coprod_{s\in S} Y_s\big)$ for each $s_0$, and the universal mapping property gives us $f$ in $\mathrm{Morph}_{\mathcal{C}}\big(\coprod_{s\in S} X_s, \coprod_{s\in S} Y_s\big)$ such that $f i_s = j_s f_s$ for all $s$. We define this $f$ to be $F(\{f_s\}_{s\in S})$, and we readily check that $F$ is a functor.

Universal mapping properties occur in other contexts than for products and coproducts.
We have already seen them in connection with homomorphisms on free abelian groups and with substitution homomorphisms on polynomial rings, and more such properties will occur in the development of tensor products in Chapter VI. A general framework for discussing universal mapping properties appears in the problems at the end of Chapter VI.

12. Problems

1. Let $G$ be a group in which all elements other than the identity have order 2. Prove that $G$ is abelian.

2. The dihedral group $D_4$ of order 8 can be viewed as a subgroup of the symmetric group $S_4$ of order 24. Find 8 explicit permutations in $S_4$ forming a subgroup isomorphic to $D_4$.

3. Suppose $G$ is a finite group, $H$ is a subgroup, and $a \in G$ is an element with $a^l$ in $H$ for some integer $l$ with $\mathrm{GCD}(l, |G|) = 1$. Prove that $a$ is in $H$.

4. Let $G$ be a group, and define a new group $G'$ to have the same underlying set as $G$ but to have multiplication given by $a \circ b = ba$. Prove that $G'$ is a group and that it is isomorphic to $G$.

5. Prove that if $G$ is an abelian group and $n$ is an integer, then $a \mapsto a^n$ is a homomorphism of $G$. Give an example of a nonabelian group for which $a \mapsto a^2$ is not a homomorphism.

6. Suppose that $G$ is a group and that $H$ and $K$ are normal subgroups of $G$ with $H \cap K = \{1\}$. Verify that the set $HK$ of products is a subgroup and that this subgroup is isomorphic as a group to the external direct product $H \times K$.

7. Take as known that 8191 is prime, so that $\mathbb{F}_{8191}$ is a field. Without carrying through the computations and without advocating trial and error, describe what steps you would carry out to solve for $x \bmod 8191$ such that $1234x \equiv 1 \bmod 8191$.

8. (Wilson's Theorem) Let $p$ be an odd prime. Starting from the fact that $1, \dots, p-1$ are roots of the polynomial $X^{p-1} - 1 \equiv 0 \bmod p$ in $\mathbb{F}_p$, prove that $(p-1)! \equiv -1 \bmod p$.

9. Classify, up to isomorphism, all groups of order $p^2$ if $p$ is a prime.

10. This problem concerns conjugacy classes in a group $G$.
(a) Prove that all elements of a conjugacy class have the same order.
(b) Prove that if $ab$ is in a conjugacy class, so is $ba$.

11. (a) Find explicitly all the conjugacy classes in the alternating group $A_4$.
(b) For each conjugacy class in $A_4$, find the centralizer of one element in the class.
(c) Prove that $A_4$ has no subgroup isomorphic to $C_6$ or $S_3$.

12. Prove that the alternating group $A_5$ has no subgroup of order 30.

13. Let $G$ be a nonabelian group of order $p^n$, where $p$ is prime. Prove that any subgroup of order $p^{n-1}$ is normal.

14. Let $G$ be a finite group, and let $H$ be a normal subgroup. If $|H| = p$ and $p$ is the smallest prime dividing $|G|$, prove that $H$ is contained in the center of $G$.

15. Let $G$ be a group. An automorphism of $G$ of the form $x \mapsto gxg^{-1}$ is called an inner automorphism. Prove that the set of inner automorphisms is a normal subgroup of the group $\mathrm{Aut}\, G$ of all automorphisms and is isomorphic to $G/Z_G$.

16. (a) Prove that $\mathrm{Aut}\, C_m$ is isomorphic to $(\mathbb{Z}/m\mathbb{Z})^\times$.
(b) Find a value of $m$ for which $\mathrm{Aut}\, C_m$ is not cyclic.


17. Fix $n \ge 2$. In the symmetric group $S_n$, for each integer $k$ with $1 \le k \le n/2$, let $C_k$ be the set of elements in $S_n$ that are products of $k$ disjoint transpositions.
(a) Prove that if $\tau$ is an automorphism of $S_n$, then $\tau(C_1) = C_k$ for some $k$.
(b) Prove that $|C_k| = \binom{n}{2k} \dfrac{(2k)!}{2^k\, k!}$.
(c) Prove that $|C_k| \ne |C_1|$ unless $k = 1$ or $n = 6$. (Educational note: From this, it follows that $\tau(C_1) = C_1$ except possibly when $n = 6$. One can deduce as a consequence that every automorphism of $S_n$ is inner except possibly when $n = 6$.)

18. Give an example: $G$ is a group with a normal subgroup $N$, $N$ has a subgroup $M$ that is normal in $N$, yet $M$ is not normal in $G$.

19. Show that the cyclic group $C_{rs}$ is isomorphic to $C_r \times C_s$ if and only if $\mathrm{GCD}(r, s) = 1$.

20. How many abelian groups, up to isomorphism, are there of order 27?

21. Let $G$ be the free abelian group with $\mathbb{Z}$ basis $\{x_1, x_2, x_3\}$. Let $H$ be the subgroup of $G$ generated by $\{u_1, u_2, u_3\}$, where
$u_1 = 3x_1 + 2x_2 + 5x_3$, $\quad u_2 = x_2 + 3x_3$, $\quad u_3 = x_2 + 5x_3$.
Express $G/H$ as a direct sum of cyclic groups.

22. Let $\{e_1, e_2, e_3, e_4\}$ be the standard basis of $\mathbb{R}^4$. Let $G$ be the additive subgroup of $\mathbb{R}^4$ generated by the four elements
$e_1$, $\quad e_1 + e_2$, $\quad \tfrac12(e_1 + e_2 + e_3 + e_4)$, $\quad \tfrac12(e_1 + e_2 + e_3 - e_4)$,
and let $H$ be the subgroup of $G$ generated by the four elements
$e_1 - e_2$, $\quad e_2 - e_3$, $\quad e_3 - e_4$, $\quad e_3 + e_4$.
Identify the abelian group $G/H$ as a direct sum of cyclic groups.

23. Let $G$ be the free abelian group with $\mathbb{Z}$ basis $\{x_1, \dots, x_n\}$, and let $H$ be the subgroup generated by $\{u_1, \dots, u_m\}$, where
$\begin{pmatrix} u_1 \\ \vdots \\ u_m \end{pmatrix} = C \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$
for an $m$-by-$n$ matrix $C$ of integers. Prove that the number of summands $\mathbb{Z}$ in the decomposition of $G/H$ into cyclic groups is equal to $n$ minus the rank of the matrix $C$ when $C$ is considered as in $M_{mn}(\mathbb{Q})$.

24. Prove that every abelian group is the homomorphic image of a free abelian group.

25. Let $G$ be a group, and let $H$ and $K$ be subgroups.
(a) For $x$ and $y$ in $G$, prove that $xH \cap yK$ is empty or is a coset of $H \cap K$.
(b) Deduce from (a) that if $H$ and $K$ have finite index in $G$, then so does $H \cap K$.


26. Let $G$ be a free abelian group of finite rank $n$, and let $H$ be a free abelian subgroup of rank $n$. Prove that $H$ has finite index in $G$.

27. Let $G = S_4$ be the symmetric group on four letters.
(a) Find a Sylow 2-subgroup of $G$. How many Sylow 2-subgroups are there, and why?
(b) Find a Sylow 3-subgroup of $G$. How many Sylow 3-subgroups are there, and why?

28. Let $H$ be a subgroup of a group $G$. Prove or disprove that the normalizer $N(H)$ of $H$ in $G$ is a normal subgroup of $G$.

29. How many elements of order 7 are there in a simple group of order 168?

30. Let $G$ be a group of order $pq^2$, where $p$ and $q$ are primes with $p < q$. Let $S_p$ and $S_q$ be Sylow subgroups for the primes $p$ and $q$. Prove that $G$ is a semidirect product of $S_p$ and $S_q$ with $S_q$ normal.

31. Suppose that $G$ is a finite group and that $H$ is a subgroup whose index in $G$ is a prime $p$. By considering the action of $G$ on the set of subgroups conjugate to $H$ and considering the possibilities for the normalizer $N(H)$, determine the possibilities for the number of subgroups conjugate to $H$.

32. Let $G$ be a group of order 24, let $H$ be a subgroup of order 8, and assume that $H$ is not normal.
(a) Using the Sylow Theorems, explain why $H$ has exactly 3 conjugates in $G$, counting $H$ itself as one.
(b) Show how to use the conjugates in (a) to define a homomorphism of $G$ into the symmetric group $S_3$ on three letters.
(c) Use the homomorphism of (b) to conclude that $G$ is not simple.

33. Let $G$ be a group of order 36. Arguing in the style of the previous problem, show that there is a nontrivial homomorphism of $G$ into the symmetric group $S_4$.

34. Let $G$ be a group of order $2pq$, where $p$ and $q$ are primes with $2 < p < q$.
(a) Prove that if $q + 1 \ne 2p$, then a Sylow $q$-subgroup is normal.
(b) Suppose that $q + 1 = 2p$, let $H$ be a Sylow $p$-subgroup, and let $K$ be a Sylow $q$-subgroup. Prove that at least one of $H$ and $K$ is normal, that the set $HK$ of products is a subgroup, and that the subgroup $HK$ is cyclic of index 2 in $G$.

Problems 35–38 concern the detection of isomorphisms among semidirect products. For the first two of the problems, let $H$ and $K$ be groups, and let $\varphi_1 : H \to \mathrm{Aut}\, K$ and $\varphi_2 : H \to \mathrm{Aut}\, K$ be homomorphisms.

35. Suppose that $\varphi_2 = \varphi_1 \circ \varphi$ for some automorphism $\varphi$ of $H$. Define $\psi : H \times_{\varphi_2} K \to H \times_{\varphi_1} K$ by $\psi(h, k) = (\varphi(h), k)$. Prove that $\psi$ is an isomorphism.


36. Suppose that $\varphi_2 = \varphi \circ \varphi_1$ for some inner automorphism $\varphi$ of $\mathrm{Aut}\, K$ in the sense of Problem 15, i.e., $\varphi : \mathrm{Aut}\, K \to \mathrm{Aut}\, K$ is to be given by $\varphi(x) = axa^{-1}$ with $a$ in $\mathrm{Aut}\, K$. Define $\psi : H \times_{\varphi_1} K \to H \times_{\varphi_2} K$ by $\psi(h, k) = (h, a(k))$. Prove that $\psi$ is an isomorphism.

37. Suppose that $p$ and $q$ are primes and that the cyclic group $C_p$ acts on $C_q$ by automorphisms with a nontrivial action. Prove that $p$ divides $q - 1$.

38. Suppose that $p$ and $q$ are primes such that $p$ divides $q - 1$. Let $\tau_1$ and $\tau_2$ be nontrivial homomorphisms from $C_p$ to $\mathrm{Aut}\, C_q$. Prove that $C_p \times_{\tau_1} C_q \cong C_p \times_{\tau_2} C_q$, and conclude that there is only one nonabelian semidirect product $C_p \times_\tau C_q$ up to isomorphism.

Problems 39–44 discuss properties of groups of order 8, obtaining a classification of these groups up to isomorphism.

39. Prove that the five groups $C_8$, $C_4 \times C_2$, $C_2 \times C_2 \times C_2$, $D_4$, and $H_8$ are mutually nonisomorphic and that the first three exhaust the abelian groups of order 8, apart from isomorphisms.

40. (a) Find a composition series for the 8-element dihedral group $D_4$.
(b) Find a composition series for the 8-element quaternion group $H_8$.

41. (a) Prove that every subgroup of the quaternion group $H_8$ is normal.
(b) Identify the conjugacy classes in $H_8$.
(c) Compute the order of $\mathrm{Aut}\, H_8$.

42. Suppose that $G$ is a nonabelian group of order 8. Prove that $G$ has an element of order 4 but no element of order 8.

43. Let $G$ be a nonabelian group of order 8, and let $K$ be the copy of $C_4$ generated by some element of order 4. If $G$ has some element of order 2 that is not in $K$, prove that $G \cong D_4$.

44. Let $G$ be a nonabelian group of order 8, and let $K$ be the copy of $C_4$ generated by some element of order 4. If $G$ has no element of order 2 that is not in $K$, prove that $G \cong H_8$.

Problems 45–48 classify groups of order 12, making use of Proposition 4.61, Problem 15, and Problems 35–38. Let $G$ be a group of order 12, let $H$ be a Sylow 3-subgroup, and let $K$ be a Sylow 2-subgroup. Proposition 4.61 says that at least one of $H$ and $K$ is normal. Consequently there are three cases, and these are addressed by the first three of the problems.

45. Verify that there are only two possibilities for $G$ up to isomorphism if $G$ is abelian.

46. Suppose that $K$ is normal, so that $G \cong H \times_\tau K$. Prove that either
(i) $\tau$ is trivial or
(ii) $\tau$ is nontrivial and $K \cong C_2 \times C_2$,
and deduce that $G$ is abelian if (i) holds and that $G \cong A_4$ if (ii) holds.


47. Suppose that $H$ is normal, so that $G = K \times_\tau H$. Prove that one of the conditions
(i) $\tau$ is trivial,
(ii) $K \cong C_2 \times C_2$ and $\tau$ is nontrivial,
(iii) $K \cong C_4$ and $\tau$ is nontrivial
holds, and deduce that $G$ is abelian if (i) holds, that $G \cong D_6$ if (ii) holds, and that $G$ is nonabelian and is not isomorphic to $A_4$ or $D_6$ if (iii) holds.

48. In the setting of the previous problem, prove that there is one and only one group, up to isomorphism, satisfying condition (iii), and find the order of each of its elements.

Problems 49–52 assume that $p$ and $q$ are primes with $p < q$. The problems go in the direction of classifying finite groups of order $p^2 q$.

49. If $G$ is a group of order $p^2 q$, prove that either $p^2 q = 12$ or a Sylow $q$-subgroup is normal.

50. If $p^2$ divides $q - 1$, exhibit three nonabelian groups of order $p^2 q$ that are mutually nonisomorphic.

51. If $p$ divides $q - 1$ but $p^2$ does not divide $q - 1$, exhibit two nonabelian groups of order $p^2 q$ that are not isomorphic.

52. If $p$ does not divide $q - 1$, prove that any group of order $p^2 q$ is abelian.

Problems 53–54 concern nonabelian groups of order 27.

53. (a) Show that multiplication by the elements $1, 4, 7 \bmod 9$ defines a nontrivial action of $\mathbb{Z}/3\mathbb{Z}$ on $\mathbb{Z}/9\mathbb{Z}$ by automorphisms.
(b) Show from (a) that there exists a nonabelian group of order 27.
(c) Show that the group in (b) is generated by elements $a$ and $b$ that satisfy $a^9 = b^3 = b^{-1}aba^{-4} = 1$.

54. Show that any nonabelian group of order 27 having a subgroup $H$ isomorphic to $C_9$ and an element of order 3 not lying in $H$ is isomorphic to the group constructed in the previous problem.

Problems 55–62 give a construction of infinitely many simple groups, some of them finite and some infinite. Let $\mathbb{F}$ be a field. For $n \ge 2$, let $\mathrm{SL}(n, \mathbb{F})$ be the special linear group for the space $\mathbb{F}^n$ of $n$-dimensional column vectors. The center $Z$ of $\mathrm{SL}(n, \mathbb{F})$ consists of the scalar multiples of the identity, the scalar being an $n$th root of 1. Let $\mathrm{PSL}(n, \mathbb{F}) = \mathrm{SL}(n, \mathbb{F})/Z$. It is known that $\mathrm{PSL}(n, \mathbb{F})$ is simple except for $\mathrm{PSL}(2, \mathbb{F}_2)$ and $\mathrm{PSL}(2, \mathbb{F}_3)$. These problems will show that $\mathrm{PSL}(2, \mathbb{F})$ is simple if $|\mathbb{F}| > 5$ and $\mathbb{F}$ is not of characteristic 2. Most of the argument will consider $\mathrm{SL}(2, \mathbb{F})$, and the passage to PSL will occur only at the very end. In Problems 56–61, $G$ denotes a normal subgroup of $\mathrm{SL}(2, \mathbb{F})$ that is not contained in the center $Z$, and it is to be proved that $G = \mathrm{SL}(2, \mathbb{F})$.


55. Suppose that $\mathbb{F}$ is a finite field with $q$ elements.
(a) By considering the possibilities for the first column of a matrix and then considering the possibilities for the second column when the first column is fixed, compute $|\mathrm{GL}(2, \mathbb{F})|$ as a function of $q$.
(b) By using the determinant homomorphism, compute $|\mathrm{SL}(2, \mathbb{F})|$ in terms of $|\mathrm{GL}(2, \mathbb{F})|$.
(c) Taking into account that $\mathbb{F}$ does not have characteristic 2, prove that $|\mathrm{PSL}(2, \mathbb{F})| = \tfrac12 |\mathrm{SL}(2, \mathbb{F})|$.
(d) Show for a suitable finite field $\mathbb{F}$ with more than 5 elements that $\mathrm{PSL}(2, \mathbb{F})$ has order 168.

56. Let $M$ be a member of $G$ that is not in $Z$. Since $M$ is not scalar, there exists a column vector $u$ with $Mu$ not a multiple of $u$. Define $v = Mu$, so that $(u, v)$ is an ordered basis of $\mathbb{F}^2$. By rewriting all matrices with the ordered basis $(u, v)$, show that there is no loss in generality in assuming that $G$ contains a matrix $A = \begin{pmatrix} 0 & -1 \\ 1 & c \end{pmatrix}$ if it is ultimately shown that $G = \mathrm{SL}(2, \mathbb{F})$.

57. Let $a$ be a member of the multiplicative group $\mathbb{F}^\times$ to be chosen shortly, and let $B$ be the member $\begin{pmatrix} ca & a^{-1} \\ -a & 0 \end{pmatrix}$ of $\mathrm{SL}(2, \mathbb{F})$. Prove that
(a) $B^{-1}A^{-1}BA$ is upper triangular and is in $G$,
(b) $B^{-1}A^{-1}BA$ has unequal diagonal entries if $a^4 \ne 1$,
(c) the condition in (b) can be satisfied for a suitable choice of $a$ under the assumption that $|\mathbb{F}| > 5$.

58. Suppose that $C = \begin{pmatrix} x & y \\ 0 & x^{-1} \end{pmatrix}$ is a member of $G$ for some $x \ne \pm 1$ and some $y$. Taking $D = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and forming $CDC^{-1}D^{-1}$, show that $G$ contains a matrix $E = \begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}$ with $\lambda \ne 0$.

59. By conjugating $E$ by $\begin{pmatrix} \alpha & 0 \\ 0 & \alpha^{-1} \end{pmatrix}$, show that the set of $\lambda$ in $\mathbb{F}$ such that $\begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}$ is in $G$ is closed under multiplication by squares and under addition and subtraction.

60. Using the identity $x = \tfrac14 (x+1)^2 - \tfrac14 (x-1)^2$, deduce from Problems 56–59 that $G$ contains all matrices $\begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}$ with $\lambda \in \mathbb{F}$.

61. Show that $\begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}$ is conjugate to $\begin{pmatrix} 1 & 0 \\ -\lambda & 1 \end{pmatrix}$, and show that the set of all matrices $\begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 & 0 \\ \lambda & 1 \end{pmatrix}$ generates $\mathrm{SL}(2, \mathbb{F})$. Conclude that $G = \mathrm{SL}(2, \mathbb{F})$.

62. Using the First Isomorphism Theorem, conclude that the only normal subgroup of $\mathrm{PSL}(2, \mathbb{F})$ other than $\{1\}$ is $\mathrm{PSL}(2, \mathbb{F})$ itself.

Problems 63–73 briefly introduce the theory of error-correcting codes. Let $\mathbb{F}$ be the finite field $\mathbb{Z}/2\mathbb{Z}$. The vector space $\mathbb{F}^n$ over $\mathbb{F}$ will be called Hamming space, and its members are regarded as "words" (potential messages consisting of 0's and 1's). The weight $\mathrm{wt}(c)$ of a word $c$ is the number of nonzero entries in $c$. The Hamming


distance $d(a, b)$ between words $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$ is the weight of $a - b$, i.e., the number of indices $i$ with $1 \le i \le n$ and $a_i \ne b_i$. A code is a nonempty subset $C$ of $\mathbb{F}^n$, and the minimal distance $\delta(C)$ of a code is the smallest value of $d(a, b)$ for $a$ and $b$ in $C$ with $a \ne b$. By convention if $|C| = 1$, take $\delta(C) = n + 1$. One imagines that members of $C$, which are called code words, are allowable messages, i.e., words that can be stored and retrieved, or transmitted and received. A code with minimal distance $\delta$ can then detect up to $\delta - 1$ errors in a word ostensibly from $C$ that has been retrieved from storage or has been received in a transmission. The code can correct up to $(\delta - 1)/2$ errors because no word of $\mathbb{F}^n$ can be at distance $\le (\delta - 1)/2$ from more than one word in $C$, by Problem 63 below. The interest is in linear codes, those for which $C$ is a vector subspace. It is desirable that each message have a high percentage of content and a relatively low percentage of further information used for error correction; thus a fundamental theoretical problem for linear codes is to find the maximum dimension of a linear code if $n$ and a lower bound on the minimal distance for the code are given. As a practical matter, information is likely to be processed in packets of a standard length, such as some power of 2. In many situations packets can be reprocessed if they have been found to have errors. The initial interest is therefore in codes that can recognize and possibly correct a small number of errors. The problems in this set are continued at the ends of Chapters VII and IX.

63. Prove that the Hamming distance satisfies $d(a, b) \le d(a, c) + d(c, b)$, and conclude that if a word $w$ in $\mathbb{F}^n$ is at distance $\le (D - 1)/2$ from two distinct members of the linear code $C$, then $\delta(C) < D$.

64. Explain why the minimal distance $\delta(C)$ of a linear code $C \ne \{0\}$ is given by the minimal weight of the nonzero words in $C$.

65. Fix $n \ge 2$. List $\delta(C)$ and $\dim C$ for the following elementary linear codes:
(a) $C = 0$.
(b) $C = \mathbb{F}^n$.
(c) (Repetition code) $C = \{0, (1, 1, \dots, 1)\}$.
(d) (Parity-check code) $C = \{c \in \mathbb{F}^n \mid \mathrm{wt}(c) \text{ is even}\}$. (Educational note: To use this code, one sends the message in the first $n - 1$ bits and adjusts the last bit so that the word is in $C$. If there is at most one error in the word, this parity bit will tell when there is an error, but it will not tell where the error occurs.)

66. One way to get a sense of what members of a linear code $C$ in $\mathbb{F}^n$ have small weight starts by making a basis for the code into the row vectors of a matrix and row reducing the matrix.
(a) Taking into account the distinction between corner variables and independent variables in the process of row reduction, show that every basis vector of $C$ has weight at most the sum of 1 and the number of independent variables. Conclude that $\dim C + \delta(C) \le n + 1$.
(b) Give an example of a linear code with $\delta(C) = 2$ for which equality holds.
(c) Examining the argument for (a) more closely, show that $2 \le \dim C \le n - 2$ implies $\dim C + \delta(C) \le n$.
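The definitions of weight, Hamming distance, and minimal distance introduced before Problem 63 can be tried out on small examples. The following Python sketch is only an illustration; the function names and the three-word sample code are assumptions made here and do not come from the text. Words are modeled as tuples of 0's and 1's.

```python
from itertools import combinations


def weight(c):
    """Number of nonzero entries of the word c."""
    return sum(1 for bit in c if bit)


def distance(a, b):
    """Hamming distance: number of indices at which a and b differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def minimal_distance(code, n):
    """Smallest distance between distinct code words.

    Uses the convention from the text that a one-word code in F^n
    has minimal distance n + 1.
    """
    words = list(code)
    if len(words) == 1:
        return n + 1
    return min(distance(a, b) for a, b in combinations(words, 2))


# A hypothetical (nonlinear) code in F^5, chosen only for illustration.
sample = [(0, 0, 0, 0, 0), (1, 1, 1, 0, 0), (0, 0, 1, 1, 1)]
delta = minimal_distance(sample, 5)

# Number of errors that can be detected and corrected for this delta.
detectable = delta - 1
correctable = (delta - 1) // 2
```

For this sample the pairwise distances are 3, 3, and 4, so the code detects up to 2 errors and corrects 1.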

67. Let $C$ be a linear code with a basis consisting of the rows of
$\begin{pmatrix} 1&0&0&1&1&0 \\ 0&1&0&1&0&1 \\ 0&0&1&0&1&1 \end{pmatrix}$.
Show that $\delta(C) = 3$. Educational note: Thus for $n = 6$ and $\delta(C) = 3$, we always have $\dim C \le 3$, and equality is possible.

68. (Hamming codes) The Hamming code $C_7$ of order 7 is a certain linear code having $\dim C_7 = 4$ that will be seen to have $\delta(C_7) = 3$. The code words of a basis, with their commas removed, may be taken as
1110000, 1001100, 0101010, 1101001.
The basis may be described as follows. Bits 1, 2, 4 are used as checks. The remaining bits are used to form the standard basis of $\mathbb{F}^4$. What is put in bits 1, 2, 4 is the binary representation of the position of the nonzero entry in positions 3, 5, 6, 7. When all 16 members of $C_7$ are listed in the order dictated by the bits in positions 3, 5, 6, 7, the resulting list is

Decimal value                  Decimal value
in 3, 5, 6, 7    Code word     in 3, 5, 6, 7    Code word
      0          0000000             8          1110000
      1          1101001             9          0011001
      2          0101010            10          1011010
      3          1000011            11          0110011
      4          1001100            12          0111100
      5          0100101            13          1010101
      6          1100110            14          0010110
      7          0001111            15          1111111

For the general members of $C_7$, not just the basis vectors, the check bits in positions 1, 2, 4 may be described as follows: the bit in position 1 is a parity bit for the positions among 3, 5, 6, 7 having a 1 in their binary expansions, the bit in position 2 is a parity bit for the positions among 3, 5, 6, 7 having a 2 in their binary expansions, and the bit in position 4 is a parity bit for the positions among 3, 5, 6, 7 having a 4 in their binary expansions. The Hamming code $C_8$ of order 8 is obtained from $C_7$ by adjoining a parity bit in position 8.
(a) Prove that $\delta(C_7) = 3$. (Educational note: Thus for $n = 7$ and $\delta(C) = 3$, we always have $\dim C \le 4$, and equality is possible.)
(b) Prove that $\delta(C_8) = 4$.
(c) Describe how to form a generalization that replaces $n = 8$ by $n = 2^r$ with $r \ge 3$. The Hamming codes that are obtained will be called $C_{2^r - 1}$ and $C_{2^r}$.
(d) Prove that $\dim C_{2^r - 1} = \dim C_{2^r} = 2^r - r - 1$, $\delta(C_{2^r - 1}) = 3$, and $\delta(C_{2^r}) = 4$.
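The description of the check bits in Problem 68 translates directly into a short program. The Python sketch below is one possible rendering under stated assumptions: the helper names `encode_c7`, `extend_c8`, and `minimal_distance` are hypothetical, words are bit strings read from position 1 on the left, and the most significant of the four message bits is placed in position 3, which is the ordering that the table suggests. The brute-force distance computation is a numerical check, not a substitute for the proofs requested in parts (a) and (b).

```python
DATA_POSITIONS = (3, 5, 6, 7)   # 1-based positions carrying the message bits
CHECK_POSITIONS = (1, 2, 4)     # 1-based positions carrying the parity bits


def encode_c7(value):
    """Encode an integer 0..15 as a word of C7, returned as a 7-character bit string."""
    word = [0] * 8  # index 0 is unused so that indices match positions 1..7
    # The most significant of the four message bits goes in position 3.
    for shift, pos in zip((3, 2, 1, 0), DATA_POSITIONS):
        word[pos] = (value >> shift) & 1
    # The check bit in position c is the parity of the data positions
    # whose binary expansion contains c.
    for c in CHECK_POSITIONS:
        word[c] = sum(word[pos] for pos in DATA_POSITIONS if pos & c) % 2
    return "".join(str(bit) for bit in word[1:])


def extend_c8(word7):
    """Adjoin an overall parity bit in position 8."""
    return word7 + str(word7.count("1") % 2)


def minimal_distance(code):
    """Brute-force minimal Hamming distance of a code with at least two words."""
    words = list(code)
    return min(
        sum(x != y for x, y in zip(a, b))
        for i, a in enumerate(words)
        for b in words[i + 1:]
    )


c7 = [encode_c7(v) for v in range(16)]
c8 = [extend_c8(w) for w in c7]
```

Listing `c7` reproduces the sixteen code words of the table in order, and the brute-force computation gives minimal distances 3 and 4 for `c7` and `c8`, in agreement with the values stated in the problem.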

69. The matrix
$H = \begin{pmatrix} 1&0&1&0&1&0&1 \\ 0&1&1&0&0&1&1 \\ 0&0&0&1&1&1&1 \end{pmatrix}$,
when multiplied by any column vector $c$ in the Hamming code $C_7$, performs the three parity checks done by bits 1, 2, 4 and described in the previous problem. Therefore such a $c$ must have $Hc = 0$.
(a) Prove that the condition works in the reverse direction as well, namely that $Hc = 0$ only if $c$ is in $C_7$.
(b) Deduce that if a received word $r$ is not in $C_7$ and if $r$ is assumed to match some word of $C_7$ except in the $i$th position, then $Hr$ matches the $i$th column of $H$ and this fact determines the integer $i$. (Educational note: Thus there is a simple procedure for testing whether a received word is a code word and for deciding, in the case that it is not a code word, what unique bit to change to convert it into a code word.)

70. Let $r \ge 4$. Prove for $2^{r-1} \le n \le 2^r - 1$ that any linear code $C$ in $\mathbb{F}^n$ with $\delta(C) \ge 3$ has $\dim C \le n - r$. Observe that equality holds for $C = C_{2^r - 1}$.

71. The weight enumerator polynomial of a linear code $C$ is the polynomial $W_C(X, Y)$ in $\mathbb{Z}[X, Y]$ given by $W_C(X, Y) = \sum_{k=0}^{n} N_k(C)\, X^{n-k} Y^k$, where $N_k(C)$ is the number of words of weight $k$ in $C$.
(a) Compute $W_C(X, Y)$ for the following linear codes $C$: the 0 code, the code $\mathbb{F}^n$, the repetition code, the parity code, the code in Problem 67, the Hamming code $C_7$, and the Hamming code $C_8$.
(b) Why is the coefficient of $X^n$ in $W_C(X, Y)$ necessarily equal to 1?
(c) Show that $W_C(X, Y) = \sum_{c \in C} X^{n - \mathrm{wt}(c)}\, Y^{\mathrm{wt}(c)}$.

72. (Cyclic redundancy codes) Cyclic redundancy codes treat blocks of data as coefficients of polynomials in $\mathbb{F}[X]$. With the size $n$ of data blocks fixed, one fixes a monic generating polynomial $G(X) = 1 + a_1 X + \cdots + a_{g-1} X^{g-1} + X^g$ with a nonzero constant term and with degree $g$ suitably less than $n$. Data to be transmitted are provided as members $(b_0, b_1, \dots, b_{n-g-1})$ of $\mathbb{F}^{n-g}$ and are converted into polynomials $B(X) = b_0 + b_1 X + \cdots + b_{n-g-1} X^{n-g-1}$. Then the $n$-tuple of coefficients of $G(X)B(X)$ is transmitted. To decode a polynomial $P(X)$ that is received, one writes $P(X) = G(X)Q(X) + R(X)$ via the division algorithm. If $R(X) = 0$, it is assumed that $P(X)$ is a code word. Otherwise $R(X)$ is definitely not a code word. Thus the code $C$ amounts to the system of coefficients of all polynomials $G(X)B(X)$ with $B(X) = 0$ or $\deg B(X) \le n - g - 1$. A basis of $C$ is obtained by letting $B(X)$ run through the monomials $1, X, \dots, X^{n-g-1}$, and therefore $\dim C = n - g$. Take $G(X) = 1 + X + X^2 + X^4$ and $n \ge 8$. Prove that $\delta(C) = 2$.

73. (CRC-8) The cyclic redundancy code $C$ bearing the name CRC-8 has $G(X) = 1 + X + X^2 + X^8$. Prove that if $8 \le n \le 19$, then $\delta(C) = 4$. (Educational note: It will follow from the theory of finite fields in Chapter IX, together with the problems on coding theory at the end of that chapter, that $n = 255$ plays a special role for this code, and $\delta(C) = 4$ in that case.)
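The encoding and decoding procedure of Problem 72 can be sketched in a few lines of Python. The representation is an assumption made for this sketch: a polynomial over $\mathbb{F} = \mathbb{Z}/2\mathbb{Z}$ is stored as a nonnegative integer whose bit $k$ is the coefficient of $X^k$, so that addition of polynomials is the bitwise exclusive-or. The function names are hypothetical. The final computation is a brute-force check for the single case $n = 8$ and does not replace the proof asked for in the problem.

```python
def poly_mul(a, b):
    """Multiply two polynomials over F_2 (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # add a * X^k for each nonzero coefficient of b
        a <<= 1
        b >>= 1
    return result


def poly_divmod(p, g):
    """Return (q, r) with p = g*q + r over F_2 and r = 0 or deg r < deg g."""
    q = 0
    deg_g = g.bit_length() - 1
    while p and p.bit_length() - 1 >= deg_g:
        shift = p.bit_length() - 1 - deg_g
        q ^= 1 << shift
        p ^= g << shift      # cancel the leading term of p
    return q, p


def encode(g, data):
    """Code word for the data polynomial B(X): the product G(X)B(X)."""
    return poly_mul(g, data)


def looks_like_code_word(g, received):
    """True when the remainder R(X) of the received polynomial is 0."""
    return poly_divmod(received, g)[1] == 0


G = 0b10111                  # G(X) = 1 + X + X^2 + X^4, so g = 4
n = 8
code = [encode(G, b) for b in range(1 << (n - 4))]

# For a linear code the minimal distance is the least weight of a nonzero word.
min_weight = min(bin(c).count("1") for c in code if c)
```

Flipping a single bit of a code word always gives a nonzero remainder here, because $G(X)$ has a nonzero constant term and therefore divides no monomial.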


Problems 74–77 concern categories and functors. Problem 75 assumes knowledge of point-set topology.

74. Let $\mathcal{C}$ be the category of all sets, the morphisms being the functions between sets. Verify that the disjoint union of sets is a coproduct.

75. Let $\mathcal{C}$ be the category of all topological spaces, the morphisms being the continuous functions. Let $S$ be a nonempty set, and let $X_s$ be a topological space for each $s$ in $S$.
(a) Show that the Cartesian product of the spaces $X_s$, with the product topology, is a product of the $X_s$'s.
(b) Show that the disjoint union of the spaces $X_s$, topologized so that a set $E$ is open if and only if its intersection with each $X_s$ is open, is a coproduct of the $X_s$'s.

76. Taking a cue from the example of a category in which products need not exist, exhibit a category in which coproducts need not exist.

77. Let $\mathcal{C}$ be a category having just one object, say $X$, and suppose that every member of $\mathrm{Morph}(X, X)$ is an isomorphism. Prove that $\mathrm{Morph}(X, X)$ is a group under the law of composition for the category. Can every group be realized in this way, up to isomorphism?

Problems 78–80 introduce a notion of duality in category theory and use it to derive Proposition 4.64 from Proposition 4.63. If $\mathcal{C}$ is a category, then the opposite category $\mathcal{C}^{\mathrm{opp}}$ is defined to have $\mathrm{Obj}(\mathcal{C}^{\mathrm{opp}}) = \mathrm{Obj}(\mathcal{C})$ and $\mathrm{Morph}_{\mathcal{C}^{\mathrm{opp}}}(A, B) = \mathrm{Morph}_{\mathcal{C}}(B, A)$. If $\circ$ denotes the law of composition in $\mathcal{C}$, then the law of composition $\circ^{\mathrm{opp}}$ in $\mathcal{C}^{\mathrm{opp}}$ is defined by $g \circ^{\mathrm{opp}} f = f \circ g$ for $f \in \mathrm{Morph}_{\mathcal{C}^{\mathrm{opp}}}(A, B)$ and $g \in \mathrm{Morph}_{\mathcal{C}^{\mathrm{opp}}}(B, C)$.

78. Verify that $\mathcal{C}^{\mathrm{opp}}$ is indeed a category, that $(\mathcal{C}^{\mathrm{opp}})^{\mathrm{opp}} = \mathcal{C}$, and that to pass from a diagram involving objects and morphisms in $\mathcal{C}$ to a corresponding diagram involving the same objects and morphisms considered as in $\mathcal{C}^{\mathrm{opp}}$, one leaves all the vertices and labels alone and reverses the directions of all the arrows. Verify also that the diagram in $\mathcal{C}$ commutes if and only if the diagram in $\mathcal{C}^{\mathrm{opp}}$ commutes.

79. Let $\mathcal{C}$ be the category of all sets, the morphisms in $\mathrm{Morph}_{\mathcal{C}}(A, B)$ being all functions from $A$ to $B$. Show that the morphisms in $\mathrm{Morph}_{\mathcal{C}^{\mathrm{opp}}}(A, B)$ cannot necessarily all be regarded as functions from $A$ to $B$.

80. Suppose that $S$ is a nonempty set and that $\{X_s\}_{s\in S}$ is an object in $\mathcal{C}^S$.
(a) Prove that if $(X, \{p_s\}_{s\in S})$ is a product of $\{X_s\}_{s\in S}$ in $\mathcal{C}$, then $(X, \{p_s\}_{s\in S})$ is a coproduct of $\{X_s\}_{s\in S}$ in $\mathcal{C}^{\mathrm{opp}}$, and that if $(X, \{p_s\}_{s\in S})$ is a coproduct of $\{X_s\}_{s\in S}$ in $\mathcal{C}$, then $(X, \{p_s\}_{s\in S})$ is a product of $\{X_s\}_{s\in S}$ in $\mathcal{C}^{\mathrm{opp}}$.
(b) Show that Proposition 4.64 for $\mathcal{C}$ follows from the validity of Proposition 4.63 for $\mathcal{C}^{\mathrm{opp}}$.

CHAPTER V Theory of a Single Linear Transformation

Abstract. The goal of this chapter is to find finitely many canonical representatives of each similarity class of square matrices with entries in a field and correspondingly of each isomorphism class of linear maps from a finite-dimensional vector space to itself.

Section 1 frames the problem in more detail. Section 2 develops the theory of determinants over a commutative ring with identity in order to be able to work easily with characteristic polynomials $\det(\lambda I - A)$. The discussion is built around the principle of "permanence of identities," which allows for passage from certain identities with integer coefficients to identities with coefficients in the ring in question.

Section 3 introduces the minimal polynomial of a square matrix or linear map. The Cayley–Hamilton Theorem establishes that such a matrix satisfies its characteristic equation, and it follows that the minimal polynomial divides the characteristic polynomial. It is proved that a matrix is similar to a diagonal matrix if and only if its minimal polynomial is the product of distinct factors of degree 1. In combination with the fact that two diagonal matrices are similar if and only if their diagonal entries are permutations of one another, this result solves the canonical-form problem for matrices whose minimal polynomial is the product of distinct factors of degree 1.

Section 4 introduces general projection operators from a vector space to itself and relates them to vector-space direct-sum decompositions with finitely many summands. The summands of a direct-sum decomposition are invariant under a linear map if and only if the linear map commutes with each of the projections associated to the direct-sum decomposition.

Section 5 concerns the Primary Decomposition Theorem, whose subject is the operation of a linear map $L : V \to V$ with $V$ finite-dimensional. The statement is that if $L$ has minimal polynomial $P_1(\lambda)^{l_1} \cdots P_k(\lambda)^{l_k}$ with the $P_j(\lambda)$ distinct monic prime, then $V$ has a unique direct-sum decomposition in which the respective summands are the kernels of the linear maps $P_j(L)^{l_j}$, and moreover the minimal polynomial of the restriction of $L$ to the $j$th summand is $P_j(\lambda)^{l_j}$.

Sections 6–7 concern Jordan canonical form. For the case that the prime factors of the minimal polynomial of a square matrix all have degree 1, the main theorem gives a canonical form under similarity, saying that a given matrix is similar to one in "Jordan form" and that the Jordan form is completely determined up to permutation of the constituent blocks. The theorem applies to all square matrices if the field is algebraically closed, as is the case for $\mathbb{C}$. The theorem is stated and proved in Section 6, and Section 7 shows how to make computations in two different ways.

1. Introduction

This chapter will work with vector spaces over a common field of "scalars," which will be called $\mathbb{K}$. As was observed near the end of Section IV.5, all the results concerning vector spaces in Chapter II remain valid when the scalars are taken from $\mathbb{K}$ rather than just $\mathbb{Q}$ or $\mathbb{R}$ or $\mathbb{C}$. The ring of polynomials in one indeterminate $X$ over $\mathbb{K}$ will be denoted by $\mathbb{K}[X]$.

For the field $\mathbb{C}$ of complex numbers, every nonconstant polynomial in $\mathbb{C}[X]$ has a root, according to the Fundamental Theorem of Algebra (Theorem 1.18). Because of this fact some results in this chapter will take an especially simple form when $\mathbb{K} = \mathbb{C}$, and this simple form will persist for any field with this same property. Accordingly, we make a definition. Let us say that a field $\mathbb{K}$ is algebraically closed if every nonconstant polynomial in $\mathbb{K}[X]$ has a root. We shall work hard in Chapter IX to obtain examples of algebraically closed fields beyond $\mathbb{K} = \mathbb{C}$, but let us mention now what a few of them are.

EXAMPLES.

(1) The subset of $\mathbb{C}$ of all roots of polynomials with rational coefficients is an algebraically closed field.

(2) For each prime $p$, we have seen that any finite field of characteristic $p$ has $p^n$ elements for some $n$. It turns out that there is one and only one field of $p^n$ elements, up to isomorphism, for each $n$. If we align them suitably for fixed $p$ and take their union on $n$, then the result is an algebraically closed field.

(3) If $\mathbb{K}$ is any field, then there exists an algebraically closed field having $\mathbb{K}$ as a subfield. We shall prove this existence in Chapter IX by means of Zorn's Lemma (which appears in Section A5 of the appendix).

The general problem to be addressed in this chapter is to find "canonical forms" for linear maps from finite-dimensional vector spaces to themselves, special ways of realizing the linear maps that bring out some of their properties. Let us phrase a specific problem of this kind completely in terms of linear algebra at first. Then we can rephrase it in terms of a combination of linear algebra and group theory, and we shall see how it fits into a more general context.
In terms of matrices, the specific problem is to find a way of deciding whether two square matrices represent the same linear map in different bases. We know from Proposition 2.17 that if L : V → V is linear on the finite-dimensional vector space V and if A is the matrix of L relative to a particular ordered basis in domain and range, then the matrix B of L in another ordered basis is of the form $B = C^{-1}AC$ for some invertible matrix C, i.e., A and B are similar. Thus one kind of solution to the problem would be to specify one representative of each similarity class of square matrices. But this is not a convenient kind of answer to look for; in fact, the matrices $A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$ and $B = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$ are similar via $C = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, but there is no particular reason to prefer one of A or B to the other. Thus a “canonical form” for detecting similarity will allow more than one representative of each similarity class (but typically only finitely many such representatives), and a supplementary statement will tell us when two such are similar.

So far, the best information that we have about solving this problem concerning square matrices comes from Section II.8. In that section the discussion of eigenvalues gave us some necessary conditions for similarity, but we did not obtain a useful necessary and sufficient condition.

In terms of linear maps, what we seek for a linear L : V → V is to use the geometry of L to construct an ordered basis of V such that L acts in a particularly simple way on that ordered basis. Ideally the description of how L acts on the ordered basis is to be detailed enough so that the matrix of L in that ordered basis is completely determined by the description, even though the ordered basis may not be determined by it. For example, if L were to have a basis of eigenvectors, then the description could be that “L has an ordered basis of eigenvectors with eigenvalues $x_1, \dots, x_n$.” In any ordered basis with this property, the matrix of L would then be diagonal with diagonal entries $x_1, \dots, x_n$.

Suppose then that we have this kind of detailed description of how a linear map L acts on some ordered basis. To what extent is L completely determined? The answer is that L is determined up to an isomorphism of the underlying vector space. In fact, suppose

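that L and M are linear maps from V to itself such that $\begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix} = A = \begin{pmatrix} M \\ \Delta\Delta \end{pmatrix}$ for some ordered bases Γ and Δ. Then

\[
\begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix} = A = \begin{pmatrix} M \\ \Delta\Delta \end{pmatrix}
= \begin{pmatrix} I \\ \Delta\Gamma \end{pmatrix} \begin{pmatrix} M \\ \Gamma\Gamma \end{pmatrix} \begin{pmatrix} I \\ \Gamma\Delta \end{pmatrix}
= \begin{pmatrix} S \\ \Gamma\Gamma \end{pmatrix}^{-1} \begin{pmatrix} M \\ \Gamma\Gamma \end{pmatrix} \begin{pmatrix} S \\ \Gamma\Gamma \end{pmatrix}
= \begin{pmatrix} S^{-1}MS \\ \Gamma\Gamma \end{pmatrix},
\]

where S : V → V is the invertible linear map defined by $\begin{pmatrix} S \\ \Gamma\Gamma \end{pmatrix} = \begin{pmatrix} I \\ \Gamma\Delta \end{pmatrix}$. Hence $L = S^{-1}MS$ and $SL = MS$. In other words, if we think of having two copies of V, one called $V_1$ and the other called $V_2$, that are isomorphic via $S : V_1 \to V_2$, then the effect of M in $V_2$ corresponds under S to the effect of L in $V_1$. In this sense, L is determined up to an isomorphism of V.

Thus we are looking for a geometric description that determines linear maps up to isomorphism. Two linear maps L and M that are related in this way have $L = S^{-1}MS$ for some invertible linear map S. Passing to matrices with respect to some basis, we see that the matrices of L and M are to be similar. Consequently our two problems, one to characterize similarity for matrices and the other to characterize isomorphism for linear maps, come to the same thing.

These two problems have an interpretation in terms of group theory. In the case of n-by-n matrices, the group GL(n, K) of invertible matrices acts on the set of all square matrices of size n by conjugation via $(g, x) \mapsto gxg^{-1}$; the similarity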


classes are exactly the orbits of this group action, and the canonical form is to single out finitely many representatives from each orbit. In the case of linear maps, the group GL(V) of invertible linear maps on the finite-dimensional vector space V acts by conjugation on the set of all linear maps from V into itself; the isomorphism classes of linear maps on V are the orbits, and the canonical form is to single out finitely many representatives from each orbit.

The above problem, whether for matrices or for linear maps, does not have a unique acceptable solution. Nevertheless, the text of this chapter will ultimately concentrate on one such solution, known as the “Jordan canonical form.”

Now that we have brought group theory into the statement of the problem, we can put matters in a more general context: The situation is that some “important” group G acts in an important way on an “interesting” vector space of matrices. The canonical-form problem for this situation is to single out finitely many representatives of each orbit and give a way of deciding, in terms of these representatives, whether two of the given matrices lie in the same orbit.

We shall not pursue the more general problem in the text at this time. However, Problem 1 at the end of the chapter addresses one version beyond the one concerning similarity: to find a canonical form for the action of GL(m, K) × GL(n, K) on m-by-n matrices by $((g, h), x) \mapsto gxh^{-1}$. Some other groups that are important in this sense, besides products of general linear groups, are introduced in Chapter VI, and a problem at the end of Chapter VI reinterprets two theorems of that chapter as further canonical-form theorems under the action of a general linear group.

Let us return to the canonical-form problems for similarity of matrices and isomorphism of linear maps. The basic tool in studying these problems is the characteristic polynomial of a matrix or a linear map, as in Chapter II.
However, we subtly used a special feature of Q and R and C in working with characteristic polynomials in Chapter II: we passed back and forth between the characteristic polynomial det(λI − A) as a polynomial in one indeterminate (defined by its expression after expanding it out) and as a polynomial function of λ, defined for each value of λ in Q or R or C, one value at a time. This passage was legitimate because the homomorphism of the ring of polynomials in one indeterminate over a field to the ring of polynomial functions is one-one when the field is infinite, by Proposition 4.28c or Corollary 1.14. Some care is required, however, in working with general fields, and we begin by supplying the necessary details for justifying manipulations with determinants in a more general setting than earlier.

2. Determinants over Commutative Rings with Identity

Throughout this section let R be a commutative ring with identity. The main case of interest for us at this time will be that R = K[λ] is the polynomial ring in one indeterminate λ over a field K.
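The reason for working in the polynomial ring K[λ] rather than with polynomial functions can be seen in a small computation. The following is a minimal sketch, assuming polynomials are stored as coefficient lists with the constant term first (a representation chosen here for illustration; the helper names are hypothetical). It checks that over the field of p elements the nonzero polynomial $X^p - X$ takes the value 0 at every point, so that the map from polynomials to polynomial functions is not one-one for that field.

```python
# Sketch: a nonzero polynomial over the p-element field Z/pZ that is zero as a function.
# Polynomials are coefficient lists [c0, c1, ...] (lowest degree first); this
# representation and the function names are assumptions made for illustration.

def evaluate(poly, x, p):
    """Evaluate poly at x modulo p by Horner's rule."""
    result = 0
    for coeff in reversed(poly):
        result = (result * x + coeff) % p
    return result

def is_zero_polynomial(poly, p):
    """True when every coefficient is 0 modulo p."""
    return all(c % p == 0 for c in poly)

def is_zero_function(poly, p):
    """True when the polynomial takes the value 0 at every element of Z/pZ."""
    return all(evaluate(poly, x, p) == 0 for x in range(p))

p = 5
# X^p - X: coefficient -1 on X and coefficient 1 on X^p.
x_p_minus_x = [0, -1] + [0] * (p - 2) + [1]

print(is_zero_polynomial(x_p_minus_x, p))  # the polynomial itself is nonzero
print(is_zero_function(x_p_minus_x, p))    # but its values all vanish
```

Over an infinite field no such example exists, which is why the identification was harmless for Q, R, and C.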


The set of n-by-n matrices with entries in R is an abelian group under entry-by-entry addition, and matrix multiplication makes it into a ring with identity. Following tradition, we shall usually write $M_n(R)$ rather than $M_{nn}(R)$ for this ring.

In this section we shall define a determinant function $\det : M_n(R) \to R$ and establish some of its properties. For the case that R is a field, some of our earlier proofs concerning determinants used vector-space concepts (bases, dimensions, and so forth), and these are not available for general R. Yet most of the properties of determinants remain valid for general R because of a phenomenon known as permanence of identities. We shall not try to state a general theorem about this principle but instead will be content to observe a pattern in how the relevant identities are proved.

If A is in $M_n(R)$, we define its determinant to be

\[
\det A = \sum_{\sigma \in S_n} (\operatorname{sgn} \sigma)\, A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)},
\]

in effect converting into a definition the formula obtained in Theorem 2.34d when R is a field. A sample of the kind of identity we have in mind is the formula

\[
\det(AB) = \det A \, \det B \qquad \text{for } A \text{ and } B \text{ in } M_n(R).
\]

The key is that this formula says that two polynomials in $2n^2$ variables, with integer coefficients, are equal whenever arbitrary members of R are substituted for the variables. Thus let us introduce $2n^2$ indeterminates $X_{11}, X_{12}, \dots, X_{nn}$ and $Y_{11}, Y_{12}, \dots, Y_{nn}$ to correspond to these variables. Forming the commutative ring $S = \mathbb{Z}[X_{11}, X_{12}, \dots, X_{nn}, Y_{11}, Y_{12}, \dots, Y_{nn}]$, we assemble the matrices $X = [X_{ij}]$, $Y = [Y_{ij}]$, and $XY = \big[\sum_k X_{ik} Y_{kj}\big]$ in $M_n(S)$. Consider the two members of S given by

\[
\det X \, \det Y = \Big( \sum_{\sigma \in S_n} (\operatorname{sgn} \sigma)\, X_{1\sigma(1)} X_{2\sigma(2)} \cdots X_{n\sigma(n)} \Big) \Big( \sum_{\sigma \in S_n} (\operatorname{sgn} \sigma)\, Y_{1\sigma(1)} Y_{2\sigma(2)} \cdots Y_{n\sigma(n)} \Big)
\]

and

\[
\det(XY) = \sum_{\sigma \in S_n} (\operatorname{sgn} \sigma)\, (XY)_{1\sigma(1)} (XY)_{2\sigma(2)} \cdots (XY)_{n\sigma(n)},
\]

where $(XY)_{ij} = \sum_k X_{ik} Y_{kj}$. If we fix arbitrary elements $x_{11}, x_{12}, \dots, x_{nn}$ and $y_{11}, y_{12}, \dots, y_{nn}$ of Z, then Proposition 4.30 gives us a unique substitution homomorphism $\Psi : S \to \mathbb{Z}$ such that $\Psi(1) = 1$, $\Psi(X_{ij}) = x_{ij}$, and $\Psi(Y_{ij}) = y_{ij}$ for all i and j. Writing $x = [x_{ij}]$ and $y = [y_{ij}]$ and using that matrices with integer entries have $\det(xy) = \det x \det y$ because Z is a subset of the field Q, we


see that $\Psi(\det(XY)) = \Psi(\det X \det Y)$ for each choice of x and y. Since Z is an infinite integral domain and since x and y are arbitrary, Corollary 4.32 allows us to deduce that $\det(XY) = \det X \det Y$ as an equality in S.

Now we pass from an identity in S to an identity in R. Let $1_R$ be the identity in R. Proposition 4.19 gives us a unique homomorphism of rings $\varphi_1 : \mathbb{Z} \to R$ such that $\varphi_1(1) = 1_R$. If we fix arbitrary elements $A_{11}, A_{12}, \dots, A_{nn}$ and $B_{11}, B_{12}, \dots, B_{nn}$ of R, then Proposition 4.30 gives us a unique substitution homomorphism $\Phi : S \to R$ such that $\Phi(1) = \varphi_1(1) = 1_R$, $\Phi(X_{ij}) = A_{ij}$ for all i and j, and $\Phi(Y_{ij}) = B_{ij}$ for all i and j. Applying Φ to the equality $\det(XY) = \det X \det Y$, we obtain the identity we sought, namely

\[
\det(AB) = \det A \, \det B \qquad \text{for } A \text{ and } B \text{ in } M_n(R).
\]
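As a concrete illustration of the identity just obtained, the following sketch evaluates the permutation-sum definition of the determinant over the ring Z/6Z, which has zero divisors and so is not a field, and checks multiplicativity on one pair of matrices. The function names and the choice of ring and matrices are assumptions made here for illustration; a single numerical check is of course not a proof.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation of 0..n-1, computed by counting inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_mod(A, m):
    """Determinant over Z/mZ via the sum over all permutations in S_n."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total % m

def matmul_mod(A, B, m):
    """Matrix product with entries reduced modulo m."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % m for j in range(n)]
            for i in range(n)]

m = 6  # Z/6Z is a commutative ring with identity but not a field
A = [[1, 2, 3], [4, 5, 0], [2, 1, 5]]
B = [[3, 0, 1], [1, 1, 5], [2, 4, 3]]

lhs = det_mod(matmul_mod(A, B, m), m)
rhs = (det_mod(A, m) * det_mod(B, m)) % m
print(lhs == rhs)
```

The permutation sum has n! terms, so this is suitable only for small n; it is used here because it mirrors the definition directly.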

Proposition 5.1. If R is a commutative ring with identity, then the determinant function $\det : M_n(R) \to R$ has the following properties:
(a) $\det(AB) = \det A \det B$,
(b) $\det I = 1$,
(c) $\det A^t = \det A$,
(d) $\det C = \det A + \det B$ if A, B, and C match in all rows but the j-th and if the j-th row of C is the sum of the j-th rows of A and B,
(e) $\det B = r \det A$ if A and B match in all rows but the j-th and if the j-th row of B is equal entry by entry to r times the j-th row of A for some r in R,
(f) $\det A = 0$ if A has two equal rows,
(g) $\det \begin{pmatrix} A & B \\ 0 & D \end{pmatrix} = \det A \det D$ if A is in $M_k(R)$, D is in $M_l(R)$, and k + l = n.

REMARK. Properties (d), (e), and (f) imply that usual steps in manipulating determinants by row reduction continue to be valid.

PROOF. Part (a) was proved above, and parts (c) through (f) may be proved in the same way from the corresponding facts about integer matrices in Section II.7. Part (b) is immediate from the definition. For (g), we first prove the result when the entries are in Q, and then we argue in the same way as with (a) above. When the entries are in Q, row reduction of D allows us to reduce to the case either that D has a row of 0’s or that D is the identity. If D has a row of 0’s, then $\det \begin{pmatrix} A & B \\ 0 & D \end{pmatrix}$ and $\det A \det D$ are both 0 and hence are equal. If D is the identity, then further row reduction shows that $\det \begin{pmatrix} A & B \\ 0 & I \end{pmatrix} = \det \begin{pmatrix} A & 0 \\ 0 & I \end{pmatrix}$, and the right side equals $\det A = \det A \det I$, as required.


Proposition 5.2 (expansion in cofactors). Let R be a commutative ring with identity, let A be in $M_n(R)$, and let $\widetilde{A}_{ij}$ be the member of $M_{n-1}(R)$ obtained by deleting the i-th row and the j-th column from A. Then
(a) for any j, $\det A = \sum_{i=1}^{n} (-1)^{i+j} A_{ij} \det \widetilde{A}_{ij}$, i.e., det A may be calculated by “expansion in cofactors” about the j-th column,
(b) for any i, $\det A = \sum_{j=1}^{n} (-1)^{i+j} A_{ij} \det \widetilde{A}_{ij}$, i.e., det A may be calculated by “expansion in cofactors” about the i-th row.

PROOF. This may be derived in the same way from Proposition 2.36 by using the principle of permanence of identities.

Corollary 5.3 (Vandermonde matrix and determinant). If $r_1, \dots, r_n$ lie in a commutative ring R with identity, then

\[
\det \begin{pmatrix}
1 & 1 & \cdots & 1 \\
r_1 & r_2 & \cdots & r_n \\
r_1^2 & r_2^2 & \cdots & r_n^2 \\
\vdots & \vdots & \ddots & \vdots \\
r_1^{n-1} & r_2^{n-1} & \cdots & r_n^{n-1}
\end{pmatrix}
= \prod_{j > i} (r_j - r_i).
\]

PROOF. The derivation of this from Proposition 5.2 is the same as the derivation of Corollary 2.37 from Proposition 2.35.

Proposition 5.4 (Cramer’s rule). Let R be a commutative ring with identity, let A be in $M_n(R)$, and define $A^{\mathrm{adj}}$ in $M_n(R)$ to be the classical adjoint of A, namely the matrix with entries $A^{\mathrm{adj}}_{ij} = (-1)^{i+j} \det \widetilde{A}_{ji}$, where $\widetilde{A}_{kl}$ is defined as in the statement of Proposition 5.2. Then $A A^{\mathrm{adj}} = A^{\mathrm{adj}} A = (\det A) I$.

PROOF. This may be derived from Proposition 2.38 in the same way as for Propositions 5.1 and 5.2 using the principle of permanence of identities.

Corollary 5.5. Let R be a commutative ring with identity, and let A be in $M_n(R)$. If det A is a unit in R, then A has a two-sided inverse in $M_n(R)$. Conversely if A has a one-sided inverse in $M_n(R)$, then det A is a unit in R.

REMARK. If R is a field, then A and any associated linear map are often called nonsingular if invertible, singular otherwise. When R is not a field, terminology varies for what to call a noninvertible matrix whose determinant is not 0.

PROOF. If det A is a unit in R, let r be its multiplicative inverse. Then Proposition 5.4 shows that $r A^{\mathrm{adj}}$ is a two-sided inverse of A. Conversely if A has, say, a left inverse B, then BA = I implies $(\det B)(\det A) = \det I = 1$, and det B is an inverse for det A. A similar argument applies if A has a right inverse.
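Cramer’s rule and Corollary 5.5 can be tried out over a ring that is not a field. The sketch below, written for R = Z/mZ, builds the classical adjoint from cofactors and inverts a matrix exactly when its determinant is a unit modulo m. The function names are hypothetical, the determinant is computed by the permutation sum for simplicity, and `pow(d, -1, m)` assumes Python 3.8 or later.

```python
from itertools import permutations
from math import gcd

def det(A):
    """Integer determinant by the permutation-sum formula (fine for small n)."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total

def minor(A, i, j):
    """The matrix obtained from A by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def adjugate(A):
    """Classical adjoint: entry (i, j) is (-1)^(i+j) times det of A with row j, column i deleted."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)] for i in range(n)]

def matmul_mod(A, B, m):
    """Matrix product with entries reduced modulo m."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % m for j in range(n)]
            for i in range(n)]

def inverse_mod(A, m):
    """Two-sided inverse of A over Z/mZ if det A is a unit there, otherwise None."""
    d = det(A) % m
    if gcd(d, m) != 1:
        return None  # det A is not a unit in Z/mZ
    r = pow(d, -1, m)  # multiplicative inverse of det A modulo m
    return [[(r * entry) % m for entry in row] for row in adjugate(A)]

A = [[2, 1], [5, 4]]          # det A = 3, a unit modulo 10
print(inverse_mod(A, 10))
print(inverse_mod([[1, 2], [3, 4]], 10))  # det = -2 is not a unit modulo 10
```

The second matrix has nonzero determinant modulo 10 yet no inverse, which is the situation the remark after Corollary 5.5 refers to.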


3. Characteristic and Minimal Polynomials

Again let K be a field. If A is in $M_n(K)$, the characteristic polynomial of A is defined to be the member of the ring K[λ] of polynomials in one indeterminate λ given by $F(\lambda) = \det(\lambda I - A)$. The material of Section 2 shows that F(λ) is well defined, being the determinant of a member of $M_n(K[\lambda])$. It is apparent from the definition of determinant in Section 2 that F(λ) is a monic polynomial of degree n with coefficient $-\operatorname{Tr} A = -\sum_{j=1}^{n} A_{jj}$ for $\lambda^{n-1}$. Evaluating F(λ) at 0, we see that the constant term is $(-1)^n \det A$. Since the determinant of a product in $M_n(K[\lambda])$ is the product of the determinants (Proposition 5.1a) and since $C^{-1}(\lambda I - A)C = \lambda I - C^{-1}AC$, we have

\[
\det(\lambda I - C^{-1}AC) = (\det C)^{-1} \det(\lambda I - A)(\det C) = \det(\lambda I - A).
\]

Thus similar matrices have equal characteristic polynomials. If V is an n-dimensional vector space over K and L : V → V is linear, then the matrices of L in any two ordered bases of V (the domain basis being assumed equal to the range basis) are similar, and their characteristic polynomials are the same. Consequently we can define the characteristic polynomial of L to be the characteristic polynomial of any matrix of L.

The development of characteristic polynomials has thus been redone in a way that is valid over any field K without making use of the ring homomorphism from polynomials in one indeterminate over K to polynomial functions from K into itself. The discussion in Section II.8 of eigenvectors and eigenvalues for members A of $M_n(K)$ and for linear maps L : V → V with V finite-dimensional over K is now meaningful, and there is no need to repeat it. In particular, the eigenvalues of A and L are exactly the roots of their characteristic polynomial, no matter what K is. If K is algebraically closed, then the characteristic polynomial has a root, and consequently A and L each have at least one eigenvalue.
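The definition of F(λ) as a determinant in $M_n(K[\lambda])$ can be carried out literally. The sketch below represents members of K[λ] as coefficient lists (constant term first, an assumption made here for illustration) and expands det(λI − A) by the permutation sum of Section 2. It then checks, on one integer example, the three facts noted above: F is monic with next coefficient −Tr A, its constant term is $(-1)^n \det A$, and a similar matrix gives the same polynomial.

```python
from itertools import permutations

def poly_add(p, q):
    """Sum of two polynomials given as coefficient lists, lowest degree first."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def char_poly(A):
    """Coefficients [c0, ..., cn] of det(lambda*I - A), computed entirely in K[lambda]."""
    n = len(A)
    # Entry (i, j) of lambda*I - A, as a polynomial in lambda.
    B = [[[-A[i][j], 1] if i == j else [-A[i][j]] for j in range(n)] for i in range(n)]
    total = [0]
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
        term = [-1] if inversions % 2 else [1]
        for i in range(n):
            term = poly_mul(term, B[i][perm[i]])
        total = poly_add(total, term)
    return total

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[2, 1, 0], [0, 2, 0], [1, 0, 3]]
C = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
C_inv = [[1, -1, 0], [0, 1, 0], [0, 0, 1]]   # inverse of C, written out by hand

F = char_poly(A)
F_conj = char_poly(matmul(matmul(C_inv, A), C))
print(F)            # coefficients of lambda^3 - 7 lambda^2 + 16 lambda - 12
print(F == F_conj)  # similar matrices give the same characteristic polynomial
```

Because no value is ever substituted for λ, the same computation makes sense over any field whose arithmetic is supplied for the coefficients.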
If L : V → V is linear and V is finite-dimensional, then a vector subspace U of V is said to be invariant under L if L(U) ⊆ U. In this case $L|_U$ is a well-defined linear map from U to itself. Since L(U) ⊆ U, Proposition 2.25 shows that L : V → V factors through V/U as a linear map $\bar{L} : V/U \to V/U$. We shall use this construction, the existence of eigenvalues in the algebraically closed case, and an induction to prove the following.

Proposition 5.6. If K is an algebraically closed field, if V is a finite-dimensional vector space over K, and if L : V → V is linear, then V has an ordered basis in which the matrix of L is upper triangular. Consequently any member of $M_n(K)$ is similar to an upper triangular matrix.

REMARKS. For an upper triangular matrix $A = \begin{pmatrix} c_1 & & * \\ & \ddots & \\ 0 & & c_n \end{pmatrix}$ in $M_n(K)$, the characteristic polynomial is $\prod_{j=1}^{n} (\lambda - c_j)$ because the only nonzero term in the definition of det(λI − A) is the one corresponding to the identity permutation. Triangular form is not yet the canonical form we seek for a square matrix because a particular square matrix may be similar to infinitely many matrices in triangular form.

PROOF. We proceed by induction on n = dim V, with the base case n = 1 being clear. Suppose that the result holds for all linear maps from spaces of dimension < n to themselves. Given L : V → V with dim V = n, let $v_1$ be an eigenvector of L. This exists by the remarks before the proposition since K is algebraically closed. Let U be the vector subspace $Kv_1$. Then L(U) ⊆ U, and Proposition 2.25 shows that L : V → V factors through V/U as a linear map $\bar{L} : V/U \to V/U$. Since dim V/U = n − 1, the inductive hypothesis produces an ordered basis $(\bar{v}_2, \dots, \bar{v}_n)$ of V/U such that the matrix of $\bar{L}$ is upper triangular in this basis. This condition means that $\bar{L}(\bar{v}_j) = \sum_{i=2}^{j} c_{ij} \bar{v}_i$ for j ≥ 2. Select coset representatives $v_2, \dots, v_n$ of $\bar{v}_2, \dots, \bar{v}_n$ so that $\bar{v}_j = v_j + U$ for j ≥ 2. Then $\bar{L}(v_j + U) = \sum_{i=2}^{j} c_{ij} (v_i + U)$ for j ≥ 2, and hence $L(v_j)$ lies in the coset $\sum_{i=2}^{j} c_{ij} v_i + U$ for j ≥ 2. For each j ≥ 1, we then have $L(v_j) = \sum_{i=2}^{j} c_{ij} v_i + c_{1j} v_1$ for some scalar $c_{1j}$, and we see that $(v_1, \dots, v_n)$ is the required ordered basis.

Let us return to the situation in which K is any field. For a matrix A in $M_n(K)$ and a polynomial P in K[λ], it is meaningful to form P(A). We can do so by two equivalent methods, both useful. The concrete way of forming P(A) is as $P(A) = c_n A^n + \cdots + c_1 A + c_0 I$ if $P(\lambda) = c_n \lambda^n + \cdots + c_1 \lambda + c_0$. The abstract way is to form the subring T of $M_n(K)$ generated by KI and A. This subring is commutative. We let ϕ : K → T be given by ϕ(c) = cI. Then the universal mapping property of K[λ] given in Proposition 4.24 produces a unique ring homomorphism Φ : K[λ] → T such that Φ(c) = cI for all c ∈ K and Φ(λ) = A. The value of P(A) is the element Φ(P) of T.

For A in $M_n(K)$, let us study all polynomials P such that P(A) = 0.
For any polynomial P and any invertible matrix C, we have $P(C^{-1}AC) = C^{-1}P(A)C$ because if $P(\lambda) = c_n \lambda^n + \cdots + c_1 \lambda + c_0$, then

\[
P(C^{-1}AC) = c_n (C^{-1}AC)^n + \cdots + c_1 C^{-1}AC + c_0 I = C^{-1}(c_n A^n + \cdots + c_1 A + c_0 I)C.
\]


Consequently if P(A) = 0, then $P(C^{-1}AC) = 0$, and the set of matrices with P(A) = 0 is closed under similarity. We shall make use of this observation a little later in this section.

Proposition 5.7. If A is in $M_n(K)$, then there exists a nonzero polynomial P in K[λ] such that P(A) = 0.

PROOF. The K vector space $M_n(K)$ has dimension $n^2$. Therefore the $n^2 + 1$ matrices $I, A, A^2, \dots, A^{n^2}$ are linearly dependent, and we have

\[
c_0 I + c_1 A + c_2 A^2 + \cdots + c_{n^2} A^{n^2} = 0
\]

for some set of scalars not all 0. Then P(A) = 0 for the polynomial $P(\lambda) = c_0 + c_1 \lambda + c_2 \lambda^2 + \cdots + c_{n^2} \lambda^{n^2}$; this P is not the 0 polynomial since at least one of the coefficients is not 0.

ALTERNATIVE PROOF IF K IS ALGEBRAICALLY CLOSED. Since the set of polynomials P with P(A) = 0 depends only on the similarity class of A, Proposition 5.6 shows that there is no loss of generality in assuming that A is upper triangular, say of the form $\begin{pmatrix} \lambda_1 & & * \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}$. Then $A - \lambda_j I$ is upper triangular with 0 in the j-th diagonal entry, and $\prod_{j=1}^{n} (A - \lambda_j I)$ is upper triangular with 0 in all diagonal entries. Therefore $\big( \prod_{j=1}^{n} (A - \lambda_j I) \big)^n = 0$.

With A fixed, we continue to consider the set of all polynomials P(λ) such that P(A) = 0. Let us think of P(A) as being computed by the abstract procedure described above, namely as the image of P under the ring homomorphism Φ : K[λ] → T such that Φ(c) = cI for all c ∈ K and Φ(λ) = A, where T is the commutative subring of $M_n(K)$ generated by KI and A. Then the set of all polynomials P(λ) with P(A) = 0 is the kernel of the ring homomorphism Φ. This set is therefore an ideal, and Proposition 5.7 shows that the ideal is nonzero. We shall apply the following proposition to this ideal.

Proposition 5.8. If I is a nonzero ideal in K[λ], then there exists a unique monic polynomial of lowest degree in I, and every member of I is the product of this particular polynomial by some other polynomial.

PROOF. Let B(λ) be a nonzero member of I of lowest possible degree; adjusting B by a scalar factor, we may assume that B is monic. If A is in I, then Proposition 1.12 produces polynomials Q and R such that A = BQ + R and either R = 0 or deg R < deg B. Since I is an ideal, BQ is in I and hence R = A − BQ is in I. From minimality of the degree of B, we conclude that R = 0. Hence A = BQ,


and A is exhibited as the product of B and some other polynomial Q. If $B_1$ is a second monic polynomial of lowest degree in I, then we can take $A = B_1$ to see that $B_1 = QB$. Since $\deg B_1 = \deg B$, we conclude that deg Q = 0. Thus Q is a constant polynomial. Comparing the leading coefficients of B and $B_1$, we see that Q(λ) = 1.

With A fixed in $M_n(K)$, let us apply Proposition 5.8 to the ideal of all polynomials P in K[λ] with P(A) = 0. The unique monic polynomial of lowest degree in this ideal is called the minimal polynomial of A. Let us try to identify this minimal polynomial.

Theorem 5.9 (Cayley–Hamilton Theorem). If A is in $M_n(K)$ and if $F(\lambda) = \det(\lambda I - A)$ is its characteristic polynomial, then F(A) = 0.

PROOF. Let T be the commutative subring of $M_n(K)$ generated by KI and A, and define a member B(λ) of the ring $M_n(K[\lambda])$ by $B(\lambda) = \lambda I - A$. The (i, j)-th entry of B(λ) is $B_{ij}(\lambda) = \delta_{ij} \lambda - A_{ij}$, and $F(\lambda) = \det B(\lambda)$. Let $C(\lambda) = B(\lambda)^{\mathrm{adj}}$ denote the classical adjoint of B(λ) as a member of $M_n(K[\lambda])$; the form of C(λ) is given in the statement of Cramer’s rule (Proposition 5.4), and that proposition says that

\[
B(\lambda) C(\lambda) = (\det B(\lambda)) I = F(\lambda) I.
\]

The equality in the (i, j)-th entry is the equality $\delta_{ij} F(\lambda) = \sum_k B_{ik}(\lambda) C_{kj}(\lambda)$ of members of K[λ]. Application of the substitution homomorphism λ ↦ A of K[λ] into T gives

\[
\delta_{ij} F(A) = \sum_k B_{ik}(A) C_{kj}(A) = \sum_k (\delta_{ik} A - A_{ik} I)\, C_{kj}(A).
\]

Multiplying on the right by the i-th standard basis vector $e_i$ and summing on i, we obtain the equality of vectors

\[
F(A) e_j = \sum_i \sum_k (\delta_{ik} A - A_{ik} I)\, C_{kj}(A)\, e_i = \sum_k C_{kj}(A) \sum_i (\delta_{ik} A e_i - A_{ik} e_i),
\]

since $C_{kj}(A)$ lies in the commutative ring T and therefore commutes with A. But $\sum_i (\delta_{ik} A e_i - A_{ik} e_i) = A e_k - \sum_i A_{ik} e_i = 0$ for all k, and therefore $F(A) e_j = 0$. Since j is arbitrary, F(A) = 0.

Corollary 5.10. If A is in $M_n(K)$, then the minimal polynomial of A divides the characteristic polynomial of A.

PROOF. Theorem 5.9 shows that the characteristic polynomial of A lies in the ideal of all polynomials vanishing on A. Then the corollary follows from Proposition 5.8.
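The linear-dependence idea in the proof of Proposition 5.7 also gives a way to compute a minimal polynomial. The sketch below, an illustration under the assumption K = Q with exact `Fraction` arithmetic, looks for the first power $A^k$ that is a linear combination of $I, A, \dots, A^{k-1}$; the resulting monic polynomial is the minimal polynomial. It also evaluates polynomials at A by Horner's rule so that the Cayley–Hamilton Theorem can be checked on an example. All function names are hypothetical, and the characteristic polynomial used in the check was expanded by hand.

```python
from fractions import Fraction

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def flatten(A):
    return [x for row in A for x in row]

def express(target, basis):
    """Solve sum_i c_i * basis[i] = target over Q; return the c_i, or None if impossible."""
    rows, cols = len(target), len(basis)
    M = [[Fraction(basis[j][i]) for j in range(cols)] + [Fraction(target[i])]
         for i in range(rows)]
    pivot_row, pivot_cols = 0, []
    for col in range(cols):
        pr = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pr is None:
            continue
        M[pivot_row], M[pr] = M[pr], M[pivot_row]
        pivot = M[pivot_row][col]
        M[pivot_row] = [x / pivot for x in M[pivot_row]]
        for r in range(rows):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[pivot_row])]
        pivot_cols.append(col)
        pivot_row += 1
    # A row 0 = nonzero means the target is not in the span.
    if any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in M):
        return None
    coeffs = [Fraction(0)] * cols
    for r, col in enumerate(pivot_cols):
        coeffs[col] = M[r][-1]
    return coeffs

def minimal_polynomial(A):
    """Monic polynomial of least degree vanishing at A, coefficients lowest degree first."""
    powers = [identity(len(A))]            # I, A, A^2, ... (kept linearly independent)
    while True:
        nxt = matmul(powers[-1], A)
        coeffs = express(flatten(nxt), [flatten(P) for P in powers])
        if coeffs is not None:
            # A^k = sum c_i A^i, so lambda^k - sum c_i lambda^i vanishes at A.
            return [-c for c in coeffs] + [Fraction(1)]
        powers.append(nxt)

def evaluate(poly, A):
    """P(A) by Horner's rule, for P given by its coefficient list."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(poly):
        result = matmul(result, A)
        for i in range(n):
            result[i][i] += c
    return result

A = [[2, 1, 0], [0, 2, 0], [0, 0, 2]]
M = minimal_polynomial(A)                  # (lambda - 2)^2
F = [-8, 12, -6, 1]                        # (lambda - 2)^3, expanded by hand
print(M)
print(evaluate(F, A))                      # the zero matrix, as Theorem 5.9 predicts
```

In this example F(λ) = (λ − 2) M(λ), in line with Corollary 5.10.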


For our matrix A in $M_n(K)$, let F(λ) be the characteristic polynomial, and let M(λ) be the minimal polynomial. By unique factorization (Theorem 1.17), the monic polynomial F(λ) has a factorization into powers of distinct prime monic polynomials of the form

\[
F(\lambda) = P_1(\lambda)^{k_1} \cdots P_r(\lambda)^{k_r},
\]

and this factorization is unique up to the order of the factors. Since M(λ) is a monic polynomial dividing F(λ), we must have

\[
M(\lambda) = P_1(\lambda)^{l_1} \cdots P_r(\lambda)^{l_r}
\]

with $l_1 \le k_1, \dots, l_r \le k_r$, by the same argument that deduced Corollary 1.7 from unique factorization in the ring of integers. We shall see shortly that $k_j > 0$ implies $l_j > 0$ if $P_j(\lambda)$ is of degree 1, i.e., if $P_j(\lambda)$ is of the form $\lambda - \lambda_0$; in other words, if $\lambda_0$ is an eigenvalue of A, then $\lambda - \lambda_0$ divides its minimal polynomial. We return to this point in a moment. Problem 31 at the end of the chapter will address the same question when $P_j(\lambda)$ has degree > 1.

EXAMPLES.
(1) In the 2-by-2 case, $\begin{pmatrix} c & 0 \\ 0 & c \end{pmatrix}$ has minimal polynomial $M(\lambda) = \lambda - c$, and $\begin{pmatrix} c & 1 \\ 0 & c \end{pmatrix}$ has $M(\lambda) = (\lambda - c)^2$. Both matrices have characteristic polynomial $F(\lambda) = (\lambda - c)^2$.
(2) The k-by-k matrix

\[
\begin{pmatrix}
c & 1 & 0 & \cdots & 0 & 0 \\
0 & c & 1 & \cdots & 0 & 0 \\
 & & & \ddots & & \\
0 & 0 & 0 & \cdots & c & 1 \\
0 & 0 & 0 & \cdots & 0 & c
\end{pmatrix}
\]

with c in every diagonal entry, with 1 in every entry just above the diagonal, and with 0 elsewhere has minimal polynomial $M(\lambda) = (\lambda - c)^k$ and characteristic polynomial $F(\lambda) = (\lambda - c)^k$.
(3) If a matrix A is made up exclusively of several blocks of the type in Example 2 with the same c in each case, the i-th block being of size $k_i$, then the minimal polynomial is $M(\lambda) = (\lambda - c)^{\max_i k_i}$, and the characteristic polynomial is $F(\lambda) = (\lambda - c)^{\sum_i k_i}$.
(4) If A is made up exclusively of several blocks as in Example 3 but with c different for each block, then the minimal and characteristic polynomials for A are obtained by multiplying the minimal and characteristic polynomials obtained from Example 3 for the various c’s.
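Examples 2 and 3 can be checked numerically. For a matrix built from blocks with a common c, the minimal polynomial is $(\lambda - c)^e$, where e is the smallest exponent with $(A - cI)^e = 0$. The following sketch, with hypothetical helper names and integer entries chosen for illustration, builds such a matrix and finds that exponent.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def jordan_block(c, k):
    """k-by-k matrix with c on the diagonal, 1 just above it, and 0 elsewhere."""
    return [[c if j == i else (1 if j == i + 1 else 0) for j in range(k)] for i in range(k)]

def block_diag(blocks):
    """Square matrix with the given square blocks down the diagonal and 0 elsewhere."""
    n = sum(len(b) for b in blocks)
    out = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, x in enumerate(row):
                out[offset + i][offset + j] = x
        offset += len(b)
    return out

def shift(A, c):
    """The matrix A - c*I."""
    n = len(A)
    return [[A[i][j] - (c if i == j else 0) for j in range(n)] for i in range(n)]

def nilpotency_index(N):
    """Smallest e >= 1 with N^e = 0, or None if no power up to the matrix size vanishes."""
    power = N
    for e in range(1, len(N) + 1):
        if all(x == 0 for row in power for x in row):
            return e
        power = matmul(power, N)
    return None

c = 5
sizes = [3, 1, 2]
A = block_diag([jordan_block(c, k) for k in sizes])

print(nilpotency_index(shift(A, c)), max(sizes))   # exponent in M equals the largest block size
print(len(A), sum(sizes))                          # exponent in F equals the total size
```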


To proceed further, let us change our point of view, working with linear maps L : V → V, where V is a finite-dimensional vector space over K. We have already defined the characteristic polynomial of L to be the characteristic polynomial of the matrix of L in any ordered basis; this is well defined because similar matrices have the same characteristic polynomial. In analogous fashion we can define the minimal polynomial of L to be the minimal polynomial of the matrix of L in any ordered basis; this is well defined since, as we have seen, the set of polynomials P in one indeterminate with P(A) = 0 is the same as the set with $P(C^{-1}AC) = 0$ if C is invertible.

Another way of approaching the matter of the minimal polynomial of L is to define P(L) for any polynomial P in one indeterminate. As with matrices, we can define P(L) either concretely by substituting L for λ in the expression for P(λ), or we can define P(L) abstractly by appealing to the universal mapping property in Proposition 4.24. For the latter we work with the subring T of linear maps from V to itself generated by KI and L. This subring is commutative. We let ϕ : K → T be given by ϕ(c) = cI, and we use Proposition 4.24 to obtain the unique ring homomorphism Φ : K[λ] → T such that Φ(c) = cI for all c ∈ K and Φ(λ) = L. Then P(L) is the element Φ(P) of T. Once P(L) is defined, we observe that the set of polynomials P(λ) such that P(L) = 0 is a nonzero ideal in K[λ]; Proposition 5.8 yields a unique monic polynomial of lowest degree in this ideal, and that is the minimal polynomial of L.

Linear maps enable us to make convenient use of invariant subspaces. Recall from earlier in the section that a vector subspace U of V is said to be invariant under the linear map L : V → V if L(U) ⊆ U; in this case we obtain associated linear maps $L|_U : U \to U$ and $\bar{L} : V/U \to V/U$. Relationships among the characteristic polynomials and minimal polynomials of these linear maps are given in the next two propositions.

Proposition 5.11. Let V be a finite-dimensional vector space over K, let L : V → V be linear, let U be a proper nonzero invariant subspace under L, and let $\bar{L} : V/U \to V/U$ be the induced linear map on V/U. Then the characteristic polynomials of L, $L|_U$, and $\bar{L}$ are related by

\[
\det(\lambda I - L) = \det(\lambda I - L|_U) \, \det(\lambda I - \bar{L}).
\]

PROOF. Let $\Gamma_U = (v_1, \dots, v_k)$ be an ordered basis of U, and extend $\Gamma_U$ to an ordered basis $\Gamma = (v_1, \dots, v_n)$ of V. Then $\bar{\Gamma} = (v_{k+1} + U, \dots, v_n + U)$ is an ordered basis of V/U. Since U is invariant under L, the matrix of L in the ordered basis Γ is of the form $\begin{pmatrix} A & B \\ 0 & D \end{pmatrix}$, where A is the matrix of $L|_U$ in the ordered basis $\Gamma_U$ and D is the matrix of $\bar{L}$ in the ordered basis $\bar{\Gamma}$. Passing to the characteristic polynomials and applying Proposition 5.1g, we obtain the desired conclusion.


Proposition 5.12. Let V be a finite-dimensional vector space over K, let L : V → V be linear, let U be a proper nonzero invariant subspace under L, and let $\bar{L} : V/U \to V/U$ be the induced linear map on V/U. Then the minimal polynomials of $L|_U$ and $\bar{L}$ divide the minimal polynomial of L.

PROOF. Let N(λ) be the minimal polynomial of $L|_U$. Then N(λ) is the unique monic polynomial of lowest degree in the ideal of all polynomials P(λ) such that P(L)u = 0 for all u in U. The minimal polynomial M(λ) of L has this property because M(L)v = 0 for all v in V. Therefore M(λ) is in the ideal and is the product of N(λ) and some other polynomial.

Among linear maps S from V into V carrying U into itself, the function $S \mapsto \bar{S}$ sending S to the linear map $\bar{S}$ induced on V/U is a homomorphism of rings. It follows that if P(λ) is a polynomial with P(L) = 0, then $P(\bar{L}) = 0$. Taking P(λ) to be the minimal polynomial of L, we see that the minimal polynomial of L is in the ideal of polynomials vanishing on $\bar{L}$. Therefore it is the product of the minimal polynomial of $\bar{L}$ and some other polynomial.

Let us come back to the unproved assertion before the examples, namely that $k_j > 0$ implies $l_j > 0$ if $P_j(\lambda)$ has degree 1. We prove the linear-function version of this statement as a corollary of Proposition 5.12.

Corollary 5.13. If L : V → V is linear on a finite-dimensional vector space over K and if a first-degree polynomial $\lambda - \lambda_0$ divides the characteristic polynomial of L, then $\lambda - \lambda_0$ divides the minimal polynomial of L.

PROOF. If $\lambda - \lambda_0$ divides the characteristic polynomial, then $\lambda_0$ is an eigenvalue of L, say with v as an eigenvector. Then U = Kv is an invariant subspace under L, and the characteristic and minimal polynomials of $L|_U$ are both $\lambda - \lambda_0$. By Proposition 5.12, $\lambda - \lambda_0$ divides the minimal polynomial of L.

Theorem 5.14. If L : V → V is linear on a finite-dimensional vector space over K, then L has a basis of eigenvectors if and only if the minimal polynomial M(λ) of L is the product of distinct factors of degree 1; in this case, M(λ) equals $(\lambda - \lambda_1) \cdots (\lambda - \lambda_k)$, where $\lambda_1, \dots, \lambda_k$ are the distinct eigenvalues of L. Consequently a matrix A in $M_n(K)$ is similar to a diagonal matrix if and only if its minimal polynomial is the product of distinct factors of degree 1.

PROOF. For the easy direction, suppose that $v_1, \dots, v_n$ are the members of a basis of eigenvectors for L with respective eigenvalues $\mu_1, \dots, \mu_n$. In this case, let $\lambda_1, \dots, \lambda_k$ be the distinct members of the set of eigenvalues, with $\mu_i = \lambda_{j(i)}$ for some function j : {1, . . . , n} → {1, . . . , k}. Then $(L - \lambda_j I)(v) = 0$ for v equal to any $v_i$ with j(i) = j. Since the linear maps $L - \lambda_j I$ commute as j varies, $\prod_{j=1}^{k} (L - \lambda_j I)(v) = 0$ for v equal to each of $v_1, \dots, v_n$, hence for all v. Therefore


the minimal polynomial $M(\lambda)$ of $L$ divides $\prod_{j=1}^{k} (\lambda - \lambda_j)$. On the other hand, Corollary 5.13 shows that $\deg M(\lambda) \ge k$. Hence $M(\lambda) = \prod_{j=1}^{k} (\lambda - \lambda_j)$.

Conversely suppose that $M(\lambda) = \prod_{j=1}^{k} (\lambda - \lambda_j)$ with the $\lambda_j$ distinct. If $S_1$ is the linear map $S_1 = \prod_{j=2}^{k} (L - \lambda_j I)$, then the formula for $M(\lambda)$ shows that $(L - \lambda_1 I)S_1(v) = 0$ for all $v$ in $V$, and hence $\operatorname{image} S_1$ is a vector subspace of the eigenspace of $L$ for the eigenvalue $\lambda_1$. If $v$ is in $\ker S_1 \cap \operatorname{image} S_1$, we then have
\[ 0 = S_1(v) = \prod_{j=2}^{k} (L - \lambda_j I)(v) = \prod_{j=2}^{k} (\lambda_1 - \lambda_j)\, v. \]
Since $\lambda_1$ is distinct from $\lambda_2, \dots, \lambda_k$, we conclude that $v = 0$, hence that $\ker S_1 \cap \operatorname{image} S_1 = 0$. Since $\dim \ker S_1 + \dim \operatorname{image} S_1 = \dim V$, Corollary 2.29 therefore gives
\[ \dim V = \dim \ker S_1 + \dim \operatorname{image} S_1 = \dim(\ker S_1 + \operatorname{image} S_1) + \dim(\ker S_1 \cap \operatorname{image} S_1) = \dim(\ker S_1 + \operatorname{image} S_1). \]
Hence $V = \ker S_1 + \operatorname{image} S_1$. Since $\ker S_1 \cap \operatorname{image} S_1 = 0$, we conclude that $V = \ker S_1 \oplus \operatorname{image} S_1$.

Actually, the same calculation of $S_1(v)$ as above shows that $\operatorname{image} S_1$ is the full eigenspace of $L$ for the eigenvalue $\lambda_1$. In fact, if $L(v) = \lambda_1 v$, then $S_1(v) = \prod_{j=2}^{k} (\lambda_1 - \lambda_j)\, v$, and hence $v$ equals the image under $S_1$ of $\bigl(\prod_{j=2}^{k} (\lambda_1 - \lambda_j)\bigr)^{-1} v$.

Next, since $L$ commutes with $S_1$, $\ker S_1$ is an invariant subspace under $L$, and $\lambda_1$ is not an eigenvalue of $L|_{\ker S_1}$. Thus $\lambda - \lambda_1$ does not divide the minimal polynomial of $L|_{\ker S_1}$. On the other hand, $S_1$ vanishes on the eigenspaces of $L$ for the eigenvalues $\lambda_2, \dots, \lambda_k$, and Corollary 5.13 shows for $j \ge 2$ that $\lambda - \lambda_j$ divides the minimal polynomial of $L|_{\ker S_1}$. Taking Proposition 5.12 into account, we conclude that $L|_{\ker S_1}$ has minimal polynomial $\prod_{j=2}^{k} (\lambda - \lambda_j)$. We have succeeded in splitting off the eigenspace of $L$ for $\lambda_1$ as a direct summand and reducing the theorem to the case of $k - 1$ eigenvalues. Thus induction shows that $\ker S_1$ is the direct sum of the eigenspaces of $L$ for the eigenvalues $\lambda_2, \dots, \lambda_k$. Hence $V$ is the direct sum of the eigenspaces of $L$ for $\lambda_1, \dots, \lambda_k$, and $L$ thus has a basis of eigenvectors.
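The criterion of Theorem 5.14 can be tried on small matrices. The following is only a sketch under stated assumptions: the helper names are hypothetical, the caller supplies the list of distinct eigenvalues (the code does not compute them), and exact rational arithmetic is used. It tests whether $\prod_j (A - \lambda_j I) = 0$, which by the theorem holds exactly when $A$ is similar to a diagonal matrix.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shifted(A, lam):
    """Return the matrix A - lam * I with exact rational entries."""
    n = len(A)
    return [[Fraction(A[i][j]) - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]

def product_of_shifts_is_zero(A, distinct_eigenvalues):
    """Check whether the product of (A - lam I) over the given eigenvalues is 0.

    Assumption: distinct_eigenvalues lists every eigenvalue of A once.
    """
    n = len(A)
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for lam in set(distinct_eigenvalues):
        P = mat_mul(P, shifted(A, lam))
    return all(x == 0 for row in P for x in row)

A = [[2, 1], [0, 3]]   # distinct eigenvalues 2 and 3
B = [[2, 1], [0, 2]]   # only eigenvalue 2, but B - 2I is not 0

print(product_of_shifts_is_zero(A, [2, 3]))  # True: A is diagonalizable
print(product_of_shifts_is_zero(B, [2]))     # False: B is not diagonalizable
```

For $B$ the minimal polynomial is $(\lambda - 2)^2$, which has a repeated factor, so the test fails as the theorem predicts.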

Theorem 5.14 comes close to solving the canonical-form problem for similarity for one kind of square matrix: if the minimal polynomial of $A$ is the product of distinct factors of degree 1, then $A$ is similar to a diagonal matrix. To complete the solution for this case, all we have to do is to say when two diagonal matrices are similar to each other; this step is handled by the following easy proposition.

Proposition 5.15. Two diagonal matrices $A$ and $A'$ in $M_n(\mathbb{K})$ with respective diagonal entries $d_1, \dots, d_n$ and $d'_1, \dots, d'_n$ are similar if and only if there is a permutation $\sigma$ in $S_n$ such that $d'_j = d_{\sigma(j)}$ for all $j$.


PROOF. The respective characteristic polynomials are $\prod_{j=1}^{n} (\lambda - d_j)$ and $\prod_{j=1}^{n} (\lambda - d'_j)$. If $A$ and $A'$ are similar, then the characteristic polynomials are equal, and unique factorization (Theorem 1.17) shows that the factors $\lambda - d_j$ match the factors $\lambda - d'_j$ up to order. Conversely if there is a permutation $\sigma$ in $S_n$ such that $d'_j = d_{\sigma(j)}$ for all $j$, then the matrix $C$ whose $j$th column is $e_{\sigma(j)}$ has the property that $A' = C^{-1}AC$.

To proceed further with obtaining canonical forms for matrices under similarity and for linear maps under isomorphism, we shall use linear maps in ways that we have not used them before. In particular, it will be convenient to be able to recognize direct-sum decompositions from properties of linear maps. We take up this matter in the next section.
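As a concrete check of the converse direction in Proposition 5.15, the sketch below builds the matrix $C$ whose $j$th column is $e_{\sigma(j)}$ and confirms that conjugating by it permutes the diagonal entries. The sample entries, the permutation, the 0-based indexing, and the helper names are all illustrative assumptions; the code also uses the standard fact that the inverse of a permutation matrix is its transpose.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def diag(entries):
    """Diagonal matrix with the given diagonal entries."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

def permutation_matrix(sigma):
    """Matrix whose j-th column is e_{sigma(j)} (indices are 0-based here)."""
    n = len(sigma)
    return [[1 if i == sigma[j] else 0 for j in range(n)] for i in range(n)]

d = [5, 7, 9]                                # diagonal entries of A
sigma = [2, 0, 1]                            # sigma(0)=2, sigma(1)=0, sigma(2)=1
d_prime = [d[sigma[j]] for j in range(3)]    # d'_j = d_{sigma(j)}

C = permutation_matrix(sigma)
C_inv = transpose(C)                         # valid because C is a permutation matrix
conjugated = mat_mul(mat_mul(C_inv, diag(d)), C)

print(d_prime)                               # [9, 5, 7]
print(conjugated == diag(d_prime))           # True
```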

4. Projection Operators

In this section we shall see how to recognize direct-sum decompositions of a vector space $V$ from the associated projection operators, and we shall relate these operators to invariant subspaces under a linear map $L : V \to V$.

If $V = U_1 \oplus U_2$, then the function $E_1$ defined by $E_1(u_1 + u_2) = u_1$ when $u_1$ is in $U_1$ and $u_2$ is in $U_2$ is linear, satisfies $E_1^2 = E_1$, and has $\operatorname{image} E_1 = U_1$ and $\ker E_1 = U_2$. We call $E_1$ the projection of $V$ on $U_1$ along $U_2$. A decomposition of $V$ as the direct sum of two vector spaces, when the first of the two spaces is singled out, therefore determines a projection operator uniquely. A converse is as follows.

Proposition 5.16. If $V$ is a vector space and $E_1 : V \to V$ is a linear map such that $E_1^2 = E_1$, then there exists a direct-sum decomposition $V = U_1 \oplus U_2$ such that $E_1$ is the projection of $V$ on $U_1$ along $U_2$. In this case, $(I - E_1)^2 = I - E_1$, and $I - E_1$ is the projection of $V$ on $U_2$ along $U_1$.

PROOF. Define $U_1 = \operatorname{image} E_1$ and $U_2 = \ker E_1$. If $v$ is in $\operatorname{image} E_1 \cap \ker E_1$, then $E_1(v) = 0$ since $v$ is in $\ker E_1$, and $v = E_1(w)$ for some $w$ in $V$ since $v$ is in $\operatorname{image} E_1$. Then $0 = E_1(v) = E_1^2(w) = E_1(w) = v$, and therefore $\operatorname{image} E_1 \cap \ker E_1 = 0$. If $v \in V$ is given, write $v = E_1(v) + (I - E_1)(v)$. Then $E_1(v)$ is in $\operatorname{image} E_1$, and the computation
\[ E_1(I - E_1)(v) = (E_1 - E_1^2)(v) = (E_1 - E_1)(v) = 0 \]
shows that $(I - E_1)(v)$ is in $\ker E_1$. Consequently $V = \operatorname{image} E_1 + \ker E_1$, and we conclude that $V = \operatorname{image} E_1 \oplus \ker E_1$. Hence $V = U_1 \oplus U_2$, where $U_1 = \operatorname{image} E_1$ and $U_2 = \ker E_1$. In this notation, $E_1$ is 0 on $U_2$. If $v$ is in $U_1$, then $v = E_1(w)$ for some $w$, and we have


$v = E_1(w) = E_1^2(w) = E_1(E_1(w)) = E_1(v)$. Thus $E_1$ is the identity on $U_1$ and is the projection as asserted.

For $(I - E_1)^2$, we have $(I - E_1)^2 = I - 2E_1 + E_1^2 = I - 2E_1 + E_1 = I - E_1$, and $I - E_1$ is a projection. It is the identity on $U_2$ and is 0 on $U_1$, hence is the projection of $V$ on $U_2$ along $U_1$.

Let us generalize these considerations to the situation that $V$ is the direct sum of $r$ vector subspaces. The following facts about the situation in Proposition 5.16, with the definition $E_2 = I - E_1$, are relevant to formulating the generalization:
  (i) $E_1$ and $E_2$ have $E_1^2 = E_1$ and $E_2^2 = E_2$,
  (ii) $E_1E_2 = E_2E_1 = 0$,
  (iii) $E_1 + E_2 = I$.

Suppose that $V = U_1 \oplus \cdots \oplus U_r$. Define $E_j(u_1 + \cdots + u_r) = u_j$. Then $E_j$ is linear from $V$ to itself with $E_j^2 = E_j$, and Proposition 5.16 shows that $E_j$ is the projection of $V$ on $U_j$ along the direct sum of the remaining $U_i$'s. The linear maps $E_1, \dots, E_r$ then satisfy
  (i′) $E_j^2 = E_j$ for $1 \le j \le r$,
  (ii′) $E_jE_i = 0$ if $i \ne j$,
  (iii′) $E_1 + \cdots + E_r = I$.
A converse is as follows.

Proposition 5.17. If $V$ is a vector space and $E_j : V \to V$ for $1 \le j \le r$ are linear maps such that
  (a) $E_jE_i = 0$ if $i \ne j$, and
  (b) $E_1 + \cdots + E_r = I$,
then $E_j^2 = E_j$ for $1 \le j \le r$ and the vector subspaces $U_j = \operatorname{image} E_j$ have the properties that $V = U_1 \oplus \cdots \oplus U_r$ and that $E_j$ is the projection of $V$ on $U_j$ along the direct sum of all $U_i$ but $U_j$.

PROOF. Multiplying (b) through by $E_j$ on the left and applying (a) to each term on the left side except the $j$th, we obtain $E_j^2 = E_j$. Therefore, for each $j$, $E_j$ is a projection on $U_j$ along some vector subspace depending on $j$. If $v$ is in $V$, then (b) gives $v = E_1(v) + \cdots + E_r(v)$ and shows that $V = U_1 + \cdots + U_r$. Suppose that $v$ is in the intersection of $U_j$ with the sum of the other $U_i$'s. Write $v = \sum_{i \ne j} u_i$ with $u_i = E_i(w_i)$ in $U_i$. Applying $E_j$ and using the fact that $v$ is in $U_j$, we obtain
\[ v = E_j(v) = \sum_{i \ne j} E_jE_i(w_i). \]
Every term of the right side is 0 by (a), and hence $v = 0$. Thus $V = U_1 \oplus \cdots \oplus U_r$.

Since $E_jE_i = 0$ for $i \ne j$, $E_j$ is 0 on each $U_i$ for $i \ne j$. Therefore the sum of all $U_i$ except $U_j$ is contained in the kernel of $E_j$. Since the image and kernel of $E_j$ intersect in 0 and since $V$ is the direct sum of $U_j$ and the sum of the other $U_i$'s, the sum of all $U_i$ except $U_j$ is exactly equal to the kernel of $E_j$. This completes the proof.
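The hypotheses and conclusion of Proposition 5.17 can be checked by hand on a tiny example. The sketch below assumes a hypothetical decomposition of the plane into $U_1 = \operatorname{span}\{(1,0)\}$ and $U_2 = \operatorname{span}\{(1,1)\}$; the matrices `E1` and `E2` are the corresponding projections in the standard basis, and the helper names are illustrative only.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    """Add two matrices of the same shape entry by entry."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

I2 = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]

# Assumed example: E1 sends (x, y) to (x - y, 0), E2 sends (x, y) to (y, y).
E1 = [[1, -1], [0, 0]]   # projection on span{(1,0)} along span{(1,1)}
E2 = [[0, 1], [0, 1]]    # projection on span{(1,1)} along span{(1,0)}

# Hypotheses (a) and (b) of Proposition 5.17.
print(mat_mul(E1, E2) == ZERO and mat_mul(E2, E1) == ZERO)  # True
print(mat_add(E1, E2) == I2)                                 # True

# Conclusion: each E_j is automatically idempotent.
print(mat_mul(E1, E1) == E1 and mat_mul(E2, E2) == E2)      # True
```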


Proposition 5.18. Suppose that a vector space $V$ is a direct sum $V = U_1 \oplus \cdots \oplus U_r$ of vector subspaces, that $E_1, \dots, E_r$ are the corresponding projections, and that $L : V \to V$ is linear. Then all the subspaces $U_j$ are invariant under $L$ if and only if $LE_j = E_jL$ for all $j$.

PROOF. If $L(U_j) \subseteq U_j$ for all $j$, then $i \ne j$ implies $E_iL(U_j) \subseteq E_i(U_j) = 0$ and $LE_i(U_j) = L(0) = 0$. Also, $v \in U_j$ implies $E_jL(v) = L(v) = LE_j(v)$. Hence $E_iL = LE_i$ for all $i$. Conversely if $E_jL = LE_j$ and if $v$ is in $U_j$, then $E_jL(v) = LE_j(v) = L(v)$ shows that $L(v)$ is in $U_j$. Therefore $L(U_j) \subseteq U_j$ for all $j$.
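Proposition 5.18 can likewise be illustrated numerically. In the sketch below the decomposition $U_1 = \operatorname{span}\{(1,0)\}$, $U_2 = \operatorname{span}\{(1,1)\}$ and the two sample maps are assumptions made for illustration: `L_good` leaves both subspaces invariant, while `L_bad` moves $(1,1)$ out of $U_2$.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutes(L, E):
    """Return True when L E = E L."""
    return mat_mul(L, E) == mat_mul(E, L)

# Projections for the assumed decomposition of the plane.
E1 = [[1, -1], [0, 0]]   # on span{(1,0)} along span{(1,1)}
E2 = [[0, 1], [0, 1]]    # on span{(1,1)} along span{(1,0)}

L_good = [[2, 1], [0, 3]]   # (1,0) -> 2(1,0) and (1,1) -> 3(1,1)
L_bad = [[1, 1], [0, 1]]    # (1,1) -> (2,1), which is not in span{(1,1)}

print(commutes(L_good, E1) and commutes(L_good, E2))  # True
print(commutes(L_bad, E1))                            # False
```

The second map fails to commute with `E1` even though it does preserve $U_1$; the proposition requires invariance of every $U_j$ at once.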

5. Primary Decomposition

For the case that the minimal polynomial of a linear map $L : V \to V$ is the product of distinct factors of degree 1, Theorem 5.14 showed that $V$ is a direct sum of its eigenspaces. The proof used elementary vector-space techniques from Chapter II but did not take full advantage of the machinery developed in the present chapter for passing back and forth between polynomials in one indeterminate and the values of polynomials on $L$. Let us therefore rework the proof of that theorem, taking into account the discussion of projections in Section 4.

We seek an eigenspace decomposition $V = V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_k}$ relative to $L$. Proposition 5.17 suggests looking for the corresponding decomposition of the identity operator as a sum of projections: $I = E_1 + \cdots + E_k$. According to that proposition, we obtain a direct-sum decomposition as soon as we obtain this kind of sum of linear maps such that $E_iE_j = 0$ for $i \ne j$. The $E_j$'s will automatically be projections.

The proof of Theorem 5.14 showed that $S_1 = \prod_{j=2}^{k} (L - \lambda_j I)$ has image equal to the kernel of $L - \lambda_1 I$, i.e., equal to the eigenspace for eigenvalue $\lambda_1$. If $v$ is in this eigenspace, then $S_1(v) = \prod_{j=2}^{k} (\lambda_1 - \lambda_j)\, v$. Hence $E_1 = c_1S_1$, where $c_1^{-1} = \prod_{j=2}^{k} (\lambda_1 - \lambda_j)$. The linear map $S_1$ equals $Q_1(L)$, where $Q_1(\lambda) = \prod_{j=2}^{k} (\lambda - \lambda_j)$. Thus $E_1 = c_1Q_1(L)$. Similar remarks apply to the other eigenspaces, and therefore the required decomposition of the identity operator has to be of the form
\[ I = c_1Q_1(L) + \cdots + c_kQ_k(L) \]
with $c_1, \dots, c_k$ equal to certain scalars. The polynomials $Q_1(\lambda), \dots, Q_k(\lambda)$ are at hand from the start, each containing all but one factor of the minimal polynomial. Moreover, $i \ne j$ implies that
\[ Q_i(L)Q_j(L) = \Bigl(\prod_{l=1}^{k} (L - \lambda_l I)\Bigr)\Bigl(\prod_{l \ne i,j} (L - \lambda_l I)\Bigr). \]


The first factor on the right side is the value of the minimal polynomial of $L$ with $L$ substituted for $\lambda$. Hence the right side is 0, and we see that our linear maps $E_1, \dots, E_k$ have $E_iE_j = 0$ for $i \ne j$.

As soon as we allow nonconstant coefficients in place of the $c_j$'s in the above argument, we obtain a generalization of Theorem 5.14 to the situation that the minimal polynomial of $L$ is arbitrary. The prime factors of the minimal polynomial need not even be of degree 1. Hence the theorem applies to all $L$'s even if $\mathbb{K}$ is not algebraically closed.

Theorem 5.19 (Primary Decomposition Theorem). Let $L : V \to V$ be linear on a finite-dimensional vector space over $\mathbb{K}$, and let $M(\lambda) = P_1(\lambda)^{l_1} \cdots P_k(\lambda)^{l_k}$ be the unique factorization of the minimal polynomial $M(\lambda)$ of $L$ into the product of powers of distinct monic prime polynomials $P_j(\lambda)$. Define $U_j = \ker(P_j(L)^{l_j})$ for $1 \le j \le k$. Then
  (a) $V = U_1 \oplus \cdots \oplus U_k$,
  (b) the projection $E_j$ of $V$ on $U_j$ along the sum of the other $U_i$'s is of the form $T_j(L)$ for some polynomial $T_j$,
  (c) each vector subspace $U_j$ is invariant under $L$,
  (d) any linear map from $V$ to itself that commutes with $L$ carries each $U_j$ into itself,
  (e) any vector subspace $W$ invariant under $L$ has the property that $W = (W \cap U_1) \oplus \cdots \oplus (W \cap U_k)$,
  (f) the minimal polynomial of $L_j = L|_{U_j}$ is $P_j(\lambda)^{l_j}$.

REMARKS. The decomposition in (a) is called the primary decomposition of $V$ under $L$, and the vector subspaces $U_j$ are called the primary subspaces of $V$ under $L$.

PROOF. For $1 \le j \le k$, define $Q_j(\lambda) = M(\lambda)/P_j(\lambda)^{l_j}$. The ideal in $\mathbb{K}[\lambda]$ generated by $Q_1(\lambda), \dots, Q_k(\lambda)$ consists of all products of a single monic polynomial $D(\lambda)$ by arbitrary polynomials, according to Proposition 5.8, and $D(\lambda)$ has to divide each $Q_j(\lambda)$. Since $Q_j(\lambda) = \prod_{i \ne j} P_i(\lambda)^{l_i}$, $D(\lambda)$ cannot be divisible by any $P_j(\lambda)$, and consequently $D(\lambda) = 1$. Thus there exist polynomials $R_1(\lambda), \dots, R_k(\lambda)$ such that
\[ 1 = Q_1(\lambda)R_1(\lambda) + \cdots + Q_k(\lambda)R_k(\lambda). \]
Define $E_j = Q_j(L)R_j(L)$, so that $E_1 + \cdots + E_k = I$. If $i \ne j$, then $Q_i(\lambda)Q_j(\lambda) = M(\lambda) \prod_{r \ne i,j} P_r(\lambda)^{l_r}$. Since $M(L) = 0$, we see that $E_iE_j = 0$. Proposition 5.17 says that each $E_j$ is a projection. Also, it says that if $U_j$ denotes $\operatorname{image} E_j$, then $V = U_1 \oplus \cdots \oplus U_k$, and $E_j$ is the projection on $U_j$ along


the sum of the other $U_i$'s. With this definition of the $U_j$'s (rather than the one in the statement of the theorem), we have therefore shown that (a) and (b) hold.

Let us see that conclusions (c), (d), and (e) follow from (b). Conclusion (c) holds by Proposition 5.18 since $L$ commutes with $T_j(L)$ whenever $T_j$ is a polynomial. For (d), if $J : V \to V$ is a linear map commuting with $L$, then $J$ commutes with each $E_j$ since (b) shows that each $E_j$ is of the form $T_j(L)$. From Proposition 5.18 we conclude that each $U_j$ is invariant under $J$. For (e), the subspace $W$ certainly contains $(W \cap U_1) \oplus \cdots \oplus (W \cap U_k)$. For the reverse containment suppose $w$ is in $W$. Since $E_j$ is of the form $T_j(L)$ and since $W$ is invariant under $L$, $E_j(w)$ is in $W$. But also $E_j(w)$ is in $U_j$. Therefore the expansion $w = \sum_j E_j(w)$ exhibits $w$ as the sum of members of the spaces $W \cap U_j$.

Next let us prove that $U_j$, as we have defined it, is given also by the definition in the statement of the theorem. In other words, let us prove that
\[ \operatorname{image} E_j = \ker(P_j(L)^{l_j}). \tag{$*$} \]
We need a preliminary fact. The polynomial $P_j(\lambda)^{l_j}$ has the property that $M(\lambda) = P_j(\lambda)^{l_j}Q_j(\lambda)$. Hence $P_j(L)^{l_j}Q_j(L) = M(L) = 0$. Multiplying by $R_j(L)$, we obtain
\[ P_j(L)^{l_j}E_j = 0. \tag{$**$} \]
Now suppose that $v$ is in $\operatorname{image} E_j$. Then $P_j(L)^{l_j}(v) = P_j(L)^{l_j}E_j(v) = 0$ by ($**$), and hence $\operatorname{image} E_j \subseteq \ker(P_j(L)^{l_j})$. For the reverse inclusion, let $v$ be in $\ker(P_j(L)^{l_j})$. For $i \ne j$, $Q_i(\lambda)R_i(\lambda) = \bigl(\prod_{r \ne i,j} P_r(\lambda)^{l_r}\bigr)R_i(\lambda)P_j(\lambda)^{l_j}$, and hence
\[ E_i(v) = \Bigl(\prod_{r \ne i,j} P_r(L)^{l_r}\Bigr)R_i(L)P_j(L)^{l_j}(v) = 0. \]
Writing $v = E_1(v) + \cdots + E_k(v)$, we see that $v = E_j(v)$. Thus $\ker(P_j(L)^{l_j}) \subseteq \operatorname{image} E_j$. Therefore ($*$) holds, and $U_j$ is as in the statement of the theorem.

Finally let us prove (f). Let $M_j(\lambda)$ be the minimal polynomial of $L_j = L|_{U_j}$. From ($**$) we see that $P_j(L_j)^{l_j} = 0$. Hence $M_j(\lambda)$ divides $P_j(\lambda)^{l_j}$. For the reverse divisibility we have $M_j(L_j) = 0$. Then certainly $M_j(L_j)Q_j(L_j)R_j(L_j)$, which equals $M_j(L)E_j$ on $U_j$, is 0 on $U_j$. Consider $M_j(L)E_j$ on $U_i = \operatorname{image} E_i$ when $i \ne j$. Since $E_jE_i = 0$, $M_j(L)E_j$ equals 0 on all $U_i$ other than $U_j$. We conclude that $M_j(L)E_j$ equals 0 on $V$, i.e., $M_j(L)Q_j(L)R_j(L) = 0$. Since $M(\lambda)$ is the minimal polynomial of $L$, $M(\lambda)$ divides
\[ M_j(\lambda)Q_j(\lambda)R_j(\lambda) = M_j(\lambda)\Bigl(1 - \sum_{i \ne j} Q_i(\lambda)R_i(\lambda)\Bigr), \tag{$\dagger$} \]
and the factor $P_j(\lambda)^{l_j}$ of $M(\lambda)$ must divide the right side of ($\dagger$). On that right side, $P_j(\lambda)^{l_j}$ divides each $Q_i(\lambda)$ with $i \ne j$. Since $P_j(\lambda)$ does not divide 1, $P_j(\lambda)$


does not divide the factor $1 - \sum_{i \ne j} Q_i(\lambda)R_i(\lambda)$. Since $P_j(\lambda)$ is prime, $P_j(\lambda)^{l_j}$ and $1 - \sum_{i \ne j} Q_i(\lambda)R_i(\lambda)$ are relatively prime. We know that $P_j(\lambda)^{l_j}$ divides the product of $M_j(\lambda)$ and $1 - \sum_{i \ne j} Q_i(\lambda)R_i(\lambda)$, and consequently $P_j(\lambda)^{l_j}$ divides $M_j(\lambda)$. This proves the reverse divisibility and completes the proof of (f).
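The construction $E_j = Q_j(L)R_j(L)$ in the proof can be carried out by hand for a small matrix. The sketch below makes several assumptions for illustration: it uses a $3 \times 3$ matrix with minimal polynomial $(\lambda - 2)(\lambda - 3)^2$, so that $Q_1(\lambda) = (\lambda - 3)^2$ and $Q_2(\lambda) = \lambda - 2$, and it takes $R_1(\lambda) = 1$ and $R_2(\lambda) = 4 - \lambda$, which satisfy $Q_1R_1 + Q_2R_2 = 1$ because $(\lambda - 3)^2 = (\lambda - 2)(\lambda - 4) + 1$. The helper names are hypothetical.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def affine(a, A, b):
    """Return a*A + b*I for a square matrix A and scalars a, b."""
    n = len(A)
    return [[a * A[i][j] + (b if i == j else 0) for j in range(n)]
            for i in range(n)]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ZERO = [[0] * 3 for _ in range(3)]

A = [[2, 1, 0], [-1, 4, 0], [-1, 2, 2]]   # assumed example matrix

A_minus_2 = affine(1, A, -2)              # A - 2I
A_minus_3 = affine(1, A, -3)              # A - 3I
four_minus_A = affine(-1, A, 4)           # 4I - A

Q1 = mat_mul(A_minus_3, A_minus_3)        # Q_1(A) = (A - 3I)^2
E1 = Q1                                   # R_1 = 1
E2 = mat_mul(A_minus_2, four_minus_A)     # Q_2(A) R_2(A)

total = [[E1[i][j] + E2[i][j] for j in range(3)] for i in range(3)]
print(total == I3)                        # True: E1 + E2 = I
print(mat_mul(E1, E2) == ZERO)            # True: E1 E2 = 0
print(mat_mul(A_minus_2, E1) == ZERO)     # True: image E1 lies in ker(A - 2I)
print(mat_mul(Q1, E2) == ZERO)            # True: image E2 lies in ker(A - 3I)^2
```

The last two checks are instances of the relation $P_j(L)^{l_j}E_j = 0$ used in the proof.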

6. Jordan Canonical Form

Now we can return to the canonical-form problem for similarity of square matrices and isomorphism of linear maps from a finite-dimensional vector space to itself. The answer obtained in this section will solve the problem completely if $\mathbb{K}$ is algebraically closed but only partially if $\mathbb{K}$ fails to be algebraically closed. Problems 32–40 at the end of the chapter extend the content of this section to give a complete answer for general $\mathbb{K}$.

The present theorem is most easily stated in terms of matrices. A square matrix is called a Jordan block if it is of the form
\[
\begin{pmatrix}
c & 1 & 0 & 0 & \cdots & 0 & 0 \\
  & c & 1 & 0 & \cdots & 0 & 0 \\
  &   & c & 1 & \cdots & 0 & 0 \\
  &   &   & \ddots & \ddots & \vdots & \vdots \\
  &   &   &   & c & 1 & 0 \\
  &   &   &   &   & c & 1 \\
  &   &   &   &   &   & c
\end{pmatrix},
\]
of some size and for some $c$ in $\mathbb{K}$, as in Example 2 of Section 3, with 0 everywhere below the diagonal. A square matrix is in Jordan form, or Jordan normal form, if it is block diagonal and each block is a Jordan block. One can insist on grouping the blocks for which the constant $c$ is the same and arranging the blocks for given $c$ in some order, but these refinements are inessential.

Theorem 5.20 (Jordan canonical form).
  (a) If the field $\mathbb{K}$ is algebraically closed, then every square matrix over $\mathbb{K}$ is similar to a matrix in Jordan form, and two matrices in Jordan form are similar to each other if and only if their Jordan blocks can be permuted so as to match exactly.
  (b) For a general field $\mathbb{K}$, a square matrix $A$ is similar to a matrix in Jordan form if and only if each prime factor of its minimal polynomial has degree 1. Two matrices in Jordan form are similar to each other if and only if their Jordan blocks can be permuted so as to match exactly.


The first step in proving existence of a matrix in Jordan form similar to a given matrix is to use the Primary Decomposition Theorem (Theorem 5.19). We think of the matrix $A$ as operating on the space $\mathbb{K}^n$ of column vectors in the usual way. The primary subspaces are uniquely defined vector subspaces of $\mathbb{K}^n$, and we introduce an ordered basis, yet to be specified in full detail, within each primary subspace. The union of these ordered bases gives an ordered basis of $\mathbb{K}^n$, and we change from the standard basis to this one. The result is that the given matrix has been conjugated so that its appearance is block diagonal, each block having minimal polynomial equal to a power of a prime polynomial and the prime polynomials all being different. Let us call these blocks primary blocks. The effect of Theorem 5.19 has been to reduce matters to a consideration of each primary block separately.

The hypothesis either that $\mathbb{K}$ is algebraically closed or, more generally, that the prime divisors of the minimal polynomial all have degree 1 means that the minimal polynomial of the primary block under study may be taken to be $(\lambda - c)^l$ for some $c$ in $\mathbb{K}$ and some integer $l \ge 1$. In terms of Jordan form, we have isolated, for each $c$ in $\mathbb{K}$, what will turn out to be the subspace of $\mathbb{K}^n$ corresponding to Jordan blocks with $c$ in every diagonal entry.

Let us write $B$ for a primary block with minimal polynomial $(\lambda - c)^l$. We certainly have $(B - cI)^l = 0$, and it follows that the matrix $N = B - cI$ has $N^l = 0$. A matrix $N$ with $N^l = 0$ for some integer $l \ge 0$ is said to be nilpotent. To prove the existence part of Theorem 5.20, it is enough to prove the following theorem.

Theorem 5.21. For any field $\mathbb{K}$, each nilpotent matrix $N$ in $M_n(\mathbb{K})$ is similar to a matrix in Jordan form.

The proof of Theorem 5.21 and of the uniqueness statements in Theorem 5.20 will occupy the remainder of this section.

It is implicit in Theorem 5.21 that a nilpotent matrix in $M_n(\mathbb{K})$ has 0 as a root of its characteristic polynomial with multiplicity $n$, in particular that the only prime polynomials dividing the characteristic polynomial are the ones dividing the minimal polynomial. We proved such a fact about divisibility earlier for general square matrices when the prime factor has degree 1, but we did not give a proof for general degree. We pause for a moment to give a direct proof in the nilpotent case.

Lemma 5.22. If $N$ is a nilpotent matrix in $M_n(\mathbb{K})$, then $N$ has characteristic polynomial $\lambda^n$ and satisfies $N^n = 0$.

PROOF. If $N^l = 0$, then
\[ (\lambda I - N)(\lambda^{l-1}I + \lambda^{l-2}N + \cdots + \lambda^2N^{l-3} + \lambda N^{l-2} + N^{l-1}) = \lambda^l I - N^l = \lambda^l I. \]
Taking determinants and using Proposition 5.1 in the ring $R = \mathbb{K}[\lambda]$, we obtain
\[ \det(\lambda I - N)\,\det(\text{other factor}) = \det(\lambda^l I) = \lambda^{ln}. \]


Thus $\det(\lambda I - N)$ divides $\lambda^{ln}$. By unique factorization in $\mathbb{K}[\lambda]$, $\det(\lambda I - N)$ is a constant times a power of $\lambda$. Since $\det(\lambda I - N)$ is monic of degree $n$, we must have $\det(\lambda I - N) = \lambda^n$. Applying the Cayley–Hamilton Theorem (Theorem 5.9), we obtain $N^n = 0$.

Let us now prove the uniqueness statements in Theorem 5.20; this step will in fact help orient us for the proof of Theorem 5.21. In (b), one thing we are to prove is that if $A$ is similar to a matrix in Jordan form, then every prime polynomial dividing the minimal polynomial has degree 1. Since characteristic and minimal polynomials are unchanged under similarity, we may assume that $A$ is itself in Jordan form. The characteristic and minimal polynomials of $A$ are computed in the four examples of Section 3. Since the minimal polynomial is the product of polynomials of degree 1, the only primes dividing it have degree 1.

In both (a) and (b) of Theorem 5.20, we are to prove that the Jordan form is unique up to permutation of the Jordan blocks. The matrix $A$ determines its characteristic polynomial, which determines the roots of the characteristic polynomial, which are the diagonal entries of the Jordan form. Thus the sizes of the primary blocks within the Jordan form are determined by $A$. Within each primary block, we need to see that the sizes of the various Jordan blocks are completely determined. Thus we may assume that $N$ is nilpotent and that $C^{-1}NC = J$ is in Jordan form with 0's on the diagonal. Although we shall make statements that apply in all cases, the reader may be helped by referring to the particular matrix $J$ in Figure 5.1 and its powers in Figure 5.2.
\[
J = \begin{pmatrix} J_4 & & & & \\ & J_3 & & & \\ & & J_2 & & \\ & & & J_2 & \\ & & & & J_1 \end{pmatrix},
\quad
J_4 = \begin{pmatrix} 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&0 \end{pmatrix},\;
J_3 = \begin{pmatrix} 0&1&0 \\ 0&0&1 \\ 0&0&0 \end{pmatrix},\;
J_2 = \begin{pmatrix} 0&1 \\ 0&0 \end{pmatrix},\;
J_1 = (0),
\]
where all entries of the $12 \times 12$ matrix $J$ outside the diagonal blocks are 0.

FIGURE 5.1. Example of a nilpotent matrix in Jordan form.

Each block of the Jordan form $J$ contributes 1 to the dimension of the kernel (or null space really) of $J$ via the first column of the block, and hence
\[ \dim(\ker J) = \#\{\text{Jordan blocks in } J\}. \]
In Figure 5.1 this number is 5.


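\[
J^2 = \operatorname{diag}\!\left(
\begin{pmatrix} 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \end{pmatrix},
\begin{pmatrix} 0&0&1 \\ 0&0&0 \\ 0&0&0 \end{pmatrix},
\begin{pmatrix} 0&0 \\ 0&0 \end{pmatrix},
\begin{pmatrix} 0&0 \\ 0&0 \end{pmatrix},
(0)\right)
\]
and
\[
J^3 = \operatorname{diag}\!\left(
\begin{pmatrix} 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{pmatrix},
\begin{pmatrix} 0&0&0 \\ 0&0&0 \\ 0&0&0 \end{pmatrix},
\begin{pmatrix} 0&0 \\ 0&0 \end{pmatrix},
\begin{pmatrix} 0&0 \\ 0&0 \end{pmatrix},
(0)\right),
\]
each written as a block-diagonal $12 \times 12$ matrix with the indicated diagonal blocks and 0 elsewhere.

FIGURE 5.2. Powers of the nilpotent matrix in Figure 5.1.

When $J$ is squared, the 1's in $J$ move up and to the right one more step beyond the diagonal except that blocks of size 2 become 0. When $J$ is cubed, the 1's in $J$ move up and to the right one further step except that blocks of size 3 become 0. Each time $J$ is raised to a new power one higher than before, each block that is nonzero in the old power contributes an additional 1 to the dimension of the kernel. Thus we have
\[ \dim(\ker J^2) - \dim(\ker J) = \#\{\text{Jordan blocks of size} \ge 2\} \]
and
\[ \dim(\ker J^3) - \dim(\ker J^2) = \#\{\text{Jordan blocks of size} \ge 3\}; \]
in the general case,
\[ \dim(\ker J^k) - \dim(\ker J^{k-1}) = \#\{\text{Jordan blocks of size} \ge k\} \quad \text{for } k \ge 1. \]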

Lemma 5.22 says that $J^k = 0$ when $k$ is $\ge$ the size of $J$, and the differences need not be computed beyond that point. For Figure 5.2 the values by inspection are $\dim(\ker J^2) = 9$ and $\dim(\ker J^3) = 11$; also $J^4 = 0$ and hence $\dim(\ker J^4) = 12$. The numbers of Jordan blocks of size $\ge k$ for $k = 1, 2, 3, 4$ are 5, 4, 2, 1, and these numbers indeed match the differences $5 - 0$, $9 - 5$, $11 - 9$, $12 - 11$, as predicted by the above formula.

Since $C^{-1}NC = J$, we have $C^{-1}N^kC = J^k$ and $N^kC = CJ^k$. The matrix $C$ is invertible, and therefore
\[ \dim(\ker J^k) = \dim(\ker CJ^k) = \dim(\ker N^kC) = \dim(\ker N^k). \]
Hence
\[ \dim(\ker N^k) - \dim(\ker N^{k-1}) = \#\{\text{Jordan blocks of size} \ge k\} \quad \text{for } k \ge 1, \]
and the number of Jordan blocks of each size is uniquely determined by properties of $N$. This completes the proof of all the uniqueness statements in Theorem 5.20.
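The counting formula above translates directly into a procedure for reading off the Jordan block sizes of a nilpotent matrix. The following is a minimal sketch, assuming rational entries and using hypothetical helper names; it computes $\dim(\ker N^k)$ as $n$ minus the rank of $N^k$, with the rank found by Gaussian elimination in exact arithmetic.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(M):
    """Rank of M by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def jordan_block_sizes(N):
    """Block sizes of a nilpotent N from dim ker N^k - dim ker N^(k-1)."""
    n = len(N)
    kernel_dims = [0]                       # dim ker N^0 = 0
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(n):
        P = mat_mul(P, N)
        kernel_dims.append(n - rank(P))
        if kernel_dims[-1] == n:
            break
    else:
        raise ValueError("matrix is not nilpotent")
    # at_least[k-1] is the number of blocks of size >= k; pad with a final 0.
    at_least = [kernel_dims[k] - kernel_dims[k - 1]
                for k in range(1, len(kernel_dims))] + [0]
    sizes = []
    for k in range(1, len(at_least)):
        sizes.extend([k] * (at_least[k - 1] - at_least[k]))
    return sorted(sizes, reverse=True)

def nilpotent_jordan(block_sizes):
    """Block-diagonal nilpotent Jordan matrix with the given block sizes."""
    n = sum(block_sizes)
    J = [[0] * n for _ in range(n)]
    start = 0
    for size in block_sizes:
        for i in range(start, start + size - 1):
            J[i][i + 1] = 1
        start += size
    return J

J = nilpotent_jordan([4, 3, 2, 2, 1])       # the matrix of Figure 5.1
print(jordan_block_sizes(J))                # [4, 3, 2, 2, 1]
```

Because the kernel dimensions are unchanged under similarity, the same function applied to any matrix similar to $J$ would return the same list.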


Now let us turn to the proof of Theorem 5.21, first giving the idea. The argument involves a great many choices, and it may be helpful to understand it in the context of Figures 5.1 and 5.2. Let $\Sigma = (e_1, \dots, e_{12})$ be the standard ordered basis of $\mathbb{K}^{12}$. The matrix $J$, when operating by multiplication on the left, moves basis vectors to other basis vectors or to 0. Namely,
\[
\begin{array}{llll}
Je_1 = 0, & Je_2 = e_1, & Je_3 = e_2, & Je_4 = e_3, \\
Je_5 = 0, & Je_6 = e_5, & Je_7 = e_6, & \\
Je_8 = 0, & Je_9 = e_8, & & \\
Je_{10} = 0, & Je_{11} = e_{10}, & & \\
Je_{12} = 0, & & &
\end{array}
\]
with each line describing what happens for a single Jordan block. Let us think of the given nilpotent matrix $N$ as equal to $\begin{pmatrix} L \\ \Sigma\Sigma \end{pmatrix}$ for some linear map $L$. We want to find a new ordered basis $\Gamma = (v_1, \dots, v_{12})$ in which the matrix of $L$ is $J$. In the expression $C^{-1}NC = J$, the matrix $C$ equals $\begin{pmatrix} I \\ \Sigma\Gamma \end{pmatrix}$, and its columns are expressions for $v_1, \dots, v_{12}$ in the basis $\Sigma$, i.e., $Ce_i = v_i$. For each index $i$, we have $Je_i = e_{i-1}$ or $Je_i = 0$. The formula $NC = CJ$, when applied to $e_i$, therefore says that
\[
Nv_i = NCe_i = CJe_i =
\begin{cases}
Ce_{i-1} = v_{i-1} & \text{if } Je_i = e_{i-1}, \\
0 & \text{if } Je_i = 0.
\end{cases}
\]
Thus we are looking for an ordered basis such that $N$ sends each member of the basis either into the previous member or into 0. The procedure in this example will be to pick out $v_4$ as a vector not annihilated by $N^3$, obtain $v_3, v_2, v_1$ from it by successively applying $N$, pick out $v_7$ as a vector not annihilated by $N^2$ and independent of what has been found, obtain $v_6, v_5$ from it by successively applying $N$, and so on. It is necessary to check that the appropriate linear independence can be maintained, and that step will be what the proof is really about.

The proof of Theorem 5.21 will now be given in the general case. The core of the argument concerns linear maps and appears as three lemmas. Afterward the results of the lemmas will be interpreted in terms of matrices. For all the lemmas let $V$ be an $n$-dimensional vector space over $\mathbb{K}$, and let $N : V \to V$ be linear with $N^n = 0$. Define $K_j = \ker N^j$, so that
\[ 0 = K_0 \subseteq K_1 \subseteq K_2 \subseteq \cdots \subseteq K_n = V. \]

Lemma 5.23. Suppose $j \ge 1$ and suppose $S_j$ is any vector subspace of $V$ such that $K_{j+1} = K_j \oplus S_j$. Then $N$ is one-one from $S_j$ into $K_j$ and $N(S_j) \cap K_{j-1} = 0$.


PROOF. Since $N(\ker N^{j+1}) \subseteq \ker N^j$, we obtain $N(S_j) \subseteq K_j$; thus $N$ indeed sends $S_j$ into $K_j$. To see that $N$ is one-one from $S_j$ into $K_j$, suppose that $s$ is a member of $S_j$ with $N(s) = 0$. Then $s$ is in $K_1$. Since $j \ge 1$, $K_1 \subseteq K_j$. Thus $s$ is in $K_j$. Since $K_j \cap S_j = 0$, $s$ is 0. Hence $N$ is one-one from $S_j$ into $K_j$. To see that $N(S_j) \cap K_{j-1} = 0$, suppose $s$ is a member of $S_j$ with $N(s)$ in $K_{j-1}$. Then $0 = N^{j-1}(N(s)) = N^j(s)$ shows that $s$ is in $K_j$. Since $K_j \cap S_j = 0$, $s$ equals 0.

Lemma 5.24. Define $U_n = W_n = 0$. For $0 \le j \le n - 1$, there exist vector subspaces $U_j$ and $W_j$ of $K_{j+1}$ such that
\[ K_{j+1} = K_j \oplus U_j \oplus W_j, \qquad U_j = N(U_{j+1} \oplus W_{j+1}), \]
and
\[ N : U_{j+1} \oplus W_{j+1} \to U_j \text{ is one-one.} \]

PROOF. Define $U_{n-1} = N(U_n \oplus W_n) = 0$, and let $W_{n-1}$ be a vector subspace such that $V = K_n = K_{n-1} \oplus W_{n-1}$. Put $S_{n-1} = U_{n-1} \oplus W_{n-1}$. Proceeding inductively downward, suppose that $U_n, U_{n-1}, \dots, U_{j+1}$ and $W_n, W_{n-1}, \dots, W_{j+1}$ have been defined so that $U_k = N(U_{k+1} \oplus W_{k+1})$, $N : U_{k+1} \oplus W_{k+1} \to U_k$ is one-one, and $K_{k+1} = K_k \oplus U_k \oplus W_k$ whenever $k$ satisfies $j < k \le n - 1$. We put $S_k = U_k \oplus W_k$ for these values of $k$, and then $S_k$ satisfies the hypothesis of Lemma 5.23 whenever $k$ satisfies $j < k \le n - 1$. We now construct $U_j$ and $W_j$. We put $U_j = N(S_{j+1})$. Since $S_{j+1}$ satisfies the hypothesis of Lemma 5.23, we see that $U_j \subseteq K_{j+1}$, $N$ is one-one from $S_{j+1}$ into $U_j$, and $U_j \cap K_j = 0$. Thus we can find a vector subspace $W_j$ with $K_{j+1} = K_j \oplus U_j \oplus W_j$, and the inductive construction is complete.

Lemma 5.25. The vector subspaces of Lemma 5.24 satisfy
\[ V = U_0 \oplus W_0 \oplus U_1 \oplus W_1 \oplus \cdots \oplus U_{n-1} \oplus W_{n-1}. \]

PROOF. Iterated use of Lemma 5.24 gives
\begin{align*}
V = K_n &= K_{n-1} \oplus (U_{n-1} \oplus W_{n-1}) \\
&= K_{n-2} \oplus (U_{n-2} \oplus W_{n-2}) \oplus (U_{n-1} \oplus W_{n-1}) \\
&= \cdots = K_0 \oplus (U_0 \oplus W_0) \oplus \cdots \oplus (U_{n-1} \oplus W_{n-1}) \\
&= (U_0 \oplus W_0) \oplus \cdots \oplus (U_{n-1} \oplus W_{n-1}),
\end{align*}
the last step holding since $K_0 = 0$, $K_0$ being the kernel of the identity function.
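To make the subspaces of Lemmas 5.24 and 5.25 concrete, here is one possible choice for the matrix $N = J$ of Figure 5.1 acting on $\mathbb{K}^{12}$. The complements $W_j$ are not unique, so the spans below are only an illustrative selection in terms of the standard basis vectors $e_1, \dots, e_{12}$; here $U_j = W_j = 0$ for $j \ge 4$ because $J^4 = 0$.

```latex
\begin{align*}
K_1 &= \operatorname{span}\{e_1, e_5, e_8, e_{10}, e_{12}\}, &
K_2 &= K_1 \oplus \operatorname{span}\{e_2, e_6, e_9, e_{11}\}, \\
K_3 &= K_2 \oplus \operatorname{span}\{e_3, e_7\}, &
K_4 &= K_3 \oplus \operatorname{span}\{e_4\} = \mathbb{K}^{12}, \\[4pt]
U_3 &= 0, & W_3 &= \operatorname{span}\{e_4\}, \\
U_2 &= N(U_3 \oplus W_3) = \operatorname{span}\{e_3\}, & W_2 &= \operatorname{span}\{e_7\}, \\
U_1 &= N(U_2 \oplus W_2) = \operatorname{span}\{e_2, e_6\}, & W_1 &= \operatorname{span}\{e_9, e_{11}\}, \\
U_0 &= N(U_1 \oplus W_1) = \operatorname{span}\{e_1, e_5, e_8, e_{10}\}, & W_0 &= \operatorname{span}\{e_{12}\}.
\end{align*}
```

With this choice $\dim W_j$ equals the number of Jordan blocks of size $j + 1$ (namely 1, 2, 1, 1 for $j = 0, 1, 2, 3$), and the twelve spanning vectors listed for the $U_j$ and $W_j$ together form a basis of $\mathbb{K}^{12}$, in line with Lemma 5.25.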


PROOF OF THEOREM 5.21. We regard $N$ as acting on $V = \mathbb{K}^n$ by multiplication on the left, and we describe an ordered basis in which the matrix of $N$ is in Jordan form. For $0 \le j \le n - 1$, form a basis of the vector subspace $W_j$ of Lemma 5.24, and let $v^{(j)}$ be a typical member of this basis. Each $v^{(j)}$ will be used as the last basis vector corresponding to a Jordan block of size $j + 1$. The full ordered basis for that Jordan block will therefore be $N^jv^{(j)}, N^{j-1}v^{(j)}, \dots, Nv^{(j)}, v^{(j)}$. The theorem will be proved if we show that the union of these sets as $j$ and $v^{(j)}$ vary is a basis of $\mathbb{K}^n$ and that $N^{j+1}v^{(j)} = 0$ for all $j$ and $v^{(j)}$.

From the first conclusion of Lemma 5.24 we see for $j \ge 0$ that $W_j \subseteq K_{j+1}$, and hence $N^{j+1}(W_j) = 0$. Therefore $N^{j+1}v^{(j)} = 0$ for all $j$ and $v^{(j)}$.

Let us prove by induction downward on $j$ that a basis of $U_j \oplus W_j$ consists of all $v^{(j)}$ and all $N^kv^{(j+k)}$ for $k > 0$. The base case of the induction is $j = n - 1$, and the statement holds in that case since $U_{n-1} = 0$ and since the vectors $v^{(n-1)}$ form a basis of $W_{n-1}$. The inductive hypothesis is that all $v^{(j+1)}$ and all $N^kv^{(j+1+k)}$ for $k > 0$ together form a basis of $U_{j+1} \oplus W_{j+1}$. The second and third conclusions of Lemma 5.24 together show that all $Nv^{(j+1)}$ and all $N^{k+1}v^{(j+1+k)}$ for $k > 0$ together form a basis of $U_j$. In other words, all $N^kv^{(j+k)}$ with $k > 0$ together form a basis of $U_j$. The vectors $v^{(j)}$ by construction form a basis of $W_j$, and $U_j \cap W_j = 0$. Therefore the union of these separate bases is a basis for $U_j \oplus W_j$, and the induction is complete.

Taking the union of the bases of $U_j \oplus W_j$ for all $j$ and applying Lemma 5.25, we see that we have a basis of $V = \mathbb{K}^n$. This shows that the desired set is a basis of $\mathbb{K}^n$ and completes the proof of Theorem 5.21.

7. Computations with Jordan Form

Let us illustrate the computation of Jordan form and the change-of-basis matrix with a few examples. We are given a matrix $A$ and we seek $J$ and $C$ with $J = C^{-1}AC$. We regard $A$ as the matrix of some linear map $L$ in the standard ordered basis $\Sigma$, and we regard $J$ as the matrix of $L$ in some other ordered basis $\Gamma$. Then $C = \begin{pmatrix} I \\ \Sigma\Gamma \end{pmatrix}$, and so the columns of $C$ give the members of $\Gamma$ written as ordinary column vectors (in the standard ordered basis).

EXAMPLE 1. This example will be a nilpotent matrix, and we shall compute $J$ and $C$ merely by interpreting the proof of Theorem 5.21 in concrete terms. Let
\[ A = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 1 & 0 \\ -1 & 1 & 0 \end{pmatrix}. \]


The first step is to compute the characteristic polynomial, which is
\[
\det(\lambda I - A) = \det \begin{pmatrix} \lambda + 1 & -1 & 0 \\ 1 & \lambda - 1 & 0 \\ 1 & -1 & \lambda \end{pmatrix}
= \lambda \det \begin{pmatrix} \lambda + 1 & -1 \\ 1 & \lambda - 1 \end{pmatrix} = \lambda^3.
\]
Then $A^3 = 0$ by the Cayley–Hamilton Theorem (Theorem 5.9), and $A$ is indeed nilpotent. The diagonal entries of $J$ are thus all 0, and we have to compute the sizes of the various Jordan blocks. To do so, we compute the dimension of the kernel of each power of $A$. The dimension of the kernel of a matrix equals the number of independent variables when we solve $AX = 0$ by row reduction. With the first power of $A$, the variable $x_1$ is dependent, and $x_2$ and $x_3$ are independent. Also, $A^2 = 0$. Thus
\[ \dim(\ker A^0) = 0, \qquad \dim(\ker A) = 2, \qquad \text{and} \qquad \dim(\ker A^2) = 3. \]
Hence
\begin{align*}
\#\{\text{Jordan blocks of size} \ge 1\} &= \dim(\ker A) - \dim(\ker A^0) = 2 - 0 = 2, \\
\#\{\text{Jordan blocks of size} \ge 2\} &= \dim(\ker A^2) - \dim(\ker A) = 3 - 2 = 1.
\end{align*}
From these equalities we see that one Jordan block has size 2 and the other has size 1. Thus
\[ J = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]

We want to set up vector subspaces as in Lemma 5.24 so that $K_{j+1} = K_j \oplus U_j \oplus W_j$ and $U_j = A(U_{j+1} \oplus W_{j+1})$ for $0 \le j \le 2$. Since $K_3 = K_2$, the equations begin with $K_2 = \cdots$ and are
\[ K_2 = K_1 \oplus 0 \oplus W_1, \qquad U_0 = A(0 \oplus W_1), \qquad K_1 = K_0 \oplus U_0 \oplus W_0. \]
Here $K_2 = \mathbb{K}^3$ and $K_1$ is the subspace of all $X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$ such that $AX = 0$. The space $W_1$ is to satisfy $K_2 = K_1 \oplus W_1$, and we see that $W_1$ is 1-dimensional. Let $\{v^{(1)}\}$ be a basis of the 1-dimensional vector subspace $W_1$. Then $U_0$ is 1-dimensional with basis $\{Av^{(1)}\}$. The subspace $K_1$ is 2-dimensional and contains $U_0$. The space $W_0$ is to satisfy $K_1 = U_0 \oplus W_0$, and we see that $W_0$ is 1-dimensional. Let $\{v^{(0)}\}$ be a basis of $W_0$. Then the respective columns of $C$ may be taken to be
\[ Av^{(1)}, \qquad v^{(1)}, \qquad v^{(0)}. \]
Let us compute these vectors.


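If we extend a basis of $K_1$ to a basis of $K_2$, then $W_1$ may be taken to be the linear span of the added vector. To obtain a basis of $K_1$, we compute that the reduced row-echelon form of $A$ is $\begin{pmatrix} 1 & -1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$, and the resulting system consists of the single equation $x_1 - x_2 = 0$. Thus $x_1 = x_2$, and
\[ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = x_2 \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + x_3 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}. \]
The coefficients of $x_2$ and $x_3$ on the right side form a basis of $K_1$, and we are to choose a vector that is not a linear combination of these. Thus we can take $v^{(1)} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$ as the basis vector of $W_1$. Then $U_0 = A(W_1)$ has $Av^{(1)} = A\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \\ -1 \end{pmatrix}$ as a basis, and the basis of $W_0$ may be taken as any vector in $K_1$ but not $U_0$. We can take this basis to consist of $v^{(0)} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$.

Lining up our three basis vectors as the columns of $C$ gives us $C = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 0 \\ -1 & 0 & 1 \end{pmatrix}$. Computation gives $C^{-1} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & -1 & 0 \\ 0 & -1 & 1 \end{pmatrix}$, and we readily check that $C^{-1}AC = J$.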
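The final check in Example 1 is easy to reproduce mechanically. The short sketch below simply multiplies out the matrices stated in the example; the helper `mat_mul` is an illustrative name, not part of the text.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Matrices taken from Example 1.
A = [[-1, 1, 0], [-1, 1, 0], [-1, 1, 0]]
C = [[-1, 1, 0], [-1, 0, 0], [-1, 0, 1]]      # columns: A v1, v1, v0
C_inv = [[0, -1, 0], [1, -1, 0], [0, -1, 1]]
J = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]

print(mat_mul(C, C_inv) == I3)                # True: C_inv really inverts C
print(mat_mul(mat_mul(C_inv, A), C) == J)     # True: C^{-1} A C = J
```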

EXAMPLE 2. We continue with $A$ and $J$ as in Example 1, but we compute the columns of $C$ without directly following the proof of Theorem 5.21. The method starts from the fact that each Jordan block corresponds to a 1-dimensional space of eigenvectors, and then we backtrack to find vectors corresponding to the other columns. For this particular $A$, we know that the three columns of $C$ are to be of the form $v_1 = Av^{(1)}$, $v_2 = v^{(1)}$, and $v_3 = v^{(0)}$. The vectors $v_1$ and $v_3$ together span the 0 eigenspace of $A$. We find all the 0 eigenvectors, writing them as a two-parameter family. This eigenspace is just $K_1 = \ker A$, and we found in Example 1 that $K_1 = \left\{ \begin{pmatrix} x_2 \\ x_2 \\ x_3 \end{pmatrix} \right\}$. One of these vectors is to be $v_1$, and it has to equal $Av_2$. Thus we solve $Av_2 = \begin{pmatrix} x_2 \\ x_2 \\ x_3 \end{pmatrix}$. Applying the solution procedure yields
\[
\left( \begin{array}{ccc|c} 1 & -1 & 0 & -x_2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_3 - x_2 \end{array} \right).
\]
This system has no solutions unless $x_3 - x_2 = 0$. If we take $x_2 = x_3 = -1$, then we obtain the same first two columns of $C$ as in Example 1, and any vector in $K_1$ independent of $\begin{pmatrix} -1 \\ -1 \\ -1 \end{pmatrix}$ may be taken as the third column.


EXAMPLE 3. Let
\[
A = \begin{pmatrix} 2 & 1 & 0 \\ -1 & 4 & 0 \\ -1 & 2 & 2 \end{pmatrix}.
\]

Direct calculation shows that the characteristic polynomial is $\det(\lambda I - A) = \lambda^3 - 8\lambda^2 + 21\lambda - 18 = (\lambda - 2)(\lambda - 3)^2$. The possibilities for $J$ are therefore
\[
\begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{pmatrix};
\]
the first one will be correct if the dimension of the eigenspace for the eigenvalue 3 is 2, and the second one will be correct if that dimension is 1. The third column of $C$ corresponds to an eigenvector for the eigenvalue 2, hence to a nonzero solution of $(A - 2I)v = 0$. The solutions are
\[
v = k \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},
\]
and we can therefore use $\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$.

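For the first two columns of $C$, we have to find $\ker(A - 3I)$ no matter which of the methods we use, the one in Example 1 or the one in Example 2. Solving the system of equations, we obtain all vectors in the space
\[
\left\{ z \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}.
\]
The dimension of the space is 1, and the second possibility for the Jordan form is the correct one.

Following the method of Example 1 to find the columns of $C$ means that we pick a basis of this kernel and extend it to a basis of $\ker(A - 3I)^2$. A basis of $\ker(A - 3I)$ consists of the vector $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$. The matrix $(A - 3I)^2$ is
\[
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & -1 & 1 \end{pmatrix},
\]
and the solution procedure leads to the formula
\[
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
= a \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
+ c \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}
\]
for its kernel. The vector $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ arises from $a = 1$ and $c = 1$. We are to make an independent choice, say $a = 1$ and $c = 0$. Then the second basis vector to use is $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$. This becomes the second column of $C$, and the first column then has to be
\[
(A - 3I) \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \\ -1 \end{pmatrix}.
\]
The result is that
\[
C = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 0 \\ -1 & 0 & 1 \end{pmatrix}.
\]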

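Following the method of Example 2 for this example means that we retain the entire kernel of $A - 3I$, namely all vectors
\[
v_1 = z \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},
\]
as candidates for the first column of $C$. The second column is to satisfy $(A - 3I)v_2 = v_1$. Solving leads to
\[
v_2 = z \begin{pmatrix} -1 \\ 0 \\ 0 \end{pmatrix} + c \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.
\]
In contrast to Example 2, there is no potentially contradictory equation. So we choose $z$ and then $c$. If we take $z = 1$ and $c = 0$, we find that the first two columns of $C$ are to be $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ 0 \\ 0 \end{pmatrix}$. Then
\[
C = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 1 \end{pmatrix}.
\]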
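Both computations in Example 3 can be checked mechanically. The sketch below, in exact rational arithmetic, computes the kernel dimensions of $A - 3I$ and $(A - 3I)^2$ from ranks and then tests the relation $AC = CJ$ for the two matrices $C$ found above; together with the invertibility of $C$, this relation is equivalent to $C^{-1}AC = J$. The helper names `rank`, `matmul`, and `is_jordan_basis` are illustrative and not from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in r] for r in M]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            f = rows[r][col] / rows[rk][col]
            rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk

A = [[2, 1, 0], [-1, 4, 0], [-1, 2, 2]]
N = [[A[i][j] - 3 * (i == j) for j in range(3)] for i in range(3)]   # A - 3I

dim_ker_N = 3 - rank(N)                 # dimension of the eigenspace for 3
dim_ker_N2 = 3 - rank(matmul(N, N))     # dimension of ker (A - 3I)^2

J = [[3, 1, 0], [0, 3, 0], [0, 0, 2]]   # the second possibility for J
C_method1 = [[-1, 1, 0], [-1, 0, 0], [-1, 0, 1]]   # from the method of Example 1
C_method2 = [[1, -1, 0], [1, 0, 0], [1, 0, 1]]     # from the method of Example 2

def is_jordan_basis(C):
    """True if C is invertible and A C = C J."""
    return rank(C) == 3 and matmul(A, C) == matmul(C, J)

print(dim_ker_N, dim_ker_N2)
print(is_jordan_basis(C_method1), is_jordan_basis(C_method2))
```

Testing $AC = CJ$ avoids computing $C^{-1}$ at all, which is convenient when only the correctness of a proposed $C$ is in question.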

For any example in which we can factor the characteristic polynomial exactly, either of the two methods used above will work. The ﬁrst method appears complicated but uses numbers throughout; it tends to be more efﬁcient with large examples involving high-degree minimal polynomials. The second method appears direct but requires solving equations with symbolic variables; it tends to be more efﬁcient for relatively simple examples.

8. Problems

In Problems 1–25 all vector spaces are assumed finite-dimensional, and all linear transformations are assumed defined from such spaces into themselves. Unless information is given to the contrary, the underlying field $\mathbb{K}$ is assumed arbitrary.

1. Let $M_{mn}(\mathbb{C})$ be the vector space of $m$-by-$n$ complex matrices. The group $GL(m, \mathbb{C}) \times GL(n, \mathbb{C})$ acts on $M_{mn}(\mathbb{C})$ by $((g, h), x) \mapsto gxh^{-1}$, where $gxh^{-1}$ denotes a matrix product.
(a) Verify that this is indeed a group action.
(b) Prove that two members of $M_{mn}(\mathbb{C})$ lie in the same orbit if and only if they have the same rank.
(c) For each possible rank, give an example of a member of $M_{mn}(\mathbb{C})$ with that rank.

2. Prove that a member of $M_n(\mathbb{K})$ is invertible if and only if the constant term of its minimal polynomial is different from 0.

3. Suppose that $L : V \to V$ is a linear map with minimal polynomial $M(\lambda) = P_1(\lambda)^{l_1} \cdots P_k(\lambda)^{l_k}$ and that $V = U \oplus W$ with $U$ and $W$ both invariant under $L$. Let $P_1(\lambda)^{r_1} \cdots P_k(\lambda)^{r_k}$ and $P_1(\lambda)^{s_1} \cdots P_k(\lambda)^{s_k}$ be the respective minimal polynomials of $L|_U$ and $L|_W$. Prove that $l_j = \max(r_j, s_j)$ for $1 \leq j \leq k$.

4. (a) If $A$ and $B$ are in $M_n(\mathbb{K})$, if $P(\lambda)$ is a polynomial such that $P(AB) = 0$, and if $Q(\lambda) = \lambda P(\lambda)$, prove that $Q(BA) = 0$.
(b) What can be inferred from (a) about the relationship between the minimal polynomials of $AB$ and of $BA$?


5. (a) Suppose that $D$ and $D'$ are in $M_n(\mathbb{K})$, are similar to diagonal matrices, and have $DD' = D'D$. Prove that there is a matrix $C$ such that $C^{-1}DC$ and $C^{-1}D'C$ are both diagonal.
(b) Give an example of two nilpotent matrices $N$ and $N'$ in $M_n(\mathbb{K})$ with $NN' = N'N$ such that there is no $C$ with $C^{-1}NC$ and $C^{-1}N'C$ both in Jordan form.

6. (a) Prove that the matrix of a projection is similar to a diagonal matrix. What are the eigenvalues?
(b) Give a necessary and sufficient condition for two projections involving the same $V$ to be given by similar matrices.

7. Let $E : V \to V$ and $F : V \to V$ be projections. Prove that $E$ and $F$ have
(a) the same image if and only if $EF = F$ and $FE = E$,
(b) the same kernel if and only if $EF = E$ and $FE = F$.

8. Let $E : V \to V$ and $F : V \to V$ be projections. Prove that $EF$ is a projection if $EF = FE$. Prove or disprove a converse.

9. An involution on $V$ is a linear map $U : V \to V$ such that $U^2 = I$. Show that the equation $U = 2E - I$ establishes a one-one correspondence between all projections $E$ and all involutions $U$.

10. Let $L : V \to V$ be linear. Prove that there exist vector subspaces $U$ and $W$ of $V$ such that
(i) $V = U \oplus W$,
(ii) $L(U) \subseteq U$ and $L(W) \subseteq W$,
(iii) $L$ is nilpotent on $U$,
(iv) $L$ is nonsingular on $W$.

11. Prove that the vector subspaces $U$ and $W$ in the previous problem are uniquely characterized by (i) through (iv).

12. Let $L : V \to V$ be a linear map, and suppose that its minimal polynomial is of the form $M(\lambda) = \prod_{j=1}^{k} (\lambda - \lambda_j)^{l_j}$ with the $\lambda_j$ distinct. Let $V = U_1 \oplus \cdots \oplus U_k$ be the corresponding primary decomposition of $V$, and define $D : V \to V$ by $D = \lambda_1 E_1 + \cdots + \lambda_k E_k$, where $E_1, \dots, E_k$ are the projections associated with the primary decomposition. Finally put $N = L - D$. Prove that
(a) $L = D + N$,
(b) $D$ has a basis of eigenvectors,
(c) $N$ is nilpotent,
(d) $DN = ND$,
(e) $D$ and $N$ are given by polynomials in $L$,
(f) the minimal polynomial of $D$ is $\prod_{j=1}^{k} (\lambda - \lambda_j)$,
(g) the minimal polynomial of $N$ is $\lambda^{\max l_j}$.


13. In the previous problem with $L$ given, prove that the decomposition $L = D + N$ is uniquely determined by properties (a) through (d).

14. Let $A$ be a nilpotent square matrix. Prove that $\det(I + A) = 1$.

15. For the complex matrix $A = \begin{pmatrix} -5 & 9 \\ -4 & 7 \end{pmatrix}$, find a Jordan-form matrix $J$ and an invertible matrix $C$ such that $J = C^{-1}AC$.

16. For the complex matrix $A = \begin{pmatrix} 4 & 1 & -1 \\ -8 & -2 & 2 \\ 8 & 2 & -2 \end{pmatrix}$, find a Jordan-form matrix $J$ and an invertible matrix $C$ such that $J = C^{-1}AC$.

17. For the upper triangular matrix
\[
A = \begin{pmatrix}
2 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 2 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 2 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 1 & 2 \\
0 & 0 & 0 & 0 & 2 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 2 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 3
\end{pmatrix},
\]
find a Jordan-form matrix $J$ and an invertible matrix $C$ such that $J = C^{-1}AC$.

18. (a) For $M_3(\mathbb{C})$, prove that any two matrices with the same minimal polynomial and the same characteristic polynomial must be similar.
(b) Is the same thing true for $M_4(\mathbb{C})$?

19. Suppose that $\mathbb{K}$ has characteristic 0 and that $J$ is a Jordan block with nonzero eigenvalue and with size $> 1$. Prove that there is no $n \geq 1$ such that $J^n$ is diagonal.

20. Classify up to similarity all members $A$ of $M_n(\mathbb{C})$ with $A^n = I$.

21. How many similarity classes are there of 3-by-3 matrices $A$ with entries in $\mathbb{C}$ such that $A^3 = A$? Explain.

22. Let $n \geq 2$, and let $N$ be a member of $M_n(\mathbb{K})$ with $N^n = 0$ but $N^{n-1} \neq 0$. Prove that there is no $n$-by-$n$ matrix $A$ with $A^2 = N$.

23. For a Jordan block $J$, prove that $J^t$ is similar to $J$.

24. Prove that if $A$ is in $M_n(\mathbb{C})$, then $A^t$ is similar to $A$.

25. Let $N$ be the 2-by-2 matrix $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, and let $A$ and $B$ be the 4-by-4 matrices $A = \begin{pmatrix} N & 0 \\ 0 & N \end{pmatrix}$ and $B = \begin{pmatrix} N & N \\ 0 & N \end{pmatrix}$. Prove that $A$ and $B$ are similar.

Problems 26–31 concern cyclic vectors. Fix a linear map $L : V \to V$ from a finite-dimensional vector space $V$ to itself. For $v$ in $V$, let $P(v)$ denote the set of all vectors $Q(L)(v)$ in $V$ for $Q(\lambda)$ in $\mathbb{K}[\lambda]$; $P(v)$ is a vector subspace and is invariant under $L$. If $U$ is an invariant subspace of $V$, we say that $U$ is a cyclic subspace if there is some


$v$ in $U$ such that $P(v) = U$; in this case, $v$ is said to be a cyclic vector for $U$, and $U$ is called the cyclic subspace generated by $v$. For $v$ in $V$, let $I_v$ be the ideal of all polynomials $Q(\lambda)$ in $\mathbb{K}[\lambda]$ with $Q(L)v = 0$. The monic generator of $v$ is the unique monic polynomial $M_v(\lambda)$ such that $M_v(\lambda)$ divides every member of $I_v$.

26. For $v \in V$, explain why $I_v$ is nonzero and why $M_v(\lambda)$ therefore exists.

27. For $v \in V$, prove that
(a) the degree of the monic generator $M_v(\lambda)$ equals the dimension of the cyclic subspace $P(v)$,
(b) the vectors $v, L(v), L^2(v), \dots, L^{\deg M_v - 1}(v)$ form a vector-space basis of $P(v)$,
(c) the minimal and characteristic polynomials of $L|_{P(v)}$ are both equal to $M_v(\lambda)$.

28. Suppose that $M_v(\lambda) = c_0 + c_1\lambda + \cdots + c_{d-1}\lambda^{d-1} + \lambda^d$. Prove that the matrix of $L|_{P(v)}$ in a suitable ordered basis is
\[
\begin{pmatrix}
-c_{d-1} & 1 & 0 & \cdots & 0 & 0 \\
-c_{d-2} & 0 & 1 & \cdots & 0 & 0 \\
-c_{d-3} & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
-c_2 & 0 & 0 & \cdots & 1 & 0 \\
-c_1 & 0 & 0 & \cdots & 0 & 1 \\
-c_0 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix}.
\]

29. Suppose that $v$ is in $V$, that $M_v(\lambda)$ is a power of a prime polynomial $P(\lambda)$, and that $Q(\lambda)$ is a nonzero polynomial with $\deg Q(\lambda) < \deg P(\lambda)$. Prove that $P(Q(L)(v)) = P(v)$.

30. Let $P(\lambda)$ be a prime polynomial.
(a) Prove by induction on $\dim V$ that if the minimal polynomial of $L$ is $P(\lambda)$, then the characteristic polynomial of $L$ is a power of $P(\lambda)$.
(b) Prove by induction on $l$ that if the minimal polynomial of $L$ is $P(\lambda)^l$, then the characteristic polynomial of $L$ is a power of $P(\lambda)$.
(c) Conclude that if the minimal polynomial of $L$ is a power of $P(\lambda)$, then $\deg P(\lambda)$ divides $\dim V$.

31. (a) Prove that every prime factor of the characteristic polynomial of $L$ divides the minimal polynomial of $L$.
(b) In Problem 12 prove that $D$ and $L$ have the same characteristic polynomial.

Problems 32–40 continue the study of cyclic vectors begun in Problems 26–31, using the same notation. The goal is to obtain a canonical-form theorem like Theorem 5.20 for $L$ but with no assumption on $\mathbb{K}$ or $P(\lambda)$, namely that each primary subspace for $L$ is the direct sum of cyclic subspaces and the resulting decomposition is unique up to isomorphism. This result and the Fundamental Theorem of Finitely Generated


Abelian Groups (Theorem 4.56) will be seen in Chapter VIII to be special cases of a single more general theorem. Still another canonical form for matrices and linear maps is an analog of the result with elementary divisors mentioned in the remarks with Theorem 4.56 and is valid here; it is called rational canonical form, but we shall not pursue it until the problems at the end of Chapter VIII. The proof in Problems 32–40 uses ideas similar to those used for Theorem 5.21 except that the hypothesis will now be that the minimal polynomial of $L$ is $P(\lambda)^l$ with $P(\lambda)$ prime, rather than just $\lambda^l$. Define $K_j = \ker(P(L)^j)$ for $j \geq 0$, so that $K_0 = 0$, $K_j \subseteq K_{j+1}$ for all $j$, $K_l = V$, and each $K_j$ is an invariant subspace under $L$. Define $d = \deg P(\lambda)$.

32. Suppose $j \geq 1$, and suppose $S_j$ is any vector subspace of $V$ such that $K_{j+1} = K_j \oplus S_j$. Prove that $P(L)$ is one-one from $S_j$ into $K_j$ and $P(L)(S_j) \cap K_{j-1} = 0$.

33. Define $U_l = W_l = 0$. For $0 \leq j \leq l - 1$, prove that there exist vector subspaces $U_j$ and $W_j$ of $K_{j+1}$ such that
\[
K_{j+1} = K_j \oplus U_j \oplus W_j, \qquad
U_j = P(L)(U_{j+1} \oplus W_{j+1}), \qquad
P(L) : U_{j+1} \oplus W_{j+1} \to U_j \ \text{is one-one}.
\]

34. Prove that the vector subspaces of the previous problem satisfy $V = U_0 \oplus W_0 \oplus U_1 \oplus W_1 \oplus \cdots \oplus U_{l-1} \oplus W_{l-1}$.

35. For $v \neq 0$ in $W_j$, prove that the set of all $L^r P(L)^s(v)$ with $0 \leq r \leq d - 1$ and $0 \leq s \leq j$ is a vector-space basis of $P(v)$.

36. Going back over the construction in Problem 33, prove that each $W_j$ can be chosen to have a basis consisting of vectors $L^r(v_i^{(j)})$ for $1 \leq i \leq (\dim W_j)/d$ and $0 \leq r \leq d - 1$.

37. Let the index $i$ used in the previous problem with $j$ be denoted by $i_j$ for $1 \leq i_j \leq (\dim W_j)/d$. Prove that a vector-space basis of $U_j \oplus W_j$ consists of all $L^r P(L)^k (v_{i_{j+k}}^{(j+k)})$ for $0 \leq r \leq d - 1$, $k \geq 0$, $1 \leq i_{j+k} \leq (\dim W_{j+k})/d$.

38. Prove that $V$ is the direct sum of cyclic subspaces under $L$. Prove specifically that each $v_{i_j}^{(j)}$ generates a cyclic subspace and that the sum of all these vector subspaces, with $0 \leq j \leq l$ and $1 \leq i_j \leq (\dim W_j)/d$, is a direct sum and equals $V$.

39. In the decomposition of the previous problem, each cyclic subspace generated by some $v_{i_j}^{(j)}$ has minimal polynomial $P(\lambda)^{j+1}$. Prove that
\[
\#\{\text{direct summands with minimal polynomial } P(\lambda)^k \text{ for some } k \geq j + 1\} = (\dim K_{j+1} - \dim K_j)/d.
\]


40. Prove that the formula of the previous problem persists for any decomposition of $V$ as the direct sum of cyclic subspaces, and conclude from Problem 28 that the decomposition into cyclic subspaces is unique up to isomorphism.

Problems 41–46 concern systems of ordinary differential equations with constant coefficients. The underlying field is taken to be $\mathbb{C}$, and differential calculus is used. For $A$ in $M_n(\mathbb{C})$ and $t$ in $\mathbb{R}$, define
\[
e^{tA} = \sum_{k=0}^{\infty} \frac{t^k A^k}{k!}.
\]
Take for granted that the series defining $e^{tA}$ converges entry by entry, that the series may be differentiated term by term to yield $\frac{d}{dt}(e^{tA}) = Ae^{tA} = e^{tA}A$, and that $e^{sA + tB} = e^{sA}e^{tB}$ if $A$ and $B$ commute.

41. Calculate $e^{tA}$ for $A$ equal to
(a) $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$,
(b) $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$,
(c) the diagonal matrix with diagonal entries $d_1, \dots, d_n$.

42. (a) Calculate $e^{tJ}$ when $J$ is a nilpotent $n$-by-$n$ Jordan block.
(b) Use (a) to calculate $e^{tJ}$ when $J$ is a general $n$-by-$n$ Jordan block.

43. Let $y_1, \dots, y_n$ be unknown functions from $\mathbb{R}$ to $\mathbb{C}$, and let $y$ be the vector-valued function formed by arranging $y_1, \dots, y_n$ in a column. Suppose that $A$ is in $M_n(\mathbb{C})$. Prove for each vector $v \in \mathbb{C}^n$ that $y(t) = e^{tA}v$ is a solution of the system of differential equations $\frac{dy}{dt} = Ay(t)$.

44. With notation as in the previous problem and with $v$ fixed in $\mathbb{C}^n$, use $e^{-tA}y(t)$ to show, for each open interval of $t$'s containing 0, that the only solution of $\frac{dy}{dt} = Ay(t)$ on that interval such that $y(0) = v$ is $y(t) = e^{tA}v$.

45. For $C$ invertible, prove that $e^{tC^{-1}AC} = C^{-1}e^{tA}C$, and deduce a relationship between solutions of $\frac{dy}{dt} = Ay(t)$ and solutions of $\frac{dy}{dt} = (C^{-1}AC)y(t)$.

46. Let $A = \begin{pmatrix} 2 & 1 & 0 \\ -1 & 4 & 0 \\ -1 & 2 & 2 \end{pmatrix}$. Taking into account Example 3 in Section 7 and Problems 42 through 45 above, find all solutions for $t$ in $(-1, 1)$ to the system $\frac{dy}{dt} = Ay(t)$ such that $y(0) = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$.

CHAPTER VI Multilinear Algebra

Abstract. This chapter studies, in the setting of vector spaces over a ﬁeld, the basics concerning multilinear functions, tensor products, spaces of linear functions, and algebras related to tensor products. Sections 1–5 concern special properties of bilinear forms, all vector spaces being assumed to be ﬁnite-dimensional. Section 1 associates a matrix to each bilinear form in the presence of an ordered basis, and the section shows the effect on the matrix of changing the ordered basis. It then addresses the extent to which the notion of “orthogonal complement” in the theory of inner-product spaces applies to nondegenerate bilinear forms. Sections 2–3 treat symmetric and alternating bilinear forms, producing bases for which the matrix of such a form is particularly simple. Section 4 treats a related subject, Hermitian forms when the ﬁeld is the complex numbers. Section 5 discusses the groups that leave some particular bilinear and Hermitian forms invariant. Section 6 introduces the tensor product of two vector spaces, working with it in a way that does not depend on a choice of basis. The tensor product has a universal mapping property—that bilinear functions on the product of the two vector spaces extend uniquely to linear functions on the tensor product. The tensor product turns out to be a vector space whose dual is the vector space of all bilinear forms. One particular application is that tensor products provide a basis-independent way of extending scalars for a vector space from a ﬁeld to a larger ﬁeld. The section includes a number of results about the vector space of linear mappings from one vector space to another that go hand in hand with results about tensor products. These have convenient formulations in the language of category theory as “natural isomorphisms.” Section 7 begins with the tensor product of three and then n vector spaces, carefully considering the universal mapping property and the question of associativity. 
The section defines an algebra over a field as a vector space with a bilinear multiplication, not necessarily associative. If $E$ is a vector space, the tensor algebra $T(E)$ of $E$ is the direct sum over $n \geq 0$ of the $n$-fold tensor product of $E$ with itself. This is an associative algebra with a universal mapping property relative to any linear mapping of $E$ into an associative algebra $A$ with identity: the linear map extends to an algebra homomorphism of $T(E)$ into $A$ carrying 1 into 1. Sections 8–9 define the symmetric and exterior algebras of a vector space $E$. The symmetric algebra $S(E)$ is a quotient of $T(E)$ with the following universal mapping property: any linear mapping of $E$ into a commutative associative algebra $A$ with identity extends to an algebra homomorphism of $S(E)$ into $A$ carrying 1 into 1. The symmetric algebra is commutative. Similarly the exterior algebra $\bigwedge(E)$ is a quotient of $T(E)$ with this universal mapping property: any linear mapping $l$ of $E$ into an associative algebra $A$ with identity such that $l(v)^2 = 0$ for all $v \in E$ extends to an algebra homomorphism of $\bigwedge(E)$ into $A$ carrying 1 into 1. The problems at the end of the chapter introduce some other algebras that are of importance in applications, and the problems relate some of these algebras to tensor, symmetric, and exterior algebras. Among the objects studied are Lie algebras, universal enveloping algebras, Clifford algebras, Weyl algebras, Jordan algebras, and the division algebra of octonions.


1. Bilinear Forms and Matrices

This chapter will work with vector spaces over a common field of “scalars,” which will be called $\mathbb{K}$. In Section 6 a field containing $\mathbb{K}$ as a subfield will briefly play a role, and that will be called $\mathbb{L}$. If $V$ is a vector space over $\mathbb{K}$, a bilinear form on $V$ is a function from $V \times V$ into $\mathbb{K}$ that is linear in each variable when the other variable is held fixed.

EXAMPLES.
(1) For general $\mathbb{K}$, take $V = \mathbb{K}^n$. Any matrix $A$ in $M_n(\mathbb{K})$ determines a bilinear form by the rule $\langle v, w \rangle = v^t A w$.
(2) For $\mathbb{K} = \mathbb{R}$, let $V$ be an inner-product space, in the sense of Chapter III, with inner product $(\,\cdot\,,\,\cdot\,)$. Then $(\,\cdot\,,\,\cdot\,)$ is a bilinear form on $V$.

Multilinear functionals on a vector space of row vectors, also called $k$-linear functionals or $k$-multilinear functionals, were defined in the course of working with determinants in Section II.7, and that definition transparently extends to general vector spaces. A bilinear form on a general vector space is then just a 2-linear functional. From the point of view of definitions, the words “functional” and “form” are interchangeable here, but the word “form” is more common in the bilinear case because of a certain homogeneity that it suggests and that comes closer to the surface in Corollary 6.12 and in Section 7.

For the remainder of this section, all vector spaces will be finite-dimensional. Bilinear forms, i.e., 2-linear functionals, are of special interest relative to $k$-linear functionals for general $k$ because of their relationships with matrices and linear mappings. To begin with, each bilinear form, in the presence of an ordered basis, is given by a matrix. In more detail let $V$ be a finite-dimensional vector space, and let $\langle\,\cdot\,,\,\cdot\,\rangle$ be a bilinear form on $V$. If an ordered basis $\Gamma = (v_1, \dots, v_n)$ of $V$ is specified, then the bilinear form determines the matrix $B$ with entries $B_{ij} = \langle v_i, v_j \rangle$. Conversely we can recover the bilinear form from $B$ as follows: Write $v = \sum_i a_i v_i$ and $w = \sum_j b_j v_j$. Then
\[
\langle v, w \rangle = \Bigl\langle \sum_i a_i v_i, \sum_j b_j v_j \Bigr\rangle = \sum_{i,j} a_i \langle v_i, v_j \rangle b_j.
\]
In other words, $\langle v, w \rangle = a^t B b$, where $a = \begin{pmatrix} v \\ \Gamma \end{pmatrix}$ and $b = \begin{pmatrix} w \\ \Gamma \end{pmatrix}$ in the notation of Section II.3. Therefore
\[
\langle v, w \rangle = \begin{pmatrix} v \\ \Gamma \end{pmatrix}^t B \begin{pmatrix} w \\ \Gamma \end{pmatrix}.
\]


Consequently we see that all bilinear forms on a finite-dimensional vector space reduce to Example 1 above, once we choose an ordered basis.

Let us examine the effect of a change of ordered basis. Suppose that $\Gamma = (v_1, \dots, v_n)$ and $\Delta = (w_1, \dots, w_n)$, and let $B$ and $C$ be the matrices of the bilinear form in these two ordered bases: $B_{ij} = \langle v_i, v_j \rangle$ and $C_{ij} = \langle w_i, w_j \rangle$. Let the two bases be related by $w_j = \sum_i a_{ij} v_i$, i.e., let $[a_{ij}] = \begin{pmatrix} I \\ \Gamma\Delta \end{pmatrix}$. Then we have
\[
C_{ij} = \langle w_i, w_j \rangle = \Bigl\langle \sum_k a_{ki} v_k, \sum_l a_{lj} v_l \Bigr\rangle = \sum_{k,l} a_{ki} a_{lj} \langle v_k, v_l \rangle = \sum_{k,l} a_{ki} B_{kl} a_{lj}.
\]
Translating this formula into matrix form, we obtain the following proposition.

Proposition 6.1. Let $\langle\,\cdot\,,\,\cdot\,\rangle$ be a bilinear form on a finite-dimensional vector space $V$, let $\Gamma$ and $\Delta$ be ordered bases of $V$, and let $B$ and $C$ be the respective matrices of $\langle\,\cdot\,,\,\cdot\,\rangle$ relative to $\Gamma$ and $\Delta$. Then
\[
C = \begin{pmatrix} I \\ \Gamma\Delta \end{pmatrix}^t B \begin{pmatrix} I \\ \Gamma\Delta \end{pmatrix}.
\]

The qualitative conclusion about the matrices may be a little unexpected. It is not that they are similar but that they are related by $C = S^t B S$ for some nonsingular square matrix $S$. In particular, $B$ and $C$ need not have the same determinant.

Guided by the circle of ideas around the Riesz Representation Theorem for inner products (Theorem 3.12), let us examine what happens when we fix one of the variables of a bilinear form and work with the resulting linear map. Thus again let $\langle\,\cdot\,,\,\cdot\,\rangle$ be a bilinear form on $V$. For fixed $u$ in $V$, $v \mapsto \langle u, v \rangle$ is a linear functional on $V$, thus a member of the dual space $V'$ of $V$. If we write $L(u)$ for this linear functional, then $L$ is a function from $V$ to $V'$ satisfying $L(u)(v) = \langle u, v \rangle$. The formula for $L$ shows that $L$ is in fact a linear function. We define the left radical, lrad, of $\langle\,\cdot\,,\,\cdot\,\rangle$ to be the kernel of $L$; thus
\[
\operatorname{lrad}\langle\,\cdot\,,\,\cdot\,\rangle = \{u \in V \mid \langle u, v \rangle = 0 \text{ for all } v \in V\}.
\]
Similarly we let $R : V \to V'$ be the linear map $R(v)(u) = \langle u, v \rangle$, and we define the right radical, rrad, of $\langle\,\cdot\,,\,\cdot\,\rangle$ to be the kernel of $R$; thus
\[
\operatorname{rrad}\langle\,\cdot\,,\,\cdot\,\rangle = \{v \in V \mid \langle u, v \rangle = 0 \text{ for all } u \in V\}.
\]

EXAMPLE 1, CONTINUED. The vector space $V$ is the space $\mathbb{K}^n$ of $n$-dimensional column vectors, the dual $V'$ is the space of $n$-dimensional row vectors, $A$ is


an $n$-by-$n$ matrix with entries in $\mathbb{K}$, and $\langle\,\cdot\,,\,\cdot\,\rangle$ is given by $\langle u, v \rangle = u^t A v = L(u)(v) = R(v)(u)$ for $u$ and $v$ in $\mathbb{K}^n$. Explicit formulas for $L$ and $R$ are given by
\[
L(u) = u^t A = (A^t u)^t \quad\text{and}\quad R(v) = (Av)^t.
\]
Thus
\[
\operatorname{lrad}\langle\,\cdot\,,\,\cdot\,\rangle = \ker L = \text{null space}(A^t), \qquad
\operatorname{rrad}\langle\,\cdot\,,\,\cdot\,\rangle = \ker R = \text{null space}(A).
\]
Since $A$ is square and since the row rank and column rank of $A$ are equal, the dimensions of the null spaces of $A$ and $A^t$ are equal. Hence $\dim \operatorname{lrad}\langle\,\cdot\,,\,\cdot\,\rangle = \dim \operatorname{rrad}\langle\,\cdot\,,\,\cdot\,\rangle$.

This equality of dimensions for the case of $\mathbb{K}^n$ extends to general $V$, as is noted in the next proposition.

Proposition 6.2. If $\langle\,\cdot\,,\,\cdot\,\rangle$ is any bilinear form on a finite-dimensional vector space $V$, then
\[
\dim \operatorname{lrad}\langle\,\cdot\,,\,\cdot\,\rangle = \dim \operatorname{rrad}\langle\,\cdot\,,\,\cdot\,\rangle.
\]

PROOF. We saw above that computations with bilinear forms of $V$ reduce, once we choose an ordered basis for $V$, to computations with matrices, row vectors, and column vectors. Thus the argument just given in the continuation of Example 1 is completely general, and the proposition is proved.

A bilinear form $\langle\,\cdot\,,\,\cdot\,\rangle$ is said to be nondegenerate if its left radical is 0. In view of Proposition 6.2, it is equivalent to require that the right radical be 0. When the radicals are 0, the associated linear maps $L$ and $R$ from $V$ to $V'$ are one-one. Since $\dim V = \dim V'$, it follows that $L$ and $R$ are onto $V'$. Thus a nondegenerate bilinear form on $V$ sets up two canonical isomorphisms of $V$ with its dual $V'$.

For definiteness let us work with the linear mapping $L : V \to V'$ given by $L(u)(v) = \langle u, v \rangle$. If $U \subseteq V$ is a vector subspace, define
\[
U^\perp = \{u \in V \mid \langle u, v \rangle = 0 \text{ for all } v \in U\}.
\]
It is apparent from the definitions that
\[
U \cap U^\perp = \operatorname{lrad}\bigl(\langle\,\cdot\,,\,\cdot\,\rangle\big|_{U \times U}\bigr).
\]
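The matrix statements made so far in this section can be checked on small examples. The sketch below uses exact rational arithmetic and is only an illustration: the matrices `B`, `S`, and `A` are arbitrary choices, not taken from the text, and the helper names are hypothetical. It first compares the matrix of a form in a new basis, computed entry by entry, with the product $S^t B S$ of Proposition 6.1, and then computes the left and right radicals of a degenerate form as the null spaces of $A^t$ and $A$.

```python
from fractions import Fraction

def transpose(X):
    return [list(r) for r in zip(*X)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def null_space(M):
    """Basis of {x : M x = 0} over the rationals, via reduced row-echelon form."""
    rows = [[Fraction(x) for x in r] for r in M]
    ncols, pivots, rk = len(rows[0]), [], 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        p = rows[rk][col]
        rows[rk] = [x / p for x in rows[rk]]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        pivots.append(col)
        rk += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        vec = [Fraction(0)] * ncols
        vec[free] = Fraction(1)
        for i, pc in enumerate(pivots):
            vec[pc] = -rows[i][free]
        basis.append(vec)
    return basis

# Proposition 6.1: B is the matrix of the form in the standard basis of Q^2,
# and column j of S holds the coordinates of the new basis vector w_j.
B = [[1, 2], [0, 3]]
S = [[1, 1], [0, 2]]

def form(u, v):
    """<u, v> = u^t B v."""
    return sum(u[i] * B[i][j] * v[j] for i in range(2) for j in range(2))

W = transpose(S)                                   # rows are w_1, w_2
C_direct = [[form(wi, wj) for wj in W] for wi in W]
C_formula = matmul(matmul(transpose(S), B), S)
print(C_direct == C_formula)

# Radicals of the degenerate form <u, v> = u^t A v.
A = [[1, 1], [0, 0]]
lrad = null_space(transpose(A))    # left radical  = null space of A^t
rrad = null_space(A)               # right radical = null space of A
print(lrad, rrad)
```

In this example the two radicals are different lines in $\mathbb{Q}^2$ even though their dimensions agree, and $\det C = (\det S)^2 \det B$ differs from $\det B$, in line with the remark that $B$ and $C$ need not have the same determinant.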


In contrast to the special case that $\mathbb{K} = \mathbb{R}$ and the bilinear form is an inner product, $U \cap U^\perp$ may be nonzero even if $\langle\,\cdot\,,\,\cdot\,\rangle$ is nondegenerate. For example let $V = \mathbb{R}^2$, define
\[
\Bigl\langle \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \Bigr\rangle = x_1 y_1 - x_2 y_2,
\]
and suppose that $U$ is the 1-dimensional vector subspace $U = \left\{ \begin{pmatrix} x_1 \\ x_1 \end{pmatrix} \right\}$. The matrix of the bilinear form in the standard ordered basis is $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$; since the matrix is nonsingular, the bilinear form is nondegenerate. Direct calculation shows that $U^\perp = \left\{ \begin{pmatrix} y_1 \\ y_1 \end{pmatrix} \right\} = U$, so that $U \cap U^\perp \neq 0$. Nevertheless, in the nondegenerate case the dimensions of $U$ and $U^\perp$ behave as if $U^\perp$ were an orthogonal complement. The precise result is as follows.

Proposition 6.3. If $\langle\,\cdot\,,\,\cdot\,\rangle$ is a nondegenerate bilinear form on the finite-dimensional vector space $V$ and if $U$ is a vector subspace of $V$, then
\[
\dim V = \dim U + \dim U^\perp.
\]

PROOF. Define $\Phi : V \to U'$ by $\Phi(v)(u) = \langle v, u \rangle$ for $v \in V$ and $u \in U$. The definition of $U^\perp$ shows that $\ker \Phi = U^\perp$. To see that $\operatorname{image} \Phi = U'$, choose a vector subspace $U_1$ of $V$ with $V = U \oplus U_1$, let $u'$ be in $U'$, and define $v'$ in $V'$ by
\[
v' = \begin{cases} u' & \text{on } U, \\ 0 & \text{on } U_1. \end{cases}
\]
Since $\langle\,\cdot\,,\,\cdot\,\rangle$ is nondegenerate, the linear mapping $L : V \to V'$ is onto $V'$. Thus we can choose $v \in V$ with $L(v) = v'$. Then
\[
\Phi(v)(u) = \langle v, u \rangle = L(v)(u) = v'(u) = u'(u)
\]
for all $u$ in $U$, and hence $\Phi(v) = u'$. Therefore $\operatorname{image} \Phi = U'$, and we conclude that
\[
\dim V = \dim(\ker \Phi) + \dim(\operatorname{image} \Phi) = \dim U^\perp + \dim U' = \dim U^\perp + \dim U.
\]

Corollary 6.4. If $\langle\,\cdot\,,\,\cdot\,\rangle$ is a nondegenerate bilinear form on the finite-dimensional vector space $V$ and if $U$ is a vector subspace of $V$, then $V = U \oplus U^\perp$ if and only if $\langle\,\cdot\,,\,\cdot\,\rangle\big|_{U \times U}$ is nondegenerate.

PROOF. Corollary 2.29 and Proposition 6.3 together give
\[
\dim(U + U^\perp) + \dim(U \cap U^\perp) = \dim U + \dim U^\perp = \dim V.
\]
Thus $U + U^\perp = V$ if and only if $U \cap U^\perp = 0$, if and only if $\langle\,\cdot\,,\,\cdot\,\rangle\big|_{U \times U}$ is nondegenerate. The result therefore follows from Proposition 2.30.
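The subspace $U^\perp$ can be computed as a null space: with the form given by $\langle u, v \rangle = u^t B v$, a vector $u$ lies in $U^\perp$ exactly when $(Bv)^t u = 0$ for each $v$ in a basis of $U$. The following sketch, with rational entries standing in for real ones and with hypothetical helper names `null_space` and `perp`, reproduces the example $x_1 y_1 - x_2 y_2$ above and checks the dimension count of Proposition 6.3 on a second, arbitrarily chosen example.

```python
from fractions import Fraction

def null_space(M):
    """Basis of {x : M x = 0} over the rationals, via reduced row-echelon form."""
    rows = [[Fraction(x) for x in r] for r in M]
    ncols, pivots, rk = len(rows[0]), [], 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        p = rows[rk][col]
        rows[rk] = [x / p for x in rows[rk]]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        pivots.append(col)
        rk += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        vec = [Fraction(0)] * ncols
        vec[free] = Fraction(1)
        for i, pc in enumerate(pivots):
            vec[pc] = -rows[i][free]
        basis.append(vec)
    return basis

def perp(B, U_basis):
    """Basis of U-perp = {u : u^t B v = 0 for all v in U}."""
    n = len(B)
    # One equation (B v)^t u = 0 for each basis vector v of U.
    equations = [[sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
                 for v in U_basis]
    return null_space(equations)

# The example from the text: <x, y> = x1*y1 - x2*y2 and U spanned by (1, 1).
B2 = [[1, 0], [0, -1]]
U2 = [[1, 1]]
U2_perp = perp(B2, U2)
print(U2_perp)                       # spanned by (1, 1), so U-perp = U

# An arbitrary second example in dimension 3 (not from the text).
B3 = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
U3 = [[1, 0, 0]]
U3_perp = perp(B3, U3)
print(len(U3) + len(U3_perp))        # equals dim V = 3
```

In the first example the restriction of the form to $U$ is identically 0, so Corollary 6.4 predicts that $V \neq U \oplus U^\perp$; in the second the restriction is nondegenerate and $U^\perp$ is a complement of $U$.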


2. Symmetric Bilinear Forms

We continue with the setting in which $\mathbb{K}$ is a field and all vector spaces of interest are defined over $\mathbb{K}$ and are finite-dimensional. A bilinear form $\langle\,\cdot\,,\,\cdot\,\rangle$ on $V$ is said to be symmetric if $\langle u, v \rangle = \langle v, u \rangle$ for all $u$ and $v$ in $V$, skew-symmetric if $\langle u, v \rangle = -\langle v, u \rangle$ for all $u$ and $v$ in $V$, and alternating if $\langle u, u \rangle = 0$ for all $u$ in $V$.

“Alternating” always implies “skew-symmetric.” In fact, if $\langle\,\cdot\,,\,\cdot\,\rangle$ is alternating, then $0 = \langle u + v, u + v \rangle = \langle u, u \rangle + \langle u, v \rangle + \langle v, u \rangle + \langle v, v \rangle = \langle u, v \rangle + \langle v, u \rangle$; thus $\langle\,\cdot\,,\,\cdot\,\rangle$ is skew-symmetric. If $\mathbb{K}$ has characteristic different from 2, then the converse is valid: “skew-symmetric” implies “alternating.” In fact, if $\langle\,\cdot\,,\,\cdot\,\rangle$ is skew-symmetric, then $\langle u, u \rangle = -\langle u, u \rangle$ and hence $2\langle u, u \rangle = 0$; thus $\langle u, u \rangle = 0$, and $\langle\,\cdot\,,\,\cdot\,\rangle$ is alternating.

Let us examine further the effect of the characteristic of $\mathbb{K}$. If, on the one hand, $\mathbb{K}$ has characteristic different from 2, the most general bilinear form $\langle\,\cdot\,,\,\cdot\,\rangle$ is the sum of the symmetric form $\langle\,\cdot\,,\,\cdot\,\rangle_s$ and the alternating form $\langle\,\cdot\,,\,\cdot\,\rangle_a$ given by
\[
\langle u, v \rangle_s = \tfrac{1}{2}\bigl(\langle u, v \rangle + \langle v, u \rangle\bigr), \qquad
\langle u, v \rangle_a = \tfrac{1}{2}\bigl(\langle u, v \rangle - \langle v, u \rangle\bigr).
\]
In this sense the symmetric and alternating bilinear forms are the extreme cases among all bilinear forms, and we shall study the two cases separately.

If, on the other hand, $\mathbb{K}$ has characteristic 2, then “alternating” implies “skew-symmetric” but not conversely. “Alternating” is a serious restriction, and we shall be able to deal with it. However, “symmetric” and “skew-symmetric” are equivalent since $1 = -1$, and thus neither condition is much of a restriction; we shall not attempt to say anything insightful in these cases.

In this section we study symmetric bilinear forms, obtaining results when $\mathbb{K}$ has characteristic different from 2. From the symmetry it is apparent that the left and right radicals of a symmetric bilinear form are the same, and we call this vector subspace the radical of the form. By way of an example, here is a continuation of Example 1 from the previous section.

EXAMPLE. Let $V = \mathbb{K}^n$, let $A$ be a symmetric $n$-by-$n$ matrix (i.e., one with $A^t = A$), and let $\langle u, v \rangle = u^t A v$. The computation $\langle v, u \rangle = v^t A u = (v^t A u)^t = u^t A^t v = u^t A v = \langle u, v \rangle$ shows that the bilinear form $\langle\,\cdot\,,\,\cdot\,\rangle$ is symmetric; the second equality $v^t A u = (v^t A u)^t$ holds since $v^t A u$ is a 1-by-1 matrix.

Again the example is completely general. In fact, if $\Gamma = (v_1, \dots, v_n)$ is an ordered basis of a vector space $V$ and if $\langle\,\cdot\,,\,\cdot\,\rangle$ is a given symmetric bilinear form on $V$, then the matrix of the form has entries $A_{ij} = \langle v_i, v_j \rangle$, and these evidently satisfy $A_{ij} = A_{ji}$. So $A$ is a symmetric matrix, and computations with the bilinear form are reduced to those used in the example.


Theorem 6.5 (Principal Axis Theorem). Suppose that $\mathbb{K}$ has characteristic different from 2.
(a) If $\langle\,\cdot\,,\,\cdot\,\rangle$ is a symmetric bilinear form on a finite-dimensional vector space $V$, then there exists an ordered basis of $V$ in which the matrix of $\langle\,\cdot\,,\,\cdot\,\rangle$ is diagonal.
(b) If $A$ is an $n$-by-$n$ symmetric matrix, then there exists a nonsingular $n$-by-$n$ matrix $M$ such that $M^t A M$ is diagonal.

REMARKS. Because computations with general symmetric bilinear forms reduce to computations in the special case of a symmetric matrix and because Proposition 6.1 tells the effect of a change of ordered basis, (a) and (b) amount to the same result; nevertheless, we give two proofs of Theorem 6.5: a proof via matrices and a proof via linear maps. A hint of the validity of the theorem comes from the case that $\mathbb{K} = \mathbb{R}$. For the field $\mathbb{R}$ when the bilinear form is an inner product, the Spectral Theorem (Theorem 3.21) says that there is an orthonormal basis of eigenvectors and hence that (a) holds. When $\mathbb{K} = \mathbb{R}$, the same theorem says that there exists an orthogonal matrix $M$ with $M^{-1}AM$ diagonal; since any orthogonal matrix $M$ satisfies $M^{-1} = M^t$, the Spectral Theorem is saying that (b) holds.

PROOF VIA MATRICES. If $A$ is an $n$-by-$n$ symmetric matrix, we seek a nonsingular $M$ with $M^t A M$ diagonal. We induct on the size of $A$, the base case of the induction being $n = 1$, where there is nothing to prove. Assume the result to be known for size $n - 1$, and write the given $n$-by-$n$ matrix $A$ in block form as $A = \begin{pmatrix} a & b \\ b^t & d \end{pmatrix}$ with $d$ of size 1-by-1. If $d \neq 0$, let $x$ be the column vector $-d^{-1}b$. Then
\[
\begin{pmatrix} I & x \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} a & b \\ b^t & d \end{pmatrix}
\begin{pmatrix} I & 0 \\ x^t & 1 \end{pmatrix}
= \begin{pmatrix} * & 0 \\ 0 & d \end{pmatrix},
\]
and the induction goes through. If $d = 0$, we argue in a different way. We may assume that $b \neq 0$ since otherwise the result is immediate by induction. Say $b_i \neq 0$ with $1 \leq i \leq n - 1$. Let $y$ be an $(n - 1)$-dimensional row vector with $i$th entry a member $\delta$ of $\mathbb{K}$ to be specified and with other entries 0. Then
\[
\begin{pmatrix} I & 0 \\ y & 1 \end{pmatrix}
\begin{pmatrix} a & b \\ b^t & 0 \end{pmatrix}
\begin{pmatrix} I & y^t \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} * & * \\ * & yay^t + b^t y^t + yb \end{pmatrix}
= \begin{pmatrix} * & * \\ * & \delta^2 a_{ii} + 2\delta b_i \end{pmatrix}.
\]
Since $\mathbb{K}$ has characteristic different from 2, $2b_i$ is not 0; thus there is some value of $\delta$ for which $\delta^2 a_{ii} + 2\delta b_i \neq 0$. Then we are reduced to the case $d \neq 0$, which we have already handled, and the induction goes through.

PROOF VIA LINEAR MAPS. We may assume that the given symmetric bilinear form is not identically 0, since otherwise any basis will do. Let the radical of the form be denoted by $\operatorname{rad} = \operatorname{rad}\langle\,\cdot\,,\,\cdot\,\rangle$. Choose a vector subspace $S$ of $V$ such that $V = \operatorname{rad} \oplus S$, and put $[\,\cdot\,,\,\cdot\,] = \langle\,\cdot\,,\,\cdot\,\rangle\big|_{S \times S}$. Then $[\,\cdot\,,\,\cdot\,]$ is a symmetric


bilinear form on $S$, and it is nondegenerate. In fact, $[u, \cdot\,] = 0$ means $\langle u, v \rangle = 0$ for all $v \in S$; since $\langle u, v \rangle = 0$ for $v$ in rad anyway, $\langle u, v \rangle = 0$ for all $v \in V$, $u$ is in rad as well as $S$, and $u = 0$.

Since $\langle\,\cdot\,,\,\cdot\,\rangle$ is not identically 0, the subspace $S$ is not 0. Thus the nondegenerate symmetric bilinear form $[\,\cdot\,,\,\cdot\,]$ on $S$ is not 0. Since
\[
[u, v] = \tfrac{1}{2}\bigl([u + v, u + v] - [u, u] - [v, v]\bigr),
\]
it follows that $[v, v] \neq 0$ for some $v$ in $S$. Put $U_1 = \mathbb{K}v$. Then $[\,\cdot\,,\,\cdot\,]\big|_{U_1 \times U_1}$ is nondegenerate, and Corollary 6.4 implies that $S = U_1 \oplus U_1^\perp$. Applying the converse direction of the same corollary to $U_1^\perp$, we see that $[\,\cdot\,,\,\cdot\,]\big|_{U_1^\perp \times U_1^\perp}$ is nondegenerate. Repeating this construction with $U_1^\perp$ and iterating, we obtain
\[
V = \operatorname{rad} \oplus U_1 \oplus \cdots \oplus U_k
\]
with $\langle U_i, U_j \rangle = 0$ for $i \neq j$ and with $\dim U_i = 1$ for all $i$. This completes the proof.

Theorem 6.5 fails in characteristic 2. Problem 2 at the end of the chapter illustrates the failure.

Let us examine the matrix version of Theorem 6.5 more closely when $\mathbb{K}$ is $\mathbb{C}$ or $\mathbb{R}$. The theorem says that if $A$ is $n$-by-$n$ symmetric, then we can find a nonsingular $M$ with $B = M^t A M$ diagonal. Taking $D$ diagonal and forming $C = D^t B D$, we see that we can adjust the diagonal entries of $B$ by arbitrary nonzero squares. Over $\mathbb{C}$, we can therefore arrange that $C$ is of the form $\operatorname{diag}(1, \dots, 1, 0, \dots, 0)$. The number of 1's equals the rank, and this has to be the same as the rank of the given matrix $A$. The form is nondegenerate if and only if there are no 0's. Thus we understand everything about the diagonal form.

Over $\mathbb{R}$, matters are more subtle. We can arrange that $C$ is of the form $\operatorname{diag}(\pm 1, \dots, \pm 1, 0, \dots, 0)$, the various signs ostensibly not being correlated. Replacing $C$ by $P^t C P$ with $P$ a permutation matrix, we may assume that our diagonal matrix is of the form $\operatorname{diag}(+1, \dots, +1, -1, \dots, -1, 0, \dots, 0)$. The number of $+1$'s and $-1$'s together is again the rank of $A$, and the form is nondegenerate if and only if there are no 0's. But what about the separate numbers of $+1$'s and $-1$'s? The triple given by
\[
(p, m, z) = \bigl(\#(+1)\text{'s}, \#(-1)\text{'s}, \#(0)\text{'s}\bigr)
\]
is called the signature of $A$ when $\mathbb{K} = \mathbb{R}$. A similar notion can be defined in the case of a symmetric bilinear form over $\mathbb{R}$.

Theorem 6.6 (Sylvester's Law). The signature of an $n$-by-$n$ symmetric matrix over $\mathbb{R}$ is well defined.
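The diagonalization of Theorem 6.5(b) and the signature just defined can be sketched computationally. The code below is a minimal illustration under stated assumptions, not the procedure of the text: it uses symmetric elimination that pivots from the top-left corner (the matrix proof above works from the bottom-right entry $d$), it takes rational entries as a stand-in for real ones, and the names `congruence_diagonalize` and `signature` are hypothetical. The zero-pivot step mirrors the choice of $\delta$ in the proof, using $\delta = \pm 1$.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def congruence_diagonalize(A):
    """Return (D, M) with D = M^t A M diagonal, for a symmetric rational matrix A."""
    n = len(A)
    D = [[Fraction(x) for x in row] for row in A]
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    def add_col(X, src, dst, f):          # column dst += f * column src
        for r in range(len(X)):
            X[r][dst] += f * X[r][src]

    def add_row(X, src, dst, f):          # row dst += f * row src
        X[dst] = [a + f * b for a, b in zip(X[dst], X[src])]

    def congruence_step(src, dst, f):
        # Replace D by E^t D E and M by M E, where E = I + f * e_src e_dst^t.
        add_col(D, src, dst, f)
        add_row(D, src, dst, f)
        add_col(M, src, dst, f)

    for k in range(n):
        if D[k][k] == 0:
            j = next((j for j in range(k + 1, n) if D[k][j] != 0), None)
            if j is None:
                continue                  # row and column k are already cleared
            # New pivot is delta^2 * D[j][j] + 2 * delta * D[k][j]; one of
            # delta = 1, -1 makes it nonzero because 2 * D[k][j] != 0.
            delta = Fraction(1) if D[j][j] + 2 * D[k][j] != 0 else Fraction(-1)
            congruence_step(j, k, delta)
        for i in range(k + 1, n):
            if D[i][k] != 0:
                congruence_step(k, i, -D[i][k] / D[k][k])
    return D, M

def signature(A):
    """Triple (p, m, z): numbers of positive, negative, zero diagonal entries."""
    D, _ = congruence_diagonalize(A)
    diag = [D[i][i] for i in range(len(D))]
    return (sum(x > 0 for x in diag), sum(x < 0 for x in diag),
            sum(x == 0 for x in diag))

A = [[0, 1], [1, 0]]
D, M = congruence_diagonalize(A)
Mt = [list(r) for r in zip(*M)]
print(matmul(matmul(Mt, A), M) == D)
print(signature(A))
```

The diagonal entries produced here are not yet $\pm 1$; rescaling by a diagonal matrix, as described above, changes them only by positive squares and so does not affect the counts returned by `signature`. That different choices of $M$ always give the same counts is the content of Theorem 6.6.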


PROOF. The integer $p + m$ is the rank, which does not change under a transformation $A \mapsto M^t A M$ if $M$ is nonsingular. Thus we may take $z$ as known. Let $(p', m', z)$ and $(p, m, z)$ be two signatures for a symmetric matrix $A$, with $p' \le p$. Define the corresponding symmetric bilinear form on $\mathbb{R}^n$ by $\langle u, v\rangle = u^t A v$. Let $(v'_1, \dots, v'_n)$ and $(v_1, \dots, v_n)$ be ordered bases of $\mathbb{R}^n$ diagonalizing the bilinear form and exhibiting the resulting signature, i.e., having $\langle v'_i, v'_j\rangle = \langle v_i, v_j\rangle = 0$ for $i \neq j$ and having
$$\langle v'_j, v'_j\rangle = \begin{cases} +1 & \text{for } 1 \le j \le p', \\ -1 & \text{for } p' + 1 \le j \le n - z, \\ 0 & \text{for } n - z + 1 \le j \le n, \end{cases}
\qquad
\langle v_j, v_j\rangle = \begin{cases} +1 & \text{for } 1 \le j \le p, \\ -1 & \text{for } p + 1 \le j \le n - z, \\ 0 & \text{for } n - z + 1 \le j \le n. \end{cases}$$
We shall prove that $\{v_1, \dots, v_p, v'_{p'+1}, \dots, v'_n\}$ is linearly independent. Since this set has $p + (n - p')$ members and lies in $\mathbb{R}^n$, we must then have $p + (n - p') \le n$, i.e., $p' \ge p$. Reversing the roles of $p$ and $p'$, we see that $p' = p$ and $m' = m$, and the theorem is proved.

Thus suppose we have a linear dependence:
$$a_1 v_1 + \cdots + a_p v_p = b_{p'+1} v'_{p'+1} + \cdots + b_n v'_n.$$
Let $v$ be the common value of the two sides of this equation. Then
$$\langle v, v\rangle = \langle a_1 v_1 + \cdots + a_p v_p,\ a_1 v_1 + \cdots + a_p v_p\rangle = \sum_{j=1}^{p} a_j^2 \ge 0$$
and
$$\langle v, v\rangle = \langle b_{p'+1} v'_{p'+1} + \cdots + b_n v'_n,\ b_{p'+1} v'_{p'+1} + \cdots + b_n v'_n\rangle = -\sum_{j=p'+1}^{n-z} b_j^2 \le 0.$$
We conclude that $\langle v, v\rangle = 0$, $\sum_{j=1}^{p} a_j^2 = 0$, and $a_1 = \cdots = a_p = 0$. Thus $v = 0$ and $b_{p'+1} v'_{p'+1} + \cdots + b_n v'_n = 0$. Since $\{v'_{p'+1}, \dots, v'_n\}$ is linearly independent, we obtain also $b_{p'+1} = \cdots = b_n = 0$. Therefore $\{v_1, \dots, v_p, v'_{p'+1}, \dots, v'_n\}$ is a linearly independent set, and the proof is complete.

3. Alternating Bilinear Forms

We continue with the setting in which $\mathbb{K}$ is a field and all vector spaces of interest are defined over $\mathbb{K}$ and are finite-dimensional.


In this section we study alternating bilinear forms, imposing no restriction on the characteristic of $\mathbb{K}$. From the skew symmetry of any alternating bilinear form it is apparent that the left and right radicals of such a form are the same, and we call this vector subspace the radical of the form.

First let us consider examples given in terms of matrices. Temporarily let us separate matters according to the characteristic.

EXAMPLE 1 OF SECTION 1 WITH $\mathbb{K}$ OF CHARACTERISTIC $\neq 2$. Let $V = \mathbb{K}^n$, let $A$ be a skew-symmetric $n$-by-$n$ matrix (i.e., one with $A^t = -A$), and let $\langle u, v\rangle = u^t A v$. The computation
$$\langle v, u\rangle = v^t A u = (v^t A u)^t = u^t A^t v = -u^t A v = -\langle u, v\rangle$$
shows that the bilinear form $\langle\cdot,\cdot\rangle$ is skew-symmetric, hence alternating.

EXAMPLE 1 OF SECTION 1 WITH $\mathbb{K}$ OF CHARACTERISTIC $= 2$. Let $V = \mathbb{K}^n$, let $A$ be an $n$-by-$n$ matrix, and define $\langle u, v\rangle = u^t A v$. We suppose that $A$ is skew-symmetric; it is the same to assume that $A$ is symmetric since the characteristic is 2. In order to have $\langle e_i, e_i\rangle = 0$ for each standard basis vector, we shall assume that $A_{ii} = 0$ for all $i$. If $u$ is a column vector with entries $u_1, \dots, u_n$, then
$$\langle u, u\rangle = u^t A u = \sum_{i,j} u_i A_{ij} u_j = \sum_{i \neq j} u_i A_{ij} u_j = \sum_{i<j} (A_{ij} u_i u_j + A_{ji} u_i u_j) = \sum_{i<j} 2 A_{ij} u_i u_j = 0.$$
Hence the bilinear form $\langle\cdot,\cdot\rangle$ is alternating.

Again the examples are completely general. In fact, if $\Gamma = (v_1, \dots, v_n)$ is an ordered basis of a vector space $V$ and if $\langle\cdot,\cdot\rangle$ is a given alternating bilinear form, then the matrix of the form has entries $A_{ij} = \langle v_i, v_j\rangle$ that evidently satisfy $A_{ij} = -A_{ji}$ and $A_{ii} = 0$. So $A$ is a skew-symmetric matrix with 0's on the diagonal, and computations with the bilinear form are reduced to those used in the examples. To keep the terminology parallel, let us say that a square matrix is alternating if it is skew-symmetric and has 0's on the diagonal.

Theorem 6.7.
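(a) If $\langle\cdot,\cdot\rangle$ is an alternating bilinear form on a finite-dimensional vector space $V$, then there exists an ordered basis of $V$ in which the matrix of $\langle\cdot,\cdot\rangle$ has the form
$$\begin{pmatrix}
0 & 1 \\
-1 & 0 \\
 & & 0 & 1 \\
 & & -1 & 0 \\
 & & & & \ddots \\
 & & & & & 0 & 1 \\
 & & & & & -1 & 0 \\
 & & & & & & & 0 \\
 & & & & & & & & \ddots \\
 & & & & & & & & & 0
\end{pmatrix}.$$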


If $\langle\cdot,\cdot\rangle$ is nondegenerate, then $\dim V$ is even.
(b) If $A$ is an $n$-by-$n$ alternating matrix, then there exists a nonsingular $n$-by-$n$ matrix $M$ such that $M^t A M$ is as in (a).

PROOF. It is enough to prove (a). Let rad be the radical of the given form $\langle\cdot,\cdot\rangle$, and choose a vector subspace $S$ of $V$ with $V = \operatorname{rad} \oplus S$. The restriction of $\langle\cdot,\cdot\rangle$ to $S$ is then alternating and nondegenerate. We may now proceed by induction on $\dim V$ under the assumption that $\langle\cdot,\cdot\rangle$ is nondegenerate. For $\dim V = 1$, the form is degenerate. For $\dim V = 2$, we can find $u$ and $v$ with $\langle u, v\rangle \neq 0$, and we can normalize one of the vectors to make $\langle u, v\rangle = 1$. Then $(u, v)$ is the required ordered basis. Assuming the result in the nondegenerate case for dimension $< n$, suppose that $\dim V = n$. Again choose $u$ and $v$ with $\langle u, v\rangle = 1$, and define $U = \mathbb{K}u \oplus \mathbb{K}v$. Then $\langle\cdot,\cdot\rangle\big|_{U \times U}$ has matrix $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ and is nondegenerate. By Corollary 6.4, $V = U \oplus U^{\perp}$, and an application of the converse of the corollary shows that $\langle\cdot,\cdot\rangle\big|_{U^{\perp} \times U^{\perp}}$ is nondegenerate. The induction hypothesis applies to $U^{\perp}$, and we obtain the desired matrix for the given form.
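The induction in this proof is effectively an algorithm, and it can be sketched in code. The following is a minimal illustration under stated assumptions: the form is given by an alternating matrix with rational entries, and the names `form`, `symplectic_basis`, and `gram` are hypothetical helpers rather than anything defined in the text. At each stage the sketch picks $u$ and $v$ with $\langle u, v\rangle = 1$ and replaces every remaining vector $w$ by $w - \langle w, v\rangle u + \langle w, u\rangle v$, which lies in $U^{\perp}$ for $U = \mathbb{K}u \oplus \mathbb{K}v$.

```python
from fractions import Fraction


def form(a, u, v):
    """Value u^t A v of the bilinear form with matrix a."""
    n = len(a)
    return sum(u[i] * a[i][j] * v[j] for i in range(n) for j in range(n))


def symplectic_basis(a):
    """Return basis vectors in which the alternating matrix a takes the form of Theorem 6.7."""
    n = len(a)
    rest = [[Fraction(int(i == j)) for i in range(n)] for j in range(n)]
    pairs, radical = [], []
    while rest:
        u = rest.pop(0)
        idx = next((k for k, w in enumerate(rest) if form(a, u, w) != 0), None)
        if idx is None:
            # u is orthogonal to everything that remains, hence to all of V.
            radical.append(u)
            continue
        v = rest.pop(idx)
        c = form(a, u, v)
        v = [x / c for x in v]  # normalize so that <u, v> = 1
        # Project each remaining vector into the orthogonal complement of Ku + Kv.
        rest = [[w[i] - form(a, w, v) * u[i] + form(a, w, u) * v[i] for i in range(n)]
                for w in rest]
        pairs += [u, v]
    return pairs + radical


def gram(a, basis):
    """Matrix of the form in the given basis."""
    return [[form(a, x, y) for y in basis] for x in basis]


A = [[0, 1, 2], [-1, 0, 3], [-2, -3, 0]]
print(gram(A, symplectic_basis(A)) == [[0, 1, 0], [-1, 0, 0], [0, 0, 0]])  # True
```

In this 3-by-3 example the form is necessarily degenerate, in line with the statement that a nondegenerate alternating form forces even dimension.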

4. Hermitian Forms

In this section the field will be $\mathbb{C}$, and $V$ will be a finite-dimensional vector space over $\mathbb{C}$. A sesquilinear form $\langle\cdot,\cdot\rangle$ on $V$ is a function from $V \times V$ into $\mathbb{C}$ that is linear in the first variable and conjugate linear in the second.¹ Sesquilinear forms do not make sense for general fields because of the absence of a universal analog of complex conjugation, and we shall consequently work only with the field $\mathbb{C}$ in this section.²

A sesquilinear form $\langle\cdot,\cdot\rangle$ is Hermitian if $\langle u, v\rangle = \overline{\langle v, u\rangle}$ for all $u$ and $v$ in $V$. The form is skew-Hermitian if instead $\langle u, v\rangle = -\overline{\langle v, u\rangle}$ for all $u$ and $v$ in $V$. Hermitian and skew-Hermitian forms are the extreme types of sesquilinear forms since any sesquilinear form $\langle\cdot,\cdot\rangle$ is the sum of a Hermitian form $\langle\cdot,\cdot\rangle_h$ and a skew-Hermitian form $\langle\cdot,\cdot\rangle_{sh}$ given by
$$\langle u, v\rangle_h = \tfrac12\big(\langle u, v\rangle + \overline{\langle v, u\rangle}\big), \qquad \langle u, v\rangle_{sh} = \tfrac12\big(\langle u, v\rangle - \overline{\langle v, u\rangle}\big).$$

¹ Some authors, particularly in mathematical physics, reverse the roles of the two variables and assume the conjugate linearity in the first variable instead of the second.

² Sesquilinear forms make sense in number fields like $\mathbb{Q}(\sqrt{2})$ that have an automorphism of order 2 (see Section IV.1), but sesquilinear forms in this kind of setting will not concern us here.


In addition, any skew-Hermitian form becomes a Hermitian form simply by multiplying by $i$. Specifically if $\langle\cdot,\cdot\rangle_{sh}$ is skew-Hermitian, then $i\langle\cdot,\cdot\rangle_{sh}$ is sesquilinear and Hermitian, as is readily checked. Consequently the study of skew-Hermitian forms immediately reduces to the study of Hermitian forms.

EXAMPLE. Let $V = \mathbb{C}^n$, and let $A$ be a Hermitian matrix, i.e., one with $A^* = A$, where $A^*$ is the conjugate transpose of $A$. Then it is a simple matter to check that $\langle u, v\rangle = v^* A u$ defines a Hermitian form on $\mathbb{C}^n$.
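The decomposition into Hermitian and skew-Hermitian parts, and the effect of multiplying by $i$, can be checked numerically in the matrix setting of the Example. The following is a small sketch under assumptions of its own: the sesquilinear form is $\langle u, v\rangle = v^* A u$ for an arbitrary (not necessarily Hermitian) 2-by-2 complex matrix $A$ chosen only for illustration, so that the two parts correspond to $\tfrac12(A + A^*)$ and $\tfrac12(A - A^*)$. All function names are hypothetical.

```python
def conj_transpose(a):
    """Conjugate transpose A* of a square complex matrix."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def form(a, u, v):
    """Sesquilinear form <u, v> = v* A u: linear in u, conjugate linear in v."""
    n = len(a)
    return sum(v[i].conjugate() * a[i][j] * u[j] for i in range(n) for j in range(n))


def is_hermitian(a):
    return conj_transpose(a) == a


def is_skew_hermitian(a):
    return conj_transpose(a) == [[-x for x in row] for row in a]


# An arbitrary matrix, chosen with small entries so the arithmetic is exact.
A = [[1 + 2j, 3 - 1j], [0 + 1j, 2 + 0j]]
A_star = conj_transpose(A)
H = [[(A[i][j] + A_star[i][j]) / 2 for j in range(2)] for i in range(2)]
S = [[(A[i][j] - A_star[i][j]) / 2 for j in range(2)] for i in range(2)]
iS = [[1j * x for x in row] for row in S]

print(is_hermitian(H), is_skew_hermitian(S), is_hermitian(iS))  # True True True
```

With this convention, $\langle u, v\rangle_h$ computed from the displayed formula agrees with `form(H, u, v)`, because $\overline{\langle v, u\rangle} = v^* A^* u$.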

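Again the example with a matrix is completely general. In fact, let $\langle\cdot,\cdot\rangle$ be a Hermitian form on $V$, let $\Gamma = (v_1, \dots, v_n)$ be an ordered basis of $V$, and define $A_{ij} = \langle v_i, v_j\rangle$. Then $A$ is a Hermitian matrix, and $\langle u, v\rangle = u^t A \bar{v}$ when $u$ and $v$ are identified with their column vectors of coordinates relative to $\Gamma$; here $\bar{v}$ is the entry-by-entry complex conjugate of $v$.

If $\Delta = (w_1, \dots, w_n)$ is a second ordered basis, then the formula for changing basis may be derived as follows: Write $w_j = \sum_i c_{ij} v_i$, so that $C = [c_{ij}]$ is the matrix of the identity map relative to the two bases. If $B_{ij} = \langle w_i, w_j\rangle$, then $B_{ij} = \langle w_i, w_j\rangle = \sum_{k,l} c_{ki} \langle v_k, v_l\rangle \bar{c}_{lj}$, and hence
$$B = C^t A\, \overline{C}.$$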

Thus two Hermitian matrices $A$ and $B$ represent the same Hermitian form in different bases if and only if $B = M^* A M$ for some nonsingular matrix $M$.

Proposition 6.8.
(a) If $\langle\cdot,\cdot\rangle$ is a Hermitian form on a finite-dimensional vector space $V$ over $\mathbb{C}$, then there exists an ordered basis of $V$ in which the matrix of $\langle\cdot,\cdot\rangle$ is diagonal with real entries.
(b) If $A$ is an $n$-by-$n$ Hermitian matrix, then there exists a nonsingular $n$-by-$n$ matrix $M$ such that $M^* A M$ is diagonal.

PROOF. The above considerations show that (a) and (b) are reformulations of the same result. Hence it is enough to prove (b). By the Spectral Theorem (Theorem 3.21), there exists a unitary matrix $U$ such that $U^{-1} A U$ is diagonal with real entries. Since $U$ is unitary, $U^{-1} = U^*$. Thus we can take $M = U$ to prove (b).

Just as with symmetric bilinear forms over $\mathbb{R}$, we can do a little better than Proposition 6.8 indicates. If $B$ is Hermitian and diagonal with diagonal entries $b_i$, and if $D$ is diagonal with positive entries $d_i$, then $C = D^* B D$ is diagonal with diagonal entries $d_i^2 b_i$. Choosing $D$ suitably and then replacing $C$ by $P^t C P$ for a suitable permutation matrix $P$, we may assume that $P^t C P$ is of the


form $\operatorname{diag}(+1, \dots, +1, -1, \dots, -1, 0, \dots, 0)$. The number of $+1$'s and $-1$'s together is the rank of $A$, and the form is nondegenerate if and only if there are no 0's. The triple given by $(p, m, z) = \big(\#(+1)\text{'s},\ \#(-1)\text{'s},\ \#(0)\text{'s}\big)$ is again called the signature of $A$. A similar notion can be defined in the case of a Hermitian form, as opposed to a Hermitian matrix.

Theorem 6.9 (Sylvester's Law). The signature of an $n$-by-$n$ Hermitian matrix is well defined.

The proof is the same as for Theorem 6.6 except for adjustments in notation.

5. Groups Leaving a Bilinear Form Invariant

Although it is not logically necessary to do so, we digress in this section to introduce some important groups that are defined by means of bilinear or Hermitian forms. These groups arise in many areas of mathematics, both pure and applied, and their detailed structure constitutes a topic in the fields of Lie groups, algebraic groups, and finite groups that is beyond the scope of this book. Thus the best place to define them seems to be now.

We limit our comments on applications to just these: When the underlying field in the definition of these groups is $\mathbb{R}$ or $\mathbb{C}$, the group is quite often a "simple Lie group," one of the basic building blocks of the theory of the continuous groups that so often arise in topology, geometry, differential equations, and mathematical physics. When the underlying field is a number field in the sense of Example 9 of Section IV.1, the group quite often plays a role in algebraic number theory. When the underlying field is a finite field, the group is often closely related to a finite simple group; an example of this relationship occurred in Problems 55–62 at the end of Chapter IV, where it was shown that the group $\mathrm{PSL}(2, \mathbb{K})$, built in an easy way from the general linear group $\mathrm{GL}(2, \mathbb{K})$, is simple if the field $\mathbb{K}$ has more than 5 elements.

More general examples of finite simple groups produced by analogous constructions are said to be of "Lie type." A celebrated theorem of the late twentieth century classified the finite simple groups, establishing that the only such groups are the cyclic groups of prime order, the alternating groups on 5 or more letters, the simple groups of Lie type, and 26 so-called sporadic simple groups.

If $\langle\cdot,\cdot\rangle$ is a bilinear form on an $n$-dimensional vector space $V$ over a field $\mathbb{K}$, a nonsingular linear map $g : V \to V$ is said to leave the bilinear form invariant if
$$\langle g(u), g(v)\rangle = \langle u, v\rangle$$


for all $u$ and $v$ in $V$. Fix an ordered basis of $V$, let $A$ be the matrix of the bilinear form in this basis, let $g'$ be the matrix of $g$ in this basis, a member of $\mathrm{GL}(n, \mathbb{K})$, and write $w'$ for the column vector of coordinates of any $w$ in $V$. To translate the invariance condition into one concerning matrices, we use the formula $\langle u, v\rangle = u'^{\,t} A v'$, the corresponding formula for $\langle g(u), g(v)\rangle$, and the formula $g(w)' = g'(w')$ from Theorem 2.14. Then we obtain $u'^{\,t} g'^{\,t} A g' v' = u'^{\,t} A v'$. Taking $u$ to be the $i$th member of the ordered basis and $v$ to be the $j$th member, we obtain equality of the $(i, j)$th entry of the two matrices $g'^{\,t} A g'$ and $A$. Thus the matrix form of the invariance condition is that a nonsingular matrix $g$ satisfy
$$g^t A g = A.$$

We know that changing the ordered basis amounts to replacing $A$ by $M^t A M$ for some nonsingular matrix $M$. If $g$ satisfies the invariance condition $g^t A g = A$ relative to $A$, then $M^{-1} g M$ satisfies
$$(M^{-1} g M)^t (M^t A M)(M^{-1} g M) = M^t A M.$$
Thus we are led to a conjugate subgroup within $\mathrm{GL}(n, \mathbb{K})$. A conjugate subgroup is not something substantially new, and thus we might as well make a convenient choice of basis so that $A$ looks particularly special.

The interesting cases are that the given bilinear form is symmetric or alternating, hence that the matrix $A$ is symmetric or alternating. Let us restrict our attention to them. The left and right radicals coincide in these cases, and the first thing to do is to take the two-sided radical into account. Returning to the original bilinear form, we write $V = \operatorname{rad} \oplus S$, where rad is the radical and $S$ is some vector subspace of $V$, and we choose an ordered basis $(v_1, \dots, v_p, v_{p+1}, \dots, v_n)$ such that $v_1, \dots, v_p$ are in $S$ and $v_{p+1}, \dots, v_n$ are in rad. Then $\langle v_i, v_j\rangle = 0$ if $i > p$ or $j > p$, and consequently $A$ has its only nonzero entries in the upper left $p$-by-$p$ block. The same argument as in the proofs of Theorems 6.5 and 6.7 shows that the restriction of the bilinear form to $S$ is nondegenerate, and consequently the upper left $p$-by-$p$ block of $A$ is nonsingular.

Changing notation slightly, suppose that $g$ is an $n$-by-$n$ matrix written in block form as $g = \begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix}$ with $g_{11}$ of size $p$-by-$p$, suppose that $\begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix}$ is another matrix written in the same block form, suppose that the $p$-by-$p$ matrix $A$ is nonsingular, and suppose that
$$g^t \begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix} g = \begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix}.$$
Making a brief computation, we find that necessary and sufficient conditions on $g$ are that $g_{11}$ be nonsingular and have $g_{11}^t A g_{11} = A$, that $g_{12} = 0$, that $g_{22}$ be arbitrary nonsingular, and that $g_{21}$ be arbitrary. In other


words, the only interesting condition $g_{11}^t A g_{11} = A$ is a reflection of what happens in the nonsingular case. Consequently the interesting cases are that the given bilinear form is nondegenerate, as well as either symmetric or alternating.

If $A$ is symmetric and nonsingular, then the group of all nonsingular matrices $g$ such that $g^t A g = A$ is called the orthogonal group relative to $A$. If $A$ is alternating and nonsingular, then the group of all nonsingular matrices $g$ such that $g^t A g = A$ is called the symplectic group relative to $A$.

For the symplectic case it is customary to invoke Theorem 6.7 and take $A$ to be
$$J = \begin{pmatrix}
0 & 1 \\
-1 & 0 \\
 & & 0 & 1 \\
 & & -1 & 0 \\
 & & & & \ddots \\
 & & & & & 0 & 1 \\
 & & & & & -1 & 0
\end{pmatrix},$$
except possibly for a permutation of the rows and columns and possibly for a multiplication by $-1$. Two conflicting notations are in common use for the symplectic group, namely $\mathrm{Sp}(n, \mathbb{K})$ and $\mathrm{Sp}(\tfrac12 n, \mathbb{K})$, and one always has to check a particular author's definitions.

For the orthogonal case the notation is less standardized. Theorem 6.5 says that we may take $A$ to be diagonal except when $\mathbb{K}$ has characteristic 2. But the theorem does not tell us exactly which $A$'s are representative of the same bilinear form. When $\mathbb{K} = \mathbb{C}$, we know that we can take $A$ to be the identity matrix $I$. The group is known as the complex orthogonal group and is denoted by $\mathrm{O}(n, \mathbb{C})$. When $\mathbb{K} = \mathbb{R}$, we can take $A$ to be diagonal with diagonal entries $\pm 1$. Sylvester's Law (Theorem 6.6) says that the form determines the number of $+1$'s and the number of $-1$'s. The groups are called indefinite orthogonal groups and are denoted by $\mathrm{O}(p, q)$, where $p$ is the number of $+1$'s and $q$ is the number of $-1$'s. When $q = 0$, we obtain the ordinary orthogonal group of matrices relative to an inner product.

A similar analysis applies to Hermitian forms. The field is now $\mathbb{C}$, the invariance condition with the form is still $\langle g(u), g(v)\rangle = \langle u, v\rangle$, and the corresponding condition with matrices is $g^t A \bar{g} = A$. The interesting case is that the Hermitian form is nondegenerate. Proposition 6.8 and Sylvester's Law (Theorem 6.9) together show that we may take $A$ to be diagonal with diagonal entries $\pm 1$ and that the Hermitian form determines the number of $+1$'s and the number of $-1$'s. The groups are the indefinite unitary groups and are denoted by $\mathrm{U}(p, q)$, where $p$ is the number of $+1$'s and $q$ is the number of $-1$'s. When $q = 0$, we obtain the ordinary unitary group of matrices relative to an inner product.
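The matrix condition $g^t A g = A$ and the remark about conjugate subgroups are easy to test on small examples. The sketch below is only an illustration with exact rational arithmetic; the particular matrices (a member of $\mathrm{O}(1, 1)$ with rational entries, a 2-by-2 matrix of determinant 1 for the symplectic case, and a change-of-basis matrix $M$) are assumptions chosen for the example, and the helper names are hypothetical.

```python
from fractions import Fraction as Fr


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def preserves(g, a):
    """True if g^t A g = A, i.e. g leaves the bilinear form with matrix A invariant."""
    return matmul(matmul(transpose(g), a), g) == a


# Symmetric case: A = diag(+1, -1), so the group is O(1, 1).
A = [[1, 0], [0, -1]]
g = [[Fr(5, 3), Fr(4, 3)], [Fr(4, 3), Fr(5, 3)]]

# Alternating case: the 2-by-2 matrix J; any matrix of determinant 1 preserves it.
J = [[0, 1], [-1, 0]]
s = [[2, 3], [1, 2]]

# Change of basis: A becomes M^t A M and g becomes M^{-1} g M.
M = [[1, 1], [0, 1]]
M_inv = [[1, -1], [0, 1]]
A_new = matmul(matmul(transpose(M), A), M)
g_new = matmul(matmul(M_inv, g), M)

print(preserves(g, A), preserves(s, J), preserves(g_new, A_new))  # True True True
```

The last check is the displayed identity $(M^{-1} g M)^t (M^t A M)(M^{-1} g M) = M^t A M$ in a concrete case.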


6. Tensor Product of Two Vector Spaces

If $E$ is a vector space over $\mathbb{K}$, then the set of all bilinear forms on $E$ is a vector space under addition and scalar multiplication of the values, i.e., it is a vector subspace of the set of all functions from $E \times E$ into $\mathbb{K}$. In this section we introduce a vector space called the "tensor product" of $E$ with itself, whose dual, even if $E$ is infinite-dimensional, is canonically isomorphic to this vector space of bilinear forms.

Matters will be clearer if we work initially with something slightly more general than bilinear forms on a single vector space $E$. Thus fix a field $\mathbb{K}$, and let $E$ and $F$ be vector spaces over $\mathbb{K}$. A function from $E \times F$ into a vector space $U$ over $\mathbb{K}$ is said to be bilinear if it is linear in each of the two variables when the other one is held fixed. Such a space of bilinear functions is a vector space over $\mathbb{K}$ under addition and scalar multiplication of the values. The bilinear functions are called bilinear forms when the range space $U$ is $\mathbb{K}$ itself.

More generally, if $E_1, \dots, E_k$ are vector spaces over $\mathbb{K}$, a function from $E_1 \times \cdots \times E_k$ into a vector space over $\mathbb{K}$ is said to be $k$-linear or $k$-multilinear if it is linear in each of its $k$ variables when the other $k - 1$ variables are held fixed. Again the word "form" is used in the scalar-valued case, and all of these spaces of multilinear functions are vector spaces over $\mathbb{K}$.

In this section we shall introduce the tensor product of two vector spaces $E$ and $F$ over $\mathbb{K}$, ultimately denoting it by $E \otimes_{\mathbb{K}} F$. The dual of this tensor product will be canonically isomorphic to the vector space of bilinear forms on $E \times F$. More generally the space of linear functions from the tensor product into a vector space $U$ will be canonically isomorphic to the vector space of bilinear functions on $E \times F$ with values in $U$.

Following the habit encouraged by Chapter IV, we want to arrange that tensor product is a functor. If $\mathcal{V}$ denotes the category of vector spaces over $\mathbb{K}$ and if $\mathcal{V} \times \mathcal{V}$ denotes the category described in Section IV.11 as $\mathcal{V}^S$ for a two-element set $S$, then tensor product is to be a functor from $\mathcal{V} \times \mathcal{V}$ into $\mathcal{V}$. Hence we will want to examine the effect of tensor products on morphisms, i.e., on linear maps. As in similar constructions in Chapter IV, the effect of tensor product on linear maps is captured by defining the tensor product by means of a universal mapping property. The appropriate universal mapping property rephrases the statement above that the space of linear functions from the tensor product into any vector space $U$ is canonically isomorphic to the vector space of bilinear functions on $E \times F$ with values in $U$.

If $E$ and $F$ are vector spaces over $\mathbb{K}$, a tensor product of $E$ and $F$ is a pair $(V, \iota)$ consisting of a vector space $V$ over $\mathbb{K}$ together with a bilinear function $\iota : E \times F \to V$, with the following universal mapping property: whenever $b$ is a bilinear mapping of $E \times F$ into a vector space $U$ over $\mathbb{K}$, then there exists a unique


linear mapping $B$ of $V$ into $U$ such that the diagram in Figure 6.1 commutes, i.e., such that $B\iota = b$ holds in the diagram. When $\iota$ is understood, one frequently refers to $V$ itself as the tensor product. The linear mapping $B : V \to U$ is called the linear extension of $b$ to the tensor product.

[Diagram: $b : E \times F \to U$ along the top, $\iota : E \times F \to V$ down the left side, and $B : V \to U$ along the diagonal.]
FIGURE 6.1. Universal mapping property of a tensor product.

Theorem 6.10. If $E$ and $F$ are vector spaces over $\mathbb{K}$, then a tensor product of $E$ and $F$ exists and is unique up to canonical isomorphism in this sense: if $(V_1, \iota_1)$ and $(V_2, \iota_2)$ are tensor products, then there exists a unique linear mapping $B : V_2 \to V_1$ with $B\iota_2 = \iota_1$, and $B$ is an isomorphism. Any tensor product is spanned linearly by the image of $E \times F$ in it.

REMARKS. As usual, uniqueness will follow readily from the universal mapping property. What is really needed is a proof of existence. This will be carried out by an explicit construction. Later, in Chapter X, we shall reintroduce tensor products, taking the basic construction to be that of the tensor product of two abelian groups, and then the tensor product of two vector spaces will in effect be obtained in a slightly different way. However, the exact construction does not matter, only the existence; the uniqueness allows us to match the results of any two constructions.

[Two diagrams: in the first, $\iota_2 : E \times F \to V_2$ along the top, $\iota_1 : E \times F \to V_1$ down the left side, and $B_2 : V_1 \to V_2$ along the diagonal; in the second, $\iota_1 : E \times F \to V_1$ along the top, $\iota_2 : E \times F \to V_2$ down the left side, and $B_1 : V_2 \to V_1$ along the diagonal.]
FIGURE 6.2. Diagrams for uniqueness of a tensor product.

PROOF OF UNIQUENESS. Let $(V_1, \iota_1)$ and $(V_2, \iota_2)$ be tensor products. Set up the diagrams in Figure 6.2, and use the universal mapping property to obtain linear maps $B_2 : V_1 \to V_2$ and $B_1 : V_2 \to V_1$ extending $\iota_2$ and $\iota_1$. Then $B_1 B_2 : V_1 \to V_1$ has $B_1 B_2 \iota_1 = B_1 \iota_2 = \iota_1$, and $1_{V_1} : V_1 \to V_1$ has $(1_{V_1})\iota_1 = \iota_1$. By the assumed uniqueness within the universal mapping property, $B_1 B_2 = 1_{V_1}$ on $V_1$. Similarly $B_2 B_1 = 1_{V_2}$ on $V_2$. Then $B_1 : V_2 \to V_1$ gives the canonical isomorphism. Because of the isomorphism the image of $E \times F$ will span an arbitrary tensor product if it spans some particular tensor product.


PROOF OF EXISTENCE. Let $V_1 = \bigoplus_{(e, f)} \mathbb{K}(e, f)$, the direct sum being taken over all ordered pairs $(e, f)$ with $e \in E$ and $f \in F$. Then $V_1$ is a vector space over $\mathbb{K}$ with a basis consisting of all ordered pairs $(e, f)$. We think of all identities that the elements of $V_1$ must satisfy to be a tensor product, writing each as some expression set equal to 0, and then we assemble those expressions into a vector subspace to factor out from $V_1$. Namely, let $V_0$ be the vector subspace of $V_1$ generated by all elements of any of the kinds
$$(e_1 + e_2, f) - (e_1, f) - (e_2, f), \qquad (ce, f) - c(e, f),$$
$$(e, f_1 + f_2) - (e, f_1) - (e, f_2), \qquad (e, cf) - c(e, f),$$
the understanding being that $c$ is in $\mathbb{K}$, the elements $e, e_1, e_2$ are in $E$, and the elements $f, f_1, f_2$ are in $F$. Define $V = V_1/V_0$, and define $\iota : E \times F \to V_1/V_0$ by $\iota(e, f) = (e, f) + V_0$. We shall prove that $(V, \iota)$ is a tensor product of $E$ and $F$. The definitions show that the image of $\iota$ spans $V$ linearly.

Let $b : E \times F \to U$ be given as in Figure 6.1. To see that a linear extension $B$ exists and is unique, define $B_1$ on $V_1$ by
$$B_1\Big(\sum_{\text{finite}} c_i (e_i, f_i)\Big) = \sum_{\text{finite}} c_i\, b(e_i, f_i).$$
The bilinearity of $b$ shows that $B_1$ maps $V_0$ to 0. By Proposition 2.25, $B_1$ descends to a linear map $B : V_1/V_0 \to U$, and we have $B\iota = b$. Hence $B$ exists as required. To check uniqueness of $B$, we observe again that the cosets $(e, f) + V_0$ within $V_1/V_0$ span $V$; since commutativity of the diagram in Figure 6.1 forces $B((e, f) + V_0) = B(\iota(e, f)) = b(e, f)$, $B$ is unique. This completes the proof.

A tensor product of $E$ and $F$ is denoted by $(E \otimes_{\mathbb{K}} F, \iota)$, with the bilinear map $\iota$ given by $\iota(e, f) = e \otimes f$; the map $\iota$ is frequently dropped from the notation when there is no chance of ambiguity. The tensor product that was constructed in the proof of existence in Theorem 6.10 is not given any special notation to distinguish it from any other tensor product. The elements $e \otimes f$ span $E \otimes_{\mathbb{K}} F$, as was noted in the statement of the theorem. Elements of the form $e \otimes f$ are sometimes called pure tensors. Not every element need be a pure tensor, but every element in $E \otimes_{\mathbb{K}} F$ is a finite sum of pure tensors. We shall see in Proposition 6.14 that if $\{u_i\}$ is a basis


of $E$ and $\{v_j\}$ is a basis of $F$, then the pure tensors $u_i \otimes v_j$ form a basis of $E \otimes_{\mathbb{K}} F$. In particular the dimension of the tensor product is the product of the dimensions of the factors. We could have defined the tensor product in this way, by taking bases and declaring that $u_i \otimes v_j$ is to be a basis of the desired space. The difficulty is that we would be forever wedded to our choice of those particular bases, or we would constantly have to prove that our definitions are independent of bases. The definition by means of Theorem 6.10 avoids this difficulty.

To make tensor product $(E, F) \mapsto E \otimes_{\mathbb{K}} F$ into a functor, we have to describe the effect on linear mappings. To aid in that discussion, let us reintroduce some notation first used in Chapter II: if $U$ and $V$ are vector spaces over $\mathbb{K}$, then $\operatorname{Hom}_{\mathbb{K}}(U, V)$ is defined to be the vector space of $\mathbb{K}$ linear maps from $U$ to $V$.

Corollary 6.11. If $E$, $F$, and $V$ are vector spaces over $\mathbb{K}$, then the vector space $\operatorname{Hom}_{\mathbb{K}}(E \otimes_{\mathbb{K}} F, V)$ is canonically isomorphic (via restriction to pure tensors) to the vector space of all $V$-valued bilinear functions on $E \times F$.

PROOF. Restriction is a linear mapping from $\operatorname{Hom}_{\mathbb{K}}(E \otimes_{\mathbb{K}} F, V)$ to the vector space of all $V$-valued bilinear functions on $E \times F$, and it is one-one since the image of $E \times F$ in $E \otimes_{\mathbb{K}} F$ spans $E \otimes_{\mathbb{K}} F$. It is onto since any bilinear function from $E \times F$ to $V$ has a linear extension to $E \otimes_{\mathbb{K}} F$, by Theorem 6.10.

Corollary 6.12. If $E$ and $F$ are vector spaces over $\mathbb{K}$, then the vector space of all bilinear forms on $E \times F$ is canonically isomorphic to $(E \otimes_{\mathbb{K}} F)'$, the dual of the vector space $E \otimes_{\mathbb{K}} F$.

PROOF. This is the special case of Corollary 6.11 in which $V = \mathbb{K}$.
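As a small worked illustration of Corollary 6.12 and of the earlier remark that not every element is a pure tensor, take $E = F = \mathbb{K}^2$ with standard basis $e_1, e_2$. The choice of spaces and the matrix $A$ are assumptions made only for this example, and the argument uses the fact, quoted above from Proposition 6.14, that the four elements $e_i \otimes e_j$ form a basis of $E \otimes_{\mathbb{K}} F$.

```latex
% A bilinear form and its linear extension to the tensor product:
b(x, y) = x^t A y
\quad\longleftrightarrow\quad
B(e_i \otimes e_j) = b(e_i, e_j) = A_{ij}, \qquad 1 \le i, j \le 2.

% A general pure tensor, expanded by bilinearity of (e, f) \mapsto e \otimes f:
(a e_1 + b e_2) \otimes (c e_1 + d e_2)
  = ac\,(e_1 \otimes e_1) + ad\,(e_1 \otimes e_2)
  + bc\,(e_2 \otimes e_1) + bd\,(e_2 \otimes e_2).

% If this equaled e_1 \otimes e_1 + e_2 \otimes e_2, comparing coefficients
% in the basis \{e_i \otimes e_j\} would give
ac = 1, \qquad ad = 0, \qquad bc = 0, \qquad bd = 1,
% which is impossible because (ad)(bc) = (ac)(bd) = 1.
```

So $e_1 \otimes e_1 + e_2 \otimes e_2$ is a sum of two pure tensors but is not itself a pure tensor, over any field $\mathbb{K}$.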

Corollary 6.13. If $E$, $F$, and $V$ are vector spaces over $\mathbb{K}$, then there is a canonical $\mathbb{K}$ linear isomorphism $\Phi$ of left side to right side in
$$\operatorname{Hom}_{\mathbb{K}}(E \otimes_{\mathbb{K}} F, V) \cong \operatorname{Hom}_{\mathbb{K}}(E, \operatorname{Hom}_{\mathbb{K}}(F, V))$$
such that $\Phi(\varphi)(e)(f) = \varphi(e \otimes f)$ for all $\varphi \in \operatorname{Hom}_{\mathbb{K}}(E \otimes_{\mathbb{K}} F, V)$, $e \in E$, and $f \in F$.

REMARK. This result is just a restatement of Corollary 6.11, but let us prove it anyway, writing the proof in the language of the statement.

PROOF. The map $\Phi$ is well defined and $\mathbb{K}$ linear, and it carries the left side to the right side. For $\psi$ in the right side, define $\Psi(\psi)(e, f) = \psi(e)(f)$. Then $\Psi(\psi)$ is a bilinear map from $E \times F$ into $V$, and we let $\widetilde{\Psi}(\psi)$ be the linear extension from $E \otimes_{\mathbb{K}} F$ into $V$ given in Theorem 6.10. Then $\widetilde{\Psi}$ is a two-sided inverse to $\Phi$, and the corollary follows.


Let us now make $(E, F) \mapsto E \otimes_{\mathbb{K}} F$ into a covariant functor. If $(E_1, F_1)$ and $(E_2, F_2)$ are objects in $\mathcal{V} \times \mathcal{V}$, i.e., if they are two ordered pairs of vector spaces, then a morphism from the first to the second is a pair $(L, M)$ of linear maps of the form $L : E_1 \to E_2$ and $M : F_1 \to F_2$. To $(L, M)$, we are to associate a linear map from $E_1 \otimes_{\mathbb{K}} F_1$ into $E_2 \otimes_{\mathbb{K}} F_2$; this linear map will be denoted by $L \otimes M$. We use Corollary 6.11 to define $L \otimes M$ as the member of $\operatorname{Hom}_{\mathbb{K}}(E_1 \otimes_{\mathbb{K}} F_1, E_2 \otimes_{\mathbb{K}} F_2)$ that corresponds under restriction to the bilinear map $(e_1, f_1) \mapsto L(e_1) \otimes M(f_1)$ of $E_1 \times F_1$ into $E_2 \otimes_{\mathbb{K}} F_2$. In terms of pure tensors, the map $L \otimes M$ satisfies
$$(L \otimes M)(e_1 \otimes f_1) = L(e_1) \otimes M(f_1),$$
and this formula completely determines $L \otimes M$ because of the uniqueness of linear extensions of bilinear maps.

To check that this definition of the effect of tensor product on pairs of linear maps makes $(E, F) \mapsto E \otimes_{\mathbb{K}} F$ into a covariant functor, we have to check the effect on the identity map and the effect on composition. For the effect on the identity map $(1_{E_1}, 1_{F_1})$ when $E_1 = E_2$ and $F_1 = F_2$, we see from the above displayed formula that
$$(1_{E_1} \otimes 1_{F_1})(e_1 \otimes f_1) = 1_{E_1}(e_1) \otimes 1_{F_1}(f_1) = e_1 \otimes f_1 = 1_{E_1 \otimes_{\mathbb{K}} F_1}(e_1 \otimes f_1).$$
Since elements of the form $e_1 \otimes f_1$ span $E_1 \otimes_{\mathbb{K}} F_1$, we conclude that $1_{E_1} \otimes 1_{F_1} = 1_{E_1 \otimes_{\mathbb{K}} F_1}$. For the effect on composition, let $(L_1, M_1) : (E_1, F_1) \to (E_2, F_2)$ and $(L_2, M_2) : (E_2, F_2) \to (E_3, F_3)$ be given. Then we have
$$(L_2 \otimes M_2)(L_1 \otimes M_1)(e_1 \otimes f_1) = (L_2 \otimes M_2)(L_1(e_1) \otimes M_1(f_1)) = (L_2 L_1)(e_1) \otimes (M_2 M_1)(f_1) = (L_2 L_1 \otimes M_2 M_1)(e_1 \otimes f_1).$$
Since elements of the form $e_1 \otimes f_1$ span $E_1 \otimes_{\mathbb{K}} F_1$, we conclude that $(L_2 \otimes M_2)(L_1 \otimes M_1) = L_2 L_1 \otimes M_2 M_1$. Therefore $(E, F) \mapsto E \otimes_{\mathbb{K}} F$ is a covariant functor.

In particular, $E \mapsto E \otimes_{\mathbb{K}} F$ and $F \mapsto E \otimes_{\mathbb{K}} F$ are covariant functors from $\mathcal{V}$ into itself.
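In coordinates the functor properties just verified become familiar matrix identities. The following sketch assumes finite-dimensional spaces with chosen bases, with the basis $\{u_i \otimes v_j\}$ of the tensor product ordered lexicographically, so that $L \otimes M$ is represented by the Kronecker product of the matrices of $L$ and $M$. The function names `kron`, `tensor`, `matmul`, and `apply` are hypothetical helpers, and the sample matrices are arbitrary.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def apply(a, x):
    """Matrix a applied to the coordinate vector x."""
    return [sum(a[i][k] * x[k] for k in range(len(x))) for i in range(len(a))]


def kron(l, m):
    """Matrix of L (x) M: the entry in row (i, j), column (k, t) is l[i][k] * m[j][t]."""
    return [[l[i][k] * m[j][t] for k in range(len(l[0])) for t in range(len(m[0]))]
            for i in range(len(l)) for j in range(len(m))]


def tensor(e, f):
    """Coordinates of the pure tensor e (x) f in the basis {u_i (x) v_j}."""
    return [x * y for x in e for y in f]


L1 = [[1, 2], [0, 1], [3, 0]]   # L1 : E1 -> E2 with dim E1 = 2, dim E2 = 3
M1 = [[1, 1], [2, 0]]           # M1 : F1 -> F2
L2 = [[1, 0, 2], [0, 1, 1]]     # L2 : E2 -> E3
M2 = [[0, 1], [1, 1]]           # M2 : F2 -> F3
e, f = [1, -2], [3, 5]

# (L (x) M)(e (x) f) = L(e) (x) M(f)
print(apply(kron(L1, M1), tensor(e, f)) == tensor(apply(L1, e), apply(M1, f)))
# (L2 (x) M2)(L1 (x) M1) = L2 L1 (x) M2 M1
print(matmul(kron(L2, M2), kron(L1, M1)) == kron(matmul(L2, L1), matmul(M2, M1)))
```

Both comparisons hold for any matrices of compatible sizes, which is the coordinate form of the statement that tensor product respects composition.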
For these two functors from $\mathcal{V}$ into itself, the effect on linear mappings is especially nice, namely that

$L_1 \mapsto L_1 \otimes M_1$ is $\mathbb{K}$ linear from $\operatorname{Hom}_{\mathbb{K}}(E_1, E_2)$ into $\operatorname{Hom}_{\mathbb{K}}(E_1 \otimes_{\mathbb{K}} F_1, E_2 \otimes_{\mathbb{K}} F_2)$,

$M_1 \mapsto L_1 \otimes M_1$ is $\mathbb{K}$ linear from $\operatorname{Hom}_{\mathbb{K}}(F_1, F_2)$ into $\operatorname{Hom}_{\mathbb{K}}(E_1 \otimes_{\mathbb{K}} F_1, E_2 \otimes_{\mathbb{K}} F_2)$.

To prove the first of these assertions, for example, we observe that the sum of the linear extensions of
$$(e_1, f_1) \mapsto L_1(e_1) \otimes M_1(f_1) \qquad\text{and}\qquad (e_1, f_1) \mapsto L'_1(e_1) \otimes M_1(f_1)$$
is a linear extension of $(e_1, f_1) \mapsto (L_1 + L'_1)(e_1) \otimes M_1(f_1)$, and the uniqueness in the universal mapping property implies that $(L_1 + L'_1) \otimes M_1 = L_1 \otimes M_1 + L'_1 \otimes M_1$. Similar remarks apply to multiplication by scalars.

Let us mention some identities satisfied by $\otimes_{\mathbb{K}}$. There is a canonical isomorphism
$$E \otimes_{\mathbb{K}} F \cong F \otimes_{\mathbb{K}} E$$
given by taking the linear extension of $(e, f) \mapsto f \otimes e$ as the map from left to right. The linear extension of $(f, e) \mapsto e \otimes f$ gives a two-sided inverse.

Category theory has a way of capturing the idea that this isomorphism is systematic, rather than randomly dependent on $E$ and $F$. The two sides of the above isomorphism may be regarded as the values of the covariant functors $(E, F) \mapsto E \otimes_{\mathbb{K}} F$ and $(E, F) \mapsto F \otimes_{\mathbb{K}} E$. The notion in category theory capturing "systematic" is called "naturality." It makes precise the fact that the system of isomorphisms respects linear maps, as well as the vector spaces. Here is the general definition. Its usefulness will be examined later in this section.

Let $\mathcal{C}$ and $\mathcal{D}$ be two categories, and let $\Phi : \mathcal{C} \to \mathcal{D}$ and $\Psi : \mathcal{C} \to \mathcal{D}$ be covariant functors. Suppose that for each $X$ in $\operatorname{Obj}(\mathcal{C})$, a morphism $T_X$ in $\operatorname{Morph}_{\mathcal{D}}(\Phi(X), \Psi(X))$ is given. Then the system $\{T_X\}$ is called a natural transformation of $\Phi$ into $\Psi$ if for each pair of objects $X_1$ and $X_2$ in $\mathcal{C}$ and each $h$ in $\operatorname{Morph}_{\mathcal{C}}(X_1, X_2)$, the diagram in Figure 6.3 commutes. If furthermore each $T_X$ is an isomorphism, then it is immediate that the system $\{T_X^{-1}\}$ is a natural transformation of $\Psi$ into $\Phi$, and we say that $\{T_X\}$ is a natural isomorphism.

[Diagram: $\Phi(h) : \Phi(X_1) \to \Phi(X_2)$ along the top, $\Psi(h) : \Psi(X_1) \to \Psi(X_2)$ along the bottom, with vertical maps $T_{X_1} : \Phi(X_1) \to \Psi(X_1)$ and $T_{X_2} : \Phi(X_2) \to \Psi(X_2)$.]
FIGURE 6.3. Commutative diagram of a natural transformation $\{T_X\}$.

If $\Phi$ and $\Psi$ are contravariant functors, then the system $\{T_X\}$ is called a natural transformation of $\Phi$ into $\Psi$ if the diagram obtained from Figure 6.3 by reversing the horizontal arrows commutes. The system is a natural isomorphism if furthermore each $T_X$ is an isomorphism.

In the case we are studying, we have $\mathcal{C} = \mathcal{V} \times \mathcal{V}$ and $\mathcal{D} = \mathcal{V}$. Objects $X$ in $\mathcal{C}$ are pairs $(E, F)$ of vector spaces, and $\Phi$ and $\Psi$ are the covariant functors with $\Phi(E, F) = E \otimes_{\mathbb{K}} F$ and $\Psi(E, F) = F \otimes_{\mathbb{K}} E$. The mapping $T_{(E,F)} : E \otimes_{\mathbb{K}} F \to F \otimes_{\mathbb{K}} E$ is uniquely determined by the condition that $T_{(E,F)}(e \otimes f) = f \otimes e$ for all $e \in E$ and $f \in F$. A morphism of pairs from $(E_1, F_1)$ to $(E_2, F_2)$ is of


the form $h = (L, M)$ with $L \in \operatorname{Hom}_{\mathbb{K}}(E_1, E_2)$ and $M \in \operatorname{Hom}_{\mathbb{K}}(F_1, F_2)$. Our constructions above show that
$$\Phi(L, M) = L \otimes M \in \operatorname{Hom}_{\mathbb{K}}(E_1 \otimes_{\mathbb{K}} F_1, E_2 \otimes_{\mathbb{K}} F_2)$$
and
$$\Psi(L, M) = M \otimes L \in \operatorname{Hom}_{\mathbb{K}}(F_1 \otimes_{\mathbb{K}} E_1, F_2 \otimes_{\mathbb{K}} E_2).$$
In Figure 6.3 the two routes from top left to bottom right in the diagram have
$$T_{(E_2,F_2)} \Phi(L, M)(e_1 \otimes f_1) = T_{(E_2,F_2)}(L \otimes M)(e_1 \otimes f_1) = T_{(E_2,F_2)}(L(e_1) \otimes M(f_1)) = M(f_1) \otimes L(e_1)$$
and
$$\Psi(L, M) T_{(E_1,F_1)}(e_1 \otimes f_1) = \Psi(L, M)(f_1 \otimes e_1) = (M \otimes L)(f_1 \otimes e_1) = M(f_1) \otimes L(e_1).$$
The results are equal, and therefore the diagram commutes. Consequently the isomorphism
$$E \otimes_{\mathbb{K}} F \cong F \otimes_{\mathbb{K}} E$$
is natural in the pair $(E, F)$.

Another canonical isomorphism of interest is
$$E \otimes_{\mathbb{K}} \mathbb{K} \cong E.$$
Here the map from left to right is the linear extension of $(e, c) \mapsto ce$, while the map from right to left is $e \mapsto e \otimes 1$. In view of the previous canonical isomorphism, we have $\mathbb{K} \otimes_{\mathbb{K}} E \cong E$ also. Each of these isomorphisms is natural in $E$.

Next let us consider how $\otimes_{\mathbb{K}}$ interacts with direct sums. The result is that tensor product distributes over direct sums, even infinite direct sums:
$$E \otimes_{\mathbb{K}} \Big(\bigoplus_{s \in S} F_s\Big) \cong \bigoplus_{s \in S} (E \otimes_{\mathbb{K}} F_s).$$
The map from left to right is the linear extension of the bilinear map $(e, \{f_s\}_{s \in S}) \mapsto \{e \otimes f_s\}_{s \in S}$. For the definition of the inverse, the constructions of Section II.6 show that we have only to define the map on each $E \otimes_{\mathbb{K}} F_s$, where it is the linear extension of $(e, f_s) \mapsto e \otimes i_s(f_s)$; here $i_{s_0} : F_{s_0} \to \bigoplus_s F_s$ is the one-one linear map carrying the $s_0$th vector space into the direct sum. Once again it is possible to prove that the isomorphism is natural; we omit the details.

It follows from the displayed isomorphism and the isomorphism $E \otimes_{\mathbb{K}} \mathbb{K} \cong E$ that if $\{x_i\}$ is a basis of $E$ and $\{y_j\}$ is a basis of $F$, then $\{x_i \otimes y_j\}$ is a basis of $E \otimes_{\mathbb{K}} F$. This proves the following result.


Proposition 6.14. If $E$ and $F$ are vector spaces over $K$, then $\dim(E \otimes_K F) = (\dim E)(\dim F)$. If $\{y_j\}$ is a basis of $F$, then the most general member of $E \otimes_K F$ is of the form $\sum_j e_j \otimes y_j$ with all $e_j$ in $E$.

We turn to a consideration of $\operatorname{Hom}_K$ from the point of view of functors. In the examples in Section IV.11, we saw that $V \mapsto \operatorname{Hom}_K(U, V)$ is a covariant functor from $\mathcal{V}$ to itself and that $U \mapsto \operatorname{Hom}_K(U, V)$ is a contravariant functor from $\mathcal{V}$ to itself. If we are not squeamish about mixing the two types, covariant and contravariant, then we can consider $(U, V) \mapsto \operatorname{Hom}_K(U, V)$ as a functor³ from $\mathcal{V} \times \mathcal{V}$ to $\mathcal{V}$. At any rate if $L$ is in $\operatorname{Hom}_K(U_1, U_2)$ and $M$ is in $\operatorname{Hom}_K(V_1, V_2)$, then $\operatorname{Hom}(L, M)$ carries $\operatorname{Hom}_K(U_2, V_1)$ into $\operatorname{Hom}_K(U_1, V_2)$ and is given by
\[ \operatorname{Hom}(L, M)(h) = MhL \qquad \text{for } h \in \operatorname{Hom}_K(U_2, V_1). \]
It is evident that the result is $K$ linear as a function of $h$, and hence $\operatorname{Hom}(L, M)$ is in $\operatorname{Hom}_K\big(\operatorname{Hom}_K(U_2, V_1),\, \operatorname{Hom}_K(U_1, V_2)\big)$.

When we look for analogs for the functor $\operatorname{Hom}_K$ of the identity $E \otimes_K K \cong E$ for the functor $\otimes_K$, we are led to two identities. One is just the definition of the dual of a vector space: $\operatorname{Hom}_K(U, K) = U'$. The other is the natural isomorphism $\operatorname{Hom}_K(K, V) \cong V$. In the proof of the latter identity, the mapping from left to right is given by sending a linear $h : K \to V$ to $h(1)$, and the mapping from right to left is given by sending $v$ in $V$ to $h$ with $h(c) = cv$.

Next let us consider how $\operatorname{Hom}_K$ interacts with direct sums and direct products. The construction $\operatorname{Hom}_K(U, V)$ distributes over finite direct sums in each variable, but the situation with infinite direct sums or direct products is more subtle. Valid identities are
\[ \operatorname{Hom}_K\Big(\bigoplus_{s \in S} U_s,\, V\Big) \cong \prod_{s \in S} \operatorname{Hom}_K(U_s, V) \]
and
\[ \operatorname{Hom}_K\Big(U,\, \prod_{s \in S} V_s\Big) \cong \prod_{s \in S} \operatorname{Hom}_K(U, V_s), \]

³ Readers who prefer to be careful about this point can regard $U$ as in the category $\mathcal{V}^{\mathrm{opp}}$ defined in Problems 78–80 at the end of Chapter IV. Then $(U, V) \mapsto \operatorname{Hom}_K(U, V)$ is a covariant functor from $\mathcal{V}^{\mathrm{opp}} \times \mathcal{V}$ into $\mathcal{V}$.


and these are natural isomorphisms. Proofs of these identities for all $S$ and counterexamples related to them when $S$ is infinite appear in Problems 7–8 at the end of the chapter.

We have already checked that the isomorphism $E \otimes_K F \cong F \otimes_K E$ is natural in $(E, F)$, and we have asserted naturality in some other situations in which it is easy to check. The next proposition asserts naturality for the identity of Corollary 6.13, which combines $\otimes_K$ and $\operatorname{Hom}_K$ in a nontrivial way. After the proof of the result, we shall digress for a moment to indicate the usefulness of natural isomorphisms.

Proposition 6.15. Let $E$, $F$, $V$, $E_1$, $F_1$, and $V_1$ be vector spaces over $K$, and let $L_{E_1} : E_1 \to E$, $L_{F_1} : F_1 \to F$, and $L_V : V \to V_1$ be $K$ linear maps. Then the isomorphism $\Phi$ of Corollary 6.13 is natural in the sense that the diagram
\[
\begin{array}{ccc}
\operatorname{Hom}_K(E \otimes_K F,\, V) & \xrightarrow{\;\Phi\;} & \operatorname{Hom}_K(E,\, \operatorname{Hom}_K(F, V)) \\
\Big\downarrow{\scriptstyle \operatorname{Hom}(L_{E_1} \otimes L_{F_1},\, L_V)} & & \Big\downarrow{\scriptstyle \operatorname{Hom}(L_{E_1},\, \operatorname{Hom}(L_{F_1}, L_V))} \\
\operatorname{Hom}_K(E_1 \otimes_K F_1,\, V_1) & \xrightarrow{\;\Phi\;} & \operatorname{Hom}_K(E_1,\, \operatorname{Hom}_K(F_1, V_1))
\end{array}
\]
commutes.

REMARKS. Observe that the first two linear maps $L_{E_1}$ and $L_{F_1}$ go in the opposite direction to the two vertical maps, while $L_V$ goes in the same direction as the vertical maps. This is a reflection of the fact that both sides of the identity in Corollary 6.13 are contravariant in the first two variables and covariant in the third variable.

PROOF. For $\varphi$ in $\operatorname{Hom}_K(E \otimes_K F, V)$, $e_1$ in $E_1$, and $f_1$ in $F_1$, we have
\begin{align*}
(\operatorname{Hom}(L_{E_1}, \operatorname{Hom}(L_{F_1}, L_V)) \circ \Phi)(\varphi)(e_1)(f_1)
&= (\operatorname{Hom}(L_{F_1}, L_V) \circ \Phi(\varphi) \circ L_{E_1})(e_1)(f_1) \\
&= (\operatorname{Hom}(L_{F_1}, L_V) \circ (\Phi(\varphi) \circ L_{E_1}))(e_1)(f_1) \\
&= L_V(\Phi(\varphi)(L_{E_1}(e_1))(L_{F_1}(f_1))) \\
&= L_V(\varphi(L_{E_1}(e_1) \otimes L_{F_1}(f_1))) \\
&= (L_V \circ \varphi \circ (L_{E_1} \otimes L_{F_1}))(e_1 \otimes f_1) \\
&= (\operatorname{Hom}(L_{E_1} \otimes L_{F_1}, L_V)(\varphi))(e_1 \otimes f_1) \\
&= (\Phi \circ \operatorname{Hom}(L_{E_1} \otimes L_{F_1}, L_V))(\varphi)(e_1)(f_1).
\end{align*}
This proves the proposition.
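The commutativity asserted in Proposition 6.15 can be spot-checked in coordinates. The following is a minimal sketch, not part of the original text: it assumes finite-dimensional spaces with fixed bases, represents linear maps by integer matrices (lists of rows), and uses the row-major ordering of the basis $\{x_i \otimes y_j\}$. All function names and the sample matrices are hypothetical choices made for illustration.

```python
def matvec(A, x):
    """Apply the matrix A (a list of rows) to the coordinate vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def kron_vec(e, f):
    """Coordinates of the pure tensor e (x) f in the row-major basis x_i (x) y_j."""
    return [a * b for a in e for b in f]

def kron_mat(A, B):
    """Matrix of A (x) B with respect to the row-major tensor bases."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def curry(phi):
    """The map Phi of Corollary 6.13 in coordinates: Phi(phi)(e)(f) = phi(e (x) f)."""
    return lambda e: (lambda f: phi(kron_vec(e, f)))

# Hypothetical sample data (dimensions chosen only for illustration).
P = [[1, 2, 0, -1], [3, 0, 1, 4]]      # phi : E (x) F -> V, with dim E = dim F = dim V = 2
L_E = [[1, 0, 2], [-1, 3, 1]]          # L_E1 : E1 -> E, with dim E1 = 3
L_F = [[2, 1], [0, -1]]                # L_F1 : F1 -> F, with dim F1 = 2
L_V = [[1, 1], [0, 2], [5, -3]]        # L_V : V -> V1, with dim V1 = 3

def route_down_then_across(e1, f1):
    """Apply Hom(L_E1 (x) L_F1, L_V) to phi, then curry."""
    K = kron_mat(L_E, L_F)
    psi = lambda t: matvec(L_V, matvec(P, matvec(K, t)))
    return curry(psi)(e1)(f1)

def route_across_then_down(e1, f1):
    """Curry phi, then apply Hom(L_E1, Hom(L_F1, L_V))."""
    curried = curry(lambda t: matvec(P, t))
    return matvec(L_V, curried(matvec(L_E, e1))(matvec(L_F, f1)))

# The two routes agree on sample vectors of E1 and F1.
for e1 in ([1, 0, 0], [0, 1, 0], [2, -1, 3]):
    for f1 in ([1, 0], [0, 1], [1, 4]):
        assert route_down_then_across(e1, f1) == route_across_then_down(e1, f1)
```

Agreement on sample vectors is only a sanity check in one choice of bases; the proof above is what establishes naturality in general.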


Let us now discuss naturality in a wider context. In a general category $\mathcal{D}$, if we have two objects $U$ and $U'$ such that $\operatorname{Morph}(U, V)$ and $\operatorname{Morph}(U', V)$ have the same cardinality for each object $V$, then we cannot really say anything about the relationship between $U$ and $U'$. But under a hypothesis that the isomorphism of sets has a certain naturality to it, then, according to Proposition 6.16 below, $U$ and $U'$ are isomorphic objects. Thus naturality of a system of weak-looking set-theoretic isomorphisms can lead to a much stronger-looking isomorphism. Corollary 6.17 goes on to make a corresponding assertion about functors. The assertion about functors in the corollary is a helpful tool for establishing natural isomorphisms of functors, and an example appears below in Proposition 6.20′.

Proposition 6.16. Let $\mathcal{D}$ be a category, and suppose that $U$ and $U'$ are objects in $\mathcal{D}$ with the following property: to each object $V$ in $\mathcal{D}$ corresponds a one-one onto function $T_V : \operatorname{Morph}(U, V) \to \operatorname{Morph}(U', V)$ with the system $\{T_V\}$ natural in $V$ in the sense that whenever $\sigma$ is in $\operatorname{Morph}(V, V')$, then the diagram
\[
\begin{array}{ccc}
\operatorname{Morph}(U, V) & \xrightarrow{\;T_V\;} & \operatorname{Morph}(U', V) \\
\Big\downarrow{\scriptstyle \text{left-by-}\sigma} & & \Big\downarrow{\scriptstyle \text{left-by-}\sigma} \\
\operatorname{Morph}(U, V') & \xrightarrow{\;T_{V'}\;} & \operatorname{Morph}(U', V')
\end{array}
\]
commutes. Then $U$ is isomorphic to $U'$ as an object in $\mathcal{D}$, an isomorphism from $U$ to $U'$ being the member $T_{U'}^{-1}(1_{U'})$ of $\operatorname{Morph}(U, U')$.

REMARKS.
(1) Another way of formulating this result is as follows: Let $\mathcal{D}$ be any category, let $\mathcal{S}$ be the category of sets, and let $U$ and $U'$ be objects in $\mathcal{D}$. Define a covariant functor $H_U : \mathcal{D} \to \mathcal{S}$ by $H_U(V) = \operatorname{Morph}_{\mathcal{D}}(U, V)$ and $H_U(\sigma) = \text{left-by-}\sigma$ for $\sigma \in \operatorname{Morph}_{\mathcal{D}}(V, V')$, and define $H_{U'}$ similarly. If $H_U$ and $H_{U'}$ are naturally isomorphic functors, then $U$ and $U'$ are isomorphic objects in $\mathcal{D}$.
(2) A similar result is valid when $H_U$ and $H_{U'}$ are contravariant functors, $H_U$ being defined by $H_U(V) = \operatorname{Morph}_{\mathcal{D}}(V, U)$ and $H_U(\sigma) = \text{right-by-}\sigma$ for $\sigma \in \operatorname{Morph}_{\mathcal{D}}(V, V')$. The result in this case follows immediately by applying Proposition 6.16 to the opposite category $\mathcal{D}^{\mathrm{opp}}$ of $\mathcal{D}$ as defined in Problems 78–80 at the end of Chapter IV.

PROOF. Let $\varphi$ be the element $T_{U'}^{-1}(1_{U'})$ of $\operatorname{Morph}(U, U')$, and let $\psi$ be the element $T_U(1_U)$ of $\operatorname{Morph}(U', U)$. To prove the proposition, it is enough to show that $\varphi\psi = 1_{U'}$ and $\psi\varphi = 1_U$.


For $\sigma$ in $\operatorname{Morph}(V, V')$, form the commutative diagram in the statement of the proposition. The commutativity says that
\[ \sigma T_V(h) = T_{V'}(\sigma h) \qquad \text{for } h \in \operatorname{Morph}(U, V). \tag{$*$} \]
Taking $V = U$, $V' = U'$, $\sigma = \varphi$, and $h = 1_U$ in $(*)$ proves the second equality of the chain
\[ \varphi\psi = \varphi T_U(1_U) = T_{U'}(\varphi 1_U) = T_{U'}(\varphi) = 1_{U'}. \]
Taking $V = U'$, $V' = U$, $\sigma = \psi$, and $h = \varphi$ in $(*)$ proves the first equality of the chain
\[ T_U(\psi\varphi) = \psi T_{U'}(\varphi) = \psi 1_{U'} = \psi = T_U(1_U). \]
Applying $T_U^{-1}$, we obtain $\psi\varphi = 1_U$, as required.

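Corollary 6.17. Let $\mathcal{C}$ and $\mathcal{D}$ be categories, and let $F : \mathcal{C} \to \mathcal{D}$ and $G : \mathcal{C} \to \mathcal{D}$ be covariant functors. Suppose that to each pair of objects $(A, V)$ in $\mathcal{C} \times \mathcal{D}$ corresponds a one-one onto function $T_{A,V} : \operatorname{Morph}(F(A), V) \to \operatorname{Morph}(G(A), V)$ with the system $\{T_{A,V}\}$ natural in $(A, V)$. Then the functors $F$ and $G$ are naturally isomorphic.

REMARKS. A similar result is valid if $T_{A,V}$ carries $\operatorname{Morph}(V, F(A))$ to $\operatorname{Morph}(V, G(A))$ and/or if $F$ and $G$ are contravariant. To handle these situations, we apply the corollary to the opposite categories $\mathcal{D}^{\mathrm{opp}}$ and/or $\mathcal{C}^{\mathrm{opp}}$, as defined in Problems 78–80 at the end of Chapter IV, instead of to the categories $\mathcal{D}$ and/or $\mathcal{C}$.

PROOF. By Proposition 6.16 and the hypotheses, the member $T_{A,G(A)}^{-1}(1_{G(A)})$ of $\operatorname{Morph}_{\mathcal{D}}(F(A), G(A))$ is an isomorphism. We are to prove that the system $\{T_{A,G(A)}^{-1}(1_{G(A)})\}$ is natural in $A$. If $\sigma$ in $\operatorname{Morph}_{\mathcal{C}}(A, A')$ is given, then the naturality of $T_{A,V}$ in the $V$ variable implies that the diagram
\[
\begin{array}{ccc}
\operatorname{Morph}_{\mathcal{D}}(F(A), G(A)) & \xrightarrow{\;T_{A,G(A)}\;} & \operatorname{Morph}_{\mathcal{D}}(G(A), G(A)) \\
\Big\downarrow{\scriptstyle \text{left-by-}G(\sigma)} & & \Big\downarrow{\scriptstyle \text{left-by-}G(\sigma)} \\
\operatorname{Morph}_{\mathcal{D}}(F(A), G(A')) & \xrightarrow{\;T_{A,G(A')}\;} & \operatorname{Morph}_{\mathcal{D}}(G(A), G(A'))
\end{array}
\]
commutes. Evaluating at $T_{A,G(A)}^{-1}(1_{G(A)}) \in \operatorname{Morph}_{\mathcal{D}}(F(A), G(A))$ the two equal compositions in the diagram, we obtain
\[ G(\sigma) = G(\sigma)1_{G(A)} = T_{A,G(A')}\big(G(\sigma)\,T_{A,G(A)}^{-1}(1_{G(A)})\big). \tag{$*$} \]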


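With $\sigma$ as above, the naturality of $T_{A,V}$ in the $A$ variable implies that the diagram
\[
\begin{array}{ccc}
\operatorname{Morph}_{\mathcal{D}}(F(A'), G(A')) & \xrightarrow{\;T_{A',G(A')}\;} & \operatorname{Morph}_{\mathcal{D}}(G(A'), G(A')) \\
\Big\downarrow{\scriptstyle \text{right-by-}F(\sigma)} & & \Big\downarrow{\scriptstyle \text{right-by-}G(\sigma)} \\
\operatorname{Morph}_{\mathcal{D}}(F(A), G(A')) & \xrightarrow{\;T_{A,G(A')}\;} & \operatorname{Morph}_{\mathcal{D}}(G(A), G(A'))
\end{array}
\]
commutes. Evaluating at $T_{A',G(A')}^{-1}(1_{G(A')}) \in \operatorname{Morph}_{\mathcal{D}}(F(A'), G(A'))$ the two equal compositions in the diagram, we obtain
\[ G(\sigma) = 1_{G(A')}G(\sigma) = T_{A,G(A')}\big(T_{A',G(A')}^{-1}(1_{G(A')})\,F(\sigma)\big). \tag{$**$} \]
Equations $(*)$ and $(**)$, together with the fact that $T_{A,G(A')}$ is invertible, say that
\[ G(\sigma)\,T_{A,G(A)}^{-1}(1_{G(A)}) = T_{A',G(A')}^{-1}(1_{G(A')})\,F(\sigma). \]
In other words, the isomorphism $\widetilde{T}_A \in \operatorname{Morph}_{\mathcal{D}}(F(A), G(A))$ given by $\widetilde{T}_A = T_{A,G(A)}^{-1}(1_{G(A)})$ makes the diagram
\[
\begin{array}{ccc}
F(A) & \xrightarrow{\;\widetilde{T}_A\;} & G(A) \\
\Big\downarrow{\scriptstyle F(\sigma)} & & \Big\downarrow{\scriptstyle G(\sigma)} \\
F(A') & \xrightarrow{\;\widetilde{T}_{A'}\;} & G(A')
\end{array}
\]
commute. Thus $F$ is naturally isomorphic to $G$.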

Tensor product provides a device for converting a real vector space canonically into a complex vector space, so that a basis over $\mathbb{R}$ in the original space becomes a basis over $\mathbb{C}$ in the new space. If $E$ is the given real vector space, then the complex vector space, called the complexification of $E$, is the space $E^{\mathbb{C}} = E \otimes_{\mathbb{R}} \mathbb{C}$ with multiplication by a complex number $c$ in $E^{\mathbb{C}}$ defined to be $1 \otimes (z \mapsto cz)$.

This construction works more generally when we have any inclusion of fields $K \subseteq L$. In this situation, $L$ becomes a vector space over $K$ if scalar multiplication $K \times L \to L$ is defined as the restriction of the multiplication $L \times L \to L$ within $L$. For any vector space $E$ over $K$, we define $E^L = E \otimes_K L$, initially as a vector space over $K$. For $c \in L$, we then define
\[ (\text{multiplication by } c \text{ in } E \otimes_K L) = 1 \otimes (\text{multiplication by } c \text{ in } L). \]


The above identities concerning tensor products of linear maps allow one easily to prove the following identities:
\[ c_1(c_2 v) = (c_1 c_2)v, \qquad c(u + v) = cu + cv, \qquad (c_1 + c_2)v = c_1 v + c_2 v, \qquad 1v = v. \]
Together these identities say that $E^L = E \otimes_K L$, with its vector-space addition and the above definition of multiplication by scalars in $L$, is a vector space over $L$. The further identity
\[ c(e \otimes 1) = ce \otimes 1 \qquad \text{if } c \text{ is in } K \text{ and } e \text{ is in } E \]
shows that its scalar multiplication is consistent with scalar multiplication in $E$ when the scalars are in $K$ and $E$ is identified with the subset $E \otimes 1$ of $E^L$.

Let us say that the pair $(E^L, \iota)$, where $\iota : E \to E^L$ is the mapping $e \mapsto e \otimes 1$, is obtained by extension of scalars. This construction is characterized by a universal mapping property as follows.

Proposition 6.18. Let $K \subseteq L$ be an inclusion of fields, and let $E$ be a vector space over $K$.
(a) If $(E^L, \iota)$ is formed by extension of scalars, then $(E^L, \iota)$ has the following universal mapping property: whenever $U$ is a vector space over $L$ and $\varphi : E \to U$ is a $K$ linear map, there exists a unique $L$ linear map $\Phi : E^L \to U$ such that $\Phi\iota = \varphi$.
(b) Suppose that $(V, j)$ is any pair in which $V$ is a vector space over $L$ and $j : E \to V$ is a $K$ linear function such that the following universal mapping property holds: whenever $U$ is a vector space over $L$ and $\varphi : E \to U$ is a $K$ linear map, there exists a unique $L$ linear map $\Phi : V \to U$ such that $\Phi j = \varphi$. Then there exists a unique isomorphism $\Psi : E^L \to V$ of $L$ vector spaces such that $\Psi\iota = j$.

PROOF. In (a), for the uniqueness of $\Phi$, we must have $\Phi(e \otimes c) = c\,\Phi(e \otimes 1) = c(\Phi\iota)(e) = c\varphi(e)$. Hence $\Phi$ is determined by $\varphi$ on pure tensors in $E \otimes_K L$ and therefore everywhere. For existence let $\Phi : E \otimes_K L \to U$ be the $K$ linear extension of the $K$ bilinear function of $E \times L$ into $U$ given by
\[ (e, c) \mapsto c\varphi(e) \qquad \text{for } e \in E \text{ and } c \in L. \]


In the $L$ vector space $E \otimes_K L$, multiplication by a member $c_0$ of $L$ is defined to be $1 \otimes (\text{multiplication by } c_0)$. On a pure tensor $e \otimes c$, we therefore have
\[ \Phi(c_0(e \otimes c)) = \Phi(e \otimes c_0 c) = (c_0 c)\varphi(e) = c_0(c\varphi(e)) = c_0(\Phi(e \otimes c)). \]
Since $E \otimes_K L$ is generated by pure tensors, $\Phi$ is $L$ linear. By the construction of $\Phi$, $\varphi(e) = \Phi(e \otimes 1) = (\Phi\iota)(e)$. Thus $\Phi$ has the required properties.

In (b), let $(V, j)$ have the same universal mapping property as $(E^L, \iota)$. We apply the universal mapping property of $(E^L, \iota)$ to the $K$ linear map $j : E \to V$ to obtain an $L$ linear $\Psi : E^L \to V$ with $\Psi\iota = j$, and we apply the universal mapping property of $(V, j)$ to the $K$ linear map $\iota : E \to E^L$ to obtain an $L$ linear $\Psi' : V \to E^L$ with $\Psi' j = \iota$. From $(\Psi'\Psi)\iota = \Psi' j = \iota$ and $1_{E^L}\iota = \iota$, the uniqueness in the universal mapping property for $(E^L, \iota)$ implies $\Psi'\Psi = 1_{E^L}$. Arguing similarly, we obtain $\Psi\Psi' = 1_V$. Thus $\Psi$ is an isomorphism with the required properties. If $\widetilde{\Psi} : E^L \to V$ is another isomorphism with $\widetilde{\Psi}\iota = j$, then the argument just given shows that $\Psi'\widetilde{\Psi} = 1_{E^L}$ and $\widetilde{\Psi}\Psi' = 1_V$. Hence $\widetilde{\Psi} = (\Psi')^{-1} = \Psi$, and $\Psi$ is unique.

To make $E \mapsto E^L$ into a covariant functor from vector spaces over $K$ to vector spaces over $L$, we must examine the effect on linear maps. The tool is Proposition 6.18a. Thus let $E$ and $F$ be two vector spaces over $K$, and let $M : E \to F$ be a $K$ linear map between them. We extend scalars for $E$ and $F$. The proposition applies to the composition $E \to F \to F^L$ and shows that the composition extends uniquely to an $L$ linear map from $E^L$ to $F^L$. A quick look at the proof shows that this $L$ linear map is $M \otimes 1$. Actually, we can see directly that $M \otimes 1$ is indeed linear over $L$ and not just over $K$: we just use our identity for compositions of tensor products to write
\[ (M \otimes 1)(I \otimes (\text{multiplication by } c)) = M \otimes (\text{multiplication by } c) = (I \otimes (\text{multiplication by } c))(M \otimes 1). \]
In any event, the explicit form of the extended linear map as $M \otimes 1$ shows immediately that the identity linear map goes to the identity and that compositions go to compositions. Thus $E \mapsto E^L$ is a covariant functor.

In the special case that the vector spaces are $K^n$ and $K^m$, extension of scalars has a particularly simple interpretation. The new spaces may be viewed as $L^n$ and $L^m$. Thus column vectors with entries in $K$ get replaced by column vectors with entries in $L$. What happens with linear mappings is even more transparent. A linear map $M : E \to F$ is given by an $m$-by-$n$ matrix $A$ with entries in $K$, and the linear map $M \otimes 1 : E^L \to F^L$ is the one given by the same matrix $A$. Now the entries of $A$ are to be regarded as members of the larger field $L$. Viewed this way, extension of scalars might look as if it is dependent on choices of bases, but the tensor-product formalism shows that it is not.

A related notion to extension of scalars is that of restriction of scalars. Again with an inclusion $K \subseteq L$ of fields, a vector space $E$ over the larger field $L$ becomes a vector space $E^K$ over the smaller field $K$ by ignoring unnecessary scalar multiplications. Although this notion is related to extension of scalars, it is not inverse to it. For example, if the two fields are $\mathbb{R}$ and $\mathbb{C}$ and if we start with an $n$-dimensional vector space $E$ over $\mathbb{R}$, then $E^{\mathbb{C}}$ is a complex vector space of dimension $n$ and $(E^{\mathbb{C}})^{\mathbb{R}}$ is a real vector space of dimension $2n$. We thus do not get back to the original space $E$.
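The matrix interpretation of extension and restriction of scalars can be illustrated for $\mathbb{R} \subseteq \mathbb{C}$. The sketch below is an illustration only, under the assumption that $E = \mathbb{R}^2$ and $F = \mathbb{R}^3$ with their standard bases; the helper names `apply`, `scale`, and `restrict_scalars` are hypothetical. It checks that the same real matrix acts complex-linearly on $\mathbb{C}^2$, and that viewing a vector of $\mathbb{C}^n$ as real data takes $2n$ real coordinates.

```python
def apply(A, v):
    """Apply the matrix A (a list of rows) to the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def scale(c, v):
    """Multiply the vector v by the scalar c."""
    return [c * x for x in v]

def restrict_scalars(v):
    """View a vector of C^n as a vector of R^(2n): real parts, then imaginary parts."""
    return [z.real for z in v] + [z.imag for z in v]

A = [[1, 2], [0, -1], [3, 1]]   # matrix of M : R^2 -> R^3; the same matrix gives M (x) 1
v = [1 + 2j, -3j]               # a vector of the complexification C^2
c = 2 - 1j                      # a complex scalar

# M (x) 1 commutes with multiplication by c, so it is linear over C.
assert apply(A, scale(c, v)) == scale(c, apply(A, v))

# Restriction of scalars doubles the number of real coordinates.
assert len(restrict_scalars(v)) == 2 * len(v)
```

The entries were chosen to be small Gaussian integers so that the floating-point comparison is exact; with general complex entries a tolerance would be needed.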

7. Tensor Algebra

Just as polynomial rings are often used in the construction of more general commutative rings, so “tensor algebras” are often used in the construction of more general rings that may not be commutative. In this section we construct the “tensor algebra” of a vector space as a direct sum of iterated tensor products of the vector space with itself, and we establish its properties. We shall proceed with care, in order to provide a complete proof of the associativity of the multiplication.

Let $A$, $B$, and $C$ be vector spaces over a field $K$. A triple tensor product $V = A \otimes_K B \otimes_K C$ is a vector space over $K$ with a 3-linear map $\iota : A \times B \times C \to V$ having the following universal mapping property: whenever $t$ is a 3-linear mapping of $A \times B \times C$ into a vector space $U$ over $K$, then there exists a linear mapping $T$ of $V$ into $U$ such that the diagram in Figure 6.4 commutes.
\[
\begin{array}{ccc}
A \times B \times C & \xrightarrow{\;t\;} & U \\
{\scriptstyle \iota}\Big\downarrow & \nearrow{\scriptstyle T} & \\
V = A \otimes_K B \otimes_K C & &
\end{array}
\]
FIGURE 6.4. Commutative diagram of a triple tensor product.

The usual argument with universal mapping properties shows that there is at most one triple tensor product up to a well-determined isomorphism, and one can give an explicit construction of it that is similar to the one for ordinary tensor products $E \otimes_K F$. We shall not need that particular proof of existence since Proposition 6.19a below will give us an alternative argument. Once we have that statement, we shall use the uniqueness of triple tensor products to establish in Proposition 6.19b an associativity formula for ordinary iterated tensor products.


A shorter proof of Proposition 6.19b, which avoids Proposition 6.19a and uses naturality, will be given after the proof of Proposition 6.20′.

Proposition 6.19. If $K$ is a field and $A$, $B$, $C$ are vector spaces over $K$, then
(a) $(A \otimes_K B) \otimes_K C$ and $A \otimes_K (B \otimes_K C)$ are triple tensor products,
(b) there exists a unique $K$ isomorphism $\Phi$ from left to right in
\[ (A \otimes_K B) \otimes_K C \cong A \otimes_K (B \otimes_K C) \]
such that $\Phi((a \otimes b) \otimes c) = a \otimes (b \otimes c)$ for all $a \in A$, $b \in B$, and $c \in C$.

PROOF. In (a), consider $(A \otimes_K B) \otimes_K C$. Let $t : A \times B \times C \to U$ be 3-linear. For $c \in C$, define $t_c : A \times B \to U$ by $t_c(a, b) = t(a, b, c)$. Then $t_c$ is bilinear and hence extends to a linear $T_c : A \otimes_K B \to U$. Since $t$ is 3-linear, $t_{c_1 + c_2} = t_{c_1} + t_{c_2}$ and $t_{xc} = x t_c$ for scalar $x$; thus uniqueness of the linear extension forces $T_{c_1 + c_2} = T_{c_1} + T_{c_2}$ and $T_{xc} = x T_c$. Consequently $t' : (A \otimes_K B) \times C \to U$ given by $t'(d, c) = T_c(d)$ is bilinear and therefore extends to a linear $T : (A \otimes_K B) \otimes_K C \to U$. This $T$ proves existence of the linear extension of the given $t$. Uniqueness is trivial, since the elements $(a \otimes b) \otimes c$ span $(A \otimes_K B) \otimes_K C$. So $(A \otimes_K B) \otimes_K C$ is a triple tensor product. In a similar fashion, $A \otimes_K (B \otimes_K C)$ is a triple tensor product.

For (b), set up the diagram of the universal mapping property for a triple tensor product, using $V = (A \otimes_K B) \otimes_K C$, $U = A \otimes_K (B \otimes_K C)$, and $t(a, b, c) = a \otimes (b \otimes c)$. We have just seen in (a) that $V$ is a triple tensor product with $\iota(a, b, c) = (a \otimes b) \otimes c$. Thus there exists a linear $T : V \to U$ with $T\iota(a, b, c) = t(a, b, c)$. This equation means that $T((a \otimes b) \otimes c) = a \otimes (b \otimes c)$. Interchanging the roles of $(A \otimes_K B) \otimes_K C$ and $A \otimes_K (B \otimes_K C)$, we obtain a two-sided inverse for $T$. Thus $T$ will serve as $\Phi$ in (b), and existence is proved. Uniqueness is trivial, since the elements $(a \otimes b) \otimes c$ span $(A \otimes_K B) \otimes_K C$.

When there is no danger of confusion, Proposition 6.19 allows us to write a triple tensor product without parentheses as $A \otimes_K B \otimes_K C$. The same argument as in Corollaries 6.11 and 6.12 shows that the vector space of 3-linear forms on $A \times B \times C$ is canonically isomorphic to the dual of the vector space $A \otimes_K B \otimes_K C$. Just as with Corollary 6.13 and Proposition 6.15, the result of Proposition 6.19 can be improved by saying that the isomorphism is natural in the variables $A$, $B$, and $C$, as follows.


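Proposition 6.20. Let $A$, $B$, $C$, $A_1$, $B_1$, and $C_1$ be vector spaces over a field $K$, and let $L_A : A \to A_1$, $L_B : B \to B_1$, and $L_C : C \to C_1$ be linear maps. Then the isomorphism $\Phi$ of Proposition 6.19b is natural in the triple $(A, B, C)$ in the sense that the diagram
\[
\begin{array}{ccc}
(A \otimes_K B) \otimes_K C & \xrightarrow{\;\Phi\;} & A \otimes_K (B \otimes_K C) \\
\Big\downarrow{\scriptstyle (L_A \otimes L_B) \otimes L_C} & & \Big\downarrow{\scriptstyle L_A \otimes (L_B \otimes L_C)} \\
(A_1 \otimes_K B_1) \otimes_K C_1 & \xrightarrow{\;\Phi\;} & A_1 \otimes_K (B_1 \otimes_K C_1)
\end{array}
\]
commutes.

PROOF. We have
\begin{align*}
((L_A \otimes (L_B \otimes L_C)) \circ \Phi)((a \otimes b) \otimes c)
&= (L_A \otimes (L_B \otimes L_C))(a \otimes (b \otimes c)) \\
&= L_A a \otimes (L_B \otimes L_C)(b \otimes c) \\
&= L_A a \otimes (L_B b \otimes L_C c) \\
&= \Phi((L_A a \otimes L_B b) \otimes L_C c) \\
&= \Phi((L_A \otimes L_B)(a \otimes b) \otimes L_C c) \\
&= (\Phi \circ ((L_A \otimes L_B) \otimes L_C))((a \otimes b) \otimes c),
\end{align*}
and the proposition follows.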
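In coordinates the associativity isomorphism becomes invisible, and Propositions 6.19b and 6.20 reduce to associativity of the Kronecker product. The sketch below is a coordinate check only, assuming finite-dimensional spaces with chosen bases and the row-major ordering of tensor-product bases; the function names and the sample data are hypothetical.

```python
def kron_vec(e, f):
    """Coordinates of the pure tensor e (x) f in the row-major basis x_i (x) y_j."""
    return [a * b for a in e for b in f]

def kron_mat(A, B):
    """Matrix of A (x) B with respect to the row-major tensor bases."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

# Hypothetical sample vectors a in A, b in B, c in C.
a, b, c = [1, -2], [3, 0, 5], [2, 7]

# (a (x) b) (x) c and a (x) (b (x) c) have the same coordinates,
# so Phi is represented by an identity matrix in these bases.
assert kron_vec(kron_vec(a, b), c) == kron_vec(a, kron_vec(b, c))

# Hypothetical sample maps L_A, L_B, L_C.
L_A = [[1, 2], [0, 1], [4, -1]]
L_B = [[2, 0, 1]]
L_C = [[1, 1], [3, -2]]

# Naturality (Proposition 6.20): both groupings give the same matrix.
assert kron_mat(kron_mat(L_A, L_B), L_C) == kron_mat(L_A, kron_mat(L_B, L_C))
```

With a different ordering of the basis of a tensor product, $\Phi$ would be represented by a permutation matrix rather than the identity, so the equalities above depend on the stated row-major convention.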

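The treatment of Propositions 6.19 and 6.20 can be shortened if we are willing to bypass the notion of a triple tensor product and use what was proved about naturality in the previous section. The result and the proof are as follows.

Proposition 6.20′. Let $A$, $B$, and $C$ be vector spaces over a field $K$. Then there is an isomorphism $\Phi : (A \otimes_K B) \otimes_K C \to A \otimes_K (B \otimes_K C)$ that is natural in the triple $(A, B, C)$ and satisfies $\Phi((a \otimes b) \otimes c) = a \otimes (b \otimes c)$.

PROOF. Writing $\cong$ for “naturally isomorphic in all variables” and applying Proposition 6.15 and other natural isomorphisms of the previous section repeatedly, we have
\begin{align*}
\operatorname{Hom}_K((A \otimes_K B) \otimes_K C,\, V)
&\cong \operatorname{Hom}_K(A \otimes_K B,\, \operatorname{Hom}_K(C, V)) \\
&\cong \operatorname{Hom}_K(B,\, \operatorname{Hom}_K(A, \operatorname{Hom}_K(C, V))) \\
&\cong \operatorname{Hom}_K(B,\, \operatorname{Hom}_K(A \otimes_K C, V)) \\
&\cong \operatorname{Hom}_K(B,\, \operatorname{Hom}_K(C \otimes_K A, V)) && \text{by symmetry} \\
&\cong \operatorname{Hom}_K((C \otimes_K B) \otimes_K A,\, V) \\
&\cong \operatorname{Hom}_K(A \otimes_K (C \otimes_K B),\, V) \\
&\cong \operatorname{Hom}_K(A \otimes_K (B \otimes_K C),\, V).
\end{align*}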


Then the existence of the natural isomorphism follows from Corollary 6.17. Using the explicit formula for the isomorphism in Proposition 6.16 and tracking matters down, we see that $\Phi((a \otimes b) \otimes c) = a \otimes (b \otimes c)$.

There is no difficulty in generalizing matters to $n$-fold tensor products by induction. An $n$-fold tensor product is to be universal for $n$-multilinear maps. Again it is unique up to canonical isomorphism, as one proves by an argument that runs along familiar lines. A direct construction of an $n$-fold tensor product is possible in the style of the proof for ordinary tensor products, but such a construction will not be needed. Instead, we can form an $n$-fold tensor product as the $(n - 1)$-fold tensor product of the first $n - 1$ spaces, tensored with the $n$th space. Proposition 6.19b allows us to regroup parentheses (inductively) in any fashion we choose, and the same argument as in Corollaries 6.11 and 6.12 yields the following proposition.

Proposition 6.21. If $E_1, \dots, E_n$, and $V$ are vector spaces over $K$, then the vector space $\operatorname{Hom}_K(E_1 \otimes_K \cdots \otimes_K E_n, V)$ is canonically isomorphic (via restriction to pure tensors) to the vector space of all $V$-valued $n$-multilinear functions on $E_1 \times \cdots \times E_n$. In particular the vector space of all $n$-multilinear forms on $E_1 \times \cdots \times E_n$ is canonically isomorphic to $(E_1 \otimes_K \cdots \otimes_K E_n)'$.

Iterated application of Proposition 6.20 shows that we get also a well-defined notion of a linear map $L_1 \otimes \cdots \otimes L_n$, the tensor product of $n$ linear maps. Thus $(E_1, \dots, E_n) \mapsto E_1 \otimes_K \cdots \otimes_K E_n$ is a functor. There is no need to write out the details.

We turn to the question of defining a multiplication operation on tensors. If $K$ is a field, an algebra⁴ over $K$ is a vector space $V$ over $K$ with a multiplication or product operation $V \times V \to V$ that is $K$ bilinear. The additive part of the $K$ bilinearity means that the product operation satisfies the distributive laws
\[ a(b + c) = ab + ac \quad \text{and} \quad (b + c)a = ba + ca \qquad \text{for all } a, b, c \text{ in } V, \]
and the scalar-multiplication part of the $K$ bilinearity means that
\[ (ka)b = k(ab) = a(kb) \qquad \text{for all } k \text{ in } K \text{ and } a, b \text{ in } V. \]
Within the text of the book, we shall work mostly just with associative algebras, i.e., those algebras satisfying the usual associative law
\[ a(bc) = (ab)c \qquad \text{for all } a, b, c \text{ in } V. \]

⁴ Some authors use the term “algebra” to mean what we shall call an “associative algebra.”


An associative algebra is therefore a ring and a vector space, the scalar multiplication and the ring multiplication being linked by the requirement that $(ka)b = k(ab) = a(kb)$ for all scalars $k$. Some commutative examples of associative algebras over $K$ are any field $L$ containing $K$, the polynomial algebra $K[X_1, \dots, X_n]$, and the algebra of all $K$-valued functions on a nonempty set $S$. Two noncommutative examples of associative algebras over $K$ are the matrix algebra $M_n(K)$, with matrix multiplication as its product, and $\operatorname{Hom}_K(V, V)$ for any vector space $V$, with composition as its product. The division ring $\mathbb{H}$ of quaternions (Example 10 in Section IV.1) is another example of a noncommutative associative algebra over $\mathbb{R}$.

Despite our emphasis on algebras that are associative, certain kinds of nonassociative algebras are of great importance in applications, and consequently several problems at the end of the chapter make use of nonassociative algebras. A nonassociative algebra is determined by its vector-space structure and the multiplication table for the members of a $K$ basis. There is no restriction on the multiplication table; all multiplication tables define algebras. Perhaps the best-known nonassociative algebra is the 3-dimensional algebra over $\mathbb{R}$ determined by vector product in $\mathbb{R}^3$. A basis is $\{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$, the multiplication operation is denoted by $\times$, and the multiplication table is
\[
\begin{array}{lll}
\mathbf{i} \times \mathbf{i} = 0, & \mathbf{i} \times \mathbf{j} = \mathbf{k}, & \mathbf{i} \times \mathbf{k} = -\mathbf{j}, \\
\mathbf{j} \times \mathbf{i} = -\mathbf{k}, & \mathbf{j} \times \mathbf{j} = 0, & \mathbf{j} \times \mathbf{k} = \mathbf{i}, \\
\mathbf{k} \times \mathbf{i} = \mathbf{j}, & \mathbf{k} \times \mathbf{j} = -\mathbf{i}, & \mathbf{k} \times \mathbf{k} = 0.
\end{array}
\]
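The statement that a multiplication table on a basis determines an algebra can be made concrete for the vector-product table. The following minimal sketch, with hypothetical names `TABLE` and `mul`, encodes the nonzero entries of the table, extends the product bilinearly to coordinate vectors, and compares the two ways of bracketing a triple product.

```python
# Nonzero entries of the table: (p, q) -> (sign, r) means e_p x e_q = sign * e_r,
# where e_0 = i, e_1 = j, e_2 = k. Missing pairs have product 0.
TABLE = {
    (0, 1): (1, 2), (0, 2): (-1, 1),
    (1, 0): (-1, 2), (1, 2): (1, 0),
    (2, 0): (1, 1), (2, 1): (-1, 0),
}

def mul(x, y):
    """Bilinear extension of the table to vectors given by coordinates in {i, j, k}."""
    out = [0, 0, 0]
    for p, xp in enumerate(x):
        for q, yq in enumerate(y):
            if (p, q) in TABLE:
                sign, r = TABLE[(p, q)]
                out[r] += sign * xp * yq
    return out

i, j, k = [1, 0, 0], [0, 1, 0], [0, 0, 1]

# The two bracketings of i, i, k differ, so the product is not associative.
assert mul(i, mul(i, k)) == [0, 0, -1]   # i x (i x k) = -k
assert mul(mul(i, i), k) == [0, 0, 0]    # (i x i) x k = 0
```

Replacing `TABLE` by any other assignment of basis products would again give a bilinear product, which is the sense in which every multiplication table defines an algebra.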

Since $\mathbf{i} \times (\mathbf{i} \times \mathbf{k}) = \mathbf{i} \times (-\mathbf{j}) = -\mathbf{k}$ and $(\mathbf{i} \times \mathbf{i}) \times \mathbf{k} = 0$, vector product is not associative. The vector-product algebra is a special case of a Lie algebra; Lie algebras are defined in Problems 31–35 at the end of the chapter.

Tensor algebras, which we shall now construct, will be associative algebras. Fix a vector space $E$ over $K$, and for integers $n \ge 1$, let $T^n(E)$ be the $n$-fold tensor product of $E$ with itself. In the case $n = 0$, we let $T^0(E)$ be the field $K$. Define, initially as a vector space, $T(E)$ to be the direct sum
\[ T(E) = \bigoplus_{n=0}^{\infty} T^n(E). \]
The elements that lie in one or another $T^n(E)$ are called homogeneous. We define a bilinear multiplication on homogeneous elements
\[ T^m(E) \times T^n(E) \to T^{m+n}(E) \]
to be the restriction of the canonical isomorphism
\[ T^m(E) \otimes_K T^n(E) \to T^{m+n}(E) \]


resulting from iterating Proposition 6.19b. This multiplication, denoted by $\otimes$, is associative, as far as it goes, because the restriction of the $K$ isomorphism
\[ T^l(E) \otimes_K (T^m(E) \otimes_K T^n(E)) \to (T^l(E) \otimes_K T^m(E)) \otimes_K T^n(E) \]
to $T^l(E) \times (T^m(E) \times T^n(E))$ factors through the map
\[ T^l(E) \times (T^m(E) \times T^n(E)) \to (T^l(E) \times T^m(E)) \times T^n(E) \]
given by $(r, (s, t)) \mapsto ((r, s), t)$.

This much tells how to multiply homogeneous elements in $T(E)$. Since each element $t$ in $T(E)$ has a unique expansion as a finite sum $t = \sum_{k=0}^{n} t_k$ with $t_k \in T^k(E)$, we can define the product of this $t$ and the element $t' = \sum_{k'=0}^{n'} t'_{k'}$ to be the element
\[ t \otimes t' = \sum_{l=0}^{n+n'} \sum_{k+k'=l} (t_k \otimes t'_{k'}); \]
the expression $\sum_{k+k'=l} (t_k \otimes t'_{k'})$ is the component of the product in $T^l(E)$. Multiplication is thereby well defined in $T(E)$, and it satisfies the distributive laws and is associative. Thus $T(E)$ becomes an associative algebra with a (two-sided) identity, namely the element 1 in $T^0(E)$. In the presence of the identification $\iota : E \to T^1(E)$, $T(E)$ is known as the tensor algebra of $E$. The pair $(T(E), \iota)$ has the universal mapping property given in Proposition 6.22 and pictured in Figure 6.5.
\[
\begin{array}{ccc}
E & \xrightarrow{\;l\;} & A \\
{\scriptstyle \iota}\Big\downarrow & \nearrow{\scriptstyle L} & \\
T(E) & &
\end{array}
\]
FIGURE 6.5. Universal mapping property of a tensor algebra.

Proposition 6.22. The pair $(T(E), \iota)$ has the following universal mapping property: whenever $l : E \to A$ is a linear map from $E$ into an associative algebra with identity, then there exists a unique associative algebra homomorphism $L : T(E) \to A$ with $L(1) = 1$ such that the diagram in Figure 6.5 commutes.

PROOF. Uniqueness is clear, since $E$ and 1 generate $T(E)$ as an algebra. For existence we define $L^{(n)}$ on $T^n(E)$ to be the linear extension of the $n$-multilinear map
\[ (v_1, v_2, \dots, v_n) \mapsto l(v_1)l(v_2) \cdots l(v_n), \]
and we let $L = \bigoplus_n L^{(n)}$ in obvious notation. Let $u_1 \otimes \cdots \otimes u_m$ be in $T^m(E)$ and $v_1 \otimes \cdots \otimes v_n$ be in $T^n(E)$. Then we have
\begin{align*}
L^{(m)}(u_1 \otimes \cdots \otimes u_m) &= l(u_1) \cdots l(u_m), \\
L^{(n)}(v_1 \otimes \cdots \otimes v_n) &= l(v_1) \cdots l(v_n), \\
L^{(m+n)}(u_1 \otimes \cdots \otimes u_m \otimes v_1 \otimes \cdots \otimes v_n) &= l(u_1) \cdots l(u_m)\,l(v_1) \cdots l(v_n).
\end{align*}


Hence
\[ L^{(m)}(u_1 \otimes \cdots \otimes u_m)\,L^{(n)}(v_1 \otimes \cdots \otimes v_n) = L^{(m+n)}(u_1 \otimes \cdots \otimes u_m \otimes v_1 \otimes \cdots \otimes v_n). \]
Taking linear combinations, we see that $L$ is a homomorphism.

Proposition 6.22 allows us to make $E \mapsto T(E)$ into a functor from the category of vector spaces over $K$ to the category of associative algebras with identity over $K$. To carry out the construction, we suppose that $\varphi : E \to F$ is a linear map between two vector spaces over $K$. If $i : E \to T(E)$ and $j : F \to T(F)$ are the inclusion maps, then $j\varphi$ is a linear map from $E$ into $T(F)$, and Proposition 6.22 produces a unique algebra homomorphism $\Phi : T(E) \to T(F)$ carrying 1 to 1 and satisfying $\Phi i = j\varphi$. Then the tensor-product functor is defined to carry the linear map $\varphi$ to the homomorphism $\Phi$ of associative algebras with identity.

For the situation in which $R$ is a commutative ring with identity, Section IV.5 introduced the ring $R[X_1, \dots, X_n]$ of polynomials in $n$ commuting indeterminates with coefficients in $R$. This ring was characterized by a universal mapping property saying that if a ring homomorphism of $R$ into a commutative ring with identity were given and if $n$ elements $t_1, \dots, t_n$ were given, then the ring homomorphism of $R$ could be extended uniquely to a ring homomorphism of $R[X_1, \dots, X_n]$ carrying $X_j$ into $t_j$ for each $j$. Proposition 6.22 yields a noncommutative version of this result, except that the ring of coefficients is assumed this time to be a field $K$. To arrange for $X_1, \dots, X_n$ to be noncommuting indeterminates, we form a vector space with $\{X_1, \dots, X_n\}$ as a basis. Thus we let $E = \bigoplus_{j=1}^{n} K X_j$. If $t_1, \dots, t_n$ are arbitrary elements of an associative algebra $A$ with identity, then the formulas $l(X_j) = t_j$ for $1 \le j \le n$ define a linear map $l : E \to A$. The associative-algebra homomorphism $L : T(E) \to A$ produced by the proposition extends the inclusion of $K$ into the subfield $K1$ of $A$ and carries each $X_j$ to $t_j$.
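The description of $T(E)$ in terms of noncommuting indeterminates suggests a simple computational model. The sketch below is an illustration under stated assumptions rather than a construction from the text: $E$ has the basis $\{X_1, X_2\}$, an element of $T(E)$ is stored as a dictionary from words (tuples of basis indices, with the empty tuple standing for 1) to integer coefficients, and the target algebra $A$ is taken to be 2-by-2 integer matrices. The names `t_mul`, `extend`, and the matrix helpers are hypothetical.

```python
def t_mul(x, y):
    """Product in T(E): concatenate words and multiply coefficients, extended bilinearly."""
    out = {}
    for w1, c1 in x.items():
        for w2, c2 in y.items():
            out[w1 + w2] = out.get(w1 + w2, 0) + c1 * c2
    return {w: c for w, c in out.items() if c != 0}

def mat_mul(A, B):
    """Product of two 2-by-2 matrices."""
    return [[sum(A[r][m] * B[m][c] for m in range(2)) for c in range(2)] for r in range(2)]

def mat_add(A, B):
    return [[A[r][c] + B[r][c] for c in range(2)] for r in range(2)]

def mat_scale(k, A):
    return [[k * A[r][c] for c in range(2)] for r in range(2)]

IDENTITY = [[1, 0], [0, 1]]

def extend(images):
    """Return the homomorphism L with L(1) = identity and L(X_j) = images[j]."""
    def L(x):
        total = [[0, 0], [0, 0]]
        for word, coeff in x.items():
            m = IDENTITY
            for letter in word:
                m = mat_mul(m, images[letter])
            total = mat_add(total, mat_scale(coeff, m))
        return total
    return L

X1, X2 = {(0,): 1}, {(1,): 1}

# The indeterminates do not commute in T(E).
assert t_mul(X1, X2) != t_mul(X2, X1)

# Hypothetical choice of t_1, t_2 in A, and the resulting homomorphism L.
L = extend({0: [[0, 1], [0, 0]], 1: [[0, 0], [1, 0]]})
x = {(0,): 1, (1, 0): 2}    # X1 + 2 X2 X1
y = {(): 3, (1,): -1}       # 3 - X2
assert L(t_mul(x, y)) == mat_mul(L(x), L(y))
assert L({(): 1}) == IDENTITY
```

The multiplicativity check mirrors the computation in the proof of Proposition 6.22: on words, `L` sends a concatenation to the product of the images.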

8. Symmetric Algebra

We continue to allow $K$ to be an arbitrary field. Let $E$ be a vector space over $K$, and let $T(E)$ be the tensor algebra. We begin by defining the symmetric algebra $S(E)$. This is to be a version of $T(E)$ in which the elements, which are called symmetric tensors, commute with one another. It will not be canonically an algebra of polynomials, as we shall see presently, and thus we make no use of polynomial rings in the construction.

Just as the vector space of $n$-multilinear forms $E \times \cdots \times E \to K$ is canonically the dual of $T^n(E)$, so the vector space of symmetric $n$-multilinear forms will be canonically the dual of $S^n(E)$. Here “symmetric” means that $f(x_1, \dots, x_n) = f(x_{\tau(1)}, \dots, x_{\tau(n)})$ for every permutation $\tau$ in the symmetric group $\mathfrak{S}_n$.

Since tensor algebras are supposed to be universal devices for constructing associative algebras over $K$, whether commutative or not, we seek to form $S(E)$ as a quotient of $T(E)$. If $q$ is the quotient homomorphism, we want to have $q(u \otimes v) = q(v \otimes u)$ in $S(E)$ whenever $u$ and $v$ are in $\iota(E) = T^1(E)$. Hence every element $u \otimes v - v \otimes u$ is to be in the kernel of the homomorphism. On the other hand, we do not want to impose any unnecessary conditions on our quotient, and so we factor out only what the elements $u \otimes v - v \otimes u$ force us to factor out. Thus we define the symmetric algebra by
\[ S(E) = T(E)/I, \]
where
\[ I = \Big(\text{two-sided ideal generated by all } u \otimes v - v \otimes u \text{ with } u \text{ and } v \text{ in } T^1(E)\Big). \]

Then $S(E)$ is an associative algebra with identity.

Let us see that the fact that the generators of the ideal $I$ are homogeneous elements (all being in $T^2(E)$) implies that
\[ I = \bigoplus_{n=0}^{\infty} (I \cap T^n(E)). \]
In fact, each $I \cap T^n(E)$ is contained in $I$, and hence $I$ contains the right side. On the other hand, if $x$ is any element of $I$, then $x$ is a sum of terms of the form $a \otimes (u \otimes v - v \otimes u) \otimes b$, and we may assume that each $a$ and $b$ is homogeneous. Any individual term $a \otimes (u \otimes v - v \otimes u) \otimes b$ is in some $I \cap T^n(E)$, and $x$ is exhibited as a sum of members of the various intersections $I \cap T^n(E)$.

An ideal with the property $I = \bigoplus_{n=0}^{\infty} (I \cap T^n(E))$ is said to be homogeneous. Since $I$ is homogeneous,
\[ S(E) = \bigoplus_{n=0}^{\infty} T^n(E)/(I \cap T^n(E)). \]
We write $S^n(E)$ for the $n$th summand on the right side, so that
\[ S(E) = \bigoplus_{n=0}^{\infty} S^n(E). \]
Since $I \cap T^1(E) = 0$, the map of $E \to T^1(E) \to S^1(E)$ into first-order elements is one-one onto. The product operation in $S(E)$ is written without a product sign,


the image in $S^n(E)$ of $v_1 \otimes \cdots \otimes v_n$ in $T^n(E)$ being written as $v_1 \cdots v_n$. If $a$ is in $S^m(E)$ and $b$ is in $S^n(E)$, then $ab$ is in $S^{m+n}(E)$. Moreover, $S^n(E)$ is generated by elements $v_1 \cdots v_n$ with all $v_j$ in $S^1(E) \cong E$, since $T^n(E)$ is generated by corresponding elements $v_1 \otimes \cdots \otimes v_n$. The defining relations for $S(E)$ make $v_i v_j = v_j v_i$ for $v_i$ and $v_j$ in $S^1(E)$, and it follows that the associative algebra $S(E)$ is commutative.

Proposition 6.23. Let $E$ be a vector space over the field $K$.
(a) Let $\iota$ be the $n$-multilinear function $\iota(v_1, \dots, v_n) = v_1 \cdots v_n$ of $E \times \cdots \times E$ into $S^n(E)$. Then $(S^n(E), \iota)$ has the following universal mapping property: whenever $l$ is any symmetric $n$-multilinear map of $E \times \cdots \times E$ into a vector space $U$, then there exists a unique linear map $L : S^n(E) \to U$ such that the diagram
\[
\begin{array}{ccc}
E \times \cdots \times E & \xrightarrow{\;l\;} & U \\
{\scriptstyle \iota}\Big\downarrow & \nearrow{\scriptstyle L} & \\
S^n(E) & &
\end{array}
\]
commutes.
(b) Let $\iota$ be the one-one linear function that embeds $E$ as $S^1(E) \subseteq S(E)$. Then $(S(E), \iota)$ has the following universal mapping property: whenever $l$ is any linear map of $E$ into a commutative associative algebra $A$ with identity, then there exists a unique algebra homomorphism $L : S(E) \to A$ with $L(1) = 1$ such that the diagram
\[
\begin{array}{ccc}
E & \xrightarrow{\;l\;} & A \\
{\scriptstyle \iota}\Big\downarrow & \nearrow{\scriptstyle L} & \\
S(E) & &
\end{array}
\]
commutes.

PROOF. In both cases uniqueness is trivial. For existence we use the universal mapping properties of $T^n(E)$ and $T(E)$ to produce $\widetilde{L}$ on $T^n(E)$ or $T(E)$. If we can show that $\widetilde{L}$ annihilates the appropriate subspace so as to descend to $S^n(E)$ or $S(E)$, then the resulting map can be taken as $L$, and we are done. For (a), we have $\widetilde{L} : T^n(E) \to U$, and we are to show that $\widetilde{L}(T^n(E) \cap I) = 0$, where $I$ is generated by all $u \otimes v - v \otimes u$ with $u$ and $v$ in $T^1(E)$. A member of $T^n(E) \cap I$ is thus of the form $\sum_i a_i \otimes (u_i \otimes v_i - v_i \otimes u_i) \otimes b_i$ with each term in $T^n(E)$. Each term here is a sum of pure tensors
\[ x_1 \otimes \cdots \otimes x_r \otimes u_i \otimes v_i \otimes y_1 \otimes \cdots \otimes y_s - x_1 \otimes \cdots \otimes x_r \otimes v_i \otimes u_i \otimes y_1 \otimes \cdots \otimes y_s \tag{$*$} \]

8. Symmetric Algebra

283

with r + 2 + s = n. Since l by assumption takes equal values on x1 × · · · × xr × u i × vi × y1 × · · · × ys and

x1 × · · · × xr × vi × u i × y1 × · · · × ys ,

3 L vanishes on (∗), and it follows that 3 L(T n (E) ∩ I ) = 0. For (b) we are to show that 3 L : T (E) → A vanishes on I . Since ker 3 L is an ideal, it is enough to check that 3 L vanishes on the generators of I . But 3 L(u ⊗ v − v ⊗ u) = l(u)l(v) − l(v)l(u) = 0 by the commutativity of A, and thus L(I ) = 0. Corollary 6.24. If E and F are vector spaces over the ﬁeld K, then the vector space HomK (S n (E), F) is canonically isomorphic (via restriction to pure tensors) to the vector space of all F-valued symmetric n-multilinear functions on E × · · · × E. PROOF. Restriction is linear and one-one. It is onto by Proposition 6.23a. Corollary 6.25. If E is a vector space over the ﬁeld K, then the dual (S n (E)) of S n (E) is canonically isomorphic (via restriction to pure tensors) to the vector space of symmetric n-multilinear forms on E × · · · × E. PROOF. This is a special case of Corollary 6.24.
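The correspondence in Proposition 6.23a and Corollary 6.25 can be made concrete in the smallest interesting case. The following is only a sketch under stated assumptions: it takes $E = \mathbb{Q}^2$ with its standard basis $u_0, u_1$, takes $n = 2$, and uses the monomials $u_i u_j$ with $i \le j$ as coordinates on $S^2(E)$ (that these form a basis is the content of Proposition 6.26). The function names `product_coords`, `bilinear`, and `induced_linear` are hypothetical and chosen for this illustration only.

```python
from itertools import combinations_with_replacement

def product_coords(v, w):
    """Coordinates of the product vw in S^2(E) relative to the monomials u_i u_j, i <= j."""
    n = len(v)
    coords = {}
    for i, j in combinations_with_replacement(range(n), 2):
        if i == j:
            coords[(i, j)] = v[i] * w[i]
        else:
            # u_i u_j = u_j u_i in S(E), so both cross terms land on one monomial
            coords[(i, j)] = v[i] * w[j] + v[j] * w[i]
    return coords

def bilinear(A, v, w):
    """The symmetric bilinear map l(v, w) = v^T A w, for a symmetric matrix A."""
    n = len(v)
    return sum(v[i] * A[i][j] * w[j] for i in range(n) for j in range(n))

def induced_linear(A, coords):
    """The linear map L on S^2(E) determined by L(u_i u_j) = l(u_i, u_j) = A[i][j]."""
    return sum(A[i][j] * c for (i, j), c in coords.items())

A = [[1, 2], [2, 5]]      # symmetric matrix, so l is a symmetric bilinear form
v, w = [1, 3], [4, -1]

# L composed with iota agrees with l, as the universal mapping property requires
assert bilinear(A, v, w) == induced_linear(A, product_coords(v, w))
# vw = wv in S^2(E)
assert product_coords(v, w) == product_coords(w, v)
```

If `A` were not symmetric, the first assertion would fail in general, which reflects the fact that only symmetric maps descend from $T^2(E)$ to $S^2(E)$.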

If $\varphi : E \to F$ is a linear map between vector spaces, then we can use Proposition 6.23b to define a corresponding homomorphism $\Phi : S(E) \to S(F)$ of associative algebras with identity. In this way, we can make $E \mapsto S(E)$ into a functor from the category of vector spaces over $\mathbb{K}$ to the category of commutative associative algebras with identity over $\mathbb{K}$. The details appear in Problem 14 at the end of the chapter.

Next we shall identify a basis for $S^n(E)$ as a vector space. The union of such bases as $n$ varies will then be a basis of $S(E)$. Let $\{u_i\}_{i \in A}$ be a basis of $E$, possibly infinite. As noted in Section A5 of the appendix, a simple ordering on the index set $A$ is a partial ordering in which every pair of elements is comparable and in which $a \le b$ and $b \le a$ together imply $a = b$.

Proposition 6.26. Let $E$ be a vector space over the field $\mathbb{K}$, let $\{u_i\}_{i \in A}$ be a basis of $E$, and suppose that a simple ordering has been imposed on the index set $A$. Then the set of all monomials $u_{i_1}^{j_1} \cdots u_{i_k}^{j_k}$ with $i_1 < \cdots < i_k$ and $\sum_m j_m = n$ is a basis of $S^n(E)$.

REMARK. In particular if $E$ is finite-dimensional with $(u_1, \dots, u_N)$ as an ordered basis, then the monomials $u_1^{j_1} \cdots u_N^{j_N}$ of total degree $n$ form a basis of $S^n(E)$.

PROOF. Since $S(E)$ is commutative and since $n$-fold products of elements $\iota(u_i)$ in $T^1(E)$ span $T^n(E)$, the indicated set of monomials spans $S^n(E)$. Let us see that the set is linearly independent. Take any finite subset $F \subseteq A$ of indices. The map $\sum_{i \in A} c_i u_i \mapsto \sum_{i \in F} c_i X_i$ of $E$ into the polynomial algebra $\mathbb{K}[\{X_i\}_{i \in F}]$ is linear into a commutative algebra with identity. Its extension via Proposition 6.23b maps all monomials in the $u_i$ for $i \in F$ into distinct monomials in $\mathbb{K}[\{X_i\}_{i \in F}]$, which are necessarily linearly independent. Hence any finite subset of the monomials in the statement of the proposition is linearly independent, and the whole set must be linearly independent. Therefore our spanning set is a basis.

The proof of Proposition 6.26 shows that $S(E)$ may be identified with polynomials in indeterminates identified with members of $E$ once a basis has been chosen, but this identification depends on the choice of basis. Indeed, if we think of $E$ as specified in advance, then the isomorphism was set up by mapping the set $\{X_i\}_{i \in A}$ to the specified basis of $E$, and the result certainly depended on what basis was used. Nevertheless, if $E$ is finite-dimensional, there is still an isomorphism that is independent of basis; it is between $S(E')$, where $E'$ is the dual of $E$, and a natural basis-free notion of "polynomials" on $E$. We return to this point after one application of Proposition 6.26.

Corollary 6.27. Let $E$ be a finite-dimensional vector space over $\mathbb{K}$ of dimension $N$. Then

(a) $\dim S^n(E) = \binom{n+N-1}{N-1}$ for $0 \le n < \infty$,

(b) $S^n(E')$ is canonically isomorphic to $(S^n(E))'$ in such a way that

\[ (f_1 \cdots f_n)(w_1 \cdots w_n) = \sum_{\tau \in S_n} \prod_{j=1}^{n} f_j(w_{\tau(j)}) \]

for any $f_1, \dots, f_n$ in $E'$ and any $w_1, \dots, w_n$ in $E$, provided $\mathbb{K}$ has characteristic 0; here $S_n$ is the symmetric group on $n$ letters.

PROOF. For (a), a basis has been described in Proposition 6.26. To see its cardinality, we recognize that picking out $N - 1$ objects from $n + N - 1$ to label as dividers is a way of assigning exponents to the $u_j$'s in an ordered basis; thus the cardinality of the indicated basis is $\binom{n+N-1}{N-1}$.
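The divider count in part (a) can be checked by brute force for small cases. The sketch below is illustrative only: it lists the exponent vectors $(j_1, \dots, j_N)$ of the monomials $u_1^{j_1} \cdots u_N^{j_N}$ of total degree $n$ and compares their number with $\binom{n+N-1}{N-1}$. The helper name `monomial_exponents` is an assumption of this example, not notation from the text.

```python
from itertools import combinations_with_replacement
from math import comb

def monomial_exponents(N, n):
    """All exponent vectors (j_1, ..., j_N) of nonnegative integers with sum n."""
    result = []
    # A multiset of n indices from {0, ..., N-1} is the same thing as a monomial of degree n.
    for indices in combinations_with_replacement(range(N), n):
        exps = [0] * N
        for i in indices:
            exps[i] += 1
        result.append(tuple(exps))
    return result

# Compare the brute-force count with the binomial coefficient of Corollary 6.27a.
for N, n in [(1, 5), (2, 3), (3, 2), (4, 4)]:
    assert len(monomial_exponents(N, n)) == comb(n + N - 1, N - 1)
```

For example, `monomial_exponents(3, 2)` lists the six degree-2 monomials in three basis vectors.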

For (b), let $f_1, \dots, f_n$ be in $E'$ and $w_1, \dots, w_n$ be in $E$, and define

\[ l_{f_1, \dots, f_n}(w_1, \dots, w_n) = \sum_{\tau \in S_n} \prod_{j=1}^{n} f_j(w_{\tau(j)}). \]

Then $l_{f_1, \dots, f_n}$ is symmetric $n$-multilinear from $E \times \cdots \times E$ into $\mathbb{K}$ and extends by Proposition 6.23a to a linear $L_{f_1, \dots, f_n} : S^n(E) \to \mathbb{K}$. Thus $l(f_1, \dots, f_n) = L_{f_1, \dots, f_n}$ defines a symmetric $n$-multilinear map of $E' \times \cdots \times E'$ into $(S^n(E))'$. Its linear extension $L$ maps $S^n(E')$ into $(S^n(E))'$.

To complete the proof, we shall show that $L$ carries basis to basis. Let $u_1, \dots, u_N$ be an ordered basis of $E$, and let $u_1', \dots, u_N'$ be the dual basis. Part (a) shows that the elements $(u_1')^{j_1} \cdots (u_N')^{j_N}$ with $\sum_m j_m = n$ form a basis of $S^n(E')$ and that the elements $(u_1)^{k_1} \cdots (u_N)^{k_N}$ with $\sum_m k_m = n$ form a basis of $S^n(E)$. We show that $L$ of the basis of $S^n(E')$ is the dual basis of the basis of $S^n(E)$, except for positive-integer factors. Thus let all of $f_1, \dots, f_{j_1}$ be $u_1'$, let all of $f_{j_1+1}, \dots, f_{j_1+j_2}$ be $u_2'$, and so on. Similarly let all of $w_1, \dots, w_{k_1}$ be $u_1$, let all of $w_{k_1+1}, \dots, w_{k_1+k_2}$ be $u_2$, and so on. Then

\[ \begin{aligned} L\bigl((u_1')^{j_1} \cdots (u_N')^{j_N}\bigr)\bigl((u_1)^{k_1} \cdots (u_N)^{k_N}\bigr) &= L(f_1 \cdots f_n)(w_1 \cdots w_n) \\ &= l(f_1, \dots, f_n)(w_1 \cdots w_n) \\ &= \sum_{\tau \in S_n} \prod_{i=1}^{n} f_i(w_{\tau(i)}). \end{aligned} \]

For given $\tau$, the product on the right side is 0 unless, for each index $i$, an inequality $j_{m-1} + 1 \le i \le j_m$ implies that $k_{m-1} + 1 \le \tau(i) \le k_m$. In this case the product is 1; so the right side counts the number of such $\tau$'s. For given $\tau$, obtaining a nonzero product forces $k_m = j_m$ for all $m$. And when $k_m = j_m$ for all $m$, the choice $\tau = 1$ does lead to product 1. Hence the members of $L$ of the basis are positive-integer multiples of the members of the dual basis, as asserted.

Let us return to the question of introducing a basis-free notion of polynomials on the vector space $E$ under the assumption that $E$ is finite-dimensional. We take a cue from Corollary 4.32, which tells us that the evaluation homomorphism carrying $\mathbb{K}[X_1, \dots, X_n]$ to the algebra of $\mathbb{K}$-valued polynomial functions of $(t_1, \dots, t_n)$ is one-one if $\mathbb{K}$ is an infinite field. We regard the latter as the algebra of polynomial functions on $\mathbb{K}^n$, and we check what happens when we carry the vector space $E$ over to $\mathbb{K}^n$ by fixing a basis. Let $\{x_1, \dots, x_n\}$ be a basis of $E$, and let $\{x_1', \dots, x_n'\}$ be the dual basis of $E'$. If $e = t_1 x_1 + \cdots + t_n x_n$ is the expansion of a member of $E$ in terms of this basis, then we have $x_j'(e) = t_j$. Thus the polynomial functions $t_j$ are given by the members of the dual basis. The vector space of all homogeneous first-degree polynomial functions is the set of linear combinations of the $t_j$'s, and these are given by arbitrary linear functionals on $E$. Thus the vector space of homogeneous first-degree polynomial functions on $E$ is just the dual space $E'$, and this conclusion does not depend on the choice of basis. The algebra of all polynomial functions on $E$ is then the algebra of all $\mathbb{K}$-valued functions on $E$ generated by $E'$ and the constant functions.

This discussion tells us unambiguously what polynomial functions on $E$ are to be, and we want to backtrack to handle abstract polynomials on $E$. Although the evaluation homomorphism from $\mathbb{K}[X_1, \dots, X_n]$ to the algebra of polynomial functions on $\mathbb{K}^n$ may fail to be one-one if $\mathbb{K}$ is a finite field, its restriction to homogeneous first-degree polynomials is one-one. Thus, whatever we might mean by the vector space of homogeneous first-degree polynomials on $E$, the evaluation mapping should exhibit this space as isomorphic to $E'$.

Armed with these clues, we define the polynomial algebra $P(E)$ on $E$ to be the symmetric algebra $S(E')$ if $E$ is finite-dimensional. We need an evaluation mapping for each point $e$ of $E$, and we obtain this from the universal mapping property of symmetric algebras (Proposition 6.23b): With $e$ fixed, we have a linear map $l$ from the vector space $E'$ to the commutative associative algebra $\mathbb{K}$ given by $l(e') = e'(e)$. The universal mapping property gives us a unique algebra homomorphism $L : S(E') \to \mathbb{K}$ that extends $l$ and carries 1 to 1. The algebra homomorphism $L$ is then a multiplicative linear functional on $P(E) = S(E')$ that carries 1 to 1 and agrees with evaluation at $e$ on homogeneous first-degree polynomials. We write this homomorphism as $p \mapsto p(e)$, and we define $P^n(E) = S^n(E')$; this is the vector space of homogeneous $n$th-degree polynomials on $E$. A confirmation that $P(E)$ is indeed to be regarded as the algebra of abstract polynomials on $E$ comes from the following.

Proposition 6.28.
If $E$ is a finite-dimensional vector space over the field $\mathbb{K}$, then the system of evaluation homomorphisms $P(E) \to \mathbb{K}$ on polynomials given by $p \mapsto \{p(e)\}_{e \in E}$ is an algebra homomorphism of $P(E)$ onto the algebra of $\mathbb{K}$-valued polynomial functions on $E$ that carries the identity to the constant function 1, and it is one-one if $\mathbb{K}$ is an infinite field.

PROOF. Certainly $p \mapsto \{p(e)\}_{e \in E}$ is an algebra homomorphism of $P(E)$ into the algebra of $\mathbb{K}$-valued polynomial functions on $E$, and it carries the identity to the constant function 1. We have seen that the image of $P^1(E)$ is exactly $E'$, and hence the image of $P(E)$ is the algebra of $\mathbb{K}$-valued functions on $E$ generated by $E'$ and the constants. This is exactly the algebra of all $\mathbb{K}$-valued polynomial functions, and hence the mapping is onto.

Suppose that $\mathbb{K}$ is infinite. The restriction of $p \mapsto \{p(e)\}_{e \in E}$ to the finite-dimensional subspace $P^n(E)$ of $P(E)$ maps into the finite-dimensional subspace of all polynomial functions on $E$ homogeneous of degree $n$, and this restriction must therefore be onto. We can read off the dimension of the space of all polynomial functions on $E$ homogeneous of degree $n$ from Corollary 4.32 and Corollary 6.27a. This dimension matches the dimension of $P^n(E)$, according to Corollary 6.27a. Since the mapping is onto and the finite dimensions match, the restricted mapping is one-one. Hence $p \mapsto \{p(e)\}_{e \in E}$ is one-one.

We have defined the symmetric algebra $S(E)$ as a quotient of the tensor algebra $T(E)$. Now let us suppose that $\mathbb{K}$ has characteristic 0. With this hypothesis we shall be able to identify an explicit vector subspace of $T(E)$ that maps one-one onto $S(E)$ during the passage to the quotient. This subspace of $T(E)$ can therefore be viewed as a version of $S(E)$ for some purposes. We define an $n$-multilinear function from $E \times \cdots \times E$ into $T^n(E)$ by

\[ (v_1, \dots, v_n) \mapsto \frac{1}{n!} \sum_{\tau \in S_n} v_{\tau(1)} \otimes \cdots \otimes v_{\tau(n)}, \]

and let $\sigma : T^n(E) \to T^n(E)$ be its linear extension. We call $\sigma$ the symmetrizer operator. The image of $\sigma$ in $T(E)$ is denoted by $\widetilde{S}^n(E)$, and the members of this subspace are called symmetrized tensors.

Proposition 6.29. Let the field $\mathbb{K}$ have characteristic 0, and let $E$ be a vector space over $\mathbb{K}$. Then the symmetrizer operator $\sigma$ satisfies $\sigma^2 = \sigma$. The kernel of $\sigma$ on $T^n(E)$ is exactly $T^n(E) \cap I$, and therefore

\[ T^n(E) = \widetilde{S}^n(E) \oplus (T^n(E) \cap I). \]

REMARK. In view of this proposition, the quotient map $T^n(E) \to S^n(E)$ carries $\widetilde{S}^n(E)$ one-one onto $S^n(E)$. Thus $\widetilde{S}^n(E)$ can be viewed as a copy of $S^n(E)$ embedded as a direct summand of $T^n(E)$.

PROOF. We have

\[ \begin{aligned} \sigma^2(v_1 \otimes \cdots \otimes v_n) &= \frac{1}{(n!)^2} \sum_{\rho, \tau \in S_n} v_{\rho\tau(1)} \otimes \cdots \otimes v_{\rho\tau(n)} \\ &= \frac{1}{(n!)^2} \sum_{\rho \in S_n} \; \sum_{\substack{\omega \in S_n, \\ (\omega = \rho\tau)}} v_{\omega(1)} \otimes \cdots \otimes v_{\omega(n)} \\ &= \frac{1}{n!} \sum_{\rho \in S_n} \sigma(v_1 \otimes \cdots \otimes v_n) \\ &= \sigma(v_1 \otimes \cdots \otimes v_n). \end{aligned} \]


Hence $\sigma^2 = \sigma$. Thus $\sigma$ fixes any member of image $\sigma$, and it follows that image $\sigma \cap \ker \sigma = 0$. Consequently $T^n(E)$ is the direct sum of image $\sigma$ and $\ker \sigma$. We are left with identifying $\ker \sigma$ as $T^n(E) \cap I$. The subspace $T^n(E) \cap I$ is spanned by elements

\[ x_1 \otimes \cdots \otimes x_r \otimes u \otimes v \otimes y_1 \otimes \cdots \otimes y_s \;-\; x_1 \otimes \cdots \otimes x_r \otimes v \otimes u \otimes y_1 \otimes \cdots \otimes y_s \]

with $r + 2 + s = n$, and the symmetrizer $\sigma$ certainly vanishes on such elements. Hence $T^n(E) \cap I \subseteq \ker \sigma$. Suppose that the inclusion is strict, say with $t$ in $\ker \sigma$ but $t$ not in $T^n(E) \cap I$. Let $q$ be the quotient map $T^n(E) \to S^n(E)$. The kernel of $q$ is $T^n(E) \cap I$, and thus $q(t) \ne 0$. From Proposition 6.26 the $T(E)$ monomials in basis elements from $E$ with increasing indices map onto a basis of $S(E)$. Since $\mathbb{K}$ has characteristic 0, the symmetrized versions of these monomials map to nonzero multiples of the images of the initial monomials. Consequently $q$ carries $\widetilde{S}^n(E) = \text{image } \sigma$ onto $S^n(E)$. Thus choose $t' \in \widetilde{S}^n(E)$ with $q(t') = q(t)$. Then $t' - t$ is in $\ker q = T^n(E) \cap I \subseteq \ker \sigma$. Since $\sigma(t) = 0$, we see that $\sigma(t') = 0$. Consequently $t'$ is in $\ker \sigma \cap \text{image } \sigma = 0$, and we obtain $t' = 0$ and $q(t) = q(t') = 0$, contradiction.

9. Exterior Algebra

We turn to a discussion of the exterior algebra. Let $\mathbb{K}$ be an arbitrary field, and let $E$ be a vector space over $\mathbb{K}$. The construction, results, and proofs for the exterior algebra $\bigwedge(E)$ are similar to those for the symmetric algebra $S(E)$. The elements of $\bigwedge(E)$ are to be all the alternating tensors (= skew-symmetric if $\mathbb{K}$ has characteristic $\ne 2$), and so we want to force $v \otimes v = 0$. Thus we define the exterior algebra by $\bigwedge(E) = T(E)/I$, where

\[ I = \bigl(\text{two-sided ideal generated by all } v \otimes v \text{ with } v \text{ in } T^1(E)\bigr). \]

Then $\bigwedge(E)$ is an associative algebra with identity.

It is clear that $I$ is homogeneous: $I = \bigoplus_{n=0}^{\infty} (I \cap T^n(E))$. Thus we can write

\[ \textstyle\bigwedge(E) = \displaystyle\bigoplus_{n=0}^{\infty} T^n(E)/(I \cap T^n(E)). \]

We write $\bigwedge^n(E)$ for the $n$th summand on the right side, so that

\[ \textstyle\bigwedge(E) = \displaystyle\bigoplus_{n=0}^{\infty} \textstyle\bigwedge^n(E). \]


Since $I \cap T^1(E) = 0$, the map of $E$ into first-order elements $\bigwedge^1(E)$ is one-one onto. The product operation in $\bigwedge(E)$ is denoted by $\wedge$ rather than $\otimes$, the image in $\bigwedge^n(E)$ of $v_1 \otimes \cdots \otimes v_n$ in $T^n(E)$ being denoted by $v_1 \wedge \cdots \wedge v_n$. If $a$ is in $\bigwedge^m(E)$ and $b$ is in $\bigwedge^n(E)$, then $a \wedge b$ is in $\bigwedge^{m+n}(E)$. Moreover, $\bigwedge^n(E)$ is generated by elements $v_1 \wedge \cdots \wedge v_n$ with all $v_j$ in $\bigwedge^1(E) \cong E$, since $T^n(E)$ is generated by corresponding elements $v_1 \otimes \cdots \otimes v_n$. The defining relations for $\bigwedge(E)$ make $v_i \wedge v_j = -v_j \wedge v_i$ for $v_i$ and $v_j$ in $\bigwedge^1(E)$, and it follows that

\[ a \wedge b = (-1)^{mn}\, b \wedge a \qquad \text{for } a \in \textstyle\bigwedge^m(E) \text{ and } b \in \textstyle\bigwedge^n(E). \]

Proposition 6.30. Let $E$ be a vector space over the field $\mathbb{K}$.

(a) Let $\iota$ be the $n$-multilinear function $\iota(v_1, \dots, v_n) = v_1 \wedge \cdots \wedge v_n$ of $E \times \cdots \times E$ into $\bigwedge^n(E)$. Then $(\bigwedge^n(E), \iota)$ has the following universal mapping property: whenever $l$ is any alternating $n$-multilinear map of $E \times \cdots \times E$ into a vector space $U$, then there exists a unique linear map $L : \bigwedge^n(E) \to U$ such that the diagram

\[ \begin{array}{ccc} E \times \cdots \times E & \xrightarrow{\;\; l \;\;} & U \\ {\scriptstyle \iota} \big\downarrow & \nearrow {\scriptstyle L} & \\ \bigwedge^n(E) & & \end{array} \]

commutes.

(b) Let $\iota$ be the function that embeds $E$ as $\bigwedge^1(E) \subseteq \bigwedge(E)$. Then $(\bigwedge(E), \iota)$ has the following universal mapping property: whenever $l$ is any linear map of $E$ into an associative algebra $A$ with identity such that $l(v)^2 = 0$ for all $v \in E$, then there exists a unique algebra homomorphism $L : \bigwedge(E) \to A$ with $L(1) = 1$ such that the diagram

\[ \begin{array}{ccc} E & \xrightarrow{\;\; l \;\;} & A \\ {\scriptstyle \iota} \big\downarrow & \nearrow {\scriptstyle L} & \\ \bigwedge(E) & & \end{array} \]

commutes.

PROOF. The proof is completely analogous to the proof of Proposition 6.23.

Corollary 6.31. If $E$ and $F$ are vector spaces over the field $\mathbb{K}$, then the vector space $\operatorname{Hom}_{\mathbb{K}}(\bigwedge^n(E), F)$ is canonically isomorphic (via restriction to pure tensors) to the vector space of all $F$-valued alternating $n$-multilinear functions on $E \times \cdots \times E$.

PROOF. Restriction is linear and one-one. It is onto by Proposition 6.30a.


Corollary 6.32. If $E$ is a vector space over the field $\mathbb{K}$, then the dual $(\bigwedge^n(E))'$ of $\bigwedge^n(E)$ is canonically isomorphic (via restriction to pure tensors) to the vector space of alternating $n$-multilinear forms on $E \times \cdots \times E$.

PROOF. This is a special case of Corollary 6.31.
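A standard source of alternating $n$-multilinear forms, and the one the text itself uses in the proof of Proposition 6.33 and in Corollary 6.34b, is a determinant of pairings $\det\{f_i(w_j)\}$. The sketch below is only an illustration under assumptions: vectors and linear functionals on $\mathbb{Q}^N$ are both given as coordinate lists, $f(w)$ is the dot product of the two lists, and the determinant is computed by the permutation expansion. The names `perm_sign`, `det`, and `pairing` are hypothetical.

```python
from itertools import permutations

def perm_sign(p):
    """Sign of a permutation given as a tuple, via its inversion count."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det(M):
    """Determinant by the permutation expansion (fine for small matrices)."""
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        term = perm_sign(p)
        for i in range(n):
            term *= M[i][p[i]]
        total += term
    return total

def pairing(f, w):
    """The form l(w_1, ..., w_n) = det{ f_i(w_j) } for functionals f_i and vectors w_j."""
    return det([[sum(a * b for a, b in zip(fi, wj)) for wj in w] for fi in f])

f = [[1, 0, 2], [0, 1, 1]]        # two functionals on Q^3
w1, w2 = [1, 2, 3], [4, 5, 6]     # two vectors in Q^3

assert pairing(f, [w1, w2]) == -pairing(f, [w2, w1])   # interchanging arguments changes the sign
assert pairing(f, [w1, w1]) == 0                        # a repeated argument gives 0
```

The second assertion is the alternating condition itself; the first is the skew symmetry that follows from it.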

If $\varphi : E \to F$ is a linear map between vector spaces, then we can use Proposition 6.30b to define a corresponding homomorphism $\Phi : \bigwedge(E) \to \bigwedge(F)$ of associative algebras with identity. In this way, we can make $E \mapsto \bigwedge(E)$ into a functor from the category of vector spaces over $\mathbb{K}$ to the category of associative algebras with identity over $\mathbb{K}$. We omit the details, which are similar to those for symmetric tensors.

Next we shall identify a basis for $\bigwedge^n(E)$ as a vector space. The union of such bases as $n$ varies will then be a basis of $\bigwedge(E)$.

Proposition 6.33. Let $E$ be a vector space over the field $\mathbb{K}$, let $\{u_i\}_{i \in A}$ be a basis of $E$, and suppose that a simple ordering has been imposed on the index set $A$. Then the set of all monomials $u_{i_1} \wedge \cdots \wedge u_{i_n}$ with $i_1 < \cdots < i_n$ is a basis of $\bigwedge^n(E)$.

PROOF. Since multiplication in $\bigwedge(E)$ satisfies $a \wedge b = (-1)^{mn}\, b \wedge a$ for $a \in \bigwedge^m(E)$ and $b \in \bigwedge^n(E)$, since $v \wedge v = 0$ for $v$ in $\bigwedge^1(E)$, and since monomials span $T^n(E)$, the indicated set spans $\bigwedge^n(E)$. Let us see that the set is linearly independent. For $i \in A$, let $u_i'$ be the member of $E'$ with $u_i'(u_j)$ equal to 1 for $j = i$ and equal to 0 for $j \ne i$. Fix $r_1 < \cdots < r_n$, and define

\[ l(w_1, \dots, w_n) = \det\{u_{r_i}'(w_j)\} \qquad \text{for } w_1, \dots, w_n \text{ in } E. \]

Then $l$ is alternating $n$-multilinear from $E \times \cdots \times E$ into $\mathbb{K}$ and extends by Proposition 6.30a to $L : \bigwedge^n(E) \to \mathbb{K}$. If $k_1 < \cdots < k_n$, then

\[ L(u_{k_1} \wedge \cdots \wedge u_{k_n}) = l(u_{k_1}, \dots, u_{k_n}) = \det\{u_{r_i}'(u_{k_j})\}, \]

and the right side is 0 unless $r_1 = k_1, \dots, r_n = k_n$, in which case it is 1. This proves that the $u_{r_1} \wedge \cdots \wedge u_{r_n}$ are linearly independent in $\bigwedge^n(E)$.

Corollary 6.34. Let $E$ be a finite-dimensional vector space over $\mathbb{K}$ of dimension $N$. Then

(a) $\dim \bigwedge^n(E) = \binom{N}{n}$ for $0 \le n \le N$ and $= 0$ for $n > N$,

(b) $\bigwedge^n(E')$ is canonically isomorphic to $(\bigwedge^n(E))'$ by

\[ (f_1 \wedge \cdots \wedge f_n)(w_1, \dots, w_n) = \det\{f_i(w_j)\}. \]


PROOF. Part (a) is an immediate consequence of Proposition 6.33, and (b) is proved in the same way as Corollary 6.27b, using Proposition 6.30a as a tool. The "positive-integer multiples" that arise in the proof of Corollary 6.27b are all 1 in the current proof, and hence no restriction on the characteristic of $\mathbb{K}$ is needed.

Now let us suppose that $\mathbb{K}$ has characteristic 0. We define an $n$-multilinear function from $E \times \cdots \times E$ into $T^n(E)$ by

\[ (v_1, \dots, v_n) \mapsto \frac{1}{n!} \sum_{\tau \in S_n} (\operatorname{sgn} \tau)\, v_{\tau(1)} \otimes \cdots \otimes v_{\tau(n)}, \]

and let $\sigma : T^n(E) \to T^n(E)$ be its linear extension. We call $\sigma$ the antisymmetrizer operator. The image of $\sigma$ in $T(E)$ is denoted by $\widetilde{\bigwedge}^n(E)$, and the members of this subspace are called antisymmetrized tensors.

Proposition 6.35. Let the field $\mathbb{K}$ have characteristic 0, and let $E$ be a vector space over $\mathbb{K}$. Then the antisymmetrizer operator $\sigma$ satisfies $\sigma^2 = \sigma$. The kernel of $\sigma$ on $T^n(E)$ is exactly $T^n(E) \cap I$, and therefore

\[ T^n(E) = \widetilde{\textstyle\bigwedge}^n(E) \oplus (T^n(E) \cap I). \]

REMARK. In view of this proposition, the quotient map $T^n(E) \to \bigwedge^n(E)$ carries $\widetilde{\bigwedge}^n(E)$ one-one onto $\bigwedge^n(E)$. Thus $\widetilde{\bigwedge}^n(E)$ can be viewed as a copy of $\bigwedge^n(E)$ embedded as a direct summand of $T^n(E)$.

PROOF. We have

\[ \begin{aligned} \sigma^2(v_1 \otimes \cdots \otimes v_n) &= \frac{1}{(n!)^2} \sum_{\rho, \tau \in S_n} (\operatorname{sgn} \rho\tau)\, v_{\rho\tau(1)} \otimes \cdots \otimes v_{\rho\tau(n)} \\ &= \frac{1}{(n!)^2} \sum_{\rho \in S_n} \; \sum_{\substack{\omega \in S_n, \\ (\omega = \rho\tau)}} (\operatorname{sgn} \omega)\, v_{\omega(1)} \otimes \cdots \otimes v_{\omega(n)} \\ &= \frac{1}{n!} \sum_{\rho \in S_n} \sigma(v_1 \otimes \cdots \otimes v_n) \\ &= \sigma(v_1 \otimes \cdots \otimes v_n). \end{aligned} \]

Hence $\sigma^2 = \sigma$. Consequently $T^n(E)$ is the direct sum of image $\sigma$ and $\ker \sigma$, and we are left with identifying $\ker \sigma$ as $T^n(E) \cap I$. The subspace $T^n(E) \cap I$ is spanned by elements

\[ x_1 \otimes \cdots \otimes x_r \otimes v \otimes v \otimes y_1 \otimes \cdots \otimes y_s \]

with $r + 2 + s = n$, and the antisymmetrizer $\sigma$ certainly vanishes on such elements. Hence $T^n(E) \cap I \subseteq \ker \sigma$. Suppose that the inclusion is strict, say with $t$ in $\ker \sigma$ but $t$ not in $T^n(E) \cap I$. Let $q$ be the quotient map $T^n(E) \to \bigwedge^n(E)$. The kernel of $q$ is $T^n(E) \cap I$, and thus $q(t) \ne 0$. From Proposition 6.33 the $T(E)$ monomials with strictly increasing indices map onto a basis of $\bigwedge(E)$. Since $\mathbb{K}$ has characteristic 0, the antisymmetrized versions of these monomials map to nonzero multiples of the images of the initial monomials. Consequently $q$ carries $\widetilde{\bigwedge}^n(E) = \text{image } \sigma$ onto $\bigwedge^n(E)$. Thus choose $t' \in \widetilde{\bigwedge}^n(E)$ with $q(t') = q(t)$. Then $t' - t$ is in $\ker q = T^n(E) \cap I \subseteq \ker \sigma$. Since $\sigma(t) = 0$, we see that $\sigma(t') = 0$. Consequently $t'$ is in $\ker \sigma \cap \text{image } \sigma = 0$, and we obtain $t' = 0$ and $q(t) = q(t') = 0$, contradiction.
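Propositions 6.29 and 6.35 can be checked numerically in small cases. The sketch below makes several assumptions that are not in the text: it takes $E = \mathbb{Q}^N$, uses the pure tensors $e_{i_1} \otimes \cdots \otimes e_{i_n}$ of standard basis vectors as a basis of $T^n(E)$, and relies on the general fact that the trace of an idempotent matrix over a field of characteristic 0 equals its rank. With those assumptions, $\sigma^2 = \sigma$ can be tested directly, and the dimension of image $\sigma$ can be compared with Corollaries 6.27a and 6.34a. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product
from math import comb, factorial

def perm_sign(p):
    """Sign of a permutation given as a tuple, via its inversion count."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def averaging_operator(N, n, alternating):
    """Matrix of the symmetrizer (alternating=False) or antisymmetrizer (alternating=True)
    on T^n(Q^N), in the basis of pure tensors of standard basis vectors."""
    basis = list(product(range(N), repeat=n))
    index = {b: k for k, b in enumerate(basis)}
    M = [[Fraction(0)] * len(basis) for _ in basis]
    for b in basis:
        for p in permutations(range(n)):
            image = tuple(b[p[k]] for k in range(n))      # v_{tau(1)} (x) ... (x) v_{tau(n)}
            coeff = Fraction(perm_sign(p) if alternating else 1, factorial(n))
            M[index[image]][index[b]] += coeff
    return M

def matmul(A, B):
    """Plain matrix product for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

N, n = 3, 2
sym = averaging_operator(N, n, alternating=False)
alt = averaging_operator(N, n, alternating=True)

assert matmul(sym, sym) == sym and matmul(alt, alt) == alt   # sigma^2 = sigma in both cases
assert trace(sym) == comb(n + N - 1, N - 1)                  # dim of the symmetrized tensors
assert trace(alt) == comb(N, n)                              # dim of the antisymmetrized tensors
```

Exact rational arithmetic with `Fraction` is used so that the factor $1/n!$ introduces no rounding; this mirrors the role of the characteristic-0 hypothesis, under which $n!$ is invertible.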

10. Problems

1. Let $V$ be a vector space over a field $\mathbb{K}$, and let $\langle \,\cdot\,, \cdot\, \rangle$ be a nondegenerate bilinear form on $V$.
(a) Prove that every member $v'$ of $V'$ is of the form $v'(w) = \langle v, w \rangle$ for one and only one member $v$ of $V$.
(b) Suppose that $(\,\cdot\,, \cdot\,)$ is another bilinear form on $V$. Prove that there is some linear function $L : V \to V$ such that $(v, w) = \langle L(v), w \rangle$ for all $v$ and $w$ in $V$.

2. The matrix $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ with entries in $\mathbb{F}_2$ is symmetric. Prove that there is no nonsingular $M$ with $M^t A M$ diagonal.

3. This problem shows that one possible generalization of Sylvester's Law to other fields is not valid. Over the field $\mathbb{F}_3$, show that there is a nonsingular matrix $M$ such that $\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = M^t \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} M$. Conclude that the number of squares in $\mathbb{K}^\times$ among the diagonal entries of the diagonal form in Theorem 6.5 is not an invariant of the symmetric matrix.

4. Let $V$ be a complex $n$-dimensional vector space, let $(\,\cdot\,, \cdot\,)$ be a Hermitian form on $V$, let $V_{\mathbb{R}}$ be the $2n$-dimensional real vector space obtained from $V$ by restricting scalar multiplication to real scalars, and define $\langle \,\cdot\,, \cdot\, \rangle = \operatorname{Im}(\,\cdot\,, \cdot\,)$. Prove that
(a) $\langle \,\cdot\,, \cdot\, \rangle$ is an alternating bilinear form on $V_{\mathbb{R}}$,
(b) $\langle J(v_1), J(v_2) \rangle = \langle v_1, v_2 \rangle$ for all $v_1$ and $v_2$ if $J : V_{\mathbb{R}} \to V_{\mathbb{R}}$ is what multiplication by $i$ becomes when viewed as a linear map from $V_{\mathbb{R}}$ to itself,
(c) $\langle \,\cdot\,, \cdot\, \rangle$ is nondegenerate on $V_{\mathbb{R}}$ if and only if $(\,\cdot\,, \cdot\,)$ is nondegenerate on $V$.

5. Let $W$ be a $2n$-dimensional real vector space, and let $\langle \,\cdot\,, \cdot\, \rangle$ be a nondegenerate alternating bilinear form on $W$. Suppose that $J : W \to W$ is a linear map such that $J^2 = -I$ and $\langle J(w_1), J(w_2) \rangle = \langle w_1, w_2 \rangle$ for all $w_1$ and $w_2$ in $W$. Prove that $W$ equals $V_{\mathbb{R}}$ for some $n$-dimensional complex vector space $V$ possessing a Hermitian form whose imaginary part is $\langle \,\cdot\,, \cdot\, \rangle$.

6. This problem sharpens the result of Theorem 6.7 in the nondegenerate case. Let $\langle \,\cdot\,, \cdot\, \rangle$ be a nondegenerate alternating bilinear form on a $2n$-dimensional vector space $V$ over $\mathbb{K}$. A vector subspace $S$ of $V$ is called an isotropic subspace if $\langle u, v \rangle = 0$ for all $u$ and $v$ in $S$. Prove that
(a) any isotropic subspace of $V$ that is maximal under inclusion has dimension $n$,
(b) for any maximal isotropic subspace $S_1$, there exists a second maximal isotropic subspace $S_2$ such that $S_1 \cap S_2 = 0$,
(c) if $S_1$ and $S_2$ are maximal isotropic subspaces of $V$ such that $S_1 \cap S_2 = 0$, then the linear map $S_2 \to S_1'$ given by $s_2 \mapsto \langle \,\cdot\,, s_2 \rangle\big|_{S_1}$ is an isomorphism of $S_2$ onto the dual space $S_1'$,
(d) if $S_1$ and $S_2$ are maximal isotropic subspaces of $V$ such that $S_1 \cap S_2 = 0$, then there exist bases $\{p_1, \dots, p_n\}$ of $S_1$ and $\{q_1, \dots, q_n\}$ of $S_2$ such that $\langle p_i, p_j \rangle = \langle q_i, q_j \rangle = 0$ and $\langle p_i, q_j \rangle = \delta_{ij}$ for all $i$ and $j$. (The resulting basis $\{p_1, \dots, p_n, q_1, \dots, q_n\}$ of $V$ is called a Weyl basis of $V$.)

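7. Let $S$ be a nonempty set, and let $\mathbb{K}$ be a field. For $s$ in $S$, let $U_s$ and $V_s$ be vector spaces over $\mathbb{K}$, and let $U$ and $V$ be two further vector spaces over $\mathbb{K}$.
(a) Prove that $\operatorname{Hom}_{\mathbb{K}}\bigl(\bigoplus_{s \in S} U_s, V\bigr) \cong \prod_{s \in S} \operatorname{Hom}_{\mathbb{K}}(U_s, V)$.
(b) Prove that $\operatorname{Hom}_{\mathbb{K}}\bigl(U, \prod_{s \in S} V_s\bigr) \cong \prod_{s \in S} \operatorname{Hom}_{\mathbb{K}}(U, V_s)$.
(c) Give examples to show that neither isomorphism in (a) and (b) need remain valid if all three direct products are changed to direct sums.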

8. This problem continues Problem 1 at the end of Chapter V, which established a canonical-form theorem for an action of $GL(m, \mathbb{K}) \times GL(n, \mathbb{K})$ on $m$-by-$n$ matrices. For the present problem, the group $GL(n, \mathbb{K})$ acts on $M_n(\mathbb{K})$ by $(g, x) \mapsto g x g^t$.
(a) Verify that this is indeed a group action and that the vector subspaces $A_{nn}(\mathbb{K})$ of alternating matrices and $S_{nn}(\mathbb{K})$ of symmetric matrices are mapped into themselves under the group action.
(b) Prove that two members of $A_{nn}(\mathbb{K})$ lie in the same orbit if and only if they have the same rank, and that the rank must be even. For each even rank $\le n$, find an example of a member of $A_{nn}(\mathbb{K})$ with that rank.
(c) Prove that two members of $S_{nn}(\mathbb{C})$ lie in the same orbit if and only if they have the same rank, and for each rank $\le n$, find an example of a member of $S_{nn}(\mathbb{C})$ with that rank.

9. Let $U$ and $V$ be vector spaces over $\mathbb{K}$, and let $U'$ be the dual of $U$. The bilinear map $(u', v) \mapsto u'(\,\cdot\,)v$ of $U' \times V$ into $\operatorname{Hom}_{\mathbb{K}}(U, V)$ extends to a linear map $T_{UV} : U' \otimes_{\mathbb{K}} V \to \operatorname{Hom}_{\mathbb{K}}(U, V)$.
(a) Prove that $T_{UV}$ is one-one.
(b) Prove that $T_{UV}$ is onto $\operatorname{Hom}_{\mathbb{K}}(U, V)$ if $U$ is finite-dimensional.
(c) Give an example for which $T_{UV}$ is not onto $\operatorname{Hom}_{\mathbb{K}}(U, V)$.
(d) Let $\mathcal{C}$ be the category of all vector spaces over $\mathbb{K}$, and let $\Phi$ and $\Psi$ be the functors from $\mathcal{C} \times \mathcal{C}$ into $\mathcal{C}$ whose effects on objects are $\Phi(U, V) = U' \otimes_{\mathbb{K}} V$ and $\Psi(U, V) = \operatorname{Hom}_{\mathbb{K}}(U, V)$. Prove that the system $\{T_{UV}\}$ is a natural transformation of $\Phi$ into $\Psi$.
(e) In view of (c), can the system $\{T_{UV}\}$ be a natural isomorphism?

10. Let $\mathbb{K} \subseteq \mathbb{L}$ be an inclusion of fields, and let $\mathcal{V}_{\mathbb{K}}$ and $\mathcal{V}_{\mathbb{L}}$ be the categories of vector spaces over $\mathbb{K}$ and $\mathbb{L}$. Section 6 of the text defined extension of scalars as a covariant functor $\Phi(E) = E \otimes_{\mathbb{K}} \mathbb{L}$. Another definition of extension of scalars is $\Psi(E) = \operatorname{Hom}_{\mathbb{K}}(\mathbb{L}, E)$ with $(l\varphi)(l') = \varphi(l l')$. Verify that $\Psi(E)$ is a vector space over $\mathbb{L}$ and that $\Psi$ is a functor.

11. A linear map $L : E \to F$ between finite-dimensional complex vector spaces becomes a linear map $L_{\mathbb{R}} : E_{\mathbb{R}} \to F_{\mathbb{R}}$ when we restrict attention to real scalars. Explain how to express a matrix for $L_{\mathbb{R}}$ in terms of a matrix for $L$.

12. (Kronecker product of matrices) Let $L : E_1 \to E_2$ and $M : F_1 \to F_2$ be linear maps between finite-dimensional vector spaces over $\mathbb{K}$, let $\Gamma_1$ and $\Gamma_2$ be ordered bases of $E_1$ and $E_2$, and let $\Delta_1$ and $\Delta_2$ be ordered bases of $F_1$ and $F_2$. Define matrices $A$ and $B$ by $A = \begin{pmatrix} L \\ \Gamma_2 \Gamma_1 \end{pmatrix}$ and $B = \begin{pmatrix} M \\ \Delta_2 \Delta_1 \end{pmatrix}$. Use $\Gamma_1$, $\Gamma_2$, $\Delta_1$, and $\Delta_2$ to define ordered bases $\Omega_1$ and $\Omega_2$ of $E_1 \otimes_{\mathbb{K}} F_1$ and $E_2 \otimes_{\mathbb{K}} F_2$, and describe how the matrix $C = \begin{pmatrix} L \otimes M \\ \Omega_2 \Omega_1 \end{pmatrix}$ is related to $A$ and $B$.

13. Let $\mathbb{K}$ be a field, and let $E$ be the vector space $\mathbb{K}X \oplus \mathbb{K}Y$. Prove that the subalgebra of $T(E)$ generated by 1, $Y$, and $X^2 + XY + Y^2$ is isomorphic as an algebra with identity to $T(F)$ for some vector space $F$.

Problems 14–17 concern the functors $E \mapsto T(E)$, $E \mapsto S(E)$, and $E \mapsto \bigwedge E$ defined for vector spaces over a field $\mathbb{K}$.

14. If $\varphi : E \to F$ is a linear map between vector spaces over $\mathbb{K}$, Section 8 of the text indicated how to define a corresponding homomorphism $\Phi : S(E) \to S(F)$ of associative algebras with identity over $\mathbb{K}$, using Proposition 6.23b.
(a) Fill in the details of this application of Proposition 6.23b.
(b) Establish the appropriate conditions on mappings that complete the proof that $E \mapsto S(E)$ is a functor.
(c) Verify that $\Phi$ carries $S^n(E)$ linearly into $S^n(F)$ for all integers $n \ge 0$.

15. Suppose that a linear map $\varphi : E \to E$ is given. Let $\Phi : S(E) \to S(E)$ and $\widetilde{\Phi} : T(E) \to T(E)$ be the associated algebra homomorphisms of $S(E)$ into itself and of $T(E)$ into itself, and let $q : T(E) \to S(E)$ be the quotient homomorphism appearing in the definition of $S(E)$. These mappings are related by the equation $\Phi q(x) = q \widetilde{\Phi}(x)$ for $x$ in $T(E)$. Proposition 6.29 shows for each $n \ge 0$ that $T^n(E) = \widetilde{S}^n(E) \oplus (T^n(E) \cap I)$, where $\widetilde{S}^n(E)$ is the image of $T^n(E)$ under the symmetrizer mapping. The remark with the proposition observes that $q$ carries $\widetilde{S}^n(E)$ one-one onto $S^n(E)$. Prove that $\widetilde{\Phi}$ carries $\widetilde{S}^n(E)$ into itself and that $\widetilde{\Phi}\big|_{\widetilde{S}^n(E)}$ matches $\Phi\big|_{S^n(E)}$ in the sense that $q \widetilde{\Phi}(x) = \Phi q(x)$ for all $x$ in $\widetilde{S}^n(E)$.

16. With $E$ finite-dimensional let $\varphi : E \to E$ be a linear mapping, and define $\Phi : \bigwedge E \to \bigwedge E$ to be the corresponding algebra homomorphism of $\bigwedge E$ sending 1 into 1. This $\Phi$ carries each $\bigwedge^n E$ into itself. Prove that $\Phi$ acts as multiplication by the scalar $\det \varphi$ on the 1-dimensional space $\bigwedge^{\dim E}(E)$.

17. Suppose that $G$ is a group, that the vector space $E$ over $\mathbb{K}$ is finite-dimensional, and that $\varphi : G \to GL(E)$ is a representation of $G$ on $E$. The functors $E \mapsto T(E)$, $E \mapsto S(E)$, and $E \mapsto \bigwedge E$ yield, for each $\varphi(g)$, algebra homomorphisms of $T(E)$ into itself, $S(E)$ into itself, and $\bigwedge E$ into itself.
(a) Show that as $g$ varies, the result in each case is a representation of $G$.
(b) Suppose that $E = \mathbb{K}^n$. Give a formula for the representation of $G$ on a member of $P(\mathbb{K}^n) = S((\mathbb{K}^n)')$.

Problems 18–22 concern universal mapping properties. Let $\mathcal{A}$ and $\mathcal{V}$ be two categories, and let $\mathcal{F} : \mathcal{A} \to \mathcal{V}$ be a covariant functor. (In practice, $\mathcal{F}$ tends to be a relatively simple functor, such as one that simply ignores some of the structure of $\mathcal{A}$.) Let $E$ be in $\operatorname{Obj}(\mathcal{V})$. A pair $(S, \iota)$ with $S$ in $\operatorname{Obj}(\mathcal{A})$ and $\iota$ in $\operatorname{Morph}_{\mathcal{V}}(E, \mathcal{F}(S))$ is said to have the universal mapping property relative to $E$ and $\mathcal{F}$ if the following condition is satisfied: whenever $A$ is in $\operatorname{Obj}(\mathcal{A})$ and a member $l$ of $\operatorname{Morph}_{\mathcal{V}}(E, \mathcal{F}(A))$ is given, there exists a unique member $L$ of $\operatorname{Morph}_{\mathcal{A}}(S, A)$ such that $\mathcal{F}(L)\iota = l$.

18. (a) By suitably specializing $\mathcal{A}$, $\mathcal{V}$, $\mathcal{F}$, etc., show that the universal mapping property of the symmetric algebra of a vector space over $\mathbb{K}$ is an instance of what has been described.
(b) How should the answer to (a) be adjusted so as to account for the universal mapping property of the exterior algebra of a vector space over $\mathbb{K}$?
(c) How should the answer to (a) be adjusted so as to account for the universal mapping property of the coproduct of $\{X_j\}_{j \in J}$ in a category $\mathcal{C}$, the universal mapping property being as in Figure 4.12? (Educational note: For the product of $\{X_j\}_{j \in J}$ in $\mathcal{C}$, the above description does not apply directly because the morphisms go the wrong way. Instead, one applies the above description to the opposite categories $\mathcal{A}^{\mathrm{opp}}$ and $\mathcal{V}^{\mathrm{opp}}$, defined as in Problems 78–80 at the end of Chapter IV.)

19. If $(S, \iota)$ and $(S', \iota')$ are two pairs that each have the universal mapping property relative to $E$ and $\mathcal{F}$, prove that $S$ and $S'$ are canonically isomorphic as objects in $\mathcal{A}$. More specifically prove that there exists a unique $L$ in $\operatorname{Morph}_{\mathcal{A}}(S, S')$ such that $\mathcal{F}(L)\iota = \iota'$ and that $L$ is an isomorphism whose inverse $L'$ in $\operatorname{Morph}_{\mathcal{A}}(S', S)$ has $\mathcal{F}(L')\iota' = \iota$.


20. Suppose that the pair $(S, \iota)$ has the universal mapping property relative to $E$ and $\mathcal{F}$. Let $\mathcal{S}$ be the category of sets, and define functors $F : \mathcal{A} \to \mathcal{S}$ and $G : \mathcal{A} \to \mathcal{S}$ by $F(A) = \operatorname{Morph}_{\mathcal{A}}(S, A)$, $F(\varphi)$ equals composition on the left by $\varphi$ for $\varphi \in \operatorname{Morph}_{\mathcal{A}}(A, A')$, $G(A) = \operatorname{Morph}_{\mathcal{V}}(E, \mathcal{F}(A))$, and $G(\varphi)$ equals composition on the left by $\mathcal{F}(\varphi)$. Let $T_A : \operatorname{Morph}_{\mathcal{A}}(S, A) \to \operatorname{Morph}_{\mathcal{V}}(E, \mathcal{F}(A))$ be the one-one onto map given by the universal mapping property. Show that the system $\{T_A\}$ is a natural isomorphism of $F$ into $G$.

21. Suppose that $(S', \iota')$ is a second pair having the universal mapping property relative to $E$ and $\mathcal{F}$. Define $F' : \mathcal{A} \to \mathcal{S}$ by $F'(A) = \operatorname{Morph}_{\mathcal{A}}(S', A)$. Combining the previous problem and Proposition 6.16, obtain a second proof (besides the one in Problem 19) that $S$ and $S'$ are canonically isomorphic.

22. Suppose that for each $E$ in $\operatorname{Obj}(\mathcal{V})$, there is some pair $(S, \iota)$ with the universal mapping property relative to $E$ and $\mathcal{F}$. Fix such a pair $(S, \iota)$ for each $E$, calling it $(S(E), \iota_E)$. Making an appropriate construction for morphisms and carrying out the appropriate verifications, prove that $E \mapsto S(E)$ is a functor.

Problems 23–28 introduce the Pfaffian of a $(2n)$-by-$(2n)$ alternating matrix $X = [x_{ij}]$ with entries in a field $\mathbb{K}$. This is the polynomial in the entries of $X$ with integer coefficients given by

\[ \operatorname{Pfaff}(X) = \sum_{\text{some } \tau\text{'s in } S_{2n}} (\operatorname{sgn} \tau) \prod_{k=1}^{n} x_{\tau(2k-1), \tau(2k)}, \]

where the sum is taken over those permutations $\tau$ such that $\tau(2k-1) < \tau(2k)$ for $1 \le k \le n$ and such that $\tau(1) < \tau(3) < \cdots < \tau(2n-1)$. It will be seen that $\det X$ is the square of this polynomial. Examples of Pfaffians are

\[ \operatorname{Pfaff}\begin{pmatrix} 0 & x \\ -x & 0 \end{pmatrix} = x \qquad\text{and}\qquad \operatorname{Pfaff}\begin{pmatrix} 0 & a & b & c \\ -a & 0 & d & e \\ -b & -d & 0 & f \\ -c & -e & -f & 0 \end{pmatrix} = af - be + cd. \]

The problems in this set will be continued at the end of Chapter VIII.

23. For the matrix $J$ in Section 5, show that $\operatorname{Pfaff}(J) = 1$.

24. In the expansion $\det X = \sum_{\sigma \in S_{2n}} (\operatorname{sgn} \sigma) \prod_{l=1}^{2n} x_{l,\sigma(l)}$, prove that the value of the right side with $X$ as above is not changed if the sum is extended only over those $\sigma$'s whose expansion in terms of disjoint cycles involves only cycles of even length (and in particular no cycles of length 1).

25. Define $\sigma \in S_{2n}$ to be "good" if its expansion in terms of disjoint cycles involves only cycles of even length. If $\sigma$ is good, show that there uniquely exist two disjoint subsets $A$ and $B$ of $n$ elements each in $\{1, \dots, 2n\}$ such that $A$ contains the smallest-numbered index in each cycle and such that $\sigma$ maps each set onto the other.


26. In the notation of the previous problem with $\sigma$ good, let $y(\sigma)$ be the product of the monomials $x_{ab}$ such that $a$ is in $A$ and $b = \sigma(a)$. For each factor $x_{ij}$ of $y(\sigma)$ with $i > j$, replace the factor by $-x_{ji}$. In the resulting product, arrange the factors in order so that their first subscripts are increasing, and denote this expression by $s\,x_{i_1 i_2} x_{i_3 i_4} \cdots x_{i_{2n-1} i_{2n}}$, where $s$ is a sign. Let $\tau$ be the permutation that carries each $r$ to $i_r$, and define $s(\tau)$ to be the sign $s$. Similarly let $z(\sigma)$ be the product of the monomials $x_{ba}$ such that $b$ is in $B$ and $a = \sigma(b)$. For each factor $x_{ij}$ of $z(\sigma)$ with $i > j$, replace the factor by $-x_{ji}$. In the resulting product, arrange the factors in order so that their first subscripts are increasing, and denote this expression by $s'\,x_{j_1 j_2} x_{j_3 j_4} \cdots x_{j_{2n-1} j_{2n}}$, where $s'$ is a sign. Let $\tau'$ be the permutation that carries each $r$ to $j_r$, and define $s'(\tau')$ to be the sign $s'$. Prove, apart from signs, that the $\sigma$th term in the expansion of $\det X$ matches the product of the $\tau$th term of $\operatorname{Pfaff}(X)$ and the $\tau'$th term of $\operatorname{Pfaff}(X)$.

27. In the previous problem, take the signs $s(\tau)$ and $s'(\tau')$ into account and show that the signs of $\sigma$, $\tau$, and $\tau'$ work out so that the $\sigma$th term in the expansion of $\det X$ is the product of the $\tau$th and $\tau'$th terms of $\operatorname{Pfaff}(X)$.

28. Show that every term of the product of $\operatorname{Pfaff}(X)$ with itself is accounted for once and only once by the construction in the previous three problems, and conclude that the alternating matrix $X$ has $\det X = (\operatorname{Pfaff}(X))^2$.

Problems 29–30 concern filtrations and gradings. A vector space $V$ over $\mathbb{K}$ is said to be filtered when an increasing sequence of subspaces $V_0 \subseteq V_1 \subseteq V_2 \subseteq \cdots$ is specified with union $V$. In this case we put $V_{-1} = 0$ by convention. The space $V$ is graded if a sequence of subspaces $V^0, V^1, V^2, \dots$ is specified such that

\[
V = \bigoplus_{n=0}^{\infty} V^n.
\]

When $V$ is graded, there is a natural filtration of $V$ given by $V_n = \bigoplus_{k=0}^{n} V^k$. Examples of graded vector spaces are any tensor algebra $V = T(E)$, symmetric algebra $S(E)$, exterior algebra $\bigwedge(E)$, and polynomial algebra $P(E)$, the $n$th subspace of the grading consisting of those elements that are homogeneous of degree $n$. Any polynomial algebra $\mathbb{K}[X_1, \dots, X_n]$ is another example of a graded vector space, the grading being by total degree.

29. When $V$ is a filtered vector space as in (A.34), the associated graded vector space is $\operatorname{gr} V = \bigoplus_{n=0}^{\infty} V_n/V_{n-1}$. Let $V$ and $V'$ be two filtered vector spaces, and let $\varphi$ be a linear map between them such that $\varphi(V_n) \subseteq V'_n$ for all $n$. Since the restriction of $\varphi$ to $V_n$ carries $V_{n-1}$ into $V'_{n-1}$, this restriction induces a linear map $\operatorname{gr}^n \varphi : (V_n/V_{n-1}) \to (V'_n/V'_{n-1})$. The direct sum of these linear maps is then a linear map $\operatorname{gr} \varphi : \operatorname{gr} V \to \operatorname{gr} V'$ called the associated graded map for $\varphi$. Prove that if $\operatorname{gr} \varphi$ is a vector-space isomorphism, then $\varphi$ is a vector-space isomorphism.


30. Let $A$ be an associative algebra over $\mathbb{K}$ with identity. If $A$ has a filtration $A_0, A_1, \dots$ of vector subspaces with $1 \in A_0$ such that $A_m A_n \subseteq A_{m+n}$ for all $m$ and $n$, then one says that $A$ is a filtered associative algebra; similarly if $A$ is graded as $A = \bigoplus_{n=0}^{\infty} A^n$ in such a way that $A^m A^n \subseteq A^{m+n}$ for all $m$ and $n$, then one says that $A$ is a graded associative algebra. If $A$ is a filtered associative algebra with identity, prove that the graded vector space $\operatorname{gr} A$ acquires a multiplication in a natural way, making it into a graded associative algebra with identity.

Problems 31–35 concern Lie algebras and their universal enveloping algebras. If $\mathbb{K}$ is a field, a Lie algebra $\mathfrak{g}$ over $\mathbb{K}$ is a nonassociative algebra whose product, called the Lie bracket and written $[x, y]$, is alternating as a function of the pair $(x, y)$ and satisfies the Jacobi identity $[x, [y, z]] + [y, [z, x]] + [z, [x, y]] = 0$ for all $x, y, z$ in $\mathfrak{g}$. The universal enveloping algebra $U(\mathfrak{g})$ of $\mathfrak{g}$ is the quotient $T(\mathfrak{g})/I$, where $I$ is the two-sided ideal generated by all elements $x \otimes y - y \otimes x - [x, y]$ with $x$ and $y$ in $T^1(\mathfrak{g})$. The grading for $T(\mathfrak{g})$ makes $U(\mathfrak{g})$ into a filtered associative algebra with identity. The product of $x$ and $y$ in $U(\mathfrak{g})$ is written $xy$.

31. If $A$ is an associative algebra over $\mathbb{K}$, prove that $A$ becomes a Lie algebra if the Lie bracket is defined by $[x, y] = xy - yx$. In particular, observe that $M_n(\mathbb{K})$ becomes a Lie algebra in this way.

32. Fix a matrix $A \in M_n(\mathbb{K})$, and let $\mathfrak{g}$ be the vector subspace of all members $x$ of $M_n(\mathbb{K})$ with $x^t A + A x = 0$.
(a) Prove that $\mathfrak{g}$ is closed under the bracket operation of the previous problem and is therefore a Lie subalgebra of $M_n(\mathbb{K})$.
(b) Deduce as a special case of (a) that the vector space of all skew-symmetric matrices in $M_n(\mathbb{K})$ is a Lie subalgebra of $M_n(\mathbb{K})$.

33. Let $\mathfrak{g}$ be a Lie algebra over $\mathbb{K}$, and let $\iota$ be the linear map obtained as the composition of $\mathfrak{g} \to T^1(\mathfrak{g})$ and the passage to the quotient $U(\mathfrak{g})$.
Prove that $(U(\mathfrak{g}), \iota)$ has the following universal mapping property: whenever $l$ is any linear map of $\mathfrak{g}$ into an associative algebra $A$ with identity satisfying the condition of being a Lie algebra homomorphism, namely $l[x, y] = l(x)l(y) - l(y)l(x)$ for all $x$ and $y$ in $\mathfrak{g}$, then there exists a unique associative algebra homomorphism $L : U(\mathfrak{g}) \to A$ with $L(1) = 1$ such that $L \circ \iota = l$.

34. Let $\mathfrak{g}$ be a Lie algebra over $\mathbb{K}$, let $\{u_i\}_{i \in A}$ be a vector-space basis of $\mathfrak{g}$, and suppose that a simple ordering has been imposed on the index set $A$. Prove that the set of all monomials $u_{i_1}^{j_1} \cdots u_{i_k}^{j_k}$ with $i_1 < \cdots < i_k$ and $\sum_m j_m$ arbitrary is a spanning set for $U(\mathfrak{g})$.

35. For a Lie algebra $\mathfrak{g}$ over $\mathbb{K}$, the Poincaré–Birkhoff–Witt Theorem says that the spanning set for $U(\mathfrak{g})$ in the previous problem is actually a basis. Assuming this theorem, prove that $\operatorname{gr} U(\mathfrak{g})$ is isomorphic as a graded algebra to $S(\mathfrak{g})$.

Problems 36–40 introduce Clifford algebras. Let $\mathbb{K}$ be a field of characteristic $\ne 2$, let $E$ be a finite-dimensional vector space over $\mathbb{K}$, and let $\langle\,\cdot\,,\,\cdot\,\rangle$ be a symmetric bilinear form on $E$. The Clifford algebra $\operatorname{Cliff}(E, \langle\,\cdot\,,\,\cdot\,\rangle)$ is the quotient $T(E)/I$, where $I$ is the two-sided ideal generated by all elements⁵ $v \otimes v + \langle v, v\rangle$ with $v$ in $E$. The grading for $T(E)$ makes $\operatorname{Cliff}(E, \langle\,\cdot\,,\,\cdot\,\rangle)$ into a filtered associative algebra with identity. Products in $\operatorname{Cliff}(E, \langle\,\cdot\,,\,\cdot\,\rangle)$ are written as $ab$ with no special symbol.

36. Let $\iota$ be the composition of the inclusion $E \subseteq T^1(E)$ and the passage to the quotient modulo $I$. Prove that $(\operatorname{Cliff}(E, \langle\,\cdot\,,\,\cdot\,\rangle), \iota)$ has the following universal mapping property: whenever $l$ is any linear map of $E$ into an associative algebra $A$ with identity such that $l(v)^2 = -\langle v, v\rangle 1$ for all $v \in E$, then there exists a unique algebra homomorphism $L : \operatorname{Cliff}(E, \langle\,\cdot\,,\,\cdot\,\rangle) \to A$ with $L(1) = 1$ and such that $L \circ \iota = l$.

37. Let $\{u_1, \dots, u_n\}$ be a basis of $E$. Prove that the $2^n$ elements of $\operatorname{Cliff}(E, \langle\,\cdot\,,\,\cdot\,\rangle)$ given by $u_{i_1} u_{i_2} \cdots u_{i_k}$ with $i_1 < \cdots < i_k$ form a spanning set of $\operatorname{Cliff}(E, \langle\,\cdot\,,\,\cdot\,\rangle)$.

38. Using the Principal Axis Theorem, fix a basis $\{e_1, \dots, e_n\}$ of $E$ such that $\langle e_i, e_j\rangle = d_i \delta_{ij}$ for all $j$. Introduce an algebra $C$ over $\mathbb{K}$ of dimension $2^n$ with generators $e_1, \dots, e_n$ and with a basis parametrized by subsets of $\{1, \dots, n\}$ and given by all elements $e_{i_1} e_{i_2} \cdots e_{i_k}$ with $i_1 < i_2 < \cdots < i_k$, with the multiplication that is implicit in the rules $e_i^2 = -d_i$ and $e_i e_j = -e_j e_i$ if $i \ne j$, namely, to multiply two monomials $e_{i_1} e_{i_2} \cdots e_{i_k}$ and $e_{j_1} e_{j_2} \cdots e_{j_l}$, put them end to end, replace any occurrence of two $e_k$'s by the scalar $-d_k$, and then permute the remaining $e_k$'s until their indices are in increasing order, introducing a minus sign each time two distinct $e_k$'s are interchanged. Prove that the algebra $C$ is associative.

39. Prove that the associative algebra $C$ of the previous problem is isomorphic as an algebra to $\operatorname{Cliff}(E, \langle\,\cdot\,,\,\cdot\,\rangle)$.

40. Prove that $\operatorname{gr} \operatorname{Cliff}(E, \langle\,\cdot\,,\,\cdot\,\rangle)$ is isomorphic as a graded algebra to $\bigwedge(E)$.
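The multiplication rule described in Problem 38 is algorithmic, and it may help to see it mechanized. The sketch below is one possible reading of that rule, not a construction taken from the text: a basis monomial is stored as an increasing tuple of indices, the scalars $d_k$ are passed as a dictionary `d`, and adjacent generators are compared left to right, with an interchange of distinct generators costing a sign and an adjacent equal pair $e_k e_k$ replaced by $-d_k$. The names `mul_monomials` and `is_associative` are hypothetical.

```python
from itertools import combinations


def mul_monomials(I, J, d):
    """Multiply basis monomials e_I and e_J.

    I and J are increasing tuples of indices; d maps each index k to d_k.
    Returns (coefficient, increasing index tuple).
    """
    seq = list(I) + list(J)          # put the monomials end to end
    coeff = 1
    changed = True
    while changed:
        changed = False
        for p in range(len(seq) - 1):
            if seq[p] == seq[p + 1]:
                # e_k e_k is replaced by the scalar -d_k
                coeff *= -d[seq[p]]
                del seq[p:p + 2]
                changed = True
                break
            if seq[p] > seq[p + 1]:
                # interchanging two distinct generators introduces a minus sign
                seq[p], seq[p + 1] = seq[p + 1], seq[p]
                coeff = -coeff
                changed = True
                break
    return coeff, tuple(seq)


def is_associative(n, d):
    """Check (e_I e_J) e_K == e_I (e_J e_K) on all triples of basis monomials."""
    basis = [c for r in range(n + 1) for c in combinations(range(1, n + 1), r)]
    for I in basis:
        for J in basis:
            for K in basis:
                c1, M = mul_monomials(I, J, d)
                c2, left = mul_monomials(M, K, d)
                c3, N = mul_monomials(J, K, d)
                c4, right = mul_monomials(I, N, d)
                if (c1 * c2, left) != (c3 * c4, right):
                    return False
    return True


d = {1: 2, 2: -3, 3: 5}              # sample values of d_1, d_2, d_3
print(mul_monomials((1,), (1,), d))  # e_1 e_1 = -d_1
print(mul_monomials((2,), (1,), d))  # e_2 e_1 = -e_1 e_2
print(is_associative(3, d))
```

A finite check of this kind on basis monomials for one choice of the $d_k$ is only evidence; the problem asks for a proof valid for every $n$ and every choice of scalars.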
Problems 41–48 introduce finite-dimensional Heisenberg Lie algebras and the corresponding Weyl algebras. They make use of Problems 31–35 concerning Lie algebras and universal enveloping algebras. Let $V$ be a finite-dimensional vector space over the field $\mathbb{K}$, and let $\langle\,\cdot\,,\,\cdot\,\rangle$ be a nondegenerate alternating bilinear form on $V \times V$. Write $2n$ for the dimension of $V$. Introduce an indeterminate $X_0$. The Heisenberg Lie algebra $H(V)$ on $V$ is a Lie algebra whose underlying vector space is $\mathbb{K}X_0 \oplus V$ and whose Lie bracket is given by $[(cX_0, u), (dX_0, v)] = \langle u, v\rangle X_0$. Let $U(H(V))$ be its universal enveloping algebra. The Weyl algebra $W(V)$ on $V$ is the quotient of the tensor algebra $T(V)$ by the two-sided ideal generated by all $u \otimes v - v \otimes u - \langle u, v\rangle 1$ with $u$ and $v$ in $V$; as such, it is a filtered associative algebra.

⁵ Some authors factor out the elements $v \otimes v - \langle v, v\rangle$ instead. There is no generally accepted convention.


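41. Verify when the field is $\mathbb{K} = \mathbb{R}$ that an example of a $2n$-dimensional $V$ with its nondegenerate alternating bilinear form $\langle\,\cdot\,,\,\cdot\,\rangle$ is $V = \mathbb{C}^n$ with $\langle u, v\rangle = \operatorname{Im}(u, v)$, where $(\,\cdot\,,\,\cdot\,)$ is the usual inner product on $\mathbb{C}^n$. For this $V$, exhibit a Lie-algebra isomorphism of $H(V)$ with the Lie algebra of all complex $(n+1)$-by-$(n+1)$ matrices of the form

\[
\begin{pmatrix} 0 & \bar{z}^{\,t} & ir \\ 0 & 0 & z \\ 0 & 0 & 0 \end{pmatrix}
\qquad \text{with } z \in \mathbb{C}^n \text{ and } r \in \mathbb{R}.
\]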

42. In the general situation show that the linear map $\iota(cX_0, v) = c1 + v$ is a Lie algebra homomorphism of $H(V)$ into $W(V)$ and that its extension to an associative algebra homomorphism $\widetilde{\iota} : U(H(V)) \to W(V)$ is onto and has kernel equal to the two-sided ideal in $U(H(V))$ generated by $X_0 - 1$.

43. Prove that $W(V)$ has the following universal mapping property: whenever $\varphi : H(V) \to A$ is a Lie algebra homomorphism of $H(V)$ into an associative algebra $A$ with identity such that $\varphi(X_0) = 1$, then there exists a unique associative algebra homomorphism $\widetilde{\varphi}$ of $W(V)$ into $A$ such that $\varphi = \widetilde{\varphi} \circ \iota$.

44. Let $v_1, \dots, v_{2n}$ be any vector space basis of $V$. Prove that the elements $v_1^{k_1} \cdots v_{2n}^{k_{2n}}$ with integer exponents $\ge 0$ span $W(V)$.

45. For $\mathbb{K} = \mathbb{R}$, let $\mathcal{S}$ be the vector space of all real-valued functions $P(x)e^{-\pi|x|^2}$, where $P(x)$ is a polynomial in $n$ real variables. Show that $\mathcal{S}$ is mapped into itself by the linear operators $\partial/\partial x_i$ and $m_j = (\text{multiplication by } x_j)$.

46. With $\mathbb{K} = \mathbb{R}$, let $\{p_1, \dots, p_n, q_1, \dots, q_n\}$ be a Weyl basis of $V$ in the terminology of Problem 6. In the notation of Problem 45, let $\varphi : V \to \operatorname{Hom}_{\mathbb{R}}(\mathcal{S}, \mathcal{S})$ be the linear map given by $\varphi(p_i) = \partial/\partial x_i$ and $\varphi(q_j) = m_j$. Use Problem 43 to extend $\varphi$ to an algebra homomorphism $\widetilde{\varphi} : W(V) \to \operatorname{Hom}_{\mathbb{R}}(\mathcal{S}, \mathcal{S})$ with $\widetilde{\varphi}(1) = 1$, and use Problem 42 to obtain a representation of $H(V)$ on $\mathcal{S}$. Prove that this representation of $H(V)$ is irreducible in the sense that there is no proper nonzero vector subspace carried to itself by all members of $\widetilde{\varphi}(H(V))$.

47. In Problem 46 with $\mathbb{K} = \mathbb{R}$, prove that the associative algebra homomorphism $\widetilde{\varphi} : W(V) \to \operatorname{Hom}_{\mathbb{R}}(\mathcal{S}, \mathcal{S})$ is one-one. Conclude for $\mathbb{K} = \mathbb{R}$ that the elements $v_1^{k_1} \cdots v_{2n}^{k_{2n}}$ of Problem 44 form a vector-space basis of $W(V)$.

48. For $\mathbb{K} = \mathbb{R}$, prove that $\operatorname{gr} W(V)$ is isomorphic as a graded algebra to $S(V)$.

Problems 49–51 deal with Jordan algebras. Let $\mathbb{K}$ be a field of characteristic $\ne 2$. An algebra $J$ over $\mathbb{K}$ with multiplication $a \cdot b$ is called a Jordan algebra if the identities $a \cdot b = b \cdot a$ and $a^2 \cdot (b \cdot a) = (a^2 \cdot b) \cdot a$ are always satisfied; here $a^2$ is an abbreviation for $a \cdot a$.

49. Let $A$ be an associative algebra, and define $a \cdot b = \frac{1}{2}(ab + ba)$. Prove that $A$ becomes a Jordan algebra under this new multiplication.
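To make the two Jordan identities concrete, here is a small sketch that forms the product $a \cdot b = \frac{1}{2}(ab + ba)$ of Problem 49 for square matrices with rational entries and tests both identities on a sample pair. The helper names `matmul` and `jordan` and the sample matrices are illustrative assumptions; exact arithmetic is done with `fractions.Fraction` so that the factor $\frac{1}{2}$ introduces no rounding.

```python
from fractions import Fraction


def matmul(a, b):
    """Ordinary (associative) product of two n-by-n matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def jordan(a, b):
    """Jordan product a . b = (ab + ba) / 2."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[Fraction(ab[i][j] + ba[i][j], 2) for j in range(n)]
            for i in range(n)]


a = [[1, 2], [3, 4]]
b = [[0, 1], [5, -2]]
a2 = jordan(a, a)                      # a^2 in the Jordan sense

print(jordan(a, b) == jordan(b, a))    # commutativity
print(jordan(a2, jordan(b, a)) == jordan(jordan(a2, b), a))  # Jordan identity
```

Both comparisons hold for these matrices, as Problem 49 predicts for any associative algebra over a field of characteristic $\ne 2$; the sketch checks instances and does not replace the proof.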


50. In the situation of the previous problem, suppose that $a \mapsto a^t$ is a one-one linear mapping of $A$ onto itself such that $(ab)^t = b^t a^t$ for all $a$ and $b$. (For example, $a \mapsto a^t$ could be the transpose mapping if $A = M_n(\mathbb{K})$.) Prove that the vector subspace of all $a$ with $a^t = a$ is carried to itself by the Jordan product $a \cdot b$ and hence is a Jordan algebra.

51. Let $V$ be a finite-dimensional vector space over $\mathbb{K}$, and let $\langle\,\cdot\,,\,\cdot\,\rangle$ be a symmetric bilinear form on $V$. Define $A = \mathbb{K}1 \oplus V$ as a vector space, and define a multiplication in $A$ by $(c1, x) \cdot (d1, y) = \big((cd + \langle x, y\rangle)1,\; cy + dx\big)$. Prove that $A$ is a Jordan algebra under this definition of multiplication.

Problems 52–56 deal with the algebra $\mathbb{O}$ of real octonions, sometimes known as the Cayley numbers. This is a certain 8-dimensional nonassociative algebra with identity over $\mathbb{R}$ with an inner product such that $\|ab\| = \|a\|\,\|b\|$ for all $a$ and $b$ and such that the left and right multiplications by any element $a \ne 0$ are always invertible.

52. Let $A$ be an algebra over $\mathbb{R}$. Let $[a, b] = ab - ba$ and $[a, b, c] = (ab)c - a(bc)$.
(a) The 3-multilinear function $(a, b, c) \mapsto [a, b, c]$ from $A \times A \times A$ to $A$ is called the associator in $A$. Observe that it is 0 if and only if $A$ is associative. Show that it is alternating if and only if $A$ always satisfies the limited associativity laws

\[
(aa)b = a(ab), \qquad (ab)a = a(ba), \qquad (ba)a = b(aa).
\]

In this case, $A$ is said to be alternative.
(b) Show that $A$ is alternative if the first and third of the limited associativity laws in (a) are always satisfied.

53. (Cayley–Dickson construction) Suppose that $A$ is an algebra over $\mathbb{R}$ with a two-sided identity 1, and suppose that there is an $\mathbb{R}$ linear function $*$ from $A$ to itself (called "conjugation") such that $1^* = 1$, $a^{**} = a$, and $(ab)^* = b^* a^*$ for all $a$ and $b$ in $A$. Define an algebra $B$ over $\mathbb{R}$ to have the underlying real vector-space structure of $A \oplus A$ and to have multiplication and conjugation given by

\[
(a, b)(c, d) = (ac - db^*,\; a^* d + cb) \qquad \text{and} \qquad (a, b)^* = (a^*, -b).
\]

(a) Prove that $(1, 0)$ is a two-sided identity in $B$ and that the operation $*$ in $B$ satisfies the required properties of a conjugation.
(b) Prove that if $a^* = a$ for all $a \in A$, then $A$ is commutative.
(c) Prove that if $a^* = a$ for all $a \in A$, then $B$ is commutative.
(d) Prove that if $A$ is commutative and associative, then $B$ is associative.
(e) Verify the following outcomes of the above construction $A \to B$:
(i) $A = \mathbb{R}$ yields $B = \mathbb{C}$,
(ii) $A = \mathbb{C}$ yields $B = \mathbb{H}$, the algebra of quaternions.
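The doubling formulas of Problem 53 can be experimented with directly. In the following sketch, which is an illustration under stated assumptions rather than part of the text, a real number is a Python number and an element of a doubled algebra is a pair `(a, b)` of elements of the previous algebra, so that quaternions are nested pairs of depth 2 and octonions of depth 3. The names `conj`, `neg`, `add`, `sub`, `mul`, `build`, and `norm2` are hypothetical helpers.

```python
def neg(x):
    """Negative of an element."""
    if isinstance(x, tuple):
        return (neg(x[0]), neg(x[1]))
    return -x


def add(x, y):
    """Sum of two elements of the same algebra."""
    if isinstance(x, tuple):
        return (add(x[0], y[0]), add(x[1], y[1]))
    return x + y


def sub(x, y):
    return add(x, neg(y))


def conj(x):
    """Conjugation: (a, b)* = (a*, -b), and r* = r for real r."""
    if isinstance(x, tuple):
        return (conj(x[0]), neg(x[1]))
    return x


def mul(x, y):
    """Product: (a, b)(c, d) = (ac - d b*, a* d + c b)."""
    if isinstance(x, tuple):
        a, b = x
        c, d = y
        return (sub(mul(a, c), mul(d, conj(b))),
                add(mul(conj(a), d), mul(c, b)))
    return x * y


def build(coeffs):
    """Nest a flat list of 2^k real coordinates into iterated pairs."""
    if len(coeffs) == 1:
        return coeffs[0]
    h = len(coeffs) // 2
    return (build(coeffs[:h]), build(coeffs[h:]))


def norm2(x):
    """Squared norm, read off as the real coordinate of x x*."""
    p = mul(x, conj(x))
    while isinstance(p, tuple):
        p = p[0]
    return p


# Quaternions: i and j are coordinate vectors, and k is defined as the product ij.
one = build([1, 0, 0, 0])
i, j = build([0, 1, 0, 0]), build([0, 0, 1, 0])
k = mul(i, j)
print(mul(i, i) == neg(one), mul(j, k) == i, mul(k, i) == j)

# Octonions: sample elements with integer coordinates.
x = build([1, 2, 3, 4, 5, 6, 7, 8])
y = build([2, -1, 0, 3, 1, -2, 4, 5])
print(mul(mul(x, x), y) == mul(x, mul(x, y)))      # (xx)y = x(xy)
print(norm2(mul(x, y)) == norm2(x) * norm2(y))     # norm of a product
```

Defining `k` as the product `ij` avoids committing to a particular identification of pairs with the usual basis of $\mathbb{H}$. The same functions apply unchanged at depth 4, which is one way to explore the question raised in Problem 56.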


54. Suppose that A is an algebra over R with an identity and a conjugation as in the previous problem. Say that A is nicely normed if

(i) $a + a^*$ is always of the form $r1$ with $r$ real and
(ii) $aa^*$ always equals $a^*a$ and for $a \ne 0$, is of the form $r1$ with $r$ real and positive.
(a) Prove that if $A$ is nicely normed, then so is the algebra $B$ of the previous problem.
(b) Prove that if $A$ is nicely normed, then $(a, b) = \frac{1}{2}(ab^* + ba^*)$ is an inner product on $A$ with norm $\|a\| = (aa^*)^{1/2} = (a^*a)^{1/2}$.
(c) Prove that if $A$ is associative and nicely normed, then the algebra $B$ of the previous problem is alternative.

55. Starting from the real algebra $A = \mathbb{H}$, apply the construction of Problem 53, and let the resulting 8-dimensional real algebra be denoted by $\mathbb{O}$, the algebra of octonions.
(a) Prove that $\mathbb{O}$ is an alternative algebra and is nicely normed.
(b) Prove that $(xx^*)y = x(x^*y)$ and $x(yy^*) = (xy)y^*$ within $\mathbb{O}$.
(c) Prove that $\|ab\|^2 a = \|a\|^2 \|b\|^2 a$ within $\mathbb{O}$.
(d) Conclude from (c) that the operations of left and right multiplication by any $a \ne 0$ within $\mathbb{O}$ are invertible.
(e) Show that the inverse operators are left and right multiplication by $\|a\|^{-2} a^*$.
(f) Denote the usual basis vectors of $\mathbb{H}$ by $1, i, j, k$. Write down a multiplication table for the eight basis vectors of $\mathbb{O}$ given by $(x, 0)$ and $(0, y)$ as $x$ and $y$ run through the basis vectors of $\mathbb{H}$.

56. What prevents the construction of Problem 53, when applied with $A = \mathbb{O}$, from yielding a 16-dimensional algebra $B$ in which $\|ab\|^2 = \|a\|^2 \|b\|^2$ and therefore in which the operations of left and right multiplication by any $a \ne 0$ within $B$ are invertible?

CHAPTER VII Advanced Group Theory

Abstract. This chapter continues the development of group theory begun in Chapter IV, the main topics being the use of generators and relations, representation theory for finite groups, and group extensions. Representation theory uses linear algebra and inner-product spaces in an essential way, and a structure-theory theorem for finite groups is obtained as a consequence. Group extensions introduce the subject of cohomology of groups.

Sections 1–3 concern generators and relations. The context for generators and relations is that of a free group on the set of generators, and the relations indicate passage to a quotient of this free group by a normal subgroup. Section 1 constructs free groups in terms of words built from an alphabet and shows that free groups are characterized by a certain universal mapping property. This universal mapping property implies that any group may be defined by generators and relations. Computations with free groups are aided by the fact that two reduced words yield the same element of a free group if and only if the reduced words are identical. Section 2 obtains the Nielsen–Schreier Theorem that subgroups of free groups are free. Section 3 enlarges the construction of free groups to the notion of the free product of an arbitrary set of groups. Free product is what coproduct is for the category of groups; free groups themselves may be regarded as free products of copies of the integers.

Sections 4–5 introduce representation theory for finite groups and give an example of an important application whose statement lies outside representation theory. Section 4 contains various results giving an analysis of the space $C(G, \mathbb{C})$ of all complex-valued functions on a finite group $G$. In this analysis those functions that are constant on conjugacy classes are shown to be linear combinations of the characters of the irreducible representations. Section 5 proves Burnside's Theorem as an application of this theory, namely that any finite group of order $p^a q^b$ with $p$ and $q$ prime and with $a + b > 1$ has a nontrivial normal subgroup.

Section 6 introduces cohomology of groups in connection with group extensions. If $N$ is to be a normal subgroup of $G$ and $Q$ is to be isomorphic to $G/N$, the first question is to parametrize the possibilities for $G$ up to isomorphism. A second question is to parametrize the possibilities for $G$ if $G$ is to be a semidirect product of $N$ and $Q$.

1. Free Groups

This section and the next two introduce some group-theoretic notions that in principle apply to all groups but in practice are used with countable groups, often countably infinite groups that are nonabelian. The material is especially useful in applications in topology, particularly in connection with fundamental groups and covering spaces. But the formal development here will be completely algebraic, not making use of any definitions or theorems from topology.


In the case of abelian groups, every abelian group $G$ is a quotient of a suitable free abelian group, i.e., a suitable direct sum of copies of the additive group $\mathbb{Z}$ of integers.¹ Recall the discussion of Section IV.9: We introduce a copy $\mathbb{Z}_g$ of $\mathbb{Z}$ for each $g$ in $G$, define $\widetilde{G} = \bigoplus_{g \in G} \mathbb{Z}_g$, let $i_g : \mathbb{Z}_g \to \widetilde{G}$ be the standard embedding, and let $\varphi_g : \mathbb{Z}_g \to G$ be the group homomorphism written additively as $\varphi_g(n) = ng$. The universal mapping property of direct sums that was stated as Proposition 4.17 produces a unique group homomorphism $\varphi : \widetilde{G} \to G$ such that $\varphi \circ i_g = \varphi_g$ for all $g$, and $\varphi$ is the required homomorphism of a free abelian group onto $G$.

The goal in this section is to carry out an analogous construction for groups that are not necessarily abelian. The constructed groups, to be called "free groups," are to be rather concrete, and the family of all of them is to have the property that every group is the quotient of some member of the family.

If $S$ is any set, we construct a "free group $F(S)$ on the set $S$." Let us speak of $S$ as a set of "symbols" or as the members of an "alphabet," possibly infinite, with which we are working. If $S$ is empty, the group $F(S)$ is taken to be the one-element trivial group, and we shall therefore now assume that $S$ is not empty. If $a$ is a symbol in $S$, we introduce a new symbol $a^{-1}$ corresponding to it, and we let $S^{-1}$ denote the set of all such symbols $a^{-1}$ for $a \in S$. Define $S' = S \cup S^{-1}$. A word is a finite string of symbols from $S'$, i.e., an ordered $n$-tuple for some $n$ of members of $S'$ with repetitions allowed. Words that are $n$-tuples are said to have length $n$. The empty word, with length 0, will be denoted by 1. Other words are usually written with the symbols juxtaposed and all commas omitted, as in $abca^{-1}cb^{-1}$. The set of words will be denoted by $W(S')$. We introduce a multiplication $W(S') \times W(S') \to W(S')$ by writing end-to-end the words that are to be multiplied: $(abca^{-1}, cb^{-1}) \mapsto abca^{-1}cb^{-1}$.
The length of a product is the sum of the lengths of the factors. It is plain that this multiplication is associative and that 1 is a two-sided identity. It is not a group operation, however, since most elements of $W(S')$ do not have inverses: multiplication never decreases length, and thus the only way that 1 can be a product of two elements is as the product $1 \cdot 1$.

To obtain a group from $W(S')$, we shall introduce an equivalence relation in $W(S')$. Two words are said to be equivalent if one of the words can be obtained from the other by a finite succession of insertions and deletions of expressions $aa^{-1}$ or $a^{-1}a$ within the word; here $a$ is assumed to be an element of $S$. It will be convenient to refer to the pairs $aa^{-1}$ and $a^{-1}a$ together; therefore when $b = a^{-1}$ is in $S^{-1}$, let us define $b^{-1} = (a^{-1})^{-1}$ to be $a$. Then two words are equivalent if one of the words can be obtained from the other by a finite succession of insertions and deletions of expressions of the form $bb^{-1}$ with $b$ in $S'$. This definition is arranged so that "equivalent" is an equivalence relation. We write $x \sim y$ if $x$ and $y$ are words that are equivalent. The underlying set for the free group $F(S)$ will be taken to be the set of equivalence classes of members of $W(S')$.

¹ Direct sum here is what coproduct, in the sense of Section IV.11, amounts to in the category of all abelian groups.

Theorem 7.1. If $S$ is a set and $W(S')$ is the corresponding set of words built from $S' = S \cup S^{-1}$, then the product operation defined on $W(S')$ descends in a well-defined fashion to the set $F(S)$ of equivalence classes of members of $W(S')$, and $F(S)$ thereby becomes a group. Define $\iota : S \to F(S)$ to be the composition of the inclusion into words of length one followed by passage to equivalence classes. Then the pair $(F(S), \iota)$ has the following universal mapping property: whenever $G$ is a group and $\varphi : S \to G$ is a function, then there exists a unique group homomorphism $\widetilde{\varphi} : F(S) \to G$ such that $\varphi = \widetilde{\varphi} \circ \iota$.

REMARK. The group $F(S)$ is called the free group on $S$. Figure 7.1 illustrates its universal mapping property. The brief form in words of the property is that any function from $S$ into a group $G$ extends uniquely to a group homomorphism of $F(S)$ into $G$. This universal mapping property actually characterizes $F(S)$, as will be seen in Proposition 7.2.

ϕ

−−−→ G ϕ

F(S) FIGURE 7.1. Universal mapping property of a free group. PROOF. Let us denote equivalence classes by brackets. We want to deﬁne multiplication in F(S) by [w1 ][w2 ] = [w1 w2 ]. To see that this formula makes sense in F(S), let x1 , x2 , and y be words, and let b be in S . Deﬁne x = x1 x2 and x = x1 bb−1 x2 , so that x ∼ x. Then it is evident that x y ∼ x y and yx ∼ yx. Iteration of this kind of relationship shows that w1 ∼ w1 and w2 ∼ w2 implies w1 w2 ∼ w1 w2 , and hence multiplication of equivalence classes is well deﬁned. Since multiplication in W (S ) is associative, we have [w1 ]([w2 ][w3 ]) = [w1 ][w2 w3 ] = [w1 (w2 w3 )] = [(w1 w2 )w3 ] = [w1 w2 ][w3 ] = ([w1 ][w2 ])[w3 ]. Thus multiplication is associative in F(S). The class [1] of the empty word 1 is a two-sided identity. If b1 , . . . , bn are in S , then bn−1 · · · b2−1 b1−1 b1 b2 · · · bn is equivalent to 1, and so is b1 b2 · · · bn bn−1 · · · b2−1 b1−1 . Consequently [bn−1 · · · b2−1 b1−1 ] is a two-sided inverse of [b1 b2 · · · bn ], and F(S) is a group. Now we address the universal mapping property, ﬁrst proving the stated uniqueness of the homomorphism. Every member of F(S) is the product of classes [b] with b in S . In turn, if b is of the form a −1 with a in S, then [b] = [a]−1 . Hence F(S) is generated by all classes [a] with a in S, i.e., by ι(S). Any homomorphism


of a group is determined by its values on the members of a generating set, and uniqueness therefore follows from the formula $\widetilde{\varphi}([a]) = \widetilde{\varphi}(\iota(a)) = \varphi(a)$.

For existence we begin by defining a function $\Phi : W(S') \to G$ such that

\[
\begin{aligned}
\Phi(a) &= \varphi(a) && \text{for } a \text{ in } S, \\
\Phi(a^{-1}) &= \varphi(a)^{-1} && \text{for } a^{-1} \text{ in } S^{-1}, \\
\Phi(w_1 w_2) &= \Phi(w_1)\Phi(w_2) && \text{for } w_1 \text{ and } w_2 \text{ in } W(S').
\end{aligned}
\]

We use the formulas $\Phi(a) = \varphi(a)$ for $a$ in $S$ and $\Phi(a^{-1}) = \varphi(a)^{-1}$ for $a^{-1}$ in $S^{-1}$ as a definition of $\Phi(b)$ for $b$ in $S'$. Any member of $W(S')$ can be written uniquely as $b_1 \cdots b_n$ with each $b_j$ in $S'$, and we set $\Phi(b_1 \cdots b_n) = \Phi(b_1) \cdots \Phi(b_n)$. (If $n = 0$, the understanding is that $\Phi(1) = 1$.) Then $\Phi$ has the required properties.

Let us show that $w' \sim w$ implies $\Phi(w') = \Phi(w)$. If $b_1, \dots, b_n$ are in $S'$ and $b$ is in $S'$, then the question is whether

\[
\Phi(b_1 \cdots b_k\, b b^{-1}\, b_{k+1} \cdots b_n) \overset{?}{=} \Phi(b_1 \cdots b_k b_{k+1} \cdots b_n).
\]

If $g$ and $g'$ denote the elements $\Phi(b_1) \cdots \Phi(b_k)$ and $\Phi(b_{k+1}) \cdots \Phi(b_n)$ of $G$, then the two sides of the queried formula are

\[
g\,\Phi(b)\Phi(b^{-1})\,g' \qquad \text{and} \qquad g g'.
\]

Thus the question is whether $\Phi(b)\Phi(b^{-1})$ always equals 1 in $G$. If $b = a$ is in $S$, this equals $\varphi(a)\varphi(a)^{-1} = 1$, while if $b = a^{-1}$ is in $S^{-1}$, it equals $\varphi(a)^{-1}\varphi(a) = 1$. We conclude that $w' \sim w$ implies $\Phi(w') = \Phi(w)$.

We may therefore define $\widetilde{\varphi}([w]) = \Phi(w)$ for $[w]$ in $F(S)$. Since $\widetilde{\varphi}([w][w']) = \widetilde{\varphi}([ww']) = \Phi(ww') = \Phi(w)\Phi(w') = \widetilde{\varphi}([w])\widetilde{\varphi}([w'])$, $\widetilde{\varphi}$ is a homomorphism of $F(S)$ into $G$. For $a$ in $S$, we have $\widetilde{\varphi}([a]) = \Phi(a) = \varphi(a)$. In other words, $\widetilde{\varphi}(\iota(a)) = \varphi(a)$. This completes the proof of existence.

Proposition 7.2. Let $S$ be a set, $F$ be a group, and $\iota' : S \to F$ be a function. Suppose that the pair $(F, \iota')$ has the following universal mapping property: whenever $G$ is a group and $\varphi : S \to G$ is a function, then there exists a unique group homomorphism $\widetilde{\varphi} : F \to G$ such that $\varphi = \widetilde{\varphi} \circ \iota'$. Then there exists a unique group homomorphism $\Phi : F(S) \to F$ such that $\iota' = \Phi \circ \iota$, and it is a group isomorphism.

REMARKS. Chapter VI is not a prerequisite for the present chapter. However, readers who have been through Chapter VI will recognize that Proposition 7.2 is a special case of Problem 19 at the end of that chapter.
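The existence half of the proof of Theorem 7.1 can be traced on a small example. In the sketch below, which rests on assumptions made only for illustration, the group $G$ is the symmetric group on three letters with permutations stored as tuples, a word is a list of pairs `(symbol, exponent)` with exponent $+1$ or $-1$, and `phi` is a dictionary playing the role of the function $S \to G$. The function `Phi` multiplies out the images of the letters, as in the definition of the map $W(S') \to G$ used in the proof.

```python
def compose(p, q):
    """Product pq of permutations: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def Phi(word, phi, identity):
    """Image of b_1 ... b_n as the product of the images of its letters.

    The empty word is sent to the identity of G.
    """
    g = identity
    for symbol, exponent in word:
        factor = phi[symbol] if exponent == 1 else inverse(phi[symbol])
        g = compose(g, factor)
    return g


identity = (0, 1, 2)
phi = {"a": (1, 2, 0), "b": (1, 0, 2)}   # an arbitrary function from S = {a, b} to G

w1 = [("a", 1), ("b", -1)]
w2 = [("b", 1), ("a", 1)]

# Multiplicativity on words.
print(Phi(w1 + w2, phi, identity) == compose(Phi(w1, phi, identity),
                                             Phi(w2, phi, identity)))

# Inserting a pair b b^{-1} does not change the image.
inserted = w1 + [("b", 1), ("b", -1)] + w2
print(Phi(inserted, phi, identity) == Phi(w1 + w2, phi, identity))
```

Because equivalent words have the same image, `Phi` is constant on equivalence classes, and this is exactly what allows the homomorphism on $F(S)$ to be defined.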


PROOF. We apply the universal mapping property of $(F(S), \iota)$, as stated in Theorem 7.1, to the group $G = F$ and the function $\varphi = \iota'$, obtaining a group homomorphism $\Phi : F(S) \to F$ such that $\iota' = \Phi \circ \iota$. Then we apply the given universal mapping property of $(F, \iota')$ to the group $G = F(S)$ and the function $\varphi = \iota$, obtaining a group homomorphism $\Psi : F \to F(S)$ such that $\iota = \Psi \circ \iota'$. The group homomorphism $\Psi \circ \Phi : F(S) \to F(S)$ has the property that $(\Psi \circ \Phi) \circ \iota = \Psi \circ (\Phi \circ \iota) = \Psi \circ \iota' = \iota$, and the identity $1_{F(S)}$ has this same property. By the uniqueness of the group homomorphism in Theorem 7.1, $\Psi \circ \Phi = 1_{F(S)}$. Similarly the group homomorphism $\Phi \circ \Psi : F \to F$ has the property that $(\Phi \circ \Psi) \circ \iota' = \iota'$, and the identity $1_F$ has this same property. By the uniqueness of the group homomorphism in the assumed universal mapping property of $F$, $\Phi \circ \Psi = 1_F$. Therefore $\Phi$ is a group isomorphism.

We know that $\iota(S)$ generates $F(S)$. If $\Phi' : F(S) \to F$ is another group isomorphism with $\iota' = \Phi' \circ \iota$, then $\Phi'$ and $\Phi$ agree on $\iota(S)$ and therefore have to agree everywhere. Hence $\Phi$ is unique.

Proposition 7.2 raises the question of recognizing candidates for the set $T = \iota'(S)$ in a given group $F$ so as to be in a position to exhibit $F$ as isomorphic to the free group $F(S)$. Certainly $T$ has to generate $F$. But there is also an independence condition. The idea is that if we form words from the members of $T$, then two words are to lead to equal members of $F$ only if they can be transformed into one another by the same rules that are allowed with free groups.

What this problem amounts to in the case that $F = F(S)$ is that we want a decision procedure for telling whether two given words are equivalent. This is the so-called word problem for the free group. If we think about the matter for a moment, not much is instantly obvious. If $a_1$ and $a_2$ are two members of $S$ and if they are considered as words of length 1, are they equivalent? Equivalence allows for inserting pairs $bb^{-1}$ with $b$ in $S'$, as well as deleting them. Might it be possible to do some complicated iterated insertion and deletion of pairs to transform $a_1$ into $a_2$? Although the negative answer can be readily justified in this situation by a parity argument, it can be justified even more easily by the universal mapping property: there exist groups $G$ with more than one element; we can map $a_1$ to one element of $G$ and $a_2$ to another element of $G$, extend to a homomorphism $\widetilde{\varphi} : F(S) \to G$, see that $\widetilde{\varphi}(\iota(a_1)) \ne \widetilde{\varphi}(\iota(a_2))$, and conclude that $\iota(a_1) \ne \iota(a_2)$.

But what about the corresponding problem for two more-complicated words in a free group? Fortunately there is a decision procedure for the word problem in a free group. It involves the notion of "reduced" words. A word in $W(S')$ is said to be reduced if it contains no consecutive pair $bb^{-1}$ with $b$ in $S'$.

Proposition 7.3 (solution of the word problem for free groups). Let $S$ be a set, let $S' = S \cup S^{-1}$, and let $W(S')$ be the corresponding set of words. Then each word in $W(S')$ is equivalent to one and only one reduced word.


REMARK. To test whether two words are equivalent, the proposition says to delete pairs bb−1 with b ∈ S as much as possible from each given word, and to check whether the resulting reduced words are identical. PROOF. Removal of a pair bb−1 with b ∈ S decreases the length of a word by 2, and the length has to remain ≥ 0. Thus the process of successively removing such pairs has to stop after ﬁnitely many steps, and the result is a reduced word. This proves that each equivalence class contains a reduced word. For uniqueness we shall associate to each word a ﬁnite sequence of reduced words such that the last member of the sequence is unchanged when we insert or delete within the given word any expression bb−1 with b ∈ S . Speciﬁcally if w = b1 · · · bn , with each bi in S , is a given word, we associate to w the sequence of words x0 , x1 , . . . , xn deﬁned inductively by x0 = 1, x 1 = b1 , xi−1 bi xi = yi−2

if i ≥ 2 and xi−1 does not end in bi−1 , if i ≥ 2 and xi−1 = yi−2 bi−1 ,

(∗)

and we define $r(w) = x_n$. Let us see, by induction on $i \ge 0$, that $x_i$ is reduced. The base cases $i = 0$ and $i = 1$ are clear from the definition. Suppose that $i \ge 2$ and that $x_0, \dots, x_{i-1}$ are reduced. If $x_{i-1} = y_{i-2} b_i^{-1}$ for some $y_{i-2}$, then $x_{i-1}$ reduced forces $y_{i-2}$ to be reduced, and hence $x_i = y_{i-2}$ is reduced. If $x_{i-1}$ does not end in $b_i^{-1}$, then the last two symbols of $x_i = x_{i-1} b_i$ do not cancel, and no earlier pair can cancel since $x_{i-1}$ is assumed reduced; hence $x_i$ is reduced. This completes the induction and shows that $x_i$ is reduced for $0 \le i \le n$.

If the word $w = b_1 \cdots b_n$ is reduced, then each $x_i$ for $i \ge 2$ is determined by the first of the two choices in $(*)$, and hence $x_i = b_1 \cdots b_i$ for all $i$. Consequently $r(w) = w$ if $w$ is reduced. If we can prove for a general word $b_1 \cdots b_n$ that

\[
r(b_1 \cdots b_n) = r(b_1 \cdots b_k\, b b^{-1}\, b_{k+1} \cdots b_n),
\tag{$**$}
\]

then it follows that every word $w'$ equivalent to a word $w$ has $r(w') = r(w)$. Since $r(w) = w$ for $w$ reduced, there can be only one reduced word in an equivalence class.

To prove $(**)$, let $x_0, \dots, x_n$ be the finite sequence associated with $b_1 \cdots b_n$, and let $x'_0, \dots, x'_{n+2}$ be the sequence associated with $b_1 \cdots b_k\, b b^{-1}\, b_{k+1} \cdots b_n$. Certainly $x'_i = x_i$ for $i \le k$. Let us compute $x'_{k+1}$ and $x'_{k+2}$. From $(*)$ we see that

\[
x'_{k+1} = \begin{cases}
x_k b & \text{if } x_k \text{ does not end in } b^{-1}, \\
y & \text{if } x_k = y b^{-1}.
\end{cases}
\]

In the first of these cases, $x'_{k+1}$ ends in $b$, and $(*)$ says therefore that $x'_{k+2} = x_k$. In the second of the cases, the fact that $x_k$ is reduced implies that $y$ does not end in $b$; hence $(*)$ says that $x'_{k+2} = y b^{-1} = x_k$. In other words, $x'_{k+2} = x_k$ in both cases. Since the inductive definition of any $x_i$ depends only on $x_{i-1}$ and $b_i$, and similarly for $x'_i$, we see that $x'_{k+2+i} = x_{k+i}$ for $0 \le i \le n - k$. Therefore $x'_{n+2} = x_n$, and $(**)$ follows. This proves the proposition.
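The inductive definition $(*)$ is in effect a one-pass stack algorithm, and the following minimal Python sketch computes $r(w)$ that way. The encoding of a symbol as a pair (name, exponent) with exponent $\pm 1$, and the function names `inverse`, `reduce_word`, and `equivalent`, are assumptions made for this illustration and do not come from the text.

```python
from typing import List, Tuple

Symbol = Tuple[str, int]  # (generator name, +1 or -1); an illustrative encoding


def inverse(b: Symbol) -> Symbol:
    """Return the formal inverse of a symbol of S' = S u S^{-1}."""
    return (b[0], -b[1])


def reduce_word(word: List[Symbol]) -> List[Symbol]:
    """Compute r(w) by maintaining the reduced word x_i of (*) as a stack."""
    x: List[Symbol] = []            # x_0 = 1, the empty word
    for b in word:
        if x and x[-1] == inverse(b):
            x.pop()                 # x_{i-1} = y b^{-1}, so x_i = y
        else:
            x.append(b)             # x_{i-1} does not end in b^{-1}
    return x


def equivalent(w1: List[Symbol], w2: List[Symbol]) -> bool:
    """Two words are equivalent exactly when their reduced words coincide."""
    return reduce_word(w1) == reduce_word(w2)
```

Since each symbol is pushed and popped at most once, the test for equivalence described in the remark takes time linear in the lengths of the two words.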

Let us return to the problem of recognizing candidates for the set $T = \iota(S)$ in a given group $F$ so that the subgroup generated by $T$ is a free group. Using the universal mapping property for the free group $F(T)$, we form the group homomorphism of $F(T)$ into $F$ that extends the identity mapping on $T$. We want this homomorphism to be one-one, i.e., to have the property that the only way a word in $F$ built from the members of $T$ can equal the identity is if it comes from the identity. Because of Proposition 7.3 the only reduced word in $F(T)$ that yields the identity is the empty word. Thus the condition that the homomorphism be one-one is that the only image in $F$ of a reduced word in $F(T)$ that can equal the identity is the image of the empty word. Making this condition into a definition, we say that a subset $S = \{g_t \mid t \in T\}$ of $F$ not containing 1 is free if no nonempty product $h_1 h_2 \cdots h_m$ in which each $h_i$ or $h_i^{-1}$ is in $S$ and each $h_{i+1}$ is different from $h_i^{-1}$ can be the identity. A free set in $F$ that generates $F$ is called a free basis for $F$.

EXAMPLE. Within the free group $F(\{x, y\})$ on two generators $x$ and $y$, consider the subgroup generated by $u = x^2$, $v = y^2$, and $w = xy$. The claim is that the subset $\{u, v, w\}$ is free, so that the subgroup generated by $u$, $v$, and $w$ is isomorphic to a free group $F(\{u, v, w\})$ on three generators. We are to check that no nonempty reduced word in $u, v, w, u^{-1}, v^{-1}, w^{-1}$ can reduce to the empty word after substitution in terms of $x$ and $y$. We induct on the length of the $u, v, w$ word, the base case being length 0. Suppose that $v = y^2$ occurs somewhere in our reduced $u, v, w$ word that collapses to the empty word after substitution. Consider what is needed for the left-hand factor of $y$ in the $y^2$ to cancel. The cancellation must result from the presence of some $y^{-1}$. Suppose that this $y^{-1}$ occurs to the left of $y^2$.
Since passing to a reduced word need involve only deletions and not insertions of pairs, everything between $y^{-1}$ and $y^2$ must cancel. If the $y^{-1}$ has resulted from $w^{-1} = y^{-1}x^{-1}$, then the number of $x, y$ symbols between $y^{-1}$ and $y^2$ is odd, and an odd number of factors can never cancel. So the $y^{-1}$ must arise from the right-hand $y^{-1}$ in a factor $v^{-1} = y^{-2}$. The symbols between $y^{-2}$ and $y^2$ come from some reduced $u, v, w$ word, and induction shows that this word must be trivial. Then $y^{-2}$ and $y^2$ are adjacent, contradiction. Thus the left factor of $y^2$ must cancel because of some $y^{-1}$ on the right of $y^2$. If the $y^{-1}$ is part of $w^{-1} = y^{-1}x^{-1}$ or is the left $y^{-1}$ in $v^{-1} = y^{-2}$, then the number of $x, y$
symbols between the left $y$ and the $y^{-1}$ is odd, and we cannot get cancellation. So the $y^{-1}$ must be the right-hand $y^{-1}$ in a factor $y^{-2}$. Then we have an expression $y(y \cdots y^{-1})y^{-1}$ in which the symbols in parentheses cancel. The symbols $\cdots$ must cancel also; since these represent some reduced $u, v, w$ word, induction shows that $\cdots$ is empty. We conclude that $y^2$ and $y^{-2}$ are adjacent, contradiction. Thus our reduced $u, v, w$ word contains no factor $v$. Similarly examination of the right-hand factor $x$ in an occurrence of $x^2$ shows that our reduced $u, v, w$ word contains no factor $u$. It must therefore be a product of factors $w$ or a product of factors $w^{-1}$. Substitution of $w = xy$ leads directly without any cancellation to an $x, y$ reduced word, and we conclude that the $u, v, w$ word is empty. Thus the subset $\{u, v, w\}$ is free.

If $G$ is any group, the commutator subgroup $G'$ of $G$ is the subgroup generated by all elements $xyx^{-1}y^{-1}$ with $x \in G$ and $y \in G$.

Proposition 7.4. If $G$ is a group, then the commutator subgroup $G'$ is normal, and $G/G'$ is abelian. If $\varphi : G \to H$ is any homomorphism of $G$ into an abelian group $H$, then $\ker \varphi \supseteq G'$.

PROOF. The computation
\[
a\,xyx^{-1}y^{-1}\,a^{-1} = (axa^{-1})(aya^{-1})(axa^{-1})^{-1}(aya^{-1})^{-1}
\]
shows that $G'$ is normal. If $\psi : G \to G/G'$ is the quotient homomorphism, then
\[
\psi(x)\psi(y) = xyG' = xy(y^{-1}x^{-1}yx)G' = yxG' = \psi(y)\psi(x),
\]
and therefore $G/G'$ is abelian. Finally if $\varphi : G \to H$ is a homomorphism of $G$ into an abelian group $H$, then the computation
\[
\varphi(xyx^{-1}y^{-1}) = \varphi(x)\varphi(y)\varphi(x)^{-1}\varphi(y)^{-1} = \varphi(x)\varphi(x)^{-1}\varphi(y)\varphi(y)^{-1} = 1
\]
shows that $G' \subseteq \ker \varphi$.

Corollary 7.5. If $F$ is the free group on a set $S$ and if $F'$ is the commutator subgroup of $F$, then $F/F'$ is isomorphic to the free abelian group $\bigoplus_{s \in S} \mathbb{Z}_s$.

PROOF. Let $H = \bigoplus_{s \in S} \mathbb{Z}_s$, and let $\varphi : S \to H$ be the function with $\varphi(s) = 1_s$, i.e., $\varphi(s)$ is to be the member of $H$ that is 1 in the $s$-th coordinate and is 0 elsewhere.
Application of the universal mapping property of $F$ as given in Theorem 7.1 yields a group homomorphism $\widetilde{\varphi} : F \to H$ such that $\widetilde{\varphi} \circ \iota = \varphi$. Since the elements $\varphi(s)$, with $s$ in $S$, generate $H$, $\widetilde{\varphi}$ carries $F$ onto $H$. Since $H$ is abelian, Proposition 7.4 shows that $\ker \widetilde{\varphi} \supseteq F'$. Proposition 4.11 shows that $\widetilde{\varphi}$ descends to a homomorphism $\widetilde{\varphi}_0 : F/F' \to H$, and $\widetilde{\varphi}_0$ has to be onto $H$. To complete the proof, we show that $\widetilde{\varphi}_0$ is one-one. Let $x$ be a member of $F$. Since the products of the elements $\iota(s)$ and their inverses generate $F$ and since $F/F'$ is abelian, we can write $xF' = s_{i_1}^{j_1} \cdots s_{i_n}^{j_n} F'$, where $s_{i_1}$ occurs a total of $j_1$ times in $x$, $\dots$, and $s_{i_n}$ occurs a total of $j_n$ times in $x$; it is understood that
an occurrence of $s_{i_1}^{-1}$ is to contribute $-1$ toward $j_1$. Then we have $\widetilde{\varphi}_0(xF') = j_1\varphi(s_{i_1}) + \cdots + j_n\varphi(s_{i_n})$. If $\widetilde{\varphi}_0(xF') = 0$, we obtain $j_1\varphi(s_{i_1}) + \cdots + j_n\varphi(s_{i_n}) = 0$, and then $j_1 = \cdots = j_n = 0$ since the elements $\varphi(s_{i_1}), \dots, \varphi(s_{i_n})$ are members of a $\mathbb{Z}$ basis of $H$. Hence $xF' = F'$, $x$ is in $F'$, and $\widetilde{\varphi}_0$ is one-one.

Corollary 7.6. If $F_1$ and $F_2$ are isomorphic free groups on sets $S_1$ and $S_2$, respectively, then $S_1$ and $S_2$ have the same cardinality.

PROOF. Corollary 7.5 shows that an isomorphism of $F_1$ with $F_2$ induces an isomorphism of the free abelian groups $\bigoplus_{s_1 \in S_1} \mathbb{Z}_{s_1}$ and $\bigoplus_{s_2 \in S_2} \mathbb{Z}_{s_2}$. The rank of a free abelian group is a well-defined cardinal, and the result follows, almost. We did not completely prove this fact about the rank of a free abelian group in Section IV.9. Theorem 4.53 did prove, however, that rank is well defined for finitely generated free abelian groups. Thus the corollary follows if $S_1$ and $S_2$ are finite. If $S_1$ or $S_2$ is uncountable, then the cardinality of the corresponding free abelian group matches the cardinality of its $\mathbb{Z}$ basis; hence the corollary follows if $S_1$ or $S_2$ is uncountable. The only remaining case to eliminate is that one of the two free abelian groups, say the first of them, has a countably infinite $\mathbb{Z}$ basis and the other has finite rank $n$. The first of the groups then has a linearly independent set of $n + 1$ elements, and Lemma 4.54 shows that the span of these elements cannot be isomorphic to a subgroup of a free abelian group of rank $n$. This completes the proof in all cases.

Because of Corollary 7.6, it is meaningful to speak of the rank of a free group; it is the cardinality of any free basis. We shall see in the next section that any subgroup of a free group is free. In contrast to the abelian case, however, the rank may actually increase in passing from a free group to one of its subgroups: the example earlier in this section exhibited a free group of rank 3 as a subgroup of a free group of rank 2.
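The rank-3 example can be checked by machine for short words. The sketch below is only a finite sanity check under stated assumptions, not a substitute for the induction given earlier: it enumerates the nonempty reduced words in $u, v, w$ up to a chosen length, substitutes $u = x^2$, $v = y^2$, $w = xy$, and looks for any word that collapses to the empty word. The pair encoding of symbols and the names `substitute` and `collapsing_words` are hypothetical.

```python
from itertools import product


def reduce_word(word):
    """Free reduction: cancel adjacent pairs b b^{-1} with a stack."""
    out = []
    for name, exp in word:
        if out and out[-1] == (name, -exp):
            out.pop()
        else:
            out.append((name, exp))
    return out


X, Y = ("x", 1), ("y", 1)
SUBST = {"u": [X, X], "v": [Y, Y], "w": [X, Y]}  # u = x^2, v = y^2, w = xy


def substitute(word):
    """Expand a word in u, v, w and their inverses into letters x, y."""
    letters = []
    for name, exp in word:
        block = SUBST[name]
        if exp == 1:
            letters.extend(block)
        else:
            # Inverse of a block: reverse the order and invert each letter.
            letters.extend((n, -e) for n, e in reversed(block))
    return letters


def collapsing_words(max_len):
    """Nonempty reduced u,v,w words of length <= max_len that map to 1."""
    alphabet = [(n, e) for n in "uvw" for e in (1, -1)]
    bad = []
    for length in range(1, max_len + 1):
        for candidate in product(alphabet, repeat=length):
            word = list(candidate)
            if reduce_word(word) != word:
                continue            # not reduced as a u,v,w word
            if not reduce_word(substitute(word)):
                bad.append(word)
    return bad
```

Calling `collapsing_words(4)` examines every reduced word of length at most 4; an empty result is consistent with the freeness of $\{u, v, w\}$ but of course proves nothing about longer words.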
We turn to a way of describing general groups, particularly groups that are at most countable. The method uses "generators," which we already understand, and "relations," which are defined in terms of free groups. Let $S$ be a set, let $R$ be a subset of $F(S)$, and let $N(R)$ be the smallest normal subgroup of $F(S)$ containing $R$. The group $G = F(S)/N(R)$ is sometimes written as $G = \langle S;\, R\rangle$ or as $G = \langle \text{elements of } S;\ \text{elements of } R\rangle$, with the elements of $S$ and $R$ listed rather than grouped as a set. Either of these expressions is called a presentation of $G$. The set $S$ is a set of generators, and the set $R$ is the corresponding set of relations. The following result implicit in the universal mapping property of Theorem 7.1 shows the scope of this definition.


Proposition 7.7. Each group $G$ is the homomorphic image of a free group.

PROOF. Let $S$ be a set of generators for $G$; for example, $S$ can be taken to be $G$ itself. Let $\varphi : S \to G$ be the inclusion of the set of generators into $G$, and let $\widetilde{\varphi} : F(S) \to G$ be the group homomorphism of Theorem 7.1 such that $\widetilde{\varphi}(\iota(s)) = \varphi(s)$ for all $s$ in $S$. The image of $\widetilde{\varphi}$ is a subgroup of $G$ that contains the generating set $S$ and is therefore equal to all of $G$. Thus $\widetilde{\varphi}$ is the required homomorphism.

If $G$ is any group and $\widetilde{\varphi} : F(S) \to G$ is the homomorphism given in Proposition 7.7, then the subgroup $R = \ker \widetilde{\varphi}$ has the property that $G \cong \langle S;\, R\rangle$. Consequently every group can be given by generators and relations. For example the proof of the proposition shows that one possibility is to take $S = G$ and $R$ equal to the set of all members of the multiplication table, but with the multiplication table entry $ss' = s''$ rewritten as the left side $ss'(s'')^{-1}$ of an equation $ss'(s'')^{-1} = 1$ specifying a combination of generators that maps to 1.

This is of course not a very practical example. Generators and relations are most useful when $S$ and $R$ are fairly small. One says that $G$ is finitely generated if $S$ can be chosen to be finite, finitely presented if both $S$ and $R$ can be chosen to be finite.

A frequently used device in working with generators and relations is the following simple proposition.

Proposition 7.8. Let $G = \langle S;\, R\rangle$ be a group given by generators and relations, let $G'$ be a second group, let $\varphi$ be a one-one function from $S$ onto a set of generators for $G'$, and let $\Phi : F(S) \to G'$ be the extension of $\varphi$ to a group homomorphism. If $\Phi(r) = 1$ for every member $r$ of $R$, then $\Phi$ descends to a homomorphism of $G$ onto $G'$. In particular, if $G = \langle S;\, R\rangle$ and $G' = \langle S;\, R'\rangle$ are groups given by generators and relations with $R \subseteq R'$, then the natural homomorphism of $F(S)$ onto $G'$ descends to a homomorphism of $G$ onto $G'$.

PROOF. The proposition follows immediately from the universal mapping property in Theorem 7.1 in combination with Proposition 4.11.
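In practice the hypothesis of Proposition 7.8 is a finite check: evaluate each relator on the proposed images of the generators and see whether the identity results. Here is a small sketch, assuming that the target group is given by permutations stored as tuples and that a word is a list of (generator, exponent) pairs; the helper names `compose`, `invert`, `evaluate`, and `kills_all_relators` are illustrative and not from the text.

```python
def compose(p, q):
    """Product of permutations of range(n): apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def invert(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)


def evaluate(word, images):
    """Image under Phi of a word, given the images of the generators."""
    n = len(next(iter(images.values())))
    result = tuple(range(n))                  # identity permutation
    for name, exp in word:
        g = images[name] if exp > 0 else invert(images[name])
        for _ in range(abs(exp)):
            result = compose(result, g)
    return result


def kills_all_relators(relators, images):
    """True when Phi(r) = 1 for every relator r, as Proposition 7.8 requires."""
    n = len(next(iter(images.values())))
    identity = tuple(range(n))
    return all(evaluate(r, images) == identity for r in relators)


# Relators x^3, y^2, (xy)^2 and a candidate assignment into S_3.
relators = [[("x", 3)], [("y", 2)], [("x", 1), ("y", 1), ("x", 1), ("y", 1)]]
images = {"x": (1, 2, 0), "y": (0, 2, 1)}     # a 3-cycle and a transposition
```

With these data `kills_all_relators(relators, images)` holds, so the assignment descends to a homomorphism of $\langle x, y;\, x^3, y^2, (xy)^2\rangle$ onto the subgroup of $S_3$ generated by the two images. Sending both generators to the 3-cycle fails the check because $y^2$ does not map to the identity.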
Now let us consider some examples of groups given by generators and relations. The case of one generator is something we already understand: the group has to be cyclic. A presentation of $\mathbb{Z}$ is as $\langle a;\ \rangle$, and a presentation of $C_n$ is as $\langle a;\, a^n\rangle$. But other presentations are possible with one generator, such as $\langle a;\, a^6, a^9\rangle$ for $C_3$. Here is an example with two generators.


EXAMPLE. Let us prove that $D_n \cong \langle x, y;\, x^n, y^2, (xy)^2\rangle$, where $D_n$ is the dihedral group of order $2n$. Concretely let us work with $D_n$ as the group of 2-by-2 real matrices generated by
\[
\begin{pmatrix} \cos 2\pi/n & -\sin 2\pi/n \\ \sin 2\pi/n & \cos 2\pi/n \end{pmatrix}
\qquad\text{and}\qquad
\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]
The generated group indeed has order $2n$. If we identify $x$ with the first of these matrices and $y$ with the second, then $y^2 = 1$, and the formula
\[
\begin{pmatrix} \cos 2\pi/n & -\sin 2\pi/n \\ \sin 2\pi/n & \cos 2\pi/n \end{pmatrix}^{k}
=
\begin{pmatrix} \cos 2\pi k/n & -\sin 2\pi k/n \\ \sin 2\pi k/n & \cos 2\pi k/n \end{pmatrix}
\]
shows that $x^n = 1$. In addition,
\[
xy = \begin{pmatrix} \cos 2\pi/n & \sin 2\pi/n \\ \sin 2\pi/n & -\cos 2\pi/n \end{pmatrix},
\]
and the square of this is the identity. By Proposition 7.8, $D_n$ is a homomorphic image of $\widetilde{D}_n = \langle x, y;\, x^n, y^2, (xy)^2\rangle$. To complete the identification, it is enough to show that the order of $\widetilde{D}_n$ is $\le 2n$ because the homomorphism of $\widetilde{D}_n$ onto $D_n$ must then be one-one. In $\langle x, y;\, x^n, y^2, (xy)^2\rangle$, we compute that $y^{-1} = y$ and that $x(yx)y = 1$ implies $yx = x^{-1}y^{-1} = x^{-1}y$. Induction then yields $yx^k = x^{-k}y$ for $k > 0$. Multiplying left and right by $y$ gives $yx^{-k} = x^k y$ for $k > 0$. So $yx^l = x^{-l}y$ for every integer $l$. This means that every element is of the form $x^m$ or $x^m y$, and we may take $0 \le m \le n - 1$. Hence there are at most $2n$ elements.
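The normal form $x^m y^e$ obtained in the example can be turned into a concrete multiplication rule, since the identity $yx^l = x^{-l}y$ gives $(x^{m_1}y^{e_1})(x^{m_2}y^{e_2}) = x^{m_1 + (-1)^{e_1}m_2}\,y^{e_1+e_2}$. The following sketch encodes an element as a pair `(m, e)` with $m$ taken modulo $n$ and $e$ modulo 2; this encoding and the function names are assumptions made for the illustration.

```python
def dihedral_mul(a, b, n):
    """Multiply x^m1 y^e1 by x^m2 y^e2 using the rule y x^l = x^(-l) y."""
    m1, e1 = a
    m2, e2 = b
    sign = -1 if e1 else 1          # moving x^m2 past y^e1 inverts it when e1 = 1
    return ((m1 + sign * m2) % n, (e1 + e2) % 2)


def power(a, k, n):
    """k-fold product of a with itself, for k >= 0."""
    result = (0, 0)                 # the identity x^0 y^0
    for _ in range(k):
        result = dihedral_mul(result, a, n)
    return result


def generated(gens, n):
    """All products of the given elements; in a finite group this is the subgroup."""
    seen = {(0, 0)}
    frontier = [(0, 0)]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = dihedral_mul(g, s, n)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen
```

For any particular $n$ one can confirm with these functions that $x = (1, 0)$ and $y = (0, 1)$ satisfy $x^n = y^2 = (xy)^2 = 1$ and generate exactly $2n$ pairs, which matches the count in the example.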

Without trying to be too precise, let us mention that the word problem for finitely presented groups is to give an algorithm for deciding whether two words represent the same element of the group. It is known that there is no such algorithm applicable to all finitely presented groups. Of course, there can be such an algorithm for certain special classes of presentations. For example, if there are no relations in the presentation, then the group is a free group, and Proposition 7.3 gives a solution in this case. There tends to be a solution for a class of groups if the groups all correspond rather concretely to some geometric situation, such as a tiling of Euclidean space or some other space. The example above with $D_n$ is of this kind.

By way of a concrete class of examples, one can identify any doubly generated group of the form $\langle x, y;\, x^a, y^b, (xy)^c\rangle$ if $a, b, c$ are integers $> 1$, and one can describe what words represent what elements in these groups. These groups all correspond to tilings in 2 dimensions. In fact, let $\gamma = a^{-1} + b^{-1} + c^{-1}$. If $\gamma > 1$, the tiling is of the Riemann sphere, and the group is finite. If $\gamma = 1$, the tiling is of the Euclidean plane $\mathbb{R}^2$, and the group is infinite. If $\gamma < 1$, the tiling is of the hyperbolic plane, and the group is infinite. In all cases one starts from a triangle in the appropriate geometry with angles $\pi/a$, $\pi/b$, and $\pi/c$, and a basic tile consists of the double of this triangle obtained by reflecting the triangle about any of its
sides. The group elements $x$, $y$, and $xy$ are rotations, suitably oriented, about the vertices of the triangle through respective angles $2\pi/a$, $2\pi/b$, and $2\pi/c$. Further information about the cases $\gamma > 1$ and $\gamma = 1$ is obtained in Problems 37–46 at the end of the chapter.

We conclude with one further example of a presentation whose group we can readily identify concretely.

Proposition 7.9. Let $S$ be a set, and let $R = \{sts^{-1}t^{-1} \mid s \in S,\ t \in S\}$. Then the smallest normal subgroup of the free group $F(S)$ containing $R$ is the commutator subgroup $F(S)'$, and therefore $\langle S;\, R\rangle$ is isomorphic to the free abelian group $\bigoplus_{s \in S} \mathbb{Z}_s$.

PROOF. The members of $R$ are in $F(S)'$, the product of two members of $F(S)'$ is in $F(S)'$, and any conjugate of a member of $F(S)'$ is in $F(S)'$. Therefore the smallest normal subgroup $N(R)$ containing $R$ has $N(R) \subseteq F(S)'$. Let $\varphi : F(S) \to F(S)/N(R)$ be the quotient homomorphism. Elements of the quotient $F(S)/N(R)$ may be expressed as words in the elements $\varphi(s)$ and $\varphi(s)^{-1}$ for $s$ in $S$, and the factors commute because of the definition of $R$. Therefore $F(S)/N(R)$ is abelian. By Proposition 7.4, $N(R) \supseteq F(S)'$. Therefore $N(R) = F(S)'$. This proves the first conclusion, and the second conclusion follows from Corollary 7.5.

2. Subgroups of Free Groups

The main result of this section is that any subgroup of a free group is a free group. An example in the previous section shows that the rank can actually increase in the process of passing to the subgroup. The proof of the main result is ostensibly subtle but is relatively easy to understand in topological terms. Although we shall give the topological interpretation, we shall not pursue it further, and the proof that we give may be regarded as a translation of the topological proof into the language of algebra, combined with some steps of beautification. For purposes of the topological argument, let us think of the given free group for the moment as finitely generated, and let us suppose that the subgroup has finite index.
A free group on n symbols is the fundamental group of a bouquet of n circles, all joined at a single point, which we take as the base point. By the theory of covering spaces, any subgroup of index k is the fundamental group of some k-sheeted covering space of the bouquet of circles. This covering space is a 1-dimensional simplicial complex, and one can prove with standard tools that the fundamental group of any 1-dimensional simplicial complex is a free group. The theorem follows.


If the special hypotheses are dropped that the given free group is finitely generated and the subgroup has finite index, then the same proof is applicable as long as one allows a suitable generalization of the notion of simplicial complex. Thus the topological argument is completely general. The theorem then is as follows.

Theorem 7.10 (Nielsen–Schreier Theorem). Every subgroup of a free group is a free group.

REMARKS. The algebraic proof will occupy the remainder of the section but will occasionally be interrupted by comments about the example in the previous section. Let the given free group be $F$, let the subgroup be $H$, and form the right cosets $Hg$ in $F$. Let $C$ be a set of representatives for these cosets, with 1 chosen as the representative of the identity coset; we shall impose further conditions on $C$ shortly.

EXAMPLE, continued. For the example in the previous section, we were given a free group $F$ with two generators $x, y$, and the subgroup $H$ is taken to have generators $x^2$, $xy$, $y^2$. In fact, one readily checks that $H$ is the subgroup formed from all words of even length, and we shall think of it that way. The set $C$ of coset representatives may be taken to be $\{1, x\}$ in this case. The argument we gave that $H$ is free has points of contact with the proof we give of Theorem 7.10 but is not an exact special case of it. One point of contact is that within each generator of $H$ that we identify, there is some particular factor that does not cancel when that generator appears in a word representing a member of the subgroup.

We define a function $\rho : F \to C$ by taking $\rho(x)$ to be the representative in $C$ of the coset $Hx$ of the member $x$ of $F$. This function has the property that $\rho(hx) = \rho(x)$ for all $h$ in $H$ and $x$ in $F$. Also, $x \mapsto x\rho(x)^{-1}$ is a function from $F$ to $H$, and it is the identity function on $H$. The first lemma shows that a relatively small subset of the elements $x\rho(x)^{-1}$ is a set of generators of $H$.

Lemma 7.11. Let $S$ be the set of generators of $F$, and let $S' = S \cup S^{-1}$. Every element of $H$ is a product of elements of the form $gb\rho(gb)^{-1}$ with $g$ in $C$ and $b$ in $S'$. Furthermore the element $g' = \rho(gb)$ of $C$ has the properties that $g = \rho(g'b^{-1})$ and that $gb\rho(gb)^{-1}$ is the inverse of $g'b^{-1}\rho(g'b^{-1})^{-1}$; since $S'$ is closed under inverses, the same holds with $b$ and $b^{-1}$ interchanged, so that $gb^{-1}\rho(gb^{-1})^{-1}$ is of the form $\big(g'b\rho(g'b)^{-1}\big)^{-1}$ with $g' = \rho(gb^{-1})$. Consequently the elements $ga\rho(ga)^{-1}$ with $g$ in $C$ and $a$ in $S$ form a set of generators of $H$.


EXAMPLE, continued. In the example, we are taking $C = \{1, x\}$ and $S = \{x, y\}$. The elements $gb\rho(gb)^{-1}$ obtained with $g = 1$ and $b$ equal to $x, y, x^{-1}, y^{-1}$ are $1$, $yx^{-1}$, $x^{-1}x^{-1}$, and $y^{-1}x^{-1}$. The elements $gb\rho(gb)^{-1}$ obtained with $g = x$ and $b$ equal to $x, y, x^{-1}, y^{-1}$ are $xx$, $xy$, $1$, and $xy^{-1}$. The lemma says that $1$, $yx^{-1}$, $xx$, and $xy$ form a set of generators of $H$ and that the elements $x^{-1}x^{-1}$, $y^{-1}x^{-1}$, $1$, and $xy^{-1}$ are inverses of these generators in some order.

REMARK. The lemma needs no hypothesis that $F$ is free. A nontrivial application of the lemma with $F$ not free appears in Problem 43 at the end of the chapter.

PROOF. Any $h$ in $F$ can be written as a product $h = b_1 \cdots b_n$ with each $b_j$ in $S'$. Define $r_0 = 1$ and $r_k = \rho(b_1 \cdots b_k)$ for $1 \le k \le n$. Then

\[
h r_n^{-1} = (r_0 b_1 r_1^{-1})(r_1 b_2 r_2^{-1}) \cdots (r_{n-1} b_n r_n^{-1}).
\tag{$*$}
\]

Since $r_k = \rho(b_1 \cdots b_k) = \rho(b_1 \cdots b_{k-1} b_k) = \rho(\rho(b_1 \cdots b_{k-1}) b_k) = \rho(r_{k-1} b_k)$, we have $r_{k-1} b_k r_k^{-1} = gb\rho(gb)^{-1}$ with $g = r_{k-1}$ and $b = b_k$. Thus $(*)$ exhibits $h r_n^{-1}$ as a product of elements as in the first conclusion of the lemma. Since $r_n = \rho(b_1 \cdots b_n) = \rho(h)$, $r_n = 1$ if $h$ is in $H$. Therefore in this case, $h$ itself is a product of elements as in the statement of that conclusion, and that conclusion is now proved.

For the other conclusion, let $gb^{-1}\rho(gb^{-1})^{-1}$ be given, and put $g' = \rho(gb^{-1})$, so that $gb^{-1}g'^{-1} = h$ is in $H$. This equation implies that $g'b = h^{-1}g$. Hence $\rho(g'b) = \rho(h^{-1}g) = \rho(g) = g$, and it follows that
\[
gb^{-1}\rho(gb^{-1})^{-1} = gb^{-1}g'^{-1} = (g'bg^{-1})^{-1} = \big(g'b\rho(g'b)^{-1}\big)^{-1}.
\]
This proves the lemma.

Lemma 7.12. With $F$ free it is possible to choose the set $C$ of coset representatives in such a way that all of its members have expansions in terms of $S'$ as $g = b_1 \cdots b_n$ in which
(a) $g = b_1 b_2 \cdots b_n$ is a reduced word as written,
(b) $b_1 b_2 \cdots b_{n-1}$ is also a member of $C$.

REMARKS. It is understood from the case of $n = 1$ in (b) that 1 is the representative of the identity coset. When $C$ is chosen as in this lemma, $C$ is said to be a Schreier set. In the example, $C = \{1, x\}$ is a Schreier set. So is $C = \{1, y\}$, and hence the selection of a Schreier set may involve a choice.

PROOF. If $S'$ is finite or countably infinite, we enumerate it. In the uncountable case (which is of less practical interest), we introduce a well ordering in $S'$ by means of Zermelo's Well-Ordering Theorem as in Section A5 of the appendix.


The ordering of $S'$ will be used to define a lexicographic ordering of the set of all reduced words in the members of $S'$. If

\[
x = b_1 \cdots b_m \qquad\text{and}\qquad y = b'_1 \cdots b'_n
\tag{$*$}
\]

are reduced words with $m \le n$, we say that $x < y$ if any of the following hold:
(i) $m < n$,
(ii) $m = n$ and $b_1 < b'_1$,
(iii) $m = n$, and for some $k < m$, $b_1 = b'_1, \dots, b_k = b'_k$, and $b_{k+1} < b'_{k+1}$.
With this definition the set of reduced words is well ordered, and hence any nonempty subset of reduced words has a least element.

Let us observe that if $x, y, z$ are reduced words with $x < y$ and if $yz$ is reduced as written, then $xz < yz$ after $xz$ has been reduced. In fact, let us assume that $x$ and $y$ are as in $(*)$ and that the length of $z$ is $r$. The assumption is that $yz$ has length $n + r$, and the length of $xz$ is at most $m + r$. If $m < n$, then certainly $xz < yz$. If $m = n$ and $xz$ fails to be reduced, then the length of $xz$ is less than the length of $yz$, and $xz < yz$. If $m = n$ and $xz$ is reduced, then the first inequality $b_k < b'_k$ with $x$ and $y$ shows that $xz < yz$.

To define the set $C$ of coset representatives, let the representative of $Hg$ be the least member of the set $Hg$, each element being written as a reduced word. Since the length of the empty word is 0, the representative of the identity coset $H$ is 1 under this definition. Thus all we have to check is that an initial segment of a member of $C$ is again in $C$. Suppose that $b_1 \cdots b_n$ is in $C$, so that $b_1 \cdots b_n$ is the least element of $Hb_1 \cdots b_n$. Denote the least element of $Hb_1 \cdots b_{n-1}$ by $g$. If $g = b_1 \cdots b_{n-1}$, we are done. Otherwise $g < b_1 \cdots b_{n-1}$, and then the fact that $b_1 \cdots b_n$ is reduced implies that $gb_n < b_1 \cdots b_n$. But $gb_n$ is in $Hb_1 \cdots b_n$, and this inequality contradicts the minimality of $b_1 \cdots b_n$ in that coset. Thus we conclude that $g = b_1 \cdots b_{n-1}$. This proves the lemma.

For the remainder of the proof of Theorem 7.10, we assume, as we may by Lemma 7.12, that the set $C$ of coset representatives is a Schreier set. Typical elements of $S$ will be denoted by $a$, and typical elements of $S' = S \cup S^{-1}$ will be denoted by $b$.
Let us write $u$ for a typical element $ga\rho(ga)^{-1}$ with $g$ in $C$, and let us write $v$ for a typical element $gb\rho(gb)^{-1}$ with $g$ in $C$. The elements $u$ generate $H$ by Lemma 7.11, and each element $v$ is either an element $u$ or the inverse of an element $u$, according to the lemma. We shall prove that the elements $u$ not equal to 1 are distinct and form a free basis of $H$.

First we prove that each of the elements $v = gb\rho(gb)^{-1}$ either is reduced as written or is equal to 1. Put $g' = \rho(gb)$, so that $v = gbg'^{-1}$. Since $g$ and $g'$ are in the Schreier set $C$, they are reduced as written, and hence so are $g$ and $g'^{-1}$. Thus
the only possible cancellation in $v$ occurs because the last factor of $g$ is $b^{-1}$ or the last factor of $g'$ is $b$. If the last factor of $g$ is $b^{-1}$, then $gb$ is an initial segment of $g$ and hence is in the Schreier set $C$; thus $\rho(gb) = gb$ and $v = gb\rho(gb)^{-1} = 1$. Similarly if the last factor of $g'$ is $b$, then $g'b^{-1}$ is an initial segment of $g'$ and hence is in the Schreier set $C$; thus $\rho(g'b^{-1}) = g'b^{-1}$, and Lemma 7.11 gives $v^{-1} = \big(gb\rho(gb)^{-1}\big)^{-1} = g'b^{-1}\rho(g'b^{-1})^{-1} = 1$. Thus $v = gb\rho(gb)^{-1}$ either is reduced as written or is equal to 1.

Next let us see that the elements $v$ other than 1 are distinct. Suppose that $v = gb\rho(gb)^{-1} = g'b'\rho(g'b')^{-1}$ is different from 1. Remembering that each of these expressions is reduced as written, we see that if $g$ is shorter than $g'$, then $gb$ is an initial segment of $g'$. Since $C$ is a Schreier set, $gb$ is in $C$ and $\rho(gb) = gb$; thus $v = gb\rho(gb)^{-1}$ equals 1, contradiction. Similarly $g'$ cannot be shorter than $g$. So $g$ and $g'$ must have the same length $l$. In this case the first $l + 1$ factors must match in the two equal reduced words, and we conclude that $g = g'$ and $b = b'$. This proves the uniqueness. We know that each $v$ is either some $u$ or some $u^{-1}$, and this uniqueness shows that it cannot be both unless $v = 1$. Therefore the nontrivial $u$'s are distinct, and the nontrivial $v$'s consist of the $u$'s and their inverses, each appearing once.

Since an element $v$ not equal to 1 therefore determines its $g$ and $b$, let us refer to the factor $b$ of $v = gb\rho(gb)^{-1}$ as the significant factor of $v$. This is the part that will not cancel out when we pass from a product of $v$'s to its reduced form. Specifically suppose that we have $v = gb\rho(gb)^{-1}$ and $\bar{v} = \bar{g}\bar{b}\rho(\bar{g}\bar{b})^{-1}$, that neither of these is 1, and that $\bar{v} \ne v^{-1}$. Put $g' = \rho(gb)$ and $\bar{g}' = \rho(\bar{g}\bar{b})$. The claim is that the cancellation in forming $v\bar{v} = gbg'^{-1}\,\bar{g}\bar{b}\bar{g}'^{-1}$ does not extend to either of the significant factors $b$ and $\bar{b}$. If it does, then one of three things
happens:
(i) the $b$ in $bg'^{-1}$ gets canceled because the last factor of $g'$ is $b$, in which case $g'b^{-1}$ is an initial segment of $g'$, $g'b^{-1} = \rho(g'b^{-1}) = g$, and $v = gbg'^{-1} = 1$, or
(ii) the $\bar{b}$ in $\bar{g}\bar{b}$ gets canceled because the last factor of $\bar{g}$ is $\bar{b}^{-1}$, in which case $\bar{g}\bar{b}$ is an initial segment of $\bar{g}$, $\bar{g}\bar{b} = \rho(\bar{g}\bar{b}) = \bar{g}'$, and $\bar{v} = \bar{g}\bar{b}\bar{g}'^{-1} = 1$, or
(iii) $g'^{-1}\bar{g} = 1$ and $b\bar{b} = 1$, in which case $\bar{g} = g'$, $\bar{b} = b^{-1}$, and the middle conclusion of Lemma 7.11 allows us to conclude that $\bar{v} = v^{-1}$.
All three of these possibilities have been ruled out by our assumptions, and therefore neither of the significant factors in $v\bar{v}$ cancels.

As a consequence of this noncancellation, we can see that in any product $v_1 \cdots v_m$ of $v$'s in which no $v_k$ is 1 and no $v_{k+1}$ equals $v_k^{-1}$, none of the significant factors cancel. In fact, the previous paragraph shows that the significant factors of $v_1$ and $v_2$ survive in forming $v_1 v_2$, the significant factors of $v_2$ and $v_3$ survive in right multiplying by $v_3$, and so on. Since the nontrivial $u$'s are distinct and
the nontrivial $v$'s consist of the $u$'s and their inverses, each appearing once, we conclude that the set of nontrivial $u$'s is a free subset of $F$. Lemma 7.11 says that the $u$'s generate $H$, and therefore the set of nontrivial $u$'s is a free basis of $H$.
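For the running example the free basis produced by this proof can be computed directly. The sketch below assumes $F = F(\{x, y\})$, $H$ the subgroup of words of even length, and the Schreier set $C = \{1, x\}$, so that $\rho$ only has to look at the parity of the length. The pair encoding of symbols and the function names are hypothetical choices for the illustration.

```python
def reduce_word(word):
    """Free reduction: cancel adjacent pairs b b^{-1} with a stack."""
    out = []
    for name, exp in word:
        if out and out[-1] == (name, -exp):
            out.pop()
        else:
            out.append((name, exp))
    return out


def inverse_word(word):
    """Formal inverse: reverse the word and invert each symbol."""
    return [(name, -exp) for name, exp in reversed(word)]


X, Y = ("x", 1), ("y", 1)
S = [X, Y]            # free generators of F
C = [[], [X]]         # Schreier set {1, x} for H = words of even length


def rho(word):
    """Coset representative of H*word: 1 for even length, x for odd length."""
    return [] if len(reduce_word(word)) % 2 == 0 else [X]


def schreier_generators():
    """Nontrivial elements u = g a rho(g a)^{-1} with g in C and a in S."""
    gens = []
    for g in C:
        for a in S:
            ga = g + [a]
            u = reduce_word(ga + inverse_word(rho(ga)))
            if u:                   # discard the cases with u = 1
                gens.append(u)
    return gens
```

The function returns the three words $yx^{-1}$, $xx$, and $xy$, in agreement with the list in the continued example and with the earlier observation that this subgroup has rank 3.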

3. Free Products

The free abelian group on an index set $S$, as constructed in Section IV.9, has a universal mapping property that allows arbitrary functions from $S$ into any target abelian group to be extended to homomorphisms of the free abelian group into the target group. The construction of free groups in Section 1 was arranged to adapt the construction so that the target group in the universal mapping property could be any group, abelian or nonabelian.

In this section we make a similar adaptation of the construction of a direct sum of abelian groups so that the result is applicable in a context of arbitrary groups. Proposition 4.17 gave the universal mapping property of the external direct sum $\bigoplus_{s \in S} G_s$ of a set of abelian groups with associated embedding homomorphisms $i_{s_0} : G_{s_0} \to \bigoplus_{s \in S} G_s$. The statement is that if $H$ is any abelian group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : G_s \to H$, then there exists a unique group homomorphism $\varphi : \bigoplus_{s \in S} G_s \to H$ such that $\varphi \circ i_{s_0} = \varphi_{s_0}$ for all $s_0 \in S$. Example 2 of coproducts in Section IV.11 shows that direct sum is therefore the coproduct functor in the category of all abelian groups.

This universal mapping property of $\bigoplus_{s \in S} G_s$ fails when $H$ is a nonabelian group such as the symmetric group $S_3$. In fact, $S_3$ has an element of order 2 and an element of order 3 and hence admits nontrivial homomorphisms $\varphi_2 : C_2 \to S_3$ and $\varphi_3 : C_3 \to S_3$. But there is no homomorphism $\varphi : C_2 \oplus C_3 \to S_3$ such that $\varphi \circ i_2 = \varphi_2$ and $\varphi \circ i_3 = \varphi_3$ because the image of $\varphi$ has to be abelian but the images of $\varphi_2$ and $\varphi_3$ do not commute. Consequently direct sum cannot extend to a coproduct functor in the category of all groups. Instead, the appropriate group constructed from $C_2$ and $C_3$ for this kind of universal mapping property is the "free product" of $C_2$ and $C_3$, denoted by $C_2 * C_3$. In this section we construct the free product of any set of groups, finite or infinite.
Also, we establish its universal mapping property and identify it in terms of generators and relations. The prototype of a free product is the free group F(S), which equals a free product of copies of Z indexed by S. A free product is always an inﬁnite group if at least two of the factors are not 1-element groups. An important application of free products occurs in the theory of the fundamental group in topology: if X is a topological space for which the theory of covering spaces is applicable, and if A and B are open subsets of X with X = A ∪ B such that A ∩ B is nonempty, connected, and simply connected, then the fundamental

group of $X$ is the free product of the fundamental group of $A$ and the fundamental group of $B$. This result, together with a generalization that no longer requires $A \cap B$ to be simply connected, is known as the Van Kampen Theorem.

Let $S$ be a nonempty set of groups $G_s$ for $s$ in $S$. The set $S$ is allowed to be infinite, but in practice it often has just two elements. We shall describe the group defined to be the free product $G = *_{s \in S}\, G_s$. We start from the set $W(\{G_s\})$ of all words built from the groups $G_s$. This consists of all finite sequences $g_1 \cdots g_n$ with each $g_i$ in some $G_s$ depending on $i$. The length of a word is the number of factors in it. The empty word is denoted by 1. We multiply two words by writing them end to end, and the resulting operation of multiplication is associative. A word is said to be equivalent to a second word if the first can be obtained from the second by a finite sequence of steps of the following kinds and their inverses:
(i) drop a factor for which $g_i$ is the identity element of the group in which it lies,
(ii) collapse two factors $g_i g_{i+1}$ to a single one $g_i^*$ if $g_i$ and $g_{i+1}$ lie in the same $G_s$ and their product in that group is $g_i^*$.
The result is an equivalence relation, and the set of equivalence classes is the underlying set of $*_{s \in S}\, G_s$.
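Steps (i) and (ii) can be applied greedily to shorten a word. The sketch below does this for the free product $C_2 * C_3$, under the assumptions that a factor is stored as a pair (label of the group, residue) with each cyclic group written additively, and that the labels `"a"` and `"b"` and the table `ORDERS` are illustrative names. The text at this point defines only the equivalence relation; the sketch does not by itself show that its output is a canonical representative of the class.

```python
ORDERS = {"a": 2, "b": 3}   # G_a = C_2 and G_b = C_3, written additively


def reduce_free_product_word(word):
    """Apply steps (i) and (ii) until no identity factor and no two
    adjacent factors from the same group remain."""
    out = []
    for s, g in word:
        g %= ORDERS[s]
        if out and out[-1][0] == s:
            _, prev = out.pop()
            g = (prev + g) % ORDERS[s]   # step (ii): collapse within G_s
        if g != 0:                        # step (i): drop an identity factor
            out.append((s, g))
    return out
```

For instance, the word with factors $(a,1), (b,1), (b,2), (a,1), (b,2)$ shortens to the single factor $(b,2)$, whereas a word that alternates nonidentity factors from the two groups is left unchanged.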

Theorem 7.13. If $S$ is a nonempty set of groups $G_s$ and $W(\{G_s\})$ is the set of all words from the groups $G_s$, then the product operation defined on $W(\{G_s\})$ descends in a well-defined fashion to the set $*_{s \in S}\, G_s$ of equivalence classes of members of $W(\{G_s\})$, and $*_{s \in S}\, G_s$ thereby becomes a group. For each $s_0$ in $S$, define $i_{s_0} : G_{s_0} \to *_{s \in S}\, G_s$ to be the group homomorphism obtained as the composition of the inclusion of $G_{s_0}$ into words of length 1 followed by passage to equivalence classes. Then the pair $\big(*_{s \in S}\, G_s,\ \{i_s\}\big)$ has the following universal mapping property: whenever $H$ is a group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : G_s \to H$, then there exists a unique group homomorphism $\varphi : *_{s \in S}\, G_s \to H$ such that $\varphi \circ i_{s_0} = \varphi_{s_0}$ for all $s_0 \in S$.

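$$\begin{array}{ccc}
G_{s_0} & \xrightarrow{\ \varphi_{s_0}\ } & H \\
{\scriptstyle i_{s_0}}\big\downarrow & \nearrow{\scriptstyle \varphi} & \\
*_{s\in S} G_s & &
\end{array}$$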

FIGURE 7.2. Universal mapping property of a free product.

REMARKS. The group $*_{s\in S} G_s$ is called the free product of the groups $G_s$. Figure 7.2 illustrates its universal mapping property. This universal mapping property actually characterizes $*_{s\in S} G_s$, as will be seen in Proposition 7.14. One


3. Free Products


often writes $G_1 * \cdots * G_n$ when the set S is finite; the order of listing the groups is immaterial. The proof of Theorem 7.13 is rather similar to the proof of Theorem 7.1, and we shall skip some details.

PROOF. Let us write ∼ for the equivalence relation on words, and let us denote equivalence classes by brackets. We want to define multiplication in $*_{s\in S} G_s$ by $[w_1][w_2] = [w_1 w_2]$. To see that this formula makes sense in $*_{s\in S} G_s$, let $x$, $x'$, and $y$ be words in $W(\{G_s\})$, and suppose that $x$ and $x'$ differ by only one operation of type (i) or type (ii) as above. Then $x \sim x'$, and it is evident that $xy \sim x'y$ and $yx \sim yx'$. Iteration of this kind of relationship shows that $w_1 \sim w_1'$ and $w_2 \sim w_2'$ implies $w_1 w_2 \sim w_1' w_2'$, and hence multiplication is well defined.

The associativity of multiplication in $W(\{G_s\})$ implies that multiplication in $*_{s\in S} G_s$ is associative, and $[1]$ is a two-sided identity. We readily check that if $g = g_1 \cdots g_n$ is a word, then the word $g^{-1} = g_n^{-1} \cdots g_1^{-1}$ has the property that $[g^{-1}]$ is a two-sided inverse to $[g]$. Therefore $*_{s\in S} G_s$ is a group.

The uniqueness of the homomorphism $\varphi$ in the universal mapping property is no problem since all words are products of words of length 1 and since the subgroups $i_{s_0}(G_{s_0})$ together generate $*_{s\in S} G_s$.

For existence of $\varphi$, we begin by defining a function $\Phi : W(\{G_s\}) \to H$ such that
$$\Phi(g_s) = \varphi_s(g_s) \quad \text{for } g_s \text{ in } G_s \text{ when viewed as a word of length 1},$$
$$\Phi(w_1 w_2) = \Phi(w_1)\Phi(w_2) \quad \text{for } w_1 \text{ and } w_2 \text{ in } W(\{G_s\}).$$
We take the formulas $\Phi(g_s) = \varphi_s(g_s)$ for $g_s$ in $G_s$ as a definition of $\Phi$ on words of length 1. Any member of $W(\{G_s\})$ can be written uniquely as $g_1 \cdots g_n$ with each $g_i$ in $G_{s_i}$, and we set $\Phi(g_1 \cdots g_n) = \Phi(g_1) \cdots \Phi(g_n)$. (If n = 0, the understanding is that $\Phi(1) = 1$.) Then $\Phi$ has the required properties.

Let us show that $w' \sim w$ implies $\Phi(w') = \Phi(w)$. The questions are whether
(i) if $g_1, \dots, g_n$ are in various $G_s$'s with $g_i$ equal to the identity $1_{s_i}$ of $G_{s_i}$, then
$$\Phi(g_1 \cdots g_{i-1}\, 1_{s_i}\, g_{i+1} \cdots g_n) \overset{?}{=} \Phi(g_1 \cdots g_{i-1} g_{i+1} \cdots g_n),$$
(ii) if $g_1, \dots, g_n$ are in various $G_s$'s with $G_{s_i} = G_{s_{i+1}}$ and if $g_i g_{i+1} = g_i^*$ in $G_{s_i}$, then
$$\Phi(g_1 \cdots g_{i-1} g_i g_{i+1} g_{i+2} \cdots g_n) \overset{?}{=} \Phi(g_1 \cdots g_{i-1} g_i^* g_{i+2} \cdots g_n).$$

VII. Advanced Group Theory


In the case of (i), the question comes down to whether a certain $h\,\Phi(1_{s_i})\,h'$ in H equals $hh'$, and this is true because $\Phi(1_{s_i}) = \varphi_{s_i}(1_{s_i})$ is the identity of H. In the case of (ii), the question comes down to whether $h\,\Phi(g_i)\Phi(g_{i+1})\,h'$ equals $h\,\Phi(g_i^*)\,h'$ if $G_{s_i} = G_{s_{i+1}}$ and $g_i g_{i+1} = g_i^*$ in $G_{s_i}$, and this is true because
$$\Phi(g_i)\Phi(g_{i+1}) = \varphi_{s_i}(g_i)\varphi_{s_i}(g_{i+1}) = \varphi_{s_i}(g_i g_{i+1}) = \varphi_{s_i}(g_i^*) = \Phi(g_i^*).$$
We conclude that $w' \sim w$ implies $\Phi(w') = \Phi(w)$. We may therefore define $\varphi([w]) = \Phi(w)$ for $[w]$ in $*_{s\in S} G_s$, and $\varphi$ is a homomorphism of $*_{s\in S} G_s$ into H as a consequence of the property $\Phi(w_1 w_2) = \Phi(w_1)\Phi(w_2)$ of $\Phi$ on $W(\{G_s\})$. For $g_s$ in $G_s$, we have $\varphi([g_s]) = \Phi(g_s) = \varphi_s(g_s)$, i.e., $\varphi(i_s(g_s)) = \varphi_s(g_s)$. This completes the proof of existence.

Proposition 7.14. Let S be a nonempty set of groups $G_s$. Suppose that $G'$ is a group and that $i_s' : G_s \to G'$ for $s \in S$ is a system of group homomorphisms with the following universal mapping property: whenever H is a group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : G_s \to H$, then there exists a unique group homomorphism $\varphi : G' \to H$ such that $\varphi \circ i_s' = \varphi_s$ for all $s \in S$. Then there exists a unique group homomorphism $\Phi : *_{s\in S} G_s \to G'$ such that $i_s' = \Phi \circ i_s$ for all $s \in S$. Moreover, $\Phi$ is a group isomorphism, and the homomorphisms $i_s' : G_s \to G'$ are one-one.

REMARKS. As was true with Proposition 7.2, readers who have been through Chapter VI will recognize that Proposition 7.14 is a special case of Problem 19 at the end of that chapter.

PROOF. Put $G = *_{s\in S} G_s$. In the universal mapping property of Theorem 7.13, let $H = G'$ and $\varphi_s = i_s'$, and let $\Phi : G \to G'$ be the homomorphism $\varphi$ produced by that theorem. Then $\Phi$ satisfies $\Phi \circ i_s = i_s'$ for all s. Reversing the roles of $G$ and $G'$, we obtain a homomorphism $\Phi' : G' \to G$ with $\Phi' \circ i_s' = i_s$ for all s. Therefore $(\Phi' \circ \Phi) \circ i_s = \Phi' \circ i_s' = i_s$. Comparing $\Phi' \circ \Phi$ with the identity $1_G$ and applying the uniqueness in the universal mapping property for $G$, we see that $\Phi' \circ \Phi = 1_G$. Similarly the uniqueness in the universal mapping property of $G'$ gives $\Phi \circ \Phi' = 1_{G'}$. Thus $\Phi$ is a group isomorphism. It is uniquely determined by the given properties since the various subgroups $i_s(G_s)$ generate $G$. Since $i_s' = \Phi \circ i_s$ and since $\Phi$ and $i_s$ are one-one, $i_s'$ is one-one.

As was the case for free groups, we want a decision procedure for telling whether two given words in $W(\{G_s\})$ are equivalent. This is the so-called word problem for the free product. Solving it allows us to use free products concretely, just as Proposition 7.3 allowed us to use free groups concretely. A word in $W(\{G_s\})$ is said to be reduced if it
(i) contains no factor for which $g_i$ is the identity element of the group $G_s$ in which it lies,


(ii) contains no two consecutive factors $g_i$ and $g_{i+1}$ taken from the same group $G_s$.

Proposition 7.15 (solution of the word problem for free products). If S is a nonempty set of groups $G_s$ and $W(\{G_s\})$ is the set of all words from the groups $G_s$, then each word in $W(\{G_s\})$ is equivalent to one and only one reduced word.

EXAMPLE. Consider the free product $C_2 * C_2$ of two cyclic groups, one with x as generator and the other with y as generator. Words consist of a finite sequence of factors of x, y, the identity of the first factor, and the identity of the second factor. A word is reduced if no factor is an identity and if no two x's are adjacent and no two y's are adjacent. Thus the reduced words consist of finite sequences whose terms are alternately x and y. Those of length ≤ 3 are $1, x, y, xy, yx, xyx, yxy$, and in general there are two of each length > 0. The proposition tells us that all these reduced words give distinct members of $C_2 * C_2$. In particular, the group is infinite.

REMARK. More generally, to test whether two words are equivalent, the proposition says to eliminate factors of the identity and multiply consecutive factors in each word when they come from the same group, and repeat these steps until it is no longer possible to do either of these operations on either word. Then each of the given words has been replaced by a reduced word, and the two given words are equivalent if and only if the two reduced words are identical. Problems 37–46 at the end of the chapter concern $C_2 * C_3$, and some of these problems make use of the result of this proposition, namely that distinct reduced words are inequivalent.

PROOF OF PROPOSITION 7.15. Both operations (eliminating factors of the identity and multiplying consecutive factors in each word when they come from the same group) reduce the length of a word.
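The reduction procedure just described can be sketched in code. The following is a minimal illustration, not part of the text, under the simplifying assumption that every group $G_s$ is a finite cyclic group; a factor is encoded as a hypothetical pair `(s, k)` meaning the k-th power of the generator of the group labeled `s`, and the names `reduce_word`, `equivalent`, and `reduced_words_of_length` are chosen here only for illustration.

```python
from typing import Dict, List, Tuple

# A factor (s, k) is the k-th power of the generator of the cyclic group labeled s.
Factor = Tuple[str, int]


def reduce_word(word: List[Factor], orders: Dict[str, int]) -> List[Factor]:
    """Return the reduced word equivalent to `word`.

    orders[s] is the order of the cyclic group labeled s (an assumption of this
    sketch). Factors are read left to right, and the part read so far is kept
    in reduced form.
    """
    reduced: List[Factor] = []
    for s, k in word:
        k %= orders[s]
        if k == 0:
            continue  # operation (i): drop an identity factor
        if reduced and reduced[-1][0] == s:
            # operation (ii): multiply with the previous factor from the same group
            total = (reduced[-1][1] + k) % orders[s]
            reduced.pop()
            if total != 0:
                reduced.append((s, total))
        else:
            reduced.append((s, k))
    return reduced


def equivalent(w1: List[Factor], w2: List[Factor], orders: Dict[str, int]) -> bool:
    """Two words are equivalent exactly when their reduced words are identical."""
    return reduce_word(w1, orders) == reduce_word(w2, orders)


def reduced_words_of_length(n: int, orders: Dict[str, int]) -> List[List[Factor]]:
    """List all reduced words of length n: nonidentity factors, adjacent groups differ."""
    words: List[List[Factor]] = [[]]
    for _ in range(n):
        words = [w + [(s, k)]
                 for w in words
                 for s in orders
                 for k in range(1, orders[s])
                 if not w or w[-1][0] != s]
    return words
```

For $C_2 * C_2$ one would call `reduced_words_of_length(3, {"x": 2, "y": 2})`, which lists the two alternating words $xyx$ and $yxy$ from the example; processing the word left to right mirrors, in spirit, the inductive sequence used in the uniqueness argument.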
Since the length has to remain ≥ 0, the process of successively carrying out these two operations as much as possible has to stop after finitely many steps, and the result is a reduced word. This proves that each equivalence class of words contains a reduced word.

For uniqueness of the reduced word in an equivalence class, we proceed somewhat as with Proposition 7.3, associating to each word a finite sequence of reduced words such that the last member of the sequence is unchanged when we apply an operation to the word that preserves equivalence. However, there are considerably more details to check this time. If $w = g_1 \cdots g_n$ is a given word with each $g_i$ in $G_{s_i}$, then we associate to w the sequence of reduced words $x_0, x_1, \dots, x_n$ defined inductively by $x_0 = 1$,
$$x_1 = \begin{cases} g_1 & \text{if } g_1 \text{ is not the identity of } G_{s_1}, \\ 1 & \text{if } g_1 \text{ is the identity of } G_{s_1}, \end{cases}$$


and the following formula for i ≥ 2 if $x_{i-1}$ is of the reduced form $h_1 \cdots h_k$ with $h_j$ in $G_{t_j}$:
$$x_i = \begin{cases}
h_1 \cdots h_k g_i & \text{if } G_{s_i} \neq G_{t_k} \text{ and } g_i \text{ is not the identity } 1_{G_{s_i}} \text{ of } G_{s_i}, \\
h_1 \cdots h_k & \text{if } g_i \text{ is the identity } 1_{G_{s_i}} \text{ of } G_{s_i}, \\
h_1 \cdots h_{k-1} & \text{if } G_{t_k} = G_{s_i} \text{ with } h_k g_i = 1_{G_{s_i}}, \\
h_1 \cdots h_{k-1} g_i^* & \text{if } G_{t_k} = G_{s_i} \text{ with } h_k g_i = g_i^* \neq 1_{G_{s_i}}.
\end{cases}$$
Put $r(w) = x_n$. We check inductively for i ≥ 0 that each $x_i$ is reduced. In fact, $x_i$ for i ≥ 2 begins in every case with $h_1 \cdots h_{k-1}$, which is assumed reduced. The only possible reduction for $x_i$ thus comes from factors that are adjoined or from interference with $h_{k-1}$, and all possibilities are addressed in the above choices. Thus $r(w) = x_n$ is necessarily reduced for each word w.

If $g_1 \cdots g_n$ is reduced as given, then $x_i$ is determined by the first possible choice $h_1 \cdots h_k g_i$ every time, and hence $x_i = g_1 \cdots g_i$ for all i. Therefore we obtain $r(w) = w$ if w is reduced.

Now consider the equivalent words
$$w = g_1 \cdots g_j g_{j+1} \cdots g_n \quad\text{and}\quad w' = g_1 \cdots g_j\, 1_{G_s}\, g_{j+1} \cdots g_n.$$
Form $x_0, \dots, x_n$ for $w$ and $x_0', \dots, x_{n+1}'$ for $w'$. Then we have $x_j' = x_j$; let $h_1 \cdots h_k$ be a reduced form of $x_j$. The formula for $x_{j+1}'$ is governed by the second choice in the display, and $x_{j+1}' = h_1 \cdots h_k = x_j$. Then $x_{j+i+1}' = x_{j+i}$ for $1 \le i \le n - j$ as well. Hence $x_{n+1}' = x_n$, and $r(w') = r(w)$.

Next suppose that $g_j^* = g_j g_{j+1}$ in $G_{s_j}$, and consider the equivalent words
$$w = g_1 \cdots g_{j-1}\, g_j^*\, g_{j+2} \cdots g_n \quad\text{and}\quad w' = g_1 \cdots g_{j-1}\, g_j g_{j+1}\, g_{j+2} \cdots g_n.$$
As above, form $x_0, \dots, x_n$ for $w$ and $x_0', \dots, x_{n+1}'$ for $w'$. Then we have $x_{j-1} = x_{j-1}'$, and we let $h_1 \cdots h_k$ be a reduced form of $x_{j-1}$. There are cases, subcases, and subsubcases.

First assume $G_{t_k} \neq G_{s_j}$. Then $x_j$ equals $h_1 \cdots h_k g_j^*$ or $h_1 \cdots h_k$ in the two subcases $g_j^* \neq 1_{G_{s_j}}$ and $g_j^* = 1_{G_{s_j}}$. In the first subcase, we have $g_j^* \neq 1_{G_{s_j}}$ and $x_j = h_1 \cdots h_k g_j^*$. Then $x_j'$ equals $h_1 \cdots h_k g_j$ or $h_1 \cdots h_k$ in the two subsubcases $g_j \neq 1_{G_{s_j}}$ and $g_j = 1_{G_{s_j}}$. In the first subsubcase, $x_{j+1}' = h_1 \cdots h_k g_j^* = x_j$ whether or not $g_{j+1} = 1_{G_{s_j}}$. In the second subsubcase, $g_{j+1} = g_j g_{j+1} = g_j^*$ cannot be $1_{G_{s_j}}$, and therefore $x_{j+1}' = h_1 \cdots h_k g_j^* = x_j$. In the second subcase of the case $G_{t_k} \neq G_{s_j}$, we have $g_j^* = 1_{G_{s_j}}$ and $x_j = x_{j-1} = h_1 \cdots h_k$. Then $x_j'$ equals $h_1 \cdots h_k g_j$ or $h_1 \cdots h_k$ in the two subsubcases $g_j \neq 1_{G_{s_j}}$ and $g_j = 1_{G_{s_j}}$. In both subsubcases, $x_{j+1}' = h_1 \cdots h_k$, so that $x_{j+1}' = x_j$.


Now assume $G_{t_k} = G_{s_j}$. Then $x_j$ equals $h_1 \cdots h_{k-1} h_k^*$ or $h_1 \cdots h_{k-1}$ in the two subcases $h_k g_j^* = h_k^* \neq 1_{G_{s_j}}$ and $h_k g_j^* = 1_{G_{s_j}}$. In the first subcase, we have $h_k g_j^* = h_k^* \neq 1_{G_{s_j}}$ and $x_j = h_1 \cdots h_{k-1} h_k^*$. Then $x_j'$ equals $h_1 \cdots h_{k-1} h_k'$ or $h_1 \cdots h_{k-1}$ in the two subsubcases $h_k g_j = h_k' \neq 1_{G_{s_j}}$ and $h_k g_j = 1_{G_{s_j}}$. In the first subsubcase, $h_k' g_{j+1} = h_k g_j g_{j+1} = h_k g_j^* = h_k^*$ implies $x_{j+1}' = h_1 \cdots h_{k-1} h_k^* = x_j$. In the second subsubcase, we know that $h_k^*$ cannot be $1_{G_{s_j}}$ and hence that $g_{j+1} = h_k g_j g_{j+1} = h_k g_j^* = h_k^*$ cannot be $1_{G_{s_j}}$; thus $x_{j+1}' = h_1 \cdots h_{k-1} h_k^* = x_j$. In the second subcase of the case $G_{t_k} = G_{s_j}$, we have $h_k g_j^* = 1_{G_{s_j}}$ and $x_j = h_1 \cdots h_{k-1}$. Then $x_j'$ equals $h_1 \cdots h_{k-1} h_k^*$ or $h_1 \cdots h_{k-1}$ in the two subsubcases $h_k g_j = h_k^* \neq 1_{G_{s_j}}$ and $h_k g_j = 1_{G_{s_j}}$. In the first subsubcase, $g_{j+1}$ cannot be $1_{G_{s_j}}$ but $h_k^* g_{j+1} = h_k g_j g_{j+1} = h_k g_j^* = 1_{G_{s_j}}$; hence $x_{j+1}' = h_1 \cdots h_{k-1} = x_j$. In the second subsubcase, $x_j' = h_1 \cdots h_{k-1}$ and $g_{j+1} = 1_{G_{s_j}}$, so that $x_{j+1}' = h_1 \cdots h_{k-1} = x_j$.

We conclude that $x_{j+1}' = x_j$ in all cases. Hence $x_{j+i+1}' = x_{j+i}$ for $0 \le i \le n - j$, $x_{n+1}' = x_n$, and $r(w') = r(w)$. Consequently the only reduced word that is equivalent to w is $r(w)$.

Proposition 7.16. Let S be a nonempty set of groups $G_s$, and suppose that $\langle S_s; R_s\rangle$ is a presentation of $G_s$, the sets $S_s$ being understood to be disjoint for $s \in S$. Then $\big\langle \bigcup_{s\in S} S_s;\ \bigcup_{s\in S} R_s \big\rangle$ is a presentation of the free product $*_{s\in S} G_s$.

REMARK. One effect of this proposition is to make Proposition 7.8 available as a tool for use with free products. Using Proposition 7.8 may be easier than appealing to the universal mapping property in Theorem 7.13.

PROOF. Put $S' = \bigcup_{s\in S} S_s$ and $R' = \bigcup_{s\in S} R_s$, and define $G'$ to be a group given by generators and relations as $G' = \langle S'; R'\rangle$. Consider the function from $S_s$ into the quotient group $G' = F(S')/N(R')$ given by carrying x in $S_s$ into the word x in $S'$ and then passing to $F(S')$ and its quotient $G'$. Because of the universal mapping property of free groups, this function extends to a group homomorphism $\tilde{i}_s : F(S_s) \to G'$. If r is a reduced word relative to $S_s$ representing a member of $R_s$, then r is carried by $\tilde{i}_s$ into a member of the larger set $R'$ and then into the identity of $G'$. Since $\ker \tilde{i}_s$ is normal in $F(S_s)$, $\ker \tilde{i}_s$ contains the smallest normal subgroup $N(R_s)$ in $F(S_s)$ that contains $R_s$. Proposition 4.11 shows that $\tilde{i}_s$ descends to a group homomorphism $i_s : G_s \to G'$.

We shall prove that $G'$ and the system $\{i_s\}$ have the universal mapping property of Proposition 7.14 that characterizes a free product. Then it will follow from that proposition that $G' \cong *_{s\in S} G_s$, and the proof will be complete. Thus let H be a group, and let $\{\varphi_s \mid s \in S\}$ be a system of group homomorphisms $\varphi_s : G_s \to H$. We are to produce a homomorphism $\Phi : G' \to H$ such that $\Phi \circ i_s = \varphi_s$ for all s, and we are to prove that such a homomorphism


is unique. Let $q_s : F(S_s) \to G_s$ be the quotient homomorphism, and define $\tilde{\varphi}_s : F(S_s) \to H$ by $\tilde{\varphi}_s = \varphi_s \circ q_s$. Now define $\tilde{\Phi} : S' \to H$ as follows: if x is in $S'$, then x is in a set $S_s$ for a unique s and thereby defines a member of $F(S_s)$ for that unique s; $\tilde{\Phi}(x)$ is taken to be $\tilde{\varphi}_s(x)$. The universal mapping property of the free group $F(S')$ allows us to extend $\tilde{\Phi}$ to a group homomorphism, which we continue to call $\tilde{\Phi}$, of $F(S')$ into H. Let r be a nontrivial relation in $R' \subseteq F(S')$. Then r, by hypothesis of disjointness for the sets $S_s$, lies in a unique $R_s$. Hence $\tilde{\Phi}(r) = \tilde{\varphi}_s(r) = \varphi_s(q_s(r)) = \varphi_s(1_s) = 1_H$. Consequently the kernel of $\tilde{\Phi}$ contains the smallest normal subgroup $N(R')$ of $F(S')$ containing $R'$, and $\tilde{\Phi}$ descends to a homomorphism $\Phi : G' \to H$. This satisfies
$$\Phi \circ i_s \circ q_s = \Phi \circ \tilde{i}_s = \tilde{\Phi}\big|_{F(S_s)} = \tilde{\varphi}_s = \varphi_s \circ q_s.$$
Since the quotient homomorphism $q_s$ is onto $G_s$, we obtain $\Phi \circ i_s = \varphi_s$, and existence of the homomorphism $\Phi$ is established.

For uniqueness, we observe that the identities $\Phi \circ i_s = \varphi_s$ imply that $\Phi$ is uniquely determined on the subgroup of $G'$ generated by the images of all $i_s$. Since $q_s$ is onto $G_s$, this subgroup is the same as the subgroup generated by the images of all $\tilde{i}_s$. This subgroup contains the image in $G'$ of every generator of $F(S')$ and hence is all of $G'$. Thus $\Phi$ is uniquely determined.

4. Group Representations

Group representations were defined in Section IV.6 as group actions on vector spaces by invertible linear functions. The underlying field of the vector space will be taken to be $\mathbb{C}$ in this section and the next, and the theory will then be especially tidy.

The subject of group representations is one that uses a mix of linear algebra and group theory to reveal hidden structure within group actions. It has broad applications to algebra and analysis, but we shall be most interested in an application to finite groups known as Burnside's Theorem that will be proved in the next section.

Let us begin with the abelian case, taking G for the moment to be a finite abelian group. A multiplicative character of G is a homomorphism $\chi : G \to S^1 \subseteq \mathbb{C}^\times$ of G into the multiplicative group of complex numbers of absolute value 1. The multiplicative characters form an abelian group $\widehat{G}$ under pointwise multiplication of their complex values: $(\chi\chi')(g) = \chi(g)\chi'(g)$. The identity of $\widehat{G}$ is the multiplicative character that is identically 1 on G, and the inverse of $\chi$ is the complex conjugate of $\chi$.

The notion of multiplicative character adapts to the case of a finite group the familiar exponential functions $x \mapsto e^{inx}$ on the line, which can be regarded as multiplicative characters of the additive group $\mathbb{R}/2\pi\mathbb{Z}$ of real numbers modulo $2\pi$. These functions have long been used to resolve a periodic function of


time into its component frequencies: The device is the Fourier series of the function f. If f is periodic of period $2\pi$, then the Fourier coefficients of f are $c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx$, and the Fourier series of f is the infinite series $\sum_{n=-\infty}^{\infty} c_n e^{inx}$. A portion of the subject of Fourier series looks for senses in which $f(x)$ is actually equal to the sum of its Fourier series. This is the problem of Fourier inversion.

A similar problem can be formulated when $\mathbb{R}/2\pi\mathbb{Z}$ is replaced by the finite abelian group G. The exponential functions are replaced by the multiplicative characters. One can form an analog of Fourier coefficients for the vector space $C(G,\mathbb{C})$ of complex-valued functions² defined on G, and then one can form the analog of the Fourier series of the function. The problem of Fourier inversion becomes one of linear algebra, once we take into account the known structure of all finite abelian groups (Theorem 4.56). The result is as follows.

Theorem 7.17 (Fourier inversion formula for finite abelian groups). Let G be a finite abelian group, and introduce an inner product on the complex vector space $C(G,\mathbb{C})$ of all functions from G to $\mathbb{C}$ by the formula
$$\langle F, F'\rangle = \sum_{g\in G} F(g)\overline{F'(g)},$$
the corresponding norm being $\|F\| = \langle F, F\rangle^{1/2}$. Then the members of $\widehat{G}$ form an orthogonal basis of $C(G,\mathbb{C})$, each $\chi$ in $\widehat{G}$ satisfying $\|\chi\|^2 = |G|$. Consequently $|\widehat{G}| = |G|$, and any function $F : G \to \mathbb{C}$ is given by the "sum of its Fourier series":
$$F(g) = \frac{1}{|G|} \sum_{\chi\in\widehat{G}} \Big( \sum_{h\in G} F(h)\overline{\chi(h)} \Big) \chi(g).$$
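As a numerical illustration of Theorem 7.17, the sketch below (not part of the text) builds the multiplicative characters of a group written as a direct sum $\mathbb{Z}/m_1\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/m_k\mathbb{Z}$, using the characters $j \mapsto e^{2\pi i jr/m}$ on each summand, and rebuilds a function from its Fourier coefficients. The function names and the encoding of group elements as tuples of integers are assumptions made only for this example.

```python
import cmath
from itertools import product


def elements(moduli):
    """Elements of Z/m1 x ... x Z/mk as tuples of residues."""
    return list(product(*(range(m) for m in moduli)))


def characters(moduli):
    """Multiplicative characters of Z/m1 x ... x Z/mk, one for each tuple r."""
    def make(r):
        def chi(g):
            # product over the summands of exp(2*pi*i*g_j*r_j/m_j)
            angle = sum(2 * cmath.pi * gj * rj / m for gj, rj, m in zip(g, r, moduli))
            return cmath.exp(1j * angle)
        return chi
    return [make(r) for r in elements(moduli)]


def inner(F1, F2, moduli):
    """<F1, F2> = sum over g of F1(g) * conjugate(F2(g))."""
    return sum(F1(g) * F2(g).conjugate() for g in elements(moduli))


def fourier_inversion(F, moduli):
    """Return the function g -> (1/|G|) * sum over chi of <F, chi> chi(g)."""
    chars = characters(moduli)
    order = len(elements(moduli))

    def rebuilt(g):
        return sum(inner(F, chi, moduli) * chi(g) for chi in chars) / order

    return rebuilt
```

For instance, with `moduli = (2, 3)` the rebuilt function should agree with `F` at every element up to floating-point rounding, and distinct characters should have inner product numerically zero.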

REMARKS. This theorem is one of the ingredients in the proof in Advanced Algebra of Dirichlet's theorem that if a and b are positive relatively prime integers, then there are infinitely many primes of the form an + b. In applications to engineering, the ordinary Fourier transform on the line is often approximated, for computational purposes, by a Fourier series on a large cyclic group, and then Theorem 7.17 is applicable. Such a Fourier series can be computed with unexpected efficiency using a special grouping of terms; this device is called the fast Fourier transform and is described in Problems 29–31 at the end of the chapter.

² The notation $C(G,\mathbb{C})$ is to be suggestive of what happens for $G = S^1$ and for $G = \mathbb{R}^1$, where one works in part with the space of continuous complex-valued functions vanishing off a bounded set. In any event, pointwise multiplication makes $C(G,\mathbb{C})$ into a commutative ring. Later in the section we introduce a second multiplication, called "convolution," that makes $C(G,\mathbb{C})$ into a ring in a different way. In Chapter VIII we shall introduce the "complex group algebra" $\mathbb{C}G$ of G. The vector space $C(G,\mathbb{C})$ is the dual vector space of $\mathbb{C}G$. However, $C(G,\mathbb{C})$ and $\mathbb{C}G$ are canonically isomorphic because they have distinguished bases, and the isomorphism respects the multiplication structures: convolution in $C(G,\mathbb{C})$ and the group-algebra multiplication in $\mathbb{C}G$.


PROOF. For orthogonality let $\chi$ and $\chi'$ be distinct members of $\widehat{G}$, and put $\chi'' = \chi\overline{\chi'} = \chi\chi'^{-1}$. Choose $g_0$ in G with $\chi''(g_0) \neq 1$. Then
$$\chi''(g_0) \sum_{g\in G} \chi''(g) = \sum_{g\in G} \chi''(g_0 g) = \sum_{g\in G} \chi''(g),$$
so that
$$[1 - \chi''(g_0)] \sum_{g\in G} \chi''(g) = 0$$
and therefore $\sum_{g\in G} \chi''(g) = 0$. Consequently
$$\langle \chi, \chi'\rangle = \sum_{g\in G} \chi(g)\overline{\chi'(g)} = \sum_{g\in G} \chi''(g) = 0.$$
The orthogonality implies that the members of $\widehat{G}$ are linearly independent, and we obtain $|\widehat{G}| \le \dim C(G,\mathbb{C}) = |G|$. Certainly $\|\chi\|^2 = \sum_{g\in G} |\chi(g)|^2 = \sum_{g\in G} 1 = |G|$.

To see that the members of $\widehat{G}$ are a basis of $C(G,\mathbb{C})$, we write G as a direct sum of cyclic groups, by Theorem 4.56. A summand $\mathbb{Z}/m\mathbb{Z}$ has at least m distinct multiplicative characters, given by $j \bmod m \mapsto e^{2\pi i jr/m}$ for $0 \le r \le m - 1$, and these characters extend to G as 1 on the other direct summands of G. Taking products of such multiplicative characters from the different summands of G, we see that $|\widehat{G}| \ge |G|$. Therefore $|\widehat{G}| = |G|$, and $\widehat{G}$ is an orthogonal basis by Corollary 2.4. The formula for $F(g)$ in the statement of the theorem follows by applying Theorem 3.11c.

Now suppose that the finite group G is not necessarily abelian. Since $S^1$ is abelian, Proposition 7.4 shows that $\chi$ takes the value 1 on every member of the commutator subgroup $G'$ of G. Consequently there is no way that the multiplicative characters can form a basis for the vector space $C(G,\mathbb{C})$ of complex-valued functions on G. The above analysis thus breaks down, and some adjustment is needed in order to extend the theory. The remedy is to use representations, as defined in Section IV.6, on complex vector spaces of dimension > 1. We shall assume in the text that the vector space is finite-dimensional.

The sense in which representations extend the theory of multiplicative characters is that any multiplicative character $\chi$ gives a representation R on the 1-dimensional vector space $\mathbb{C}$ by $R(g)(z) = \chi(g)z$ for g in G and z in $\mathbb{C}$. Conversely any 1-dimensional representation gives a multiplicative character: if R is the representation on the 1-dimensional vector space V and if $v_0 \neq 0$ is in V, then $\chi(g)$ is the scalar such that $R(g)v_0 = \chi(g)v_0$. It is enough to observe that the only elements of finite order in the multiplicative group $\mathbb{C}^\times$ are certain members of the circle $S^1$, and then it follows that $\chi$ takes values in $S^1$.


In the higher-dimensional case, the analog of the multiplicative character $\chi$ in passing to a 1-dimensional representation R is a "matrix representation." A matrix representation of G is a function $g \mapsto [\rho(g)_{ij}]$ from G into invertible square matrices of some given size such that $\rho(g_1 g_2)_{ij} = \sum_{k=1}^n \rho(g_1)_{ik}\rho(g_2)_{kj}$. If a representation R acts on the finite-dimensional complex vector space V, then the choice of an ordered basis $\Gamma$ for V leads to a matrix representation by the formula
$$[\rho(g)_{ij}] = \begin{pmatrix} R(g) \\ \Gamma\Gamma \end{pmatrix},$$
the matrix of $R(g)$ relative to $\Gamma$. Conversely if a matrix representation $g \mapsto [\rho(g)_{ij}]$ and an ordered basis of V are given, then the same formula may be used to obtain a representation R of G on V.

In contrast to the 1-dimensional case, the matrices that occur with a matrix representation of dimension > 1 need not be unitary. The correspondence between unitary linear maps and unitary matrices was discussed in Chapter III. When the finite-dimensional vector space V has an inner product, a linear map was defined to be unitary if it satisfies the equivalent conditions of Proposition 3.18. A complex square matrix A was defined to be unitary if $A^*A = I$. The matrix of a unitary linear map relative to an ordered orthonormal basis is unitary, and conversely when a unitary matrix and an ordered orthonormal basis are given, the associated linear map is unitary. We can thus speak of unitary representations and unitary matrix representations.

Some examples of representations appear in Section IV.6. One further pair of examples will be of interest to us. With the finite group G fixed but not necessarily abelian, we continue to let $C(G,\mathbb{C})$ be the complex vector space of all functions $f : G \to \mathbb{C}$. We define two representations of G on $C(G,\mathbb{C})$: the left regular representation $\ell$ given by $(\ell(g)f)(x) = f(g^{-1}x)$ and the right regular representation r given by $(r(g)f)(x) = f(xg)$. The reason for the presence of an inverse in one case and not the other was discussed in Section IV.6. Relative to the inner product
$$(f_1, f_2) = \sum_{x\in G} f_1(x)\overline{f_2(x)},$$
both $\ell$ and r are unitary. The argument for $\ell$ is that
$$(\ell(g)f_1, \ell(g)f_2) = \sum_{x\in G} (\ell(g)f_1)(x)\overline{(\ell(g)f_2)(x)} = \sum_{x\in G} f_1(g^{-1}x)\overline{f_2(g^{-1}x)} = \sum_{y\in G} f_1(y)\overline{f_2(y)} = (f_1, f_2)$$
under the change of variables $y = g^{-1}x$, and the argument for r is completely analogous.
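The two regular representations can be checked on a small example. The sketch below (not part of the text) takes G to be the symmetric group on three letters, encodes functions on G as Python dictionaries, and implements $\ell$ and r exactly as defined above; the helper names `mul`, `inv`, `left`, `right`, and `inner` are assumptions made for this illustration.

```python
from itertools import permutations

# The symmetric group S_3, each element a tuple p with p[i] the image of i.
G = list(permutations(range(3)))


def mul(p, q):
    """Product pq as a composition of maps: (pq)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))


def inv(p):
    """Inverse permutation."""
    q = [0] * len(p)
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)


def left(g, f):
    """Left regular representation: (l(g)f)(x) = f(g^{-1} x)."""
    g_inv = inv(g)
    return {x: f[mul(g_inv, x)] for x in G}


def right(g, f):
    """Right regular representation: (r(g)f)(x) = f(x g)."""
    return {x: f[mul(x, g)] for x in G}


def inner(f1, f2):
    """(f1, f2) = sum over x of f1(x) * conjugate(f2(x))."""
    return sum(f1[x] * f2[x].conjugate() for x in G)
```

With these definitions one can confirm, for any functions `f1` and `f2`, that `left(mul(g, h), f)` equals `left(g, left(h, f))`, that the same holds for `right`, and that both operations preserve `inner`; the inverse in `left` is what makes the first identity hold.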


It will be convenient to abbreviate "representation R on V" as "representation (R, V)." Let (R, V) be a representation of the finite group G on a finite-dimensional complex vector space. An invariant subspace U of V is a vector subspace such that $R(g)U \subseteq U$ for all g in G. The representation is irreducible if $V \neq 0$ and if V has no invariant subspaces other than 0 and V.

Two representations $(R_1, V_1)$ and $(R_2, V_2)$ on finite-dimensional complex vector spaces are equivalent if there exists a linear invertible function $A : V_1 \to V_2$ such that $A R_1(g) = R_2(g) A$ for all g in G. In the terminology of Section IV.11, "equivalent" is the notion of "is isomorphic to" in the category of all finite-dimensional representations of G. In more detail a morphism from $(R_1, V_1)$ to $(R_2, V_2)$ in this category is an intertwining operator, namely a linear map $A : V_1 \to V_2$ such that $A R_1(g) = R_2(g) A$ for all g in G. The condition for this equality to hold is that the diagram in Figure 7.3 commute.

$$\begin{array}{ccc}
V_1 & \xrightarrow{\ A\ } & V_2 \\
{\scriptstyle R_1(g)}\big\downarrow & & \big\downarrow{\scriptstyle R_2(g)} \\
V_1 & \xrightarrow{\ A\ } & V_2
\end{array}$$

FIGURE 7.3. An intertwining operator for two representations, i.e., a morphism in the category of finite-dimensional representations of G.

An example of a pair of representations that are equivalent is the left and right regular representations of G on $C(G,\mathbb{C})$: in fact, if we define $(Af)(x) = f(x^{-1})$, then
$$(\ell(g)Af)(x) = (Af)(g^{-1}x) = f(x^{-1}g) = (r(g)f)(x^{-1}) = (A r(g) f)(x).$$

Proposition 7.18 (Schur's Lemma). If $(R_1, V_1)$ and $(R_2, V_2)$ are irreducible representations of the finite group G on finite-dimensional complex vector spaces and if $A : V_1 \to V_2$ is an intertwining operator, then A is invertible (and hence exhibits $R_1$ and $R_2$ as equivalent) or else A = 0. If $(R_1, V_1) = (R_2, V_2)$ and $A : V_1 \to V_2$ is an intertwining operator, then A is scalar.

REMARK. The conclusion that A is scalar makes essential use of the fact that the underlying field is $\mathbb{C}$.

PROOF. The equality $R_2(g)Av_1 = A R_1(g)v_1$ shows that ker A and image A are invariant subspaces. By the assumed irreducibility, ker A equals 0 or $V_1$, and image A equals 0 or $V_2$. The first statement follows. When $(R_1, V_1) = (R_2, V_2)$, the identity $I : V_1 \to V_2$ is an intertwining operator. If $\lambda$ is an eigenvalue of A, then $A - \lambda I$ is another intertwining operator. Since $A - \lambda I$ is not invertible when $\lambda$ is an eigenvalue of A, $A - \lambda I$ must be 0.


Corollary 7.19. Every irreducible finite-dimensional representation of a finite abelian group G is 1-dimensional.

PROOF. If (R, V) is given, then for each g in G the linear map $A = R(g)$ satisfies $A R(x) = R(gx) = R(xg) = R(x) A$ for all x in G. By Schur's Lemma (Proposition 7.18), $A = R(g)$ is scalar. Since g is arbitrary, every vector subspace of V is invariant. Irreducibility therefore implies that V is 1-dimensional.

Let R be a representation of the finite group G on the finite-dimensional complex vector space V, let $(\,\cdot\,,\,\cdot\,)_0$ be any inner product on V, and define
$$(v_1, v_2) = \sum_{x\in G} (R(x)v_1, R(x)v_2)_0.$$
Then we have
$$\begin{aligned}
(R(g)v_1, R(g)v_2) &= \sum_{x\in G} (R(x)R(g)v_1, R(x)R(g)v_2)_0 \\
&= \sum_{x\in G} (R(xg)v_1, R(xg)v_2)_0 \\
&= \sum_{y\in G} (R(y)v_1, R(y)v_2)_0 \qquad \text{by the change } y = xg \\
&= (v_1, v_2).
\end{aligned}$$
With respect to the inner product $(\,\cdot\,,\,\cdot\,)$, the representation (R, V) is therefore unitary. In other words, we are always free to introduce an inner product to make a given finite-dimensional representation unitary. The significance of this construction is noted in the following proposition.

Proposition 7.20. If (R, V) is a finite-dimensional representation of the finite group G and if an inner product is introduced in V that makes the representation unitary, then the orthogonal complement of an invariant subspace is invariant.

PROOF. Let U be an invariant subspace. If u is in U and $u^\perp$ is in $U^\perp$, then
$$(R(g)u^\perp, u) = (R(g)^{-1}R(g)u^\perp, R(g)^{-1}u) = (u^\perp, R(g)^{-1}u) = 0.$$
Thus $u^\perp$ in $U^\perp$ implies $R(g)u^\perp$ is in $U^\perp$.

Corollary 7.21. Any finite-dimensional representation of the finite group G is a direct sum of irreducible representations.

REMARK. That is, we can find a system of invariant subspaces such that the action of G is irreducible on each of these subspaces and such that the whole vector space is the direct sum of these subspaces.
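The averaging construction that precedes Proposition 7.20 can be made concrete. In the sketch below (not part of the text), the cyclic group of order 3 acts on $\mathbb{C}^2$ through powers of an integer matrix M of order 3 that is not unitary for the standard inner product; the averaged inner product is recorded by its Gram matrix $H = \sum_x R(x)^* R(x)$, so that $(v_1, v_2) = v_2^* H v_1$. The matrix M and all helper names are choices made here for illustration only.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def adjoint(A):
    """Conjugate transpose; works for int, float, and complex entries."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def averaged_gram(rep_matrices):
    """Gram matrix H = sum over x of R(x)^* R(x) of the averaged inner product,
    starting from the standard inner product as ( . , . )_0."""
    n = len(rep_matrices[0])
    H = [[0] * n for _ in range(n)]
    for M in rep_matrices:
        P = matmul(adjoint(M), M)
        H = [[H[i][j] + P[i][j] for j in range(n)] for i in range(n)]
    return H


# A representation of the cyclic group of order 3 on C^2 (illustrative choice).
I2 = [[1, 0], [0, 1]]
M = [[0, -1], [1, -1]]          # M has order 3
rep = [I2, M, matmul(M, M)]
H = averaged_gram(rep)


def is_unitary_for(H, X):
    """R(g) is unitary for the inner product with Gram matrix H iff X^* H X = H."""
    return matmul(matmul(adjoint(X), H), X) == H
```

Here `is_unitary_for(I2, M)` is false, since M does not preserve the standard inner product, while `is_unitary_for(H, X)` holds for every X in `rep`, in line with the computation above.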


PROOF. This is immediate by induction on the dimension. For dimension 0, the representation is the empty direct sum of irreducible representations. If the decomposition is known for dimension < n and if U is an invariant subspace under R of smallest possible dimension ≥ 1, then U is irreducible under R, and Proposition 7.20 says that the subspace $U^\perp$, which satisfies $V = U \oplus U^\perp$, is invariant. It is therefore enough to decompose $U^\perp$, and induction achieves such a decomposition.

Proposition 7.22 (Schur orthogonality). For finite-dimensional representations of a finite group G in which inner products have been introduced to make the representations unitary,
(a) if $(R_1, V_1)$ and $(R_2, V_2)$ are inequivalent and irreducible, then
$$\sum_{x\in G} (R_1(x)v_1, v_1')\overline{(R_2(x)v_2, v_2')} = 0 \qquad \text{for all } v_1, v_1' \in V_1 \text{ and } v_2, v_2' \in V_2,$$
(b) if (R, V) is irreducible, then
$$\sum_{x\in G} (R(x)v_1, v_1')\overline{(R(x)v_2, v_2')} = \frac{|G|\,(v_1, v_2)\overline{(v_1', v_2')}}{\dim V} \qquad \text{for } v_1, v_2, v_1', v_2' \in V.$$

REMARKS. If G is abelian, then $V_1$ and $V_2$ in (a) are 1-dimensional, and the conclusion of (a) reduces to the statement that the multiplicative characters are orthogonal. Conclusion (b) in this case reduces to a trivial statement.

PROOF. For (a), let $l : V_2 \to V_1$ be any linear map, and form the linear map
$$L = \sum_{x\in G} R_1(x)\, l\, R_2(x^{-1}).$$
Multiplying on the left by $R_1(g)$ and on the right by $R_2(g^{-1})$ and changing variables in the sum, we obtain $R_1(g) L R_2(g^{-1}) = L$, so that $R_1(g)L = L R_2(g)$ for all $g \in G$. By Schur's Lemma (Proposition 7.18) and the assumed irreducibility and inequivalence, L = 0. Thus $(Lv_2', v_1') = 0$. For the particular choice of l as $l(w_2) = (w_2, v_2)v_1$, we have
$$\begin{aligned}
0 = (Lv_2', v_1') &= \sum_{x\in G} (R_1(x)\, l\, R_2(x^{-1})v_2', v_1') \\
&= \sum_{x\in G} \big(R_1(x)(R_2(x^{-1})v_2', v_2)v_1, v_1'\big) \\
&= \sum_{x\in G} (R_1(x)v_1, v_1')(R_2(x^{-1})v_2', v_2),
\end{aligned}$$
and (a) results since $(R_2(x^{-1})v_2', v_2) = \overline{(R_2(x)v_2, v_2')}$.

For (b), we proceed in the same way, starting from $l : V \to V$, and we obtain $L = \lambda I$ from Schur's Lemma. Taking the trace of both sides, we find that


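$$\lambda \dim V = \operatorname{Tr} L = |G| \operatorname{Tr} l.$$
Therefore $\lambda = |G|(\operatorname{Tr} l)/\dim V$. Since $L = \lambda I$,
$$(Lv_2', v_1') = \frac{|G| \operatorname{Tr} l}{\dim V}\,\overline{(v_1', v_2')}.$$
Again we make the particular choice of l as $l(w_2) = (w_2, v_2)v_1$. Since $\operatorname{Tr} l = (v_1, v_2)$, we obtain
$$\begin{aligned}
\frac{(v_1, v_2)\overline{(v_1', v_2')}}{\dim V} = \frac{\operatorname{Tr} l}{\dim V}\,\overline{(v_1', v_2')} &= |G|^{-1}(Lv_2', v_1') \\
&= |G|^{-1} \sum_{x\in G} (R(x)\, l\, R(x^{-1})v_2', v_1') \\
&= |G|^{-1} \sum_{x\in G} \big(R(x)(R(x^{-1})v_2', v_2)v_1, v_1'\big) \\
&= |G|^{-1} \sum_{x\in G} (R(x)v_1, v_1')(R(x^{-1})v_2', v_2),
\end{aligned}$$
and (b) results since $(R(x^{-1})v_2', v_2) = \overline{(R(x)v_2, v_2')}$.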

Let us interpret Proposition 7.22 as a statement about the left and right regular representations $\ell$ and r of G on the inner-product space $C(G,\mathbb{C})$, the inner product being $\langle f, f'\rangle = \sum_{g\in G} f(g)\overline{f'(g)}$. Let R be an irreducible representation of G on the finite-dimensional vector space V, and introduce an inner product to make it unitary. A member of $C(G,\mathbb{C})$ of the form $g \mapsto (R(g)v, v')$ is called a matrix coefficient of R. Let $v_1, \dots, v_n$ be an orthonormal basis of V. The matrix representation of G that corresponds to R and this choice of orthonormal basis has $\rho(g)_{ij} = (R(g)v_j, v_i)$, and hence the entries of $[\rho(g)_{ij}]$, as functions on G, provide examples of matrix coefficients. These particular matrix coefficients are orthogonal, according to Proposition 7.22b, with
$$\sum_{x\in G} |\rho(x)_{ij}|^2 = \sum_{x\in G} (R(x)v_j, v_i)\overline{(R(x)v_j, v_i)} = \frac{|G|\,(v_j, v_j)\overline{(v_i, v_i)}}{\dim V} = \frac{|G|}{\dim V}.$$
Thus the functions $\sqrt{|G|^{-1}\dim V}\,\rho(x)_{ij}$ form an orthonormal basis of an $n^2$-dimensional subspace $V_R$ of $C(G,\mathbb{C})$, where $n = \dim V$. The vector subspace $V_R$ has the following properties:
(i) All matrix coefficients of R are in $V_R$, as is seen by expanding $v = \sum_j c_j v_j$ and $v' = \sum_i d_i v_i$ and obtaining $(R(g)v, v') = \sum_{i,j} c_j \bar{d}_i (R(g)v_j, v_i) = \sum_{i,j} c_j \bar{d}_i \rho(g)_{ij}$.


(ii) $V_R$ is invariant under $\ell$ and r because
$$\big(\ell(g)(R(\,\cdot\,)v, v')\big)(x) = (R(g^{-1}x)v, v') = (R(x)v, R(g)v'),$$
$$\big(r(g)(R(\,\cdot\,)v, v')\big)(x) = (R(xg)v, v') = (R(x)R(g)v, v').$$
(iii) Any representation $R'$ equivalent to R has $V_{R'} = V_R$.

Let us see how $V_R$ decomposes into irreducible subspaces under r. The computation with r in (ii) above shows, for each i, that the vector space of all functions $x \mapsto (R(x)v, v_i)$ for $v \in V$ is invariant under r. This is the linear span of the matrix coefficients obtained from the $i$th row of $[\rho(x)_{ij}]$. Define a linear map A from V into this vector space by $Av = (R(\,\cdot\,)v, v_i)$. It is evident that A is one-one onto, and moreover
$$A R(g)v = (R(\,\cdot\,)R(g)v, v_i) = r(g)(R(\,\cdot\,)v, v_i) = r(g)Av.$$
Thus A exhibits this space, with r as representation, as equivalent to (R, V). The space $V_R$ is the direct sum of these spaces on i, and the summands are orthogonal, according to Proposition 7.22b. Thus $V_R$ decomposes under r as the direct sum of dim V irreducible subspaces, each one equivalent to (R, V).

One can make a similar analysis with $\ell$, using columns in place of rows. However, this analysis is a little more subtle since $V_R$, acted upon by $\ell$, is the direct sum of dim V copies of the "contragredient" of (R, V), rather than (R, V) itself. The details are left to Problems 32–36 at the end of the chapter.

As R varies over inequivalent representations, these vector spaces $V_R$ are orthogonal, according to Proposition 7.22a. The claim is that their direct sum is the space $C(G,\mathbb{C})$ of all functions on G. In fact, the sum is invariant under r, and if it is not all of $C(G,\mathbb{C})$, then we can find a nonzero vector subspace $U = \{f(\,\cdot\,)\}$ of $C(G,\mathbb{C})$ orthogonal to all the spaces $V_R$ such that U is invariant and irreducible under r. Let $u_1, \dots, u_m$ be an orthonormal basis of U. Then each function $x \mapsto (r(x)u_j, u_i)$ is orthogonal to U by construction, i.e.,
$$0 = \sum_{x\in G} (r(x)u_j, u_i)\overline{f(x)} \qquad \text{for all } f \text{ in } U.$$
Applying the Riesz Representation Theorem (Theorem 3.12), choose a member e of U such that $f(1) = (f, e)$ for all f in U. By definition of $r(x)$ and e, we find that
$$u(x) = (r(x)u)(1) = (r(x)u, e)$$
for all u in U. Substitution and use once more of Proposition 7.22b gives
$$0 = \sum_{x\in G} (r(x)u_j, u_i)\overline{(r(x)u, e)} = \frac{|G|\,(u_j, u)\overline{(u_i, e)}}{\dim U}$$
for all i and j. Since we can take $u = u_j = u_1$ and since i is arbitrary, this equation forces e = 0 and gives a contradiction. We conclude that the sum of all the spaces $V_R$ is all of $C(G,\mathbb{C})$. Let us state the result as a theorem.

4. Group Representations


Theorem 7.23. For the finite group $G$, let $\{(R_\alpha, U_\alpha)\}$ be a complete set of inequivalent irreducible finite-dimensional representations of $G$, and let $V_{R_\alpha}$ be the linear span of the matrix coefficients of $R_\alpha$. Then
(a) the spaces $V_{R_\alpha}$ are mutually orthogonal and are invariant under the left and right regular representations $\ell$ and $r$,
(b) the representation $(r, V_{R_\alpha})$ is equivalent to the direct sum of $\dim U_\alpha$ copies of $(R_\alpha, U_\alpha)$,
(c) the direct sum of the spaces $V_{R_\alpha}$ is the space $C(G,\mathbb{C})$ of all complex-valued functions on $G$.
Moreover,
(d) the number of $R_\alpha$'s is finite,
(e) $\dim V_{R_\alpha} = (\dim U_\alpha)^2$,
(f) any irreducible subspace of $(r, C(G,\mathbb{C}))$ that is equivalent to $(R_\alpha, U_\alpha)$ is contained in $V_{R_\alpha}$.

Corollary 7.24. Let $\{(R_\alpha, U_\alpha)\}$ be a complete set of inequivalent irreducible finite-dimensional representations of the finite group $G$, and let $d_\alpha = \dim U_\alpha$. In $U_\alpha$, introduce an inner product making $(R_\alpha, U_\alpha)$ unitary. For each $\alpha$, let $v_1^{(\alpha)}, \dots, v_{d_\alpha}^{(\alpha)}$ be an orthonormal basis of $U_\alpha$. Then the functions in $C(G,\mathbb{C})$ given by $\sqrt{|G|^{-1} d_\alpha}\,\big(R_\alpha(x)v_j^{(\alpha)}, v_i^{(\alpha)}\big)$ form an orthonormal basis of $C(G,\mathbb{C})$. Consequently every $f$ in $C(G,\mathbb{C})$ satisfies
\[
f(x) = \frac{1}{|G|} \sum_\alpha \sum_{i,j} d_\alpha \Big( \sum_{y\in G} f(y)\,\overline{\big(R_\alpha(y)v_j^{(\alpha)}, v_i^{(\alpha)}\big)} \Big) \big(R_\alpha(x)v_j^{(\alpha)}, v_i^{(\alpha)}\big)
\]
and
\[
\sum_{x\in G} |f(x)|^2 = \frac{1}{|G|} \sum_\alpha \sum_{i,j} d_\alpha \Big| \sum_{y\in G} f(y)\,\overline{\big(R_\alpha(y)v_j^{(\alpha)}, v_i^{(\alpha)}\big)} \Big|^2 .
\]
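The two formulas of Corollary 7.24 can be checked numerically on a small nonabelian group. The following is a minimal sketch, assuming that $D_3$ is realized by six real $2\times 2$ matrices and that its three irreducible representations are the trivial one, the sign (computed here as the determinant), and the 2-dimensional one given by the matrices themselves. The helper names (`coefficient`, `inverse_transform`) and the sample function `f` are illustrative choices, not notation from the text.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

c, s = math.cos(2 * math.pi / 3), math.sin(2 * math.pi / 3)
x = [[c, -s], [s, c]]            # rotation through 2*pi/3
y = [[1.0, 0.0], [0.0, -1.0]]    # a reflection
e = [[1.0, 0.0], [0.0, 1.0]]

rotations = [e, x, matmul(x, x)]
G = rotations + [matmul(g, y) for g in rotations]   # the six elements of D_3

def det(g):
    return g[0][0] * g[1][1] - g[0][1] * g[1][0]

# (dimension, matrix-valued map) for each irreducible representation (assumed list)
reps = [
    (1, lambda g: [[1.0]]),       # trivial
    (1, lambda g: [[det(g)]]),    # sign: +1 on rotations, -1 on reflections
    (2, lambda g: g),             # the 2-dimensional representation
]

f = [2.0, -1.0 + 0.5j, 0.25, 3.0j, 1.5, -0.75]   # an arbitrary function on G

def coefficient(rho, i, j):
    # sum over y of f(y) times the conjugate of the (i, j) matrix coefficient at y
    return sum(f[k] * complex(rho(G[k])[i][j]).conjugate() for k in range(len(G)))

def inverse_transform(k):
    # right side of the Fourier inversion formula, evaluated at G[k]
    total = 0
    for d, rho in reps:
        for i in range(d):
            for j in range(d):
                total += d * coefficient(rho, i, j) * rho(G[k])[i][j]
    return total / len(G)

recovered = [inverse_transform(k) for k in range(len(G))]
lhs = sum(abs(v) ** 2 for v in f)
rhs = sum(d * abs(coefficient(rho, i, j)) ** 2
          for d, rho in reps for i in range(d) for j in range(d)) / len(G)
```

Here `recovered` should agree with `f` up to rounding (Fourier inversion), and `lhs` should agree with `rhs` (the Plancherel formula).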

REMARKS. The first displayed formula is the Fourier inversion formula for an arbitrary finite group $G$ and generalizes Theorem 7.17, which gives the result in the abelian case; in the abelian case all the dimensions $d_\alpha$ equal 1, and the functions $\big(R_\alpha(x)v_j^{(\alpha)}, v_i^{(\alpha)}\big)$ are just the multiplicative characters of $G$. The second displayed formula is known as the Plancherel formula, a result incorporating the conclusion about norms in Parseval's equality (Theorem 3.11d).

PROOF. This follows from (a), (c), and (e) in Theorem 7.23, together with Theorem 3.11 and the remarks made before the statement of Theorem 7.23.

Corollary 7.25. Let $\{(R_\alpha, U_\alpha)\}$ be a complete set of inequivalent irreducible finite-dimensional representations of the finite group $G$, and let $d_\alpha = \dim U_\alpha$. Then $\sum_\alpha d_\alpha^2 = |G|$.


PROOF. This follows by counting the number of members listed in the orthonormal basis of $C(G,\mathbb{C})$ given in Corollary 7.24.

We shall make use of a second multiplication on the vector space $C(G,\mathbb{C})$ besides the pointwise multiplication that itself makes $C(G,\mathbb{C})$ into a ring. The new multiplication is called convolution and is defined by
\[
(f_1 * f_2)(x) = \sum_{y\in G} f_1(y)\, f_2(y^{-1}x) = \sum_{y\in G} f_1(xy^{-1})\, f_2(y),
\]
the two expressions on the right being equal by a change of variables. The first of the expressions on the right equals the value of the function $\sum_{y\in G} f_1(y)\,\ell(y) f_2$ at $x$ and shows that the convolution is an average of the left translates of $f_2$ weighted by $f_1$. Convolution is associative because
\[
\begin{aligned}
(f_1 * (f_2 * f_3))(x) &= \sum_{y} f_1(y)\,(f_2 * f_3)(y^{-1}x) = \sum_{y,z} f_1(y)\, f_2(y^{-1}xz^{-1})\, f_3(z)\\
&= \sum_{z} (f_1 * f_2)(xz^{-1})\, f_3(z) = ((f_1 * f_2) * f_3)(x),
\end{aligned}
\]
and one readily checks that $C(G,\mathbb{C})$ becomes a ring when convolution is used as the multiplication.

For any finite-dimensional representation $(R, V)$ and any $v$ in $V$, let us define $R(f)v = \sum_{x\in G} f(x)R(x)v$. Convolution has the property that $R(f_1 * f_2) = R(f_1)R(f_2)$ because
\[
\begin{aligned}
R(f_1 * f_2)v &= \sum_{x} (f_1 * f_2)(x)\,R(x)v = \sum_{x,y} f_1(xy^{-1})\, f_2(y)\,R(x)v\\
&= \sum_{x,y} f_1(x)\, f_2(y)\,R(xy)v = \sum_{x} f_1(x)R(x) \sum_{y} f_2(y)R(y)v\\
&= \sum_{x} f_1(x)R(x)R(f_2)v = R(f_1)R(f_2)v.
\end{aligned}
\]
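The two properties of convolution just derived can be tested by brute force on a small group. The sketch below assumes $G = S_3$ with permutations stored as tuples, and it uses the permutation representation on $\mathbb{C}^3$ as the representation $R$. That representation is not irreducible, but the identity $R(f_1 * f_2) = R(f_1)R(f_2)$ does not require irreducibility. The function names and the three sample functions are illustrative only.

```python
from itertools import permutations

# Elements of S_3 as tuples p, with p[i] the image of i.
G = list(permutations(range(3)))

def mul(p, q):
    # (p q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    r = [0] * 3
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def convolve(f1, f2):
    # (f1 * f2)(x) = sum over y of f1(y) f2(y^{-1} x)
    return {x: sum(f1[y] * f2[mul(inv(y), x)] for y in G) for x in G}

def perm_matrix(p):
    # R(p) e_j = e_{p(j)}; this defines a representation of S_3 on C^3
    return [[1 if p[j] == i else 0 for j in range(3)] for i in range(3)]

def R_of(f):
    # R(f) = sum over x of f(x) R(x)
    m = [[0] * 3 for _ in range(3)]
    for x in G:
        px = perm_matrix(x)
        for i in range(3):
            for j in range(3):
                m[i][j] += f[x] * px[i][j]
    return m

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

# Three arbitrary integer-valued functions on G (exact arithmetic, no rounding).
f1 = {g: k + 1 for k, g in enumerate(G)}
f2 = {g: (k * k) % 5 - 2 for k, g in enumerate(G)}
f3 = {g: 3 - 2 * k for k, g in enumerate(G)}

assoc_ok = convolve(f1, convolve(f2, f3)) == convolve(convolve(f1, f2), f3)
hom_ok = R_of(convolve(f1, f2)) == matmul(R_of(f1), R_of(f2))
```

Both `assoc_ok` and `hom_ok` should come out true. Integer values were chosen so that the comparisons are exact.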

We shall combine the notion of convolution with the notion of a "character." If $(R, V)$ is a finite-dimensional representation of $G$, then the character of $(R, V)$ is the function $\chi_R$ given by $\chi_R(x) = \operatorname{Tr} R(x)$, with $\operatorname{Tr}$ denoting the trace. Equivalent representations have the same character since $\operatorname{Tr}(AR(x)A^{-1}) = \operatorname{Tr} R(x)$ if $A$ is invertible. Characters have the additional properties that
(i) $\chi_R(gxg^{-1}) = \chi_R(x)$ because $\operatorname{Tr} R(gxg^{-1}) = \operatorname{Tr}(R(g)R(x)R(g)^{-1}) = \operatorname{Tr} R(x)$,
(ii) $\chi_{R_1 \oplus \cdots \oplus R_n} = \chi_{R_1} + \cdots + \chi_{R_n}$ since the trace of a block-diagonal matrix is the sum of the traces of the blocks.


The character of a 1-dimensional representation is the associated multiplicative character. Here is an example of a character for a representation on a space of dimension more than 1; its values are not all in $S^1$.

EXAMPLE. The dihedral group $D_n$ with $2n$ elements, defined in Section IV.1, is isomorphic to the matrix group generated by
\[
x = \begin{pmatrix} \cos 2\pi/n & -\sin 2\pi/n \\ \sin 2\pi/n & \cos 2\pi/n \end{pmatrix}
\qquad\text{and}\qquad
y = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]
The map carrying each matrix of the group to itself is a representation of $D_n$ on $\mathbb{C}^2$. The value of the character of this representation is $2\cos 2\pi k/n$ on $x^k$ for $0 \le k \le n-1$, and the value of the character is 0 on $y$ and on the remaining $n-1$ elements of the group.

Computations with characters are sometimes aided by the use of inner products. If an inner product is imposed on a finite-dimensional complex vector space $V$ and if $\{v_i\}$ is an orthonormal basis, then the trace of a linear $A : V \to V$ is given by $\operatorname{Tr} A = \sum_i (Av_i, v_i)$. If $R$ is a representation on $V$, we consequently have $\chi_R(x) = \sum_i (R(x)v_i, v_i)$.

Proposition 7.26. Let $R$, $R_1$, and $R_2$ be irreducible finite-dimensional representations of a finite group $G$. Then their characters satisfy
(a) $\sum_{x\in G} |\chi_R(x)|^2 = |G|$,
(b) $\sum_{x\in G} \chi_{R_1}(x)\,\overline{\chi_{R_2}(x)} = 0$ if $R_1$ and $R_2$ are inequivalent.

PROOF. These follow from Schur orthogonality (Proposition 7.22): For (a), let $R$ act on the vector space $V$, let $d = \dim V$, introduce an inner product with respect to which $R$ is unitary, and let $\{v_i\}$ be an orthonormal basis of $V$. Then Proposition 7.22b gives
\[
\begin{aligned}
\sum_{x} |\chi_R(x)|^2 &= \sum_{x} \Big( \sum_{i} (R(x)v_i, v_i) \Big) \overline{\Big( \sum_{j} (R(x)v_j, v_j) \Big)}
= \sum_{i,j} \sum_{x} (R(x)v_i, v_i)\,\overline{(R(x)v_j, v_j)}\\
&= \sum_{i,j} |G|\,d^{-1}\,\delta_{ij}\delta_{ij} = \sum_{i} |G|\,d^{-1} = |G|.
\end{aligned}
\]
Part (b) is proved in the same fashion, using Proposition 7.22a.
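As a quick numerical illustration of Proposition 7.26a, one can sum the squares of the character values of the 2-dimensional representation of $D_n$ from the example above. The sketch below relies on one fact not proved in the text, namely that this representation is irreducible for $n \ge 3$; in that range the sum should equal $|D_n| = 2n$. The function names are illustrative.

```python
import math

def dihedral_character(n):
    # Character values on x^0, ..., x^(n-1), followed by the n reflections.
    rotations = [2 * math.cos(2 * math.pi * k / n) for k in range(n)]
    reflections = [0.0] * n
    return rotations + reflections

def norm_squared(n):
    # sum over the group of |chi(x)|^2; the character is real-valued here
    return sum(v * v for v in dihedral_character(n))

values = {n: norm_squared(n) for n in (3, 4, 5, 8)}
```

For $n = 2$ the same sum is 8 rather than $|D_2| = 4$, so Proposition 7.26a shows that the representation cannot be irreducible in that case.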

Let us now bring together the notions of convolution and character. A class function on $G$ is a function $f$ in $C(G,\mathbb{C})$ with $f(gxg^{-1}) = f(x)$ for all $g$ and $x$ in $G$. That is, class functions are the ones that are constant on each conjugacy class of the group. Every character is an example of a class function. The class functions form a vector subspace of $C(G,\mathbb{C})$, and the dimension of this vector subspace equals the number of conjugacy classes in $G$. Class functions are closed under convolution because if $f_1$ and $f_2$ are class functions, then
\[
\begin{aligned}
(f_1 * f_2)(gxg^{-1}) &= \sum_{y} f_1(gxg^{-1}y^{-1})\, f_2(y) = \sum_{y} f_1(xg^{-1}y^{-1}g)\, f_2(g^{-1}yg)\\
&= \sum_{z} f_1(xz^{-1})\, f_2(z) = (f_1 * f_2)(x).
\end{aligned}
\]
On an abelian group every member of $C(G,\mathbb{C})$ is a class function.

Theorem 7.27 (Fourier inversion formula for class functions). For the finite group $G$, let $\{(R_\alpha, U_\alpha)\}$ be a complete set of inequivalent irreducible finite-dimensional representations of $G$. If $f$ is a class function on $G$, then
\[
f(x) = \frac{1}{|G|} \sum_\alpha \Big( \sum_{y\in G} f(y)\,\overline{\chi_{R_\alpha}(y)} \Big) \chi_{R_\alpha}(x).
\]

REMARK. This result may be regarded as a second way (besides the one in Corollary 7.24) of generalizing Theorem 7.17 to the nonabelian case.

PROOF. Using the result and notation of Corollary 7.24, we have
\[
f(x) = |G|^{-1} \sum_\alpha d_\alpha \sum_{i,j} \Big( \sum_{y\in G} f(y)\,\overline{\big(R_\alpha(y)v_i^{(\alpha)}, v_j^{(\alpha)}\big)} \Big) \big(R_\alpha(x)v_i^{(\alpha)}, v_j^{(\alpha)}\big).
\]
Replace $f(y)$ by $f(gyg^{-1})$ since $f$ is a class function, and then change variables and sum over $g$ in $G$ to see that $|G| f(x)$ is equal to
\[
|G|^{-1} \sum_\alpha d_\alpha \sum_{i,j} \Big( \sum_{g,y} f(y)\,\overline{\big(R_\alpha(y)R_\alpha(g)v_i^{(\alpha)}, R_\alpha(g)v_j^{(\alpha)}\big)} \Big) \big(R_\alpha(x)v_i^{(\alpha)}, v_j^{(\alpha)}\big).
\]
Within this expression we have
\[
\begin{aligned}
\sum_{g} \overline{\big(R_\alpha(y)R_\alpha(g)v_i^{(\alpha)}, R_\alpha(g)v_j^{(\alpha)}\big)}
&= \sum_{g,k} \overline{\big(R_\alpha(y)\big(R_\alpha(g)v_i^{(\alpha)}, v_k^{(\alpha)}\big)v_k^{(\alpha)}, R_\alpha(g)v_j^{(\alpha)}\big)}\\
&= \sum_{g,k} \overline{\big(R_\alpha(g)v_i^{(\alpha)}, v_k^{(\alpha)}\big)}\,\big(R_\alpha(g)v_j^{(\alpha)}, R_\alpha(y)v_k^{(\alpha)}\big)\\
&= \frac{|G|}{d_\alpha} \sum_{k} \big(v_j^{(\alpha)}, v_i^{(\alpha)}\big)\,\overline{\big(R_\alpha(y)v_k^{(\alpha)}, v_k^{(\alpha)}\big)} \qquad \text{by Schur orthogonality}\\
&= \frac{|G|}{d_\alpha} \big(v_j^{(\alpha)}, v_i^{(\alpha)}\big)\,\overline{\chi_{R_\alpha}(y)}
= \frac{|G|}{d_\alpha}\,\delta_{ij}\,\overline{\chi_{R_\alpha}(y)}.
\end{aligned}
\]
Substituting, we obtain the formula of the theorem.
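Theorem 7.27 can be checked exactly for $G = S_3$. The sketch below assumes the three irreducible characters of $S_3$ (trivial, sgn, and the character of the 2-dimensional representation of $D_3 \cong S_3$ computed in the earlier example) and represents a class function by its list of values on the three conjugacy classes. The function name `invert` and the sample values are illustrative.

```python
from fractions import Fraction

class_sizes = [1, 3, 2]     # identity, transpositions, 3-cycles in S_3
characters = [
    [1, 1, 1],              # trivial character
    [1, -1, 1],             # sgn
    [2, 0, -1],             # character of the 2-dimensional representation
]
order = sum(class_sizes)    # |G| = 6

def invert(f):
    # Evaluate the right side of the inversion formula on each conjugacy class.
    # The characters of S_3 are real, so the complex conjugate is omitted.
    result = []
    for c in range(len(class_sizes)):
        total = Fraction(0)
        for chi in characters:
            # sum over y in G of f(y) chi(y), grouped by conjugacy class
            coeff = sum(size * f[k] * chi[k] for k, size in enumerate(class_sizes))
            total += coeff * chi[c]
        result.append(total / order)
    return result

f = [Fraction(5), Fraction(-2), Fraction(7, 3)]   # an arbitrary class function
recovered = invert(f)
```

With exact rational arithmetic, `recovered` should equal `f` entry by entry.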


Corollary 7.28. If $G$ is a finite group, then the number of irreducible finite-dimensional representations of $G$, up to equivalence, equals the number of conjugacy classes of $G$.

PROOF. Theorem 7.27 shows that the irreducible characters span the vector space of class functions. Proposition 7.26b shows that the irreducible characters are orthogonal and hence are linearly independent. Thus the number of irreducible characters equals the dimension of the space of class functions, which equals the number of conjugacy classes.

EXAMPLE. The above information already gives us considerable control over finding a complete set of inequivalent irreducible finite-dimensional representations of elementary groups. We know that the number of such representations equals the number of conjugacy classes and that the sum of the squares of their dimensions equals $|G|$. For the symmetric group $S_3$ of order 6, for example, the conjugacy classes are given by the cycle structures of the possible permutations, namely the cycle structures of $(1)$, $(1\;2)$, and $(1\;2\;3)$. Hence there are three inequivalent irreducible representations. The sum of the squares of the three dimensions is to be 6; thus we have two of dimension 1 and one of dimension 2. The multiplicative characters 1 and sgn are the two of dimension 1, and the one of dimension 2 can be taken to be the 2-dimensional representation of $D_3$ whose character was computed in the example preceding Proposition 7.26.

One final constraint on the dimensions of the irreducible representations of a finite group $G$ is as follows.

Proposition 7.29. If $G$ is a finite group and $(R, V)$ is an irreducible finite-dimensional representation of $G$, then $\dim V$ divides $|G|$.

For example, if $|G| = p^2$ with $p$ prime, then it follows from Proposition 7.29 and Corollary 7.25 that every irreducible finite-dimensional representation of $G$ has dimension 1, and one can easily conclude from this fact that $G$ is abelian. (See Problem 14 at the end of the chapter.)
Thus we recover as an immediate consequence the conclusion of Corollary 4.39 that groups of order $p^2$ are abelian.

The proof of Proposition 7.29 is surprisingly subtle. We shall obtain the theorem as a consequence of Theorem 7.31 below, a theorem that will be used also in the proof of Burnside's Theorem in the next section. Theorem 7.31 gives a little taste of the usefulness of algebraic number theory, and we shall see more of this usefulness in Chapter IX. The application to Burnside's Theorem will use the Fundamental Theorem of Galois Theory, whose proof is deferred to Chapter IX.

An algebraic integer is any complex number that is a root of a monic polynomial with coefficients in $\mathbb{Z}$. For example, $\sqrt{2}$ and $\tfrac12(1 + i\sqrt{3})$ are algebraic integers because they are roots of $X^2 - 2$ and $X^2 - X + 1$, respectively. Any root of unity is an algebraic integer, being a root of some polynomial $X^n - 1$. The set of algebraic integers will be denoted in this chapter by $\mathcal{O}$. Before stating Theorem 7.31, let us establish two elementary facts about $\mathcal{O}$.

Lemma 7.30. The set $\mathcal{O}$ of algebraic integers is a ring, and $\mathcal{O} \cap \mathbb{Q} = \mathbb{Z}$.

PROOF. Suppose that $x$ and $y$ are complex numbers satisfying the polynomial equations $x^m + a_{m-1}x^{m-1} + \cdots + a_1 x + a_0 = 0$ and $y^n + b_{n-1}y^{n-1} + \cdots + b_1 y + b_0 = 0$, each with integer coefficients. Form the subset of $\mathbb{C}$ given by
\[
M = \sum_{k=0}^{m-1} \sum_{l=0}^{n-1} \mathbb{Z}\,x^k y^l .
\]
This is a finitely generated subgroup of the abelian group $\mathbb{C}$ under addition. It satisfies
\[
\begin{aligned}
xM &= \sum_{k=1}^{m} \sum_{l=0}^{n-1} \mathbb{Z}\,x^k y^l \subseteq M + \sum_{l=0}^{n-1} \mathbb{Z}\,y^l x^m\\
&= M + \sum_{l=0}^{n-1} \mathbb{Z}\,y^l\,(-a_{m-1}x^{m-1} - \cdots - a_1 x - a_0) \subseteq M,
\end{aligned}
\]
and similarly $yM \subseteq M$. Hence $(x \pm y)M \subseteq M$ and $xyM \subseteq M$.

To prove that $\mathcal{O}$ is a ring, it is enough to show that if $N$ is a nonzero finitely generated subgroup of the abelian group $\mathbb{C}$ under addition and if $z$ is a complex number with $zN \subseteq N$, then $z$ is an algebraic integer. By Theorem 4.56, $N$ is a direct sum of cyclic groups. Since every nonzero member of $\mathbb{C}$ has infinite order additively, these cyclic groups must be copies of $\mathbb{Z}$. So $N$ is free abelian. Let $z_1, \dots, z_n$ be a $\mathbb{Z}$ basis of $N$. Here $n > 0$. Since $zN \subseteq N$, we can find unique integers $c_{ij}$ such that
\[
z z_i = \sum_{j=1}^{n} c_{ij} z_j \qquad \text{for } 1 \le i \le n.
\]
This equation says that the matrix $C = [c_{ij}]$ has $\begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}$ as an eigenvector with eigenvalue $z$. Therefore the matrix $zI - C$ is singular, and $\det(zI - C) = 0$. Since $\det(zI - C)$ is a monic polynomial expression in $z$ with integer coefficients, $z$ is an algebraic integer.

To see that $\mathcal{O} \cap \mathbb{Q} = \mathbb{Z}$, let $p$ and $q$ be relatively prime integers with $q > 0$, and suppose that $p/q$ is a root of $X^n + a_{n-1}X^{n-1} + \cdots + a_1 X + a_0$ with $a_{n-1}, \dots, a_0$ in $\mathbb{Z}$. Substituting $p/q$ for $X$, setting the expression equal to 0, and clearing fractions, we obtain
\[
p^n + a_{n-1}p^{n-1}q + \cdots + a_1 p q^{n-1} + a_0 q^n = 0.
\]
Since $q$ divides every term here after the first, we conclude that $q$ divides $p^n$. Since $\mathrm{GCD}(p, q) = 1$, we conclude that $q = 1$. Thus $p/q$ is in $\mathbb{Z}$.
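The eigenvector argument in the proof of Lemma 7.30 is constructive, and it can be carried out for a concrete sum such as $z = \sqrt2 + \sqrt3$. The sketch below is an illustration under stated assumptions: the subgroup $N$ is taken to have $\mathbb{Z}$ basis $1, \sqrt2, \sqrt3, \sqrt6$, the matrix $C$ records multiplication by $z$ on that basis, and the characteristic polynomial is computed by the Faddeev–LeVerrier recursion, a standard method that is not part of the text.

```python
from fractions import Fraction
import math

def char_poly(a):
    # Faddeev-LeVerrier recursion: returns [c_0, ..., c_n] with
    # det(X I - a) = c_0 + c_1 X + ... + c_n X^n and c_n = 1.
    n = len(a)
    coeffs = [Fraction(0)] * n + [Fraction(1)]
    m = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]   # M_1 = I
    for k in range(1, n + 1):
        am = [[sum(a[i][t] * m[t][j] for t in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(am[i][i] for i in range(n)) / k
        coeffs[n - k] = c
        m = [[am[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs

# Row i lists the coordinates of z * z_i in the basis (1, sqrt2, sqrt3, sqrt6),
# where z = sqrt2 + sqrt3.
C = [
    [0, 1, 1, 0],   # z * 1     = sqrt2 + sqrt3
    [2, 0, 0, 1],   # z * sqrt2 = 2 + sqrt6
    [3, 0, 0, 1],   # z * sqrt3 = 3 + sqrt6
    [0, 3, 2, 0],   # z * sqrt6 = 3*sqrt2 + 2*sqrt3
]

coeffs = char_poly(C)
z = math.sqrt(2) + math.sqrt(3)
value_at_z = sum(float(c) * z ** k for k, c in enumerate(coeffs))
```

The resulting polynomial is monic with integer coefficients, and `value_at_z` vanishes up to rounding, in line with the conclusion that $z$ is an algebraic integer.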


Lemma 7.30 allows us to see that if $G$ is a finite group and $\chi$ is the irreducible character corresponding to an irreducible finite-dimensional representation $R$, then $\chi(x)$ is an algebraic integer for each $x$ in $G$. In fact, the subgroup $H$ of $G$ generated by $x$ is cyclic and is in particular abelian. Corollary 7.21 says that $R|_H$ is the direct sum of irreducible representations of $H$, and Corollary 7.19 says that each such irreducible representation is 1-dimensional. Thus in a suitable basis, $R|_H$ is diagonal. The diagonal entries must be roots of unity (in fact, $N$th roots of unity if $x$ has order $N$), and $\chi(x)$ is thus a sum of roots of unity. By Lemma 7.30, $\chi(x)$ is an algebraic integer.

Theorem 7.31. Let $G$ be a finite group, $(R, V)$ be an irreducible finite-dimensional representation of $G$, $\chi$ be the character of $R$, and $C$ be a conjugacy class in $G$. Denote by $\chi(C)$ the constant value of $\chi$ on the conjugacy class $C$. Then $|C|\chi(C)/\dim V$ is an algebraic integer.

PROOF. If $f$ is any class function on $G$, then $R(f)$ commutes with each $R(x)$ for $x$ in $G$ because $R(f) = \sum_y f(y)R(y)$ yields
\[
\begin{aligned}
R(x)R(f)R(x)^{-1} &= \sum_{y} f(y)\,R(x)R(y)R(x)^{-1} = \sum_{y} f(y)\,R(xyx^{-1})\\
&= \sum_{z} f(x^{-1}zx)\,R(z) = \sum_{z} f(z)\,R(z) = R(f).
\end{aligned}
\]
By Schur's Lemma (Proposition 7.18), $R(f)$ is scalar. If $C$ is a conjugacy class, then the function $I_C$ that is 1 on $C$ and is 0 elsewhere is a class function, and hence $R(I_C)$ is a scalar $\lambda_C$. As $C$ varies, the functions $I_C$ form a vector-space basis of the space of class functions. The formula $(I_C * I_{C'})(x) = \sum_y I_C(y)\,I_{C'}(y^{-1}x)$ shows that $I_C * I_{C'}$ is integer-valued, and we have seen that the convolution of two class functions is a class function. Therefore $I_C * I_{C'} = \sum_{C''} n_{CC'C''}\,I_{C''}$ for suitable integers $n_{CC'C''}$. Application of $R$ gives $\lambda_C \lambda_{C'} = \sum_{C''} n_{CC'C''}\,\lambda_{C''}$. If we fix $C$ and let $A$ be the square matrix with entries $A_{C'C''} = n_{CC'C''}$, we obtain
\[
\lambda_C \lambda_{C'} = \sum_{C''} A_{C'C''}\,\lambda_{C''} .
\]
This equation says that the matrix $A$ has the column vector with entries $\lambda_{C'}$ as an eigenvector with eigenvalue $\lambda_C$. Therefore the matrix $\lambda_C I - A$ is singular, and $\det(\lambda_C I - A) = 0$. Since $\det(\lambda_C I - A)$ is a monic polynomial expression in $\lambda_C$ with integer coefficients, $\lambda_C$ is an algebraic integer. Taking the trace of the equation $R(I_C) = \lambda_C I$, we obtain $\sum_{x\in C} \chi(x) = \lambda_C \dim V$. Since $\chi(x) = \chi(C)$ for $x$ in $C$, the result is that $|C|\chi(C)/\dim V = \lambda_C$. Since $\lambda_C$ is an algebraic integer, $|C|\chi(C)/\dim V$ is an algebraic integer.
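The objects in the proof of Theorem 7.31 can be computed explicitly for $G = S_3$. The sketch below builds the conjugacy classes, computes the integers $n_{CC'C''}$ by evaluating $I_C * I_{C'}$ at a point of $C''$, and checks the relation $\lambda_C\lambda_{C'} = \sum_{C''} n_{CC'C''}\lambda_{C''}$ for the 2-dimensional irreducible representation. It assumes the standard fact, not taken from the text, that the character of that representation is the number of fixed points minus 1; all function names are illustrative.

```python
from itertools import permutations
from fractions import Fraction

G = list(permutations(range(3)))

def mul(p, q):
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    r = [0] * 3
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def conjugacy_classes():
    classes, seen = [], set()
    for x in G:
        if x not in seen:
            cls = sorted({mul(mul(g, x), inv(g)) for g in G})
            classes.append(cls)
            seen.update(cls)
    return classes

classes = conjugacy_classes()

def structure_constant(a, b, c):
    # n_{C C' C''}: the value of I_C * I_C' at a point x of C''
    x = classes[c][0]
    return sum(1 for y in classes[a] if mul(inv(y), x) in classes[b])

def fixed_points(p):
    return sum(1 for i in range(3) if p[i] == i)

# Assumed character of the 2-dimensional irreducible representation of S_3.
chi = [fixed_points(cls[0]) - 1 for cls in classes]
dim_V = 2

# lambda_C = |C| chi(C) / dim V for each conjugacy class C
lam = [Fraction(len(cls) * chi[k], dim_V) for k, cls in enumerate(classes)]

relations_hold = all(
    lam[a] * lam[b]
    == sum(structure_constant(a, b, c) * lam[c] for c in range(len(classes)))
    for a in range(len(classes))
    for b in range(len(classes))
)
```

In this small case every $\lambda_C$ turns out to be an ordinary integer, which is consistent with the theorem since the values are rational.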


PROOF THAT THEOREM 7.31 IMPLIES PROPOSITION 7.29. Proposition 7.26a gives
\[
\frac{|G|}{\dim V} = \sum_{x\in G} \frac{|\chi(x)|^2}{\dim V} = \sum_{C} \sum_{x\in C} \frac{|\chi(x)|^2}{\dim V} = \sum_{C} \Big( \frac{|C|\chi(C)}{\dim V} \Big)\,\overline{\chi(C)} .
\]
Each term in parentheses on the right side is an algebraic integer, according to Theorem 7.31, and each factor $\overline{\chi(C)}$ is an algebraic integer, being a sum of roots of unity. Therefore Lemma 7.30 shows that $|G|/\dim V$ is an algebraic integer. Since $|G|/\dim V$ is in $\mathbb{Q}$, Lemma 7.30 shows that $|G|/\dim V$ is in $\mathbb{Z}$.

5. Burnside's Theorem

The theorem of this section is as follows.

Theorem 7.32 (Burnside's Theorem). If $G$ is a finite group of order $p^a q^b$ with $p$ and $q$ prime and with $a + b > 1$, then $G$ has a nontrivial normal subgroup.

The argument will use the result Theorem 7.31 from algebraic number theory, and also it will make use of a special case of the Fundamental Theorem of Galois Theory, whose proof is deferred to Chapter IX. That special case is the following statement, whose context was anticipated in Section IV.1, where groups of automorphisms of certain fields were discussed briefly. Since the set $\{1, e^{2\pi i/n}, e^{2\cdot 2\pi i/n}, e^{3\cdot 2\pi i/n}, \dots\}$ is linearly dependent over $\mathbb{Q}$, Proposition 4.1 in that section implies that the subring $\mathbb{Q}[e^{2\pi i/n}]$ of $\mathbb{C}$ generated by $\mathbb{Q}$ and $e^{2\pi i/n}$ is a subfield and is a finite-dimensional vector space over $\mathbb{Q}$. According to Example 9 of that section, the group $\Gamma = \mathrm{Gal}(\mathbb{Q}[e^{2\pi i/n}]/\mathbb{Q})$ of automorphisms of $\mathbb{Q}[e^{2\pi i/n}]$ fixing every element of $\mathbb{Q}$ is a finite group.

Proposition 7.33 (special case of the Fundamental Theorem of Galois Theory). Let $n > 0$ be an integer, and put $K = \mathbb{Q}[e^{2\pi i/n}]$. Let $\Gamma$ be the finite group of field automorphisms of $K$ fixing every element of $\mathbb{Q}$. Then the only members $\beta$ of $K$ such that $\sigma(\beta) = \beta$ for every $\sigma$ in $\Gamma$ are the members of $\mathbb{Q}$.

Lemma 7.34. Let $G$ be a finite group, $(R, V)$ be an irreducible finite-dimensional representation of $G$, $\chi$ be the character of $R$, and $C$ be a conjugacy class in $G$. If $\mathrm{GCD}(|C|, \dim V) = 1$ and if $x$ is in $C$, then either $R(x)$ is scalar or $\chi(x) = 0$.

PROOF. Define $\chi(C)$ to be the constant value of $\chi$ on $C$, and put $\alpha = \chi(x)/\dim V = \chi(C)/\dim V$. Since $\mathrm{GCD}(|C|, \dim V) = 1$, we can choose integers $m$ and $n$ with $m|C| + n \dim V = 1$. Multiplication by $\alpha$ yields
\[
m\,\frac{|C|\chi(C)}{\dim V} + n\,\chi(C) = \alpha .
\]


Theorem 7.31 shows that the coefficients $|C|\chi(C)/\dim V$ and $\chi(C)$ of $m$ and $n$ on the left side are algebraic integers, and therefore $\alpha$ is an algebraic integer. As we observed toward the end of the previous section, $\chi(x) = \chi(C)$ is the sum of $\dim V$ roots of unity. Since $\alpha = \chi(C)/\dim V$, we see that $|\alpha| \le 1$ with equality only if all the roots of unity are equal, in which case $R(x)$ is scalar. In view of the hypothesis, we may assume that $|\alpha| < 1$. We shall show that $\alpha = 0$.

Let $K = \mathbb{Q}[e^{2\pi i/|G|}]$ be the smallest subfield of $\mathbb{C}$ containing $\mathbb{Q}$ and the complex number $e^{2\pi i/|G|}$, and let $\Gamma$ be the group of field automorphisms of $K$ that fix every element of $\mathbb{Q}$. We know that $K$ is finite-dimensional over $\mathbb{Q}$ and that $\Gamma$ is a finite group, and Proposition 7.33 shows that the only members of $K$ fixed by every element of $\Gamma$ are the members of $\mathbb{Q}$.

Our element $x$ of $G$ has $x^{|G|} = 1$. Thus every root of unity contributing to $\chi(x)$ is a $|G|$th root of unity and is in $K$. Therefore the algebraic integer $\alpha$ is in $K$. If $\sigma$ is in $\Gamma$, each of the $|G|$th roots of unity is mapped by $\sigma$ to some complex number $x'$ satisfying $(x')^{|G|} = 1$, and hence the member $\sigma(\alpha)$ of $K$ satisfies $|\sigma(\alpha)| \le 1$. Also, $\sigma(\alpha)$ is an algebraic integer, as we see by applying $\sigma$ to the monic equation with integer coefficients satisfied by $\alpha$, and we are assuming that $|\alpha| < 1$. Consequently $\beta = \prod_{\sigma\in\Gamma} \sigma(\alpha)$ is an algebraic integer and has absolute value $< 1$. A change of variables in the product shows that $\beta$ is fixed by every member of $\Gamma$, and we see from the previous paragraph that $\beta$ is in $\mathbb{Q}$. By Lemma 7.30, $\beta$ is in $\mathbb{Z}$. Being of absolute value less than 1, it is 0. Thus $\alpha = 0$, and $\chi(x) = 0$.
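Lemma 7.34 can be observed in a character table. The sketch below uses the standard character table of the symmetric group $S_4$, which is quoted here as an assumption rather than derived in the text, and uses the criterion from the proof that $R(x)$ is scalar exactly when $|\chi(x)| = \dim V$. The function name `check_lemma` is illustrative.

```python
from math import gcd

# Conjugacy classes of S_4 in the order: identity, (1 2), (1 2)(3 4), (1 2 3), (1 2 3 4).
class_sizes = [1, 6, 3, 8, 6]

# Assumed character table of S_4; the first entry of each row is dim V.
characters = [
    [1, 1, 1, 1, 1],
    [1, -1, 1, 1, -1],
    [2, 0, 2, -1, 0],
    [3, 1, -1, 0, -1],
    [3, -1, -1, 0, 1],
]

def check_lemma():
    # For each pair (character, class) with GCD(|C|, dim V) = 1, record whether
    # R(x) is scalar (|chi(x)| = dim V) or chi(x) = 0.
    findings = []
    for chi in characters:
        dim = chi[0]
        for size, value in zip(class_sizes, chi):
            if gcd(size, dim) == 1:
                scalar = abs(value) == dim
                findings.append((dim, size, value, scalar or value == 0))
    return findings

findings = check_lemma()
```

For instance, the class of 3-cycles has 8 elements, which is prime to 3, and both 3-dimensional characters vanish there; the class of double transpositions has 3 elements, which is prime to 2, and the 2-dimensional character takes the value 2 there, so those elements act as scalars.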

Lemma 7.35. Let $G$ be a finite group, and let $C$ be a conjugacy class in $G$ such that $|C| = p^k$ for some prime $p$ and some integer $k > 0$. Then there exists an irreducible finite-dimensional representation $R \ne 1$ of $G$ with $R(x)$ scalar for every $x$ in $C$. Consequently $G$ is not simple.

PROOF. The conjugacy class $C$ cannot be $\{1\}$ because $|\{1\}| \ne p^k$ with $k > 0$. Let $\chi_{\mathrm{reg}}$ be the character of the right regular representation $r$ of $G$ on $C(G,\mathbb{C})$. If $I_g$ denotes the function that is 1 at $g$ and is 0 elsewhere, then the functions $I_g$ form an orthonormal basis of $C(G,\mathbb{C})$, and therefore $\chi_{\mathrm{reg}}(x) = \sum_{g\in G} (r(x)I_g, I_g) = \sum_{g\in G} (I_{gx^{-1}}, I_g)$. Every term on the right side is 0 if $x \ne 1$, and thus Theorem 7.23 gives
\[
0 = \chi_{\mathrm{reg}}(x) = 1 + \sum_{\chi \ne 1} d_\chi\,\chi(x) \qquad \text{for } x \in C, \tag{$*$}
\]
the sum being taken over all irreducible characters other than 1, with $d_\chi$ being the dimension of an irreducible representation corresponding to $\chi$. Let $R_\chi$ be an irreducible representation with character $\chi$. Any $\chi$ such that $p$ does not divide $d_\chi$ has $\mathrm{GCD}(|C|, d_\chi) = 1$ since $|C|$ is assumed to be a power of $p$. Arguing by contradiction, we may assume that no such $\chi$ has $R_\chi(x)$ scalar, and then Lemma 7.34 says that $\chi(x) = 0$ for all such $\chi$. Hence $(*)$ simplifies to
\[
0 = 1 + \sum_{\chi \ne 1,\ p \text{ divides } d_\chi} d_\chi\,\chi(x) \qquad \text{for } x \in C. \tag{$**$}
\]
Since $\chi(x)$ is an algebraic integer, Lemma 7.30 shows that this equation is of the form $1 + p\beta = 0$, where $\beta$ is an algebraic integer. Then $\beta = -1/p$ shows that $-1/p$ is an algebraic integer. Since $-1/p$ is in $\mathbb{Q}$, Lemma 7.30 shows that it must be in $\mathbb{Z}$, and we have arrived at a contradiction. Thus there must have been some $\chi$ with $R_\chi(x)$ scalar for $x$ in $C$.

The set of $g$ in $G$ for which this $R_\chi$ has $R_\chi(g)$ scalar is a normal subgroup of $G$ that contains $x$ and cannot therefore be $\{1\}$. Assume by way of contradiction that $G$ is simple. Then $R_\chi(g)$ is scalar for all $g$ in $G$. Since $R_\chi$ is irreducible, $R_\chi$ is 1-dimensional. Then the commutator subgroup $G'$ of $G$ is contained in the kernel of $R_\chi$. Since $R_\chi \ne 1$, $G'$ is not all of $G$. Since $G'$ is normal, $G' = \{1\}$, and we conclude that $G$ is abelian. But the given $G$ has a conjugacy class with more than one element, and we have arrived at a contradiction.

PROOF OF THEOREM 7.32. Corollary 4.38 shows that a group of prime-power order has a center different from $\{1\}$, and we may therefore assume that $p \ne q$, $a > 0$, and $b > 0$. Let $H$ be a Sylow $q$-subgroup. Applying Corollary 4.38, let $x$ be a member of the center $Z_H$ of $H$ other than 1. The centralizer $Z_G(\{x\})$ is a subgroup containing $H$, and it therefore has order $p^{a'} q^b$. If $a' = a$, then $x$ is in the center of $G$, and the powers of $x$ form the desired proper normal subgroup of $G$. Thus $a' < a$. By Proposition 4.37 the conjugacy class $C$ of $x$ has $|G|/p^{a'}q^b = p^{a-a'}$ elements with $a - a' > 0$. By Lemma 7.35, $G$ is not simple.

6. Extensions of Groups

In Section IV.8 we examined composition series for finite groups. For a given finite group, a composition series consists of a decreasing sequence of subgroups starting with the whole group and ending with $\{1\}$, each normal in the next larger one, such that the successive quotient groups are simple. The Jordan–Hölder Theorem (Corollary 4.50) assured us that the set of successive quotients, up to isomorphism, is independent of the choice of composition series.
This theorem raises the question of reconstructing the whole group from data of this kind. Consider a single step of the process. If we know the normal subgroup and the simple quotient that it yields at a certain stage, what are the possibilities for the next-larger subgroup? We study this question and some of its ramiﬁcations in this section, dropping any hypotheses that are not helpful in the analysis. Here is an example that we shall carry along.


EXAMPLE. Suppose that the normal subgroup is the cyclic group $C_4$ and that the quotient is the cyclic group $C_2$. The whole group has to be of order 8, and the classification of groups of order 8 done in Problems 39–44 at the end of Chapter IV tells us that there are four different possibilities for the whole group: the abelian groups $C_4 \times C_2$ and $C_8$, the dihedral group $D_4$, and the quaternion group $H_8$.

Let us establish a framework for the general problem. We start with a group $E$, a normal subgroup $N$, and the quotient $G = E/N$. We seek data that determine the group law in $E$ in terms of $N$ and $G$. For each member $u$ of $G$, fix a coset representative $\bar u$ in $E$ such that $\bar u N = u$. Since $N$ is normal, the element $\bar u$ of $E$ yields an automorphism $(\,\cdot\,)^u$ of $N$ defined by $x^u = \bar u x \bar u^{-1}$. In addition, the fact that $G$ is a group says that any two of our representatives $\bar u$ and $\bar v$ have
\[
\bar u\,\bar v = a(u, v)\,\overline{uv} \qquad \text{for some unique } a(u, v) \text{ in } N.
\]
The set of all elements $a(u, v)$ for this choice of coset representatives is called a factor set, and $E$ is called a group extension of $N$ by the group $G$. (Warning: Some authors say "group extension of $G$ by $N$.") The automorphisms and the factor set constructed above have to satisfy two compatibility conditions, as follows:
(i) $(x^v)^u = a(u, v)\,x^{uv}\,a(u, v)^{-1}$ because $(x^v)^u = \bar u(\bar v x \bar v^{-1})\bar u^{-1} = \bar u \bar v\, x\, \bar v^{-1}\bar u^{-1} = (a(u, v)\overline{uv})\,x\,(a(u, v)\overline{uv})^{-1} = a(u, v)\,x^{uv}\,a(u, v)^{-1}$,
(ii) $a(v, w)^u\,a(u, vw) = a(u, v)\,a(uv, w)$ because $(\bar u \bar v)\bar w = a(u, v)\,\overline{uv}\,\bar w = a(u, v)\,a(uv, w)\,\overline{uvw}$ and $\bar u(\bar v \bar w) = \bar u\,a(v, w)\,\overline{vw} = a(v, w)^u\,\bar u\,\overline{vw} = a(v, w)^u\,a(u, vw)\,\overline{uvw}$.
Then the multiplication law in $E$ is given in terms of the automorphisms and the factor set by the formula
(iii) $(x\bar u)(y\bar v) = x\,y^u\,a(u, v)\,\overline{uv}$ by the computation $(x\bar u)(y\bar v) = x\,y^u\,\bar u\,\bar v = x\,y^u\,a(u, v)\,\overline{uv}$.
Conversely, according to the proposition below, such data determine a group $E$ with a normal subgroup isomorphic to $N$ and a quotient $E/N$ isomorphic to $G$.

Proposition 7.36 (Schreier). Let two groups $N$ and $G$ be given, along with a family of automorphisms $x \mapsto x^u$ of $N$ parametrized by $u$ in $G$, as well as a function $a : G \times G \to N$ such that
(a) $(x^v)^u = a(u, v)\,x^{uv}\,a(u, v)^{-1}$ for all $u$ and $v$ in $G$,
(b) $a(v, w)^u\,a(u, vw) = a(u, v)\,a(uv, w)$ for all $u, v, w$ in $G$.
Then the set $N \times G$ becomes a group $E$ under the multiplication
(c) $(x, u)(y, v) = (x\,y^u\,a(u, v),\ uv)$,

and this group has a normal subgroup isomorphic to $N$ with quotient group isomorphic to $G$. More particularly, the identity of $E$ is $(a(1, 1)^{-1}, 1)$, the map $x \mapsto (x\,a(1, 1)^{-1}, 1)$ of $N$ into $E$ is a one-one homomorphism that exhibits $N$ as a normal subgroup of $E$, and the map $(x, u) \mapsto u$ of $E$ onto $G$ is a homomorphism that exhibits $G$ as isomorphic to $E/N$.

PROOF. Reverting to the earlier notation, let us write $x\bar u$ in place of $(x, u)$ for elements of $E$. Associativity of multiplication follows from the computation
\[
\begin{aligned}
(x\bar u\; y\bar v)(z\bar w) &= x\,y^u\,a(u, v)\,\overline{uv}\; z\bar w && \text{by (c)}\\
&= x\,y^u\,a(u, v)\,z^{uv}\,a(uv, w)\,\overline{uvw} && \text{by (c)}\\
&= x\,y^u\,a(u, v)\,z^{uv}\,a(u, v)^{-1}\,a(u, v)\,a(uv, w)\,\overline{uvw}\\
&= x\,y^u\,a(u, v)\,z^{uv}\,a(u, v)^{-1}\,a(v, w)^u\,a(u, vw)\,\overline{uvw} && \text{by (b)}\\
&= x\,\big(y\,z^v\,a(v, w)\big)^u\,a(u, vw)\,\overline{uvw} && \text{by (a)}\\
&= (x\bar u)\big(y\,z^v\,a(v, w)\,\overline{vw}\big) && \text{by (c)}\\
&= (x\bar u)(y\bar v\; z\bar w) && \text{by (c)}.
\end{aligned}
\]
The identity is to be $a(1, 1)^{-1}\bar 1$. Before checking this assertion, we prove three preliminary identities. Setting $u = v = 1$ in (a) and replacing $x^1$ by $x$ gives
\[
x^1 = a(1, 1)\,x\,a(1, 1)^{-1} \qquad \text{for all } x \in N. \tag{$*$}
\]
(The effect of the automorphism $x \mapsto x^1$ is not necessarily trivial since the coset representative $\bar 1$ of 1 is not assumed to be the identity. Thus we must distinguish between $x^1$ and $x$.) Setting $v = w = 1$ in (b) gives $a(1, 1)^u\,a(u, 1) = a(u, 1)\,a(u, 1)$ and hence
\[
a(1, 1)^u = a(u, 1) \qquad \text{for all } u \in G. \tag{$\dagger$}
\]
Meanwhile, setting $u = v = 1$ in (b) gives $a(1, w)^1\,a(1, w) = a(1, 1)\,a(1, w)$ and hence $a(1, w)^1 = a(1, 1)$ for all $w \in G$. The left side $a(1, w)^1$ of this last equality is equal to $a(1, 1)\,a(1, w)\,a(1, 1)^{-1}$ by $(*)$; canceling $a(1, 1)$ yields
\[
a(1, w) = a(1, 1) \qquad \text{for all } w \in G. \tag{$\dagger\dagger$}
\]
Using these identities, we check that $a(1, 1)^{-1}\bar 1$ is a two-sided identity by making the computations
\[
\begin{aligned}
(x\bar u)(a(1, 1)^{-1}\bar 1) &= x\,(a(1, 1)^{-1})^u\,a(u, 1)\,\bar u && \text{by (c)}\\
&= x\,(a(1, 1)^{-1})^u\,a(1, 1)^u\,\bar u && \text{by } (\dagger)\\
&= x\bar u
\end{aligned}
\]
and
\[
\begin{aligned}
(a(1, 1)^{-1}\bar 1)(y\bar v) &= a(1, 1)^{-1}\,y^1\,a(1, v)\,\bar v && \text{by (c)}\\
&= y\,a(1, 1)^{-1}\,a(1, v)\,\bar v && \text{by } (*)\\
&= y\bar v && \text{by } (\dagger\dagger).
\end{aligned}
\]
Let us check that a left inverse for $x\bar u$ is $a(1, 1)^{-1}\,a(u^{-1}, u)^{-1}\,(x^{u^{-1}})^{-1}\,\overline{u^{-1}}$. In fact,
\[
\begin{aligned}
\big(a(1, 1)^{-1}\,a(u^{-1}, u)^{-1}\,(x^{u^{-1}})^{-1}\,\overline{u^{-1}}\big)(x\bar u)
&= a(1, 1)^{-1}\,a(u^{-1}, u)^{-1}\,(x^{u^{-1}})^{-1}\,x^{u^{-1}}\,a(u^{-1}, u)\,\bar 1 && \text{by (c)}\\
&= a(1, 1)^{-1}\,\bar 1,
\end{aligned}
\]
as required. Thus multiplication is associative, there is a two-sided identity, and every element has a left inverse. It follows that $E$ is a group.

The map $x\bar u \mapsto u$ of $E$ into $G$ is a homomorphism by (c), and it is certainly onto $G$. Its kernel is evidently the subgroup of all elements $x\,a(1, 1)^{-1}\bar 1$ in $E$. Since
\[
\begin{aligned}
\big(x\,a(1, 1)^{-1}\bar 1\big)\big(y\,a(1, 1)^{-1}\bar 1\big) &= x\,a(1, 1)^{-1}\,(y\,a(1, 1)^{-1})^1\,a(1, 1)\,\bar 1 && \text{by (c)}\\
&= x\,a(1, 1)^{-1}\,a(1, 1)\,(y\,a(1, 1)^{-1})\,\bar 1 && \text{by } (*)\\
&= x\,y\,a(1, 1)^{-1}\,\bar 1,
\end{aligned}
\]
the one-one map $x \mapsto x\,a(1, 1)^{-1}\bar 1$ of $N$ onto the kernel respects the group structures and is therefore an isomorphism. In other words, the embedded version of $N$ is the kernel. Being a kernel, it is a normal subgroup.

EXAMPLE, CONTINUED. Let $N = C_4 = \{1, r, r^2, r^3\}$ and $G = C_2 = \{1, u_0\}$ with $u_0^2 = 1$. The group $N$ has two automorphisms, the nontrivial one fixing 1 and $r^2$ while interchanging $r$ and $r^3$. The automorphism of $N$ from $1 \in G$ has to be trivial, while the automorphism of $N$ from $u_0 \in G$ can be trivial or nontrivial. In fact, the automorphism is trivial for $E = C_4 \times C_2$ and $E = C_8$, and it is nontrivial for $E = D_4$ and $E = H_8$. In each case the automorphism does not depend on the choice of coset representatives.

The factor sets do depend on the choice of representatives, however. Let us fix $\bar 1$ as the identity of $E$ and make a particular choice of $\bar u_0$ for each $E$. Then


the definition of factor set shows that $a(1, 1) = a(u_0, 1) = a(1, u_0) = 1$, and the only part of the factor set yet to be determined is $a(u_0, u_0)$. Let us consider matters group by group.

For $C_4 \times C_2$, we can take $\bar u_0$ to be the generator of the $C_2$ factor; this has square 1, and hence $a(u_0, u_0) = 1$. For $C_8 = \{1, \theta, \theta^2, \dots, \theta^7\}$, let us think of $N$ as embedded in $E$ with $r = \theta^2$. The element $\bar u_0$ can be any odd power of $\theta$; if we take $\bar u_0 = \theta$, then $(\bar u_0)^2 = \theta^2 = r$, and hence $a(u_0, u_0) = r$. For $E = D_4$, the example following Proposition 7.8 shows that we may view the elements as the rotations $1, r, r^2, r^3$ and the reflections $s, rs, r^2s, r^3s$ for particular choices of $r$ and $s$. We can take $\bar u_0$ to be any of the reflections, and then $(\bar u_0)^2 = 1$ and $a(u_0, u_0) = 1$. Finally for $E = H_8 = \{\pm 1, \pm i, \pm j, \pm k\}$, let us say that $N$ is embedded as $\{\pm 1, \pm i\}$. Then $\bar u_0$ can be any of the four elements $\pm j$ and $\pm k$. Each of these has square $-1$, and hence $a(u_0, u_0) = -1$. For the choices we have made, we therefore have
\[
a(u_0, u_0) = \begin{cases} 1 & \text{for } E = C_4 \times C_2 \text{ and } E = D_4,\\ r & \text{for } E = C_8,\\ -1 & \text{for } E = H_8. \end{cases}
\]
The formula of Proposition 7.36a reduces to $(x^v)^u = x^{uv}$ since $N$ is abelian, and it is certainly satisfied. The formula for Proposition 7.36b is $a(v, w)^u\,a(u, vw) = a(u, v)\,a(uv, w)$. This is satisfied for $E = C_4 \times C_2$ and $E = D_4$ since $a(\,\cdot\,,\,\cdot\,)$ is identically 1. For the other two cases, either the automorphism is trivial (for $E = C_8$) or the values of $a(\,\cdot\,,\,\cdot\,)$ lie in the 2-element subgroup of $N$ that is fixed by the nontrivial automorphism (for $E = H_8$), and hence $a(v, w)^u = a(v, w)$ in every case. The formula to be checked reduces to $a(v, w)\,a(1, 1) = a(1, 1)\,a(v, w)$ by $(\dagger\dagger)$ if $u = 1$, to $a(1, 1)\,a(u, w) = a(1, 1)\,a(u, w)$ by $(\dagger)$ and $(\dagger\dagger)$ if $v = 1$, and to $a(1, 1)\,a(u, v) = a(u, v)\,a(1, 1)$ by $(\dagger)$ if $w = 1$. Thus all that needs checking is the case that $u = v = w = u_0$, and then the formula in question reduces to $a(u_0, u_0)\,a(1, 1) = a(u_0, u_0)\,a(1, 1)$ by $(\dagger)$ and $(\dagger\dagger)$.
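The construction of Proposition 7.36 can be carried out by machine for the four extensions of $C_4$ by $C_2$ in this example. The sketch below writes $N = \mathbb{Z}/4$ additively (so $r$ corresponds to 1 and $-1 \in H_8$ to 2) and $G = \mathbb{Z}/2$, with the factor set equal to the identity except possibly at $(u_0, u_0)$. The function names, and the use of element orders to tell the four groups apart, are illustrative choices rather than part of the text.

```python
from itertools import product

def build_extension(invert, a00):
    """Multiplication on N x G for N = Z/4 (additive) and G = Z/2.

    invert: True if u_0 acts on N by x -> -x, False if it acts trivially.
    a00:    the value a(u_0, u_0) in N; every other factor-set value is 0.
    """
    def act(u, x):
        return (-x) % 4 if (u == 1 and invert) else x

    def a(u, v):
        return a00 if (u == 1 and v == 1) else 0

    def mul(p, q):
        (x, u), (y, v) = p, q
        # additive form of (x, u)(y, v) = (x y^u a(u, v), uv)
        return ((x + act(u, y) + a(u, v)) % 4, (u + v) % 2)

    return mul

elements = list(product(range(4), range(2)))
identity = (0, 0)   # equals (a(1,1)^{-1}, 1) because a(1, 1) is trivial here

def is_group(mul):
    assoc = all(mul(mul(p, q), r) == mul(p, mul(q, r))
                for p in elements for q in elements for r in elements)
    unit = all(mul(identity, p) == p == mul(p, identity) for p in elements)
    inverses = all(any(mul(p, q) == identity for q in elements) for p in elements)
    return assoc and unit and inverses

def order_counts(mul):
    # number of elements of each order, used to distinguish the groups
    counts = {}
    for p in elements:
        k, q = 1, p
        while q != identity:
            q, k = mul(q, p), k + 1
        counts[k] = counts.get(k, 0) + 1
    return counts

cases = {
    "C4 x C2": build_extension(False, 0),
    "C8": build_extension(False, 1),
    "D4": build_extension(True, 0),
    "H8": build_extension(True, 2),
}
```

The four multiplications are distinguished by their element orders: `order_counts` finds three elements of order 2 for $C_4 \times C_2$, four elements of order 8 for $C_8$, five elements of order 2 for $D_4$, and six elements of order 4 for $H_8$.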
Let us examine for a particular extension the dependence of the automorphisms and factor set on the choice of coset representatives. Returning to our original construction, suppose that we change the coset representatives of the members of $G$, associating a member $\tilde u$ to $u \in G$ in place of $\bar u$. We then obtain a new automorphism of $N$ corresponding to $u$, and we write it as $x \mapsto x^{u^*} = \tilde u x \tilde u^{-1}$ instead of $x \mapsto x^u = \bar u x \bar u^{-1}$. To quantify matters, we observe that $\tilde u$ lies in the same coset of $N$ as does $\bar u$. Thus $\tilde u = \alpha(u)\bar u$ for some function $\alpha : G \to N$, and the function $\alpha$ can be absolutely arbitrary. In terms of this function $\alpha$, the two automorphisms are related by
\[
x^{u^*} = \tilde u x \tilde u^{-1} = \alpha(u)\,\bar u x \bar u^{-1}\,\alpha(u)^{-1} = \alpha(u)\,x^u\,\alpha(u)^{-1}.
\]
If the factor set for the system $\{\tilde u\}$ of coset representatives is denoted by $\{b(u, v)\}$, then we have
\[
b(u, v)\,\alpha(uv)\,\overline{uv} = b(u, v)\,\widetilde{uv} = \tilde u\,\tilde v = \alpha(u)\,\bar u\,\alpha(v)\,\bar v = \alpha(u)\,\alpha(v)^u\,a(u, v)\,\overline{uv}.
\]
Equating coefficients of $\overline{uv}$, we obtain
\[
b(u, v) = \alpha(u)\,\alpha(v)^u\,a(u, v)\,\alpha(uv)^{-1}.
\]
Accordingly we say that a group extension of $N$ by $G$ determined by automorphisms $x \mapsto x^u$ and a factor set $a(u, v)$ is equivalent, or isomorphic, to a group extension of $N$ by $G$ determined by automorphisms $x \mapsto x^{u^*}$ and a factor set $b(u, v)$ if there is a function $\alpha : G \to N$ such that
\[
x^{u^*} = \alpha(u)\,x^u\,\alpha(u)^{-1} \qquad\text{and}\qquad b(u, v) = \alpha(u)\,\alpha(v)^u\,a(u, v)\,\alpha(uv)^{-1}
\]
for all $u$ and $v$ in $G$. It is immediate that equivalence of group extensions is an equivalence relation.

Proposition 7.37. Suppose that $E_1$ and $E_2$ are group extensions of $N$ by $G$ with respective inclusions $i_1 : N \to E_1$ and $i_2 : N \to E_2$ and with respective quotient homomorphisms $\varphi_1 : E_1 \to G$ and $\varphi_2 : E_2 \to G$. If there exists a group isomorphism $\Phi : E_1 \to E_2$ such that the two squares in Figure 7.4 commute, then the two group extensions are equivalent. Conversely if the two group extensions are equivalent, then there exists a group isomorphism $\Phi : E_1 \to E_2$ such that the two squares in Figure 7.4 commute.

$$\begin{array}{ccccc} N & \xrightarrow{\ i_1\ } & E_1 & \xrightarrow{\ \varphi_1\ } & G \\ \Big\| & & \Big\downarrow{\scriptstyle \Phi} & & \Big\| \\ N & \xrightarrow{\ i_2\ } & E_2 & \xrightarrow{\ \varphi_2\ } & G \end{array}$$

FIGURE 7.4. Equivalent group extensions.

REMARKS. The commutativity of the squares is important. Just because two group extensions of $N$ by $G$ are isomorphic as groups does not imply that they are equivalent group extensions. An example is given in Problem 19 at the end of the chapter.

PROOF. For the direct part, suppose that $\Phi$ exists. For each $u$ in $G$, select $\bar u$ in $E_1$ with $\varphi_1(\bar u) = u$. Then we can form the extension data $\{x \mapsto x^u\}$ and $\{a(u,v)\}$ for $E_1$ relative to the normal subgroup $i_1(N)$ and the system $\{\bar u \mid u \in G\}$ of coset representatives. When reinterpreted in terms of $N$, $E_1$, and $G$, these data become $\{i_1^{-1}(x) \mapsto i_1^{-1}(x^u)\}$ and $\{i_1^{-1}(a(u,v))\}$.

Application of $\Phi$ to the coset $i_1(N)\bar u$ yields $i_2(N)\Phi(\bar u)$ since $\Phi i_1 = i_2$, and $\Phi(\bar u)$ is a member of $E_2$ with $\varphi_2(\Phi(\bar u)) = \varphi_1(\bar u) = u$. Setting $\tilde u = \Phi(\bar u)$, we see that $\Phi(i_1(N)\bar u)$ is the coset $i_2(N)\tilde u$ of $i_2(N)$ in $E_2$. Thus we can determine


extension data for $E_2$ relative to $i_2(N)$ and the system $\{\tilde u \mid u \in G\}$, and we can transform them by $i_2^{-1}$ to obtain data relative to $N$, $E_2$, and $G$.

The claim is that the data relative to $N$, $E_2$, and $G$ match those for $N$, $E_1$, and $G$. The automorphisms of $N$ from $E_2$ are the maps $i_2^{-1}(x') \mapsto i_2^{-1}(x'^{\,u^*})$, where $x'^{\,u^*} = \tilde u x' \tilde u^{-1}$. From $i_2 = \Phi i_1$ and the fact that each of these maps is one-one, we obtain $i_2^{-1} = i_1^{-1}\Phi^{-1}$ on $i_2(N)$. Substitution shows that the automorphisms of $N$ from $E_2$ are

$$\begin{aligned} i_1^{-1}(\Phi^{-1}(x')) \mapsto i_1^{-1}(\Phi^{-1}(x'^{\,u^*})) &= i_1^{-1}(\Phi^{-1}(\tilde u x' \tilde u^{-1})) \\ &= i_1^{-1}(\bar u\, \Phi^{-1}(x')\, \bar u^{-1}) = i_1^{-1}((\Phi^{-1}(x'))^u). \end{aligned}$$

If we set $x' = \Phi(x)$ with $x$ in $i_1(N)$, then the automorphisms of $N$ from $E_2$ take the form $i_1^{-1}(x) \mapsto i_1^{-1}(x^u)$. Thus they match the automorphisms of $N$ from $E_1$.

In the case of the factor sets, we have $\bar u \bar v = a(u,v)\overline{uv}$. Application of $\Phi$ gives $\tilde u \tilde v = \Phi(a(u,v))\widetilde{uv}$. Thus the factor set for $E_2$ relative to $N$ is $\{i_2^{-1}(\Phi(a(u,v)))\}$. Since $i_2^{-1}\Phi = i_1^{-1}$, this matches the factor set for $E_1$ relative to $N$.

We turn to the converse part. Suppose that the multiplication law in $E_1$ is $(i_1(x)\bar u)(i_1(y)\bar v) = i_1(x)\, i_1(y)^u\, i_1(a(u,v))\,\overline{uv}$ for $x$ and $y$ in $N$, and that the multiplication law in $E_2$ is $(i_2(x)\tilde u)(i_2(y)\tilde v) = i_2(x)\, i_2(y)^{u^*}\, i_2(b(u,v))\,\widetilde{uv}$. Here $\bar u$ and $\bar v$ are preimages of $u$ and $v$ under $\varphi_1$, and $\tilde u$ and $\tilde v$ are preimages of $u$ and $v$ under $\varphi_2$. Define automorphisms of $N$ by $x^u = i_1^{-1}(i_1(x)^u)$ and $x^{u^*} = i_2^{-1}(i_2(x)^{u^*})$. We can then rewrite the multiplication laws as

$$(i_1(x)\bar u)(i_1(y)\bar v) = i_1(x\, y^u\, a(u,v))\,\overline{uv}$$

and

$$(i_2(x)\tilde u)(i_2(y)\tilde v) = i_2(x\, y^{u^*}\, b(u,v))\,\widetilde{uv}.$$

The assumption that $E_1$ is equivalent to $E_2$ as an extension of $N$ by $G$ means that there exists a function $\alpha : G \to N$ such that

$$x^{u^*} = \alpha(u)\, x^u\, \alpha(u)^{-1} \qquad \text{and} \qquad b(u,v) = \alpha(u)\,\alpha(v)^u\, a(u,v)\,\alpha(uv)^{-1}$$

for all $u$ and $v$ in $G$. Define $\Phi : E_1 \to E_2$ by

$$\Phi(i_1(x)\bar u) = i_2(x\alpha(u)^{-1})\,\tilde u.$$

Certainly $\Phi$ is one-one onto. It remains to check that $\Phi$ is a group homomorphism and that the squares commute in Figure 7.4.

To check that $\Phi : E_1 \to E_2$ is a group homomorphism, we compare

$$\Phi(i_1(x)\bar u\, i_1(y)\bar v) = \Phi(i_1(x\, y^u\, a(u,v))\,\overline{uv}) = i_2(x\, y^u\, a(u,v)\,\alpha(uv)^{-1})\,\widetilde{uv}$$

with the product

$$\begin{aligned} \Phi(i_1(x)\bar u)\,\Phi(i_1(y)\bar v) &= i_2(x\alpha(u)^{-1})\,\tilde u\; i_2(y\alpha(v)^{-1})\,\tilde v \\ &= i_2(x\alpha(u)^{-1}\,(y\alpha(v)^{-1})^{u^*}\, b(u,v))\,\widetilde{uv}. \end{aligned}$$

Since

$$\begin{aligned} \alpha(u)^{-1}(y\alpha(v)^{-1})^{u^*}\, b(u,v) &= \alpha(u)^{-1}(y\alpha(v)^{-1})^{u^*}\,\alpha(u)\,\alpha(v)^u\, a(u,v)\,\alpha(uv)^{-1} \\ &= (y\alpha(v)^{-1})^u\,\alpha(v)^u\, a(u,v)\,\alpha(uv)^{-1} \\ &= y^u\, a(u,v)\,\alpha(uv)^{-1}, \end{aligned}$$

these expressions are equal, and $\Phi$ is a group homomorphism. Thus $\Phi$ is a group isomorphism.

Now we check the commutativity of the squares. The computation

$$\varphi_2(\Phi(i_1(x)\bar u)) = \varphi_2(i_2(x\alpha(u)^{-1})\,\tilde u) = u = \varphi_1(i_1(x)\bar u)$$

shows that the right-hand square commutes. For the left-hand square we use the fact recorded in the statement of Proposition 7.36 that $i_1(a(1,1)^{-1})\bar 1$ is the identity of $E_1$ and $i_2(b(1,1)^{-1})\tilde 1$ is the identity of $E_2$. Therefore $\Phi(i_1(x)) = \Phi(i_1(x\,a(1,1)^{-1})\bar 1) = i_2(x\,a(1,1)^{-1}\alpha(1)^{-1})\tilde 1$. Since $i_2(x) = i_2(x\,b(1,1)^{-1})\tilde 1$, the left-hand square commutes if $b(1,1) = \alpha(1)a(1,1)$. This formula follows from (∗) in the proof of Proposition 7.36 by the computation

$$b(1,1) = \alpha(1)\,\alpha(1)^1\, a(1,1)\,\alpha(1)^{-1} = \alpha(1)\,a(1,1)\,\alpha(1)\,\alpha(1)^{-1} = \alpha(1)\,a(1,1),$$

and thus the left-hand square indeed commutes.
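The converse half of Proposition 7.37 can be tested on a small case. The sketch below is only an illustration under stated assumptions: $N$ is taken to be the integers mod 4 written additively, $G = C_2$ acts by negation, $a$ is the factor set with $a(u_0,u_0) = 2$ (the one found above for $H_8$), and the names `equivalent_factor_set`, `multiplication`, and `check_isomorphism` are hypothetical. In additive notation the equivalent factor set is $b(u,v) = \alpha(u) + \tau(u)\alpha(v) + a(u,v) - \alpha(uv)$ and the map of the proof is $(x, u) \mapsto (x - \alpha(u), u)$.

```python
from itertools import product

N_ORDER = 4   # N = Z/4, additive (assumption for this sketch)
G = (0, 1)    # G = Z/2, additive

def act(u, y):
    # Nontrivial action: the generator of G negates N.
    return (-y) % N_ORDER if u == 1 else y % N_ORDER

def a(u, v):
    # Factor set with a(u_0, u_0) = 2 and all other values 0.
    return 2 if (u == 1 and v == 1) else 0

def equivalent_factor_set(alpha):
    """b(u, v) = alpha(u) + u.alpha(v) + a(u, v) - alpha(uv), additively."""
    def b(u, v):
        return (alpha[u] + act(u, alpha[v]) + a(u, v)
                - alpha[(u + v) % 2]) % N_ORDER
    return b

def multiplication(factor_set):
    def mul(p, q):
        (x, u), (y, v) = p, q
        return ((x + act(u, y) + factor_set(u, v)) % N_ORDER, (u + v) % 2)
    return mul

def check_isomorphism(alpha):
    """Check that (x, u) -> (x - alpha(u), u) is a bijective homomorphism."""
    mul1 = multiplication(a)
    mul2 = multiplication(equivalent_factor_set(alpha))
    phi = lambda p: ((p[0] - alpha[p[1]]) % N_ORDER, p[1])
    elements = list(product(range(N_ORDER), G))
    bijective = len({phi(p) for p in elements}) == len(elements)
    homomorphism = all(phi(mul1(p, q)) == mul2(phi(p), phi(q))
                       for p in elements for q in elements)
    return bijective and homomorphism

# Try every function alpha : G -> N, including those with alpha(1) != 0.
ALL_OK = all(check_isomorphism(dict(zip(G, values)))
             for values in product(range(N_ORDER), repeat=len(G)))
```

Allowing $\alpha(1) \neq 0$ makes $b$ unnormalized, which is why the identity of the second group need not be the pair `(0, 0)`; the homomorphism check does not depend on that.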

For the remainder of this section, let us assume that $N$ is abelian. In this case Proposition 7.36a reduces to the identity $(x^v)^u = x^{uv}$ for all $u$ and $v$ in $G$ independently of the choice of representatives, just as it does in the example we studied with $N = C_4$ and $G = C_2$. In the terminology of Section IV.7, $G$ acts on $N$ by automorphisms.$^5$ Suppose we fix such an action $\tau : G \to \operatorname{Aut} N$ by automorphisms and consider all extensions of $N$ by $G$ built from $\tau$. In our example we are thus to consider $E$ equal to $C_4 \times C_2$ or $C_8$, which are built with the trivial $\tau$, or else $E$ equal to $D_4$ or $H_8$, which are built with the nontrivial $\tau$ (in which the nontrivial element of $G$ acts by the nontrivial automorphism of $N$). Since $N$ is abelian, let us switch to additive notation for $N$ and to ordinary function notation for $\tau(w)$, rewriting the formula of Proposition 7.36b as

$$\tau(u)a(v,w) + a(u,vw) = a(u,v) + a(uv,w).$$

$^5$ The formula $(x^v)^u = x^{uv}$ correctly corresponds to a group action with the group on the left as in Section IV.7.


This condition is preserved under addition of factor sets as long as $\tau$ does not change, it is satisfied by the 0 factor set, and the negative of a factor set is again a factor set. Therefore the factor sets for this $\tau$ form an abelian group. Two factor sets for this $\tau$ are equivalent (in the sense of yielding equivalent group extensions) if and only if their difference is equivalent to 0, and $a(u,v)$ is equivalent to 0 if and only if

$$a(u,v) = \alpha(uv) - \alpha(u) - \tau(u)\alpha(v)$$

for some function $\alpha : G \to N$. The set of factor sets for this $\tau$ that are equivalent to 0 is thus a subgroup,$^6$ and we arrive at the following result.

Proposition 7.38. Let $G$ and $N$ be groups with $N$ abelian, and suppose that $\tau : G \to \operatorname{Aut} N$ is a homomorphism. Then the set of equivalence classes of group extensions of $N$ by $G$ corresponding to the action $\tau : G \to \operatorname{Aut} N$ is parametrized by the quotient of the abelian group of factor sets by the subgroup of factor sets equivalent to 0.

The extension $E$ corresponding to the 0 factor set is of special interest. In this case the multiplication law for the coset representatives is $\bar u \bar v = \overline{uv}$ since the member $a(u,v) = 0$ of $N$ is to be interpreted multiplicatively in this product formula. Consequently the map $u \mapsto \bar u$ of $G$ into $E$ is a group homomorphism, necessarily one-one, and we can regard $G$ as a subgroup of $E$. Proposition 4.44 allows us to conclude that $E$ is the semidirect product $G \times_\tau N$. The multiplication law for general elements of $E$, with multiplicative notation used for $N$, is

$$(x\bar u)(y\bar v) = x(\tau(u)y)\,\overline{uv}.$$

It is possible also to describe explicitly the extension one obtains from the sum of two factor sets corresponding to the same $\tau$, but we leave this matter to Problems 20–23 at the end of the chapter. The operation on extensions that corresponds to addition of factor sets in this way is called Baer multiplication. What we saw in the previous paragraph says that the group identity under Baer multiplication is the semidirect product.
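For very small groups the quotient in Proposition 7.38 can be counted by brute force. The following sketch is an illustration only, with assumptions that are not in the text: $N$ is the integers mod `n_order` written additively, $G$ is given by a tuple of elements with a multiplication function, and the name `count_extension_classes` is hypothetical. It enumerates all functions $G \times G \to N$, keeps those satisfying the additive form of the formula of Proposition 7.36b, and divides by the number of factor sets equivalent to 0.

```python
from itertools import product

def count_extension_classes(G, mul, n_order, act):
    """Order of (factor sets) / (factor sets equivalent to 0), by enumeration."""
    pairs = list(product(G, repeat=2))
    triples = list(product(G, repeat=3))

    def is_factor_set(a):
        # tau(u) a(v, w) + a(u, vw) = a(u, v) + a(uv, w) for all u, v, w.
        return all(
            (act(u, a[(v, w)]) + a[(u, mul(v, w))]
             - a[(u, v)] - a[(mul(u, v), w)]) % n_order == 0
            for u, v, w in triples)

    factor_sets = sum(
        1 for values in product(range(n_order), repeat=len(pairs))
        if is_factor_set(dict(zip(pairs, values))))

    # Factor sets equivalent to 0: alpha(uv) - alpha(u) - tau(u) alpha(v).
    trivial = set()
    for values in product(range(n_order), repeat=len(G)):
        alpha = dict(zip(G, values))
        trivial.add(tuple(
            (alpha[mul(u, v)] - alpha[u] - act(u, alpha[v])) % n_order
            for u, v in pairs))

    return factor_sets // len(trivial)

C2 = (0, 1)
c2_mul = lambda u, v: (u + v) % 2
trivial_action = lambda u, x: x % 4
inversion_action = lambda u, x: (-x) % 4 if u == 1 else x % 4

classes_trivial = count_extension_classes(C2, c2_mul, 4, trivial_action)
classes_twisted = count_extension_classes(C2, c2_mul, 4, inversion_action)
```

For $N = C_4$ and $G = C_2$ both counts come out to 2, in agreement with the example: $C_4 \times C_2$ and $C_8$ for the trivial $\tau$, and $D_4$ and $H_8$ for the nontrivial $\tau$. The enumeration grows like $|N|^{|G|^2}$, so this is usable only for tiny cases.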
The two conditions, the compatibility condition on a factor set given in Proposition 7.36b and the condition with $\alpha$ in it for equivalence to 0, are of a combinatorial type that occurs in many contexts in mathematics and is captured by the ideas of “homology” and “cohomology.” For the current situation the notion is that of cohomology of groups, and we shall define it now. The subject of homological algebra, which is introduced in Advanced Algebra, puts cohomology of groups in a wider context and explains some of its mystery.

$^6$ One can legitimately ask whether an arbitrary $\alpha : G \to N$ leads to a factor set under the definition $a(u,v) = \alpha(uv) - \alpha(u) - \tau(u)\alpha(v)$, and one easily checks that the answer is yes. Alternatively, one can refer to the case $n = 2$ in the upcoming Proposition 7.39.

We fix an abelian group $N$, a group $G$, and a group action $\tau$ of $G$ on $N$ by automorphisms. It is customary to suppress $\tau$ in the notation for the group action, and we shall follow that convention. For integers $n \ge 0$, one begins with the abelian group $C^n(G,N)$ of $n$-cochains of $G$ with coefficients in $N$. This is defined by

$$C^n(G,N) = \begin{cases} N & \text{if } n = 0, \\ \bigl\{ f : \prod_{k=1}^n G \to N \bigr\} & \text{if } n > 0. \end{cases}$$

In words, for $n > 0$, $C^n(G,N)$ is the set of all functions into $N$ from the $n$-fold direct product of $G$ with itself. The coboundary map $\delta_n : C^n(G,N) \to C^{n+1}(G,N)$ is the homomorphism of abelian groups defined by $(\delta_0 f)(g_1) = g_1 f - f$ and by

$$\begin{aligned} (\delta_n f)(g_1, \dots, g_{n+1}) = {} & g_1(f(g_2, \dots, g_{n+1})) \\ & + \sum_{i=1}^{n} (-1)^i f(g_1, \dots, g_{i-1}, g_i g_{i+1}, g_{i+2}, \dots, g_{n+1}) \\ & + (-1)^{n+1} f(g_1, \dots, g_n) \end{aligned}$$

for $n > 0$. We postpone to the end of this section the proof of the following result.

Proposition 7.39. $\delta_n \delta_{n-1} = 0$ for all $n \ge 1$.

It follows from Proposition 7.39 that $\operatorname{image} \delta_{n-1} \subseteq \ker \delta_n$ for all $n \ge 1$. Thus if we define abelian groups by

$$Z^n(G,N) = \ker \delta_n, \qquad B^n(G,N) = \begin{cases} 0 & \text{for } n = 0, \\ \operatorname{image} \delta_{n-1} & \text{for } n > 0, \end{cases}$$

then $B^n(G,N) \subseteq Z^n(G,N)$ for all $n$, and it makes sense to define the abelian groups

$$H^n(G,N) = Z^n(G,N)/B^n(G,N) \qquad \text{for } n \ge 0.$$

The elements of $Z^n(G,N)$ are called $n$-cocycles, the elements of $B^n(G,N)$ are called $n$-coboundaries, and $H^n(G,N)$ is called the $n$th cohomology group of $G$ with coefficients in $N$.

EXAMPLES IN LOW DEGREE.

DEGREE 0. Here $(\delta_0 f)(u) = uf - f$ with $f$ in $N$ and $u$ in $G$. The cocycle condition is that this is 0 for all $u$. Thus $f$ is to be fixed by $G$. We say that an $f$


fixed by $G$ is an invariant of the group action. The space of invariants is denoted by $N^G$. By the convention above, we are taking $B^0(G,N) = 0$. Thus $H^0(G,N) = N^G$.

DEGREE 1. Here $(\delta_1 f)(u,v) = u(f(v)) - f(uv) + f(u)$ with $f$ a function from $G$ to $N$. The cocycle condition is that

$$f(uv) = f(u) + u(f(v)) \qquad \text{for all } u, v \in G.$$

A function $f$ satisfying this condition is called a crossed homomorphism of $G$ into $N$. A coboundary is a function $f : G \to N$ of the form $f(u) = (\delta_0 x)(u) = ux - x$ for some $x \in N$. Then $H^1(G,N)$ is the quotient of the group of crossed homomorphisms by this subgroup. In the special case that the action of $G$ on $N$ is trivial, the crossed homomorphisms reduce to ordinary homomorphisms of $G$ into $N$, and every coboundary is 0. Thus $H^1(G,N)$ is the group of homomorphisms of $G$ into $N$ if $G$ acts trivially on $N$.

DEGREE 2. Here $f$ is a function from $G \times G$ into $N$, and

$$(\delta_2 f)(u,v,w) = u(f(v,w)) - f(uv,w) + f(u,vw) - f(u,v).$$

The cocycle condition is that

$$u(f(v,w)) + f(u,vw) = f(uv,w) + f(u,v) \qquad \text{for all } u, v, w \in G.$$

This is the same as the condition that $\{f(u,v)\}$ be a factor set for extensions of $N$ by $G$ relative to the given action of $G$ on $N$ by automorphisms. A coboundary is a function $f : G \times G \to N$ of the form

$$f(u,v) = (\delta_1 \alpha)(u,v) = u(\alpha(v)) - \alpha(uv) + \alpha(u) \qquad \text{for some } \alpha : G \to N.$$

This is the same as the condition that $\{-f(u,v)\}$ be a factor set equivalent to 0. Thus we can restate Proposition 7.38 as follows.

Proposition 7.40. Let $G$ and $N$ be groups with $N$ abelian, and suppose that $\tau : G \to \operatorname{Aut} N$ is a homomorphism. Then the set of equivalence classes of group extensions of $N$ by $G$ corresponding to the action $\tau : G \to \operatorname{Aut} N$ is parametrized by $H^2(G,N)$.

Since group extensions have such a nice interpretation in terms of cohomology groups $H^2$, it is reasonable to look for a nice interpretation for $H^1$ as well. Indeed, $H^1$ has an interpretation in terms of uniqueness up to inner isomorphisms for semidirect-product decompositions. We continue with the abelian group $N$, a group $G$, and a group action $\tau$ of $G$ on $N$ by automorphisms. A semidirect product $E = G \times_\tau N$ is an allowable extension. Since $G$ embeds as a subgroup of $E$, we are given a one-one group homomorphism $u \mapsto \bar u$ of $G$ into $E$. The construction at the beginning of this section works with the set $\{\bar u\}$ of coset representatives, and they have $\bar u \bar v = \overline{uv}$.


Suppose that the semidirect product can be formed by a second one-one group homomorphism $u \mapsto \tilde u$ of $G$ into $E$. If we write $\tilde u = \alpha(u)\bar u$ for a function $\alpha : G \to N$, then we know from earlier in the section that the extensions formed from $\{\bar u\}$ and from $\{\tilde u\}$ are equivalent. Because $G$ maps homomorphically into $E$ for both systems, the factor sets are 0 in both cases. Consequently the function $\alpha$ must satisfy

$$\alpha(uv) - \alpha(u) - \tau(u)\alpha(v) = 0.$$

This is exactly the condition that $\alpha : G \to N$ be a 1-cocycle. Thus the group $Z^1(G,N)$ parametrizes all ways that we can embed $G$ as a complementary subgroup to $N$ in the semidirect product $E = G \times_\tau N$.

A relatively trivial way to construct a one-one group homomorphism $u \mapsto \tilde u$ from $u \mapsto \bar u$ is to form, in the usual multiplicative notation, $\tilde u = x_0^{-1}\bar u x_0$ for some $x_0 \in N$. Then $\tilde u = x_0^{-1}\bar u\, x_0 \bar 1 = x_0^{-1}(\tau(u)(x_0))\bar u$, and the additive notation for $\alpha(u)$ has $\alpha(u) = \tau(u)(x_0) - x_0$. Referring to our earlier computations in degree 1, we see that $\alpha$ is in the group $B^1(G,N)$ of coboundaries. The conclusion is that $H^1(G,N)$ parametrizes all ways, modulo relatively trivial ways, that we can embed $G$ as a complementary subgroup to $N$ in the semidirect product $E = G \times_\tau N$.

As promised, we now return to the proof of Proposition 7.39.

PROOF OF PROPOSITION 7.39. For $n = 1$, we have

$$(\delta_1 \delta_0 f)(u,v) = u((\delta_0 f)(v)) - (\delta_0 f)(uv) + (\delta_0 f)(u) = u(vf - f) - (uvf - f) + (uf - f) = 0.$$

For $n > 1$, we begin with

$$\begin{aligned} (\delta_n \delta_{n-1} f)(g_1, \dots, g_{n+1}) = {} & g_1((\delta_{n-1} f)(g_2, \dots, g_{n+1})) \\ & + \sum_{i=1}^{n} (-1)^i (\delta_{n-1} f)(g_1, \dots, g_i g_{i+1}, \dots, g_{n+1}) \\ & + (-1)^{n+1} (\delta_{n-1} f)(g_1, \dots, g_n) \\ = {} & \mathrm{I} + \mathrm{II} + \mathrm{III}. \end{aligned}$$

Here

$$\begin{aligned} \mathrm{I} = {} & g_1 g_2 (f(g_3, \dots, g_{n+1})) + \sum_{i=2}^{n} (-1)^{i-1} g_1(f(g_2, \dots, g_i g_{i+1}, \dots, g_{n+1})) \\ & + (-1)^n g_1(f(g_2, \dots, g_n)) \\ = {} & \mathrm{IA} + \mathrm{IB} + \mathrm{IC}, \end{aligned}$$

$$\begin{aligned} \mathrm{II} = {} & -(\delta_{n-1} f)(g_1 g_2, g_3, \dots, g_{n+1}) + \sum_{i=2}^{n} (-1)^i (\delta_{n-1} f)(g_1, \dots, g_i g_{i+1}, \dots, g_{n+1}) \\ = {} & \mathrm{IIA} + \mathrm{IIB}, \end{aligned}$$

$$\begin{aligned} \mathrm{III} = {} & (-1)^{n+1} g_1(f(g_2, \dots, g_n)) + (-1)^{n+1}(-1) f(g_1 g_2, g_3, \dots, g_n) \\ & + (-1)^{n+1} \sum_{i=2}^{n-1} (-1)^i f(g_1, \dots, g_i g_{i+1}, \dots, g_n) \\ & + (-1)^{n+1}(-1)^n f(g_1, \dots, g_{n-1}) \\ = {} & \mathrm{IIIA} + \mathrm{IIIB} + \mathrm{IIIC} + \mathrm{IIID}. \end{aligned}$$

Terms IIA and IIB decompose further as

$$\begin{aligned} \mathrm{IIA} = {} & -g_1 g_2 (f(g_3, \dots, g_{n+1})) + f(g_1 g_2 g_3, g_4, \dots, g_{n+1}) \\ & - \sum_{i=3}^{n} (-1)^{i+1} f(g_1 g_2, \dots, g_i g_{i+1}, \dots, g_{n+1}) - (-1)^n f(g_1 g_2, g_3, \dots, g_n) \\ = {} & \mathrm{IIAa} + \mathrm{IIAb} + \mathrm{IIAc} + \mathrm{IIAd}, \end{aligned}$$

$$\begin{aligned} \mathrm{IIB} = {} & \sum_{i=2}^{n} (-1)^i g_1(f(g_2, \dots, g_i g_{i+1}, \dots, g_{n+1})) \\ & + (-1)^2(-1) f(g_1 g_2 g_3, g_4, \dots, g_{n+1}) \\ & + \sum_{i=3}^{n} (-1)^i(-1) f(g_1 g_2, \dots, g_i g_{i+1}, \dots, g_{n+1}) \\ & + \sum_{i=2}^{n} (-1)^i \sum_{j=2}^{i-2} (-1)^j f(g_1, \dots, g_j g_{j+1}, \dots, g_i g_{i+1}, \dots, g_{n+1}) \\ & + \sum_{i=3}^{n} (-1)^i(-1)^{i-1} f(g_1, \dots, g_{i-1} g_i g_{i+1}, \dots, g_{n+1}) \\ & + \sum_{i=2}^{n-1} (-1)^i(-1)^i f(g_1, \dots, g_i g_{i+1} g_{i+2}, \dots, g_{n+1}) \\ & + \sum_{i=2}^{n-2} (-1)^i \sum_{j=i+2}^{n} (-1)^{j-1} f(g_1, \dots, g_i g_{i+1}, \dots, g_j g_{j+1}, \dots, g_{n+1}) \\ & + \sum_{i=2}^{n-1} (-1)^i(-1)^n f(g_1, \dots, g_i g_{i+1}, \dots, g_n) \\ & + (-1)^n(-1)^n f(g_1, \dots, g_{n-1}) \\ = {} & \mathrm{IIBa} + \mathrm{IIBb} + \mathrm{IIBc} + \mathrm{IIBd} + \mathrm{IIBe} + \mathrm{IIBf} + \mathrm{IIBg} + \mathrm{IIBh} + \mathrm{IIBi}. \end{aligned}$$

Inspection shows that we have cancellation between term IA and term IIAa, term IB and term IIBa, term IC and term IIIA, term IIAb and term IIBb, term IIAc and term IIBc, term IIAd and term IIIB, term IIBd and term IIBg, term IIBe and term IIBf, term IIBh and term IIIC, and term IIBi and term IIID. All the terms cancel, and we conclude that $\delta_n \delta_{n-1} f = 0$.
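The bookkeeping in this proof can be sanity-checked numerically. The sketch below is an assumption-laden illustration rather than a substitute for the argument: it takes $G$ to be the integers mod 3 acting on $N$, the integers mod 7, through multiplication by powers of 2 (an action because $2^3 \equiv 1 \bmod 7$), represents cochains as Python functions of several group elements, and uses the hypothetical names `delta`, `random_cochain`, and `delta_squared_vanishes`. The function `delta` transcribes the defining formula for $\delta_n$, and the check evaluates $\delta_n \delta_{n-1} f$ on every tuple for a randomly chosen $f$.

```python
import random
from itertools import product

G = (0, 1, 2)   # G = Z/3, additive (assumption for this sketch)
P = 7           # N = Z/7, additive

def mul(g, h):
    return (g + h) % 3

def act(g, x):
    # g acts on Z/7 as multiplication by 2**g.
    return (pow(2, g, P) * x) % P

def delta(f, n):
    """Coboundary of an n-cochain f, as a function of n + 1 group elements."""
    def df(*g):
        assert len(g) == n + 1
        total = act(g[0], f(*g[1:]))
        for i in range(1, n + 1):
            # Replace the entries g_i, g_{i+1} (1-indexed) by their product.
            merged = g[:i - 1] + (mul(g[i - 1], g[i]),) + g[i + 1:]
            total += (-1) ** i * f(*merged)
        total += (-1) ** (n + 1) * f(*g[:n])
        return total % P
    return df

def random_cochain(n, rng):
    # A 0-cochain is a function of no arguments, i.e. a single element of N.
    table = {args: rng.randrange(P) for args in product(G, repeat=n)}
    return lambda *args: table[args]

def delta_squared_vanishes(n, rng):
    f = random_cochain(n - 1, rng)
    ddf = delta(delta(f, n - 1), n)
    return all(ddf(*args) == 0 for args in product(G, repeat=n + 1))

rng = random.Random(0)
results = [delta_squared_vanishes(n, rng) for n in range(1, 5)]
```

Because this $G$ is abelian, the check is insensitive to the order of the factors in $g_i g_{i+1}$; swapping in a nonabelian group and action would exercise that part of the formula as well.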


7. Problems

1. Using Burnside’s Theorem and Problem 34 at the end of Chapter IV, show that 60 is the smallest possible order of a nonabelian simple group.

2. A commutator in a group is any element of the form $xyx^{-1}y^{-1}$. (a) Prove that the inverse of a commutator is a commutator. (b) Prove that any conjugate of a commutator is a commutator.

3. Let $a$ and $b$ be elements of a group $G$. Prove that the subgroup generated by $a$ and $b$ is the same as the subgroup generated by $bab^2$ and $bab^3$.

4. A subgroup $H$ of a group $G$ is said to be characteristic if it is carried into itself by every automorphism of $G$. (a) Prove that characteristic implies normal. (b) Prove that the center $Z_G$ of $G$ is a characteristic subgroup. (c) Prove that the commutator subgroup $G'$ of $G$ is a characteristic subgroup.

5. In the terminology of the previous problem, which subgroups of the quaternion subgroup $H_8$ are characteristic?

6. Is every finite group finitely presented? Why or why not?

7. Let $G = \mathrm{SL}(2,\mathbb{R})$, and let $G'$ be the commutator subgroup. (a) Prove that every element $\begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$ is in $G'$. (b) Prove that $G' = G$. (c) Prove that $\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$ is not a commutator even though it is in $G'$.

8. Problem 53 at the end of Chapter IV produced a group $G$ of order 27 generated by two elements $a$ and $b$ satisfying $a^9 = b^3 = b^{-1}aba^{-4} = 1$. Prove that $G$ is given by generators and relations as $G = \langle a, b;\ a^9, b^3, b^{-1}aba^{-4} \rangle$.

9. Let $G_n$ be given by generators and a single relation as $G_n = \langle x_1, y_1, \dots, x_n, y_n;\ x_1 y_1 x_1^{-1} y_1^{-1} \cdots x_n y_n x_n^{-1} y_n^{-1} \rangle$. Prove that $G_n/G_n'$ is free abelian of rank $2n$, and conclude that the groups $G_n$ are mutually nonisomorphic as $n$ varies. (Educational note related to topology: The group $G_n$ may be shown to be the fundamental group of a compact orientable 2-dimensional manifold without boundary and with $n$ handles.)

10. Prove that a free group of ﬁnite rank n cannot be generated by fewer than n elements.


11. Let $F$ be the free group on generators $a, b, c$, and let $H$ be the subgroup generated by all words of length 2. (a) Find coset representatives $g$ such that $F$ is the disjoint union of the cosets $Hg$. (b) Find a free basis of $H$.

12. For the free group on generators $x$ and $y$, prove that the elements $y, xyx^{-1}, x^2yx^{-2}, x^3yx^{-3}, \dots$ constitute a free basis of the subgroup that they generate. Conclude that a free group of rank 2 has a free subgroup of infinite rank.

13. Let $G = C_2 * C_2$. Prove that the only quotient groups of $G$, up to isomorphism, are $G$ itself, $\{1\}$, $C_2$, $C_2 \times C_2$, and the dihedral groups $D_n$ for $n \ge 3$.

14. Prove that if every irreducible finite-dimensional representation of a finite group $G$ is 1-dimensional, then $G$ is abelian.

15. Let $G$ be a finitely generated group, and let $H$ be a subgroup of finite index. Prove that $H$ is finitely generated.

16. Let $N$ be an abelian group, let $G$ be a group, let $\tau$ be an action of $G$ on $N$ by automorphisms, and let $n > 0$ be an integer. (a) Prove that if every element of $N$ has finite order dividing an integer $m$, then every member of $H^n(G,N)$ has finite order dividing $m$. (b) Suppose that $G$ is finite and that $f$ is an $n$-cocycle. Define an $(n-1)$-cochain $F$ by

$$F(g_1, \dots, g_{n-1}) = \sum_{g \in G} f(g_1, \dots, g_{n-1}, g).$$

By summing the cocycle condition for $f$ over the last variable, express $|G|\, f(g_1, \dots, g_n)$ in terms of $F$, and deduce that $|G| f$ is a coboundary. Conclude that every member of $H^n(G,N)$ has order dividing $|G|$.

17. Let $G$ be a finite group. Suppose that $G$ has a normal abelian subgroup $N$, and suppose that $\mathrm{GCD}(|N|, |G/N|) = 1$. Prove that there exists a subgroup $H$ of $G$ such that $G$ is the semidirect product of $H$ and $N$.

18. Let $N$ be the cyclic group $C_2$, and let $G$ be an arbitrary group of order 4. Identify up to equivalence all group extensions of $N$ by $G$.

19. Let $N = C_2$, and let $E = \bigoplus_{n=1}^{\infty} (C_2 \oplus C_4)$. Regard $E$ as an extension of $N$ in two ways: first by embedding $N$ as one of the summands $C_2$ of $E$ and then by embedding $N$ as a subgroup of one of the summands $C_4$ of $E$. Show that the quotient groups $E/N$ in the two cases are isomorphic, that $E/N$ acts trivially on $N$ in both cases, and that the two group extensions are not equivalent.

Problems 20–23 concern Baer multiplication of extensions. Let $N$ be an abelian group, let $G$ be a group, let $\tau$ be an action of $G$ on $N$ by automorphisms, and let $E_1$ and $E_2$ be two extensions of $N$ by $G$ relative to $\tau$. Write $\varphi_1 : E_1 \to G$ and $\varphi_2 : E_2 \to G$ for the quotient mappings. Let $(E_1, E_2)$ denote the subgroup of all members $(e_1, e_2)$ of $E_1 \times E_2$ for which $\varphi_1(e_1) = \varphi_2(e_2)$. Writing the operation in $N$ multiplicatively, let $Q = \{(x, x^{-1}) \in E_1 \times E_2 \mid x \in N\}$. The Baer product of $E_1$ and $E_2$ is defined to be the quotient $(E_1, E_2)/Q$. A typical coset of the Baer product will be denoted by $(e_1, e_2)Q$.

20. Prove that the homomorphism $x \mapsto (x, 1)Q$ is one-one from $N$ into $(E_1, E_2)/Q$, that the homomorphism $\varphi : (E_1, E_2) \to G$ defined by $\varphi(e_1, e_2) = \varphi_1(e_1)$ has image $G$ and descends to the quotient $(E_1, E_2)/Q$, and that the kernel of the descended $\varphi$ is the embedded copy of $N$. (Therefore $(E_1, E_2)/Q$ is an extension of $N$ by $G$, evidently relative to $\tau$.)

21. For each $u \in G$, select $\bar u \in E_1$ and $\tilde u \in E_2$ with $\varphi_1(\bar u) = u = \varphi_2(\tilde u)$, and define $a(u,v)$ and $b(u,v)$ for $u$ and $v$ in $G$ by $\bar u \bar v = a(u,v)\overline{uv}$ and $\tilde u \tilde v = b(u,v)\widetilde{uv}$. Show that $(\bar u, \tilde u)Q$ has $\varphi((\bar u, \tilde u)Q) = u$ and that the associated 2-cocycle for $(E_1, E_2)/Q$ is $a(u,v)b(u,v)$ if the group operation in $N$ is written multiplicatively.

22. Prove that Baer multiplication descends to a well-defined multiplication of equivalence classes of extensions of $N$ by $G$ relative to $\tau$, in the following sense: Suppose that $E_1$ and $E_1'$ are equivalent extensions and that $E_2$ and $E_2'$ are equivalent extensions. Let $(E_1, E_2)/Q$ and $(E_1', E_2')/Q'$ be the Baer products. Then $(E_1, E_2)/Q$ is equivalent to $(E_1', E_2')/Q'$. Conclude that if Baer multiplication is imposed on equivalence classes of extensions of $N$ by $G$ relative to $\tau$, then the correspondence stated in Proposition 7.40 of equivalence classes to members of $H^2(G,N)$ is a group isomorphism.

Problems 23–24 derive the Poisson summation formula for finite abelian groups. If $G$ is a finite abelian group and $\widehat G$ is its group of multiplicative characters, then the Fourier coefficient at $\chi \in \widehat G$ of a function $f$ in $C(G,\mathbb{C})$ is $\widehat f(\chi) = \sum_{g \in G} f(g)\overline{\chi(g)}$. The Fourier inversion formula in Theorem 7.17 says that $f(g) = |G|^{-1} \sum_{\chi \in \widehat G} \widehat f(\chi)\chi(g)$.

23. Let $G$ be a finite abelian group, let $H$ be a subgroup, and let $G/H$ be the quotient group. If $t$ is in $G$, write $\dot t$ for the coset of $t$ in $G/H$. Let $f$ be in $C(G,\mathbb{C})$, and define $F(\dot t) = \sum_{h \in H} f(t + h)$ as a function on $G/H$. Suppose that $\chi$ is a member of $\widehat G$ that is identically 1 on $H$, so that $\chi$ descends to a member $\dot\chi$ of $\widehat{G/H}$. Prove that $\widehat f(\chi) = \widehat F(\dot\chi)$.

24. (Poisson summation formula) With $f$ and $F$ as in the previous problem, apply the Fourier inversion formula for $G/H$ to the function $F$, and derive the formula

$$\sum_{h \in H} f(t + h) = \frac{1}{|G/H|} \sum_{\omega \in \widehat G,\ \omega|_H = 1} \widehat f(\omega)\,\omega(t).$$

(Educational note: This formula is often applied with $t = 0$, in which case it reduces to $\sum_{h \in H} f(h) = \frac{1}{|G/H|} \sum_{\omega \in \widehat G,\ \omega|_H = 1} \widehat f(\omega)$.)


Problems 25–28 continue the introduction to error-correcting codes begun in Problems 63–73 at the end of Chapter IV, combining those results with the Poisson summation formula in the problems above and with notions from Section VI.1. Let $\mathbb{F}$ be the field $\mathbb{Z}/2\mathbb{Z}$, and form the Hamming space $\mathbb{F}^n$. Define a nondegenerate bilinear form on $\mathbb{F}^n$ by $(a, c) = \sum_{i=1}^n a_i c_i$ for $a$ and $c$ in $\mathbb{F}^n$. Recall from Chapter IV that a linear code $C$ is a vector subspace of $\mathbb{F}^n$. For such a $C$, let $C^\perp$ as in Section VI.1 be the set of all $a \in \mathbb{F}^n$ such that $(a, c) = 0$ for all $c \in C$; the linear code $C^\perp$ is called the dual code. A linear code is self dual if $C^\perp = C$.

25. (a) Show that the codes 0 and $\mathbb{F}^n$ are dual to each other. (b) Show that the repetition code and the parity-check code are dual to each other. (c) Show that the Hamming code of order 8 is self dual. (d) Show that any self-dual linear code $C$ has $\dim C = n/2$, and conclude that the Hamming code of order $2^r$ with $r > 3$ is not self dual. (e) Show that any member $c$ of a self-dual linear code $C$ has even weight. (f) Show that if a linear code $C$ has $C \subseteq C^\perp$ and if every member $c$ of $C$ has even weight, then $c \mapsto \frac{1}{2}\operatorname{wt}(c) \bmod 2$ is a group homomorphism of $C$ into $\mathbb{Z}/2\mathbb{Z}$. Here $\operatorname{wt}(c)$ denotes the weight of $c$.

26. Regard $\mathbb{F}^n$ as an additive group $G$ to which the Fourier inversion formula of Section 4 can be applied. (a) Show that one can map $\widehat G$ to $\mathbb{F}^n$ by $\chi \mapsto a_\chi$ with $\chi(c) = (-1)^{(a_\chi, c)}$ and that the result is a group isomorphism. (Therefore if $f$ is in $C(\mathbb{F}^n, \mathbb{C})$, we can henceforth regard $\widehat f$ as a function on $\mathbb{F}^n$.) (b) Show under the identification in (a) that if $f$ is in $C(\mathbb{F}^n, \mathbb{C})$, then $\widehat f(a) = \sum_{c \in \mathbb{F}^n} f(c)(-1)^{(a,c)}$ for $a$ in $\mathbb{F}^n$. (c) Suppose that the function $f \in C(\mathbb{F}^n, \mathbb{C})$ is of the special form $f(c) = \prod_{i=1}^n f_i(c_i)$ whenever $c = (c_1, \dots, c_n)$. Here each $f_i$ is a function on the 2-element group $\mathbb{F}$. Prove that $\widehat f(a) = \prod_{i=1}^n \widehat{f_i}(a_i)$ whenever $a = (a_1, \dots, a_n)$. Here $\widehat{f_i}$ is given by the formula of (b) for the case $n = 1$: $\widehat{f_i}(a_i) = \sum_{c_i \in \mathbb{F}} f_i(c_i)(-1)^{a_i c_i}$.

27. Fix two complex numbers $x$ and $y$. Define $f_0 : \mathbb{F} \to \mathbb{C}$ to be the function with $f_0(0) = x$ and $f_0(1) = y$. Define $f : \mathbb{F}^n \to \mathbb{C}$ to be the function with $f(c) = \prod_{i=1}^n f_0(c_i) = x^{n - \operatorname{wt}(c)} y^{\operatorname{wt}(c)}$, where $\operatorname{wt}(c)$ is the weight of $c$. (a) Show that $\widehat{f_0}(0) = x + y$ and $\widehat{f_0}(1) = x - y$. (b) Show that $\widehat f(a) = (x + y)^{n - \operatorname{wt}(a)} (x - y)^{\operatorname{wt}(a)}$.

28. Let $C$ be a linear code in $\mathbb{F}^n$. Take $G$ to be the additive group of $\mathbb{F}^n$ and $H$ to be the additive group of $C$. Regard $C^\perp$ as an additive group also. (a) Map $\widehat{G/H}$ to $C^\perp$ by $\chi \mapsto a_\chi$ with $\chi(c) = (-1)^{(a_\chi, c)}$. Show that this mapping is a group isomorphism. (b) Applying the Poisson summation formula of Problem 24, prove that

$$\sum_{h \in C} f(h) = \frac{1}{|C^\perp|} \sum_{a \in C^\perp} \widehat f(a)$$

for all $f$ in $C(\mathbb{F}^n, \mathbb{C})$. (c) (MacWilliams identity) Let $W_C(X, Y) = \sum_{k=0}^n N_k(C) X^{n-k} Y^k$, where $N_k(C)$ is the number of members of $C$ with weight $k$, be the weight-enumerator polynomial of $C$, and let $W_{C^\perp}(X, Y)$ be defined similarly. By applying (b) to the function $f$ in the previous problem, prove that $W_C(x, y) = |C^\perp|^{-1} W_{C^\perp}(x + y, x - y)$ for each $x$ and $y$. Conclude from Corollary 4.32 that weight-enumerator polynomials satisfy $W_C(X, Y) = |C^\perp|^{-1} W_{C^\perp}(X + Y, X - Y)$. (d) The polynomials $W_C(X, Y)$ were seen in Chapter IV to be $X^n$ for the 0 code, $(X + Y)^n$ for the code $\mathbb{F}^n$, $X^n + Y^n$ for the repetition code, $\frac{1}{2}((X + Y)^n + (X - Y)^n)$ for the parity-check code, and $X^8 + 14X^4Y^4 + Y^8$ for the Hamming code of order 8. Using relationships established in Problem 25, verify the result of (c) for each of these codes. (e) Suppose that $C$ is a self-dual linear code. Applying (c) in this case, exhibit $W_C(X, Y)$ as being invariant under a copy of the dihedral group $D_8$ of order 16. (Educational note: If the polynomial $W_C(X, Y)$ is invariant also under $X \mapsto iX$, as is true for the Hamming code of order 8, then $W_C(X, Y)$ is invariant under the group generated by $D_8$ and this transformation, which can be shown to have order 192.)

Problems 29–31 concern an unexpectedly fast method of computation of Fourier coefficients in the context of finite abelian groups, particularly in the context of cyclic groups. They show for a cyclic group of order $m = pq$ that the use of the idea behind the Poisson summation formula of Problem 24 makes it possible to compute the Fourier coefficients of a function in about $pq(p + q)$ steps rather than the expected $m^2 = p^2 q^2$ steps. This savings may be iterated in the case of a cyclic group of order $2^n$ so that the Fourier coefficients are computed in about $n 2^n$ steps rather than the expected $2^{2n}$ steps. An organized algorithm to implement this method of computation is known as the fast Fourier transform. Write the cyclic group $C_m$ as the set $\{0, 1, 2, \dots, m - 1\}$ of integers modulo $m$ under addition, and let $\zeta_m = e^{2\pi i/m}$. For $k$ in $C_m$ define a multiplicative character $\chi_n$ of $C_m$ by $\chi_n(k) = (\zeta_m^n)^k$. The resulting $m$ multiplicative characters satisfy $\chi_n \chi_{n'} = \chi_{n + n'}$, and they exhaust $\widehat{C_m}$ since distinct multiplicative characters are orthogonal. It will be convenient to identify $\chi_n$ with $\chi_n(1) = \zeta_m^n$.

29. In the setting of Problem 23, suppose that $G = C_m$ with $m = pq$; here $p$ and $q$ need not be relatively prime. Let $H = \{0, q, 2q, \dots, (p - 1)q\}$ be the subgroup of $G$ isomorphic to $C_p$, so that $G/H = \{0, 1, 2, \dots, q - 1\}$ is isomorphic to $C_q$. Prove that the characters $\chi$ of $G$ identified with $\zeta_m^0, \zeta_m^p, \zeta_m^{2p}, \dots, \zeta_m^{(q-1)p}$


are the ones that are identically 1 on $H$ and therefore descend to characters of $G/H$. Verify that the descended characters $\dot\chi$ are the ones identified with $\zeta_q^0, \zeta_q^1, \zeta_q^2, \dots, \zeta_q^{q-1}$. Consequently the formula $\widehat f(\chi) = \widehat F(\dot\chi)$ of Problem 23 provides a way of computing $\widehat f$ at $\zeta_m^0, \zeta_m^p, \zeta_m^{2p}, \dots, \zeta_m^{(q-1)p}$ from the values of $F$. Show that if $\widehat F$ is computed from the definition of Fourier coefficients, then the number of steps involved in its computation is about $q^2$, apart from a constant factor. Show therefore that the total number of steps in computing $\widehat f$ at these special values of $\chi$ is on the order of $q^2 + pq$.

30. In the previous problem show for each $k$ with $0 \le k \le p - 1$ that the value of $\widehat f$ at $\zeta_m^k, \zeta_m^{p+k}, \zeta_m^{2p+k}, \dots, \zeta_m^{(q-1)p+k}$ can be handled in the same way with a different $F$ by replacing $f$ by a suitable variant of $f$. Doing so for each $k$ requires $p$ times the number of steps detected in the previous problem, and therefore all of $\widehat f$ can be computed in about $p(q^2 + pq) = pq(p + q)$ steps.

31. Show how iteration of this process to compute the Fourier coefficients of each $F$, together with further iteration of this process, allows one to compute the Fourier coefficients for a function on $C_{m_1 m_2 \cdots m_r}$ in about $m_1 m_2 \cdots m_r (m_1 + m_2 + \cdots + m_r)$ steps.

Problems 32–36 concern contragredient representations and the decomposition of the left regular representation of a finite group $G$. They make use of Problems 24–28 in Chapter III, which introduce the complex conjugate $\overline V$ of a complex vector space $V$. In the case that $V$ is an inner-product space, those problems define $(u, v)_{\overline V} = (v, u)_V$, and they show that if $\ell_v \in V'$ is given by $\ell_v(u) = (u, v)_V = (v, u)_{\overline V}$, then the mapping $v \leftrightarrow \ell_v$ is an isomorphism of $\overline V$ with $V'$.

32. Show that the definition $(\ell_{v_1}, \ell_{v_2})_{V'} = (v_1, v_2)_{\overline V}$ makes the isomorphism of $\overline V$ with $V'$ preserve inner products.

33. If $R$ is a unitary representation of $G$ on the finite-dimensional complex vector space $V$, define the contragredient representation $R^c$ of $G$ on $V'$ by $R^c(x) = R(x^{-1})^t$. Prove that $R^c(x)\ell_v = \ell_{R(x)v}$ and that $R^c$ is unitary on $V'$.

34. Show that the matrix coefficients of $R^c$ are the complex conjugates of those of $R$ and that the characters satisfy $\chi_{R^c} = \overline{\chi_R}$.

35. Give an example of an irreducible representation of a finite group $G$ that is not equivalent to its contragredient.

36. Let $\ell$ be the left regular representation of $G$ on $C(G, \mathbb{C})$, and let $V_R$ be the linear span in $C(G, \mathbb{C})$ of the matrix coefficients of an irreducible representation $R$ of dimension $d$. Prove that the representation $(\ell, V_R)$ of $G$ is equivalent to the direct sum of $d$ copies of the contragredient $R^c$.

Problems 37–46 concern the free product $C_2 * C_3$ and its quotients. The problems make use of the group of matrices $\mathrm{SL}(2, \mathbb{Z}/m\mathbb{Z})$ of determinant 1 over the commutative ring $\mathbb{Z}/m\mathbb{Z}$, as discussed in Section V.2. One of the quotients of $C_2 * C_3$


will be PSL(2, Z) = SL(2, Z)/{scalar matrices}, and these problems show that the quotient mapping can be arranged to be an isomorphism. Other quotients will be the groups $G_m = \langle X, Y;\ X^2, Y^3, (XY)^m \rangle$ with m ≥ 2. These arise in connection with tilings in 2-dimensional geometry. The isomorphism $C_2 * C_3 \cong$ PSL(2, Z) leads to a homomorphism that will be called $\sigma_m$ carrying $G_m$ onto PSL(2, Z/mZ) = SL(2, Z/mZ)/{scalar matrices}, the image group being finite. The problems show that the homomorphism $\sigma_m : G_m \to$ PSL(2, Z/mZ) is an isomorphism for the cases in which $G_m$ arises from spherical geometry, namely for 2 ≤ m ≤ 5, and that the homomorphism is not an isomorphism for m = 6, the case in which $G_m$ arises from Euclidean geometry.

37. Show that the elements $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ and $\begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}$ generate SL(2, Z) by arguing as follows: if the subgroup $\Gamma$ of SL(2, Z) generated by these two elements is not SL(2, Z), choose an element $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ outside $\Gamma$ having max(|a|, |b|) as small as possible, and derive a contradiction by showing that a suitable right multiple of it by elements of $\Gamma$ is in $\Gamma$.

38. By mapping $X \mapsto x = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ mod ±I and $Y \mapsto y = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}$ mod ±I, produce a group homomorphism $\Phi$ of $C_2 * C_3 = \langle X, Y;\ X^2, Y^3 \rangle$ onto PSL(2, Z).

39. Let x, y, and $\Phi : C_2 * C_3 \to$ PSL(2, Z) be as in the previous problem.
(a) For any member $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ mod ±I of PSL(2, Z), define $\mu\bigl(\begin{pmatrix} a & b \\ c & d \end{pmatrix} \bmod \pm I\bigr) = \max(|a|, |b|)$ and $\nu\bigl(\begin{pmatrix} a & b \\ c & d \end{pmatrix} \bmod \pm I\bigr) = \max(|c|, |d|)$. Prove that if $z = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ mod ±I in PSL(2, Z) has ab ≤ 0, then $\mu(zyx) \ge \mu(z)$ and $\mu(zy^{-1}x) \ge \mu(z)$, while if cd ≤ 0, then $\nu(zyx) \ge \nu(z)$ and $\nu(zy^{-1}x) \ge \nu(z)$.
(b) Prove that $\mu(zx) = \mu(z)$ and $\nu(zx) = \nu(z)$ for all z in PSL(2, Z).
(c) Show that there are only 10 members z of PSL(2, Z) for which the two conditions $\mu(z) = 1$ and $\nu(z) = 1$ both hold.
(d) A reduced word in $C_2 * C_3$ is a finite sequence of factors $X$, $Y$, and $Y^{-1}$, with no two consecutive factors equal and with no two consecutive factors $YY^{-1}$ or $Y^{-1}Y$. Prove for any reduced word $a_1 \cdots a_n$ in $C_2 * C_3$, where each $a_j$ is one of $X$, $Y$, and $Y^{-1}$, that $\mu(\Phi(a_1 \cdots a_n)) \ge \mu(\Phi(a_1 \cdots a_{n-1}))$ and that $\nu(\Phi(a_1 \cdots a_n)) \ge \nu(\Phi(a_1 \cdots a_{n-1}))$.
(e) Deduce that the homomorphism $\Phi$ is an isomorphism.

40. Let $\Gamma(m)$ be the group of all matrices M in SL(2, Z) such that every entry of M − I is divisible by m.
(a) Prove that passage from a matrix in SL(2, Z) to the same matrix with its entries considered modulo m gives a homomorphism $\tilde\sigma_m$ : SL(2, Z) → SL(2, Z/mZ) with $\ker \tilde\sigma_m = \Gamma(m)$.


(b) Prove that if α, β, and m are positive integers with GCD(α, β, m) = 1, then there exists an integer r such that GCD(α + mr, β) = 1. (One way of proceeding is to use Dirichlet's theorem on primes in arithmetic progressions.)
(c) Prove that image $\tilde\sigma_m$ = SL(2, Z/mZ), i.e., $\tilde\sigma_m$ is onto.

41. Let $\Phi_m : C_2 * C_3 \to G_m$ be the homomorphism defined by the conditions $X \mapsto X$ and $Y \mapsto Y$. Let $H_m$ be the smallest normal subgroup of PSL(2, Z) containing $(xy)^m$ mod ±I. Let $\tilde\sigma_m$ : SL(2, Z) → SL(2, Z/mZ) be the homomorphism of the previous problem.
(a) Why is $\Phi_m$ well defined?
(b) Why is $H_m = \Phi(\ker \Phi_m)$?
(c) Define PSL(2, Z/mZ) = SL(2, Z/mZ)/{scalar matrices}. Why does the composition of $\tilde\sigma_m$ followed by passage to the quotient descend to a homomorphism $\sigma_m$ of PSL(2, Z) onto PSL(2, Z/mZ)?
(d) If $K_m \subseteq$ PSL(2, Z) is the kernel of $\sigma_m$, why is $H_m \subseteq K_m$?
(e) Show that if t is any integer, then the following members of $K_m$ lie in the subgroup $H_m$:
$\begin{pmatrix} 1 & tm \\ 0 & 1 \end{pmatrix}$ mod ±I, $\begin{pmatrix} 1 & 0 \\ tm & 1 \end{pmatrix}$ mod ±I, and $\begin{pmatrix} 1+tm & -tm \\ tm & 1-tm \end{pmatrix}$ mod ±I.
42. With G m deﬁned as above, exhibit homomorphisms of various groups G m onto the following ﬁnite groups: (a) S3 when m = 2 by sending X → (1 2) and Y → (1 2 3). (b) A4 when m = 3 by sending X → (1 2)(3 4) and Y → (1 2 3). (c) S4 when m = 4 by sending X → (1 2) and Y → (2 3 4). (d) A5 when m = 5 by sending X → (1 2)(3 4) and Y → (1 3 5). 43. This problem shows how to prove that Hm = K m for 2 ≤ m ≤ 5, and it asks that the steps be carried out for m = 2 and m = 3. Recall from the remark with Lemma 7.11 that Lemma 7.11 is valid for all groups in determining a set of generators of a subgroup from generators of the whole group and a system of coset representatives. The lemma is to be applied to the group PSL(2, Z) and the subgroup K m . Generators of PSL(2, Z) are taken as b1 = x mod ±I and b2 = y mod ±I . (a) For the case m = 2, ﬁnd members g1 , . . . , g6 of PSL(2, Z) such that the six cosets of PSL(2, Z)/K 2 are exactly K 2 g1 , . . . , K 2 g6 . (b) Still for the case m = 2, ﬁnd g j bi ρ(g j bi )−1 for 1 ≤ i ≤ 2 and 1 ≤ j ≤ 6. Lemma 7.11 says that these 12 elements generate K 2 . (c) Using Problem 41e and any necessary variations of it, show that each of the 12 generators of K 2 in (b) lies in the subgroup H2 , and conclude that H2 = K 2 .


(d) Repeat steps (a), (b), and (c) for m = 3. There are 12 cosets $K_3 g_j$ of PSL(2, Z)/$K_3$. (Educational note: There are 24 cosets for PSL(2, Z)/$K_4$ and 60 cosets for PSL(2, Z)/$K_5$.)

44. Take for granted that $H_m = K_m$ for 2 ≤ m ≤ 5. Deduce the isomorphisms
(a) $G_2 \cong$ PSL(2, Z/2Z) $\cong S_3$.
(b) $G_3 \cong$ PSL(2, Z/3Z) $\cong A_4$. (This group is called the tetrahedral group.)
(c) $G_4 \cong$ PSL(2, Z/4Z) $\cong S_4$. (This group is called the octahedral group.)
(d) $G_5 \cong$ PSL(2, Z/5Z) $\cong A_5$. (This group is called the icosahedral group.)

45. A translation in the Euclidean plane $R^2$ is any function $T_{(a,b)}(x, y) = (a + x, b + y)$, the rotation about the origin clockwise through the angle θ is the linear map $R_\theta$ given by the matrix $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$, and the rotation about $(x_0, y_0)$ clockwise through the angle θ is the linear map given by $(x, y) \mapsto R_\theta(x - x_0, y - y_0) + (x_0, y_0)$.
(a) Prove that $R_\theta T_{(a,b)} R_\theta^{-1} = T_{R_\theta(a,b)}$.
(b) Prove that the union of the set of translations and all the sets of rotations about points of $R^2$ is a group by showing that it is the semidirect product of the subgroup of rotations about the origin and the normal subgroup of translations.

46. Fix a triangle T in the Euclidean plane with vertices arranged counterclockwise at a, b, c and with angles π/2 at a, π/3 at b, and π/6 at c. Let $r_a$ be rotation clockwise through π at a, $r_b$ be rotation clockwise through 2π/3 at b, and $r_c$ be rotation counterclockwise through π/3 at c.
(a) Show that $r_a^2 = 1$, $r_b^3 = 1$, $r_c^6 = 1$, and $r_c = r_a r_b$.
(b) Show that the member $r_b r_a r_b r_a r_b$ of the group generated by $r_a$ and $r_b$ is a nontrivial translation and therefore that the generated group is infinite.
(c) Conclude that $G_6 \not\cong$ PSL(2, Z/6Z). (Educational note: If $\tilde T$ denotes the union of T and the reflection of T in one of the sides of T, it can be shown that the group generated by $r_a$ and $r_b$ is isomorphic to $G_6$ and tiles the plane with copies of $\tilde T$.)

Problems 47–52 establish a harmonic analysis for arbitrary representations of finite groups on complex vector spaces, whether finite-dimensional or infinite-dimensional. Let G be a finite group, and let V be a complex vector space. For any representation R of G on V, one defines $R(f)v = \sum_{x \in G} f(x)R(x)v$ for f in C(G, C) and v in V, just as in the case that V is finite-dimensional. The same computation as in Section VII.4 shows that the formula $R(f_1 * f_2) = R(f_1)R(f_2)$ remains valid when V is infinite-dimensional.

47. Let $(R_1, V_1)$ and $(R_2, V_2)$ be irreducible finite-dimensional representations of G on complex vector spaces, and let $\chi_{R_1}$ and $\chi_{R_2}$ be their characters. Using Schur orthogonality, prove that
(a) $\chi_{R_1} * \chi_{R_2} = 0$ if $R_1$ and $R_2$ are inequivalent,


(b) $\chi_{R_1} * \chi_{R_1} = |G|\, d_{R_1}^{-1} \chi_{R_1}$, where $d_{R_1} = \dim V_1$.

48. With (R, V) given, let $(R_\alpha, V_\alpha)$ be any irreducible finite-dimensional representation of G, and define $E_\alpha : V \to V$ by $E_\alpha = |G|^{-1} d_\alpha R(\overline{\chi_\alpha})$, where $\chi_\alpha$ is the character of $R_\alpha$ and where $d_\alpha = \dim V_\alpha$.
(a) Prove that $E_\alpha^2 = E_\alpha$.
(b) Prove that $E_\alpha E_\beta = E_\beta E_\alpha = 0$ if $(R_\beta, V_\beta)$ is an irreducible finite-dimensional representation of G such that $R_\alpha$ and $R_\beta$ are inequivalent.

49. Observe for each v in V that {R(x)v | x ∈ G} spans a finite-dimensional invariant subspace of V. By Corollary 7.21, each v in V lies in a finite direct sum of finite-dimensional invariant subspaces of V on each of which R acts irreducibly. Using Zorn's Lemma, prove that V is the direct sum of finite-dimensional subspaces on each of which R acts irreducibly. (If V is infinite-dimensional, there will of course be infinitely many such subspaces.)

50. Suppose that $V_0$ is a finite-dimensional invariant subspace of V such that $R|_{V_0}$ is equivalent to some $R_\alpha$, where $R_\alpha$ is as in Problem 48. Prove that $E_\alpha$ is the identity on $V_0$.

51. Deduce that if $\{(R_\beta, V_\beta)\}$ is a maximal collection of inequivalent finite-dimensional irreducible representations of G, then $\sum_\beta E_\beta = I$ on V and the image of $E_\alpha$ is the set of all sums of vectors in V lying in some finite-dimensional invariant subspace $V_0$ of V such that $R|_{V_0}$ is equivalent to $R_\alpha$. (Educational note: Consequently V is exhibited as the finite direct sum of the spaces image $E_\alpha$, each space image $E_\alpha$ is the direct sum of finite-dimensional irreducible invariant subspaces, and the restriction of R to any finite-dimensional irreducible invariant subspace of image $E_\alpha$ is equivalent with $R_\alpha$.)

52. Suppose that $(R_\alpha, V_\alpha)$ is a 1-dimensional representation of G given by a multiplicative character ω. Prove that the image of $E_\alpha$ consists of all vectors v in V such that R(x)v = ω(x)v for all x in G.

CHAPTER VIII Commutative Rings and Their Modules

Abstract. This chapter ampliﬁes the theory of commutative rings that was begun in Chapter IV, and it introduces modules for any ring. Emphasis is on the topic of unique factorization. Section 1 gives many examples of rings, some commutative and some noncommutative, and introduces the notion of a module for a ring. Sections 2–4 discuss some of the tools related to questions of factorization in integral domains. Section 2 deﬁnes the ﬁeld of fractions for an integral domain and gives its universal mapping property. Section 3 deﬁnes prime and maximal ideals and relates quotients of them to integral domains and ﬁelds. Section 4 introduces principal ideal domains, which are shown to have unique factorization, and it deﬁnes Euclidean domains as a special kind of principal ideal domain for which greatest common divisors can be obtained constructively. Section 5 proves that if R is an integral domain with unique factorization, then so is the polynomial ring R[X ]. This result is a consequence of Gauss’s Lemma, which addresses what happens to the greatest common divisor of the coefﬁcients when one multiplies two members of R[X ]. Gauss’s Lemma has several other consequences that relate factorization in R[X ] to factorization in F[X ], where F is the ﬁeld of fractions of R. Still another consequence is Eisenstein’s irreducibility criterion, which gives a sufﬁcient condition for a member of R[X ] to be irreducible. Section 6 contains the theorem that every ﬁnitely generated unital module over a principal ideal domain is a direct sum of cyclic modules. The cyclic modules may be assumed to be primary in a suitable sense, and then the isomorphism types of the modules appearing in the direct-sum decomposition, together with their multiplicities, are uniquely determined. 
The main results transparently generalize the Fundamental Theorem for Finitely Generated Abelian Groups, and less transparently they generalize the existence and uniqueness of Jordan canonical form for square matrices with entries in an algebraically closed ﬁeld. Sections 7–11 contain foundational material related to factorization for the two subjects of algebraic number theory and algebraic geometry. Both these subjects rely heavily on the theory of commutative rings. Section 7 is a section of motivation, showing the analogy between a situation in algebraic number theory and a situation in algebraic geometry. Sections 8–10 introduce Noetherian rings, integral closures, and localizations. Section 11 uses this material to establish unique factorization of ideals for Dedekind domains, as well as some other properties.

1. Examples of Rings and Modules

Sections 4–5 of Chapter IV introduced rings and fields, giving a small number of examples of each. In the present section we begin by recalling those examples and giving further ones. Although Chapters VI and VII are not prerequisite for


the present chapter, our list of examples will include some rings and ﬁelds that arose in those two chapters. The theory to be developed in this chapter is intended to apply to commutative rings, especially to questions related to unique factorization in such rings. Despite this limitation it seems wise to include examples of noncommutative rings in the list below. In the conventions of this book, a ring need not have an identity. Many rings that arise only in the subject of algebra have an identity, but there are important rings in the subject of real analysis that do not. From the point of view of category theory, one therefore distinguishes between the category of all rings, with ring homomorphisms as morphisms, and the category of all rings with identity, with ring homomorphisms carrying 1 to 1 as morphisms. In the latter case one may want to exclude the zero ring from being an object in the category under certain circumstances. EXAMPLES OF RINGS. (1) Basic commutative rings from Chapter IV. All of the structures Z, Q, R, C, Z/mZ, and 2Z are commutative rings. All but the last have an identity. Of these, Q, R, and C are ﬁelds, and so is F p = Z/ pZ if p is a prime number. The others are not ﬁelds. (2) Polynomial rings. Let R be a nonzero commutative ring with identity. In Section IV.5 we deﬁned the commutative ring R[X 1 , . . . , X n ] of polynomials over R in n indeterminates. It has a universal mapping property with respect to substitution for the indeterminates and use of a homomorphism on the coefﬁcients. Making substitutions from R itself and mapping the coefﬁcients by the identity homomorphism, we are led to the ring of all functions (r1 , . . . , rn ) → f (r1 , . . . , rn ) for r1 , . . . , rn in R and f (X 1 , . . . , X n ) in R[X 1 , . . . , X n ]; this is called the ring of all polynomial functions in n variables on R. Polynomials may be considered also in inﬁnitely many variables, but we did not treat this case in any detail. 
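As a small illustration of the passage from polynomials to polynomial functions in Example 2, the following Python sketch takes R = Z/pZ and compares the polynomial $X^p - X$ with the zero polynomial. The helper names `evaluate` and `induced_function`, and the choice to store a polynomial as a list of coefficients, are assumptions made only for this illustration; they are not notation from the text.

```python
# Sketch: two different polynomials over Z/pZ that induce the same
# polynomial function.  A polynomial c0 + c1*X + ... is stored as [c0, c1, ...].

def evaluate(coeffs, r, p):
    """Evaluate the polynomial at X = r in Z/pZ by Horner's rule."""
    value = 0
    for c in reversed(coeffs):
        value = (value * r + c) % p
    return value

def induced_function(coeffs, p):
    """Values (f(0), f(1), ..., f(p-1)) of the polynomial function on Z/pZ."""
    return tuple(evaluate(coeffs, r, p) for r in range(p))

p = 5
zero_poly = [0]
x_p_minus_x = [0, (-1) % p] + [0] * (p - 2) + [1]   # X^p - X, a nonzero polynomial

print(induced_function(zero_poly, p))
print(induced_function(x_p_minus_x, p))
```

Both tables of values agree although the coefficient lists differ, so over a finite ring the homomorphism from R[X] to the ring of polynomial functions need not be one-one.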
(3) Matrix rings over commutative rings. Let R be a nonzero commutative ring with identity. The set $M_n(R)$ of all n-by-n matrices with entries in R is a ring under entry-by-entry addition and the usual definition of matrix multiplication: $(AB)_{ij} = \sum_{k=1}^{n} A_{ik}B_{kj}$. It has an identity, namely the identity matrix I with $I_{ij} = \delta_{ij}$. In this setting, Section V.2 introduced a theory of determinants, and it was proved that a matrix has a one-sided inverse if and only if it has a two-sided inverse, if and only if its determinant is a member of the group $R^\times$ of units in R, i.e., elements of R invertible under multiplication. The matrix ring $M_n(R)$ is always noncommutative if n > 1.

(4) Matrix rings over noncommutative rings. If R is any ring, we can still make the set $M_n(R)$ of all n-by-n matrices with entries in R into a ring. However, if


R has no identity, $M_n(R)$ will have no identity. The theory of determinants does not directly apply if R is noncommutative or if R fails to have an identity,¹ and as a consequence, questions about the invertibility of matrices are more subtle than with the previous example.

¹ A limited theory of determinants applies in the noncommutative case, but it will not be helpful for our purposes.

(5) Spaces of linear maps from a vector space into itself. Let V be a vector space over a field K. The vector space $\mathrm{End}_K(V) = \mathrm{Hom}_K(V, V)$ of all K linear maps from V to itself is initially a vector space over K. Composition provides a multiplication that makes $\mathrm{End}_K(V)$ into a ring with identity. In fact, associativity of multiplication is automatic for any kind of function, and so is the distributive law $(L_1 + L_2)L_3 = L_1L_3 + L_2L_3$. The distributive law $L_1(L_2 + L_3) = L_1L_2 + L_1L_3$ follows from the fact that $L_1$ is linear. This ring is isomorphic as a ring to $M_n(K)$ if V is n-dimensional, an isomorphism being determined by specifying an ordered basis of V.

(6) Associative algebras over fields. These were defined in Section VI.7, knowledge of which is not being assumed now. Thus we repeat the definition. If K is a field, then an associative algebra over K, or associative K algebra, is a ring A that is also a vector space over K such that the multiplication A × A → A is K-linear in each variable. The conditions of linearity concerning multiplication have two parts to them: an additive part saying that the usual distributive laws are valid and a scalar-multiplication part saying that

$(ka)b = k(ab) = a(kb)$ for all k in K and a, b in A.

If A has an identity, the displayed condition says that all scalar multiples of the identity lie in the center of A, i.e., commute with every element of A. In Examples 2 and 3, when R is a field K, the polynomial rings and matrix rings over K provide examples of associative algebras over K; scalar multiplication is to be done in entry-by-entry fashion. Example 5 is an associative algebra as well. If L is any field such that K is a subfield, then L may be regarded as an associative algebra over K. An interesting commutative associative algebra over C without identity is the algebra $C_{\mathrm{com}}(R)$ of all continuous complex-valued functions on R that vanish outside a bounded interval; the vector-space operations are the usual pointwise operations, and the operation of multiplication is given by convolution

$(f * g)(x) = \int_{R} f(x - y)g(y)\,dy.$

Section VII.4 worked with an analog C(G, C) of this algebra in the context that R is replaced by a finite group G.
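To make the finite-group analog concrete, here is a minimal Python sketch for the cyclic group Z/nZ written additively, assuming the convolution takes the form $(f * g)(x) = \sum_{y} f(x - y)g(y)$ in parallel with the integral formula for $C_{\mathrm{com}}(R)$. The function name `convolve`, the sample functions, and the choice n = 6 are illustrative assumptions, and the exact convention of Section VII.4 is not reproduced here.

```python
# Sketch: convolution on the cyclic group Z/nZ (additive notation).
# A function on the group is stored as the list of its n values.

def convolve(f, g):
    """Return f * g, where (f * g)(x) = sum over y of f(x - y) * g(y)."""
    n = len(f)
    return [sum(f[(x - y) % n] * g[y] for y in range(n)) for x in range(n)]

n = 6
f = [1, 2, 0, -1, 3, 0]
g = [0, 1, 1, 0, 2, -2]
h = [2, 0, 0, 1, 0, 1]
delta = [1] + [0] * (n - 1)      # indicator function of the identity element

associative = convolve(convolve(f, g), h) == convolve(f, convolve(g, h))
commutative = convolve(f, g) == convolve(g, f)          # the group is abelian
has_identity = convolve(delta, f) == f and convolve(f, delta) == f
print(associative, commutative, has_identity)
```

In contrast with $C_{\mathrm{com}}(R)$, the finite-group algebra in this sketch does have an identity, namely the indicator function of the identity element of the group.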


(7) Division rings. A division ring is a nonzero ring with identity such that every element has a two-sided inverse under multiplication. A commutative division ring is just a ﬁeld. The ring H of quaternions is the only explicit noncommutative division ring that we have encountered so far. It is an associative algebra over R. More generally, if A is a division ring, then we can easily check that the center K of A is a ﬁeld and that A is an associative algebra over K.2 (8) Tensor, symmetric, and exterior algebras. If E is a vector space over a ﬁeld K, Chapter VI deﬁned the tensor, symmetric, and exterior algebras of E over K, as well as the polynomial algebra on E in the case that E is ﬁnite-dimensional. These are all associative algebras with identity. Symmetric algebras and polynomial algebras are commutative. None of these algebras will be discussed further in this chapter. (9) A ﬁeld of 4 elements. This was constructed in Section IV.4. Further ﬁnite ﬁelds beyond the ﬁeld of 4 elements and the ﬁelds F p = Z/ pZ with p prime will be constructed in Chapter IX. (10) Algebraic number ﬁelds Q[θ]. These were discussed in Sections IV.1 and IV.4. In deﬁning Q[θ], we assume that θ is a complex number and that there exists an integer n > 0 such that the complex numbers 1, θ, θ 2 , . . . , θ n are linearly dependent over Q. The set Q[θ] is deﬁned to be the subset of C obtained by substitution of θ into all members of Q[X ]. It coincides with the linear span over Q of 1, θ, θ 2 , . . . , θ n−1 . Proposition 4.1 shows that it is closed under the arithmetic operations, including passage to multiplicative inverses of nonzero elements, and it is therefore a subﬁeld of C. This example ties in with the notion of minimal polynomial in Chapter V because the members of Q[X ] with θ as a root are all multiples of one nonzero such polynomial that exhibits the linear dependence. 
We return to this example occasionally later in this chapter, particularly in Sections 7–11, and then we treat it in more detail in Chapter IX.

(11) Algebraic integers in a number field Q[θ]. Algebraic integers were defined in Section VII.5 as the roots in C of monic polynomials in Z[X], and they were shown to form a commutative ring with identity. The set of algebraic integers in Q[θ] is therefore a commutative ring with identity, and it plays somewhat the same role for Q[θ] that Z plays for Q. We discuss this example further in Sections 7–11.

(12) Integral group rings. If G is a group, then we can make the free abelian group ZG on the elements of G into a ring by defining multiplication to be

$\Bigl(\sum_i m_i g_i\Bigr)\Bigl(\sum_j n_j h_j\Bigr) = \sum_{i,j} (m_i n_j)(g_i h_j)$

when the $m_i$ and $n_j$ are in Z and the $g_i$ and $h_j$ are in G. It is immediate that the result is a ring with identity, and ZG is called the integral group ring of G. The group G is embedded as a subgroup of the group $(ZG)^\times$ of units of ZG, each element g of G being identified with a sum $\iota(g) = \sum m_i g_i$ in which the only nonzero term is 1g. The ring ZG has the universal mapping property illustrated in Figure 8.1 and described as follows: whenever ϕ : G → R is a group homomorphism of G into the group $R^\times$ of units of a ring R, then there exists a unique ring homomorphism Φ : ZG → R such that Φι = ϕ. The existence of Φ as a homomorphism of additive groups follows from the universal mapping property of free abelian groups, and then one readily checks that Φ respects multiplication.³

FIGURE 8.1. Universal mapping property of the integral group ring of G. (Diagram: ϕ : G → R, ι : G → ZG, and the induced map Φ : ZG → R.)

² Use of the term "division algebra" requires some care. Some mathematicians understand division algebras to be associative, and others do not. The real algebra O of octonions, as defined in Problems 52–56 at the end of Chapter VI, is not associative, but it does have division.

³ Universal mapping properties are discussed systematically in Problems 18–22 at the end of Chapter VI. The subject of such a property, here the pair (ZG, ι), is always unique up to canonical isomorphism in a given category, but its existence has to be proved.

(13) Quotient rings. If R is a ring and I is a two-sided ideal, then we saw in Section IV.4 that the additive quotient R/I has a natural multiplication that makes it into a ring called a quotient ring of R. This in effect was the construction that obtained the ring Z/mZ from the ring Z.

(14) Direct product of rings. If $\{R_s \mid s \in S\}$ is a nonempty set of rings, then a direct product $\prod_{s \in S} R_s$ is a ring whose additive group is any direct product of the underlying additive groups and whose multiplication is given in entry-by-entry fashion. The resulting ring and the associated ring homomorphisms $p_{s_0} : \prod_{s \in S} R_s \to R_{s_0}$ amount to the product functor for the category of rings; if each $R_s$ has an identity, the result amounts also to the product functor for the category of rings with identity.

We give further examples of rings near the end of this section after we have defined modules and given some examples.

Informally a module is a vector space over a ring. But let us be more precise. If R is a ring, then a left R module⁴ M is an abelian group with the additional structure of a "scalar multiplication" R × M → M such that
(i) $r(r'm) = (rr')m$ for r and r′ in R and m in M,
(ii) $(r + r')m = rm + r'm$ and $r(m + m') = rm + rm'$ if r and r′ are in R and m and m′ are in M.
In addition, if R has an identity, we say that M is unital if
(iii) 1m = m for all m in M.

⁴ Many algebra books write "R-module," using a hyphen. However, when R is replaced by an expression, particularly in applications of the theory, the hyphen is often dropped. For an example, see "module" in Hall's The Theory of Groups. The present book omits the hyphen in all cases in order to be consistent.

One may also speak of right R modules. For these the scalar multiplication is usually written as mr with m in M and r in R, and the expected analogs of (i) and (ii) are to hold. When R is commutative, it is immaterial which side is used for the scalar multiplication, and one speaks simply of an R module.

Let R be a ring, and let M and N be two left R modules. A homomorphism of left R modules, or more briefly an R homomorphism, is an additive group homomorphism ϕ : M → N such that ϕ(rm) = rϕ(m) for all r in R and m in M. Then we can form a category for fixed R in which the objects are the left R modules and the morphisms are the R homomorphisms from one left R module to another. Similarly the right R modules, along with the corresponding kind of R homomorphisms, form a category. If R has an identity, then the unital R modules form a subcategory in each case. These categories are fundamental to the subject of homological algebra, which we take up in Advanced Algebra.

EXAMPLES OF MODULES.

(1) Vector spaces. If R is a field, the unital R modules are exactly the vector spaces over R.

(2) Abelian groups. The unital Z modules are exactly the abelian groups. Scalar multiplication is given in the expected way: If n is a positive integer, the product nx is the n-fold sum of x with itself. If n = 0, the product nx is 0. If n < 0, the product nx is −((−n)x).

(3) Vector spaces as unital modules for the polynomial ring K[X]. Let V be a finite-dimensional vector space over the field K, and fix L in $\mathrm{End}_K(V)$. Then V becomes a unital K[X] module under the definition A(X)v = A(L)(v) whenever A(X) is a polynomial in K[X]; here A(L) is the member of $\mathrm{End}_K(V)$ defined as in Section V.3.
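The module structure in Example 3 can be checked numerically in a small case. The sketch below takes V to be a 2-dimensional space with integer coordinates (viewed inside $Q^2$), fixes a matrix L, and verifies property (i), the first identity in (ii), and the unital condition (iii) for two sample polynomials. All function names (`poly_at`, `act`, `poly_mul`, `poly_add`) and the particular L, v, A, B are assumptions chosen for illustration only.

```python
# Sketch: V as a K[X] module via A(X)v = A(L)(v).
# A polynomial c0 + c1*X + ... is stored as the list [c0, c1, ...].

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_vec(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def poly_at(coeffs, L):
    """Return the matrix A(L), computed by Horner's rule."""
    n = len(L)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, L)
        for i in range(n):
            result[i][i] += c
    return result

def act(coeffs, L, v):
    """Scalar multiplication of the polynomial on the vector v."""
    return mat_vec(poly_at(coeffs, L), v)

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_add(a, b):
    m = max(len(a), len(b))
    a = a + [0] * (m - len(a))
    b = b + [0] * (m - len(b))
    return [x + y for x, y in zip(a, b)]

L = [[2, 1], [0, 3]]
v = [1, -1]
A = [1, 0, 1]      # 1 + X^2
B = [-2, 1]        # X - 2

axiom_i = act(poly_mul(A, B), L, v) == act(A, L, act(B, L, v))
axiom_ii = act(poly_add(A, B), L, v) == [s + t for s, t in
                                          zip(act(A, L, v), act(B, L, v))]
unital = act([1], L, v) == v
print(axiom_i, axiom_ii, unital)
```

Property (i) here amounts to the identity (AB)(L) = A(L)B(L), i.e., to the fact that substitution of L for X respects products.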
In Section 6 in this chapter we shall see that some of the deeper results in the theory of a single linear transformation, as developed in Chapter V, follow from the theory of unital K[X ] modules that will emerge from the present chapter. (4) Modules in the context of algebraic number ﬁelds. Let Q[θ] be a subﬁeld of C as in Example 10 of rings earlier in this section. It is assumed that the Q vector space Q[θ] is ﬁnite-dimensional. Let L be the member of EndQ (Q[θ]) given as left multiplication by θ on Q[θ]. As in the previous example, Q[θ] becomes a unital Q[X ] module. Chapter V deﬁnes a minimal polynomial for L, as well as a characteristic polynomial. These objects play a role in the study


to be carried out in Chapter IX of fields like Q[θ]. If θ is an algebraic integer as in Example 11 of rings earlier in this section, then we can get more refined information by replacing Q by Z in the above analysis; this technique plays a role in the theory to be developed in Sections 7–11.

(5) Rings and their quotients. If R is a ring, then R is a left R module and also a right R module. If I is a two-sided ideal in R, then the quotient ring R/I, as defined in Proposition 4.20, is a left R module and also a right R module. These modules are automatically unital if R has an identity. Later in this section we shall consider quotients of R by "one-sided ideals."

(6) Spaces of rectangular matrices. If R is a ring, then the space $M_{mn}(R)$ of m-by-n matrices with entries in R is an abelian group under addition and becomes a left R module when multiplication by the scalar r is defined as left multiplication by r in each entry. Also, if we put $S = M_m(R)$, then $M_{mn}(R)$ is a left S module under the usual definition of matrix multiplication: $(sv)_{ij} = \sum_{k=1}^{m} s_{ik}v_{kj}$, where s is in S and v is in $M_{mn}(R)$.

(7) Direct product of R modules. If S is a nonempty set and $\{M_s\}_{s \in S}$ is a corresponding system of left R modules, then a direct product $\prod_{s \in S} M_s$ is obtained as an additive group by forming any direct product of the underlying additive groups of the $M_s$'s and defining scalar multiplication by members of R to be scalar multiplication in each coordinate. The associated abelian-group homomorphisms $p_{s_0} : \prod_{s \in S} M_s \to M_{s_0}$ become R homomorphisms under this definition of scalar multiplication on the direct product. Direct product amounts to the product functor for the category of left R modules; we omit the easy verification, which makes use of the corresponding fact about abelian groups. As in the case of abelian groups, we can speak of an external direct product as the result of a construction that starts with the product of the sets $M_s$, and we can speak of recognizing a direct product as internal when the $M_s$'s are contained in the direct product and the restriction of each $p_s$ to $M_s$ is the identity function.

(8) Direct sum of R modules. If S is a nonempty set and $\{M_s\}_{s \in S}$ is a corresponding system of left R modules, then a direct sum $\bigoplus_{s \in S} M_s$ is obtained as an additive group by forming any direct sum of the underlying additive groups of the $M_s$'s and defining scalar multiplication by members of R to be scalar multiplication in each coordinate. The associated abelian-group homomorphisms $i_{s_0} : M_{s_0} \to \bigoplus_{s \in S} M_s$ become R homomorphisms under this definition of scalar multiplication on the direct sum. Direct sum amounts to the coproduct functor for the category of left R modules; we omit the easy verification, which makes use of the corresponding fact about abelian groups. As in the case of abelian groups, we can speak of an external direct sum as the result of a construction that starts with a subset of the product of the sets $M_s$, and we can speak of recognizing a


direct sum as internal when the $M_s$'s are contained in the direct sum and each $i_s$ is the inclusion mapping.

(9) Free R modules. Let R be a nonzero ring with identity, and let S be a nonempty set. As in Example 5, let us regard R as a unital left R module. Then the left R module given as the direct sum $F(S) = \bigoplus_{s \in S} R$ is called a free R module, or free left R module. We define ι : S → F(S) by $\iota(s) = i_s(1)$, where $i_s$ is the usual embedding map for the direct sum of R modules. The left R module F(S) has a universal mapping property similar to the corresponding property of free abelian groups. This is illustrated in Figure 8.2 and is described as follows: whenever M is a unital left R module and ϕ : S → M is a function, then there exists a unique R homomorphism Φ : F(S) → M such that Φι = ϕ. The existence of Φ as an R homomorphism follows from the universal mapping property of direct sums (Example 8) as soon as the property is demonstrated for S equal to a singleton set. Thus let A be any left R module, and let a ∈ A be given; then it is evident that r ↦ ra is the unique R homomorphism of the left R module R into A carrying 1 to a.

FIGURE 8.2. Universal mapping property of a free left R module. (Diagram: ϕ : S → M, ι : S → F(S), and the induced map Φ : F(S) → M.)

If R is a ring and M is a left R module, then an R submodule N of M is an additive subgroup of M that is closed under scalar multiplication, i.e., has rm in N when r is in R and m is in N. In situations in which there is no ambiguity, the use of "left" in connection with R submodules is not necessary.

EXAMPLES OF SUBMODULES. If V is a vector space over a field K, then a K submodule of V is a vector subspace of V. If M is an abelian group, then a Z submodule of M is a subgroup. In Example 6 of modules, in which $S = M_m(R)$, an example of a left S submodule of $M_{mn}(R)$ is the set of all matrices with 0 in every entry of a specified subset of the n columns.

If the ring R has an identity and M is a unital left R module, then the R submodule of M generated by m ∈ M, i.e., the smallest R submodule containing m, is Rm, the set of products rm with r in R. In fact, the set of all rm is an abelian group since (r ± s)m = rm ± sm, it is closed under scalar multiplication since s(rm) = (sr)m, and it contains m since 1m = m. However, if the left R module M is not unital, then the R submodule generated by m may not equal Rm, and it was for that reason that R modules were assumed to be unital in the construction of free R modules in Example 9 of modules above. More generally the R submodule


of M generated by a finite set $\{m_1, \dots, m_n\}$ in M is $Rm_1 + \cdots + Rm_n$ if the left R module M is unital.

Example 5 of modules treated R as a left R module. In this setting the left R submodules are called left ideals in R. That is, a left ideal I is an additive subgroup of R such that ri is in I whenever r is in R and i is in I. As a special case of what was said in the previous paragraph, if the ring R has an identity, then the left R module R is automatically unital, and the left ideal of R generated by an element a is Ra, the set of all products ra with r in R. Similarly a right ideal in R is an additive subgroup I such that ir is in I whenever r is in R and i is in I. The right ideals are the right R submodules of the right R module R. If R is commutative, then left ideals, right ideals, and two-sided ideals are all the same.

Suppose that ϕ : M → N is an R homomorphism of left R modules. In this situation we readily verify that the kernel of ϕ, denoted by ker ϕ as usual, is an R submodule of M, and the image of ϕ, denoted by image ϕ as usual, is an R submodule of N. The R homomorphism ϕ is one-one if and only if ker ϕ = 0, as a consequence of properties of homomorphisms of abelian groups. A one-one R homomorphism of one left R module onto another is called an R isomorphism; its inverse is automatically an R isomorphism, and "is R isomorphic to" is an equivalence relation.

Still with R as a ring, suppose that M is a left R module and N is an R submodule. Then we can form the quotient M/N of abelian groups. This becomes a left R module under the definition r(m + N) = rm + N, as we readily check. We call M/N a quotient module. The quotient mapping m ↦ m + N of M to M/N is an R homomorphism onto. A particular example of a quotient module is R/I, where I is a left ideal in R.
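The distinction between left and right ideals can be seen by brute force in a small noncommutative ring. The following Python sketch, offered only as an illustration, takes R to be the ring of 2-by-2 matrices over the field of 2 elements and a to be the matrix unit with a single 1 in the upper left corner; it lists Ra and aR and counts the cosets in the quotient module R/Ra. The encoding of a matrix as a 4-tuple and all function names are assumptions of the sketch.

```python
# Sketch: left and right ideals generated by one element in the ring of
# 2-by-2 matrices over Z/2Z.  The tuple (a, b, c, d) stands for [[a, b], [c, d]].
from itertools import product

R = list(product(range(2), repeat=4))          # all 16 matrices

def mul(x, y):
    a, b, c, d = x
    e, f, g, h = y
    return ((a*e + b*g) % 2, (a*f + b*h) % 2, (c*e + d*g) % 2, (c*f + d*h) % 2)

def add(x, y):
    return tuple((s + t) % 2 for s, t in zip(x, y))

def is_left_ideal(I):
    closed_add = all(add(i, j) in I for i in I for j in I)
    return closed_add and all(mul(r, i) in I for r in R for i in I)

def is_right_ideal(I):
    closed_add = all(add(i, j) in I for i in I for j in I)
    return closed_add and all(mul(i, r) in I for r in R for i in I)

e11 = (1, 0, 0, 0)
left_ideal = {mul(r, e11) for r in R}          # Ra: second column is zero
right_ideal = {mul(e11, r) for r in R}         # aR: second row is zero

# Cosets r + Ra, i.e., the elements of the quotient module R/Ra.
cosets = {frozenset(add(r, i) for i in left_ideal) for r in R}

print(sorted(left_ideal))
print(sorted(right_ideal))
print(len(cosets))
```

In this sketch Ra is a left ideal but not a right ideal, so R/Ra is a left R module without being a quotient ring.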
We can now go over the results on quotients of abelian groups in Section IV.2, specifically Proposition 4.11 through Theorem 4.14, and check that they extend immediately to results about left R modules. The statements appear below. The arguments are all routine, and there is no point in repeating them. In the special case that R is a field and the R modules are vector spaces, these results specialize to results proved in Sections II.5 and II.6.

Proposition 8.1. Let R be a ring, let ϕ : M₁ → M₂ be an R homomorphism between left R modules, let N₀ = ker ϕ, let N be an R submodule of M₁ contained in N₀, and define q : M₁ → M₁/N to be the R module quotient map. Then there exists an R homomorphism $\bar{\varphi}$ : M₁/N → M₂ such that ϕ = $\bar{\varphi}$q, i.e., $\bar{\varphi}$(m₁ + N) = ϕ(m₁). It has the same image as ϕ, and ker $\bar{\varphi}$ = {h₀ + N | h₀ ∈ N₀}.

REMARK. As with groups, one says that ϕ factors through M₁/N or descends to M₁/N. Figure 8.3 illustrates matters.

VIII. Commutative Rings and Their Modules

[Commutative diagram: ϕ : M₁ → M₂ along the top, the quotient map q : M₁ → M₁/N down the left side, and the induced map $\bar{\varphi}$ : M₁/N → M₂ along the diagonal.]

FIGURE 8.3. Factorization of R homomorphisms via a quotient of R modules.

Corollary 8.2. Let R be a ring, let ϕ : M₁ → M₂ be an R homomorphism between left R modules, and suppose that ϕ is onto M₂ and has kernel N. Then ϕ exhibits the left R module M₁/N as canonically R isomorphic to M₂.

Theorem 8.3 (First Isomorphism Theorem). Let R be a ring, let ϕ : M₁ → M₂ be an R homomorphism between left R modules, and suppose that ϕ is onto M₂ and has kernel K. Then the map N₁ ↦ ϕ(N₁) gives a one-one correspondence between
(a) the R submodules N₁ of M₁ containing K and
(b) the R submodules of M₂.
Under this correspondence the mapping m + N₁ ↦ ϕ(m) + ϕ(N₁) is an R isomorphism of M₁/N₁ onto M₂/ϕ(N₁).

REMARK. In the special case of the last statement that ϕ : M₁ → M₂ is an R module quotient map q : M → M/K and N is an R submodule of M containing K, the last statement of the theorem asserts the R isomorphism
\[ M/N \cong (M/K)\big/(N/K). \]

Theorem 8.4 (Second Isomorphism Theorem). Let R be a ring, let M be a left R module, and let N₁ and N₂ be R submodules of M. Then N₁ ∩ N₂ is an R submodule of N₁, the set N₁ + N₂ of sums is an R submodule of M, and the map n₁ + (N₁ ∩ N₂) ↦ n₁ + N₂ is a well-defined canonical R isomorphism
\[ N_1/(N_1 \cap N_2) \cong (N_1 + N_2)/N_2. \]

A quotient of a direct sum of R modules by the direct sum of R submodules is the direct sum of the quotients, according to the following proposition. The result generalizes Lemma 4.58, which treats the special case of abelian groups (unital Z modules).

Proposition 8.5. Let R be a ring, let $M = \bigoplus_{s\in S} M_s$ be a direct sum of left R modules, and for each s in S, let Nₛ be a left R submodule of Mₛ. Then the natural map of $\bigoplus_{s\in S} M_s$ to the direct sum of quotients descends to an R isomorphism
\[ \Big(\bigoplus_{s\in S} M_s\Big)\Big/\Big(\bigoplus_{s\in S} N_s\Big) \cong \bigoplus_{s\in S} (M_s/N_s). \]

PROOF. Let $\varphi : \bigoplus_{s\in S} M_s \to \bigoplus_{s\in S} (M_s/N_s)$ be the R homomorphism defined by $\varphi(\{m_s\}_{s\in S}) = \{m_s + N_s\}_{s\in S}$. The mapping ϕ is onto $\bigoplus_{s\in S} (M_s/N_s)$, and the kernel is $\bigoplus_{s\in S} N_s$. Then Corollary 8.2 shows that ϕ descends to the required R isomorphism.

EXAMPLES OF RINGS, CONTINUED.

(15) Associative algebras over commutative rings with identity. These directly generalize Example 6 of rings. Let R be a nonzero commutative ring with identity. An associative algebra over R, or associative R algebra, is a ring A that is also a left R module such that multiplication A × A → A is R linear in each variable. The conditions of R linearity in each variable mean that addition satisfies the usual distributive laws for a ring and that the following condition is to be satisfied relating multiplication and scalar multiplication:
\[ (ra)b = r(ab) = a(rb) \qquad \text{for all } r \text{ in } R \text{ and } a, b \in A. \]

If A has an identity, the displayed condition says that all scalar multiples of the identity lie in the center of A, i.e., commute with every element of A. Examples 2 and 3, treating polynomial rings and matrix rings whose scalars lie in a commutative ring with identity, furnish examples.

Every ring R is an associative Z algebra when the Z action is defined so as to make the abelian group underlying the additive structure of R into a Z module. All that needs to be checked is the displayed formula. For n = 1, we have (1a)b = 1(ab) = a(1b) since the Z module R is unital. If we also have (na)b = n(ab) = a(nb) for a positive integer n, then we can add and use the appropriate distributive laws to obtain ((n + 1)a)b = (n + 1)(ab) = a((n + 1)b). Induction therefore gives (na)b = n(ab) = a(nb) for all positive integers n, and this equality extends to all integers n by using additive inverses.

The associative R algebras form a category in which the morphisms from one such algebra to another are the ring homomorphisms that are also R homomorphisms. The product functor for this category is the direct product as in Example 14 with an overlay of scalar multiplication as in Example 7 of modules. The coproduct functor in the category of commutative associative R algebras with identity is more subtle and involves a tensor product over R, a notion we postpone introducing until Chapter X.

(16) Group algebra RG over R. If G is a group and R is a commutative ring with identity, then we can introduce a multiplication in the free R module RG on the elements of G by the definition
\[ \Big(\sum_i r_i g_i\Big)\Big(\sum_j s_j h_j\Big) = \sum_{i,j} (r_i s_j)(g_i h_j) \]
when the rᵢ and sⱼ are in R and the gᵢ and hⱼ are in G. It is immediate that this multiplication makes the free R module into an associative R algebra with identity, and RG is called the group algebra of G over R. The special case R = Z leads to the integral group ring as in Example 12. The group G is embedded as a

subgroup of the group (RG)^× of units of RG, each element g of G being identified with a sum $\iota(g) = \sum r_i g_i$ in which the only nonzero term is 1g. The associative R algebra RG has a universal mapping property similar to that in Figure 8.1 and given in Figure 8.4 as follows: whenever ϕ : G → A^× is a group homomorphism of G into the group A^× of units of an associative R algebra A, then there exists a unique associative R algebra homomorphism Φ : RG → A such that Φι = ϕ.

[Commutative diagram: ϕ : G → A along the top, ι : G → RG down the left side, and the induced map Φ : RG → A along the diagonal.]

FIGURE 8.4. Universal mapping property of the group algebra RG.

(17) Scalar-valued functions of finite support on a group, with convolution as multiplication. If G is a group and R is a commutative ring with identity, denote by C(G, R) the R module of all functions from G into R that are of finite support in the sense that each function is 0 except on a finite subset of G. This R module readily becomes an associative R algebra if ring multiplication is taken to be pointwise multiplication, but the interest here is in a different definition of multiplication. Instead, multiplication is defined to be convolution with
\[ (f_1 * f_2)(x) = \sum_{y\in G} f_1(xy^{-1}) f_2(y) = \sum_{y\in G} f_1(y) f_2(y^{-1}x). \]
The sums in question are finite because of the finite support of f₁ and f₂, and the sums are equal by a change of variables. This multiplication was introduced in the special case R = C in Section VII.4, and the argument for associativity given there in the special case works in general. With convolution as multiplication, C(G, R) becomes an associative R algebra with identity. Problem 14 at the end of the chapter asks for a verification that the mapping g ↦ f_g with
\[ f_g(x) = \begin{cases} 1 & \text{for } x = g, \\ 0 & \text{for } x \neq g, \end{cases} \]
extends to an R algebra isomorphism of RG onto C(G, R).
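The two convolution formulas and the behavior of the functions f_g can be tried out on a small example. The sketch below assumes G is the symmetric group on three letters and R = Z, with functions on G stored as dictionaries; these choices and all helper names are assumptions made for illustration and do not come from the text.

```python
from itertools import permutations

# Assumption: G = S_3, each element a tuple p with p[i] the image of i.
G = list(permutations(range(3)))
E = (0, 1, 2)  # identity permutation


def mul(p, q):
    """Group product: the composition "apply q, then p"."""
    return tuple(p[q[i]] for i in range(3))


def inv(p):
    """Inverse permutation."""
    out = [0] * 3
    for i, image in enumerate(p):
        out[image] = i
    return tuple(out)


def conv_left(f1, f2):
    """(f1 * f2)(x) = sum over y of f1(x y^-1) f2(y)."""
    return {x: sum(f1[mul(x, inv(y))] * f2[y] for y in G) for x in G}


def conv_right(f1, f2):
    """(f1 * f2)(x) = sum over y of f1(y) f2(y^-1 x)."""
    return {x: sum(f1[y] * f2[mul(inv(y), x)] for y in G) for x in G}


def delta(g):
    """The function f_g: equal to 1 at g and 0 elsewhere."""
    return {x: 1 if x == g else 0 for x in G}
```

With these definitions one can confirm on examples that the two formulas agree, that f_g ∗ f_h = f_{gh}, and that f_1 is a two-sided identity, which is the behavior that the isomorphism of RG onto C(G, R) requires on the basis elements.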

2. Integral Domains and Fields of Fractions

For the remainder of the chapter we work with commutative rings only. In several of the sections, including this one, the commutative ring will be an integral domain, i.e., a nonzero commutative ring with identity and with no zero divisors.

In this section we show how an integral domain can be embedded canonically in a field. This embedding is handy for recognizing certain facts about integral domains as consequences of facts about fields. For example Proposition 4.28b established that if R is a nonzero integral domain and if A(X) is a polynomial in R[X] of degree n > 0, then A(X) has at most n roots. Since the coefficients of the polynomial can be considered to be members of the larger field that contains R, this result is an immediate consequence of the corresponding fact about fields (Corollary 1.14).

The prototype is the construction of the field Q of rationals from the integral domain Z of integers as in Section A3 of the appendix, in which one thinks of a/b as a pair (a, b) with b ≠ 0 and then identifies pairs by saying that a/b = c/d if and only if ad = bc. We proceed in the same way in the general case. Thus let R be a nonzero integral domain, form the set
\[ \widetilde{F} = \{(a, b) \mid a \in R,\ b \in R,\ b \neq 0\}, \]
and impose the equivalence relation (a, b) ∼ (c, d) if ad = bc. The relation ∼ is certainly reflexive and symmetric. To see that it is transitive, suppose that (a, b) ∼ (c, d) and (c, d) ∼ (e, f). Then ad = bc and cf = de, and these together force adf = bcf = bde. In turn, this implies af = be since R is an integral domain and d is assumed ≠ 0. Thus ∼ is transitive and is an equivalence relation. Let F be the set of equivalence classes.

The definition of addition in $\widetilde{F}$ is (a, b) + (c, d) = (ad + bc, bd), the expression we get by naively clearing fractions, and we want to see that addition is consistent with the equivalence relation. In checking this, we need change only one of the pairs at a time. Thus suppose that (a′, b′) ∼ (a, b) and that (c, d) is given. We know that a′b = ab′, and we want to see that (ad + bc, bd) ∼ (a′d + b′c, b′d), i.e., that (ad + bc)b′d = (a′d + b′c)bd. In other words, we are to check that adb′d = a′dbd; we see immediately that this equality is valid since ab′ = a′b.
Consequently addition is consistent with the equivalence relation and descends to be defined on the set F of equivalence classes.

Taking into account the properties satisfied by members of an integral domain, we check directly that addition is commutative and associative on $\widetilde{F}$, and it follows that addition is commutative and associative on F. The element (0, 1) is a two-sided identity for addition in $\widetilde{F}$, and hence the class of (0, 1) is a two-sided identity for addition in F. We denote this class by 0. Let us identify this class. A pair (a, b) is in the class of (0, 1) if and only if 0 · b = 1 · a, hence if and only if a = 0. In other words, the class of (0, 1) consists of all (0, b) with b ≠ 0.

In $\widetilde{F}$, we have (a, b) + (−a, b) = (ab + b(−a), bb) = (0, b²) ∼ (0, 1), and therefore the class of (−a, b) is a two-sided inverse to the class of (a, b) under

addition. Consequently F is an abelian group under addition.

The definition of multiplication in $\widetilde{F}$ is (a, b)(c, d) = (ac, bd), and it is routine to see that this definition is consistent with the equivalence relation. Therefore multiplication descends to be defined on F. We check by inspection that multiplication is commutative and associative on $\widetilde{F}$, and it follows that it is commutative and associative on F. The element (1, 1) is a two-sided identity for multiplication in $\widetilde{F}$, and the class of (1, 1) is therefore a two-sided identity for multiplication in F. We denote this class by 1. If (a, b) is not in the class 0, then a ≠ 0, as we saw above. Then ab ≠ 0, and we have (a, b)(b, a) = (ab, ab) ∼ (1, 1), whose class is 1. Hence the class of (b, a) is a two-sided inverse of the class of (a, b) under multiplication. Consequently the nonzero elements of F form an abelian group under multiplication.

For one of the distributive laws, the computation
\[
\begin{aligned}
(a, b)\big((c, d) + (e, f)\big) &= (a, b)(cf + de,\ df) = (a(cf + de),\ bdf) = (acf + ade,\ bdf) \\
&\sim (acbf + bdae,\ b^2 df) = (ac, bd) + (ae, bf) = (a, b)(c, d) + (a, b)(e, f)
\end{aligned}
\]
shows that the classes of (a, b)((c, d) + (e, f)) and of (a, b)(c, d) + (a, b)(e, f) are equal. The other distributive law follows from this one since F is commutative under multiplication. Therefore F is a field.

The field F is called the field of fractions of the integral domain R. The function η : R → F defined by saying that η(r) is the class of (r, 1) is easily checked to be a homomorphism of rings sending 1 to 1. It is one-one. Let us call it the canonical embedding of R into F. The pair (F, η) has the universal mapping property stated in Proposition 8.6 and illustrated in Figure 8.5.

[Commutative diagram: ϕ : R → F′ along the top, η : R → F down the left side, and the induced map $\widetilde{\varphi}$ : F → F′ along the diagonal.]

FIGURE 8.5. Universal mapping property of the field of fractions of R.

Proposition 8.6. Let R be a nonzero integral domain, let F be its field of fractions, and let η be the canonical embedding of R into F. Whenever ϕ is a one-one ring homomorphism of R into a field F′ carrying 1 to 1, then there exists a unique ring homomorphism $\widetilde{\varphi}$ : F → F′ such that ϕ = $\widetilde{\varphi}$η, and $\widetilde{\varphi}$ is one-one as a homomorphism of fields.

REMARK. We say that $\widetilde{\varphi}$ is the extension of ϕ from R to F. Once this proposition has been proved, it is customary to drop η from the notation and regard R as a subring of its field of fractions.
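The construction of F from pairs can be imitated directly in code. The following Python sketch assumes R = Z, represented by Python integers, and deliberately stores pairs without reducing them, so that equality is decided by the relation (a, b) ∼ (c, d) if ad = bc exactly as in the construction. The class name `Frac` and the function `eta` are hypothetical names chosen for this illustration.

```python
class Frac:
    """A pair (a, b) of integers with b != 0, standing for its equivalence class."""

    def __init__(self, a, b):
        if b == 0:
            raise ValueError("second entry of a pair must be nonzero")
        self.a, self.b = a, b

    def __eq__(self, other):
        # (a, b) ~ (c, d) if and only if ad = bc.
        return self.a * other.b == self.b * other.a

    def __add__(self, other):
        # (a, b) + (c, d) = (ad + bc, bd)
        return Frac(self.a * other.b + self.b * other.a, self.b * other.b)

    def __mul__(self, other):
        # (a, b)(c, d) = (ac, bd)
        return Frac(self.a * other.a, self.b * other.b)

    def __neg__(self):
        # The class of (-a, b) is the additive inverse of the class of (a, b).
        return Frac(-self.a, self.b)

    def inverse(self):
        # The class of (b, a) is the multiplicative inverse when a != 0.
        if self.a == 0:
            raise ZeroDivisionError("the class 0 has no multiplicative inverse")
        return Frac(self.b, self.a)

    def __repr__(self):
        return f"Frac({self.a}, {self.b})"


def eta(r):
    """Canonical embedding of R = Z: r goes to the class of (r, 1)."""
    return Frac(r, 1)
```

For instance, `Frac(2, 4) + Frac(1, 3)` and `Frac(1, 2) + Frac(1, 3)` are different pairs but compare equal, which is the consistency of addition with the equivalence relation in one instance.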

PROOF. If (a, b) with b ≠ 0 is a pair in $\widetilde{F}$, we define Φ(a, b) = ϕ(a)ϕ(b)⁻¹. This is well defined since b ≠ 0 and since ϕ, being one-one, cannot have ϕ(b) = 0. Let us see that Φ is consistent with the equivalence relation, i.e., that (a, b) ∼ (a′, b′) implies Φ(a, b) = Φ(a′, b′). Since (a, b) ∼ (a′, b′), we have ab′ = a′b and therefore also ϕ(a)ϕ(b′) = ϕ(a′)ϕ(b) and Φ(a, b) = ϕ(a)ϕ(b)⁻¹ = ϕ(a′)ϕ(b′)⁻¹ = Φ(a′, b′), as required. We can thus define $\widetilde{\varphi}$ of the class of (a, b) to be Φ(a, b), and $\widetilde{\varphi}$ is well defined as a function from F to F′. If r is in R, then $\widetilde{\varphi}$(η(r)) = $\widetilde{\varphi}$(class of (r, 1)) = Φ(r, 1) = ϕ(r)ϕ(1)⁻¹, and this equals ϕ(r) since ϕ is assumed to carry 1 into 1. Therefore $\widetilde{\varphi}$η = ϕ.

For uniqueness, let the class of (a, b) be given in F. Since b is nonzero, this class is the same as the class of (a, 1)(b, 1)⁻¹, which equals η(a)η(b)⁻¹. Since ($\widetilde{\varphi}$η)(a) = ϕ(a) and ($\widetilde{\varphi}$η)(b) = ϕ(b), we must have $\widetilde{\varphi}$(class of (a, b)) = $\widetilde{\varphi}$(η(a))$\widetilde{\varphi}$(η(b))⁻¹ = ϕ(a)ϕ(b)⁻¹. Therefore ϕ uniquely determines $\widetilde{\varphi}$.

If K is a field, then R = K[X] is an integral domain, and Proposition 8.6 applies to this R. The field of fractions consists in effect of formal rational expressions P(X)Q(X)⁻¹ in the indeterminate X, with the expected identifications made. We write K(X) for this field of fractions. More generally the field of fractions of the integral domain K[X₁, ..., Xₙ] consists of formal rational expressions in the indeterminates X₁, ..., Xₙ, with the expected identifications made, and is denoted by K(X₁, ..., Xₙ).

3. Prime and Maximal Ideals

In this section, R will denote a commutative ring, not necessarily having an identity. We shall introduce the notions of “prime ideal” and “maximal ideal,” and we shall investigate relationships between these two notions. A proper ideal I in R is prime if ab ∈ I implies a ∈ I or b ∈ I.
The ideal I = R is not prime, by convention.⁵ We give three examples of prime ideals; a fourth example will be given in a proposition immediately afterward.

⁵ This convention is now standard. Books written before about 1960 usually regarded I = R as a prime ideal. Correspondingly they usually treated the zero ring as an integral domain.

EXAMPLES.

(1) For Z, it was shown in an example just before Proposition 4.21 that each ideal is of the form mZ for some integer m. We may assume that m ≥ 0. The prime ideals are 0 and all pZ with p prime. To see this latter fact, consider I = mZ with m ≥ 2. If m = ab nontrivially, then neither a nor b is in I, but ab is in I; hence I is not prime. Conversely if m is prime, and if ab is in I = mZ, then ab = mc for some integer c. Since m is prime, Lemma 1.6 shows that m divides a or m divides b. Hence a is in I or b is in I. Therefore I is prime.

(2) If K is a field, then each ideal in R = K[X] is of the form A(X)K[X] with A(X) in K[X], and A(X)K[X] is prime if and only if A(X) is 0 or is a prime polynomial. In fact, each ideal is of the form A(X)K[X] by Proposition 5.8. If A(X) is not a constant polynomial, then the argument that A(X)K[X] is prime if and only if the polynomial A(X) is prime proceeds as in Example 1, using Lemma 1.16 in place of Lemma 1.6.

(3) In R = Z[X], the structure of the ideals is complicated, and we shall not attempt to list all ideals. Let us observe simply that the ideal I = XZ[X] is prime. In fact, if A(X)B(X) is in XZ[X], then A(X)B(X) = XC(X) for some C(X) in Z[X]. If the constant terms of A(X) and B(X) are a₀ and b₀, this equation says that a₀b₀ = 0. Therefore a₀ = 0 or b₀ = 0. In the first case, A(X) = XP(X) for some P(X), and then A(X) is in I; in the second case, B(X) = XQ(X) for some Q(X), and then B(X) is in I. We conclude that I is prime.

Proposition 8.7. An ideal I in the commutative ring R is prime if and only if R/I is an integral domain.

PROOF. If a proper ideal I fails to be prime, choose ab in I with a ∉ I and b ∉ I. Then a + I and b + I are nonzero in R/I and have product 0 + I. So R/I is nonzero and has a zero divisor; by definition, R/I fails to be an integral domain. Conversely if R/I (is nonzero and) has a zero divisor, choose a + I and b + I nonzero with product 0 + I. Then neither a nor b is in I but ab is in I. Since I is certainly proper, I is not prime.

A proper ideal I in the commutative ring R is said to be maximal if R has no proper ideal J with I ⊊ J. If the commutative ring R has an identity, a simple way of testing whether an ideal I is proper is to check whether 1 is in I; in fact, if 1 is in I, then I ⊇ RI ⊇ R1 = R implies I = R.
Maximal ideals exist in abundance when R is nonzero and has an identity, as a consequence of the following result. Proposition 8.8. In a commutative ring R with identity, any proper ideal is contained in a maximal ideal. PROOF. This follows from Zorn’s Lemma (Section A5 of the appendix). Speciﬁcally let I be the given proper ideal, and form the set S of all proper ideals that contain I . This set is nonempty, containing I as a member, and we order it by inclusion upward. If we have a chain in S, then the union of the members of the chain is an ideal that contains all the ideals in the chain, and it is

proper since it does not contain 1. Therefore the union of the ideals in the chain is an upper bound for the chain. By Zorn’s Lemma the set S has a maximal element, and any such maximal element is a maximal ideal containing I.

Lemma 8.9. If R is a nonzero commutative ring with identity, then R is a field if and only if the only proper ideal in R is 0.

PROOF. If R is a field and I is a nonzero ideal in R, let a ≠ 0 be in I. Then 1 = aa⁻¹ is in I, and consequently I = R. Conversely if the only ideals in R are 0 and R, let a ≠ 0 be given in R, and form the ideal I = aR. Since 1 is in R, a is in I. Thus I ≠ 0. Then I must be R. So there exists some b in R with 1 = ba, and a is exhibited as having the inverse b.

Proposition 8.10. If R is a commutative ring with identity, then an ideal I is maximal if and only if R/I is a field.

REMARK. One can readily give a direct proof, but it seems instructive to give a proof reducing the result to Lemma 8.9.

PROOF. We consider R and R/I as unital R modules, the ideals for each of R and R/I being the R submodules. The quotient ring homomorphism R → R/I is an R homomorphism. By the First Isomorphism Theorem for modules (Theorem 8.3), there is a one-one correspondence between the ideals in R containing I and the ideals in R/I. Then the result follows immediately from Lemma 8.9.

Corollary 8.11. If R is a commutative ring with identity, then every maximal ideal is prime.

PROOF. If I is maximal, then R/I is a field by Proposition 8.10. Hence R/I is an integral domain, and I must be prime by Proposition 8.7.

In the converse direction nonzero prime ideals need not be maximal, as the following example shows. However, Proposition 8.12 will show that nonzero prime ideals are necessarily maximal in certain important rings.

EXAMPLE. In R = Z[X], we have seen that I = XZ[X] is a prime ideal. But I is not maximal since XZ[X] + 2Z[X] is a proper ideal that strictly contains I.

Proposition 8.12.
In R = Z or R = K[X ] with K a ﬁeld, every nonzero prime ideal is maximal. PROOF. Examples 1 and 2 at the beginning of this section show that every nonzero prime ideal is of the form I = p R with p prime. If such an I is given and if J is any ideal strictly containing I , choose a in J with a not in I . Since a

is not in I = pR, it is not true that p divides a. So p and a are relatively prime, and there exist elements x and y in R with xp + ya = 1, by Proposition 1.2c or 1.15d. Since p and a are in J, so is 1. Therefore J = R, and I is not strictly contained in any proper ideal. So I is maximal.

EXAMPLE. Algebraic number fields Q[θ]. These were introduced briefly in Chapter IV and again in Section 1 as the Q linear span of all powers 1, θ, θ², .... Here θ is a nonzero complex number, and we make the assumption that Q[θ] is a finite-dimensional vector space over Q. Proposition 4.1 showed that Q[θ] is then indeed a field. Let us see how this conclusion relates to the results of the present section.

In fact, write a nontrivial linear dependence of 1, θ, θ², ... over Q in the form
\[ c_0 + c_1\theta + c_2\theta^2 + \cdots + c_{n-1}\theta^{n-1} + \theta^n = 0. \]
Without loss of generality, suppose that this particular linear dependence has n as small as possible among all such relations. Then θ is a root of
\[ P(X) = c_0 + c_1 X + c_2 X^2 + \cdots + c_{n-1} X^{n-1} + X^n. \]
Consider the substitution homomorphism E : Q[X] → C given by E(A(X)) = A(θ). This ring homomorphism carries Q[X] onto the ring Q[θ], and the kernel is some ideal I. Specifically I consists of all polynomials A(X) with A(θ) = 0, and P(X) is one of these of lowest possible degree. Proposition 5.8 shows that I consists of all multiples of some polynomial, and that polynomial may be taken to be P(X) by minimality of the integer n. Proposition 8.1 therefore shows that Q[θ] ≅ Q[X]/P(X)Q[X] as a ring. If P(X) were to have a nontrivial factorization as P(X) = Q₁(X)Q₂(X), then P(θ) = 0 would imply Q₁(θ) = 0 or Q₂(θ) = 0, and we would obtain a contradiction to the minimality of n. Therefore P(X) is prime. By Example 2 earlier in the section, I = P(X)Q[X] is a nonzero prime ideal, and Proposition 8.12 shows that it is maximal. By Proposition 8.10 the quotient ring Q[θ] ≅ Q[X]/P(X)Q[X] is a field.
These computations with Q[θ] underlie the ﬁrst part of the theory of ﬁelds that we shall develop in Chapter IX.
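For R = Z and I = mZ with m ≥ 2, Propositions 8.7, 8.10, and 8.12 together with Example 1 predict that Z/mZ is an integral domain exactly when it is a field, and that both happen exactly when m is prime. The following Python sketch checks this by brute force for small m; the function names are hypothetical, and residues are represented by the integers 0, ..., m − 1.

```python
def has_zero_divisors(m):
    """True if two nonzero residues mod m have product 0 mod m."""
    return any((a * b) % m == 0 for a in range(1, m) for b in range(1, m))


def is_field(m):
    """True if every nonzero residue mod m has a multiplicative inverse."""
    return all(any((a * b) % m == 1 for b in range(1, m)) for a in range(1, m))


def is_prime(m):
    """Trial division test that m is a positive prime."""
    return m >= 2 and all(m % d != 0 for d in range(2, int(m ** 0.5) + 1))


def summary(m):
    """Return (mZ is prime ideal, mZ is maximal ideal, m is prime) via the quotient Z/mZ."""
    return (not has_zero_divisors(m), is_field(m), is_prime(m))
```

Calling `summary(m)` for m from 2 up to some bound returns three equal truth values in each case, in line with the propositions above.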

4. Unique Factorization

We have seen that the positive members of Z and the nonzero members of K[X], when K is a field, factor into the products of “primes” and that these factorizations are unique up to order and up to adjusting each of the prime factors in K[X] by a unit. In this section we shall investigate this idea of unique factorization more generally. Zero divisors are problematic from the point of view of factorization, and it will be convenient to exclude them. Therefore we work exclusively with integral domains.

The first observation is that unique factorization is not a completely general notion for integral domains. Let us consider an example in detail.

EXAMPLE. R = Z[√−5]. This is the subring of C whose members are of the form a + b√−5 with a and b integers. Since (a + b√−5)(c + d√−5) = (ac − 5bd) + (ad + bc)√−5, R is closed under multiplication and is indeed a subring. Define N(a + b√−5) = a² + 5b² = (a + b√−5)(a − b√−5). This is a nonnegative-integer-valued function on R and is 0 only on the 0 element of R. Since complex conjugation is an automorphism of C, we check immediately that
\[ N\big((a + b\sqrt{-5})(c + d\sqrt{-5})\big) = N(a + b\sqrt{-5})\,N(c + d\sqrt{-5}). \]
The group of units of R, i.e., of elements with inverses under multiplication, is denoted by R^× as usual. If r is in R^×, then rr⁻¹ = 1, and so N(r)N(r⁻¹) = N(1) = 1. Consequently the units r of R all have N(r) = 1. Setting a² + 5b² = 1, we see that the units are ±1. The product formula for N shows that if we start factoring a member of R, then factor its factors, and so on, and if we forbid factorizations into two factors when one is a unit, then the process of factorization has to stop at some point. So complete factorization makes sense.

Now consider the equality
\[ 6 = (1 + \sqrt{-5})(1 - \sqrt{-5}) = 2 \cdot 3. \]
The factors here have N(1 + √−5) = N(1 − √−5) = 6, N(2) = 4, and N(3) = 9. Considering the possible values of a² + 5b², we see that N( · ) does not take on either of the values 2 and 3 on R. Consequently 1 + √−5, 1 − √−5, 2, and 3 do not have nontrivial factorizations. On the other hand, consideration of the values of N( · ) shows that 2 and 3 are not products of either of 1 ± √−5 with units. We conclude that the displayed factorizations of 6 show that unique factorization has failed.

Thus unique factorization is not universal for integral domains. It is time to be careful about terminology.
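The arithmetic in the example of Z[√−5] can be replayed mechanically. In the Python sketch below, an element a + b√−5 is stored as the pair (a, b); the representation and the function names are assumptions made for this illustration. The irreducibility test is only a sufficient condition based on the values of N, which is all that the argument above uses.

```python
def mul(x, y):
    """Product of a + b*sqrt(-5) and c + d*sqrt(-5), stored as pairs (a, b), (c, d)."""
    a, b = x
    c, d = y
    return (a * c - 5 * b * d, a * d + b * c)


def norm(x):
    """N(a + b*sqrt(-5)) = a^2 + 5 b^2."""
    a, b = x
    return a * a + 5 * b * b


def norm_values(bound):
    """All values of N that are <= bound; |a| and |b| cannot exceed sqrt(bound)."""
    k = int(bound ** 0.5) + 1
    return {
        norm((a, b))
        for a in range(-k, k + 1)
        for b in range(-k, k + 1)
        if norm((a, b)) <= bound
    }


def irreducible_by_norm(x):
    """Sufficient test: N(x) > 1 and no divisor d of N(x) with 1 < d < N(x) is a value of N."""
    n = norm(x)
    values = norm_values(n)
    return n > 1 and not any(n % d == 0 and d in values for d in range(2, n))
```

Here `mul((1, 1), (1, -1))` and `mul((2, 0), (3, 0))` both give the pair for 6, while `irreducible_by_norm` accepts each of the four factors.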
With Z and K[X ], we have referred to the individual factors in a complete factorization as “primes.” Their deﬁning property in Chapter I was that they could not be factored further in nontrivial fashion. Primes in these rings were shown to have the additional property that if a prime divides a product then it divides one of the factors. It is customary to separate these two properties for general integral domains. Let us say that a nonzero element a divides b if b = ac for some c. In this case we say also that a is a factor of b. In an integral domain R, a nonzero element r that is not a unit is said to be irreducible if every factorization r = r1r2 in R has the property that either r1 or r2 is a unit. Nonzero nonunits that are not irreducible are said

to be reducible. A nonzero element p that is not a unit is said to be prime⁶ if the condition that p divides a product ab always implies that p divides a or p divides b.

Prime implies irreducible. In fact, if p is a prime that is reducible, let us write p = r₁r₂ with neither r₁ nor r₂ equal to a unit. Since p is prime, p divides r₁ or r₂, say r₁. Then r₁ = pc with c in R, and we obtain p = r₁r₂ = pcr₂. Since R is an integral domain, 1 = cr₂, and r₂ is exhibited as a unit with inverse c, in contradiction to the assumption that r₂ is not a unit.

On the other hand, irreducible does not imply prime. In fact, we saw in Z[√−5] that 1 + √−5 is irreducible. But 1 + √−5 divides 2 · 3, and 1 + √−5 does not divide either of 2 or 3. Therefore 1 + √−5 is not prime.

We shall see in a moment that the distinction between “irreducible” and “prime” lies at the heart of the question of unique factorization. Let us make a definition that helps identify our problem precisely. We say that an integral domain R is a unique factorization domain if R has the two properties
(UFD1) every nonzero nonunit of R is a finite product of irreducible elements,
(UFD2) the factorization in (UFD1) is always unique up to order and to multiplication of the factors by units.

The problem that arises for us for a given R is to decide whether R is a unique factorization domain. The following proposition shows the relevance of the distinction between “irreducible” and “prime.”

Proposition 8.13. In an integral domain R in which (UFD1) holds, the condition (UFD2) is equivalent to the condition
(UFD2′) every irreducible element is prime.

REMARKS. In fact, showing that irreducible implies prime was the main step in Chapter I in proving unique factorization for positive integers and for K[X] when K is a field. The mechanism for carrying out the proof that irreducible implies prime for those settings will be abstracted in Theorems 8.15 and 8.17.

PROOF.
Suppose that (UFD2) holds, that p is an irreducible element, and that p divides ab. We are to show that p divides a or p divides b. We may assume that ab ≠ 0. Write ab = pc, and let $a = \prod_i p_i$, $b = \prod_j p'_j$, and $c = \prod_k q_k$ be factorizations via (UFD1) into products of irreducible elements.

⁶ This definition enlarges the definition of “prime” in Z to include the negatives of the usual prime numbers. Unique factorization immediately extends to nonzero integers of either sign, but the prime factors are now determined only up to factors of ±1. In cases where confusion about the sign of an integer prime might arise, the text will henceforth refer to “primes of Z” or “integer primes” when both signs are allowed, and to “positive primes” or “prime numbers” when the primes are understood to be as in Chapter I.

Then $\prod_i p_i \prod_j p'_j = p \prod_k q_k$. By (UFD2) one of the factors on the left side is εp for some unit ε. Then p either is of the form ε⁻¹pᵢ and then p divides a, or is of the form ε⁻¹p′ⱼ and then p divides b. Hence (UFD2′) holds.

Conversely suppose that (UFD2′) holds. Let the nonzero nonunit r have two factorizations into irreducible elements as r = p₁p₂···pₘ = ε₀q₁q₂···qₙ with m ≤ n and with ε₀ a unit. We prove the uniqueness by induction on m, the case m = 0 being trivial and the case m = 1 following from the definition of “irreducible.” Inductively from (UFD2′) we know that pₘ divides qₖ for some k. Since qₖ is irreducible, qₖ = εpₘ for some unit ε. Thus we can cancel qₖ and obtain $p_1 p_2 \cdots p_{m-1} = \varepsilon_0 \varepsilon\, q_1 q_2 \cdots \widehat{q_k} \cdots q_n$, the hat indicating an omitted factor. By induction the factors on the two sides here are the same except for order and units. Thus the same conclusion is valid when comparing the two sides of the equality p₁p₂···pₘ = ε₀q₁q₂···qₙ. The induction is complete, and (UFD2) follows.

It will be convenient to simplify our notation for ideals. In any commutative ring R with identity, if a is in R, we let (a) denote the ideal Ra generated by a. An ideal of this kind with a single generator is called a principal ideal. More generally, if a₁, ..., aₙ are members of R, then (a₁, ..., aₙ) denotes the ideal Ra₁ + ··· + Raₙ generated by a₁, ..., aₙ. For example, in Z[X], (2, X) denotes the ideal 2Z[X] + XZ[X] of all polynomials whose constant term is even.

The following condition explains a bit the mystery of what it means for an element to be prime.

Proposition 8.14. A nonzero element p in an integral domain R is prime if and only if the ideal (p) in R is prime.

PROOF. Suppose that the element p is prime. Then the ideal (p) is not R; in fact, otherwise 1 would have to be of the form 1 = rp for some r ∈ R, r would be a multiplicative inverse of p, and p would be a unit.
Now suppose that a product ab is in the ideal (p). Then ab = pr for some r in R, and p divides ab. Since p is prime, p divides a or p divides b. Therefore the ideal (p) is prime.

Conversely suppose that (p) is a prime ideal with p ≠ 0. Since (p) ≠ R, p is not a unit. If p divides the product ab, then ab = pc for some c in R. Hence ab is in (p). Since (p) is assumed prime, either a is in (p) or b is in (p). In the first case, p divides a, and in the second case, p divides b. Thus the element p is prime.

An integral domain R is called a principal ideal domain if every ideal in R is principal. At the beginning of Section 3, we saw a reminder that Z is a principal ideal domain and that so is K[X] whenever K is a field. It turns out that unique factorization for these cases is a consequence of this fact.

Theorem 8.15. Every principal ideal domain is a unique factorization domain.

REMARKS. Let R be the given principal ideal domain. Proposition 8.13 shows that it is enough to show that (UFD1) and (UFD2′) hold in R.

PROOF OF (UFD1). Let a₁ be a nonzero nonunit of R. If a₁ is not irreducible, then a₁ has a factorization a₁ = a₂b₂ in which neither a₂ nor b₂ is a unit. If a₂ is not irreducible, then a₂ has a factorization a₂ = a₃b₃ in which neither a₃ nor b₃ is a unit. We continue in this way as long as it is possible to do so. Let us see that this process cannot continue indefinitely. Assume the contrary. The equality a₁ = a₂b₂ with b₂ not a unit says that a₁ is in the ideal (a₂) and a₂ is not in the ideal (a₁). Arguing in this way with a₂, a₃, and so on, we obtain
\[ (a_1) \subsetneq (a_2) \subsetneq (a_3) \subsetneq \cdots. \]
Let $I = \bigcup_{n=1}^{\infty} (a_n)$. Then I is an ideal. Since R is a principal ideal domain, I = (a) for some a. This element a must be in (aₖ) for some k, and then we have (aₖ) = (aₖ₊₁) = ··· = (a). This is a contradiction, and hence the process does not continue indefinitely.

Therefore some irreducible element c₁, namely the element aₖ in the above argument, divides a₁. Write a₁ = c₁a₂, and repeat the above argument with a₂. Iterating this construction, we obtain aₙ = cₙaₙ₊₁ for each n with cₙ irreducible. Thus a₁ = c₁c₂···cₙaₙ₊₁ with c₁, ..., cₙ irreducible. Let us see that this process cannot continue indefinitely. Assuming the contrary, we are led to the strict inclusions
\[ (a_1) \subsetneq (a_2) \subsetneq (a_3) \subsetneq \cdots. \]
Again we cannot have such an infinite chain of strict inclusions in a principal ideal domain, and we must have (aₙ) = (aₙ₊₁) at some stage. Then cₙ has to be a unit, contradiction. Thus aₙ has no nontrivial factorization, and a₁ = c₁···cₙ₋₁aₙ is the desired factorization. This proves (UFD1).

PROOF OF (UFD2′). If p is an irreducible element, we prove that the ideal (p) is maximal. Corollary 8.11 shows that (p) is prime, and Proposition 8.14 shows that p is prime.
Thus (UFD2 ) will follow. The element p, being irreducible, is not a unit. Thus ( p) is proper. Suppose that I is an ideal with I ( p). Since R is a principal ideal domain, I = (c) for some c. Then p = r c for some r in R. Since I = ( p), r cannot be a unit. Therefore the irreducibility of p implies that c is a unit. Then I = (c) = (1) = R, and we conclude that ( p) is maximal. Let us record what is essentially a corollary of the proof.


Corollary 8.16. In a principal ideal domain, every nonzero prime ideal is maximal.

PROOF. Let $(p)$ be a nonzero prime ideal. Proposition 8.14 shows that $p$ is prime, and prime elements are automatically irreducible. The proof of the uniqueness part of Theorem 8.15 then shows, in the context of a principal ideal domain, that $(p)$ is maximal.

Principal ideal domains arise comparatively infrequently, and recognizing them is not necessarily easy. The technique that was used with $\mathbb{Z}$ and $\mathbb{K}[X]$ generalizes slightly, and we take up that generalization now.

An integral domain $R$ is called a Euclidean domain if there exists a function $\delta : R \to \{\text{integers} \geq 0\}$ such that whenever $a$ and $b$ are in $R$ with $b \neq 0$, there exist $q$ and $r$ in $R$ with $a = bq + r$ and $\delta(r) < \delta(b)$. The ring $\mathbb{Z}$ of integers is a Euclidean domain if we take $\delta(n) = |n|$, and the ring $\mathbb{K}[X]$ for $\mathbb{K}$ a field is a Euclidean domain if we take $\delta(P(X))$ to be $2^{\deg P}$ if $P(X) \neq 0$ and to be $0$ if $P(X) = 0$.

Another example of a Euclidean domain is the ring $\mathbb{Z}[\sqrt{-1}] = \mathbb{Z} + \mathbb{Z}\sqrt{-1}$ of Gaussian integers. It has $\delta(a + b\sqrt{-1}) = (a + b\sqrt{-1})(a - b\sqrt{-1}) = a^2 + b^2$, $a$ and $b$ being integers. Let us abbreviate $\sqrt{-1}$ as $i$. To see that $\delta$ has the required property, we first extend $\delta$ to $\mathbb{Q}[i]$, writing $\delta(x + yi) = (x + yi)(x - yi) = x^2 + y^2$ if $x$ and $y$ are rational. We use the fact that

$$\delta(zz') = \delta(z)\delta(z') \qquad \text{for } z \text{ and } z' \text{ in } \mathbb{Q}[i],$$

which follows from the computation $\delta(zz') = zz' \cdot \overline{zz'} = z\bar{z}\,z'\bar{z}' = \delta(z)\delta(z')$. For any real number $u$, let $[u]$ be the greatest integer $\leq u$. Every real $u$ satisfies $\bigl|[u + \tfrac12] - u\bigr| \leq \tfrac12$. Given $a + bi$ and $c + di$ with $c + di \neq 0$, we write

$$\frac{a + bi}{c + di} = \frac{(a + bi)(c - di)}{c^2 + d^2} = \frac{ac + bd}{c^2 + d^2} + \frac{bc - ad}{c^2 + d^2}\, i.$$

Put $p = \Bigl[\frac{ac + bd}{c^2 + d^2} + \tfrac12\Bigr]$, $q = \Bigl[\frac{bc - ad}{c^2 + d^2} + \tfrac12\Bigr]$, and $r + si = (a + bi) - (c + di)(p + qi)$. Then $a + bi = (c + di)(p + qi) + (r + si)$, and

$$\delta(r + si) = \delta\bigl((a + bi) - (c + di)(p + qi)\bigr) = \delta(c + di)\,\delta\Bigl(\frac{a + bi}{c + di} - (p + qi)\Bigr).$$

The complex number $x + yi = \frac{a + bi}{c + di} - (p + qi) = \Bigl(\frac{ac + bd}{c^2 + d^2} - p\Bigr) + \Bigl(\frac{bc - ad}{c^2 + d^2} - q\Bigr) i$ has $|x| \leq \tfrac12$ and $|y| \leq \tfrac12$, and therefore $\delta(x + yi) = x^2 + y^2 \leq \tfrac14 + \tfrac14 = \tfrac12$. Hence $\delta(r + si) < \delta(c + di)$, as required.
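The division step for Gaussian integers can be sketched in code. The following is a minimal illustration, assuming Gaussian integers are represented as pairs of Python integers; the function names `gauss_divmod` and `delta` are hypothetical and not part of the text. It rounds each coordinate of the exact quotient to the nearest integer using only integer arithmetic, as in the choice of $p$ and $q$ above.

```python
def delta(x, y):
    """delta(x + yi) = x^2 + y^2 for integers x and y."""
    return x * x + y * y


def gauss_divmod(a, b, c, d):
    """Divide a + bi by c + di in Z[i].

    Returns ((p, q), (r, s)) with a + bi = (c + di)(p + qi) + (r + si)
    and delta(r + si) < delta(c + di).
    """
    n = delta(c, d)
    if n == 0:
        raise ZeroDivisionError("the divisor c + di must be nonzero")
    # The exact quotient is (ac + bd)/n + ((bc - ad)/n) i.
    # [u + 1/2] for u = num/n equals floor((2*num + n) / (2*n)).
    p = (2 * (a * c + b * d) + n) // (2 * n)
    q = (2 * (b * c - a * d) + n) // (2 * n)
    # Remainder: (a + bi) - (c + di)(p + qi).
    r = a - (c * p - d * q)
    s = b - (c * q + d * p)
    return (p, q), (r, s)


if __name__ == "__main__":
    (p, q), (r, s) = gauss_divmod(7, 3, 2, 1)
    print((p, q), (r, s))  # quotient 3, remainder 1: 7 + 3i = (2 + i)*3 + 1
```

Because only integer floor division is used, the sketch avoids floating-point rounding; the bound $\delta(r + si) \leq \tfrac12 \delta(c + di)$ from the argument above is what guarantees that a Euclidean algorithm built on this step terminates.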


Some further examples of this kind appear in Problems 13 and 25–26 at the end of the chapter. The matter is a little delicate. The ring $\mathbb{Z}[\sqrt{-5}]$ may seem superficially similar to $\mathbb{Z}[\sqrt{-1}]$. But $\mathbb{Z}[\sqrt{-5}]$ does not have unique factorization, and the following theorem, in combination with Theorem 8.15, assures us that $\mathbb{Z}[\sqrt{-5}]$ cannot be a Euclidean domain.

Theorem 8.17. Every Euclidean domain is a principal ideal domain.

PROOF. Let $I$ be an ideal in $R$. We are to show that $I$ is principal. Without loss of generality, we may assume that $I \neq 0$. Choose $b \neq 0$ in $I$ with $\delta(b)$ as small as possible. Certainly $I \supseteq (b)$. If $a \neq 0$ is in $I$, write $a = bq + r$ with $\delta(r) < \delta(b)$. Then $r = a - bq$ is in $I$ with $\delta(r) < \delta(b)$. The minimality of $\delta(b)$ forces $r = 0$ and $a = bq$. Thus $I \subseteq (b)$, and we conclude that $I = (b)$.
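When an ideal is given by finitely many generators, the proof of Theorem 8.17 suggests a procedure for finding a single generator: keep an element of least $\delta$ and replace the other elements by their remainders on division by it. The sketch below assumes $R = \mathbb{Z}$ with $\delta(n) = |n|$; the function name `ideal_generator` is hypothetical.

```python
def ideal_generator(gens):
    """Return a nonnegative generator of the ideal of Z generated by gens.

    Repeatedly picks a nonzero element b with |b| minimal and replaces every
    other element a by its remainder r = a - b*q, which lies in the same
    ideal and satisfies |r| < |b|.
    """
    elems = [g for g in gens if g != 0]
    if not elems:
        return 0  # the zero ideal
    while True:
        b = min(elems, key=abs)
        # Python's % gives a remainder r with |r| < |b| for any nonzero b.
        rems = [a % b for a in elems]
        rems = [r for r in rems if r != 0]
        if not rems:
            # Every element is a multiple of b, so the ideal is (b).
            return abs(b)
        # b together with the remainders generates the same ideal.
        elems = rems + [b]


if __name__ == "__main__":
    print(ideal_generator([12, 18, 30]))  # the ideal (12, 18, 30) equals (6)
```

The least absolute value among the elements strictly decreases on each pass that does not terminate, which is the same minimality idea that drives the proof.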

5. Gauss's Lemma

In the previous section we saw that every principal ideal domain has unique factorization. In the present section we shall establish that certain additional integral domains have unique factorization, namely any integral domain $R[X]$ for which $R$ is a unique factorization domain. A prototype is $\mathbb{Z}[X]$, which will be seen to have unique factorization even though there exist nonprincipal ideals like $(2, X)$ in the ring. An important example for applications, particularly in algebraic geometry, is $\mathbb{K}[X_1, \dots, X_n]$, where $\mathbb{K}$ is a field; in this case our result is to be applied inductively, making use of the isomorphism $\mathbb{K}[X_1, \dots, X_n] \cong \mathbb{K}[X_1, \dots, X_{n-1}][X_n]$ given in Corollary 4.31.

For the conclusion that $R[X]$ has unique factorization if $R$ does, the heart of the proof is application of a result known as Gauss's Lemma, which we shall prove in this section. Gauss's Lemma has additional consequences for $R[X]$ beyond unique factorization, and we give them as well.

Before coming to Gauss's Lemma, let us introduce some terminology and prove one preliminary result. In any integral domain $R$, we call two nonzero elements $a$ and $b$ associates if $a = b\varepsilon$ for some $\varepsilon$ in the group $R^\times$ of units. The property of being associates is an equivalence relation because $R^\times$ is a group.

Still with the nonzero integral domain $R$, let us define a greatest common divisor of two nonzero elements $a$ and $b$ to be any element $c$ of $R$ such that $c$ divides both $a$ and $b$ and such that any divisor of $a$ and $b$ divides $c$. Any associate of a greatest common divisor of $a$ and $b$ is another greatest common divisor of $a$ and $b$. Conversely if $a$ and $b$ have a greatest common divisor, then any two greatest common divisors are associates. In fact, if $c$ and $c'$ are greatest common divisors, then each of them divides both $a$ and $b$, and the definition forces each of them to divide the other. Thus $c' = c\varepsilon$ and $c = c'\varepsilon'$, and then $c' = c'\varepsilon'\varepsilon$ and $1 = \varepsilon'\varepsilon$. Consequently $\varepsilon$ is a unit, and $c$ and $c'$ are associates.

If $R$ is a unique factorization domain, then any two nonzero elements $a$ and $b$ have a greatest common divisor. In fact, we decompose $a$ and $b$ into the product of a unit by powers of nonassociate irreducible elements as $a = \varepsilon \prod_{i=1}^{m} p_i^{k_i}$ and $b = \varepsilon' \prod_{j=1}^{n} {p'_j}^{\,l_j}$. For each $p'_j$ such that $p'_j$ is associate to some $p_i$, we replace $p'_j$ by $p_i$ in the factorization of $b$, adjusting $\varepsilon'$ as necessary, and then we reorder the factors of $a$ and $b$ so that the common $p_i$'s are the ones for $1 \leq i \leq r$. Then $c = \prod_{i=1}^{r} p_i^{\min(k_i, l_i)}$ is a greatest common divisor of $a$ and $b$. We write $\mathrm{GCD}(a, b)$ for a greatest common divisor of $a$ and $b$; as we saw above, this is well defined up to a factor of a unit.⁷

One should not read too much into the notation. In a principal ideal domain if $a$ and $b$ are nonzero, then, as we shall see momentarily, $\mathrm{GCD}(a, b)$ is defined by the condition on ideals that $(\mathrm{GCD}(a, b)) = (a, b)$. This condition implies that there exist elements $x$ and $y$ in $R$ such that $xa + yb = \mathrm{GCD}(a, b)$. However, in the integral domain $\mathbb{Z}[X]$, in which $\mathrm{GCD}(2, X) = 1$, there do not exist polynomials $A(X)$ and $B(X)$ with $A(X)\,2 + B(X)\,X = 1$.

To prove that $(\mathrm{GCD}(a, b)) = (a, b)$ in a principal ideal domain, write $(c)$ for the principal ideal $(a, b)$; $c$ satisfies $c = xa + yb$ for some $x$ and $y$ in $R$. Since $a$ and $b$ lie in $(c)$, $a = rc$ and $b = r'c$. Hence $c$ divides both $a$ and $b$. In the reverse direction if $d$ divides $a$ and $b$, then $ds = a$ and $ds' = b$. Hence $c = xa + yb = (xs + ys')d$, and $d$ divides $c$. So $c$ is indeed a greatest common divisor of $a$ and $b$.

In a unique factorization domain the definition of greatest common divisor immediately extends to apply to $n$ nonzero elements, rather than just two. We readily check up to a unit that

$$\mathrm{GCD}(a_1, \dots, a_n) = \mathrm{GCD}\bigl(\mathrm{GCD}(a_1, \dots, a_{n-1}), a_n\bigr).$$

Moreover, we can allow any of $a_2, \dots, a_n$ to be $0$, and there is no difficulty.
In addition, we have

$$\mathrm{GCD}(da_1, \dots, da_n) = d\,\mathrm{GCD}(a_1, \dots, a_n) \qquad \text{up to a unit}$$

if $d$ and $a_1$ are not $0$.

Let $R$ be a unique factorization domain. If $A(X)$ is a nonzero element of $R[X]$, we say that $A(X)$ is primitive if the GCD of its coefficients is a unit. In this case no prime of $R$ divides all the coefficients of $A(X)$.

⁷ Greatest common divisors can exist for certain integral domains that fail to have unique factorization, but we shall not have occasion to work with any such domains.
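For $R = \mathbb{Z}$ the construction of a greatest common divisor from factorizations into irreducibles can be sketched as follows. This is an illustration only: the helper names `factorize` and `gcd_by_factorization` are hypothetical, and trial division is used purely for simplicity. The unit ambiguity is resolved by always returning the positive associate.

```python
from collections import Counter
from math import gcd


def factorize(n):
    """Prime factorization of a positive integer n as {prime: exponent}."""
    factors = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] += 1
            n //= p
        p += 1
    if n > 1:
        factors[n] += 1  # whatever remains is itself prime
    return factors


def gcd_by_factorization(a, b):
    """GCD of nonzero integers a, b as the product of common primes,
    each raised to the minimum of its two exponents."""
    fa, fb = factorize(abs(a)), factorize(abs(b))
    c = 1
    for p in fa.keys() & fb.keys():
        c *= p ** min(fa[p], fb[p])
    return c


if __name__ == "__main__":
    # 360 = 2^3 * 3^2 * 5 and 84 = 2^2 * 3 * 7, so the GCD is 2^2 * 3.
    print(gcd_by_factorization(360, 84), gcd(360, 84))
```

In $\mathbb{Z}$ the Euclidean algorithm (here `math.gcd`) gives the same answer far more efficiently; the factorization route is the one that carries over to a general unique factorization domain, where no division algorithm need exist.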


Theorem 8.18 (Gauss's Lemma). If $R$ is a unique factorization domain, then the product of primitive polynomials is primitive.

PROOF #1. Arguing by contradiction, let $A(X) = a_m X^m + \cdots + a_0$ and $B(X) = b_n X^n + \cdots + b_0$ be primitive polynomials such that every coefficient of $A(X)B(X)$ is divisible by some prime $p$. Since $A(X)$ and $B(X)$ are primitive, we may choose $k$ and $l$ as small as possible such that $p$ does not divide $a_k$ and does not divide $b_l$. The coefficient of $X^{k+l}$ in $A(X)B(X)$ is

$$a_0 b_{k+l} + a_1 b_{k+l-1} + \cdots + a_k b_l + \cdots + a_{k+l} b_0$$

and is divisible by $p$. Every individual term other than $a_k b_l$ is divisible by $p$, and so is the sum; we conclude that $p$ divides $a_k b_l$. Since $p$ is prime and $p$ divides $a_k b_l$, $p$ must divide $a_k$ or $b_l$, contradiction.

PROOF #2. Arguing by contradiction, let $A(X)$ and $B(X)$ be primitive polynomials such that every coefficient of $A(X)B(X)$ is divisible by some prime $p$. Proposition 8.14 shows that the ideal $(p)$ is prime, and Proposition 8.7 shows that $R' = R/(p)$ is an integral domain. Let $\varphi : R \to R'[X]$ be the composition of the quotient homomorphism $R \to R'$ and the inclusion of $R'$ into constant polynomials in $R'[X]$, and let $\Phi : R[X] \to R'[X]$ be the corresponding substitution homomorphism of Proposition 4.24 that carries $X$ to $X$. Since $A(X)$ and $B(X)$ are primitive, $\Phi(A(X))$ and $\Phi(B(X))$ are not zero. Their product $\Phi(A(X))\Phi(B(X)) = \Phi(A(X)B(X))$ is $0$ since $p$ divides every coefficient of $A(X)B(X)$, and this conclusion contradicts the assertion of Proposition 4.29 that $R'[X]$ is an integral domain.

Let $F$ be the field of fractions of the unique factorization domain $R$. The consequences of Theorem 8.18 exploit a simple relationship between $R[X]$ and $F[X]$, which we state below as Proposition 8.19. Once that proposition is in hand, we can state the consequences of Theorem 8.18.

If $A(X)$ is a nonzero polynomial in $R[X]$, let $c(A)$ be the greatest common divisor of the coefficients, i.e.,

$$c(A) = \mathrm{GCD}(a_n, \dots, a_1, a_0) \qquad \text{if } A(X) = a_n X^n + \cdots + a_1 X + a_0.$$

The element $c(A)$ is well defined up to a factor of a unit. In this notation the definition of "primitive" becomes: $A(X)$ is primitive if and only if $c(A)$ is a unit. If $A(X)$ is not necessarily primitive, then at least $c(A)$ divides each coefficient of $A(X)$, and hence $c(A)^{-1}A(X)$ is in $R[X]$, say with coefficients $b_n, \dots, b_1, b_0$. Then we have

$$c(A) = \mathrm{GCD}(a_n, \dots, a_1, a_0) = \mathrm{GCD}(c(A)b_n, \dots, c(A)b_1, c(A)b_0) = c(A)\,\mathrm{GCD}(b_n, \dots, b_1, b_0) = c(A)\,c\bigl(c(A)^{-1}A(X)\bigr)$$

up to a unit factor, and hence $c\bigl(c(A)^{-1}A(X)\bigr)$ is a unit. We conclude that

$$A(X) \in R[X] \text{ nonzero} \qquad \text{implies that} \qquad c(A)^{-1}A(X) \text{ is primitive.}$$
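For $R = \mathbb{Z}$ the element $c(A)$ and the primitive polynomial $c(A)^{-1}A(X)$ are easy to compute, and Gauss's Lemma can be checked on examples. The sketch below assumes a polynomial is stored as a list of integer coefficients with the constant term first; the names `content`, `primitive_part`, and `poly_mul` are hypothetical.

```python
from functools import reduce
from math import gcd


def content(coeffs):
    """c(A): the nonnegative GCD of the coefficients of a nonzero A."""
    return reduce(gcd, coeffs, 0)  # gcd(0, x) == |x|


def primitive_part(coeffs):
    """Coefficients of c(A)^{-1} A(X) for a nonzero polynomial A."""
    c = content(coeffs)
    return [a // c for a in coeffs]


def poly_mul(A, B):
    """Product of two polynomials given as coefficient lists."""
    prod = [0] * (len(A) + len(B) - 1)
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            prod[i + j] += a * b
    return prod


if __name__ == "__main__":
    A = [6, 10, 4]   # 4X^2 + 10X + 6, content 2
    B = [9, -3]      # -3X + 9, content 3
    print(content(A), primitive_part(A))
    # The product of the primitive parts is again primitive.
    print(content(poly_mul(primitive_part(A), primitive_part(B))))
    # Consequently c(AB) = c(A) c(B) up to a unit.
    print(content(poly_mul(A, B)), content(A) * content(B))
```

A finite check of this kind is of course no substitute for the proof; it only illustrates what the statement says in the simplest case.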

Proposition 8.19. Let $R$ be a unique factorization domain, and let $F$ be its field of fractions. If $A(X)$ is any nonzero polynomial in $F[X]$, then there exist $\alpha$ in $F$ and $A_0(X)$ in $R[X]$ such that $A(X) = \alpha A_0(X)$ with $A_0(X)$ primitive. The scalar $\alpha$ and the polynomial $A_0(X)$ are unique up to multiplication by units in $R$.

REMARK. We call $A_0(X)$ the associated primitive polynomial to $A(X)$. According to the proposition, it is unique up to a unit factor in $R$.

PROOF. Let $A(X) = c_n X^n + \cdots + c_1 X + c_0$ with each $c_k$ in $F$. We can write each $c_k$ as $a_k b_k^{-1}$ with $a_k$ and $b_k$ in $R$ and $b_k \neq 0$. We clear fractions. That is, we let $\beta = \prod_{k=0}^{n} b_k$. Then the $k$th coefficient of $\beta A(X)$ is $a_k \prod_{l \neq k} b_l$ and is in $R$. Hence $\beta A(X)$ is in $R[X]$. The observation just before the proposition shows that $c(\beta A)^{-1}\beta A$ is primitive. Thus $A(X) = \alpha A_0(X)$ with $\alpha = \beta^{-1}c(\beta A)$ and $A_0(X) = c(\beta A)^{-1}\beta A(X)$, $A_0(X)$ being primitive. This proves existence.

If $\alpha_1 A_1(X) = \alpha_2 A_2(X)$ with $\alpha_1$ and $\alpha_2$ in $F$ and with $A_1(X)$ and $A_2(X)$ primitive, choose $r \neq 0$ in $R$ such that $r\alpha_1$ and $r\alpha_2$ are in $R$. Up to unit factors in $R$, we then have $r\alpha_1 = r\alpha_1 c(A_1) = c(r\alpha_1 A_1) = c(r\alpha_2 A_2) = r\alpha_2 c(A_2) = r\alpha_2$. Hence, up to a unit factor in $R$, we have $\alpha_1 = \alpha_2$. This proves uniqueness.

Corollary 8.20. Let $R$ be a unique factorization domain, and let $F$ be its field of fractions.
(a) Let $A(X)$ and $B(X)$ be nonzero polynomials in $R[X]$, and suppose that $B(X)$ is primitive. If $B(X)$ divides $A(X)$ in $F[X]$, then it divides $A(X)$ in $R[X]$.
(b) If $A(X)$ is an irreducible polynomial in $R[X]$ of degree $> 0$, then $A(X)$ is irreducible in $F[X]$.
(c) If $A(X)$ is a monic polynomial in $R[X]$ and if $B(X)$ is a monic factor of $A(X)$ within $F[X]$, then $B(X)$ is in $R[X]$.
(d) If $A(X)$, $B(X)$, and $C(X)$ are in $R[X]$ with $A(X)$ primitive and with $A(X) = B(X)C(X)$, then $B(X)$ and $C(X)$ are primitive.

PROOF. In (a), write $A(X) = B(X)Q(X)$ in $F[X]$, and let $Q(X) = \rho Q_0(X)$ be a decomposition of $Q(X)$ as in Proposition 8.19. Since $c(A)^{-1}A(X)$ is primitive, the corresponding decomposition of $A(X)$ is $A(X) = c(A)\bigl(c(A)^{-1}A(X)\bigr)$. The equality $A(X) = \rho B(X)Q_0(X)$ then reads $c(A)\bigl(c(A)^{-1}A(X)\bigr) = \rho B(X)Q_0(X)$. Since $B(X)Q_0(X)$ is primitive according to Theorem 8.18, the uniqueness in Proposition 8.19 shows that $c(A)^{-1}A(X) = B(X)Q_0(X)$ except possibly for a unit factor in $R$. Then $B(X)$ divides $A(X)$ with quotient $c(A)Q_0(X)$, apart from a unit factor in $R$. Since $c(A)Q_0(X)$ is in $R[X]$, (a) is proved.
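The decomposition $A(X) = \alpha A_0(X)$ of Proposition 8.19, which the argument for (a) relies on, can be computed directly when $R = \mathbb{Z}$ and $F = \mathbb{Q}$. The following sketch mirrors the existence proof: it clears fractions with the product $\beta$ of the denominators and then divides out the GCD of the resulting integer coefficients. The function name `primitive_decomposition` and the coefficient-list representation (constant term first) are assumptions made for the illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def primitive_decomposition(coeffs):
    """Write a nonzero A(X) in Q[X] as alpha * A0(X) with A0 primitive in Z[X].

    coeffs is a list of Fractions, constant term first.
    Returns (alpha, A0) with alpha a Fraction and A0 a list of ints.
    """
    # beta is the product of the denominators, as in the proof.
    beta = 1
    for c in coeffs:
        beta *= c.denominator
    # beta * A(X) has integer coefficients.
    cleared = [(c * beta).numerator for c in coeffs]
    cont = reduce(gcd, cleared, 0)       # c(beta * A), taken positive
    alpha = Fraction(cont, beta)         # alpha = beta^{-1} c(beta * A)
    A0 = [a // cont for a in cleared]    # c(beta * A)^{-1} beta A(X)
    return alpha, A0


if __name__ == "__main__":
    A = [Fraction(1, 2), Fraction(3, 4), Fraction(5, 6)]
    alpha, A0 = primitive_decomposition(A)
    print(alpha, A0)  # (1/12) * (10X^2 + 9X + 6)
```

Since the units of $\mathbb{Z}$ are $\pm 1$, the only freedom left is the sign of $\alpha$; the sketch fixes it by taking the GCD to be positive.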


In (b), the condition that $\deg A(X) > 0$ implies that $A(X)$ is not a unit in $F[X]$. Arguing by contradiction, suppose that $A(X) = B(X)Q(X)$ in $F[X]$ with neither of $B(X)$ and $Q(X)$ of degree $0$. Let $B(X) = \beta B_0(X)$ be a decomposition of $B(X)$ as in Proposition 8.19. Then we have $A(X) = B_0(X)(\beta Q(X))$, and (a) shows that $\beta Q(X)$ is in $R[X]$, in contradiction to the assumed irreducibility of $A(X)$ in $R[X]$.

In (c), write $A(X) = B(X)Q(X)$, and let $B(X) = \beta B_0(X)$ be a decomposition of $B(X)$ as in Proposition 8.19. Then we have $A(X) = B_0(X)(\beta Q(X))$ with $\beta Q(X)$ in $F[X]$. Conclusion (a) shows that $\beta Q(X)$ is in $R[X]$. If $b \in R$ is the leading coefficient of $B_0(X)$ and if $q \in R$ is the leading coefficient of $\beta Q(X)$, then we have $1 = bq$, and consequently $b$ and $q$ are units in $R$. Since $B(X) = \beta B_0(X)$ and $B(X)$ is monic, $1 = \beta b$, and therefore $\beta = b^{-1}$ is a unit in $R$. Hence $B(X)$ is in $R[X]$.

In (d), we argue along the same lines as in (a). We may take $B(X) = c(B)\bigl(c(B)^{-1}B(X)\bigr)$ and $C(X) = c(C)\bigl(c(C)^{-1}C(X)\bigr)$ as decompositions of $B(X)$ and $C(X)$ according to Proposition 8.19. Then we have

$$A(X) = \bigl(c(B)c(C)\bigr)\bigl[c(B)^{-1}B(X)\,c(C)^{-1}C(X)\bigr].$$

Theorem 8.18 says that the factor in brackets is primitive, and the uniqueness in Proposition 8.19 shows that $1 = c(B)c(C)$, up to unit factors. Therefore $c(B)$ and $c(C)$ are units in $R$, and $B(X)$ and $C(X)$ are primitive.

Corollary 8.21. If $R$ is a unique factorization domain, then the ring $R[X]$ is a unique factorization domain.

REMARK. As was mentioned at the beginning of the section, $\mathbb{Z}[X]$ and $\mathbb{K}[X_1, \dots, X_n]$, when $\mathbb{K}$ is a field, are unique factorization domains as a consequence of this result.

PROOF. We begin with the proof of (UFD1). Suppose that $A(X)$ is a nonzero member of $R[X]$. We may take its decomposition according to Proposition 8.19 to be $A(X) = c(A)\bigl(c(A)^{-1}A(X)\bigr)$. Consider divisors of $c(A)^{-1}A(X)$ in $R[X]$. These are all primitive, according to (d). Hence those of degree $0$ are units in $R$. Thus any nontrivial factorization of $c(A)^{-1}A(X)$ is into two factors of strictly lower degree, both primitive. In a finite number of steps, this process of factorization with primitive factors has to stop. We can then factor $c(A)$ within $R$. Combining the factorizations of $c(A)$ and $c(A)^{-1}A(X)$, we obtain a factorization of $A(X)$.

For (UFD2′), let $P(X)$ be irreducible in $R[X]$. Since the factorization $P(X) = c(P)\bigl(c(P)^{-1}P(X)\bigr)$ has to be trivial, either $c(P)$ is a unit, in which case $P(X)$ is primitive, or $c(P)^{-1}P(X)$ is a unit, in which case $P(X)$ has degree $0$. In either case, suppose that $P(X)$ divides a product $A(X)B(X)$.

In the first case, $P(X)$ is primitive. Since $F[X]$ is a principal ideal domain, hence a unique factorization domain, either $P(X)$ divides $A(X)$ in $F[X]$ or $P(X)$ divides $B(X)$ in $F[X]$. By symmetry we may assume that $P(X)$ divides $A(X)$ in $F[X]$. Then (a) shows that $P(X)$ divides $A(X)$ in $R[X]$.

In the second case, $P(X) = P$ has degree $0$ and is prime in $R$. Write $A(X)B(X) = PQ(X)$ with $Q(X)$ in $R[X]$. Once more we argue along the same lines as in (a). We may take $A(X) = c(A)\bigl(c(A)^{-1}A(X)\bigr)$, $B(X) = c(B)\bigl(c(B)^{-1}B(X)\bigr)$, and $Q(X) = c(Q)\bigl(c(Q)^{-1}Q(X)\bigr)$ as the decompositions of $A(X)$, $B(X)$, and $Q(X)$ according to Proposition 8.19. Then we have

$$c(A)c(B)\bigl[c(A)^{-1}A(X)\,c(B)^{-1}B(X)\bigr] = Pc(Q)\bigl(c(Q)^{-1}Q(X)\bigr).$$

Theorem 8.18 shows that the product in brackets is primitive, and the uniqueness in Proposition 8.19 shows that we have $c(A)c(B) = Pc(Q)$ up to factors of units in $R$. Since $P$ is prime in $R$, $P$ divides $c(A)$ or $P$ divides $c(B)$. By symmetry we may assume that $P$ divides $c(A)$. Then $P$ divides $A(X)$ since $c(A)$ divides every coefficient of $A(X)$.

The final application, Eisenstein's irreducibility criterion, is proved somewhat in the style of Gauss's Lemma (Theorem 8.18). We shall give only the analog of Proof #1 of Gauss's Lemma, leaving the analog of Proof #2 to Problem 21 at the end of the chapter.

Corollary 8.22 (Eisenstein's irreducibility criterion). Let $R$ be a unique factorization domain, let $F$ be its field of fractions, and let $p$ be a prime in $R$. If $A(X) = a_N X^N + \cdots + a_1 X + a_0$ is a polynomial of degree $\geq 1$ in $R[X]$ such that $p$ divides $a_{N-1}, \dots, a_0$ but not $a_N$ and such that $p^2$ does not divide $a_0$, then $A(X)$ is irreducible in $F[X]$.

REMARK. The polynomial $A(X)$ will be irreducible in $R[X]$ also unless all its coefficients are divisible by some nonunit of $R$.

PROOF. Without loss of generality, we may replace $A(X)$ by $c(A)^{-1}A(X)$ and thereby assume that $A(X)$ is primitive. Corollary 8.20b shows that it is enough to prove irreducibility in $R[X]$. Assuming the contrary, suppose that $A(X)$ factors in $R[X]$ as $A(X) = B(X)C(X)$ with $B(X) = b_m X^m + \cdots + b_1 X + b_0$, $C(X) = c_n X^n + \cdots + c_1 X + c_0$, and neither of $B(X)$ and $C(X)$ equal to a unit. Corollary 8.20d shows that $B(X)$ and $C(X)$ are primitive. In particular, $B(X)$ and $C(X)$ have to be nonconstant polynomials. Define $a_k = 0$ for $k > N$, $b_k = 0$ for $k > m$, and $c_k = 0$ for $k > n$. Since $p$ divides $a_0 = b_0 c_0$ and $p$ is prime, $p$ divides either $b_0$ or $c_0$. Without loss of generality, suppose that $p$ divides $b_0$. Since $p^2$ does not divide $a_0$, $p$ does not divide $c_0$.

We show, by induction on $k$, that $p$ divides $b_k$ for every $k < N$. The case $k = 0$ is the base case of the induction. If $p$ divides $b_j$ for $j < k$, then we have

$$a_k = b_0 c_k + b_1 c_{k-1} + \cdots + b_{k-1} c_1 + b_k c_0.$$


Since $k < N$, the left side is divisible by $p$. The inductive hypothesis shows that $p$ divides every term on the right side except possibly the last. Consequently $p$ divides $b_k c_0$. Since $p$ does not divide $c_0$, $p$ divides $b_k$. This completes the induction.

Since $C(X)$ is nonconstant, the degree of $B(X)$ is $< N$, and therefore we have shown that every coefficient of $B(X)$ is divisible by $p$. Then $c(B)$ is divisible by $p$, in contradiction to the fact that $B(X)$ is primitive.

EXAMPLES.

(1) Cyclotomic polynomials in $\mathbb{Q}[X]$. Let us see for each prime number $p$ that the polynomial $\Phi(X) = X^{p-1} + X^{p-2} + \cdots + X + 1$ is irreducible in $\mathbb{Q}[X]$. We have $X^p - 1 = (X - 1)\Phi(X)$. Replacing $X - 1$ by $Y$ gives $(Y + 1)^p - 1 = Y\,\Phi(Y + 1)$. The left side, by the Binomial Theorem, is $\sum_{k=1}^{p} \binom{p}{k} Y^k$. Hence $\Phi(Y + 1) = \sum_{k=1}^{p} \binom{p}{k} Y^{k-1}$. The binomial coefficient $\binom{p}{k}$ is divisible by $p$ for $1 \leq k \leq p - 1$ since $p$ is prime, and therefore the polynomial $\Psi(Y) = \Phi(Y + 1)$ satisfies the condition of Corollary 8.22 for the ring $\mathbb{Z}$. Hence $\Psi(Y)$ is irreducible in $\mathbb{Q}[Y]$. A nontrivial factorization of $\Phi(X)$ would yield a nontrivial factorization of $\Psi(Y)$, and hence $\Phi(X)$ is irreducible in $\mathbb{Q}[X]$.

(2) Certain polynomials in $\mathbb{K}[X, Y]$ when $\mathbb{K}$ is a field. Since $\mathbb{K}[X, Y] \cong \mathbb{K}[X][Y]$, it follows that $\mathbb{K}[X, Y]$ is a unique factorization domain, and any member of $\mathbb{K}[X, Y]$ can be written as $A(X, Y) = a_n(X)Y^n + \cdots + a_1(X)Y + a_0(X)$. The polynomial $X$ is prime in $\mathbb{K}[X, Y]$, and Corollary 8.22 therefore says that $A(X, Y)$ is irreducible in $\mathbb{K}(X)[Y]$ if $X$ does not divide $a_n(X)$ in $\mathbb{K}[X]$, $X$ divides $a_{n-1}(X), \dots, a_0(X)$ in $\mathbb{K}[X]$, and $X^2$ does not divide $a_0(X)$ in $\mathbb{K}[X]$. The remark with the corollary points out that $A(X, Y)$ is irreducible in $\mathbb{K}[X, Y]$ if also there is no nonconstant polynomial in $\mathbb{K}[X]$ that divides every $a_k(X)$. For example, $Y^5 + XY^2 + XY + X$ is irreducible in $\mathbb{K}[X, Y]$.

6. Finitely Generated Modules

The Fundamental Theorem of Finitely Generated Abelian Groups (Theorem 4.56) says that every finitely generated abelian group is a direct sum of cyclic groups. If we think of abelian groups as $\mathbb{Z}$ modules, we can ask whether this theorem has some analog in the context of $R$ modules. The answer is yes: the theorem readily extends to the case that $\mathbb{Z}$ is replaced by an arbitrary principal ideal domain.

The surprising addendum to the answer is that we have already treated a second special case of the generalized theorem. That case arises when the principal ideal domain is $\mathbb{K}[X]$ for some field $\mathbb{K}$. If $V$ is a finite-dimensional vector space over $\mathbb{K}$ and $L : V \to V$ is a $\mathbb{K}$ linear map, then $V$ becomes a $\mathbb{K}[X]$ module under the definition $Xv = L(v)$. This module is finitely generated even without the $X$ present because $V$ is finite-dimensional, and the generalized theorem that we


prove in this section recovers the analysis of $L$ that we carried out in Chapter V. When $\mathbb{K}$ is algebraically closed, we obtain the Jordan canonical form; for general $\mathbb{K}$, we obtain a different canonical form involving cyclic subspaces that was worked out in Problems 32–40 at the end of Chapter V.

The definitions for the generalization of Theorem 4.56 are as follows. Let $R$ be a principal ideal domain. A subset $S$ of an $R$ module $M$ is called a set of generators of $M$ if $M$ is the smallest $R$ submodule of $M$ containing all the members of $S$. If $\{m_s \mid s \in S\}$ is a subset of $M$, then the set of all finite sums $\sum_{s \in S} r_s m_s$ is an $R$ submodule, but it need not contain the elements $m_s$ and therefore need not be the $R$ submodule generated by all the $m_s$. However, if $M$ is unital, then taking $r_{s_0} = 1$ and all other $r_s$ equal to $0$ exhibits $m_{s_0}$ as in the $R$ submodule of all finite sums $\sum_{s \in S} r_s m_s$. For this reason we shall insist that all the $R$ submodules in this section be unital. We say that the $R$ module $M$ is finitely generated if it has a finite set of generators. The main theorem gives the structure of unital finitely generated $R$ modules when $R$ is a principal ideal domain.

We need to take a small preliminary step that eliminates technical complications from the discussion, the same step that was carried out in Lemma 4.51 and Proposition 4.52 in the case of $\mathbb{Z}$ modules, i.e., abelian groups.

Lemma 8.23. Let $R$ be a commutative ring with identity, and let $\varphi : M \to N$ be a homomorphism of unital $R$ modules. If $\ker \varphi$ and $\operatorname{image} \varphi$ are finitely generated, then $M$ is finitely generated.

PROOF. Let $\{x_1, \dots, x_m\}$ and $\{y_1, \dots, y_n\}$ be respective finite sets of generators for $\ker \varphi$ and $\operatorname{image} \varphi$. For $1 \leq j \leq n$, choose $x'_j$ in $M$ with $\varphi(x'_j) = y_j$. We shall prove that $\{x_1, \dots, x_m, x'_1, \dots, x'_n\}$ is a set of generators for $M$. Thus let $x$ be in $M$. Since $\varphi(x)$ is in $\operatorname{image} \varphi$, there exist $r_1, \dots, r_n$ in $R$ with $\varphi(x) = r_1 y_1 + \cdots + r_n y_n$. The element $x' = r_1 x'_1 + \cdots + r_n x'_n$ of $M$ has $\varphi(x') = r_1 y_1 + \cdots + r_n y_n = \varphi(x)$. Therefore $\varphi(x - x') = 0$, and there exist $s_1, \dots, s_m$ in $R$ such that $x - x' = s_1 x_1 + \cdots + s_m x_m$. Consequently $x = s_1 x_1 + \cdots + s_m x_m + x' = s_1 x_1 + \cdots + s_m x_m + r_1 x'_1 + \cdots + r_n x'_n$.

Proposition 8.24. If $R$ is a principal ideal domain, then any $R$ submodule of a finitely generated unital $R$ module is finitely generated. Moreover, any $R$ submodule of a singly generated unital $R$ module is singly generated.

PROOF. Let $M$ be unital and finitely generated with a set $\{m_1, \dots, m_n\}$ of $n$ generators, and define $M_k = Rm_1 + \cdots + Rm_k$ for $1 \leq k \leq n$. Then $M_n = M$ since $M$ is unital. We shall prove by induction on $k$ that every $R$ submodule of $M_k$ is finitely generated. The case $k = n$ then gives the proposition. For $k = 1$, suppose that $S$ is an $R$ submodule of $M_1 = Rm_1$. Since $S$ is an $R$ submodule


and every member of $S$ lies in $Rm_1$, the subset $I$ of all $r$ in $R$ with $rm_1$ in $S$ is an ideal with $Im_1 = S$. Since every ideal in $R$ is singly generated, we can write $I = (r_0)$. Then $S = Im_1 = Rr_0 m_1$, and the single element $r_0 m_1$ generates $S$.

Assume inductively that every $R$ submodule of $M_k$ is known to be finitely generated, and let $N_{k+1}$ be an $R$ submodule of $M_{k+1}$. Let $q : M_{k+1} \to M_{k+1}/M_k$ be the quotient $R$ homomorphism, and let $\varphi$ be the restriction $q\big|_{N_{k+1}}$, mapping $N_{k+1}$ into $M_{k+1}/M_k$. Then $\ker \varphi = N_{k+1} \cap M_k$ is an $R$ submodule of $M_k$ and is finitely generated by the inductive hypothesis. Also, $\operatorname{image} \varphi$ is an $R$ submodule of $M_{k+1}/M_k$, which is singly generated with generator equal to the coset of $m_{k+1}$. Since an $R$ submodule of a singly generated unital $R$ module was shown in the previous paragraph to be singly generated, $\operatorname{image} \varphi$ is finitely generated. Applying Lemma 8.23 to $\varphi$, we see that $N_{k+1}$ is finitely generated. This completes the induction and the proof.

According to the definition in Example 9 of modules in Section 1, a free $R$ module is a direct sum, finite or infinite, of copies of the $R$ module $R$. A free $R$ module is said to have finite rank if some direct sum is a finite direct sum. A unital $R$ module $M$ is said to be cyclic if it is singly generated, i.e., if $M = Rm_0$ for some $m_0$ in $M$. In this case, we have an $R$ isomorphism $M \cong R/I$, where $I$ is the ideal $\{r \in R \mid rm_0 = 0\}$.

Before coming to the statement of the theorem and the proof, let us discuss the heart of the matter, which is related to row reduction of matrices. We regard the space $M_{1n}(R)$ of all 1-row matrices with $n$ entries in $R$ as a free $R$ module. Suppose that $R$ is a principal ideal domain, and suppose that we have a particular 2-by-$n$ matrix with entries in $R$ and with the property that the two rows have nonzero elements $a$ and $b$, respectively, in the first column. We can regard the set of $R$ linear combinations of the two rows of our particular matrix as an $R$ submodule of the free $R$ module $M_{1n}(R)$. Let $c = \mathrm{GCD}(a, b)$. This member of $R$ is defined only up to multiplication by a unit, but we make a definite choice of it. The idea is that we can do a kind of invertible row-reduction step that simultaneously replaces the two rows of our 2-by-$n$ matrix by a first row whose first entry is $c$ and a second row whose first entry is $0$; in the process the corresponding $R$ submodule of $M_{1n}(R)$ will be unchanged. In fact, we saw in the previous section that the hypothesis on $R$ implies that there exist members $x$ and $y$ of $R$ with $xa + yb = c$. Since $c$ divides $a$ and $b$, we can rewrite this equality as $x(ac^{-1}) + y(bc^{-1}) = 1$. Then the 2-by-2 matrix $M = \begin{pmatrix} x & y \\ -bc^{-1} & ac^{-1} \end{pmatrix}$ with entries in $R$ has the property that

$$\begin{pmatrix} x & y \\ -bc^{-1} & ac^{-1} \end{pmatrix} \begin{pmatrix} a & * \\ b & * \end{pmatrix} = \begin{pmatrix} c & * \\ 0 & * \end{pmatrix}.$$

This equation shows explicitly that the rows of $\begin{pmatrix} c & * \\ 0 & * \end{pmatrix}$ lie in the $R$ linear span of the rows of $\begin{pmatrix} a & * \\ b & * \end{pmatrix}$. The key fact about $M$ is that its determinant $x(ac^{-1}) + y(bc^{-1})$ is $1$ and that $M$ is therefore invertible with entries in $R$: the inverse is just $M^{-1} = \begin{pmatrix} ac^{-1} & -y \\ bc^{-1} & x \end{pmatrix}$. This invertibility shows that the rows of $\begin{pmatrix} a & * \\ b & * \end{pmatrix}$ lie in the $R$ linear span of the rows of $\begin{pmatrix} c & * \\ 0 & * \end{pmatrix}$. Consequently the $R$ linear span of the rows of our given 2-by-$n$ matrix is preserved under left multiplication by $M$.

In effect we can do the same kind of row reduction of matrices over $R$ as we did with matrices over $\mathbb{Z}$ in the proof of Theorem 4.56. The only difference is that this time we do not see constructively how to find the $x$ and $y$ that relate $a$, $b$, and $c$. Thus we would lack some information if we actually wanted to follow through and calculate a particular example. We were able to make calculations to imitate the proof of Theorem 4.56 because we were able to use the Euclidean algorithm to arrive at what $x$ and $y$ are. In the present context we would be able to make explicit calculations if $R$ were a Euclidean domain.

Theorem 8.25 (Fundamental Theorem of Finitely Generated Modules). If $R$ is a principal ideal domain, then
(a) the number of $R$ summands in a free $R$ module of finite rank is independent of the direct-sum decomposition,
(b) any $R$ submodule of a free $R$ module of finite rank $n$ is a free $R$ module of rank $\leq n$,
(c) any finitely generated unital $R$ module is the finite direct sum of cyclic modules.

REMARK. Because of (a), it is meaningful to speak of the rank of a free $R$ module of finite rank; it is the number of $R$ summands. By convention the $0$ module is a free $R$ module of rank $0$. Then the statement of (b) makes sense. Statement (c) will be amplified in Corollary 8.29 below.

PROOF. Let $M$ be a free $R$ module of the form $Rx_1 \oplus \cdots \oplus Rx_n$, and suppose that $y_1, \dots, y_m$ are elements of $M$ such that no nontrivial combination $r_1 y_1 + \cdots + r_m y_m$ is $0$. Define an $m$-by-$n$ matrix $C$ with entries in $R$ by $y_i = \sum_{j=1}^{n} C_{ij} x_j$ for $1 \leq i \leq m$.
If $F$ is the field of fractions of $R$, then we can regard $C$ as a matrix with entries in $F$. As such, the matrix has rank $\leq n$. If $m > n$, then the rows are linearly dependent, and we can find members $q_1, \dots, q_m$ of $F$, not all $0$, such that $\sum_{i=1}^{m} q_i C_{ij} = 0$ for $1 \leq j \leq n$. Clearing fractions, we obtain members $r_1, \dots, r_m$ of $R$, not all $0$, such that $\sum_{i=1}^{m} r_i C_{ij} = 0$ for $1 \leq j \leq n$. Then

$$\sum_{i=1}^{m} r_i y_i = \sum_{i=1}^{m} r_i \sum_{j=1}^{n} C_{ij} x_j = \sum_{j=1}^{n} \Bigl(\sum_{i=1}^{m} r_i C_{ij}\Bigr) x_j = \sum_{j=1}^{n} 0\,x_j = 0,$$

in contradiction to the assumed independence property of $y_1, \dots, y_m$. Therefore we must have $m \leq n$.


If we apply this conclusion to a set $x_1, \dots, x_n$ that exhibits $M$ as free and to another set, possibly infinite, that does the same thing, we find that the second set has $\leq n$ members. Reversing the roles of the two sets, we find that they both have $n$ members. This proves (a).

For (b) and (c), we shall reduce the result to a lemma saying that a certain kind of result can be achieved by row and column reduction of matrices with entries in $R$. Let $M$ be a free $R$ module of rank $n$, defined by a subset $x_1, \dots, x_n$ of $M$, and let $N$ be an $R$ submodule of $M$. Proposition 8.24 shows that $N$ is finitely generated. We let $y_1, \dots, y_m$ be generators, not necessarily with any independence property. Define an $m$-by-$n$ matrix $C$ with entries in $R$ by $y_i = \sum_{j=1}^{n} C_{ij} x_j$. We can recover $M$ as the set of $R$ linear combinations of $x_1, \dots, x_n$, and we can recover $N$ as the set of $R$ linear combinations of $y_1, \dots, y_m$.

If $B$ is an $n$-by-$n$ matrix with entries in $R$ and with determinant in the group $R^\times$ of units, then Corollary 5.5 shows that $B^{-1}$ exists and has entries in $R$. If we define $x'_i = \sum_{j=1}^{n} B_{ij} x_j$, then any $R$ linear combination of $x'_1, \dots, x'_n$ is an $R$ linear combination of $x_1, \dots, x_n$. Also, the computation

$$\sum_{i=1}^{n} (B^{-1})_{ki} x'_i = \sum_{i,j} (B^{-1})_{ki} B_{ij} x_j = \sum_{j} \delta_{kj} x_j = x_k$$

shows that any $R$ linear combination of $x_1, \dots, x_n$ is an $R$ linear combination of $x'_1, \dots, x'_n$. Thus we can recover the same $M$ and $N$ if we replace $C$ by $CB$. Arguing in the same way with $y_1, \dots, y_m$ and $y'_1, \dots, y'_m$, we see that we can recover the same $M$ and $N$ if we replace $CB$ by $ACB$, where $A$ is an $m$-by-$m$ matrix with entries in $R$ and with determinant in $R^\times$.

Lemma 8.26 below will say that we can find $A$ and $B$ such that the nonzero entries of $D = ACB$ are exactly the diagonal ones $D_{kk}$ for $1 \leq k \leq l$, where $l$ is a certain integer with $0 \leq l \leq \min(m, n)$. That is, the resulting equations expressing $y'_1, \dots, y'_m$ in terms of $x'_1, \dots, x'_n$ will be of the form

$$y'_k = \begin{cases} D_{kk} x'_k & \text{for } 1 \leq k \leq l, \\ 0 & \text{for } l + 1 \leq k \leq m. \end{cases} \tag{$*$}$$

Now let us turn to (b) and (c). For (b), the claim is that the elements $y'_k$ with $1 \leq k \leq l$ exhibit $N$ as a free $R$ module. We know that $y'_1, \dots, y'_m$ generate $N$ and hence that $y'_1, \dots, y'_l$ generate $N$. For the independence, suppose we can find members $r_1, \dots, r_l$ not all $0$ in $R$ such that $\sum_{k=1}^{l} r_k y'_k = 0$. Then substitution gives $\sum_{k=1}^{l} r_k D_{kk} x'_k = 0$, and the independence of $x'_1, \dots, x'_l$ forces $r_k D_{kk} = 0$ for $1 \leq k \leq l$. Since $R$ is an integral domain and $D_{kk} \neq 0$, $r_k = 0$ for such $k$, contradiction. Thus indeed the elements $y'_k$ with $1 \leq k \leq l$ exhibit $N$ as a free $R$ module. Since $l \leq \min(m, n)$, the rank of $N$ is at most the rank of $M$.

For (c), let $Q$ be a finitely generated unital $R$ module, say with $n$ generators. By the universal mapping property of free $R$ modules (Example 9 in Section 1),

6. Finitely Generated Modules


there exists a free $R$ module $M$ of rank $n$ with $Q$ as quotient. Let $x_1,\dots,x_n$ be generators of $M$ that exhibit $M$ as free, and let $N$ be the kernel of the quotient $R$ homomorphism $M\to Q$, so that $Q\cong M/N$. Then (b) shows that $N$ is a free $R$ module of rank $m\le n$. Let $y_1,\dots,y_m$ be generators of $N$ that exhibit $N$ as free, and define an $m$-by-$n$ matrix $C$ with entries in $R$ by $y_i=\sum_{j=1}^n C_{ij}x_j$ for $1\le i\le m$. The result is that we are reduced to the situation we have just considered, and we can obtain equations of the form $(*)$ relating their respective generators, namely $y_1,\dots,y_m$ for $N$ and $x_1,\dots,x_n$ for $M$. For $1\le k\le n$, define $M_k=Rx_k$ and
$$N_k=\begin{cases} Ry_k=RD_{kk}x_k & \text{for } 1\le k\le l,\\ 0 & \text{for } l+1\le k\le n,\end{cases}$$
so that $N\cong N_1\oplus\cdots\oplus N_n$. Then $M_k/N_k$ is $R$ isomorphic to the cyclic $R$ module $R/(D_{kk})$ if $1\le k\le l$, while $M_k/N_k=M_k$ is isomorphic to the cyclic $R$ module $R$ if $l+1\le k\le n$. Applying Proposition 8.5, we obtain
$$M/N\cong (M_1\oplus\cdots\oplus M_n)/(N_1\oplus\cdots\oplus N_n)\cong (M_1/N_1)\oplus\cdots\oplus(M_n/N_n).$$
Thus $M/N$ is exhibited as a direct sum of cyclic $R$ modules.

To complete the proof of Theorem 8.25, we are left with proving the following lemma, which is where row and column reduction take place.

Lemma 8.26. Let $R$ be a principal ideal domain. If $C$ is an $m$-by-$n$ matrix with entries in $R$, then there exist an $m$-by-$m$ matrix $A$ with entries in $R$ and with determinant in $R^\times$ and an $n$-by-$n$ matrix $B$ with entries in $R$ and with determinant in $R^\times$ such that for some $l$ with $0\le l\le \min(m,n)$, the nonzero entries of $D=ACB$ are exactly the diagonal entries $D_{11},D_{22},\dots,D_{ll}$.

PROOF. The matrices $A$ and $B$ will be constructed as products of matrices of determinant $\pm 1$, and then $\det A$ and $\det B$ equal $\pm 1$ by Proposition 5.1a. The matrix $A$ will correspond to row operations on $C$, and $B$ will correspond to column operations. Each factor will be the identity except in some 2-by-2 block. Among the row and column operations of interest are the interchange of two rows or two columns, in which the 2-by-2 block is $\begin{pmatrix} 0 & 1\\ 1 & 0\end{pmatrix}$. Another row operation of interest replaces two rows having respective $j$th entries $a$ and $b$ by $R$ linear combinations of them in which $a$ and $b$ are replaced by $c=\mathrm{GCD}(a,b)$ and 0. If $x(ac^{-1})+y(bc^{-1})=1$, then the 2-by-2 block is $\begin{pmatrix} x & y\\ -bc^{-1} & ac^{-1}\end{pmatrix}$. A similar operation is possible with columns. The reduction involves an induction that successively constructs the entries $D_{11},D_{22},\dots,D_{ll}$, stopping when the part of $C$ involving rows and columns


VIII. Commutative Rings and Their Modules

numbered $\ge l+1$ has been replaced by 0. We start by interchanging rows and columns to move a nonzero entry into position $(1,1)$. By a succession of row operations as in the previous paragraph, we can reduce the entry in position $(1,1)$ to the greatest common divisor of the entries of $C$ in the first column, while reducing the remaining entries of the first column to 0. Next we do the same thing with column operations, reducing the entry in position $(1,1)$ to the greatest common divisor of the members of the first row, while reducing the remaining entries of the first row to 0. Then we go back and repeat the process with row operations and with column operations as many times as necessary until all the entries of the first row and column other than the one in position $(1,1)$ are 0.

We need to check that this process indeed terminates at some point. If the entries that appear in position $(1,1)$ as the iterations proceed are $c_1,c_2,c_3,\dots$, then we have $(c_1)\subseteq (c_2)\subseteq (c_3)\subseteq\cdots$. The union of these ideals is an ideal, necessarily a principal ideal of the form $(c)$, and $c$ occurs in one of the ideals in the union; the chain of ideals must be constant after that stage. Once the corner entry becomes constant, the matrices $\begin{pmatrix} x & y\\ -bc^{-1} & ac^{-1}\end{pmatrix}$ for the row operations can be chosen to be of the form $\begin{pmatrix} 1 & 0\\ -ba^{-1} & 1\end{pmatrix}$, and the result is that the row operations do not change the entries of the first row. Similar remarks apply to the matrices for the column operations. The upshot is that we can reduce $C$ in this way so that all entries of the first row and column are 0 except the one in position $(1,1)$. This handles the inductive step, and we can proceed until at some $l$th stage we have only the 0 matrix to process. This completes the proof of Theorem 8.25.
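The reduction in the proof of Lemma 8.26 can be sketched in code for the special case $R=\mathbb{Z}$. This is only an illustrative sketch under stated assumptions: the function names (`diagonalize`, `matmul`, `det`, and the row and column helpers) are hypothetical, and because $\mathbb{Z}$ is Euclidean the code replaces the single 2-by-2 GCD block of the proof by repeated division steps and interchanges, each of which is a row or column operation of determinant $\pm 1$.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def det(X):
    # Cofactor expansion along the first row; adequate for small matrices.
    if not X:
        return 1
    return sum((-1) ** j * X[0][j] * det([r[:j] + r[j + 1:] for r in X[1:]])
               for j in range(len(X)))

def swap_rows(M, i, k):
    M[i], M[k] = M[k], M[i]

def add_row(M, i, k, q):
    # Replace row i by (row i) + q * (row k).
    M[i] = [a + q * b for a, b in zip(M[i], M[k])]

def swap_cols(M, j, k):
    for row in M:
        row[j], row[k] = row[k], row[j]

def add_col(M, j, k, q):
    # Replace column j by (column j) + q * (column k).
    for row in M:
        row[j] += q * row[k]

def diagonalize(C):
    """Return (A, D, B) with D = A*C*B diagonal and det A, det B equal to +1 or -1.

    Assumes C is a nonempty integer matrix given as a list of rows.
    """
    m, n = len(C), len(C[0])
    D = [list(row) for row in C]
    A, B = identity(m), identity(n)
    for t in range(min(m, n)):
        # Move some nonzero entry of the remaining block into position (t, t).
        pivot = next(((i, j) for i in range(t, m) for j in range(t, n) if D[i][j]), None)
        if pivot is None:
            break  # the remaining block is 0
        for M in (D, A):
            swap_rows(M, t, pivot[0])
        for M in (D, B):
            swap_cols(M, t, pivot[1])
        while True:
            # Row operations: clear column t below the corner entry.
            for i in range(t + 1, m):
                while D[i][t]:
                    q = D[i][t] // D[t][t]
                    for M in (D, A):
                        add_row(M, i, t, -q)
                    if D[i][t]:  # nonzero remainder becomes the new corner
                        for M in (D, A):
                            swap_rows(M, t, i)
            # Column operations: clear row t to the right of the corner entry.
            for j in range(t + 1, n):
                while D[t][j]:
                    q = D[t][j] // D[t][t]
                    for M in (D, B):
                        add_col(M, j, t, -q)
                    if D[t][j]:
                        for M in (D, B):
                            swap_cols(M, t, j)
            # Column swaps may have disturbed column t; repeat if so.
            if not any(D[i][t] for i in range(t + 1, m)):
                break
    return A, D, B
```

Each interchange strictly decreases the absolute value of the corner entry, which plays the role of the ascending chain $(c_1)\subseteq(c_2)\subseteq\cdots$ in the termination argument. A call such as `diagonalize([[2, 4, 4], [-6, 6, 12], [10, -4, -16]])` returns matrices for which `matmul(matmul(A, C), B) == D` can be checked directly.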
In Theorem 4.56, in which we considered the special case of abelian groups, we obtained a better conclusion than in Theorem 8.25c: we showed that the direct sum of cyclic groups could be written as the direct sum of copies of $\mathbb{Z}$ and of cyclic groups of prime-power order, and that in this case the decomposition was unique up to the order of the summands. We shall now obtain a corresponding better conclusion in the setting of Theorem 8.25. The existence of the decomposition into cyclic modules of a special kind uses a very general form of the Chinese Remainder Theorem, whose classical statement appears as Corollary 1.9. The generalization below makes use of the following operations of addition and multiplication of ideals in a commutative ring with identity: if $I$ and $J$ are ideals, then $I+J$ denotes the set of sums $x+y$ with $x\in I$ and $y\in J$, and $IJ$ denotes the set of all finite sums of products $xy$ with $x\in I$ and $y\in J$; the sets $I+J$ and $IJ$ are ideals.

Theorem 8.27 (Chinese Remainder Theorem). Let $R$ be a commutative ring with identity, and let $I_1,\dots,I_n$ be ideals in $R$ such that $I_i+I_j=R$ whenever $i\neq j$.


(a) If elements $x_1,\dots,x_n$ of $R$ are given, then there exists $x$ in $R$ such that $x\equiv x_j \bmod I_j$, i.e., $x-x_j$ is in $I_j$, for all $j$. The element $x$ is unique if $I_1\cap\cdots\cap I_n=0$.
(b) The map $\varphi:R\to\prod_{j=1}^n R/I_j$ given by $\varphi(r)=(\dots,r+I_j,\dots)$ is an onto ring homomorphism, its kernel is $\bigcap_{j=1}^n I_j$, and the homomorphism descends to a ring isomorphism
$$R\Big/\bigcap_{j=1}^n I_j\;\cong\; R/I_1\times\cdots\times R/I_n.$$
(c) The intersection $\bigcap_{j=1}^n I_j$ and the product $I_1\cdots I_n$ coincide.
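For the classical case $R=\mathbb{Z}$ and $I_j=(m_j)$ with pairwise relatively prime positive integers $m_j$, part (a) can be illustrated by the following sketch. It assumes integer inputs, and the names `ext_gcd` and `crt` are hypothetical. For each index $i$ it produces an element $y_i$ with $y_i\equiv 1 \bmod m_i$ and $y_i\equiv 0$ modulo the product of the other moduli, and then it forms $x=\sum_i x_iy_i$.

```python
def ext_gcd(a, b):
    """Return (g, s, t) with s*a + t*b == g, where g is a GCD of a and b."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def crt(residues, moduli):
    """Solve x = residues[i] mod moduli[i] for pairwise coprime positive moduli."""
    total = 1
    for m in moduli:
        total *= m
    x = 0
    for x_i, m_i in zip(residues, moduli):
        rest = total // m_i            # product of the other moduli
        g, s, t = ext_gcd(m_i, rest)   # s*m_i + t*rest == 1 when coprime
        if g != 1:
            raise ValueError("moduli must be pairwise relatively prime")
        y_i = t * rest                 # y_i = 1 mod m_i and y_i = 0 mod rest
        x += x_i * y_i
    return x % total                   # representative in 0 .. total-1
```

For example, `crt([2, 3, 2], [3, 5, 7])` returns `23`, the unique solution modulo $105$. Part (b) corresponds to the fact that $r\mapsto (r \bmod 3,\, r \bmod 5,\, r \bmod 7)$ is one-one on $\{0,\dots,104\}$.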

PROOF. For existence in (a) when $n=1$, we take $x=x_1$. For existence when $n=2$, the assumption $I_1+I_2=R$ implies that there exist $a_1\in I_1$ and $a_2\in I_2$ with $a_1+a_2=1$. Given $x_1$ and $x_2$, we put $x=x_1a_2+x_2a_1$, and then $x\equiv x_1a_2\equiv x_1 \bmod I_1$ and $x\equiv x_2a_1\equiv x_2 \bmod I_2$.

For general $n$, the assumption $I_1+I_j=R$ for $j\ge 2$ implies that there exist $a_j\in I_1$ and $b_j\in I_j$ with $a_j+b_j=1$. If we expand out the product $1=\prod_{j=2}^n (a_j+b_j)$, then all terms but one on the right side involve some $a_j$ and are therefore in $I_1$. That one term is $b_2b_3\cdots b_n$, and it is in $\bigcap_{j=2}^n I_j$. Thus $I_1+\bigcap_{j=2}^n I_j=R$. The case $n=2$, which was proved above, yields an element $y_1$ in $R$ such that
$$y_1\equiv 1 \bmod I_1 \qquad\text{and}\qquad y_1\equiv 0 \bmod \bigcap_{j\neq 1} I_j.$$
Repeating this process for index $i$ and using the assumption $I_i+I_j=R$ for $j\neq i$, we obtain an element $y_i$ in $R$ such that
$$y_i\equiv 1 \bmod I_i \qquad\text{and}\qquad y_i\equiv 0 \bmod \bigcap_{j\neq i} I_j.$$
If we put $x=x_1y_1+\cdots+x_ny_n$, then we have $x\equiv x_iy_i \bmod I_i\equiv x_i \bmod I_i$ for each $i$, and the proof of existence is complete.

For uniqueness in (a), if we have two elements $x$ and $x'$ satisfying the congruences, then their difference $x-x'$ lies in $I_j$ for every $j$, hence is 0 under the assumption that $I_1\cap\cdots\cap I_n=0$.

In (b), the map $\varphi$ is certainly a ring homomorphism. The existence result in (a) shows that $\varphi$ is onto, and the proof of the uniqueness result identifies the kernel. The isomorphism follows.

For (c), consider the special case that $I$ and $J$ are ideals with $I+J=R$. Certainly $IJ\subseteq I\cap J$. For the reverse inclusion, choose $x\in I$ and $y\in J$ with $x+y=1$; this is possible since $I+J=R$. If $z$ is in $I\cap J$, then $z=zx+zy$ with $zx$ in $JI$ and $zy$ in $IJ$. Thus $z$ is exhibited as in $IJ$. Consequently $I_1I_2=I_1\cap I_2$. Suppose inductively that $I_1\cdots I_k=I_1\cap\cdots\cap I_k$. We saw in the proof of (a) that $I_{k+1}+\bigcap_{j\neq k+1} I_j=R$, and thus we certainly have


$I_{k+1}+\bigcap_{j=1}^k I_j=R$. The special case in the previous paragraph, in combination with the inductive hypothesis, shows that $I_{k+1}I_1\cdots I_k=I_{k+1}\cdot\bigcap_{j=1}^k I_j=\bigcap_{j=1}^{k+1} I_j$. This completes the induction and the proof.

Corollary 8.28. Let $R$ be a principal ideal domain, and let $a=\varepsilon p_1^{k_1}\cdots p_n^{k_n}$ be a factorization of a nonzero nonunit element $a$ into the product of a unit and powers of nonassociate primes. Then there is a ring isomorphism
$$R/(a)\cong R/(p_1^{k_1})\times\cdots\times R/(p_n^{k_n}).$$

PROOF. Let $I_j=(p_j^{k_j})$ in Theorem 8.27. For $i\neq j$, we have $\mathrm{GCD}(p_i^{k_i},p_j^{k_j})=1$. Since $R$ is a principal ideal domain, there exist $u$ and $v$ in $R$ with $up_i^{k_i}+vp_j^{k_j}=1$, and consequently $(p_i^{k_i})+(p_j^{k_j})=R$. The theorem applies, part (c) identifies $\bigcap_{j=1}^n I_j$ with $I_1\cdots I_n=(a)$, and the corollary follows.

Corollary 8.29. If $R$ is a principal ideal domain, then any finitely generated unital $R$ module $M$ is the direct sum of a nonunique free $R$ submodule $\bigoplus_{i=1}^s R$ of a well-defined finite rank $s\ge 0$ and the $R$ submodule $T$ of all members $m$ of $M$ such that $rm=0$ for some $r\neq 0$ in $R$. In turn, the $R$ submodule $T$ is isomorphic to a direct sum
$$T\cong\bigoplus_{j=1}^n R/(p_j^{k_j}),$$
where the $p_j$ are primes in $R$ and the ideals $(p_j^{k_j})$ are not necessarily distinct. The number of summands $(p^k)$ for each class of associate primes $p$ and each positive integer $k$ is uniquely determined by $M$.

PROOF. Theorem 8.25c gives $M=F\oplus\bigoplus_{j=1}^n Ra_j$, where $F$ is a free $R$ submodule of some finite rank $s$ and the $a_j$'s are nonzero members of $M$ that are each annihilated by some nonzero member of $R$. The set $T$ of all $m$ with $rm=0$ for some $r\neq 0$ in $R$ is exactly $\bigoplus_{j=1}^n Ra_j$. Then $F$ is $R$ isomorphic to $M/T$, hence is isomorphic to the same free $R$ module independently of what direct-sum decomposition of $M$ is used. By Theorem 8.25a, $s$ is well defined. The cyclic $R$ module $Ra_j$ is isomorphic to $R/(b_j)$, where $(b_j)$ is the ideal of all elements $r$ in $R$ with $ra_j=0$. The ideal $(b_j)$ is nonzero by assumption and is not all of $R$ since the element $r=1$ has $1a_j=a_j\neq 0$. Applying Corollary 8.28 for each $j$ and adding the results, we obtain $T\cong\bigoplus_{i=1}^n R/(p_i^{k_i})$ for suitable primes $p_i$ and powers $k_i$. The isomorphism in Corollary 8.28 is given as a ring isomorphism, and we are reinterpreting it as an $R$ isomorphism. The primes $p_i$ that arise for fixed $(b_j)$ are distinct, but there may be repetitions in the pairs $(p_i,k_i)$ as $j$ varies. This proves existence of the decomposition.
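For $R=\mathbb{Z}$ the existence argument just given amounts to factoring each $b_j$ into prime powers. The following sketch, with the hypothetical function names `prime_power_factors` and `primary_decomposition` and with plain trial division assumed adequate for small inputs, lists the prime powers $p_i^{k_i}$ that occur for a torsion module $\mathbb{Z}/(b_1)\oplus\cdots\oplus\mathbb{Z}/(b_n)$.

```python
def prime_power_factors(b):
    """Return the prime powers exactly dividing the integer b > 1 (trial division)."""
    out = []
    p = 2
    while p * p <= b:
        if b % p == 0:
            q = 1
            while b % p == 0:
                b //= p
                q *= p
            out.append(q)
        p += 1
    if b > 1:          # whatever is left is a prime to the first power
        out.append(b)
    return out

def primary_decomposition(orders):
    """Sorted list of prime powers for Z/(b_1) + ... + Z/(b_n), each b_j > 1."""
    return sorted(q for b in orders for q in prime_power_factors(b))
```

For instance, `primary_decomposition([12, 18])` and `primary_decomposition([6, 36])` both give `[2, 3, 4, 9]`, consistent with $\mathbb{Z}/12\oplus\mathbb{Z}/18\cong\mathbb{Z}/6\oplus\mathbb{Z}/36$, whereas `[2, 2]` and `[4]` give different lists.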


If $p$ is a prime in $R$, then the elements $m$ of $T$ such that $p^km=0$ for some $k$ are the ones corresponding to the sum of the terms in $\bigoplus_{j=1}^n R/(p_j^{k_j})$ in which $p_j$ is an associate of $p$. Thus, to complete the proof, it is enough to show that the $R$ isomorphism class of the $R$ module $N=R/(p^{l_1})\oplus\cdots\oplus R/(p^{l_m})$ with $p$ fixed and with $0 < l_1\le\cdots\le l_m$ completely determines the integers $l_1,\dots,l_m$. For any unital $R$ module $L$, we can form the sequence of $R$ submodules $p^jL$. The element $p$ carries $p^jL$ into $p^{j+1}L$, and thus each $p^jL/p^{j+1}L$ is an $R$ module on which $p$ acts as 0. Consequently each $p^jL/p^{j+1}L$ is an $R/(p)$ module. Corollary 8.16 and Proposition 8.10 together show that $R/(p)$ is a field, and therefore we can regard each $p^jL/p^{j+1}L$ as an $R/(p)$ vector space. We shall show that the dimensions $\dim_{R/(p)}(p^jN/p^{j+1}N)$ of these vector spaces determine the integers $l_1,\dots,l_m$. We start from $p^jN=p^jR/(p^{l_1})\oplus\cdots\oplus p^jR/(p^{l_m})$. The term $p^jR/(p^{l_k})$ is 0 if $j\ge l_k$. Thus $p^jR/(p^{l_k})=p^jR/p^{l_k}R$. $p^jN=$ j

3. Since $[L:\mathbb{Q}]$ is a divisor of 6 greater than 3, $[L:\mathbb{Q}]=6$. Thus $[K:L]=1$, and $K=L$.

(2) $k=\mathbb{Q}$ and $F(X)=X^3-X-\tfrac13$. Application of Corollary 8.20c to the polynomial $G(X)=-3X^3F(1/X)=X^3+3X^2-3$ shows that $G(X)$ has no degree-one factor and hence is irreducible over $\mathbb{Q}$. Then it follows that $F(X)$ is irreducible over $\mathbb{Q}$. The proof of Theorem 9.12 takes $k_1=\mathbb{Q}(r)$, where $r^3-r-\tfrac13=0$. Then division gives
$$X^3-X-\tfrac13=(X-r)\bigl(X^2+rX+(r^2-1)\bigr).$$
The discriminant $b^2-4ac$ of the quadratic factor is
$$r^2-4(r^2-1)=4-3r^2=\frac{r^2}{(1+2r)^2},$$
the right-hand equality following from direct computation. This discriminant is a square in $k_1=\mathbb{Q}(r)$, and hence $X^2+rX+(r^2-1)$ factors into degree-one factors in $\mathbb{Q}(r)$ without passing to an extension field. Therefore $L=\mathbb{Q}(r)$ with $[L:\mathbb{Q}]=3$.

Theorem 9.13 (uniqueness of splitting field). If $F(X)$ is a nonconstant polynomial in $k[X]$, then any two splitting fields of $F(X)$ over $k$ are $k$ isomorphic.
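Returning for a moment to Example 2 above, the "direct computation" behind the discriminant identity can be checked mechanically. The sketch below is only an illustration: it assumes the representation of an element $a_0+a_1r+a_2r^2$ of $\mathbb{Q}(r)$ as a triple of rationals, and the helper name `mul` is hypothetical. It multiplies such triples and reduces by the relation $r^3=r+\tfrac13$, and then one can confirm that $(4-3r^2)(1+2r)^2=r^2$.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)

def mul(u, v):
    """Multiply a0 + a1*r + a2*r^2 by b0 + b1*r + b2*r^2 using r^3 = r + 1/3."""
    prod = [Fraction(0)] * 5
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            prod[i + j] += a * b
    # Reduce the r^4 term and then the r^3 term:
    # r^d = r^(d-3) * (r + 1/3) = r^(d-2) + (1/3) * r^(d-3).
    for d in (4, 3):
        c = prod[d]
        prod[d] = Fraction(0)
        prod[d - 2] += c
        prod[d - 3] += c * THIRD
    return tuple(prod[:3])

disc = (4, 0, -3)        # 4 - 3 r^2
w = (1, 2, 0)            # 1 + 2 r
r_squared = (0, 0, 1)    # r^2
identity_holds = mul(disc, mul(w, w)) == r_squared
```

Here `identity_holds` evaluates to `True`, which is the statement $4-3r^2=r^2/(1+2r)^2$ after clearing the denominator.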


IX. Fields and Galois Theory

The idea of the proof is simple enough, but carrying out the idea runs into a technical complication. The idea is to proceed by induction, using the uniqueness result for simple algebraic extensions (Theorem 9.11) repeatedly until all the roots have been addressed. The difficulty is that after one step the coefficients of the two quotient polynomials end up in two distinct but $k$ isomorphic fields. Thus at the second step Theorem 9.11 does not apply directly. What is needed is the reformulated version given below as Theorem 9.11$'$, which lends itself to this kind of induction. In addition, as soon as the induction involves at least three steps, the above statement of Theorem 9.13 does not lend itself to a direct inductive proof. For this reason we shall instead prove a reformulated version Theorem 9.13$'$ of Theorem 9.13 that is ostensibly more general than Theorem 9.13.

Recall from Proposition 4.24 that a general substitution homomorphism that starts from a polynomial ring can have two ingredients. One is the substitution of some element, such as $x$, for the indeterminate $X$, and the other is a homomorphism that is made to act on the coefficients. If the homomorphism is $\sigma$, let us write $F^\sigma(X)$ to indicate the polynomial obtained by applying $\sigma$ to each coefficient of $F(X)$.

Theorem 9.11$'$. Let $k$ and $k'$ be fields, and let $\sigma:k\to k'$ be a field isomorphism. Suppose that $F(X)$ is a monic prime polynomial in $k[X]$ and that $K=k(x)$ and $K'=k'(x')$ are simple algebraic extensions such that $F(x)=0$ and $F^\sigma(x')=0$. Then there exists a field isomorphism $\varphi:k(x)\to k'(x')$ such that $\varphi\big|_k=\sigma$ and $\varphi(x)=x'$.

PROOF. The argument is essentially unchanged from the proof of Theorem 9.11. We start from the substitution homomorphism $G(X)\mapsto G^\sigma(x')$ that replaces $X$ by $x'$ and that operates by $\sigma$ on the coefficients. This descends to a field map of $k[x]$ into $k'[x']$, and the homomorphism must be onto $k'[x']$ by a count of dimensions.

Theorem 9.13$'$. Let $k$ and $k'$ be fields, and let $\sigma:k\to k'$ be a field isomorphism. If $F(X)$ is a nonconstant polynomial in $k[X]$ and if $L$ and $L'$ are respective splitting fields for $F(X)$ over $k$ and for $F^\sigma(X)$ over $k'$, then there exists a field isomorphism $\varphi:L\to L'$ such that $\varphi\big|_k=\sigma$ and such that $\varphi$ sends the set of roots of $F(X)$ to the set of roots of $F^\sigma(X)$.

PROOF. We proceed by induction on $n=\deg F(X)$, the case $n=1$ being evident. Assume the result for degree $n-1$. Let $G(X)$ be a prime factor of $F(X)$ over $k$. Then $G^\sigma(X)$ is a prime factor of $F^\sigma(X)$ over $k'$. The polynomials $G(X)$ and $G^\sigma(X)$ have roots in $L$ and $L'$, respectively. Fix one such root for each, say $x_1$ and $x_1'$. By Theorem 9.11$'$, there exists a field isomorphism $\sigma_1:k(x_1)\to k'(x_1')$ extending $\sigma$ and satisfying $\sigma_1(x_1)=x_1'$. Write $F(X)=(X-x_1)H(X)$ with coefficients in $k(x_1)$, by the Factor Theorem (Corollary 1.13). Applying $\sigma_1$ to


the coefficients, we obtain $F^\sigma(X)=(X-x_1')H^{\sigma_1}(X)$ with coefficients in $k'(x_1')$. Then $L$ and $L'$ are splitting fields for $H(X)$ and $H^{\sigma_1}(X)$ over $k(x_1)$ and $k'(x_1')$, respectively. By induction we can extend $\sigma_1$ to an isomorphism $\varphi:L\to L'$, and the theorem readily follows.

3. Finite Fields

In this section we shall use the results on splitting fields in Section 2 to classify finite fields up to isomorphism. So far, the examples of finite fields that we have encountered are the prime fields $\mathbb{F}_p=\mathbb{Z}/p\mathbb{Z}$ with $p$ elements, $p$ being any prime number, and the field of 4 elements in Example 3 of fields in Section IV.4. Every finite field has to contain a subfield isomorphic to one of the prime fields $\mathbb{F}_p$, and Proposition 4.33 observed as a consequence that any finite field necessarily has $p^n$ elements for some prime number $p$ and some integer $n>0$.

Theorem 9.14. For each $p^n$ with $p$ a prime number and with $n$ a positive integer, there exists up to isomorphism one and only one field with $p^n$ elements. Such a field is a splitting field for $X^{p^n}-X$ over the prime field $\mathbb{F}_p$.

If $q=p^n$, it is customary to denote by $\mathbb{F}_q$ a field of order $q$. The theorem says that $\mathbb{F}_q$ exists and is unique up to isomorphism. Some authors refer to finite fields as Galois fields.

Some preparation is needed before we can come to the proof of the theorem. We need to carry over the simplest aspects of differential calculus to polynomials with coefficients in an arbitrary field $k$. First we give an informal definition of the derivative of a polynomial; then we give a more precise definition. For any polynomial $F(X)=\sum_{j=0}^n c_jX^j$ in $k[X]$, we informally define the derivative to be the polynomial
$$F'(X)=\sum_{j=1}^n jc_jX^{j-1}=\sum_{j=0}^{n-1}(j+1)c_{j+1}X^j.$$
The more precise definition uses the definition of members of $k[X]$ as infinite sequences of members of $k$ whose terms are 0 from some point on. In this notation if $F=(c_0,c_1,\dots,c_n,0,\dots)$ with $c_j$ in the $j$th position for $j\le n$ and with 0 in the $j$th position for $j>n$, then $F'=(c_1,2c_2,\dots,nc_n,0,\dots)$ with $(j+1)c_{j+1}$ in the $j$th position for $j\le n-1$ and with 0 in the $j$th position for $j>n-1$. In any event, the mapping $F\mapsto F'$ is $k$ linear from $k[X]$ to itself. The operation is called differentiation.


Proposition 9.15. Differentiation on $k[X]$ satisfies the product rule: $F=GH$ implies $F'=G'H+GH'$.

PROOF. Because of the $k$ linearity, it is enough to prove the result for monomials. Thus let $G(X)=X^m$ and $H(X)=X^n$, so that $F(X)=X^{m+n}$. Then $F'(X)=(m+n)X^{m+n-1}$, $G'(X)H(X)=mX^{m+n-1}$, and $G(X)H'(X)=nX^{m+n-1}$. Hence we indeed have $F'(X)=G'(X)H(X)+G(X)H'(X)$.
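The formal derivative and the product rule lend themselves to a quick computational check. The sketch below is an illustration under stated assumptions: polynomials are coefficient lists `[c_0, c_1, ...]` with integer entries, an optional prime `p` reduces coefficients modulo `p` so that the list represents a member of $\mathbb{F}_p[X]$, and the function names are hypothetical.

```python
def normalize(coeffs, p=None):
    """Reduce mod p if p is given, and drop trailing zero coefficients."""
    c = [x % p for x in coeffs] if p else list(coeffs)
    while c and c[-1] == 0:
        c.pop()
    return c

def derivative(coeffs, p=None):
    """Formal derivative: the coefficient of X^(j-1) is j * c_j."""
    return normalize([j * c for j, c in enumerate(coeffs)][1:], p)

def poly_add(f, g, p=None):
    n = max(len(f), len(g))
    return normalize([(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)
                      for i in range(n)], p)

def poly_mul(f, g, p=None):
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return normalize(out, p)

# Product rule for G = 1 + 2X + X^3 and H = 2 + X^2 over F_5.
G, H, p = [1, 2, 0, 1], [2, 0, 1], 5
lhs = derivative(poly_mul(G, H, p), p)
rhs = poly_add(poly_mul(derivative(G, p), H, p),
               poly_mul(G, derivative(H, p), p), p)
product_rule_holds = lhs == rhs

# Derivative of X^9 - X over F_3: the term 9 X^8 vanishes, leaving -1, i.e. [2].
d = derivative([0, -1] + [0] * 7 + [1], 3)
```

The second computation illustrates why, in characteristic $p$ with $q$ a power of $p$, the derivative of $X^q-X$ is the constant $-1$.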

Corollary 9.16. If $n$ is a positive integer, if $r$ is in $k$, and if $F(X)=(X-r)^n$ in $k[X]$, then $F'(X)=n(X-r)^{n-1}$.

PROOF. This is immediate by induction from Proposition 9.15 since the derivative of $X-r$ is 1.

Corollary 9.17. Let $r$ be in $k$, and let $F(X)$ be in $k[X]$. If $(X-r)^2$ divides $F(X)$, then $F(r)=F'(r)=0$.

PROOF. Write $F(X)=(X-r)^2G(X)$. If we substitute $r$ for $X$, we see that $F(r)=0$. If instead we differentiate, using Proposition 9.15 and Corollary 9.16, then we obtain $F'(X)=2(X-r)G(X)+(X-r)^2G'(X)$. Substituting $r$ for $X$, we obtain $F'(r)=0+0=0$.

Lemma 9.18. If $k$ is a field of characteristic $p\neq 0$, then the map $\varphi:k\to k$ given by $\varphi(x)=x^p$ is a field mapping.

REMARK. The map $x\mapsto x^p$ is often called the Frobenius map. If $k$ is a finite field, then it must carry $k$ onto $k$ since one-one implies onto for functions from a finite set to itself; in this case the map is an automorphism of $k$.

PROOF. The computation $\varphi(uv)=(uv)^p=u^pv^p=\varphi(u)\varphi(v)$ shows that $\varphi$ respects products. If $u$ and $v$ are in $k$, then
$$\varphi(u+v)=(u+v)^p=\varphi(u)+\sum_{j=1}^{p-1}\binom{p}{j}u^{p-j}v^j+\varphi(v)=\varphi(u)+\varphi(v),$$
the last equality holding since the binomial coefficient $\binom{p}{j}$ has a $p$ in the numerator and no factor of $p$ in the denominator for $1\le j\le p-1$. Thus $\varphi$ is a ring homomorphism. Since $\varphi(1)=1$, $\varphi$ is a field mapping.

PROOF OF UNIQUENESS IN THEOREM 9.14. Let $k$ be a finite field, say of characteristic $p$, and let $P$ be the prime field of order $p$ within $k$. We know that $P$ is isomorphic to $\mathbb{F}_p=\mathbb{Z}/p\mathbb{Z}$. Since $k$ is a finite-dimensional vector space over $P$, we know also that $k$ has order $q=p^n$ for some integer $n>0$. The multiplicative group $k^\times$ of $k$ thus has order $q-1$, and every $x\neq 0$ in $k$ therefore satisfies


$x^{q-1}=1$. Taking $x=0$ into account, we see that every member of $k$ satisfies $x^q=x$. Forming the polynomial $X^q-X$ in $P[X]$, we see that every member of $k$ is a root of this polynomial. Iterated application $q$ times of the Factor Theorem (Corollary 1.13) shows that $X^q-X$ factors into degree-one factors in $k$. Since every member of $k$ is a root of $X^q-X$, $k$ is a splitting field of $X^q-X$ over $P$. Then the uniqueness of the prime field up to isomorphism, in combination with the uniqueness of the splitting field of $X^q-X$ given in Theorem 9.13$'$, shows that $k$ is uniquely determined up to isomorphism.

PROOF OF EXISTENCE IN THEOREM 9.14. Let $q=p^n$ be given, and define $k$ to be a splitting field of $X^q-X$ over $\mathbb{F}_p=\mathbb{Z}/p\mathbb{Z}$. The field $k$ exists by Theorem 9.12, and it has characteristic $p$. Since $X^q-X$ is monic of degree $q$, the definition of splitting field says that we can write
$$X^q-X=(X-u_1)(X-u_2)\cdots(X-u_q)\qquad\text{with all } u_j\in k.$$
Because of Lemma 9.18, the map $\varphi(u)=u^q$, which is the $n$th power of the map $u\mapsto u^p$, is a field mapping of $k$ into itself. The members of $k$ fixed by $\varphi$ form a subfield of $k$, and these elements of $k$ are exactly the members of the set $S=\{u_1,\dots,u_q\}$. Therefore $S$ is a subfield of $k$, necessarily containing $\mathbb{F}_p=\mathbb{Z}/p\mathbb{Z}$. Since $X^q-X$ splits in $S$ and since the roots of $X^q-X$ generate $S$, $S$ is a splitting field of $X^q-X$ over $\mathbb{F}_p$. In other words, $S=k$.

To complete the proof, it is enough to show that the elements $u_1,\dots,u_q$ are distinct, and then $k$ will be a field of $q$ elements. The question is therefore whether some root of $X^q-X$ has multiplicity at least 2, i.e., whether $(X-r)^2$ divides $X^q-X$ for some $r$ in $k$. Corollary 9.17 gives a necessary condition for this divisibility, saying that the derivative of $X^q-X$ must have $r$ as a root. However, the derivative of $X^q-X$ is $qX^{q-1}-1=-1$, and the constant polynomial $-1$ has no roots. We conclude that $k$ has $q$ elements.

Corollary 9.19. If $q$ and $r$ are integers with $2\le q\le r$, then the finite field $\mathbb{F}_q$ is isomorphic to a subfield of the finite field $\mathbb{F}_r$ if and only if $r=q^n$ for some integer $n\ge 1$.

PROOF. If $\mathbb{F}_q$ is isomorphic to a subfield of $\mathbb{F}_r$, then we may consider $\mathbb{F}_r$ as a vector space over $\mathbb{F}_q$, say of dimension $n$. In this case, $\mathbb{F}_r$ has $q^n$ elements.

Conversely let $r=q^n$, and regard $\mathbb{F}_r$ as a splitting field of $X^{q^n}-X$ over the prime field $\mathbb{F}_p$, by Theorem 9.14. Let $S$ be the subset of $\mathbb{F}_r$ of all roots of $X^q-X$. Putting $a=q-1$ and $k=\frac{q^n-1}{q-1}=q^{n-1}+q^{n-2}+\cdots+1$, we have
$$X^{ka}-1=(X^a-1)(X^{(k-1)a}+X^{(k-2)a}+\cdots+1).$$
Multiplying by $X$, we see that $X^q-X$ is a factor of $X^{q^n}-X$. Since $X^{q^n}-X$ splits in $\mathbb{F}_r$ and has distinct roots, the same is true of $X^q-X$. Therefore $|S|=q$.


Let $q=p^m$. The $m$th power of the homomorphism of Lemma 9.18 on $k=\mathbb{F}_r$ is $x\mapsto x^q$, and the subset of $\mathbb{F}_r$ fixed by this homomorphism is a subfield. Thus $S$ is a subfield, and it has $q$ elements.

4. Algebraic Closure

Algebraically closed fields, those for which every nonconstant polynomial with coefficients in the field has a root in the field, were introduced in Section V.1, and it was mentioned at that time that every field is a subfield of some algebraically closed field. We shall prove that existence theorem in this section in a form lending itself to a uniqueness result. Throughout this section let $k$ be a field. We begin by giving further descriptions of algebraically closed fields that take the theory of Sections 1–2 into account.

Proposition 9.20. The following conditions on the field $k$ are equivalent:
(a) $k$ has no nontrivial algebraic extensions,
(b) every irreducible polynomial in $k[X]$ has degree 1,
(c) every polynomial in $k[X]$ of positive degree has at least one root in $k$,
(d) every polynomial in $k[X]$ of positive degree factors over $k$ into polynomials of degree 1.

PROOF. If (a) holds, then (b) holds since any irreducible polynomial of degree greater than 1 would give a nontrivial simple algebraic extension (Theorem 9.10). If (b) holds and a polynomial of positive degree is given, apply (b) to an irreducible factor to see that the given polynomial has a root; thus (c) holds. Condition (c) implies condition (d) by induction and the Factor Theorem. If (d) holds and if $K$ is an algebraic extension of $k$, let $x$ be in $K$, and let $F(X)$ be the minimal polynomial of $x$ over $k$. Then $F(X)$ is irreducible over $k$, and (d) says that $F(X)$ has degree 1. Hence $x$ is in $k$, and we conclude that $K=k$.

A field satisfying the equivalent conditions of Proposition 9.20 is said to be algebraically closed.

EXAMPLES OF ALGEBRAICALLY CLOSED FIELDS.
(1) The Fundamental Theorem of Algebra (Theorem 1.18) says that $\mathbb{C}$ is algebraically closed.
This theorem was not proved in Chapter I, but a proof will be given in this chapter in Section 10. (2) Let K be the subset of all members of C that are algebraic over Q. By Corollary 9.9, K is a subﬁeld of C. Example 1 shows that every polynomial in Q[X ] splits in K, and Lemma 9.21 below then allows us to conclude that K is algebraically closed.


(3) Fix a prime number $p$, and start with $k_0=\mathbb{F}_p$ as the prime field $\mathbb{Z}/p\mathbb{Z}$. Enumerate the members of $\mathbb{F}_p[X]$, letting $F_n(X)$ be the $n$th such polynomial. We construct $k_n$ by induction on $n$ so that $k_n$ is a splitting field for $F_n(X)$ over $k_{n-1}$ when $n\ge 1$. Then $k_0\subseteq k_1\subseteq k_2\subseteq\cdots$ is an increasing sequence of fields containing $\mathbb{F}_p$. Let $K$ be the union. Any two elements of $K$ lie in a single $k_n$, and it follows that $K$ is closed under the field operations. Any three elements lie in a single $k_n$, and it follows that any of the defining properties of a field is valid in $K$ because it is valid in $k_n$. Therefore $K$ is a field. This field is an extension of $\mathbb{F}_p$, and every polynomial in $\mathbb{F}_p[X]$ splits in $K$. As in Example 2, Lemma 9.21 below shows that $K$ is algebraically closed.

Lemma 9.21. If $K/k$ is an algebraic extension of fields and if every nonconstant polynomial in $k[X]$ splits into degree-one factors in $K$, then $K$ is algebraically closed.

PROOF. Let $K'$ be an algebraic extension of $K$, and let $x$ be in $K'$. Let $G(X)$ be the minimal polynomial of $x$ over $K$, and write $G(X)$ as
$$G(X)=X^n+c_{n-1}X^{n-1}+\cdots+c_0\qquad\text{with all } c_i\in K.$$
Then $x$ is algebraic over $k(c_{n-1},\dots,c_0)$, which is a finite extension of $k$ by Theorem 9.8. By Corollary 9.7, $x$ lies in a finite extension of $k$. Thus Proposition 9.4 shows that $x$ is algebraic over $k$. Let $F(X)$ be the minimal polynomial of $x$ over $k$. By assumption this splits over $K$, say as
$$F(X)=(X-x_1)\cdots(X-x_m)\qquad\text{with all } x_i\in K.$$
Evaluating at $x$ and using the fact that $F(x)=0$, we see that $x=x_j$ for some $j$. Therefore $x$ is in $K$, and $K$ is algebraically closed.

An extension field $K/k$ is an algebraic closure of $k$ if $K$ is algebraic over $k$ and if $K$ is algebraically closed. Example 2 of algebraically closed fields above gives an algebraic closure of $\mathbb{Q}$, and Example 3 gives an algebraic closure of $\mathbb{F}_p$.

Theorem 9.22 (Steinitz). Every field $k$ has an algebraic closure, and this is unique up to $k$ isomorphism.

REMARKS. The proof of existence is modeled on the argument for Example 3 of algebraic closures. However, we are not free in general to use a simple union of a sequence of fields and have to work harder. Because there is no evident set of possibilities within which we are forming extension fields, Zorn's Lemma is inconvenient to use and tends to result in an unintuitive construction. Instead, we use Zermelo's Well-Ordering Theorem, whose use more closely parallels the inductive construction in Example 3.


PROOF OF EXISTENCE. With $k$ as the given field, let $S$ be the set of nonconstant polynomials $s(X)$ in $k[X]$, and introduce a well ordering into $S$ by means of Zermelo's Well-Ordering Theorem (Section A5 of the appendix). Let us write $\prec$ for “strictly precedes in the ordering” and $\preceq$ for “equals or strictly precedes.” For each $s\in S$, let $\bar s$ be the successor of $s$, i.e., the first element among all elements $t$ with $s\prec t$. We write $s_0$ for the first element of $S$. Without loss of generality, we may assume that $S$ has a last element $s_\infty$. The idea is to construct simultaneously two kinds of things:
(i) an algebraic extension field $k_s/k$ for each $s\in S$ such that $k_{s_0}=k$ and such that $k_{\bar s}$ is a splitting field for $s(X)$ over $k_s$ whenever $s\prec s_\infty$,
(ii) a field mapping $\varphi_{ut}:k_t\to k_u$ for each ordered pair of elements $t$ and $u$ in $S$ having $t\preceq u$, such that $\varphi_{tt}=1$ for all $t$ and such that $t\preceq u\preceq v$ implies $\varphi_{vt}=\varphi_{vu}\varphi_{ut}$.
These extension fields and mappings are to be such that $k_s=\bigcup_{t\prec s}\varphi_{st}(k_t)$ whenever $s$ is not a successor and is not $s_0$. If such a system of extension fields and field homomorphisms exists, then Lemma 9.21 applies to a splitting field over $k_{s_\infty}$ of the nonconstant polynomial $s_\infty(X)$ and shows that this splitting field is algebraically closed; since this splitting field is an algebraic extension of $k$, it is an algebraic closure of $k$.

A partial such system through $t_0$ means a system consisting of fields $k_s$ with $s\preceq t_0$ and field homomorphisms $\varphi_{ut}$ with $t\preceq u\preceq t_0$ such that the above conditions hold as far as they are applicable. A partial system exists through the first member $s_0$ of $S$ because we can take $k_{s_0}=k$ and $\varphi_{s_0s_0}=1$. Arguing by contradiction, we suppose that such a system of extension fields and field homomorphisms fails to exist through some member of $S$. Let $t_0$ be the first member of $S$ such that there is no partial system through $t_0$.

Suppose that $t_0$ is the successor of some element $t_1$ in $S$. We know that a partial system exists through $t_1$. If we let $k_{t_0}$ be a splitting field for $t_1(X)$ over $k_{t_1}$, with $\varphi_{t_0t_1}$ denoting the inclusion of $k_{t_1}$ into $k_{t_0}$, and if we define
$$\varphi_{t_0t}=\begin{cases}\varphi_{t_0t_1}\varphi_{t_1t} & \text{for } t\preceq t_1,\\ 1 & \text{for } t=t_0,\end{cases}$$
then the enlarged system is a partial system through $t_0$, contradiction. Thus $t_0$ cannot be the successor of some element of $S$.

When $t_0$ is not a successor, at least $k_t$ is defined for $t\prec t_0$ and $\varphi_{ut}$ is defined for $t\preceq u\prec t_0$. We want to form a union, but we have to keep the field operations aligned properly in the process. Define a “$t$-allowable tuple” to be a function $u\mapsto x_u$ defined for $t\preceq u\prec t_0$ such that $x_u$ is in $k_u$ and $\varphi_{vu}(x_u)=x_v$ whenever $t\preceq u\preceq v\prec t_0$. If $x$ is in $k_t$, then an example of a $t$-allowable tuple is given by $u\mapsto\varphi_{ut}(x)$ for $t\preceq u\prec t_0$. If $t\prec t_0$ and $t'\prec t_0$, then we can apply field operations to the $t$-allowable tuple $u\mapsto x_u$ and to the $t'$-allowable tuple $u\mapsto y_u$, obtaining $\max(t,t')$-allowable


tuples $u\mapsto x_u+y_u$, $u\mapsto -x_u$, $u\mapsto x_uy_u$, and $u\mapsto x_u^{-1}$ as long as $x_t\neq 0$. These operations are meaningful since each $\varphi_{vu}$ is a field mapping. If $t\prec t_0$ and $t'\prec t_0$, we say that the $t$-allowable tuple $u\mapsto x_u$ is equivalent to the $t'$-allowable tuple $u\mapsto y_u$ if $x_u=y_u$ for $\max(t,t')\preceq u\prec t_0$. The result is an equivalence relation, and the equivalence relation respects the field operations in the previous paragraph. We define $k_{t_0}$ to be the set of equivalence classes of allowable tuples with the inherited field operations. The 0 element is the class of the $s_0$-allowable tuple $u\mapsto 0$, and the multiplicative identity is the class of the $s_0$-allowable tuple $u\mapsto 1$. It is a routine matter to check that $k_{t_0}$ is a field.

If $t\prec t_0$ is given, we define the function $\varphi_{t_0t}:k_t\to k_{t_0}$ as follows: if $x$ is in $k_t$, we form the $t$-allowable tuple $u\mapsto\varphi_{ut}(x)$ and take its equivalence class, which is a member of $k_{t_0}$, as $\varphi_{t_0t}(x)$. Then $\varphi_{t_0t}$ is evidently a field mapping. It is evident also that $\varphi_{t_0v}\varphi_{vu}=\varphi_{t_0u}$ when $u\preceq v\prec t_0$. Defining $\varphi_{t_0t_0}$ to be the identity, we have a complete system of field mappings $\varphi_{vu}$ for $k_{t_0}$.

The final step is to check that $k_{t_0}$ is the union of the images of the $\varphi_{t_0t}$ for $t\prec t_0$. Thus choose a representative of an equivalence class in $k_{t_0}$. Let the representative be a $t$-allowable tuple $u\mapsto x_u$ for $t\preceq u\prec t_0$. The element $x_t$ is in $k_t$, and the condition $x_u=\varphi_{ut}(x_t)$ is just the condition that the class of $u\mapsto x_u$ be the image of $x_t$ under $\varphi_{t_0t}$. Hence every member of $k_{t_0}$ is in the image of some $\varphi_{t_0t}$ with $t\prec t_0$, and we have a contradiction to the hypothesis that a partial system through $t_0$ does not exist. This completes the proof of existence.

For the uniqueness in Theorem 9.22, we again need a serious application of the Axiom of Choice, but here Zorn's Lemma can be applied fairly routinely. The proof will show a little more than is needed, and in fact the uniqueness in Theorem 9.22 will be derived as a consequence of Theorem 9.23 below.

Theorem 9.23. Let $K'$ be an algebraically closed field, and let $K$ be an algebraic extension of a field $k$. If $\varphi$ is a field mapping of $k$ into $K'$, then $\varphi$ can be extended to a field mapping of $K$ into $K'$.

PROOF OF UNIQUENESS IN THEOREM 9.22 USING THEOREM 9.23. Let $K$ and $K'$ be algebraic closures of $k$, and let $\varphi:k\to K'$ be the inclusion mapping. Theorem 9.23 supplies a field mapping $\Phi:K\to K'$ such that $\Phi\big|_k=\varphi$, i.e., such that $\Phi$ fixes $k$. Since $K$ is an algebraic closure of $k$, so is $\Phi(K)$. Then $K'$ is an algebraic extension of the algebraically closed field $\Phi(K)$, and we must have $\Phi(K)=K'$. Thus $\Phi$ is a $k$ isomorphism of $K$ onto $K'$.

PROOF OF THEOREM 9.23. Let $S$ be the set of all triples $(L, L', \psi)$ such that $L$ is a field with $k \subseteq L \subseteq K$ and $\psi$ is a field mapping of $L$ onto the subfield $L'$ of $K'$ with $\psi|_k = \varphi$. The set $S$ is nonempty since $(k, \varphi(k), \varphi)$ is a member of it. Defining $(L_1, L_1', \psi_1) \subseteq (L_2, L_2', \psi_2)$ to mean that $L_1 \subseteq L_2$, that $L_1' \subseteq L_2'$, and that $\psi_1$ as a set of ordered pairs is a subset of $\psi_2$ as a set of ordered pairs, we partially order $S$ by inclusion upward. If $\{(L_\alpha, L_\alpha', \psi_\alpha)\}$ is a nonempty chain in $S$, form the triple $\bigl(\bigcup_\alpha L_\alpha, \bigcup_\alpha L_\alpha', \bigcup_\alpha \psi_\alpha\bigr)$, and put $\psi = \bigcup_\alpha \psi_\alpha$. Then $\psi\bigl(\bigcup_\alpha L_\alpha\bigr) = \bigcup_\alpha \psi_\alpha(L_\alpha) = \bigcup_\alpha L_\alpha'$, and consequently $\bigl(\bigcup_\alpha L_\alpha, \bigcup_\alpha L_\alpha', \bigcup_\alpha \psi_\alpha\bigr)$ is an upper bound in $S$ for the chain. By Zorn's Lemma, $S$ has a maximal element $(L_0, L_0', \psi_0)$. We shall prove that $L_0 = K$, and the proof will be complete.

Fix $x$ in $K$, and let $F(X)$ be the minimal polynomial of $x$ over $L_0$. Applying $\psi_0$ to the coefficients of $F(X)$ gives a polynomial $F^{\psi_0}(X)$ that is irreducible over $L_0'$. Since $K'$ is algebraically closed, $F^{\psi_0}(X)$ has a root $x'$ in $K'$. By Theorem 9.11, $\psi_0 : L_0 \to L_0'$ can be extended to an isomorphism $\Psi_0 : L_0(x) \to L_0'(x')$ such that $\Psi_0(x) = x'$. Then $(L_0(x), L_0'(x'), \Psi_0)$ is in $S$ and contains $(L_0, L_0', \psi_0)$. This containment, if strict, would contradict the fact that $(L_0, L_0', \psi_0)$ is a maximal element of $S$. Thus equality must hold: $L_0(x) = L_0$. Therefore $x$ is in $L_0$, and we conclude that $L_0 = K$.

5. Geometric Constructions by Straightedge and Compass

Classical Euclidean geometry attached a certain emphasis to constructions in the Euclidean plane that could be made by straightedge and compass. These are often referred to casually as constructions by “ruler and compass,” but one is not allowed to use the markings on a ruler. Thus “straightedge and compass” is a more accurate description.

In these constructions the starting configuration may be regarded as a line with two points marked on the line. Allowable constructions are the following: to form the line through a given point different from finitely many other lines through that point, to form the line through two distinct points, to form a circle with a given center and a radius different from that of finitely many other circles through the point, and to form a circle with a given center and radius. Intersections of a line or a circle with previous lines and circles establish new points for continuing the construction.

For example a line perpendicular to a given line at a given point can be constructed by drawing any circle centered at the point, using the two intersection points as centers of new circles, drawing those circles so as to have radius larger than the first circle, and forming the line between their two points of intersection. An angle at the point $P$ of intersection between two intersecting lines $A$ and $B$ may be bisected by drawing any circle centered at $P$, selecting one of the points of intersection on each line so that $P$ and the two new points $Q$ and $R$ describe the angle, drawing circles with that same radius centered at $Q$ and $R$, and forming the line between the points of intersection of the two circles. And so on.

Three notable problems remained unsolved in antiquity:
(i) how to double a cube, i.e., how to construct the side of a cube of double the volume of a given cube,
(ii) how to trisect any constructible angle, i.e., how to divide the angle into three equal parts by means of constructed lines,
(iii) how to square a circle, i.e., how to construct the side of a square whose area equals that of a given disk.

In this section we shall use the elementary field theory of Sections 1–2 to show that doubling a cube and trisecting a 60-degree angle are impossible with straightedge and compass. As to (iii), we shall reduce a proof of the impossibility of squaring the circle to a proof that $\pi$ is transcendental over $\mathbb{Q}$. This latter proof we give in Section 14.

The first step is to translate the problem of geometric constructibility into a statement in algebra. Since we are given two points on a line, we can introduce Cartesian coordinates for the Euclidean plane, taking one of the points to be $(0, 0)$ and the other point to be $(1, 0)$. Points in the Euclidean plane are now determined by their Cartesian coordinates, which determine all distances. Distances in turn can be laid off on the $x$-axis from $(0, 0)$. Thus the question becomes, what points on the $x$-axis can be constructed?

FIGURE 9.1. Closure of positive constructible $x$ coordinates under multiplication and division.

Let $C$ be the set of constructible $x$ coordinates. We are given that 0 and 1 are in $C$. Closure of $C$ under addition and subtraction is evident; the straightedge is not even necessary for this step. Figure 9.1 indicates why the positive elements of $C$ are closed under multiplication and division. In more detail we take two intersecting lines and mark three known positive members of $C$ as the distances $a$, $b$, $c$ in the figure. Then we form the line through the two points marking $a$ and $b$, and we form a line parallel to that line through the point marked off by the distance $c$. The intersection of this parallel line with the other original line defines a distance $d$. Then $a/b = c/d$, and so $d = bc/a$. By taking $a = 1$, we see that we can multiply any two members $b$ and $c$ in $C$, obtaining a result in $C$.

By instead taking c = 1, we see that we can divide. The conclusion is that C is a ﬁeld.

FIGURE 9.2. Closure of positive constructible $x$ coordinates under square roots.

Figure 9.2 indicates why the positive elements of $C$ are closed under taking square roots. In more detail let $a$ and $b$ be positive members of $C$ with $a < b$. By forming a circle whose diameter is a segment of length $b$ and by forming a line perpendicular to that diameter at the point marked by $a$, we determine the pictured right triangle with a side $c$ satisfying $a/c = c/b$. Then $c = \sqrt{ab}$. By taking one of $a$ and $b$ to be 1, we see that the square root of the other of $a$ and $b$ is in $C$. This completes the proof of the direct part of the following theorem.

Theorem 9.24. The set $C$ of $x$ coordinates that can be constructed from $x = 1$ and $x = 0$ by straightedge and compass forms a subfield of $\mathbb{R}$ such that the square root of any positive element of the field lies in the field. Conversely the members of $C$ are those real numbers lying in some subfield $F_n$ of $\mathbb{R}$ of the form
$$F_1 = \mathbb{Q}(\sqrt{a_0}),\quad F_2 = F_1(\sqrt{a_1}),\quad \dots,\quad F_n = F_{n-1}(\sqrt{a_{n-1}})$$
with each $a_j$ in $F_j$ (where $F_0 = \mathbb{Q}$) and with $a_0, \dots, a_{n-1}$ all $\geq 0$.

PROOF OF CONVERSE. Suppose we have a subfield $F = F_n$ of $\mathbb{R}$ of the kind described in the statement of the theorem. The possibilities for obtaining a new constructible point from $F$ by an additional construction arise from three situations: the intersection of two lines, each passing through two points of $F$; the intersection of a line and a circle, each determined by data from $F$; and the intersection of two circles, each determined by data from $F$.

In the case of two intersecting lines, each line is of the form $ax + by = c$ for suitable coefficients $a, b, c$ in $F$, and the intersection is a point $(x, y)$ in $F \times F$. So intersections of lines do not force us to enlarge $F$.

For a line and a circle, we assume that the line is given by $ax + by = c$ with $a, b, c$ in $F$, that the circle has radius in $F$ and center in $F \times F$, and that the line and the circle actually intersect. The circle is then given by $(x - h)^2 + (y - k)^2 = r^2$ with $h, k, r$ in $F$. Substitution of the equation of the line into the equation of the circle gives us a quadratic equation either for $x$, and $x$ then determines $y$, or for $y$, and $y$ then determines $x$. The quadratic equation has real roots, and thus its discriminant is $\geq 0$. The result is that $x$ and $y$ are in a field $F(\sqrt{l}\,)$ for some $l \geq 0$ in $F$.

For two circles, without loss of generality, we may take their equations to be
$$x^2 + y^2 = r^2 \qquad\text{and}\qquad (x - h)^2 + (y - k)^2 = s^2$$

with $r, h, k, s$ in $F$. Subtracting gives $2xh + 2yk = h^2 + k^2 - s^2 + r^2$. With this equation and with $x^2 + y^2 = r^2$, we again have a line and circle that are being intersected. Thus the same remarks apply as in the previous paragraph.

The conclusion is that any new single construction of points of intersection by straightedge and compass leads from $F$ to $F(\sqrt{l}\,)$ for some $l \geq 0$ in $F$. Thus every member of the set $C$ is as described in the theorem.

To apply the theorem to prove the impossibility of the three never-accomplished constructions that were described earlier in the section, we observe that $[F_i : F_{i-1}]$ in the theorem equals 1 or 2 for each $i$. Consequently every member of the constructible set $C$ lies in a finite algebraic extension of $\mathbb{Q}$ of degree $2^k$ for some $k$.

For the problem of doubling a cube, the question amounts to constructing $\sqrt[3]{2}$. We argue by contradiction. If $\sqrt[3]{2}$ lies in $F_n$ as in the theorem, then $\mathbb{Q}(\sqrt[3]{2}) \subseteq F_n$. With $k$ as the integer $\leq n$ such that $[F_n : \mathbb{Q}] = 2^k$, Corollary 9.7 gives
$$2^k = [F_n : \mathbb{Q}] = [F_n : \mathbb{Q}(\sqrt[3]{2})]\,[\mathbb{Q}(\sqrt[3]{2}) : \mathbb{Q}] = 3\,[F_n : \mathbb{Q}(\sqrt[3]{2})].$$
Thus 3 must divide a power of 2, and we have arrived at a contradiction. We conclude that it is not possible to double a cube with straightedge and compass.

For the problem of trisecting any constructible angle, let us show that a $60^\circ$ angle cannot be trisected. A $60^\circ$ angle is itself constructible, being the angle between two sides in an equilateral triangle. Trisecting a $60^\circ$ angle amounts to constructing $\cos 20^\circ$; $\sin 20^\circ$ is then $(1 - \cos^2 20^\circ)^{1/2}$. To proceed, we derive an equation satisfied by $\cos 20^\circ$, starting from
$$(\cos 20^\circ + i \sin 20^\circ)^3 = \cos 60^\circ + i \sin 60^\circ = \frac{1}{2} + \frac{i\sqrt{3}}{2}.$$
We expand the left side and extract the real part of both sides to obtain $\cos^3 20^\circ - 3 \cos 20^\circ \sin^2 20^\circ = \tfrac{1}{2}$. Substituting $\sin^2 20^\circ = 1 - \cos^2 20^\circ$ and simplifying, we see that $r = \cos 20^\circ$ satisfies $4r^3 - 3r - \tfrac{1}{2} = 0$.
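As a sanity check on the cubic just derived, the following sketch evaluates $4r^3 - 3r - \tfrac12$ at $r = \cos 20^\circ$ in floating point and then applies the rational root test to the integer multiple $8X^3 - 6X - 1$. The helper name `rational_roots` is an illustrative assumption, not something from the text. The check relies on the standard fact that a cubic over $\mathbb{Q}$ with no rational root is irreducible over $\mathbb{Q}$; the text itself cites Corollary 8.20 for the irreducibility.

```python
import math
from fractions import Fraction


def cubic(r):
    """Evaluate 4r^3 - 3r - 1/2 in floating point."""
    return 4 * r**3 - 3 * r - 0.5


# r = cos 20 degrees should make the cubic vanish up to rounding error.
residual = cubic(math.cos(math.radians(20)))


def rational_roots(coeffs):
    """Rational roots of an integer polynomial via the rational root test.

    coeffs lists the integer coefficients from highest degree down to the
    constant term; the first and last entries must be nonzero.
    (Hypothetical helper written for this illustration.)
    """
    def divisors(n):
        n = abs(n)
        return [d for d in range(1, n + 1) if n % d == 0]

    lead, const = coeffs[0], coeffs[-1]
    found = set()
    for p in divisors(const):
        for q in divisors(lead):
            for cand in (Fraction(p, q), Fraction(-p, q)):
                # Exact evaluation: sum of c_i * cand^i, lowest degree first.
                value = sum(c * cand**i for i, c in enumerate(reversed(coeffs)))
                if value == 0:
                    found.add(cand)
    return found


# 2 * (4X^3 - 3X - 1/2) = 8X^3 - 6X - 1 has integer coefficients.
roots = rational_roots([8, 0, -6, -1])
print("residual small:", abs(residual) < 1e-12, "| rational roots:", roots)
```

An empty set of rational roots means the cubic has no linear factor over $\mathbb{Q}$, which for degree 3 is the same as irreducibility.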

Arguing with Corollary 8.20 as in Example 2 of splitting fields in Section 2, we readily check that $4X^3 - 3X - \tfrac{1}{2}$ is irreducible over $\mathbb{Q}$. Hence $[\mathbb{Q}(\cos 20^\circ) : \mathbb{Q}] = 3$, and we are led to the same contradiction as for the problem of doubling the cube. Therefore it is not possible to trisect a $60^\circ$ angle with straightedge and compass.

For the problem of squaring a circle, let $A$ be the area of the circle, and let $r$ be the radius. If the square has side $x$, then $x^2 = A = \pi r^2$, with $r$ given. Thus $x = r\sqrt{\pi}$, and the essence of the matter is to construct $\sqrt{\pi}$. However, $\pi$ is known to be transcendental by a theorem of F. Lindemann (1882); we give a proof in Section 14. Since $\pi$ is transcendental, $\sqrt{\pi}$ is transcendental. Consequently $\sqrt{\pi}$ lies in no finite algebraic extension of $\mathbb{Q}$ and cannot be constructed.

A fourth notable problem, which leads to further insights, concerns the construction of a regular polygon of outer radius 1 with $n$ sides. This construction is easy with straightedge and compass when $n$ is a power of 2 or is 3 times a power of 2, and Euclid showed that a construction is possible for $n = 5$. But a construction cannot be managed with straightedge and compass for $n = 9$, for example, because a central angle in this case is $40^\circ$ and the constructibility of $\cos 40^\circ$ would imply the constructibility of $\cos 20^\circ$. Thus the question is, for what values of $n$ can a regular $n$-gon be constructed with straightedge and compass?

The remarkable answer was given by Gauss. By a Fermat number is meant any integer of the form $2^{2^N} + 1$. A Fermat prime is a Fermat number that is prime. The Fermat numbers for $N = 0, 1, 2, 3, 4$ are 3, 5, 17, 257, 65537, and each is a Fermat prime. No larger Fermat primes are known.² The answer given by Gauss, which we shall prove in stages in Sections 6–9, is as follows.

Theorem 9.25 (Gauss).³ A regular $n$-gon is constructible with straightedge and compass if and only if $n$ is the product of distinct Fermat primes and a power of 2.
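The criterion of Theorem 9.25 is easy to tabulate. The sketch below lists the first few Fermat numbers, checks their primality with the classical Pépin test (for $N \geq 1$, $2^{2^N}+1$ is prime exactly when 3 raised to the power $2^{2^N-1}$ is congruent to $-1$ modulo it), and tests small $n$ against the criterion. The function names are illustrative assumptions, and `constructible_ngon` assumes that the only Fermat primes dividing $n$ are the five known ones.

```python
def fermat_number(N):
    """Return the N-th Fermat number 2^(2^N) + 1."""
    return 2 ** (2 ** N) + 1


def pepin_is_prime(N):
    """Pepin's test, valid for N >= 1.

    F_N is prime iff 3^((F_N - 1) / 2) is congruent to -1 modulo F_N.
    """
    F = fermat_number(N)
    return pow(3, (F - 1) // 2, F) == F - 1


# Assumption: only the five known Fermat primes are considered.
KNOWN_FERMAT_PRIMES = [3, 5, 17, 257, 65537]


def constructible_ngon(n):
    """Test n against the criterion of Theorem 9.25.

    n must be a power of 2 times a product of distinct Fermat primes.
    """
    if n < 3:
        return False
    while n % 2 == 0:          # strip the power of 2
        n //= 2
    for p in KNOWN_FERMAT_PRIMES:
        if n % p == 0:         # each Fermat prime may divide n at most once
            n //= p
    return n == 1


if __name__ == "__main__":
    print([fermat_number(N) for N in range(5)])
    print([N for N in range(1, 6) if pepin_is_prime(N)])
    print(fermat_number(5) % 641)  # Euler's factor 641 of the fifth Fermat number
    print([n for n in range(3, 21) if constructible_ngon(n)])
```

Dividing by each Fermat prime at most once is what rejects $n = 9$ and $n = 25$: a repeated Fermat prime leaves a factor behind, so the final comparison with 1 fails.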
We can show the relevance of Fermat primes right now, and we can give an indication that if $n$ is a prime number, then a regular $n$-gon can be constructed if and only if $n$ is a Fermat prime. But a full proof even of this statement will make use of Galois groups, which we take up in the next three sections.

For the necessity let $n$ be prime, and suppose that a regular $n$-gon is constructible. Returning from degrees to radians, we observe that each central angle is $2\pi/n$. Thus the constructibility implies the constructibility of $\cos 2\pi/n$, and it follows that $e^{2\pi i/n} = \cos 2\pi/n + i \sin 2\pi/n$ is in the field $C + iC$ of constructible points in the complex plane. We have the factorization
$$X^n - 1 = (X - 1)(X^{n-1} + X^{n-2} + \cdots + X + 1),$$
and $e^{2\pi i/n}$ is a root of the second factor. The first example of Eisenstein's criterion (Corollary 8.22) in Section VIII.5 shows that the second factor is irreducible. According to the results of Section 1, $\mathbb{Q}(e^{2\pi i/n})$ is a simple algebraic extension of $\mathbb{Q}$ of degree $n - 1$. Applying Theorem 9.24, we see that $n - 1$ must be a power of two. Let us write $n - 1 = 2^m$. Suppose $m = a2^N$ with $a$ odd. If $a > 1$, then the equality
$$n = 2^{a2^N} + 1 = (2^{2^N})^a + 1^a$$
exhibits $n$ as the sum of two $a$-th powers, necessarily divisible by $2^{2^N} + 1$. Since $n$ is assumed prime, we conclude that $a = 1$. Therefore $n = 2^{2^N} + 1$, and $n$ is a Fermat prime.

We do not quite succeed in proving the converse at this point. If $n$ is the Fermat prime $2^{2^N} + 1$, then the above argument shows that the degree of $\mathbb{Q}(e^{2\pi i/n})$ over $\mathbb{Q}$ is $2^{2^N}$. However, we cannot yet conclude that $\mathbb{Q}(e^{2\pi i/n})$ can be built from $\mathbb{Q}$ by successively adjoining $2^N$ square roots, and thus the converse part of Theorem 9.24 is not immediately applicable. Once we have the theory of Galois groups in hand, we shall see that the existence of these intermediate extensions involving square roots is ensured, and then the constructibility follows.

² Many Fermat numbers for $N \geq 5$ are known not to be prime, sometimes by the discovery of an explicit factor and sometimes by a verification that 3 to the power $2^{2^N - 1}$ is not congruent to $-1$ modulo $2^{2^N} + 1$. (Cf. Lemma 9.46.) For example Euler discovered that 641 divides $2^{2^5} + 1$.

³ Gauss announced both the necessity and the sufficiency in this theorem in his Disquisitiones Arithmeticae in 1801, but he included a proof of only the sufficiency (partly in his articles 336 and 365). A proof of the necessity appeared in a paper of Pierre-Laurent Wantzel in 1837.

6. Separable Extensions

The Galois group $\mathrm{Gal}(K/k)$ of a field extension $K/k$ is defined to be the set
$$\mathrm{Gal}(K/k) = \{k \text{ automorphisms of } K\}$$
with composition as group operation. An instance of this group was introduced in the context of Example 9 of Section IV.1; in this example the field $k$ was the field $\mathbb{Q}$ of rationals and the field $K$ was a number field $\mathbb{Q}[\theta]$, where $\theta$ is algebraic over $\mathbb{Q}$. In studying $\mathrm{Gal}(K/k)$ in this chapter, we ordinarily assume that $\dim_k K < \infty$, but there will be instances where we do not want to make such an assumption.

Beginning in this section, we take up a study of Galois groups in general.
We shall be interested in relationships between fields $L$ with $k \subseteq L \subseteq K$ and subgroups of $\mathrm{Gal}(K/k)$. If $H$ is a subgroup of $\mathrm{Gal}(K/k)$, then
$$K^H = \{x \in K \mid \varphi(x) = x \text{ for all } \varphi \in H\}$$
is a field called the fixed field of $H$; it provides an example of an intermediate field $L$ and gives a hint of the relationships we shall investigate. We begin with some examples; in each case the base field $k$ is the field $\mathbb{Q}$ of rationals.

EXAMPLES OF GALOIS GROUPS.

(1a) $K = \mathbb{Q}(\sqrt{-1})$. If $\varphi$ is in $\mathrm{Gal}(K/\mathbb{Q})$, then we must have $\varphi|_{\mathbb{Q}} = 1$, and $\varphi(\sqrt{-1})$ must be a root of $X^2 + 1$. Thus $\varphi(\sqrt{-1}) = \pm\sqrt{-1}$. Since $\mathbb{Q}$ and $\sqrt{-1}$ generate $\mathbb{Q}(\sqrt{-1})$, there are at most two such $\varphi$'s. On the other hand, $\mathbb{Q}(\sqrt{-1})$ and $\mathbb{Q}(-\sqrt{-1})$ are simple extensions of $\mathbb{Q}$ such that $\sqrt{-1}$ and $-\sqrt{-1}$ have the same minimal polynomial. Theorem 9.11 therefore produces a $\mathbb{Q}$ automorphism of $\mathbb{Q}(\sqrt{-1})$ with $\varphi(\sqrt{-1}) = -\sqrt{-1}$, namely complex conjugation. We conclude that $\mathrm{Gal}(K/\mathbb{Q})$ has order 2, hence that $\mathrm{Gal}(K/\mathbb{Q}) \cong C_2$.

(1b) $K = \mathbb{Q}(\sqrt{2})$. The same argument applies as in Example 1a, and the conclusion is that $\mathrm{Gal}(K/\mathbb{Q}) \cong C_2$. The nontrivial element of the Galois group carries $\sqrt{2}$ into $-\sqrt{2}$ and is different from complex conjugation.

(2) $K = \mathbb{Q}(\sqrt[3]{2})$. If $\varphi$ is in $\mathrm{Gal}(K/\mathbb{Q})$, then $\varphi|_{\mathbb{Q}} = 1$, and $\varphi(\sqrt[3]{2})$ has to be a root of $X^3 - 2$. But $K$ is a subfield of $\mathbb{R}$, and there is only one root of $X^3 - 2$ in $\mathbb{R}$. Hence $\varphi(\sqrt[3]{2}) = \sqrt[3]{2}$. Since $\mathbb{Q}$ and $\sqrt[3]{2}$ generate $\mathbb{Q}(\sqrt[3]{2})$ as a field, we see that $\varphi = 1$. We conclude that $\mathrm{Gal}(K/\mathbb{Q})$ has order 1, i.e., is the trivial group.

(3) $K = \mathbb{Q}(r)$, where $r$ is a root of $X^3 - X - \tfrac{1}{3}$. Any $\varphi$ in $\mathrm{Gal}(K/\mathbb{Q})$ fixes $\mathbb{Q}$ and sends $r$ to a root of $X^3 - X - \tfrac{1}{3}$. In Example 2 of splitting fields in Section 2, we saw that all three complex roots of $X^3 - X - \tfrac{1}{3}$ lie in $K$. Arguing as in Example 1a, we see that $\mathrm{Gal}(K/\mathbb{Q})$ has order 3, hence that $\mathrm{Gal}(K/\mathbb{Q}) \cong C_3$.

(4) $K = \mathbb{Q}(e^{2\pi i/17})$. According to Section 5, this is the field we need to consider in addressing the constructibility of a regular 17-gon. We saw in that section that $[K : \mathbb{Q}] = 16$ and that the minimal polynomial of $e^{2\pi i/17}$ over $\mathbb{Q}$ is $X^{16} + X^{15} + \cdots + X + 1$. The other roots of the minimal polynomial in $\mathbb{C}$ are $e^{2\pi i l/17}$ for $2 \leq l \leq 16$, and these all lie in $K$. Theorem 9.11 therefore gives us a $\mathbb{Q}$ automorphism $\varphi_l$ of $K$ sending $e^{2\pi i/17}$ into $e^{2\pi i l/17}$ for each $l$ with $1 \leq l \leq 16$. Since $\mathbb{Q}$ and $e^{2\pi i/17}$ generate $K$, a $\mathbb{Q}$ automorphism of $K$ is completely determined by its effect on $e^{2\pi i/17}$. Thus the order of $\mathrm{Gal}(K/\mathbb{Q})$ is 16.

Let us determine the group structure. Since $\varphi_l$ sends $e^{2\pi i/17}$ into $e^{2\pi i l/17}$, it sends $e^{2\pi i r/17} = (e^{2\pi i/17})^r$ into $(e^{2\pi i l/17})^r = e^{2\pi i lr/17}$. If we drop the exponential from the notation, we can think of $\varphi_l$ as defined on the integers modulo 17, the formula being $\varphi_l(r) = rl \bmod 17$. From this viewpoint $\varphi_l$ is an automorphism of the additive group of $\mathbb{F}_{17}$. Lemma 4.45 shows that the group of additive automorphisms of $\mathbb{F}_{17}$ is isomorphic to $\mathbb{F}_{17}^\times$, and it follows from Corollary 4.27 that $\mathrm{Gal}(K/\mathbb{Q}) \cong C_{16}$.

For our application of constructibility of a regular 17-gon, we would like to know whether the elements of $K$ are constructible. Taking Theorem 9.24 into account, we therefore seek an intermediate field $L$ of which $K$ is a quadratic extension.
Since we know that $\mathrm{Gal}(K/\mathbb{Q})$ is cyclic, we can let $H \subseteq \mathrm{Gal}(K/\mathbb{Q}) \cong C_{16}$ be the 2-element subgroup, and it is natural to try the fixed field $L = K^H$. To understand this fixed field, we need to understand the isomorphism $\mathbb{F}_{17}^\times \cong C_{16}$ better. Modulo 17, we have
$$3^2 = 9, \qquad 3^4 = -2^2, \qquad 3^8 = 2^4 = -1, \qquad 3^{16} = 1.$$
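The congruences above can be checked directly. The following sketch computes the powers of 3 modulo 17 and lists the subgroups of orders 2, 4, and 8 of the cyclic group $\mathbb{F}_{17}^\times$; the helper name `subgroup` is an illustrative assumption, not notation from the text.

```python
p = 17

# Powers 3^1, ..., 3^16 modulo 17.
powers = [pow(3, e, p) for e in range(1, 17)]

# 3 generates the multiplicative group iff its powers run through 1..16.
is_generator = sorted(powers) == list(range(1, 17))


def subgroup(order):
    """Unique subgroup of the given order (a divisor of 16) in the cyclic
    group generated by 3 modulo 17, returned as a sorted list of residues."""
    g = pow(3, 16 // order, p)  # an element of exact order `order`
    return sorted({pow(g, e, p) for e in range(order)})


if __name__ == "__main__":
    print(is_generator)
    print(pow(3, 4, p), pow(3, 8, p))  # residues of 3^4 and 3^8
    for order in (2, 4, 8):
        print(order, subgroup(order))
```

In this notation the residues 13 and 16 are $-4$ and $-1$ modulo 17, and the 2-element subgroup is $\{1, 16\} = \{\pm 1\}$.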

Consequently 3 is a generator of the cyclic group $\mathbb{F}_{17}^\times$. Then $H = \{3^8, 1\} = \{\pm 1\}$, and $L = \{x \in K \mid \varphi_{-1}(x) = \varphi_{+1}(x) = x\}$. Since $\varphi_{-1}(e^{2\pi i r/17}) = e^{-2\pi i r/17} = \overline{e^{2\pi i r/17}}$ with the overbar indicating complex conjugation, we see that
$$L = K^H = \{x \in K \mid \bar{x} = x\}.$$
It is not hard to check that indeed $[K : L] = 2$. Next we need a subfield $L'$ of $L$ with $[L : L'] = 2$. We try $L' = K^{H'}$ with $H'$ equal to the 4-element cyclic subgroup of $\mathrm{Gal}(K/\mathbb{Q})$. Here we have a harder time checking whether $L$ is indeed a quadratic extension of $L'$, but we shall see in Section 8 that it is.⁴ We continue in this way, and ultimately we end up with the chain of subfields that exhibits the members of $K$ as constructible.

We seek to formulate the kind of argument in the above examples as a general theorem. We have to rule out the bad behavior of $\mathbb{Q}(\sqrt[3]{2})$, where one root of the minimal polynomial lies in the field but others do not, and we shall do this by assuming that the extension field is a “normal” extension, in a sense to be defined in Section 7. In addition, our style of argument shows that we might run into trouble if our irreducible polynomials over $k$ can have repeated roots in $K$. We shall rule out this bad behavior by insisting that the extension be “separable,” a condition that we introduce now. The extension will automatically be separable if $K$ has characteristic 0.

For the remainder of this section, fix the base field $k$. An irreducible polynomial $F(X)$ in $k[X]$ is called separable if it splits into distinct degree-one factors in its splitting field, i.e., if
$$F(X) = a_n(X - x_1) \cdots (X - x_n) \qquad \text{with } x_i \neq x_j \text{ for } i \neq j.$$

Once this splitting into distinct degree-one factors occurs in the splitting field, it occurs in any larger field as well.

⁴ Actually, Section 8 will point out how Corollary 9.36 in Section 7 already handles this step. In fact, Corollary 9.37 handles this step with no supplementary argument.

Lemma 9.26. A polynomial $F(X)$ in $k[X]$ has no repeated roots in its splitting field $K$ if and only if $\mathrm{GCD}(F, F') = 1$, where $F'(X)$ is the derivative of $F(X)$.

PROOF. The polynomial $F(X)$ has repeated roots in $K$ if and only if $F(X)$ is divisible by $(X - r)^2$ for some $r \in K$, if and only if some $r \in K$ has $F(r) = F'(r) = 0$ (by Corollary 9.17), if and only if some $r \in K$ has $(X - r)$ dividing $F(X)$ and also $F'(X)$ (by the Factor Theorem), if and only if some $r \in K$ has $(X - r)$ dividing $\mathrm{GCD}(F, F')$ when the GCD is computed in $K$, if and only if $\mathrm{GCD}(F, F') \neq 1$ when the GCD is computed in $K$ (by unique factorization in $K[X]$). However, the Euclidean algorithm calculates $\mathrm{GCD}(F, F')$ without reference to the field, and the GCD is therefore the same when computed in $K$ as it is when computed in $k$. The lemma follows.

Proposition 9.27. An irreducible polynomial $F(X)$ in $k[X]$ is separable if and only if $F'(X) \neq 0$. In particular, every irreducible (necessarily nonconstant) polynomial is separable if $k$ has characteristic 0.

PROOF. Since the polynomial $F(X)$ is irreducible and $\mathrm{GCD}(F, F')$ divides $F(X)$, $\mathrm{GCD}(F, F')$ equals 1 or $F(X)$ in all cases. If $F'(X) = 0$, then $\mathrm{GCD}(F, F') = F(X)$, and Lemma 9.26 implies that $F(X)$ is not separable. Conversely if $F'(X) \neq 0$, then the facts that $\mathrm{GCD}(F, F')$ divides $F'(X)$ and that $\deg F' < \deg F$ together imply that $\mathrm{GCD}(F, F')$ cannot equal $F(X)$. So $\mathrm{GCD}(F, F') = 1$, and Lemma 9.26 implies that $F(X)$ is separable.

Fix an algebraic extension $K$ of $k$. We say that an element $x$ of $K$ is separable over $k$ if the minimal polynomial of $x$ over $k$ is separable. We say that $K$ is a separable extension of $k$ if every $x$ in $K$ is separable over $k$.

EXAMPLES OF SEPARABLE EXTENSIONS AND EXTENSIONS NOT SEPARABLE.

(1) In characteristic 0, every algebraic extension $K$ of $k$ is separable, by Proposition 9.27.

(2) Every algebraic extension $K$ of a finite field $k$ is separable. In fact, if $x$ is in $K$, then $[k(x) : k]$ is finite. Hence $k(x)$ is a finite field. Then we may assume that $K$ is a finite field, say of order $q = p^n$ with $p$ prime. Since the multiplicative group $K^\times$ has order $q - 1$, every nonzero element of $K$ is a root of $X^{q-1} - 1$, and every element of $K$ is therefore a root of $X^q - X$. The minimal polynomial $F(X)$ of $x$ over $k$ must then divide $X^q - X$. However, we know that $X^q - X$ splits over $K$ and has no repeated roots. Thus $F(X)$ splits over $K$ and has no repeated roots. Then $F(X)$ is separable over $k$, and $x$ is separable over $k$.

(3) Let $k = \mathbb{F}_p(x)$ be a transcendental extension of the finite field $\mathbb{F}_p$. Because this extension is transcendental, $X^p - x$ is irreducible over $k$. Let $K$ be the simple algebraic extension $k[X]/(X^p - x)$, which we can write more simply as $k(x^{1/p})$. The minimal polynomial of $x^{1/p}$ over $k$ is $X^p - x$, and its derivative is $pX^{p-1} = 0$ since the derivative of the constant $x$ is 0. By Proposition 9.27, $x^{1/p}$ is not separable over $k$.
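The test in Lemma 9.26 is effective: $\mathrm{GCD}(F, F')$ can be computed by the Euclidean algorithm in $k[X]$ without ever finding a root. The sketch below does this over $k = \mathbb{Q}$ with exact rational arithmetic. Polynomials are represented as coefficient lists with the lowest degree first, and all function names are illustrative assumptions. It covers only characteristic 0, so it does not reproduce Example 3.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients (list is lowest degree first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def derivative(p):
    """Formal derivative of a coefficient list."""
    return trim([i * c for i, c in enumerate(p)][1:])


def poly_mod(a, b):
    """Remainder of a on division by the nonzero polynomial b over Q."""
    a, b = trim(a), trim(b)
    while len(a) >= len(b):
        factor = a[-1] / b[-1]
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[i + shift] -= factor * c
        a = trim(a)  # the leading term has been cancelled
    return a


def poly_gcd(a, b):
    """Monic GCD of a and b (a nonzero) by the Euclidean algorithm."""
    a, b = trim(a), trim(b)
    while b:
        a, b = b, poly_mod(a, b)
    return [c / a[-1] for c in a]


def has_repeated_root(coeffs):
    """True iff GCD(F, F') is nonconstant, i.e., F has a repeated root
    in its splitting field (Lemma 9.26), for F with rational coefficients."""
    f = [Fraction(c) for c in coeffs]
    return len(poly_gcd(f, derivative(f))) > 1


if __name__ == "__main__":
    print(has_repeated_root([-2, 0, 0, 1]))   # X^3 - 2
    print(has_repeated_root([2, -3, 0, 1]))   # X^3 - 3X + 2 = (X - 1)^2 (X + 2)
```

For $X^3 - 3X + 2$ the computed GCD is $X - 1$, which picks out the repeated root.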

The way that separability enters considerations with Galois groups is through the following theorem, explicitly or implicitly. One of the corollaries of the theorem is that if $K/k$ is an algebraic extension, then the set of elements in $K$ separable over $k$ is a subfield of $K$.

Theorem 9.28. Let $k \subseteq L \subseteq K$ be an inclusion of fields such that $K$ is a simple algebraic extension of $L$ of the form $K = L(\alpha)$, let $\overline{K}$ be an algebraic closure of $K$, and let $M(X)$ be the minimal polynomial of $\alpha$ over $L$. Then the number of field mappings of $K$ into $\overline{K}$ fixing $k$ is the product of the number of distinct roots of $M(X)$ in $\overline{K}$ by the number of field mappings of $L$ into $\overline{K}$ fixing $k$.

REMARKS. An algebraic closure $\overline{K}$ of $K$ exists by Theorem 9.22. Because $\overline{K}$ is known to exist, the present theorem reduces to Theorem 9.11 when $L = k$.

PROOF. Any field mapping $\varphi : K \to \overline{K}$ is uniquely determined by $\varphi|_L$ and $\varphi(\alpha)$. If $\sigma = \varphi|_L$, then the equality $M(\alpha) = 0$ implies that $M^\sigma(\varphi(\alpha)) = 0$, and thus $\varphi(\alpha)$ has to be a root of $M^\sigma(X)$. The number of distinct roots of $M^\sigma(X)$ in $\overline{K}$ equals the number of distinct roots of $M(X)$ in $\overline{K}$; hence the number of possibilities for $\varphi(\alpha)$ is at most the number of distinct roots of $M(X)$ in $\overline{K}$. Consequently the number of such $\varphi$'s fixing $k$ is bounded above by the product of the number of distinct roots of $M(X)$ in $\overline{K}$ times the number of field mappings $\sigma$ of $L$ into $\overline{K}$ fixing $k$.

For an inequality in the reverse direction, let $\sigma : L \to \overline{K}$ be any field mapping of $L$ into $\overline{K}$ fixing $k$, put $L' = \sigma(L)$, let $x$ be any root of $M^\sigma(X)$, and form the subfield $L'(x)$ of $\overline{K}$. Theorem 9.11 shows that there exists a field isomorphism $\varphi : L(\alpha) \to L'(x)$ with $\varphi|_L = \sigma$ and $\varphi(\alpha) = x$, and we can regard $\varphi$ as a field mapping of $K$ into $\overline{K}$ fixing $k$, extending $\sigma$, and having $\varphi(\alpha) = x$. Thus the number of field mappings $\varphi : K \to \overline{K}$ fixing $k$ is bounded below by the product of the number of distinct roots of $M(X)$ in $\overline{K}$ times the number of field homomorphisms $\sigma$ of $L$ into $\overline{K}$ fixing $k$.

Corollary 9.29. Let $K = k(\alpha_1, \dots, \alpha_n)$ be a finite algebraic extension of the field $k$, and let $\overline{K}$ be an algebraic closure of $K$. Then the number of field mappings of $K$ into $\overline{K}$ fixing $k$ is $\leq [K : k]$. Moreover, the following conditions are equivalent:
(a) the number of field mappings of $K$ into $\overline{K}$ fixing $k$ equals $[K : k]$,
(b) each $\alpha_j$ is separable over $k(\alpha_1, \dots, \alpha_{j-1})$ for $1 \leq j \leq n$,
(c) each $\alpha_j$ is separable over $k$ for $1 \leq j \leq n$.

PROOF. For $1 \leq j \leq n$, let $M_j(X)$ be the minimal polynomial of $\alpha_j$ over $k(\alpha_1, \dots, \alpha_{j-1})$, let $d_j$ be the degree of $M_j(X)$, and let $s_j$ be the number of distinct roots of $M_j(X)$ in $\overline{K}$. Then $s_j \leq d_j$ with equality for a particular $j$ if and only if $\alpha_j$ is separable over $k(\alpha_1, \dots, \alpha_{j-1})$, by definition. Also, $[K : k] = \prod_{j=1}^n d_j$ by Corollary 9.7, and the number of field mappings of $K$ into $\overline{K}$ fixing $k$ is $\prod_{j=1}^n s_j$ by iterated application of Theorem 9.28. From these facts, the first conclusion of the corollary is immediate, and so is the equivalence of (a) and (b). Condition (a) is independent of the order of enumeration of $\alpha_1, \dots, \alpha_n$. Since we can always take any particular $\alpha_j$ to be first, we obtain the equivalence of (a) and (c).

Corollary 9.30. Let $K = k(\alpha_1, \dots, \alpha_n)$ be a finite algebraic extension of the field $k$. If each $\alpha_j$ for $1 \leq j \leq n$ is separable over $k$, then $K/k$ is a separable extension.

PROOF. Let $\beta$ be in $K$. We apply the equivalence of (a) and (c) in Corollary 9.29 once to the set of generators $\{\alpha_1, \dots, \alpha_n\}$ and once to the set of generators $\{\beta, \alpha_1, \dots, \alpha_n\}$, and the result is immediate.

Corollary 9.31. If $K/k$ is an algebraic field extension, then the subset $L$ of elements of $K$ that are separable over $k$ is a subfield of $K$.

PROOF. If $\alpha$ and $\beta$ are given in $L$, we apply Corollary 9.30 to the extension $k(\alpha, \beta)$ of $k$ to see that $L$ contains the subfield generated by $k$ and the elements $\alpha$ and $\beta$.

Proposition 9.32. If $K/k$ is a separable algebraic extension and if $L$ is a field with $k \subseteq L \subseteq K$, then $K$ is separable over $L$, and $L$ is separable over $k$.

PROOF. The separability assertion about $L/k$ says the same thing about elements of $L$ that separability of $K/k$ says about those same elements, and it is therefore immediate that $L/k$ is separable. Next let us consider $K/L$. If $x$ is in $K$, let $F(X)$ be its minimal polynomial over $k$, and let $G(X)$ be its minimal polynomial over $L$. Since $F(X)$ is in $L[X]$ and $F(x) = G(x) = 0$, $G(X)$ divides $F(X)$. Since $K/k$ is separable, $F(X)$ splits into distinct degree-one factors in its splitting field $\mathbb{F}$. The field $\mathbb{F}$ contains a splitting field of $G(X)$, and thus the degree-one factors of $G(X)$ in $\mathbb{F}[X]$ are a subset of the degree-one factors of $F(X)$ in $\mathbb{F}[X]$.
There are no repeated factors for $F(X)$, and there can be no repeated factors for $G(X)$. Thus $x$ is separable over $L$, and $K/L$ is a separable extension.

In studying Galois groups, we shall be chiefly interested in the following situation in Corollary 9.29: $K$ is an algebraic field extension $K = k(\alpha_1, \dots, \alpha_n)$ of $k$ for which every field mapping of $K$ into an algebraic closure that fixes $k$ actually carries $K$ into itself. We seek conditions under which this situation arises, and then we mine the consequences. As we did in the study begun in Theorem 9.28, we begin with the case of a simple algebraic extension.

Let $K = k(\gamma)$ be a simple algebraic extension of $k$, and let $F(X)$ be the minimal polynomial of $\gamma$ over $k$. Any member $\varphi$ of the Galois group $\mathrm{Gal}(K/k)$ carries $\gamma$ to another root $\gamma'$ of $F(X)$, and $\varphi$ is uniquely determined by $\gamma'$ since $k$ and $\gamma$ generate the field $K$. An element $\varphi$ of $\mathrm{Gal}(K/k)$ carrying $\gamma$ to $\gamma'$ can exist only if $\gamma'$ is in $K$. If $\gamma'$ is in $K$, then $k(\gamma) \supseteq k(\gamma')$, and the equal finite dimensionality of $k(\gamma)$ and $k(\gamma')$ forces $k(\gamma) = k(\gamma')$. In other words, if $\gamma'$ is in $K$, then the unique $k$ isomorphism $k(\gamma) \to k(\gamma')$ of Theorems 9.10 and 9.11 carrying $\gamma$ to $\gamma'$ is a member of $\mathrm{Gal}(K/k)$. Making a count of what happens to all the elements $\gamma'$, we see that we have proved the following.

Proposition 9.33. Let $K = k(\gamma)$ be a simple algebraic extension of $k$, and let $F(X)$ be the minimal polynomial of $\gamma$. Then $|\mathrm{Gal}(K/k)| \leq [K : k]$ with equality if and only if $F(X)$ is a separable polynomial and $K$ is a splitting field of $F(X)$ over $k$.

EXAMPLE. For $K = \mathbb{Q}(\sqrt[3]{2})$ with minimal polynomial $F(X)$, we know that $F(X)$ does not split in $K$; the nonreal roots of $F(X)$ do not lie in $K$. Proposition 9.33 gives us $|\mathrm{Gal}(K/\mathbb{Q})| < [K : \mathbb{Q}] = 3$, and a glance at the argument preceding Proposition 9.33 shows that $|\mathrm{Gal}(K/\mathbb{Q})|$ has to be 1.

It is possible to investigate the case of several generators directly, but it is more illuminating to reduce it to the case of a single generator as in Proposition 9.33. The tool for doing so is the following important theorem.

Theorem 9.34 (Theorem of the Primitive Element). Let $K/k$ be a separable algebraic extension with $[K : k] < \infty$. Then there exists an element $\gamma$ in $K$ such that $K = k(\gamma)$.

PROOF. We can write $K = k(x_1, \dots, x_n)$, and we proceed by induction on $n$, the case $n = 1$ being trivial. For general $n$, let $L = k(x_1, \dots, x_{n-1})$, so that $K = L(x_n)$. By the inductive hypothesis, $L$ is of the form $L = k(\alpha)$ for some $\alpha$ in $K$, and thus $K = k(\alpha, x_n)$. Changing notation, we see that it is enough to prove that whenever $K$ is a separable algebraic extension of the form $K = k(\alpha, \beta)$, then $K$ is of the form $K = k(\gamma)$ for some $\gamma$. We shall show this for some $\gamma$ of the form $\gamma = \beta + c\alpha$ with $c$ in $k$. Because every finite extension of a finite field is separable (by Example 2 of separable extensions), we may assume that $k$ is an infinite field.

Let $F(X)$ and $G(X)$ be the minimal polynomials of $\alpha$ and $\beta$ over $k$, and let $K'$ be an extension in which $F(X)G(X)$ splits, i.e., in which $F(X)$ and $G(X)$ both split. Let $\alpha_1 = \alpha, \alpha_2, \dots, \alpha_m$ and $\beta_1 = \beta, \beta_2, \dots, \beta_n$ be the roots of $F(X)$ and $G(X)$ in $K'$, in each case necessarily distinct by definition of separability of $\alpha$ and $\beta$. Define $L = k(\gamma)$ with $\gamma = \beta + c\alpha$, where $c$ is a member of $k$ to be specified. For suitable $c$, we shall show that $\alpha$ is in $L$. Then $\beta = \gamma - c\alpha$ must be in $L$, and we obtain $K \subseteq L$. Since $\gamma$ is in $K$, the reverse inclusion is built into the construction, and thus we will have $K = L$.

We shall compute the minimal polynomial of $\alpha$ over $L$. We know that $\alpha$ is a root of $F(X)$, and we put $H(X) = G(\gamma - cX)$. Then $H(X)$ is in $L[X] \subseteq K'[X]$, and $G(\beta) = 0$ implies $H(\alpha) = 0$. Therefore $X - \alpha$ divides both $F(X)$ and $H(X)$ in the ring $K'[X]$. Let us determine $\mathrm{GCD}(F, H)$ in $K'[X]$. The separability of $\alpha$ says that $X - \alpha$ divides $F(X)$ only once. Since $F(X)$ splits in $K'[X]$, any other prime divisor of $\mathrm{GCD}(F, H)$ in $K'[X]$ has to be of the form $X - \alpha_j$ with $j \neq 1$. The definition of $H(X)$ gives $H(\alpha_j) = G(\gamma - c\alpha_j)$. If $G(\gamma - c\alpha_j) = 0$, then $\gamma - c\alpha_j = \beta_i$ for some $i$, with the consequence that $\beta + c\alpha - c\alpha_j = \beta_i$ and $c = (\beta_i - \beta)(\alpha - \alpha_j)^{-1}$. These are finitely many values. Since $k$ is an infinite field, some choice of $c$ in $k$ avoids all of them and makes $\mathrm{GCD}(F, H) = X - \alpha$ in $K'[X]$. Then $\mathrm{GCD}(F, H) = X - \alpha$, up to a scalar factor, in $L[X]$ since $F(X)$ and $H(X)$ are in $L[X]$ and since the GCD can be computed without reference to the field containing both elements. The ratio of the constant term to the coefficient of $X$ has to be in $L$ independently of the scalar factor multiplying $X - \alpha$, and therefore $\alpha$ is in $L$. This completes the proof.
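A small instance makes the construction in the proof concrete. The sketch below takes $k = \mathbb{Q}$, $\alpha = \sqrt{2}$, $\beta = \sqrt{3}$, and $c = 1$, all chosen here for illustration rather than taken from the text, and works exactly in $\mathbb{Q}(\sqrt{2}, \sqrt{3})$ with elements stored as coefficient tuples on the basis $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$. It verifies that $\alpha = (\gamma^3 - 9\gamma)/2$ and $\beta = \gamma - \alpha$ lie in $\mathbb{Q}(\gamma)$ for $\gamma = \sqrt{2} + \sqrt{3}$, and that $\gamma$ is a root of $X^4 - 10X^2 + 1$.

```python
from fractions import Fraction

# An element a + b*sqrt2 + c*sqrt3 + d*sqrt6 is stored as the tuple (a, b, c, d).


def add(x, y):
    return tuple(p + q for p, q in zip(x, y))


def scale(t, x):
    return tuple(t * p for p in x)


def mul(x, y):
    """Multiply two elements using sqrt2*sqrt3 = sqrt6, sqrt2*sqrt6 = 2*sqrt3,
    and sqrt3*sqrt6 = 3*sqrt2."""
    a1, b1, c1, d1 = x
    a2, b2, c2, d2 = y
    return (
        a1 * a2 + 2 * b1 * b2 + 3 * c1 * c2 + 6 * d1 * d2,
        a1 * b2 + b1 * a2 + 3 * (c1 * d2 + d1 * c2),
        a1 * c2 + c1 * a2 + 2 * (b1 * d2 + d1 * b2),
        a1 * d2 + d1 * a2 + b1 * c2 + c1 * b2,
    )


Z, U = Fraction(0), Fraction(1)
ONE = (U, Z, Z, Z)
SQRT2 = (Z, U, Z, Z)
SQRT3 = (Z, Z, U, Z)

gamma = add(SQRT2, SQRT3)          # gamma = beta + c*alpha with c = 1
gamma2 = mul(gamma, gamma)
gamma3 = mul(gamma2, gamma)
gamma4 = mul(gamma2, gamma2)

# alpha and beta recovered as polynomial expressions in gamma over Q.
sqrt2_from_gamma = scale(Fraction(1, 2), add(gamma3, scale(-9, gamma)))
sqrt3_from_gamma = add(gamma, scale(-1, sqrt2_from_gamma))

# Value of X^4 - 10 X^2 + 1 at gamma.
min_poly_value = add(add(gamma4, scale(-10, gamma2)), ONE)

print(sqrt2_from_gamma == SQRT2, sqrt3_from_gamma == SQRT3, min_poly_value)
```

Since both generators are expressed in terms of $\gamma$, this gives $\mathbb{Q}(\sqrt{2}, \sqrt{3}) = \mathbb{Q}(\gamma)$ for this choice of $c$. The proof's GCD argument is what guarantees that such an expression for $\alpha$ exists in general.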

7. Normal Extensions

Proposition 9.33 suggests that the failure of equality to hold in the inequality |Gal(K/k)| ≤ [K : k] has something to do with the failure of polynomials over k to split fully in K once they have at least one root in K. In the case of the extension Q(∛2)/Q, where equality fails, the Galois group is trivial and therefore gives us no information about the extension. Thus it makes sense to regard the failure of equality to hold as an undesirable situation. Accordingly, we make a definition, choosing among several equivalent conditions one that is easy to apply.

A finite separable⁵ algebraic extension K of a field k is said to be normal over k if K is the splitting field of some F(X) in k[X]. The following proposition asserts some powerful consequences of this condition.

⁵ A more advanced treatment might proceed without the assumption of separability for as long as possible. But it is unnecessary to do so in this volume, and the assumption of separability makes the Theorem of the Primitive Element available to us.


Proposition 9.35. Let K be a finite separable algebraic extension of a field k, so that |Gal(K/k)| ≤ [K : k]. Then the following are equivalent:
(a) K is the splitting field of some F(X) in k[X], i.e., K is normal over k,
(b) every irreducible polynomial F(X) in k[X] with a root in K splits in K, i.e., K contains a splitting field for each such F(X),
(c) |Gal(K/k)| = [K : k],
(d) $k = K^G$ for G = Gal(K/k).

REMARKS. We prove that (a) and (c) are equivalent, that the equivalent (a) and (c) imply (d), that (d) implies (b), and that (b) implies (a).

PROOF. By separability and Theorem 9.34 we can write K = k(γ) throughout the proof for some γ in K. Let M(X) be the minimal polynomial of γ over k.

Suppose (a) holds. We prove (c). Write $K = k(\gamma) = k(\alpha_1, \dots, \alpha_n)$, where $\alpha_1, \dots, \alpha_n$ are the roots of some F(X) in k[X] that splits over K. We may assume that F(X) has no repeated prime factors and therefore, by separability of K/k, that $\alpha_1, \dots, \alpha_n$ are distinct. Then $\gamma = H(\alpha_1, \dots, \alpha_n)$ for some H in $k[X_1, \dots, X_n]$. Proposition 9.33 will establish (c) if we show that M(X) splits over K. Let K′ ⊇ K be a finite extension in which M(X) splits, and let γ′ be a root of M(X) in K′. We are to show that γ′ is in K. Theorem 9.11 produces a k isomorphism ϕ : k(γ) → k(γ′) with ϕ(γ) = γ′. Since $\varphi(\alpha_i)$ is a root of F(X) for each i, $\varphi(\alpha_i) = \alpha_{j(i)}$ for some j = j(i) that is unique since $\alpha_1, \dots, \alpha_n$ are distinct. Thus ϕ permutes $\{\alpha_1, \dots, \alpha_n\}$, and $\gamma' = \varphi(\gamma) = \varphi(H(\alpha_1, \dots, \alpha_n)) = H_1(\alpha_1, \dots, \alpha_n)$ for some $H_1$ in $k[X_1, \dots, X_n]$. Therefore γ′ is in $k(\alpha_1, \dots, \alpha_n) = K$. This proves (c).

Suppose (c) holds. We prove (a). Proposition 9.33, in the presence of condition (c) and the given separability of K/k, implies that K is a splitting field of M(X) over k. Thus (a) holds.

Suppose (a) holds. We prove (d). Let $k' = K^G$. Since every member of Gal(K/k) fixes k′, Gal(K/k) ⊆ Gal(K/k′). Meanwhile, (a) for K/k implies (a) for K/k′, and K is separable over k′ by Proposition 9.32. Since (a) implies (c), (c) holds for both k and k′, and we have [K : k] = |Gal(K/k)| ≤ |Gal(K/k′)| = [K : k′]. Since k′ ⊇ k, the inequality of dimensions implies that k′ = k. Thus (d) holds.

Suppose (d) holds. We prove (b). Let F(X) be an irreducible polynomial in k[X] having a root r in K. The polynomial F(X) is necessarily the minimal polynomial of r over k. Enumerate {ϕ(r) | ϕ ∈ Gal(K/k)} as $r_1, \dots, r_n$, with


any possible repetitions included. If J(X) is defined to be $\prod_{i=1}^{n} (X - r_i)$, then expansion of the product gives

$$J(X) = X^{|G|} - \Big(\sum_{i} r_i\Big) X^{|G|-1} + \Big(\sum_{i<j} r_i r_j\Big) X^{|G|-2} - \cdots \pm \prod_{i} r_i .$$

Each member ϕ of Gal(K/k) carries each coefficient of J(X) into itself since ϕ permutes the elements $r_i$. Since $K^G = k$, we see therefore that J(X) is in k[X]. Since J(r) = 0 and F(X) is the minimal polynomial of r, F(X) divides J(X). Over K, J(X) splits because of its definition. By unique factorization, F(X) must split too. Thus (b) holds.

Finally if (b) holds, then M(X), being irreducible over k and having γ as a root in K, splits in K. Thus K is a splitting field for M(X) over k, and (a) holds.

Corollary 9.36. If K is a finite normal separable extension of k and if L is a field with k ⊆ L ⊆ K, then K is a finite normal separable extension of L, and the subgroup H = Gal(K/L) of Gal(K/k) has |H| · [L : k] = |Gal(K/k)|.

PROOF. The field K is a separable extension of the intermediate field L by Proposition 9.32, and it is a normal extension by Proposition 9.35a. Therefore Proposition 9.35c gives |Gal(K/L)| = [K : L], and we have

$$|H| \cdot [L : k] = |\mathrm{Gal}(K/L)| \cdot [L : k] = [K : L] \cdot [L : k] = [K : k] = |\mathrm{Gal}(K/k)|,$$

the last two equalities holding by Corollary 9.7 and Proposition 9.35c.

Corollary 9.37. Let K/k be a separable algebraic extension, and suppose that H is a finite subgroup of Gal(K/k). Then $K/K^H$ is a finite normal separable extension, H is the subgroup $\mathrm{Gal}(K/K^H)$ of Gal(K/k), and $[K : K^H] = |H|$.

PROOF. Proposition 9.32 shows that K is separable over $K^H$. For an arbitrary element x of K, form the polynomial in K[X] given by

$$F(X) = \prod_{\varphi \in H} (X - \varphi(x)).$$

If $\varphi_0$ is in H, then $F^{\varphi_0}$ is given by replacing each ϕ(x) by $\varphi_0\varphi(x)$, and the product is unchanged. Therefore $F(X) = F^{\varphi_0}(X)$, and F(X) is in $K^H[X]$. Thus F(X) is a polynomial in $K^H[X]$ that has x as a root and splits in K. The minimal polynomial M(X) of x over $K^H$ must divide F(X), and it too has x as a root.


By unique factorization in K[X], M(X) must split in K. Thus $K/K^H$ will be a normal extension if it is shown that $[K : K^H] < \infty$.

The element x has $[K^H(x) : K^H] = \deg M(X) \le \deg F(X) = |H|$, and the claim is that $[K : K^H] \le |H|$. Assuming the contrary, we would at some point have an inequality $[K^H(x_1, \dots, x_n) : K^H] > |H|$ because every element of K is algebraic over k. By the Theorem of the Primitive Element (Theorem 9.34), $K^H(x_1, \dots, x_n) = K^H(z)$ for some element z, and therefore $[K^H(x_1, \dots, x_n) : K^H] = [K^H(z) : K^H] \le |H|$, contradiction. We conclude that $[K : K^H] \le |H|$.

From the previous paragraph, $K/K^H$ is a finite separable normal extension. The definition of $K^H$ shows that $H \subseteq \mathrm{Gal}(K/K^H)$, and Proposition 9.35c gives $|\mathrm{Gal}(K/K^H)| = [K : K^H]$. Putting these facts together with the inequality $[K : K^H] \le |H|$ from the previous paragraph, we have

$$|H| \le |\mathrm{Gal}(K/K^H)| = [K : K^H] \le |H|$$

with equality on the left only if $H = \mathrm{Gal}(K/K^H)$. Equality must hold throughout the displayed line since the ends are equal, and therefore $H = \mathrm{Gal}(K/K^H)$.

8. Fundamental Theorem of Galois Theory

We are now in a position to obtain the main result in Galois theory.

Theorem 9.38 (Fundamental Theorem of Galois Theory). If K is a finite normal separable extension of k, then there is a one-one inclusion-reversing correspondence between the subgroups H of Gal(K/k) and the subfields L of K that contain k, corresponding elements H and L being given by

$$L = K^H \qquad \text{and} \qquad H = \mathrm{Gal}(K/L).$$

The effect of the theorem is to take an extremely difﬁcult problem, namely ﬁnding intermediate ﬁelds, and reduce it to a problem that is merely difﬁcult, namely ﬁnding the Galois group. For example the ﬁniteness of Gal(K/k) implies that there are only ﬁnitely many subgroups of Gal(K/k), and the theorem therefore implies that there are only ﬁnitely many intermediate ﬁelds; this ﬁniteness of the number of intermediate ﬁelds is not so obvious without the theorem. As a reminder of the availability of Theorem 9.38, Proposition 9.35, and Corollary 9.36, it is customary to refer to a ﬁnite normal separable extension as a ﬁnite Galois extension. Before coming to the proof of the theorem, let us examine what the theorem says for the examples in Section 6. In each case the ﬁeld k is the ﬁeld Q of rationals. The extensions are separable because the characteristic is 0.


EXAMPLES.

(1a) K = Q(√−1). This is a splitting field for $X^2 + 1$. Proposition 9.33 gives |Gal(K/Q)| = [K : Q] = 2. Thus Gal(K/Q) ≅ $C_2$. There are no nontrivial subgroups, and there are consequently no intermediate fields. We knew this already since there cannot be any intermediate Q vector spaces between Q and K. Thus the theorem tells us nothing new.

(1b) K = Q(√2). Similar remarks apply.

(2) K = Q(∛2). This extension is not normal, and the theorem does not apply to K. If we adjoin r to K with $r^2 + (\sqrt[3]{2})\,r + (\sqrt[3]{2})^2 = 0$, we obtain a splitting field K′ for $X^3 - 2$ over Q. Then K′ is a normal extension of Q, and the theorem applies. Since each element of Gal(K′/Q) permutes the three roots of $X^3 - 2$ and is determined by its effect on these roots, Gal(K′/Q) is isomorphic to a subgroup of the symmetric group $S_3$. The Galois group Gal(K′/Q) has order [K′ : Q] = 6 and hence is isomorphic to the whole symmetric group $S_3$. The group $S_3$ has three subgroups of order 2 and one subgroup of order 3. Therefore K′ has three intermediate fields of degree 3 and one of degree 2. The intermediate fields of degree 3 are the three fields generated by Q and one of the three roots of $X^3 - 2$. The intermediate field of degree 2 corresponds to the alternating subgroup of order 3 and is the subfield generated by Q and the cube roots of 1. It is a splitting field for $X^2 + X + 1$ over Q.

(3) K = Q(r), where r is a root of $X^3 - X - \tfrac{1}{3}$. We know from Section 2 that $X^3 - X - \tfrac{1}{3}$ is irreducible over Q and splits in K, and K by definition is therefore normal. Proposition 9.33 tells us that Gal(K/Q) has order 3 and hence is isomorphic to $C_3$. There are no nontrivial subgroups, and Theorem 9.38 tells us that there are no intermediate fields. We could have seen in more elementary fashion that there are no intermediate fields by using Corollary 9.7, since the corollary tells us that the degree of an intermediate field would have to divide 3.

(4) $K = Q(e^{2\pi i/17})$. We have seen that [K : Q] = 16 and that $\mathrm{Gal}(K/Q) \cong F_{17}^{\times} \cong C_{16}$. Let c be a generator of the cyclic Galois group. Let $H_2 = \{1, c^8\}$, $H_4 = \{1, c^4, c^8, c^{12}\}$, and $H_8 = \{1, c^2, c^4, c^6, c^8, c^{10}, c^{12}, c^{14}\}$. Then put

$$L_2 = K^{H_2}, \qquad L_4 = K^{H_4}, \qquad L_8 = K^{H_8}.$$

The inclusions among our subgroups are {1} ⊆ H2 ⊆ H4 ⊆ H8 ⊆ Gal(K/Q), and the theorem says that the correspondence with intermediate ﬁelds reverses inclusions. Then we have K ⊇ L2 ⊇ L4 ⊇ L8 ⊇ Q.
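The chain of subgroups in Example 4 can be made concrete by identifying Gal(K/Q) with the multiplicative group of integers modulo 17, as the example does. The Python sketch below is only an illustration: the function names are assumptions, and the generator c is found by brute-force search rather than by any method from the text.

```python
P = 17

def mult_order(a, p=P):
    """Order of a in the multiplicative group of integers modulo p."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

def generated_subgroup(g, p=P):
    """Set of all powers of g modulo p."""
    elems, x = {1}, g % p
    while x != 1:
        elems.add(x)
        x = (x * g) % p
    return elems

# Smallest generator of the cyclic group of order 16.
c = next(a for a in range(2, P) if mult_order(a) == P - 1)

H2 = generated_subgroup(pow(c, 8, P))   # {1, c^8}
H4 = generated_subgroup(pow(c, 4, P))   # {1, c^4, c^8, c^12}
H8 = generated_subgroup(pow(c, 2, P))   # even powers of c
G = generated_subgroup(c)               # the whole group

# Index of each subgroup in the next-larger one, starting from {1}.
indices = [len(H2) // 1, len(H4) // len(H2), len(H8) // len(H4), len(G) // len(H8)]
```

Each index in the chain is 2, which is the group-theoretic counterpart of each fixed field being a quadratic extension of the next-smaller one.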


Applying Corollary 9.36, we see that each of these subfields is a quadratic extension of the next-smaller one. Theorem 9.24 says that the members of K are therefore constructible with straightedge and compass. Consequently a regular 17-gon is constructible with straightedge and compass. The constructibility or nonconstructibility of regular n-gons for general n will be settled in similar fashion in the next section. In Section 12 we return to the question of using Galois theory to guide us through the actual steps of the construction when it is possible.

PROOF OF THEOREM 9.38. The function L ↦ Gal(K/L) has domain the set of all intermediate fields and range the set of all subgroups of Gal(K/k), since an element in Gal(K/L) is necessarily in Gal(K/k). Each such extension K/L is separable by Proposition 9.32 and is normal by Proposition 9.35a. Thus Proposition 9.35d applies to each K/L and shows that $L = K^{\mathrm{Gal}(K/L)}$. Consequently the function L ↦ Gal(K/L) is one-one. If H is a subgroup of Gal(K/k), then Corollary 9.37 shows that $L = K^H$ is an intermediate field for which H = Gal(K/L), and therefore the function L ↦ Gal(K/L) is onto. It is immediate from the definition of Galois group that $L_1 \subseteq L_2$ implies $\mathrm{Gal}(K/L_1) \supseteq \mathrm{Gal}(K/L_2)$, and it is immediate from the formula $L = K^{\mathrm{Gal}(K/L)}$ that $\mathrm{Gal}(K/L_1) \supseteq \mathrm{Gal}(K/L_2)$ implies $L_1 \subseteq L_2$. This completes the proof.

Corollary 9.39. If K is a finite Galois extension of k and if L is a subfield of K that contains k, then L is a normal extension of k if and only if Gal(K/L) is a normal subgroup of Gal(K/k). In this case, the map Gal(K/k) → Gal(L/k) given by restriction from K to L is a group homomorphism that descends to a group isomorphism Gal(K/k)/Gal(K/L) ≅ Gal(L/k).

PROOF. Let L correspond to H = Gal(K/L) in Theorem 9.38, so that $L = K^H$. If ϕ is in Gal(K/k), then

$$\begin{aligned}
K^{\varphi H \varphi^{-1}} &= \{k \in K \mid \varphi h \varphi^{-1}(k) = k \text{ for all } h \in H\} \\
&= \{\varphi(k') \in K \mid \varphi h(k') = \varphi(k') \text{ for all } h \in H\} \\
&= \{\varphi(k') \in K \mid h(k') = k' \text{ for all } h \in H\} \\
&= \varphi(K^H) = \varphi(L).
\end{aligned}$$

Since the correspondence of Theorem 9.38 is one-one onto, $\varphi H \varphi^{-1} = H$ if and only if ϕ(L) = L. Therefore H is a normal subgroup of Gal(K/k) if and only if ϕ(L) = L for all ϕ ∈ Gal(K/k).

Now suppose that H is a normal subgroup of Gal(K/k). We have just seen that ϕ(L) = L for all ϕ ∈ Gal(K/k). Then each ϕ defines by restriction a member


$\bar{\varphi} = \varphi|_L$ of Gal(L/k), and $\varphi \mapsto \bar{\varphi}$ is certainly a group homomorphism. The kernel of $\varphi \mapsto \bar{\varphi}$ is the subgroup of Gal(K/k) given by $\{\varphi \in \mathrm{Gal}(K/k) \mid \varphi|_L = 1\}$, and this is just Gal(K/L). Thus $\varphi \mapsto \bar{\varphi}$ descends to a one-one homomorphism of Gal(K/k)/Gal(K/L) into Gal(L/k), and we have |Gal(K/k)|/|Gal(K/L)| ≤ |Gal(L/k)|. We make use of Corollary 9.7 relating degrees of extensions. Applying Proposition 9.35c to K/k and K/L, as well as Proposition 9.33 to L/k, we obtain

$$[L : k] = [K : k]/[K : L] = |\mathrm{Gal}(K/k)|/|\mathrm{Gal}(K/L)| \le |\mathrm{Gal}(L/k)| \le [L : k],$$

with equality at the first ≤ sign only if $\varphi \mapsto \bar{\varphi}$ is onto Gal(L/k) and with equality at the second ≤ sign only if L is the splitting field over k of the minimal polynomial of a certain element γ of L. Equality must hold in both cases because the end members of the display are equal, and we conclude that $\varphi \mapsto \bar{\varphi}$ is onto and that L/k is a normal extension.

We are left with proving that if L/k is a normal extension, then H is a normal subgroup of Gal(K/k). Thus let L/k be normal. In view of the conclusion of the first paragraph of the proof, it is enough to prove that ϕ(L) = L for all ϕ ∈ Gal(K/k). By definition of normal extension, L is the splitting field of some polynomial F(X) in k[X]. We may assume that F(X) is monic. Let us write

$$F(X) = (X - x_1) \cdots (X - x_n) \qquad \text{with all } x_j \text{ in } L.$$

Applying a given member ϕ of Gal(K/k) to the coefficients, we obtain $F(X) = (X - \varphi(x_1)) \cdots (X - \varphi(x_n))$, and here the $\varphi(x_j)$'s are known only to be in K. By unique factorization in K[X], $\varphi(x_i) = x_{j(i)}$ for some j = j(i). Therefore $\varphi(x_i)$ is in L for all i. Since L is the splitting field of F(X) over k, $L = k(x_1, \dots, x_n)$. Thus ϕ maps L into L. Since ϕ is one-one and k linear and since [L : k] is finite, ϕ(L) = L.

The examples of Galois groups given in Section 6 all involved fields that are finite extensions of the rationals Q. As we shall see in Section 17, it is important for the understanding of Galois groups of finite extensions of Q to be able to identify Galois groups of finite extensions of finite fields. This matter is addressed in the following proposition.


Proposition 9.40. Let K be a finite extension of the finite field $F_q$, where $q = p^a$ and p is prime, and suppose that $[K : F_q] = n$. Then K is a Galois extension of $F_q$, the Galois group $\mathrm{Gal}(K/F_q)$ is cyclic of order n, and a generator is the a-th power of the Frobenius automorphism, namely $x \mapsto x^q = x^{p^a}$.

PROOF. Theorem 9.14 shows that K is a splitting field for $X^{q^n} - X$ over $F_p$. Hence it is a splitting field for $X^{q^n} - X$ over $F_q$, and $K/F_q$ is a normal extension. The polynomial $X^{q^n} - X$ has no multiple roots, and it follows that $K/F_q$ is a separable extension.

Define ϕ by $\varphi(x) = x^q$. Lemma 9.18 shows that ϕ is an automorphism of K. Since every member of $F_q^{\times}$ has order dividing q − 1, every nonzero element of $F_q$ is fixed by ϕ. The map ϕ certainly carries 0 to 0, and thus ϕ is in $\mathrm{Gal}(K/F_q)$. By a similar argument, $\varphi^n$ fixes every element of K, and hence $\varphi^n = 1$. Corollary 4.27 shows that $K^{\times}$ is cyclic, hence that there exists an element y in $K^{\times}$ such that $y^l \ne 1$ for $1 \le l < q^n - 1$. This y has $y^l \ne y$ for $2 \le l \le q^n - 1$. Then $\varphi^k(y) = y^{q^k}$ cannot be y for 1 ≤ k ≤ n − 1, and ϕ must have order exactly n. This shows that ϕ generates a cyclic subgroup of order n in $\mathrm{Gal}(K/F_q)$. Since n is an upper bound for the order of $\mathrm{Gal}(K/F_q)$ by Proposition 9.33, this cyclic subgroup exhausts the Galois group.

EXAMPLE. Suppose that we are given a polynomial with coefficients in $F_p$ and we want to find the Galois group of a splitting field. Since there are efficient computer programs for factoring the polynomial into irreducible polynomials, let us take that factorization as done. The Galois group will be cyclic of some order with generator the Frobenius automorphism $x \mapsto x^p$. For an irreducible polynomial of degree n, the splitting field has degree n, and the smallest power of $x \mapsto x^p$ that gives the identity is the n-th power. The conclusion is that the Galois group is cyclic of order equal to the least common multiple of the degrees of the irreducible constituents, a generator being the Frobenius automorphism.
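Proposition 9.40 can be checked by brute force in a small case. The sketch below models the field of 16 elements as $F_2[t]/(t^4 + t + 1)$, with elements stored as 4-bit integers; the choice of this modulus (a standard irreducible polynomial over $F_2$) and all function names are assumptions made for illustration. It computes the order of $x \mapsto x^q$ as a map on the field for q = 2 and q = 4.

```python
MODULUS = 0b10011  # t^4 + t + 1, irreducible over F_2 (an assumed choice)
SIZE = 16          # number of elements in the field

def gf_mul(a, b):
    """Multiply two field elements given as 4-bit integers (polynomials in t)."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition in characteristic 2 is XOR
        b >>= 1
        a <<= 1
        if a & SIZE:           # degree reached 4: reduce modulo t^4 + t + 1
            a ^= MODULUS
    return result

def gf_pow(a, e):
    """Raise a field element to a nonnegative integer power."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result

def frobenius_order(q):
    """Smallest k >= 1 such that applying x -> x^q k times is the identity map."""
    identity = list(range(SIZE))
    images = [gf_pow(x, q) for x in identity]
    k = 1
    while images != identity:
        images = [gf_pow(y, q) for y in images]
        k += 1
    return k

order_over_F2 = frobenius_order(2)   # expected to equal the degree over F_2
order_over_F4 = frobenius_order(4)   # expected to equal the degree over F_4
fixed_by_q4 = [x for x in range(SIZE) if gf_pow(x, 4) == x]
```

Under these assumptions the two orders agree with the degrees 4 and 2 of the field over $F_2$ and over $F_4$, and the elements fixed by $x \mapsto x^4$ form the four-element subfield, as the proposition predicts.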

9. Application to Constructibility of Regular Polygons

In this section we use Galois theory to give a proof of Theorem 9.25 concerning the constructibility of regular n-gons. Let us recall the statement.

THEOREM 9.25 (Gauss). A regular n-gon is constructible with straightedge and compass if and only if n is the product of distinct Fermat primes and a power of 2.

PROOF OF SUFFICIENCY. First suppose that n is a Fermat prime $n = 2^{2^N} + 1$. Let $K = Q(e^{2\pi i/n})$. We saw in Section 5 that the degree [K : Q] is $2^{2^N}$, hence is


a power of 2. Furthermore we know that K is a separable extension of Q, being of characteristic 0, and it is normal, being the splitting field for $X^n - 1$ over Q. In Section 6 we saw that the Galois group Gal(K/Q) is cyclic of order $2^{2^N}$. Let c be a generator of this group. For each integer k with $0 \le k \le 2^N$, let $H_{2^k}$ be the unique cyclic subgroup of Gal(K/Q) of order $2^k$. For this subgroup, $c^{2^{2^N - k}}$ is a generator. Put $L_{2^k} = K^{H_{2^k}}$. Then we have inclusions

$$\{1\} \subseteq H_2 \subseteq H_{2^2} \subseteq \cdots \subseteq H_{2^k} \subseteq \cdots \subseteq H_{2^{2^N - 1}} \subseteq H_{2^{2^N}} = \mathrm{Gal}(K/Q),$$

the index being 2 at each stage. Theorem 9.38 says that the correspondence with intermediate fields reverses inclusions and that the degree of each consecutive extension of subfields matches the index of the corresponding consecutive subgroups. The intermediate fields are therefore of the form

$$K \supseteq L_2 \supseteq L_{2^2} \supseteq \cdots \supseteq L_{2^k} \supseteq \cdots \supseteq L_{2^{2^N - 1}} \supseteq L_{2^{2^N}} = Q,$$

and the degree in each case is 2. In view of the formula for the roots of a quadratic polynomial, each extension is obtained by adjoining some square root. By Theorem 9.24 the members of K are constructible with straightedge and compass. In particular, $e^{2\pi i/n}$ is constructible, and a regular n-gon is constructible.

Next suppose that $e^{2\pi i/r}$ and $e^{2\pi i/s}$ are both constructible and that GCD(r, s) = 1. Choose integers a and b with ar + bs = 1, so that $\frac{a}{s} + \frac{b}{r} = \frac{1}{rs}$. Then the equality $(e^{2\pi i/s})^a (e^{2\pi i/r})^b = e^{2\pi i/(rs)}$ shows that $e^{2\pi i/(rs)}$ is constructible. This proves the sufficiency for any product of distinct Fermat primes. Bisection of an angle is always possible with straightedge and compass, as was observed in the third paragraph of Section 5, and the proof of the sufficiency in Theorem 9.25 is therefore complete.

REMARKS. The above proof shows that the construction is possible, but it gives little clue how to carry out the construction. We shall address this matter further in Section 12.
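The step that combines a constructible r-th root of 1 with a constructible s-th root of 1 rests on integers a and b with ar + bs = 1. A minimal Python sketch of that step follows, using the extended Euclidean algorithm as one assumed way of finding a and b (the text does not say how they are chosen) and the illustrative values r = 3, s = 5.

```python
import cmath
from fractions import Fraction

def extended_gcd(r, s):
    """Return (g, a, b) with a*r + b*s == g == gcd(r, s)."""
    old_r, cur_r = r, s
    old_a, cur_a = 1, 0
    old_b, cur_b = 0, 1
    while cur_r != 0:
        q = old_r // cur_r
        old_r, cur_r = cur_r, old_r - q * cur_r
        old_a, cur_a = cur_a, old_a - q * cur_a
        old_b, cur_b = cur_b, old_b - q * cur_b
    return old_r, old_a, old_b

def root_of_unity(n):
    """The complex number e^(2 pi i / n), as a floating-point approximation."""
    return cmath.exp(2j * cmath.pi / n)

r, s = 3, 5                      # illustrative coprime values
g, a, b = extended_gcd(r, s)

# Exact check of a/s + b/r = 1/(rs).
exponent = Fraction(a, s) + Fraction(b, r)

# Numerical check of (e^{2 pi i/s})^a (e^{2 pi i/r})^b = e^{2 pi i/(rs)}.
combined = root_of_unity(s) ** a * root_of_unity(r) ** b
error = abs(combined - root_of_unity(r * s))
```

The exact identity is checked with rational arithmetic, while the comparison of complex numbers is only approximate because it uses floating point.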
We turn our attention to the necessity, namely that n has to be the product of distinct Fermat primes and a power of 2 if a regular n-gon is constructible. For the moment let n ≥ 1 be any integer. Let us consider the distinct n-th roots of 1 in C, which are $e^{k2\pi i/n}$ for 0 ≤ k < n. The order of each of these elements divides n, and the order is exactly n if and only if GCD(k, n) = 1. In this case we say that $e^{k2\pi i/n}$ is a primitive n-th root of 1. Define the cyclotomic polynomial $\Phi_n(X)$ by

$$\Phi_n(X) = \prod_{\substack{0 \le k < n \\ \mathrm{GCD}(k,n) = 1}} (X - e^{k2\pi i/n}).$$

[…] if $n = p_1^{k_1} \cdots p_r^{k_r}$ with the $p_j$ distinct primes and with all $k_j > 0$, then

$$\varphi(n) = \prod_{j=1}^{r} p_j^{k_j - 1} (p_j - 1).$$

For constructibility this must be a power of 2. Then each $p_j$ dividing n must be 1 more than a power of 2, i.e., must be 2 or a Fermat prime, and the only $p_j$ allowed to have $p_j^2$ dividing n is $p_j = 2$.
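The criterion that ϕ(n) be a power of 2 is easy to test mechanically. The following Python sketch computes ϕ(n) from the product formula above and lists the values of n from 3 to 20 that pass the test; the function names and the use of trial division for factoring are illustrative assumptions.

```python
def prime_factorization(n):
    """Return a dict mapping each prime divisor p of n to its exponent k."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def euler_phi(n):
    """Euler's function via the product of p^(k-1) * (p - 1) over prime powers p^k."""
    result = 1
    for p, k in prime_factorization(n).items():
        result *= p ** (k - 1) * (p - 1)
    return result

def is_power_of_two(m):
    return m >= 1 and m & (m - 1) == 0

def is_constructible(n):
    """True if phi(n) is a power of 2, the condition in Theorem 9.25."""
    return is_power_of_two(euler_phi(n))

constructible = [n for n in range(3, 21) if is_constructible(n)]
```

In this range the test rejects 7, 9, 11, 13, 14, 18, and 19, each of which is divisible by a prime that is not 2 or a Fermat prime, or by the square of an odd prime.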

10. Application to Proving the Fundamental Theorem of Algebra

In this section we use Galois theory to give a proof of the Fundamental Theorem of Algebra. Let us recall the statement.

THEOREM 1.18 (Fundamental Theorem of Algebra). Any polynomial in C[X] with degree ≥ 1 has at least one root.


We begin with a lemma that handles three easy special cases.

Lemma 9.43. There are no finite extensions of R of odd degree greater than 1, the only extension of R of degree 2 up to R isomorphism is C, and there are no finite extensions of C of degree 2.

PROOF. If K is a finite extension of R of odd degree and if x is in K, then [R(x) : R] is odd, and consequently the minimal polynomial F(X) of x over R is irreducible of odd degree. By Proposition 1.20, which is derived from the Intermediate Value Theorem of Section A3 of the appendix, F(X) has at least one root in R. Therefore F(X) has degree 1, and x is in R.

If F(X) is an irreducible polynomial in R[X] of degree 2, then F(X) splits in C by the quadratic formula, and hence the only extension of R of degree 2 is C, up to R isomorphism, by the uniqueness of splitting fields (Theorem 9.13).

Let $G(X) = X^2 + bX + c$ be a polynomial in C[X] of degree 2. Then G(X) has a root x in C given by the quadratic formula since every member of C has a square root⁶ in C, and G(X) cannot be irreducible. Since any finite extension of C of degree 2 would have to be of the form C(x), with x equal to a root of an irreducible quadratic polynomial over C, there can be no such extension.

PROOF OF THEOREM 1.18. First let us show that every irreducible member F(X) of R[X] splits over C. Let K be a splitting field for F(X). Say that $[K : R] = 2^m N$ with N odd. Then K is a Galois extension of R, and $|\mathrm{Gal}(K/R)| = 2^m N$. By the Sylow Theorems (particularly Theorem 4.59a), let H be a Sylow 2-subgroup of Gal(K/R). This H has $|H| = 2^m$. The field $L = K^H$ that corresponds to H under Theorem 9.38 has [L : R] = N with N odd, and the first conclusion of Lemma 9.43 shows that N = 1. Thus $|\mathrm{Gal}(K/R)| = 2^m$. Corollary 4.40 shows that Gal(K/R) has nested subgroups of all orders $2^{m-k}$ with 0 ≤ k ≤ m, and Theorem 9.38 says that the corresponding fixed fields are nested and have respective degrees $2^k$ with 0 ≤ k ≤ m. The extension field of R for k = 1 is necessarily C by Lemma 9.43, and Lemma 9.43 shows that there are no quadratic extensions of C. Therefore m = 0 or m = 1, and the possible splitting fields for F(X) are R and C in the two cases.

To complete the proof, suppose that K is a finite algebraic extension of C of degree n. Then K is a finite algebraic extension of R of degree 2n. The Theorem of the Primitive Element allows us to write K = R(x) for some x ∈ K, and the minimal polynomial of x over R necessarily has degree 2n. The previous paragraph shows that this polynomial splits in C. Thus x is in C, and K = C. This completes the proof.

⁶ To see that every member of C has a square root in C, let c + di be given with c and d real and with d ≠ 0. Let a and b be real numbers with $a^2 = \tfrac{1}{2}\big(c + \sqrt{c^2 + d^2}\big)$, $b^2 = \tfrac{1}{2}\big(-c + \sqrt{c^2 + d^2}\big)$, and sgn(ab) = sgn d. Then $(a + bi)^2 = c + di$.
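The square-root formula in the footnote can be checked numerically. The sketch below follows the footnote's recipe, taking a ≥ 0 and giving b the sign of d so that sgn(ab) = sgn d; the function name is hypothetical, and the floating-point arithmetic makes the check approximate rather than exact.

```python
import math

def complex_sqrt(c, d):
    """Return real (a, b) with (a + b i)^2 = c + d i, for real c, d with d != 0."""
    modulus = math.hypot(c, d)                       # sqrt(c^2 + d^2)
    a = math.sqrt((c + modulus) / 2)                 # a^2 = (c + modulus) / 2
    b = math.copysign(math.sqrt((-c + modulus) / 2), d)  # sign chosen so sgn(ab) = sgn d
    return a, b

def squared(a, b):
    """Real and imaginary parts of (a + b i)^2."""
    return a * a - b * b, 2 * a * b

a1, b1 = complex_sqrt(3.0, 4.0)     # a square root of 3 + 4i
a2, b2 = complex_sqrt(-3.0, -4.0)   # a square root of -3 - 4i
```

The two sample inputs were chosen so that $c^2 + d^2$ is a perfect square, which keeps the expected values of a and b easy to verify by hand.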


11. Application to Unsolvability of Polynomial Equations with Nonsolvable Galois Group

The quadratic formula for finding the roots of a quadratic polynomial has in principle been known since the time of the Babylonians about 400 B.C.⁷ The corresponding problem of finding roots of cubics was unsolved until the sixteenth century, and Cardan's formula was discovered at that time. The original formula assumes real coefficients and was in two parts, a first case corresponding to what we now view as one real root and two complex roots, the second case corresponding to what we view as three real roots.⁸ There is a similar formula, but more complicated, for solving quartics. Further centuries passed with no progress on finding a corresponding formula for the roots of a polynomial of degree 5 or higher. The introduction of Galois theory in the early nineteenth century made it possible to prove a surprising negative statement about all degrees beyond 4.

⁷ The Babylonians did not actually have equations but had an algorithmic method that amounted to completing the square.

⁸ Cardan's name was Girolamo Cardano. The solution in the first case of the cubic seems to have been discovered by Scipione dal Ferro and later by Nicolo Tartaglia. Dal Ferro died in 1526 and passed the secret method to his student Antonio Fior. In 1535 Fior engaged in a public contest with Tartaglia at solving cubics, and he lost. Cardano wheedled the solution method in the first case from Tartaglia, published it in 1539, and discovered and published the solution in the second case. Cardano's student Lodovico Ferrari discovered how to solve quartics, and Cardano published that solution as well. See "St. Andrews" in the Selected References for more information.

Suppose that we are given a polynomial equation with coefficients in the field Q or a more general field k of characteristic 0. In this section we use Galois theory to address the question whether the roots of the equation in a splitting field can be expressed in terms of k and the adjunction of finitely many n-th roots to the field, for various values of n. For the moment let us say in this case that the roots are "expressible in terms of the members of k and radicals." We shall make this notion more precise shortly.

Recall from Section IV.8 that with a finite group G, we can find a strictly decreasing sequence of subgroups starting with G and ending with {1} such that each subgroup is normal in the next larger one and each quotient group is simple. Such a series was defined to be a composition series for G. The Jordan–Hölder Theorem (Corollary 4.50) says that the respective consecutive quotients are isomorphic for any two composition series, apart from the order in which they appear. We define the finite group G to be solvable if each of the consecutive quotients is cyclic of prime order, rather than nonabelian. It is enough that the group have a normal series for which each of the consecutive quotients is abelian. Examples of solvable and nonsolvable groups are obtainable from the calculations in Section IV.8: abelian groups and groups of prime-power order are always solvable, the symmetric group $S_4$ and each of its subgroups are solvable, and the


symmetric group $S_5$ is not solvable since a composition series is $S_5 \supseteq A_5 \supseteq \{1\}$ and the group $A_5$ is simple (Theorem 4.47).

Modulo a precise definition for a field k of the words "expressible in terms of the members of k and radicals," the answer to our main question is as follows.

Theorem 9.44 (Abel, Galois).⁹ Let k be a field of characteristic 0, let F(X) be in k[X], and let K be a splitting field of F(X) over k. Then the roots of F(X) are expressible in terms of the members of k and radicals if and only if the group Gal(K/k) is solvable.

EXAMPLE. With k = Q, let F(X) be the polynomial $F(X) = X^5 - 5X + 1$ in Q[X]. We shall show that
(i) F(X) is irreducible over Q,
(ii) F(X) has three roots in R and one pair of conjugate complex roots in C,
(iii) the splitting field K over Q of any polynomial of degree 5 for which (i) and (ii) hold has Galois group with Gal(K/Q) ≅ $S_5$.
We know from Theorem 4.47 that $S_5$ is not solvable, and Theorem 9.44 therefore allows us to conclude that the roots of $X^5 - 5X + 1$ are not expressible in terms of the members of Q and radicals.

To prove (i), we apply Eisenstein's criterion (Corollary 8.22) to the polynomial $F(X - 1) = X^5 - 5X^4 + 10X^3 - 10X^2 + 5$ and to the prime p = 5, and the irreducibility is immediate.

To prove (ii), we observe that F(−2) < 0, F(0) > 0, F(1) < 0, F(2) > 0. Applying the Intermediate Value Theorem (Section A3 of the appendix), we see that there are at least three roots in R. Since $F'(X) = 5(X^4 - 1)$ has exactly the two roots ±1 in R, F(X) has at most three roots in R by an application of the Mean Value Theorem.

To prove (iii), label the roots 1, 2, 3, 4, 5 with 1 and 2 denoting the nonreal