\(\newcommand{\N}{\mathbb{N}} \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\P}{\mathcal P} \newcommand{\B}{\mathcal B} \newcommand{\F}{\mathbb F} \newcommand{\E}{\mathcal E} \newcommand{\brac}[1]{\left(#1\right)} \newcommand{\matrixx}[1]{\begin{bmatrix}#1\end{bmatrix}} \newcommand{\vmatrixx}[1]{\begin{vmatrix}#1\end{vmatrix}} \newcommand{\limn}{\lim_{n\to\infty}} \newcommand{\nul}{\mathop{\mathrm{Nul}}} \newcommand{\col}{\mathop{\mathrm{Col}}} \newcommand{\rank}{\mathop{\mathrm{Rank}}} \newcommand{\dis}{\displaystyle} \newcommand{\spann}{\mathop{\mathrm{span}}} \newcommand{\range}{\mathop{\mathrm{range}}} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\innerr}[1]{\left\langle #1 \right\rangle} \newcommand{\ol}[1]{\overline{#1}} \newcommand{\qed}{\quad \blacksquare} \newcommand{\tr}{\mathop{\mathrm{tr}}} \) Math2121 Tutorial (Spring 12-13): Jordan Canonical Form

Saturday, March 30, 2013

Jordan Canonical Form

I recently became interested in the proof of this classical result (after failing to solve a problem that turns out to be a simple application of it). Once I understood everything, I tried to jot down the shortest path to the Jordan canonical form theorem, which resulted in this post. 

More details can be found in the PDF version.

Reference: Algebra by Michael Artin; Linear Algebra Done Right by Sheldon Axler; 簡明線性代數 (Concise Linear Algebra) by Prof. Meng; Linear Algebra by Stephen H. Friedberg et al.; some PDF notes found online.


Upper-triangularization 
To understand linear algebra thoroughly, one must also work over complex vector spaces. The reason is that, restricted to real vector spaces, it is not always possible to simplify a real matrix by a change of basis. The easiest example is the rotation $R$ of $\R^2$ through an angle $\pi/2$ counter-clockwise. One cannot find a real eigenvalue of $R$ because no nonzero vector stays parallel to itself after a rotation by $\pi/2$; namely, for any $v\neq 0$ and any $\lambda\in \R$, $Rv\neq \lambda v$.
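For a concrete check (a sympy sketch added for illustration; the argument above is purely geometric), the rotation matrix has no real eigenvalue, but over $\C$ the "missing" eigenvalues $\pm i$ appear:

```python
from sympy import Matrix, I

# Rotation of R^2 by pi/2 counter-clockwise: no real eigenvalue exists.
R = Matrix([[0, -1],
            [1,  0]])
eigs = R.eigenvals()   # dict {eigenvalue: algebraic multiplicity}
# Over C the eigenvalues are i and -i.
assert set(eigs.keys()) == {I, -I}
```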

However, over the complex field, the theory we have learnt for real vector spaces carries over verbatim. The concepts defined for real vector spaces have direct analogues for complex vector spaces, obtained by replacing the symbol $\R$ by $\C$ (say, the scalars in linear dependence, eigenvalues, etc.).

Suppose we are concerned with complex matrices, then every complex matrix has at least one eigenvalue. Not only that, if we consider the matrix $A$ as a map $\C^n\to \C^n$, then we have the following:
Theorem 1.  Every square complex matrix can be made upper-triangular under a change of basis.
More precisely, we can find a basis $\{u_1,u_2,\dots,u_n\}$ of $\C^n$ such that, with $P=\matrixx{u_1&\cdots &u_n}$, $P^{-1}AP$ is upper-triangular. i.e., $A$ is similar to an upper-triangular matrix.

Proof. We prove by induction on the size of matrices. The proposition is obviously true for $1\times 1$ matrices. Suppose every $(n-1)\times (n-1)$ matrix is similar to an upper-triangular matrix. Let $A$ be $n\times n$, then there are $u\in \C^n\setminus\{0\}$ and $\lambda\in \C$ such that \[

Au=\lambda u.

\] Extend $\{u\}$ to a basis of $\C^n$: $\{u,v_1,\dots,v_{n-1}\}$ and let $P= \matrixx{u&v_1&\cdots&v_{n-1}}$, then \[

P^{-1}AP =\matrixx{
\lambda&*\\
0&A'
},

\] where $A'$ is $(n-1)\times (n-1)$, hence by induction hypothesis, there is an invertible $Q\in \C^{(n-1)\times (n-1)}$ such that $Q^{-1}A'Q$ is upper-triangular. Let $D=\matrixx{1&0\\0&Q }$, then $D^{-1} = \matrixx{1&0\\0&Q^{-1} }$, hence \[

\underbrace{D^{-1}P^{-1}}_{(PD)^{-1}}APD = \matrixx{\lambda &*\\0&Q^{-1} A'Q},

\] since $PD$ is invertible, we are done. 
Q.E.D.


Theorem 1 translates easily to a statement about any linear map $T:V\to V$, with $V$ a finite dimensional complex vector space. Namely:
Theorem 1'. Let $V$ be a finite dimensional complex vector space and $T:V\to V$ linear, then there is a basis $\{v_1,v_2,\dots,v_n\}$ of $V$ such that \[Tv_k\in \spann\{v_1,v_2,\dots,v_k\}\] for each $k=1,2,\dots,n$.
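As a computational sanity check of Theorem 1 (a sympy sketch; the $3\times 3$ example matrix is my own, not from the post), one can ask sympy for the Jordan form, which is one particular upper-triangular form reachable by a change of basis:

```python
from sympy import Matrix

# jordan_form returns an invertible P and an upper-triangular J
# with A = P*J*P^{-1}.
A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [1, 0, 3]])
P, J = A.jordan_form()
assert J.is_upper            # Theorem 1: A is similar to an upper-triangular matrix
assert A == P * J * P.inv()  # the columns of P are the basis u_1, ..., u_n
```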
Now we are going to show that under a change of basis, a matrix can be made not only upper-triangular, but can in fact be put into the following much simpler form:
Jordan Canonical Form Theorem.  Let $V$ be a finite dimensional complex vector space and $T:V\to V$ a linear map, then there is a basis $\B$ of $V$ such that \[[T]_\B=\matrixx{J_1&0&\cdots &0\\
0&J_2&\cdots &0\\
\vdots&\vdots &\ddots&\vdots\\0&0&\cdots &J_k},\] where $J_i$ is a matrix of the form \[

\matrixx{
\lambda&1&0&\cdots&0&0\\
0&\lambda&1&\cdots&0&0\\
\vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\
0&0&0&\cdots&\lambda&1\\
0&0&0&\cdots&0&\lambda
}.

\]
The basis in Jordan canonical form theorem is called a Jordan basis. To show the existence of Jordan basis we will need a few technical preliminary results and also the concept of generalized eigenspace which we are going to study.

Cayley-Hamilton Theorem
Definition. Let $V$ be a finite dimensional complex vector space and $T$ linear. We denote $p_T$ the characteristic polynomial of $T$, namely, \[p_T=\det (zI-T).\]
Here $\det T$ is defined to be $\det [T]_\B$ for any basis $\B$ of $V$; it can be shown that $\det T$ is independent of the choice of basis of $V$. Since $\lambda$ is an eigenvalue of $T$ iff $p_T(\lambda)=0$, we have $\det T=(-1)^np_T(0)=$ the product of the eigenvalues (repeated according to algebraic multiplicity). Upper-triangularization enables us to give a simple proof of the following:
Theorem 2 (Cayley-Hamilton). Suppose that $V$ is a finite dimensional complex vector space and $T:V\to V$ is linear, then $p_T(T)=0$.
Given a polynomial $q(z)=a_0+a_1z+\cdots+a_nz^{n}$, the symbol $q(T)$ means \[

q(T)=a_0I+a_1T+\cdots+a_nT^{n},

\] whereas in theorem 2 the statement $p_T(T)=0$ means $p_T(T)$ is a zero linear map from $V$ to $V$.
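To see the Cayley-Hamilton theorem in action (an added sympy sketch; the $2\times 2$ matrix is arbitrary), we can evaluate the characteristic polynomial at the matrix itself by Horner's scheme:

```python
from sympy import Matrix, symbols, zeros

z = symbols('z')
A = Matrix([[1, 2],
            [3, 4]])
p = A.charpoly(z)            # characteristic polynomial p_A(z) = z**2 - 5*z - 2
coeffs = p.all_coeffs()      # coefficients, highest degree first: [1, -5, -2]
# Horner's scheme, substituting the matrix A for the variable z.
pA = zeros(2, 2)
for c in coeffs:
    pA = pA * A + c * Matrix.eye(2)
assert pA == zeros(2, 2)     # Cayley-Hamilton: p_A(A) is the zero map
```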

Proof. By theorem 1, there is a basis $\{v_1,v_2,\dots,v_n\}$ of $V$ such that \[Tv_k\in \spann \{v_1,v_2,\dots,v_k\}\] for $k=1,2,\dots,n$. Moreover, \[

[T]_\B=\matrixx{
\lambda_1&&&*\\
0&\lambda_2&&\\
0&0&\ddots&\\
0&0&0&\lambda_n
}.

\] Hence $p_T(z)=\det (zI-T)=(z-\lambda_1)(z-\lambda_2)\cdots(z-\lambda_n)$. Now to show $p_T(T)=0$, we show that for each $k=1,2,\dots,n$, $(T-\lambda_1 I)\cdots (T-\lambda_kI)v_k=0$. By the matrix representation, $Tv_1=\lambda_1v_1$, so the case $k=1$ is done. Suppose that \begin{align*}

(T-\lambda_1 I)v_1&=0\\
(T-\lambda_1 I) (T-\lambda_2I)v_2&=0\\
&\,\,\vdots\\
(T-\lambda_1 I)\cdots (T-\lambda_{k-1}I)v_{k-1}&=0.

\end{align*} Since $(T-\lambda_k I)v_k=Tv_k-\lambda_k v_k\in \spann \{v_1,v_2,\dots,v_{k-1}\}$ and the factors $T-\lambda_i I$ commute, by induction we are done. 
Q.E.D.

Generalized Eigenspaces
Definition. Let $V$ be a finite dimensional complex vector space and $T:V\to V$ linear. Let $\lambda$ be an eigenvalue of $T$, we define \[\E_\lambda=\{v\in V: (T-\lambda I)^kv=0 \text{ for some } k \ge 1\},\] called the generalized eigenspace with eigenvalue $\lambda$. Every $v\in \E_\lambda\setminus \{0\}$ is called a generalized eigenvector with eigenvalue $\lambda$. 
Of course if we denote by $E_\lambda$ the eigenspace of $T$, then \[

 \{0\}\subsetneq E_\lambda \subseteq \E_\lambda.

\] Generally $E_\lambda\neq \E_\lambda$: for example, consider $A=\matrixx{1&1\\0&1}$. The characteristic polynomial of $A$ is $\det(zI-A) = (z-1)^2$, so the only eigenvalue is $1$. Now $A-I= \matrixx{0&1\\0&0}$, hence $\dim E_1 =\dim \nul (A-I)=1$. However, $(A-I)^2=\matrixx{0&0\\0&0}$, so $\dim \nul (A-I)^2=2$.
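The $2\times 2$ example above can be verified directly with sympy (an added illustration):

```python
from sympy import Matrix, eye

A = Matrix([[1, 1],
            [0, 1]])
N = A - eye(2)
assert len(N.nullspace()) == 1         # dim E_1 = dim Nul(A - I) = 1
assert len((N**2).nullspace()) == 2    # dim Nul (A - I)^2 = 2: the generalized
                                       # eigenspace is all of C^2
```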

Remark. Note that $v\in \E_\lambda\setminus \{0\}$ $\iff$ $v\neq 0$ and there is $n\ge1$ such that $(T-\lambda I)^nv=0$ $\iff$ there is $k\ge 0$ such that $(T-\lambda I)^kv$ is an eigenvector with eigenvalue $\lambda$.
Theorem 3. Let $V$ be a finite dimensional complex vector space and $T:V\to V$ linear. Let $\lambda,\mu$ be eigenvalues of $T$.
(i) $\E_\lambda=\ker (T-\lambda I)^{\dim V}$.
(ii) If $\lambda\neq\mu$, then $\E_\lambda\cap \E_\mu =\{0\}$.
(iii) If $\mu\neq \lambda$, then $(T-\mu I)|_{\E_\lambda}:\E_\lambda\to \E_\lambda$ is invertible.
Proof. (i) Consider the following: \[

\{0\}\subsetneq \ker (T-\lambda I)\subseteq \ker (T-\lambda I)^2\subseteq \cdots\subseteq \ker (T-\lambda I)^{\dim V}\subseteq \E_\lambda.

\] If there is $k<\dim V$ such that $ \ker (T-\lambda I)^k= \ker (T-\lambda I)^{k+1}$, then \[

 \ker(T-\lambda I)^k= \ker (T-\lambda I)^{k+1} = \ker (T-\lambda I)^{k+2}=\cdots .

\] (check!) Thus, if such a $k$ exists, $\E_\lambda =\bigcup_{j\ge 1}\ker(T-\lambda I)^j = \ker(T-\lambda I)^{k}\subseteq \ker (T-\lambda I)^{\dim V}$, and we are done. If there is no such $k$, the dimensions of the kernels strictly increase, so \[\dim \ker(T-\lambda I)^{\dim V} = \dim V,\] and we are also done.

(ii) Let $v\in \E_\lambda\setminus \{0\}$. For the sake of contradiction, suppose $v\in \E_\mu$. Since $v\in \E_\lambda$, there is $k\ge 0$ such that $v_k:=(T-\lambda I)^kv$ is a $\lambda$-eigenvector. Since $\E_\mu$ is $(T-\lambda I)$-invariant, $v_k\in \E_\mu\setminus\{0\}$, so there is also $n\ge 0$ such that $(T-\mu I)^n v_k$ is a $\mu$-eigenvector. Then \[

T(T-\mu I)^n v_k = (T-\mu I)^n Tv_k = \lambda (T-\mu I)^n v_k,

\] thus $(T-\mu I)^n v_k$ is a $\lambda$-eigenvector and a $\mu$-eigenvector at the same time, a contradiction since $\lambda\neq \mu$.

(iii) Let $v\in \E_\lambda$, then of course $(T-\mu I)v\in \E_\lambda$. Hence $\E_\lambda$ is $(T-\mu I)$-invariant. To prove invertibility, let $v\in \E_{\lambda}$ and $(T-\mu I)v=0$, then $v\in \E_{\mu}$, hence $v\in \E_\lambda\cap \E_\mu=\{0\}$, so $v=0$. Thus $(T-\mu I)|_{\E_\lambda}$ is injective, hence invertible. 
Q.E.D.
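The kernel chain in part (i) can be watched stabilizing on a small example (an added sympy sketch; the matrix below has Jordan blocks of sizes $2$ and $1$ for the eigenvalue $2$):

```python
from sympy import Matrix, eye

# Jordan blocks of sizes 2 and 1 for the eigenvalue 2.
A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 2]])
N = A - 2 * eye(3)
dims = [len((N**k).nullspace()) for k in range(1, 5)]
# Once two consecutive kernels agree, the whole chain is constant.
assert dims == [2, 3, 3, 3]
```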

Theorem 4. Let $V$ be a finite dimensional complex vector space and $T:V\to V$ linear. The sum of all generalized eigenspaces is a direct sum.
Proof. Let $\lambda_1,\dots,\lambda_k$ be the eigenvalues of $T$, let $v_i\in \E_{\lambda_i}$, $i=1,2,\dots,k$ be such that \[

v_1+v_2+\cdots+v_k=0,

\] we need to show $v_1=v_2=\cdots =v_k=0$. To do this, choose $d_i$ large enough that $(T-\lambda_iI)^{d_i}v_i=0$ for $i=1,2,\dots,k-1$. Applying $(T-\lambda_1I)^{d_1}\cdots (T-\lambda_{k-1}I)^{d_{k-1}}$ to the sum kills $v_1,\dots,v_{k-1}$, hence \[

(T-\lambda_1I)^{d_1}(T-\lambda_2I)^{d_2}\cdots (T-\lambda_{k-1}I)^{d_{k-1}}v_k=0.

\] By (iii) of theorem 3, $(T-\lambda_iI)|_{\E_{\lambda_k}}$ is injective for $i=1,2,\dots,k-1$, hence $v_k=0$. We repeat the process to show $v_{k-1}=v_{k-2}=\cdots=v_1=0$. 
Q.E.D.

Theorem 5. Let $V$ be a finite dimensional complex vector space and $T:V\to V$ linear, then $V$ is a direct sum of all generalized eigenspaces of $T$.
Proof. Let $\lambda_1,\dots,\lambda_k$ be the eigenvalues of $T$, we just need to show $V\subseteq \E_{\lambda_1}+\cdots +\E_{\lambda_k}$.

Write $p_T(z)=(z-\lambda_1)^{d_1}\cdots (z-\lambda_k)^{d_k}$. Let $x\in V$; by the Cayley-Hamilton theorem $p_T(T)=0$, i.e.,\[

(T-\lambda_1I)^{d_1}(T-\lambda_2I)^{d_2}\cdots (T-\lambda_kI)^{d_k}x=0.

\] This shows that $(T-\lambda_2I)^{d_2}\cdots (T-\lambda_kI)^{d_k}x\in \E_{\lambda_1}$. By (iii) of theorem 3, $(T-\lambda_2I)^{d_2}\cdots (T-\lambda_kI)^{d_k}$ restricted to $\E_{\lambda_1}$ is invertible, so there is $w_1\in \E_{\lambda_1}$ such that \[

(T-\lambda_2I)^{d_2}\cdots (T-\lambda_kI)^{d_k}(x-w_1)=0.

\] The process can be repeated to write $x=w_1+w_2+\cdots +w_k$, where $w_i\in \E_{\lambda_i}$. 
Q.E.D.

Proof of Jordan Canonical Form Theorem
Theorem 6. Let $V$ be a finite dimensional complex vector space and $T:V\to V$ linear. Suppose that $T^m=0$ for some $m\ge 1$ (i.e., $T$ is nilpotent), then there is a basis of $V$ of the form \[

u_1,Tu_1,\dots,T^{a_1-1}u_1,\dots ,u_k,Tu_k,\dots ,T^{a_k-1}u_k,

\] where $T^{a_i}u_i=0$ for $1\leq i \leq k$.

Proof. We prove by induction on the dimension. If $\dim V=1$, then $T=0$ (as $T$ is nilpotent) and we are done. Suppose $\dim V=n$ and every vector space of dimension less than $n$ has a basis of the form described in the theorem. Note that $T(V)\subsetneq V$, otherwise \[V=T(V)=T^2(V)=\cdots =T^m(V)=\{0\},\] a contradiction. As $T(V)$ is $T$-invariant, by the induction hypothesis $T(V)$ has a basis \[

v_1,Tv_1,\dots,T^{b_1-1}v_1,\dots,v_k,Tv_k,\dots,T^{b_k-1}v_k,

\] where $T^{b_i}v_i=0$. Since $v_i\in T(V)$, we choose $u_i\in V$ such that $Tu_i=v_i$. Now $\{T^{b_i-1}v_i:i=1,2,\dots,k\}\subseteq \ker T$ is linearly independent, we extend this to a basis of $\ker T$: $\{T^{b_1-1}v_1,\dots T^{b_k-1}v_k,w_1,w_2,\dots,w_l\}$. We claim that \[

u_1,Tu_1,\dots,T^{b_1}u_1,\dots, u_k,Tu_k,\dots,T^{b_k}u_k,w_1,w_2,\dots,w_l

\] form a basis of $V$. The linear independence is left as a routine check. To prove that these vectors span $V$, we compare $\dim V$ with the number of vectors. Note that $\dim \ker T=k+l$ and $\dim T(V)=b_1+b_2+\cdots +b_k$. Hence\begin{align*}

\dim V&=\dim \ker T+\dim T(V)\\
&=(k+l)+(b_1+\cdots +b_k)\\
&=(b_1+1)+\cdots +(b_k+1) +l,

\end{align*} which is exactly the number of vectors listed above, so they span $V$. Q.E.D.
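A minimal concrete instance of Theorem 6 (an added sympy sketch with $N^2=0$; the vectors $u$, $w$ below are chosen by hand):

```python
from sympy import Matrix

# N is nilpotent with N**2 = 0.  The chain u, N*u together with one extra
# kernel vector w gives a basis of the shape promised by Theorem 6.
N = Matrix([[0, 1, 0],
            [0, 0, 0],
            [0, 0, 0]])
u = Matrix([0, 1, 0])           # N*u = (1, 0, 0)^T is nonzero, N**2 * u = 0
w = Matrix([0, 0, 1])           # completes {N*u} to a basis of ker N
B = Matrix.hstack(u, N * u, w)  # columns: u, N u, w
assert N**2 == Matrix.zeros(3, 3)
assert B.det() != 0             # the three vectors form a basis of C^3
```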

Proof of Jordan Canonical Form Theorem. Let $V$ be a finite dimensional complex vector space and $T:V\to V$ linear. Theorem 5 asserts that \[V=\E_{\lambda_1} \oplus \E_{\lambda_2}\oplus\cdots\oplus \E_{\lambda_k},\] where the $\lambda_i$ are the eigenvalues of $T$. By (i) of theorem 3, $N_i:=(T-\lambda_iI)|_{\E_{\lambda_i}}$ is nilpotent, and the result follows by applying theorem 6 to $N_1,N_2,\dots,N_k$ (ordering each chain as $T^{a_i-1}u_i,\dots,Tu_i,u_i$ so that the $1$'s land on the superdiagonal) and noting that $T=(N_1+\lambda_1I)\oplus\cdots\oplus (N_k+\lambda_kI)$.
Q.E.D.
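The theorem, computationally (an added sympy sketch on a defective $2\times 2$ matrix of my own): sympy's jordan_form assembles a Jordan basis (the columns of $P$) and the block matrix $J$:

```python
from sympy import Matrix

# A defective 2x2 matrix: one eigenvalue 2 with a single 2x2 Jordan block.
A = Matrix([[3, 1],
            [-1, 1]])
P, J = A.jordan_form()
assert J == Matrix([[2, 1],
                    [0, 2]])    # one Jordan block J_1 with lambda = 2
assert A == P * J * P.inv()     # the columns of P form a Jordan basis
```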

Applications.
  1. The Jordan decomposition can be used to prove the first theorem in this page, which is usually mentioned in numerical analysis courses (in fact, I can give a much simpler proof without the Jordan form).
  2. Interestingly, if we understand the "uniqueness" of the Jordan form well (which I explain in detail in the PDF version of this post), it is easy to show that every square matrix $A$ is similar to $A^T$! 
  3. See the last two pages of this pdf file.
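Item 2 above can be checked computationally (an added sympy sketch): $A$ and $A^T$ have the same Jordan form, hence they are similar.

```python
from sympy import Matrix

A = Matrix([[3, 1],
            [-1, 1]])
_, J1 = A.jordan_form()
_, J2 = A.T.jordan_form()
assert J1 == J2   # same Jordan form, so A and A^T are similar
```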
