Cholesky factorization and Conjugate Gradient

Introduction

Both approaches are used to solve the linear system:
$$
Ax=b
$$

Cholesky factorization

Preconditions:

  • $A$ is Hermitian: $A^*=A$
  • $A$ is positive definite: for any $x\neq 0$, $x^*Ax\gt 0$

Basic idea

Decompose $A$ into $A=LL^{*}$ with $L$ a lower triangular matrix, so the problem reduces to two triangular solves:
$$
\begin{cases}
LY=b\\
L^*x=Y
\end{cases}
$$

LU Decomposition

First of all, consider the general case with no constraints on $A$, for which the LU decomposition is proposed:

  • $L$: lower triangular matrix whose diagonal entries are ones
  • $U$: upper triangular matrix

$$
L=\left [\begin{matrix}
1 & 0 & \cdots & 0\\
l_{21} & 1 & \cdots & 0\\
\vdots & \vdots & \ddots &\vdots\\
l_{n1} & l_{n2} & \cdots & 1
\end{matrix}\right ]
$$
$$
U=\left [\begin{matrix}
u_{11} & u_{12} & \cdots & u_{1n}\\
0 & u_{22} & \cdots & u_{2n}\\
\vdots & \vdots & \ddots &\vdots\\
0 & 0 & \cdots & u_{nn}
\end{matrix}\right ]
$$

$$
A=LU
$$
So
$$
a_{ij}=\sum_{k=1}^{\min\left(i,j\right)}l_{ik}u_{kj}
$$
Suppose that $i\le j$ (recalling $l_{ii}=1$); then
$$
\begin{aligned}
a_{ij}&=\sum_{k=1}^{i-1}l_{ik}u_{kj}+u_{ij}\\
u_{ij}&=a_{ij}-\sum_{k=1}^{i-1}l_{ik}u_{kj}\\
a_{ji}&=\sum_{k=1}^{i-1}l_{jk}u_{ki}+l_{ji}u_{ii}\\
l_{ji}u_{ii}&=a_{ji}-\sum_{k=1}^{i-1}l_{jk}u_{ki}
\end{aligned}
$$
The recurrences above are called the Doolittle decomposition, and from them the entries of $L$ and $U$ can be computed row by row.
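
A minimal sketch of these recurrences in Python with NumPy (the name `doolittle_lu` is mine; no pivoting, so every pivot $u_{ii}$ is assumed nonzero):

```python
import numpy as np

def doolittle_lu(A):
    """Doolittle LU: A = L U with unit lower triangular L, upper triangular U.
    Sketch without pivoting; assumes all pivots u_ii are nonzero."""
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros_like(A, dtype=float)
    for i in range(n):
        # u_ij = a_ij - sum_{k<i} l_ik u_kj   (row i of U)
        for j in range(i, n):
            U[i, j] = A[i, j] - L[i, :i] @ U[:i, j]
        # l_ji = (a_ji - sum_{k<i} l_jk u_ki) / u_ii   (column i of L)
        for j in range(i + 1, n):
            L[j, i] = (A[j, i] - L[j, :i] @ U[:i, i]) / U[i, i]
    return L, U
```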

And the linear equation $Ax=b$ becomes two triangular solves:
$$
\begin{cases}
LY=b\\
Ux=Y
\end{cases}
$$
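
A minimal sketch of the two triangular solves (`solve_lu` is a hypothetical name; it assumes the Doolittle convention $l_{ii}=1$ and an invertible $U$):

```python
import numpy as np

def solve_lu(L, U, b):
    """Solve A x = b given A = L U: forward substitution for L Y = b,
    then back substitution for U x = Y."""
    n = len(b)
    Y = np.zeros(n)
    for i in range(n):                    # L Y = b, top to bottom (l_ii = 1)
        Y[i] = b[i] - L[i, :i] @ Y[:i]
    x = np.zeros(n)
    for i in reversed(range(n)):          # U x = Y, bottom to top
        x[i] = (Y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x
```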

Gaussian Elimination (pivot de Gauss)

Here, Gaussian Elimination is an effective way to generate $L$ and $U$.

Given a matrix $A$, we want to transform it into an upper triangular matrix. First we eliminate the first column: for each row $i\gt 1$, subtract $\frac{a_{i1}}{a_{11}}$ times row $1$, so that

$a_{i1}\leftarrow a_{i1}-a_{11}\times \frac{a_{i1}}{a_{11}}=0$

This is equivalent to multiplying on the left by:
$$
L_1=\left [\begin{matrix}
1 & 0 & \cdots & 0\\
-\frac{a_{21}}{a_{11}} & 1\\
\vdots & & \ddots\\
-\frac{a_{n1}}{a_{11}} & & & 1
\end{matrix}\right ]
$$
$$
A^{(1)}=L_1A^{(0)}
$$
Hence we define:
$$
L_i=\left [\begin{matrix}
1 & & & \\
& \ddots & & \\
& & 1 & \\
& & l_{i+1,i} & \ddots \\
& & \vdots & \\
& & l_{n,i} & & 1
\end{matrix}\right ]
$$
with
$$
l_{i,j}=-\frac{a_{i,j}^{(j-1)}}{a_{j,j}^{(j-1)}}
$$
After $N-1$ steps,
$$
A^{(N-1)}=L_{N-1}\cdots L_1A^{(0)}
$$
$A^{(N-1)}$ is an upper triangular matrix, so we set $U=A^{(N-1)}$. What's more, each $L_i^{-1}$ is obtained by simply flipping the signs of the off-diagonal entries of $L_i$, and multiplying these inverses together just collects the corresponding columns, so
$$
L=(L_{N-1}\cdots L_1)^{-1}=L_1^{-1}\cdots L_{N-1}^{-1}=\left [\begin{matrix}
1 & & & \\
-l_{2,1}& 1 & & \\
\vdots& \ddots & \ddots & \\
-l_{n,1}& \cdots & -l_{n,n-1} & 1
\end{matrix}\right ]
$$
and $A=LU$.
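
A minimal sketch of this elimination in Python (`lu_by_elimination` is my name for it; no pivoting, so each pivot $a_{ii}^{(i-1)}$ is assumed nonzero). It stores the positive multipliers $\frac{a_{ji}^{(i-1)}}{a_{ii}^{(i-1)}}$, i.e. the $-l_{j,i}$ above, directly into $L$:

```python
import numpy as np

def lu_by_elimination(A):
    """LU via Gaussian elimination: zero out each column below its pivot,
    recording the multipliers. Sketch without pivoting."""
    U = A.astype(float).copy()
    n = U.shape[0]
    L = np.eye(n)
    for i in range(n - 1):
        for j in range(i + 1, n):
            m = U[j, i] / U[i, i]     # multiplier a_ji^(i-1) / a_ii^(i-1) = -l_ji
            L[j, i] = m               # column i of L
            U[j, i:] -= m * U[i, i:]  # row operation; entries left of i stay 0
    return L, U                       # U is A^(N-1), and A = L U
```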

LDLT Decomposition

In this case, $A$ is a symmetric matrix: $A^T=A$.

We introduce a matrix $D$: $D=Diag(U)$, so
$$
U=D\tilde{U}
$$
$$
\tilde{U}=
\left [\begin{matrix}
1 & \frac{u_{12}}{u_{11}} & \cdots & \frac{u_{1n}}{u_{11}}\\
0 & 1 & \cdots & \frac{u_{2n}}{u_{22}}\\
\vdots & \vdots & \ddots &\vdots\\
0 & 0 & \cdots & 1
\end{matrix}\right ]
$$
So $A=LD\tilde{U}$; as $A^T=A$,
$$
\begin{aligned}
LD\tilde{U}&=(LD\tilde{U})^T\\
LD\tilde{U}&=\tilde{U}^TDL^T
\end{aligned}
$$

So one of the solutions is: $\tilde{U}^T=L$

As shown in the Doolittle decomposition,
$$l_{ji}u_{ii}=a_{ji}-\sum_{k=1}^{i-1}l_{jk}u_{ki}$$
So if every $u_{ii}\neq 0$, $L$ and $U$ are uniquely determined, hence necessarily $\tilde{U}^T=L$:
$$
A=LDL^T
$$
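
A minimal sketch of the resulting $LDL^T$ factorization (`ldlt` is a hypothetical helper; no pivoting, so all $d_{ii}$ are assumed nonzero):

```python
import numpy as np

def ldlt(A):
    """LDL^T for symmetric A: A = L D L^T, unit lower triangular L, diagonal D.
    Sketch without pivoting; assumes all d_ii are nonzero."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        # a_jj = d_j + sum_{k<j} l_jk^2 d_k
        d[j] = A[j, j] - (L[j, :j] ** 2) @ d[:j]
        for i in range(j + 1, n):
            # a_ij = l_ij d_j + sum_{k<j} l_ik d_k l_jk
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ d[:j]) / d[j]
    return L, np.diag(d)
```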

Proposition:
If every leading principal minor of $A$ is nonzero => the LU decomposition exists and is unique

Demonstration

In the previous introduction of Gaussian elimination, an essential condition to ensure existence and uniqueness is that each pivot $a_{ii}^{(i-1)}\neq 0$.

Conversely, if the determinant of some leading principal minor is zero => its columns are linearly dependent. After doing Gaussian elimination, one of those columns becomes entirely zero on and above the diagonal, so some pivot $a_{ii}^{(i-1)}=0$.

Cholesky

In the Real Number Field

In the real number field, a Hermitian matrix is simply a symmetric matrix.

Demonstration that $D$ is positive definite:

$L$'s diagonal is ones and $L$ is a triangular matrix => $L$ is invertible

$A$ is positive definite => $D=L^{-1}A(L^{-1})^T\gt 0$ => every $d_{ii}\gt 0$

Hence we can take the square root of $D$: define $\tilde{D}$ with $\tilde{d}_{ii}=\sqrt{d_{ii}}$, so
$$
D=\tilde{D}\tilde{D}=\tilde{D}\tilde{D}^T
$$
Hence
$$
A=L\tilde{D}\tilde{D}^TL^T=(L\tilde{D})(L\tilde{D})^T
$$
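
Reusing the hypothetical `ldlt` sketch above, the Cholesky factor is obtained by scaling the columns of $L$ with $\sqrt{d_{ii}}$; a small numerical check:

```python
import numpy as np

A = np.array([[4., 2.],
              [2., 3.]])                    # symmetric positive definite
L, D = ldlt(A)                              # from the ldlt sketch above
L_chol = L @ np.diag(np.sqrt(np.diag(D)))   # L D~, with d~_ii = sqrt(d_ii)
assert np.allclose(L_chol @ L_chol.T, A)    # A = (L D~)(L D~)^T
```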

Construction

Writing $\tilde{L}=L\tilde{D}$ so that $A=\tilde{L}\tilde{L}^T$, the relation $a_{ij}=\sum_{k=1}^{\min(i,j)}\tilde{l}_{ik}\tilde{l}_{jk}$ gives, column by column:
$$
\tilde{l}_{jj}=\sqrt{a_{jj}-\sum_{k=1}^{j-1}\tilde{l}_{jk}^2},\qquad
\tilde{l}_{ij}=\frac{a_{ij}-\sum_{k=1}^{j-1}\tilde{l}_{ik}\tilde{l}_{jk}}{\tilde{l}_{jj}}\quad(i\gt j)
$$
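
A minimal sketch of this construction (`cholesky` is my naming; it assumes $A$ is symmetric positive definite, which keeps the arguments of the square roots positive):

```python
import numpy as np

def cholesky(A):
    """Cholesky A = L L^T for symmetric positive definite A,
    built column by column from the recurrences above."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        L[j, j] = np.sqrt(A[j, j] - L[j, :j] @ L[j, :j])
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L
```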

Conjugate Gradient

Precondition:

  • $A$ symmetric positive definite

Basic idea

Given a set of linearly independent vectors $r_0, r_1, …, r_{n-1}$, we want to construct vectors $p_0, p_1, …, p_{n-1}$ which satisfy:
$$\langle p_i,p_j\rangle=0\quad(i\neq j)$$
Then $x$ can be represented as: $x=\sum_{i=0}^{n-1}\alpha_ip_i$

Process

  1. Define inner product $\langle.,.\rangle$ as:
    $$\langle x,y\rangle=x^TAy$$
  2. Expand $\alpha_i$ (using $x^TAp_i=(Ax)^Tp_i=b^Tp_i$):
    $$
    \alpha_i=\frac{\langle x,p_i\rangle}{\langle p_i,p_i\rangle}
    =\frac{x^TAp_i}{p_i^TAp_i}
    =\frac{b^Tp_i}{p_i^TAp_i}
    $$
  3. Define $r_i$:
    $$r_i=b-Ax_i,\;x_i=\sum_{k\lt i}\alpha_kp_k$$
  4. Find conjugate vectors $p_0,…,p_{n-1}$:
    given $r_i$, let
    $$p_i=r_i-\sum_{k\lt i}\beta_kp_k$$
    with $\beta_k=\frac{\langle r_i,p_k\rangle}{\langle p_k,p_k\rangle}$ (a direct implementation of these four steps is sketched below)
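
These four steps translate almost literally into code; the naive sketch below (`cg_naive` is my naming, assuming $A$ symmetric positive definite) recomputes every projection at each step, which is exactly what the simplifications below eliminate:

```python
import numpy as np

def cg_naive(A, b, tol=1e-12):
    """Naive conjugate directions: A-conjugate Gram-Schmidt on the residuals.
    Direct transcription of steps 1-4; O(i) projections per iteration."""
    n = len(b)
    x = np.zeros(n)
    ps = []                                # conjugate directions so far
    for _ in range(n):
        r = b - A @ x                      # r_i = b - A x_i
        if np.linalg.norm(r) < tol:
            break
        # p_i = r_i - sum_{k<i} <r_i,p_k>/<p_k,p_k> p_k   (A-inner product)
        p = r - sum((r @ A @ q) / (q @ A @ q) * q for q in ps)
        ps.append(p)
        alpha = (b @ p) / (p @ A @ p)      # alpha_i = b^T p_i / p_i^T A p_i
        x = x + alpha * p                  # x_{i+1} = x_i + alpha_i p_i
    return x
```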

Simplification

Simplify $r_i$

$$
\begin{aligned}
r_{i+1}&=b-Ax_{i+1}\\
&=b-A(x_i+\alpha_ip_i)\\
&=r_i-\alpha_iAp_i
\end{aligned}
$$

Simplify $\beta_k$

$$
\beta_k=\frac{\langle r_i,p_k\rangle}{\langle p_k,p_k\rangle}
=\frac{r_i^TAp_k}{p_k^TAp_k}
\qquad (k\lt i)
$$
Since $p_k^TAx_i=\alpha_k\,p_k^TAp_k=b^Tp_k$ by the definition of $\alpha_k$,
$$
p_k^Tr_i=p_k^T(b-Ax_i)=0\qquad\forall k\lt i
$$
Moreover $r_j\in\mathrm{span}(p_0,…,p_j)$, so the previous identity also gives $r_j^Tr_i=0$ for all $j\lt i$. Substituting $Ap_k=\frac{1}{\alpha_k}(r_k-r_{k+1})$:
$$
\begin{aligned}
p_k^TAr_i&=\frac{1}{\alpha_k}(r_k-r_{k+1})^Tr_i\\
&=
\begin{cases}
0&k\lt i-1\\
-\frac{1}{\alpha_{i-1}}r_i^Tr_i&k=i-1
\end{cases}
\end{aligned}
$$
We can use the same approach to calculate $p_k^TAp_k$

Simplify $\alpha_i$

Using $x_i^TAp_i=0$ (conjugacy) and $r_i^Tp_i=r_i^Tr_i$ (since $r_i^Tp_k=0$ for $k\lt i$):
$$
\alpha_i
=\frac{b^Tp_i}{p_i^TAp_i}=\frac{(r_i+Ax_i)^Tp_i}{p_i^TAp_i}=\frac{r_i^Tp_i}{p_i^TAp_i}=\frac{r_i^Tr_i}{p_i^TAp_i}
$$

Algorithm

Conjugate Gradient

Input: right-hand side $b$, symmetric positive definite matrix $A$, and initial iterate $x_0\in R^n$ (usually taken as $0^n$)

Output: solution $x$

Initialize
$r_0=b-Ax_0$
$p_0=r_0$

repeat
    $\alpha_k:=\frac{r_k^Tr_k}{p_k^TAp_k}$
    $x_{k+1}:=x_k+\alpha_kp_k$
    $r_{k+1}:=r_k-\alpha_kAp_k$
     if $\|r_{k+1}\|\lt \epsilon$  then
         return $x_{k+1}$
     end
    $\beta_k:=\frac{r_{k+1}^Tr_{k+1}}{r_k^Tr_k}$
    $p_{k+1}:=r_{k+1}+\beta_kp_k$
until convergence
return $x_{k+1}$
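
A minimal Python sketch of this algorithm (`conjugate_gradient` is my naming; $\epsilon$ bounds the residual norm):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, eps=1e-10, max_iter=None):
    """Conjugate Gradient for symmetric positive definite A."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x
    p = r.copy()
    rs = r @ r                    # r_k^T r_k, reused across iterations
    for _ in range(max_iter or n):
        Ap = A @ p
        alpha = rs / (p @ Ap)     # alpha_k
        x += alpha * p            # x_{k+1}
        r -= alpha * Ap           # r_{k+1} = r_k - alpha_k A p_k
        rs_new = r @ r
        if np.sqrt(rs_new) < eps:
            break
        beta = rs_new / rs        # beta_k = r_{k+1}^T r_{k+1} / r_k^T r_k
        p = r + beta * p          # p_{k+1}
        rs = rs_new
    return x
```

In exact arithmetic CG terminates in at most $n$ steps; in floating point it is used as an iterative method, which is why the $\|r_{k+1}\|\lt \epsilon$ test drives termination.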

Demonstration (Schmidt orthogonalization)

Schmidt orthogonalization is a method to obtain a set of orthogonal vectors from a set of linearly independent vectors.

Here, as mentioned before:
$$p_i=r_i-\sum_{k\lt i}\beta_kp_k$$
We use mathematical induction to show that the $p_i$ are pairwise conjugate:

  1. for $p_0,p_1$: $\langle p_0,p_1\rangle=0$ by the choice of $\beta_0$
  2. for $p_j$, assuming $p_0,…,p_{j-1}$ are pairwise conjugate, $\forall i\lt j$:
    $$
    \begin{aligned}
    \langle p_i,p_j\rangle&=\langle p_i,r_j-\sum_{k\lt j}\beta_kp_k\rangle\\
    &=\langle p_i,r_j\rangle-\langle p_i,\sum_{k\lt j}\beta_kp_k\rangle\\
    &=\langle p_i,r_j\rangle-\langle p_i,\beta_ip_i\rangle\\
    &=\langle p_i,r_j\rangle-\frac{\langle r_j,p_i\rangle}{\langle p_i,p_i\rangle}\langle p_i,p_i\rangle\\
    &=0
    \end{aligned}
    $$