Introduction
The paper [1] uses Laplacian eigenmaps to compute the coordinates of a graph drawing. To address the running-time problem on large-scale graphs, the authors propose an algorithm called ACE for fast computation of the eigenvectors.
Background
Laplacian Eigenmaps
Given:
- Laplacian Matrix L:
For a graph G, the Laplacian matrix of G is L = D - A,
where D is the degree matrix (a diagonal matrix giving the (weighted) degree of each node) and A is the adjacency matrix.
As described in [1],
$$
L_{ij} =
\begin{cases}
\sum_{k=1}^n w_{ik} & i = j\\
-w_{ij} & i \neq j
\end{cases}
$$
- Mass Matrix M: a diagonal matrix that assigns a positive mass to each node.
- Positive Semi-Definite Graph G(V,E,M):
V: a set of nodes
E: a set of weighted edges such that L is positive semi-definite.
We can describe the whole graph using only L and M: G(L,M).
- A vector of node coordinates x:
$$x = (x_1,\dots,x_k,\dots,x_n)^T$$
Basic Idea:
Hall energy [2]:
$$
E(x) = x^TLx = \frac{1}{2}\sum_{i,j=1}^nw_{ij}(x_i - x_j)^2
$$
Target:
connected nodes should be placed close to each other.
Minimization problem:
$$
\min_x\; x^TLx
$$
with a constraint that fixes the overall scale:
$$
x^TMx = 1
$$
and sometimes also the constraint:
$$
x^TM\cdot1_n = 0
$$
to avoid the case where all the nodes are at the same location.
Solution:
Solve the generalized eigenproblem $Lu = \mu Mu$ and keep the eigenvectors of the smallest eigenvalues.
Obviously $\mu_1=0$, $u_1=1_n$ is the degenerate solution and is discarded.
In the 2D case, $u_2$ gives the x-coordinates and $u_3$ gives the y-coordinates.
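As a concrete illustration, here is a minimal Python/NumPy sketch of this recipe on a tiny made-up graph; the weight matrix and the choice of mass matrix M = D are assumptions for the example, not taken from [1].

```python
import numpy as np
from scipy.linalg import eigh

# Small example graph given by a symmetric weight matrix W (an illustrative assumption).
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(W.sum(axis=1))   # degree matrix
L = D - W                    # graph Laplacian
M = D                        # mass matrix; here we simply use the degrees as masses

# Sanity check of the Hall energy identity x^T L x = 1/2 * sum_ij w_ij (x_i - x_j)^2
x = np.random.rand(4)
lhs = x @ L @ x
rhs = 0.5 * sum(W[i, j] * (x[i] - x[j]) ** 2 for i in range(4) for j in range(4))
assert np.isclose(lhs, rhs)

# Solve the generalized eigenproblem L u = mu M u; eigh returns eigenvalues in ascending order.
mu, U = eigh(L, M)

# u_1 (the constant vector, mu_1 = 0) is the degenerate solution; use u_2 and u_3 as coordinates.
coords = U[:, 1:3]
print(coords)
```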
Demonstration
- Demonstration of Hall energy
$$
\begin{aligned}
x^TLx&=x^T(D-A)x\\
&=x^TDx-x^TAx\\
&=\sum_{i=1}^n \Big(\sum_{j=1}^n w_{ij}\Big)x_i^2-\sum_{i=1}^n\sum_{j\neq i}w_{ij}x_ix_j\\
&=\frac{1}{2}\sum_{i,j}w_{ij}(x_i-x_j)^2
\end{aligned}
$$
- Solution of the minimization problem
Using a Lagrange multiplier:
$$
f(x) = x^TLx + \lambda(x^TMx-1)
$$
At the minimum, the derivative of f(x) is zero.
Using the following convention for vector derivatives:
$$
\frac{\partial y}{\partial x} = Y,\qquad Y_{ij} = \frac{\partial y_i}{\partial x_j}
$$
so $\frac{\partial x}{\partial x} = I_n$ (the identity matrix).
We can calculate that:
$$
\begin{aligned}
\frac{\partial f}{\partial x} &= Lx+L^Tx+\lambda Mx + \lambda M^Tx\\
&= 0
\end{aligned}
$$
As L and M are symmetric matrices, this reduces to
$$
Lx = -\lambda Mx
$$
which is exactly the generalized eigenproblem $Lu=\mu Mu$ above, with $\mu=-\lambda$.
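The matrix-derivative step can be sanity-checked numerically; in the following sketch the symmetric matrix and the test point are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
S = rng.random((n, n))
L = S + S.T                      # any symmetric matrix stands in for the Laplacian here
x = rng.random(n)

f = lambda v: v @ L @ v          # f(x) = x^T L x

# Analytic gradient for symmetric L: (L + L^T) x = 2 L x
grad = 2 * L @ x

# Central finite differences
eps = 1e-6
fd = np.zeros(n)
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    fd[i] = (f(x + e) - f(x - e)) / (2 * eps)

print(np.allclose(grad, fd, atol=1e-5))   # True
```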
ACE Algorithm
The ACE algorithm accelerates the computation of eigenvectors on large graphs, e.g. graphs with $10^5$ nodes.
The algorithm consists of two steps. The coarsening step produces a good prediction of the eigenvectors from a coarsened, much smaller graph.
The refinement step starts from the predicted eigenvectors and uses the power iteration (PI) algorithm to turn them into more accurate eigenvectors of the original graph.
Coarsening
The basic idea of coarsening is to project the large graph, with up to $10^5$ nodes, onto a much smaller one.
The link between the small graph $G^c$ and the initial large-scale graph $G$ is the interpolation matrix $A$, which interpolates m-dimensional vectors $y \in \mathbb{R}^m$ into n-dimensional ones. The interpolation is written as:
$$x = Ay$$
Interpolation matrix
Properties
- A has size $n\times m$ with $m \lt n$
- All elements $A_{ij} \geq 0$
- The entries of each row act as interpolation weights: $\sum_{j=1}^mA_{ij} = 1$
- Full column rank: Rank(A) = m
Generation
- Edge Contraction Interpolation
Cluster the nodes into subsets (by contracting edges) and contract all the nodes in the same subset into a single coarse node; a small sketch is given after this list.
- Weighted Interpolation
//TBD
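Below is a minimal sketch of an edge-contraction interpolation matrix; the assignment of fine nodes to coarse nodes is an assumed input (in practice it would come from contracting edges), not computed here.

```python
import numpy as np

# cluster[i] = index of the coarse node that fine node i is contracted into
# (an assumed assignment, e.g. obtained by contracting a matching of edges)
cluster = np.array([0, 0, 1, 1, 2])      # n = 5 fine nodes, m = 3 coarse nodes

n, m = len(cluster), cluster.max() + 1
A = np.zeros((n, m))
A[np.arange(n), cluster] = 1.0           # each row has a single 1: pure contraction weights

# Properties listed above: non-negative entries, rows sum to 1, full column rank
assert np.all(A >= 0)
assert np.allclose(A.sum(axis=1), 1.0)
assert np.linalg.matrix_rank(A) == m
```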
Eigenmap problem
$$
\min_y\; y^TA^TLAy\\
y^TA^TMAy=1
$$
with constraint
$$y^TA^TM\cdot1_n = 0$$
In order to solve this problem, we let
$$
L^c = A^TLA,\qquad M^c = A^TM
$$
so the objective and the third constraint become:
$$\min_y\; y^TL^cy$$
and
$$y^TM^c \cdot 1_n = 0$$
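Putting the pieces together, here is a small runnable sketch of the coarse problem; the 5-node path graph, the choice M = D, and the clustering are all illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eigh

# Fine graph: a 5-node path, with mass matrix M = D (illustrative assumptions).
W = np.diag(np.ones(4), 1)
W = W + W.T
D = np.diag(W.sum(axis=1))
L = D - W
M = D

# Interpolation matrix contracting nodes {0,1}, {2,3} and {4} (assumed clustering).
A = np.zeros((5, 3))
A[[0, 1], 0] = 1.0
A[[2, 3], 1] = 1.0
A[4, 2] = 1.0

Lc = A.T @ L @ A              # coarse Laplacian L^c
Mc_scale = A.T @ M @ A        # matrix in the coarse scale constraint y^T A^T M A y = 1

mu_c, Y = eigh(Lc, Mc_scale)  # coarse generalized eigenproblem
y = Y[:, 1]                   # first non-degenerate coarse eigenvector
x_guess = A @ y               # interpolate back to get an initial guess on the fine graph
print(x_guess)
```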
Demonstration
The demonstration here explains why we can change the problem into the Laplacian eigenmap problem with $L^c$ and $M^c$.
That is, we should prove that $G^c(L^c,M^c)$ is still a positive semi-definite graph.
$L^c$ is positive semi-definite
because, for any $y \in \mathbb{R}^m$,
$$
y^TL^cy = y^TA^TLAy = (Ay)^TL(Ay)\geq 0
$$
since $L$ is positive semi-definite. Moreover, A has full column rank, so $y\neq 0 \Rightarrow Ay \neq 0$, and no non-zero coarse vector is mapped to zero.
Note that in general $M^c \neq A^TMA$; however, for interpolation matrices generated as above, the two can be regarded as close to each other.
The coarsening process can be applied repeatedly until a sufficiently small graph is obtained.
Refinement
After coarsening, we have the eigenvectors of the coarsened graph, $u_1',u_2',\dots,u_p'$, from which we obtain the guessed initial eigenvectors $\hat{u}_1,\dots,\hat{u}_p$ of the original graph (by interpolation, $\hat{u}_i = Au_i'$).
The refinement process then refines these guesses into accurate eigenvectors of the initial graph.
Power Iteration Algorithm
The power iteration algorithm [3] is a simple approach to computing the largest eigenvalue (and the corresponding eigenvector) of a matrix.
Proposition
Given a symmetric matrix A and a normalized non-zero initial vector $v^{(0)}$ that is not orthogonal to the eigenvector of the largest eigenvalue, the iteration
$$
v^{(k+1)} = Av^{(k)},\qquad v^{(k+1)} \leftarrow \frac{v^{(k+1)}}{|v^{(k+1)}|}
$$
converges to the eigenvector of the largest eigenvalue.
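A minimal NumPy sketch of this iteration; the test matrix is an arbitrary symmetric example.

```python
import numpy as np

def power_iteration(A, num_iters=1000, seed=0):
    """Approximate the dominant eigenvector of a symmetric matrix A."""
    v = np.random.default_rng(seed).random(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = A @ v                 # multiply by A ...
        v /= np.linalg.norm(v)    # ... and re-normalize
    return v

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v = power_iteration(A)
print(v, v @ A @ v)               # eigenvector and its Rayleigh quotient (largest eigenvalue)
```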
Demonstration
Suppose we have a set of separated eigenvalues
$\lambda_1,\dots,\lambda_n$, and the corresponding orthogonal eigenvectors $e_1,\dots,e_n$.
The initial vector is:
$$
v^{(0)} = c_1e_1+…+c_ne_n
$$
When multiplied by the matrix A k times, the vector becomes:
$$
\begin{aligned}
v^{(k)} &= c_1\lambda_1^ke_1 +\dots+c_n\lambda_n^ke_n \\
&= \lambda_1^k\left(c_1e_1+c_2\left(\frac{\lambda_2}{\lambda_1}\right)^ke_2+\dots+c_n\left(\frac{\lambda_n}{\lambda_1}\right)^ke_n\right)
\end{aligned}
$$
As $\lambda_i<\lambda_1$ for $i\gt 1$, when k is very large, $(\frac{\lambda_i}{\lambda_1})^k\to0$, so
$$v^{(k)}\to\lambda_1^kc_1e_1$$
and the eigenvalue can be estimated from the ratio of corresponding components of successive (un-normalized) iterates:
$$\lambda_1 \approx \frac{v_i^{(k+1)}}{v_i^{(k)}}$$
If we normalize the vector at each step:
$$
\begin{aligned}
v^{(k)} &= \frac{A^kv^{(0)}}{|A^kv^{(0)}|}\\
&= \left(\frac{\lambda_1}{|\lambda_1|}\right)^k\frac{c_1}{|c_1|}\cdot\frac{e_1+\frac{1}{c_1}\sum_{i=2}^nc_i(\frac{\lambda_i}{\lambda_1})^ke_i}{\left|e_1+\frac{1}{c_1}\sum_{i=2}^nc_i(\frac{\lambda_i}{\lambda_1})^ke_i\right|}\\
&\xrightarrow{k\to\infty}\left(\frac{\lambda_1}{|\lambda_1|}\right)^k\frac{c_1}{|c_1|}\cdot\frac{e_1}{|e_1|}
\end{aligned}
$$
Therefore, $v^{(k)}$ converges (up to sign) to the normalized eigenvector $e_1$.
Problem Reformulation
Let
$B=M^{-\frac{1}{2}}LM^{-\frac{1}{2}}$, $v=M^{\frac{1}{2}}u$
so the problem $Lu=\mu Mu$ is changed into:
$$
Bv=\mu v
$$
The power iteration algorithm computes the largest eigenvalue, whereas what we need is the smallest ones. Therefore we transform $B$ into $\hat{B}$:
$$
\hat{B}=g\cdot I-B
$$
with
$$
g=\max_i\Big(B_{ii}+\sum_{j\neq i}|B_{ij}|\Big)
$$
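A small numerical sketch of this shift; the example graph (a triangle) and the choice M = D are assumptions for illustration.

```python
import numpy as np

# Laplacian and diagonal mass matrix of a small example graph (a triangle).
W = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W
M = D

M_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(M)))
B = M_inv_sqrt @ L @ M_inv_sqrt                          # B = M^{-1/2} L M^{-1/2}

# g = max_i (B_ii + sum_{j != i} |B_ij|); by Gershgorin, every eigenvalue of B is <= g
g = max(B[i, i] + np.sum(np.abs(B[i])) - abs(B[i, i]) for i in range(len(B)))
B_hat = g * np.eye(len(B)) - B

# B and B_hat share eigenvectors; eigenvalues are mapped mu -> g - mu, so their order reverses
print(np.linalg.eigvalsh(B))
print(np.linalg.eigvalsh(B_hat))
```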
Demonstration
- $\hat{B}$ is positive semi-definite: by the Gershgorin circle theorem, every eigenvalue of $B$ is at most $g$, so the eigenvalues of $g\cdot I - B$ are non-negative.
- $\hat{B}$ has the same eigenvectors as $B$, but with the order of the eigenvalues reversed:
$$
\begin{aligned}
\hat{B}v &= g\cdot Iv - Bv\\
&= (g-\mu)v
\end{aligned}
$$
So the eigenvector associated with the largest eigenvalue of $\hat{B}$ is the eigenvector associated with the smallest eigenvalue of $B$; power iteration can then be applied.
Algorithm
Here is the algorithm of the refinement process:
% $\hat{u}_2, \hat{u}_3, \dots, \hat{u}_p$ is the set of guessed eigenvectors derived from $u_2',u_3',\dots,u_p'$
% L, M, B and $\hat{B}$ are already known or computed
% $\epsilon$ is the convergence tolerance
$v_1 = M^{\frac{1}{2}}\cdot 1_n$
$v_1 = \frac{v_1}{|v_1|}$
for i = 2 to p
$\hat{v_i}\leftarrow M^\frac{1}{2}\hat{u}_i$
$\hat{v_i} \leftarrow \frac{\hat{v_i}}{|\hat{v_i}|}$
repeat
$v_i\leftarrow \hat{v_i}$
for $j=1$ to $i-1$
% orthogonalize to remove the components along the previously computed eigenvectors
$v_i\leftarrow v_i-(v_i^Tv_j)v_j$
end for
$\hat{v_i}\leftarrow \hat{B}\cdot v_i$
$\hat{v_i} = \frac{\hat{v_i}}{|\hat{v_i}|}$
until $\hat{v_i}^T\cdot v_i\gt 1-\epsilon$
$v_i\leftarrow \hat{v_i}$
$u_i\leftarrow M^{-\frac{1}{2}}v_i$
end for
return $u_2,u_3,\dots,u_p$
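Below is a sketch of this refinement loop in NumPy, following the pseudocode above; the function name, the construction of $\hat{B}$ inside it, and the way the initial guesses are supplied are assumptions of this sketch rather than the paper's implementation.

```python
import numpy as np

def refine(L, M, U_hat, eps=1e-9, max_iters=100_000):
    """Refine guessed eigenvectors (columns of U_hat, i.e. u_2..u_p) of L u = mu M u."""
    n = L.shape[0]
    m_sqrt = np.diag(np.sqrt(np.diag(M)))
    m_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(M)))
    B = m_inv_sqrt @ L @ m_inv_sqrt
    g = max(B[i, i] + np.sum(np.abs(B[i])) - abs(B[i, i]) for i in range(n))
    B_hat = g * np.eye(n) - B                 # shift so the wanted eigenvectors become dominant

    v1 = m_sqrt @ np.ones(n)                  # known degenerate direction v_1
    V = [v1 / np.linalg.norm(v1)]
    refined = []
    for u in U_hat.T:                         # refine each guessed eigenvector in turn
        v_hat = m_sqrt @ u
        v_hat /= np.linalg.norm(v_hat)
        for _ in range(max_iters):
            v = v_hat
            for w in V:                       # orthogonalize against previously found vectors
                v = v - (v @ w) * w
            v_hat = B_hat @ v
            v_hat /= np.linalg.norm(v_hat)
            if v_hat @ v > 1 - eps:           # converged
                break
        V.append(v_hat)
        refined.append(m_inv_sqrt @ v_hat)    # map back: u_i = M^{-1/2} v_i
    return refined
```

For example, guesses interpolated from the coarse level (such as the non-degenerate columns of `A @ Y` from the coarsening sketch) could be passed as `U_hat`.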
Reference
[1] Y. Koren, L. Carmel and D. Harel, "ACE: A Fast Multiscale Eigenvectors Computation for Drawing Huge Graphs", Proc. IEEE Information Visualization, 2002, p. 137.
[2] K. M. Hall, "An r-dimensional Quadratic Placement Algorithm", Management Science 17 (1970), 219-229.
[3] Wikipedia, "Power iteration", https://en.wikipedia.org/wiki/Power_iteration (accessed 24 Jul. 2018).