next up previous contents index
Next: Topic-specific PageRank Up: PageRank Previous: Definition:   Contents   Index


The PageRank computation

How do we compute PageRank values? Recall the definition of a left eigenvector from Equation 214; the left eigenvectors of the transition probability matrix $P$ are $N$-vectors $\vec{\pi}$ such that
\begin{displaymath}
\vec{\pi}\matrix{P} = \lambda \vec{\pi}.
\end{displaymath} (255)

The $N$ entries in the principal eigenvector $\vec{\pi}$ are the steady-state probabilities of the random walk with teleporting, and thus the PageRank values for the corresponding web pages. We may interpret Equation 255 as follows: if $\vec{\pi}$ is the probability distribution of the surfer across the web pages, he remains in the steady-state distribution $\vec{\pi}$. Given that $\vec{\pi}$ is the steady-state distribution, we have that $\pi P = 1\pi$, so 1 is an eigenvalue of P. Thus if we were to compute the principal left eigenvector of the matrix $P$ -- the one with eigenvalue 1 -- we would have computed the PageRank values.

There are many algorithms available for computing left eigenvectors; the references at the end of Chapter 18 and the present chapter are a guide to these. We give here a rather elementary method, sometimes known as power iteration. If $\vec{x}$ is the initial distribution over the states, then the distribution at time $t$ is $\vec{x}P^t$. As $t$ grows large, we would expect that the distribution $\vec{x}P^t$[*] is very similar to the distribution $\vec{x}P^{t+1}$, since for large $t$ we would expect the Markov chain to attain its steady state. By Theorem 21.2.1 this is independent of the initial distribution $\vec{x}$. The power iteration method simulates the surfer's walk: begin at a state and run the walk for a large number of steps $t$, keeping track of the visit frequencies for each of the states. After a large number of steps $t$, these frequencies ``settle down'' so that the variation in the computed frequencies is below some predetermined threshold. We declare these tabulated frequencies to be the PageRank values.

We consider the web graph in Exercise 21.2.3 with $\alpha=0.5$. The transition probability matrix of the surfer's walk with teleportation is then


\begin{displaymath}
P=\left(
\begin{array}{ccc}
1/6 & 2/3 & 1/6 \\
5/12 & 1/6 & 5/12 \\
1/6 & 2/3 & 1/6 \\
\end{array} \right).
\end{displaymath} (256)

Imagine that the surfer starts in state 1, corresponding to the initial probability distribution vector $\vec{x_0}=(1\ 0\ 0)$. Then, after one step the distribution is
\begin{displaymath}
\vec{x_0}P=\left(
\begin{array}{ccc}
1/6 & 2/3 & 1/6 \\
\end{array} \right)=\vec{x_1}.
\end{displaymath} (257)

After two steps it is
\begin{displaymath}
\vec{x_1}P=\left(\begin{array}{ccc}
1/6 & 2/3 & 1/6 \\
\...
...ay}{ccc}
1/3 & 1/3 & 1/3 \\
\end{array}\right) = \vec{x_2}.
\end{displaymath} (258)

Continuing in this fashion gives a sequence of probability vectors as shown in Figure 21.3 .

Figure 21.3: The sequence of probability vectors.
\begin{figure}\begin{tabular}{\vert c\vert\vert c\vert c\vert c\vert}
\hline
$...
...\cdots$\\
$\vec{x}$\ & 5/18 & 4/9 & 5/18\\
\hline
\end{tabular}
\end{figure}

Continuing for several steps, we see that the distribution converges to the steady state of $\vec{x}=(5/18\quad 4/9\quad 5/18)$. In this simple example, we may directly calculate this steady-state probability distribution by observing the symmetry of the Markov chain: states 1 and 3 are symmetric, as evident from the fact that the first and third rows of the transition probability matrix in Equation 256 are identical. Postulating, then, that they both have the same steady-state probability and denoting this probability by $p$, we know that the steady-state distribution is of the form $\vec{\pi}=(p\ \ 1-2p\ \ p)$. Now, using the identity $\vec{\pi}=\vec{\pi}P$, we solve a simple linear equation to obtain $p=5/18$ and consequently, $\vec{\pi}=(5/18\ 4/9\ 5/18)$.

The PageRank values of pages (and the implicit ordering amongst them) are independent of any query a user might pose; PageRank is thus a query-independent measure of the static quality of each web page (recall such static quality measures from Section 7.1.4 ). On the other hand, the relative ordering of pages should, intuitively, depend on the query being served. For this reason, search engines use static quality measures such as PageRank as just one of many factors in scoring a web page on a query. Indeed, the relative contribution of PageRank to the overall score may again be determined by machine-learned scoring as in Section 15.4.1 .

\begin{figure}
% latex2html id marker 32410
\vspace{5mm}
\begin{center}
\begi...
...he
word that occurs in the anchor text of the corresponding link.}
\end{figure}

Worked example. Consider the graph in Figure 21.4 . For a teleportation rate of 0.14 its (stochastic) transition probability matrix is:

\begin{displaymath}
\begin{tabular}{rrrrrrr}
0.02&0.02&0.88&0.02&0.02&0.02&0.02\...
...2&0.45&0.45\\
0.02&0.02&0.02&0.31&0.31&0.02&0.31
\end{tabular}\end{displaymath} (259)

The PageRank vector of this matrix is:
\begin{displaymath}
\vec{x} = (
0.05\quad
0.04\quad
0.11\quad
0.25\quad
0.21\quad
0.04\quad
0.31
)\end{displaymath} (260)

Observe that in Figure 21.4 , $q_2$, $q_3$, $q_4$ and $q_6$ are the nodes with at least two in-links. Of these, $q_2$ has the lowest PageRank since the random walk tends to drift out of the top part of the graph - the walker can only return there through teleportation. End worked example.


next up previous contents index
Next: Topic-specific PageRank Up: PageRank Previous: Definition:   Contents   Index
© 2008 Cambridge University Press
This is an automatically generated page. In case of formatting errors you may want to look at the PDF edition of the book.
2009-04-07