

The optimal weight $g$

We begin by noting that for any training example $\Phi_j$ for which $s_T(d_j,q_j)=0$ and $s_B(d_j,q_j)=1$, the score computed by Equation 14 is $1-g$. In similar fashion, we may write down the score computed by Equation 14 for the three other possible combinations of $s_T(d_j,q_j)$ and $s_B(d_j,q_j)$; the results are summarized in Figure 6.6.

Figure 6.6: The four possible combinations of $s_T$ and $s_B$.
\begin{figure}
\begin{tabular}{|c|c||c|}
\hline
$s_T$ & $s_B$ & Score\\
\hline
0 & 0 & 0\\
0 & 1 & $1-g$\\
1 & 0 & $g$\\
1 & 1 & 1\\
\hline
\end{tabular}
\end{figure}
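
As a check on these entries, suppose Equation 14 is the weighted zone score $g\,s_T(d_j,q_j)+(1-g)\,s_B(d_j,q_j)$. That form is an assumption here, since this section does not restate the equation, but it agrees with the $g$ and $1-g$ entries. Substituting the four Boolean combinations then gives:

\begin{displaymath}
\begin{array}{lcl}
(s_T,s_B)=(0,0): & g\cdot 0+(1-g)\cdot 0 & = 0\\
(s_T,s_B)=(0,1): & g\cdot 0+(1-g)\cdot 1 & = 1-g\\
(s_T,s_B)=(1,0): & g\cdot 1+(1-g)\cdot 0 & = g\\
(s_T,s_B)=(1,1): & g\cdot 1+(1-g)\cdot 1 & = 1
\end{array}
\end{displaymath}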

Let $n_{01r}$ (respectively, $n_{01n}$) denote the number of training examples for which $s_T(d_j,q_j)=0$ and $s_B(d_j,q_j)=1$ and the editorial judgment is Relevant (respectively, Non-relevant). Then the contribution to the total error in Equation 17 from training examples for which $s_T(d_j,q_j)=0$ and $s_B(d_j,q_j)=1$ is

\begin{displaymath}
[1-(1-g)]^2n_{01r} + [0-(1-g)]^2n_{01n}.
\end{displaymath} (18)

By writing in similar fashion the error contributions from training examples of the other three combinations of values for $s_T(d_j,q_j)$ and $s_B(d_j,q_j)$ (and extending the notation in the obvious manner), the total error corresponding to Equation 17 is
\begin{displaymath}
(n_{01r}+n_{10n})g^2+(n_{10r}+n_{01n})(1-g)^2 + n_{00r} + n_{11n}.
\end{displaymath} (19)
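
For reference, the four per-combination contributions that sum to Equation 19 can be written out as follows. This sketch assumes, as in Equation 18, that a Relevant judgment is encoded as 1 and a Non-relevant judgment as 0; the counts $n_{00n}$ and $n_{11r}$ are the obvious extensions of the notation and drop out because their squared error is zero.

\begin{displaymath}
\begin{array}{lcl}
(s_T,s_B)=(0,0): & [1-0]^2n_{00r}+[0-0]^2n_{00n} & = n_{00r}\\
(s_T,s_B)=(0,1): & [1-(1-g)]^2n_{01r}+[0-(1-g)]^2n_{01n} & = g^2n_{01r}+(1-g)^2n_{01n}\\
(s_T,s_B)=(1,0): & [1-g]^2n_{10r}+[0-g]^2n_{10n} & = (1-g)^2n_{10r}+g^2n_{10n}\\
(s_T,s_B)=(1,1): & [1-1]^2n_{11r}+[0-1]^2n_{11n} & = n_{11n}
\end{array}
\end{displaymath}

Collecting the $g^2$ terms and the $(1-g)^2$ terms gives the two grouped coefficients in Equation 19.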

By differentiating Equation 19 with respect to $g$ and setting the result to zero, it follows that the optimal value of $g$ is

\begin{displaymath}
\frac{n_{10r}+n_{01n}}{n_{10r}+n_{10n}+n_{01r}+n_{01n}}.
\end{displaymath} (20)
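
The intermediate step can be sketched as follows. Differentiating Equation 19 with respect to $g$ and setting the derivative to zero gives

\begin{displaymath}
2g\,(n_{01r}+n_{10n}) - 2(1-g)\,(n_{10r}+n_{01n}) = 0,
\end{displaymath}

and solving for $g$ yields Equation 20. The second derivative, $2(n_{01r}+n_{10n}+n_{10r}+n_{01n})$, is positive whenever the denominator of Equation 20 is nonzero, so this stationary point is a minimum of the total error.

As a purely illustrative example with hypothetical counts (not taken from any data set), let $n_{10r}=30$, $n_{01n}=10$, $n_{10n}=5$ and $n_{01r}=15$. Then

\begin{displaymath}
g = \frac{30+10}{30+5+15+10} = \frac{40}{60} = \frac{2}{3}.
\end{displaymath}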

Exercises.


© 2008 Cambridge University Press
2009-04-07