next up previous contents index
Next: Maximum tf normalization Up: Variant tf-idf functions Previous: Variant tf-idf functions   Contents   Index


Sublinear tf scaling

It seems unlikely that twenty occurrences of a term in a document truly carry twenty times the significance of a single occurrence. Accordingly, there has been considerable research into variants of term frequency that go beyond counting the number of occurrences of a term. A common modification is to use instead the logarithm of the term frequency, which assigns a weight given by


\begin{displaymath}
\mbox{wf}_{t,d}=\left\{
\begin{array}{ll}
1+\log \mbox{tf}...
...x{tf}_{t,d}>0 \\
0 & \mbox{otherwise}
\end{array}.
\right.
\end{displaymath} (28)

In this form, we may replace $\mbox{tf}$ by some other function $\mbox{wf}$ as in (28), to obtain:

\begin{displaymath}
\mbox{wf-idf}_{t,d} = \mbox{wf}_{t,d} \times \mbox{idf}_t.
\end{displaymath} (29)

Equation (23) can then be modified by replacing tf-idf by wf-idf as defined in (29).



© 2008 Cambridge University Press
This is an automatically generated page. In case of formatting errors you may want to look at the PDF edition of the book.
2009-04-07