For the reasons given, a biword index is not the standard solution.
Rather, a positional index is most commonly
employed. Here, for each
term in the vocabulary, we store postings of the
form docID: position1, position2, ...
, as shown in
Figure 2.11 , where each position is a token index in the document.
Each posting will also usually record the term
frequency, for reasons discussed in Chapter 6 .
To process a phrase query, you still need to access the inverted index entries for each distinct term. As before, you would start with the least frequent term and then work to further restrict the list of possible candidates. In the merge operation, the same general technique is used as before, but rather than simply checking that both terms are in a document, you also need to check that their positions of appearance in the document are compatible with the phrase query being evaluated. This requires working out offsets between the words.
Worked example. Satisfying phrase queries.phrasequery Suppose the postings lists for to and be are as in Figure 2.11 , and the query is ``to be or not to be''. The postings lists to access are: to, be, or, not. We will examine intersecting the postings lists for to and be. We first look for documents that contain both terms. Then, we look for places in the lists where there is an occurrence of be with a token index one higher than a position of to, and then we look for another occurrence of each word with token index 4 higher than the first occurrence. In the above lists, the pattern of occurrences that is a possible match is:
to:End worked example....; 4:
...,429,433
; ...
![]()
be:...; 4:
...,430,434
; ...
![]()
The same general method is applied for within word proximity
searches, of the sort we saw in westlaw:
employment /3 placeHere, /