scoring module

This module contains classes for scoring (and sorting) search results.

Base classes

class whoosh.scoring.WeightingModel

Abstract base class for scoring models. A WeightingModel object provides a method, scorer, which returns an instance of whoosh.scoring.Scorer.

WeightingModel objects store the configuration information for the model (for example, the values of B and K1 in the BM25F model), and then create a scorer instance based on additional run-time information (the searcher, the field name, and the term text) to do the actual scoring.

final(searcher, docnum, score)

Returns a final score for each document. You can use this method in subclasses to apply document-level adjustments to the score, for example using the value of a stored field to influence the score (although that would be slow).

WeightingModel sub-classes that use final() should have the attribute use_final set to True.

Parameters:
  • searcher – whoosh.searching.Searcher for the index.
  • docnum – the doc number of the document being scored.
  • score – the document’s accumulated term score.
Return type: float
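
A minimal sketch of a document-level adjustment through final(), assuming a hypothetical per-document multiplier table (boosts) in place of a stored-field lookup. The class below is standalone so that it runs without an index; it is not a Whoosh class. In real use the same use_final attribute and final(searcher, docnum, score) signature would go on a WeightingModel subclass.

```python
class BoostedFinal:
    """Standalone illustration of the final() hook (hypothetical, not part of Whoosh)."""

    # Signals that final() should be applied to each document's score.
    use_final = True

    def __init__(self, boosts):
        # Hypothetical mapping of document number -> score multiplier.
        self.boosts = boosts

    def final(self, searcher, docnum, score):
        # Scale the accumulated term score; documents without an entry keep theirs.
        return float(score * self.boosts.get(docnum, 1.0))


model = BoostedFinal({3: 2.0})
boosted = model.final(None, 3, 1.5)    # document 3 has a multiplier of 2.0
unchanged = model.final(None, 7, 1.5)  # document 7 has no entry
```

The searcher argument is unused here, which is why None is passed; an adjustment that reads stored fields would need a real searcher and would pay the lookup cost for every scored document.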

idf(searcher, fieldname, text)

Returns the inverse document frequency of the given term.

scorer(searcher, fieldname, text, qf=1)

Returns an instance of whoosh.scoring.Scorer configured for the given searcher, fieldname, and term text.

class whoosh.scoring.BaseScorer

Base class for “scorer” implementations. A scorer provides a method for scoring a document, and sometimes methods for rating the “quality” of a document and a matcher’s current “block”, to implement quality-based optimizations.

Scorer objects are created by WeightingModel objects. The WeightingModel stores the configuration information for the model (for example, the values of B and K1 in the BM25F model) and then creates a scorer instance.

block_quality(matcher)

Returns the maximum limit on the possible score the matcher can give in its current “block” (whatever concept of “block” the backend might use). This can be an estimate and not necessarily the actual maximum score possible, but it must never be less than the actual maximum score.

If this score is less than the minimum score required to make the “top N” results, then we can tell the matcher to skip ahead to another block with better “quality”.
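
The skip-ahead idea can be sketched in plain Python. This is an illustration under stated assumptions, not Whoosh's matcher code: each "block" is modeled as a hypothetical (quality, postings) pair, where quality is an upper bound on every score in the block, and top_n is a made-up helper name.

```python
import heapq


def top_n(blocks, n):
    """Collect the n best (score, docnum) pairs, skipping hopeless blocks.

    blocks is a list of (quality, postings) pairs, where postings is a list
    of (docnum, score) and quality is never less than any score in the block.
    Returns the results (best first) and the number of blocks skipped.
    """
    heap = []  # min-heap, so heap[0] is the weakest of the current top n
    skipped = 0
    for quality, postings in blocks:
        # Once n results are held, a block whose upper bound is below the
        # weakest held score cannot contribute, so it is skipped unread.
        if len(heap) == n and quality < heap[0][0]:
            skipped += 1
            continue
        for docnum, score in postings:
            if len(heap) < n:
                heapq.heappush(heap, (score, docnum))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, docnum))
    return sorted(heap, reverse=True), skipped


blocks = [
    (5.0, [(0, 4.0), (1, 5.0)]),
    (2.0, [(2, 1.0), (3, 2.0)]),  # bound 2.0 is below the current minimum 4.0
    (6.0, [(4, 6.0), (5, 0.5)]),
]
results, skipped = top_n(blocks, 2)
```

The sketch also shows why the bound must never underestimate: if the third block had reported a quality below 4.0, document 4 would have been skipped even though it belongs in the top two.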

max_quality()

Returns the maximum limit on the possible score the matcher can give. This can be an estimate and not necessarily the actual maximum score possible, but it must never be less than the actual maximum score.

score(matcher)

Returns a score for the current document of the matcher.

supports_block_quality()

Returns True if this class supports quality optimizations.

class whoosh.scoring.WeightScorer(maxweight)

A scorer that simply returns the weight as the score. This is useful for more complex weighting models to return when they are asked for a scorer for fields that aren’t scorable (don’t store field lengths).

class whoosh.scoring.WeightLengthScorer

Base class for scorers where the only per-document variables are term weight and field length.

Subclasses should override the _score(weight, length) method to return the score for a document with the given weight and length, and call the setup() method at the end of the initializer to set up common attributes.

Scoring algorithm classes

class whoosh.scoring.BM25F(B=0.75, K1=1.2, **kwargs)

Implements the BM25F scoring algorithm.

>>> from whoosh import scoring
>>> # Set a custom B value for the "content" field
>>> w = scoring.BM25F(B=0.75, content_B=1.0, K1=1.5)
Parameters:
  • B – free parameter, see the BM25 literature. Keyword arguments of the form fieldname_B (for example, body_B) set field-specific values for B.
  • K1 – free parameter, see the BM25 literature.
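
To show what B and K1 control, here is a sketch of the commonly cited BM25 per-term formula. It is an assumption that this matches the expression Whoosh evaluates; the function name bm25_term_score and its arguments (idf, term frequency tf, field length fl, average field length avgfl) are illustrative only.

```python
def bm25_term_score(idf, tf, fl, avgfl, B=0.75, K1=1.2):
    # Length normalization: B=0 ignores field length, B=1 normalizes fully.
    norm = (1 - B) + B * fl / avgfl
    # K1 controls how quickly repeated occurrences of the term saturate.
    return idf * (tf * (K1 + 1)) / (tf + K1 * norm)


# Same term statistics, different field lengths.
avg_doc = bm25_term_score(idf=2.0, tf=3, fl=100, avgfl=100)
long_doc = bm25_term_score(idf=2.0, tf=3, fl=400, avgfl=100)
# With B=0 the long field is no longer penalized.
no_norm = bm25_term_score(idf=2.0, tf=3, fl=400, avgfl=100, B=0.0)
```

Under this formula a field that is longer than average scores lower for the same term frequency, and a field-specific B such as content_B=1.0 applies the strongest length penalty to that field.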
class whoosh.scoring.TF_IDF
class whoosh.scoring.Frequency

Scoring utility classes

class whoosh.scoring.FunctionWeighting(fn)

Uses a supplied function to do the scoring. For simple scoring functions and experiments this may be simpler to use than writing a full weighting model class and scorer class.

The function should accept the arguments searcher, fieldname, text, matcher.

For example, the following function will score documents based on the earliest position of the query term in the document:

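from whoosh import scoring

def pos_score_fn(searcher, fieldname, text, matcher):
    # Positions of the term in the matcher's current document
    poses = matcher.value_as("positions")
    # The earlier the first occurrence, the higher the score
    return 1.0 / (poses[0] + 1)

pos_weighting = scoring.FunctionWeighting(pos_score_fn)
# myindex is an existing index and q is a parsed query
with myindex.searcher(weighting=pos_weighting) as s:
    results = s.search(q)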

Note that the searcher passed to the function may be a per-segment searcher for performance reasons. If you want to get global statistics inside the function, you should use searcher.get_parent() to get the top-level searcher. (However, if you are using global statistics, you should probably write a real model/scorer combo so you can cache them on the object.)

class whoosh.scoring.MultiWeighting(default, **weightings)

Chooses from multiple scoring algorithms based on the field.

The only non-keyword argument specifies the default Weighting instance to use. Keyword arguments specify Weighting instances for specific fields.

For example, to use BM25F for most fields, but Frequency for the id field and TF_IDF for the keys field:

from whoosh.scoring import BM25F, Frequency, MultiWeighting, TF_IDF

mw = MultiWeighting(BM25F(), id=Frequency(), keys=TF_IDF())
Parameters:
  • default – the Weighting instance to use for fields not specified in the keyword arguments.

class whoosh.scoring.ReverseWeighting(weighting)

Wraps a weighting object and subtracts the wrapped model’s scores from 0, essentially reversing the weighting model.
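
The effect of subtracting scores from 0 can be seen in a small standalone sketch. It does not use Whoosh; the reverse helper and the scores table are hypothetical stand-ins for a wrapped model and its per-document scores.

```python
def reverse(score_fn):
    """Wrap a scoring function so that it returns 0 minus the original score."""
    def reversed_score(*args, **kwargs):
        return 0 - score_fn(*args, **kwargs)
    return reversed_score


# Hypothetical scores produced by some wrapped model.
scores = {"a": 3.0, "b": 1.0, "c": 2.0}
rev = reverse(scores.get)

# Results are ranked highest score first in both cases.
normal_order = sorted(scores, key=scores.get, reverse=True)
reversed_order = sorted(scores, key=rev, reverse=True)
```

Because ranking still takes the highest score first, negating every score puts the document the wrapped model liked least at the top.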