[Xapian-discuss] Help with weights

Olly Betts olly at survex.com
Wed Jul 2 02:14:47 BST 2008


On Tue, Jul 01, 2008 at 04:15:53PM -0700, Robert Kaye wrote:
> Let me ask a specific question -- in my release index (an index of CD  
> titles, essentially) I have a field called type. When the value of  
> this field is "album" I give it a termcount of 100. All other values  
> for this field and all other fields get a termcount of 1.
> 
> For the enquire, I use a stock object. I do not define a weighting  
> system, do not tinker with doc order or sort order. When I search for  
> the term "love" in the release title (very common term), the top hits  
> are the ones that contain the word "love" twice. Good.
> 
> But, for all the hits that have the word "love" in them once, I would  
> expect to see the releases of type "album" to be near the top. But  
> they are not:

Are you adding this type term to queries?  If not, the effect of
indexing the type term with those termcounts will be to increase the
document length of albums.  That will tend to decrease the importance of
each occurrence of "love" in the album title, so albums will indeed tend
to rank lower.
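
To make the length effect concrete, here is a rough sketch of the term-frequency part of a BM25-style weight. It assumes a BM25-like scheme is in use; the function name, the k1 and b values, and the document lengths below are illustrative assumptions, not values read from your index.

```cpp
#include <cassert>

// Term-frequency component of a BM25-style weight for one term.
// A simplified model for illustration only; k1 and b are assumed values.
//   wdf     : occurrences of the term in the document
//   normlen : document length divided by the average document length
double bm25_wdf_factor(double wdf, double normlen,
                       double k1 = 1.0, double b = 0.5) {
    assert(wdf >= 0.0 && normlen > 0.0);
    // A longer document makes the denominator larger, so the same wdf
    // contributes less weight.
    return wdf * (k1 + 1.0) / (wdf + k1 * ((1.0 - b) + b * normlen));
}
```

With these made-up numbers, bm25_wdf_factor(1, 1.0) is 1.0 for a document of average length, while bm25_wdf_factor(1, 21.0) is about 0.17 for a document whose length has been inflated to 21 times the average. One occurrence of "love" counts for far less in the second case.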

Perhaps a better approach would be to index the type term with a wdf
(within-document frequency) of 1 regardless of the type, and then take
your query and adjust it like so:

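// Match the type term (indexed with wdf 1) and scale its weight.
Xapian::Query album_boost("XTYPEalbum");
album_boost = Xapian::Query(Xapian::Query::OP_SCALE_WEIGHT, album_boost, 4.2);
// Keep every match for the original query; albums get the extra weight.
query = Xapian::Query(Xapian::Query::OP_AND_MAYBE, query, album_boost);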

You can adjust the 4.2 factor to alter how much albums are boosted. You
can also search "fairly" by leaving the boost out, or boost individual
tracks instead if you prefer - and none of this requires a reindex.
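
As a rough mental model of how the two operators combine, the sketch below computes a score for a single document. It is a toy calculation, not the Xapian API: the function name and the weights passed in are hypothetical, and real weights come from the weighting scheme.

```cpp
#include <cassert>

// Toy model of the combined score for one document (not the Xapian API).
//   main_weight  : weight from the original query (the document matches it)
//   is_album     : whether the document carries the type term
//   boost_weight : unscaled weight the type term would contribute
//   factor       : the OP_SCALE_WEIGHT multiplier (4.2 in the code above)
double boosted_score(double main_weight, bool is_album,
                     double boost_weight, double factor) {
    assert(factor >= 0.0);
    // AND_MAYBE: the right-hand side never filters, it only adds weight.
    return main_weight + (is_album ? factor * boost_weight : 0.0);
}
```

In this model a non-album keeps its original score and still appears in the results, while an album with the same title match moves up by factor times the type term's weight. Raising or lowering the factor changes only that extra amount.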

Cheers,
    Olly


